heisenfig Posted January 12, 2021 (edited)
Hey guys, it looks like I had two drives go dark. Both show "Unmountable: No file system". Hovering over the "x" shows "Device is disabled. Contents Emulated". There are no mount points for disk1 and disk2 and no browse icon to the right. If it's emulated, should I still be able to browse like it's available?
I ran xfs_repair -n on disk1 and it showed an issue, so I ran it without the -n. It ran for at least half an hour (it didn't say I needed to use -L), finding all kinds of issues and reporting read/write failures. It finally said "done", but the disk still would not mount. It also doesn't even report a temperature.
I'm not sure what the best next step is. Should I just disconnect the drive, assign a new drive, and see if it rebuilds? Or should I first just unassign it, start the array, and see if it shows me the emulated data?
I've attached the diagnostics and a couple of screenshots. Let me know if anything else is needed. Much appreciated!
nabit-diagnostics-20210112-1603.zip
Edited January 12, 2021 by heisenfig
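The check-then-repair sequence described in this post can be sketched as a small helper. This is only an illustration: the function names and the device paths in the usage note are hypothetical, and it relies on the documented behavior that xfs_repair -n (no-modify mode) exits with status 1 when it detects corruption and 0 when it does not.

```python
import subprocess


def dry_run_command(device):
    """Build the read-only check command; -n means 'no modify'."""
    return ["xfs_repair", "-n", device]


def looks_corrupt(device, run=subprocess.run):
    """Return True if the dry run reports corruption (exit status 1).

    `run` is injectable so the logic can be exercised without touching a disk.
    """
    result = run(dry_run_command(device), capture_output=True, text=True)
    return result.returncode == 1
```

A caller might loop over the array devices (for example the hypothetical paths "/dev/md1" and "/dev/md2") with the array started in maintenance mode, and only run a real repair on the ones where looks_corrupt() returns True.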
JonathanM Posted January 12, 2021
heisenfig said: "If it's emulated, should i still be able to browse like it's available?"
Yes, assuming parity was perfectly valid before the drives went offline. When was your last parity check with zero errors?
heisenfig Posted January 12, 2021
Umm, apparently I've been pretty negligent. Looks like July 20 was the last one with 0 errors. I thought I had notifications set up to send me emails. I haven't looked at the dashboard in ages until suddenly the array was offline the other day. I'm guessing I'll probably just have to replace the drives and hope for the best.
heisenfig Posted January 12, 2021
I just did an xfs_repair on disk2 and it no longer shows unmountable. It still shows "disabled. contents emulated", but I have the browse icon next to it. Everything on it is under lost+found. I've temporarily disabled parity checks until things are under control.
JonathanM Posted January 12, 2021
Rebuilding to new disks would be best. It's very possible the 2 disks that dropped will have better copies than the rebuild, so it would be helpful to keep those 2 old drives intact.
heisenfig Posted January 12, 2021
Yes, I was thinking of rebuilding to new disks. The good news is that I tried the repair again on disk 1 and it now shows mountable, but still disabled like disk 2. I will replace the drives and let it rebuild one at a time, then go from there. Thanks
trurl Posted January 13, 2021
heisenfig said: "I thought I had notifications setup to send me emails."
Fix that
heisenfig Posted January 13, 2021
I did. I set up Pushbullet.
heisenfig Posted January 14, 2021
Ran into a hiccup, I think. I swapped out Disk 1 with a new pre-cleared drive, but upon starting it up, both Disk 1 and Disk 2 went back to "unmountable". It says it's rebuilding the drive, but it's going VERY slow, like 1.5 MB/sec. It says it's going to take about 60 days to complete. It's been running 18 hours and is only 1.1% done. This is making me think there's another problem, like a cable or SATA port issue. Should I just stop the re-sync, swap out the cables on those 2 drives, repair the filesystem again, then start it up again?
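As a sanity check on the numbers in this post, the elapsed time and progress can be extrapolated linearly. This is a rough sketch that assumes the rebuild keeps its average speed so far; the function name is made up for illustration, and the GUI's own estimate is presumably based on the current speed rather than the average, so the two need not match exactly.

```python
def estimate_total_hours(elapsed_hours, percent_done):
    """Linear extrapolation: total time if progress continues at the same average rate."""
    if percent_done <= 0:
        raise ValueError("percent_done must be greater than zero")
    return elapsed_hours * 100.0 / percent_done


# Figures from the post: 18 hours elapsed, 1.1% complete.
total_hours = estimate_total_hours(18, 1.1)
total_days = total_hours / 24
remaining_days = (total_hours - 18) / 24

print(f"total: {total_days:.1f} days, remaining: {remaining_days:.1f} days")
```

The extrapolation lands in the same range as the roughly 60 days the GUI reported, which suggests the slow rate had been steady since the rebuild started rather than being a brief stall.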
trurl Posted January 14, 2021
Post new diagnostics.
heisenfig Posted January 14, 2021
Ouch. I started getting tons of errors on my USB flash (boot) drive. I tried to browse it and it showed no files. It showed 18,446,744,073,709,551,616 writes on the flash drive. It also showed up under Unassigned Devices. I tried to stop the array, but it couldn't unmount everything, so I ended up having to just power it off. It did reboot normally though.
I replaced the SATA3 cable on disk1, as it looked to be in bad condition, with kind of a kink near one plug. I let it stay off for a bit while I was working on other stuff and just booted it back up, so the temps are low right now.
I went through all disks and ran 'xfs_repair -n'. All disks look good except the emulated disk1 and disk2, so I'm repairing emulated disk1 again right now with the -L option. So far, I have not seen any more of the fatal I/O and hard reset errors I was seeing before.
Could a single bad SATA3 cable cause errors across multiple SATA3 adapters? I just ordered a batch of new SATA3 cables so I can swap them all out. As I'm typing here, the temps are getting back up to normal and so far there are no drive errors in the log. If it starts throwing errors again or doesn't seem like it's playing nice, I'll pull another diagnostic file. As always, the help here is appreciated.
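The write count reported for the flash drive is worth a second look, because it is exactly 2^64. The sketch below checks that, and shows one plausible way such a number can appear. The explanation is an assumption, not something confirmed in this thread: an unsigned 64-bit counter with every bit set, displayed through a double-precision float, rounds up to exactly this value.

```python
# Value shown in the GUI for writes to the flash drive (from the post above).
reported_writes = 18_446_744_073_709_551_616

UINT64_MAX = 2**64 - 1  # largest value an unsigned 64-bit counter can hold

# The reported number is exactly 2**64, one more than any 64-bit counter can store.
is_exactly_2_to_64 = reported_writes == 2**64

# Assumption: the counter held all ones (for example -1 reinterpreted as unsigned)
# and was rendered via a double-precision float, which rounds it up to 2**64.
rounded_display = int(float(UINT64_MAX))
matches_rounding = rounded_display == reported_writes

print(is_exactly_2_to_64, matches_rounding)
```

Both checks hold, so the figure most likely reflects an invalid or wrapped counter on a device that had dropped offline, not an actual number of writes.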
heisenfig Posted January 14, 2021
So far so good after replacing that 1 SATA3 cable. The 2 emulated drives are repaired. The array started and all disk shares are valid and available. I will have to sort through lost+found later. Drive 1 is rebuilding to the new drive much faster. It says it will finish in about 21 hours instead of 60 days, and there is not a single error in the log file. I'm attaching a diagnostic that I ran earlier this morning that shows some of the hard drive soft and hard resets I was seeing. This was before the whole thing went haywire with the flash drive errors that I outlined above.
nabit-diagnostics-20210114-0918.zip
JonathanM Posted January 14, 2021
After things are stable, you can investigate the old drives. They may actually contain a better copy of your data.
heisenfig Posted January 15, 2021
Maybe, though I think the old drives may be completely dead. They weren't even reporting a drive temperature anymore, just showing "*". But yeah, I'll check once I get things back to normal. Thanks!
heisenfig Posted January 18, 2021
I got really lucky. Within hours of my Disk 1 rebuilding successfully, Disk 9 failed completely. That one is now just about finished rebuilding and another one is pre-clearing to replace Disk 2.
heisenfig Posted January 22, 2021 (edited)
Almost solved, but now 2 more disks are disabled. I rebuilt disk 1 successfully, then disk 9 got disabled. I rebuilt disk 9 and disk 2 successfully and had everything green again for a few days.
I had another drive pre-clearing connected to USB because I wanted to go ahead and replace my last 4TB drive, so I took that drive and connected it directly to a SATA port. At the same time, I decided to get rid of the IcyDock that 5 of the drives were in. I believe that may be part of the trouble I'm having. On the IcyDock, the 5 drives each had their own pass-through data port, but shared power from 3 SATA power connections. So now all drives are connected directly, but still divided up on the same connectors that they were on.
Upon starting the array, I immediately started seeing read errors again. I think it was one drive too many, leaving the drives power starved and causing write errors, which leads to drives getting disabled in the configuration. This time, it disabled Drive 9 (the brand new drive from above) and Parity 2 (an old drive). I just read something that made me understand that those drives (and the ones from before) may not be dead, but had just been disabled due to a read error. And if the read errors were because they weren't getting good enough power, then those drives may all still be good.
So my theory now is that my power supply is getting weak. It's a RaidMax RX-1000AE. I thought I had read that it was a 4-rail, but looking at the box, it clearly says "A strong single +12v rail for high-end system heavy load....". I did have some power issues like this before with the IcyDock, and getting an additional modular cable from the PSU to the IcyDock seemed to fix it at the time.
Do you think I need to replace the power supply? It says it supplies 83 Amps on the +12v rail. I have 13 HDDs (2 parity, 11 data), 2 SSDs (cache) and 6 fans. The graphics adapter is an nVidia 960. Is that just too much?
I've attached a diagnostic. I'm still seeing some soft resets (extra drive not attached).
Edit: Though the box says it's a single +12v rail, the specs online show 4 outputs rated at 36A each. I'm assuming that's the max per channel, with an overall max of 83 Amps. I'm pretty sure I have them split out between the 4 ports enough. I don't think I have more than 4 drives on any one port, but I'll double check.
Output Current
+3.3V: 24 A
+5V: 30 A
-12V: 0.5 A
+5VSB: 3 A
+12V1: 36 A
+12V2: 36 A
+12V3: 36 A
+12V4: 36 A
Edit 2: I just checked what's connected to each.
+12V1: 4 HDDs + 1 HDD fan
+12V2: 3 HDDs + 2 SSDs
+12V3: 3 HDDs + case fans (2 front, 2 top, 1 back)
+12V4: 3 HDDs
I just moved 1 drive from the first port to the 4th port to see if that's any better, but then saw a read and a write error on disk 10. Fortunately, it did not get disabled too. So moving power cables definitely makes a difference, but I don't think I'm anywhere near overloading the PSU on any port or overall per the specs. So that leaves power cables going bad, or the PSU going bad? Thoughts?
nabit-diagnostics-20210122-1401.zip
Edited January 22, 2021 by heisenfig
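The per-rail reasoning in this post can be put into numbers with a rough budget. The rail limits and the device counts come from the post; the per-device currents are illustrative assumptions only (roughly 2 A at spin-up on +12V for a 3.5" HDD, a small draw per fan, and SSDs treated as drawing from +5V), and the motherboard, CPU, and graphics card are left out entirely.

```python
RAIL_LIMIT_A = 36.0       # per +12V output, from the posted specs
TOTAL_12V_LIMIT_A = 83.0  # combined +12V limit, from the post

# ASSUMED per-device +12V currents, not taken from the thread.
HDD_SPINUP_A = 2.0  # worst case while a 3.5" drive spins up
SSD_12V_A = 0.0     # 2.5" SSDs are assumed to draw from +5V only
FAN_A = 0.3

# Device counts per rail as listed in "Edit 2" (before the one drive was moved).
rails = {
    "+12V1": {"hdd": 4, "ssd": 0, "fan": 1},
    "+12V2": {"hdd": 3, "ssd": 2, "fan": 0},
    "+12V3": {"hdd": 3, "ssd": 0, "fan": 5},
    "+12V4": {"hdd": 3, "ssd": 0, "fan": 0},
}


def rail_load(devices):
    """Estimated worst-case +12V current for one rail, in amps."""
    return (devices["hdd"] * HDD_SPINUP_A
            + devices["ssd"] * SSD_12V_A
            + devices["fan"] * FAN_A)


loads = {name: rail_load(devices) for name, devices in rails.items()}
total_load = sum(loads.values())

for name, amps in loads.items():
    print(f"{name}: {amps:.1f} A of {RAIL_LIMIT_A:.0f} A")
print(f"drives and fans total: {total_load:.1f} A of {TOTAL_12V_LIMIT_A:.0f} A")
```

Under these assumptions every rail sits far below its 36 A rating, which fits the poster's view that the rated capacity is not the limiting factor and points toward cabling, connectors, or a degrading supply instead.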
heisenfig Posted January 22, 2021
I went ahead and removed the nVidia card and replaced it with an el-cheapo card that doesn't require extra PCIe power. I don't know how much power the other card was really using, but this one should draw at least a little less.
JorgeB Posted January 23, 2021
You should also replace all those Marvell controllers. They are known to sometimes drop disks without a reason with Unraid.