nickro8303 Posted February 7, 2019 My unRAID server has so many issues right now that I'm not sure where to start.
1. I rebooted the other day to clear up space on my docker container. After the reboot, I started receiving increasing CRC errors from my cache drive.
- Replaced the SATA cable and moved to a different SATA port. Errors still continue.
2. After changing the cables I booted up and one of the array drives shows "Unmountable: Unsupported partition layout".
- Stopped the array, unassigned the drive, started the array and checked the emulated disk.
- Stopped the array, reassigned the drive and started a Parity-Sync / Data-Rebuild.
3. Now I'm getting another error message saying a different disk has "Unable to write to disk" and read errors.
4. Also, /var/log is getting full (currently 100% used).
I'm really not sure what's causing all these issues, as prior to this everything was running properly except that my docker container was full. Any help with these issues would be greatly appreciated. Thanks in advance. tower-diagnostics-20190207-0941.zip Edited February 7, 2019 by nickro8303
JorgeB Posted February 7, 2019 Start here: Your SSD needs a new SATA cable. Onboard SATA ports 5 and 6 are set to IDE; this can cause serious issues on these AMD chipsets, so change them to SATA/AHCI in the BIOS. After doing this, reboot and post new diags.
nickro8303 Posted February 7, 2019
1 hour ago, johnnie.black said: Start here: Your SSD needs a new SATA cable. Onboard SATA ports 5 and 6 are set to IDE; this can cause serious issues on these AMD chipsets, so change them to SATA/AHCI in the BIOS. After doing this, reboot and post new diags.
Thanks for the info, I had no idea this was the case. I will change the settings when I get home from work. I already changed out the SATA cable for a new one.
JorgeB Posted February 7, 2019 There are also problems with disk3: it dropped offline, but with all the other errors it's difficult to analyze. It could be the Marvell controller it's connected to, so check the connections on it when you reboot. I forgot to say, in case you don't know: don't let the rebuild finish, as it's rebuilding garbage.
nickro8303 Posted February 7, 2019
40 minutes ago, johnnie.black said: I forgot to say, in case you don't know: don't let the rebuild finish, as it's rebuilding garbage.
Wow ok, stopped the rebuild.
nickro8303 Posted February 9, 2019 Ok so I did the steps you outlined and disk 3 is back. Disk 4 is being emulated and my parity disk just went to red X. I'm not sure what to do now. I'm pretty sure I heard it clicking before it went to the red X. Posting diagnostics. tower-diagnostics-20190208-1853.zip
trurl Posted February 9, 2019 Your parity isn't showing up in the SMART reports for those diagnostics. Check connections. It looked OK in previous diagnostics. I don't see how you can have a disabled parity when it already has another disk rebuilding.
JorgeB Posted February 9, 2019
5 hours ago, trurl said: I don't see how you can have a disabled parity when it already has another disk rebuilding.
Yeah, I wonder if this could be improved. Unraid won't disable two disks with single parity, but it will still disable one even if there's already an invalid disk (a disk being rebuilt), resulting in two invalid disks. OP: there are ATA errors on parity and disk1, possibly cable related. Replace the SATA and power cables on both disks, and when done post new diags so we can try again after re-enabling parity.
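For anyone following along, here is a minimal sketch of why two invalid disks are a problem with single parity. It assumes single parity is a plain bytewise XOR across the data disks, and the 4-byte "disks" and all names below are hypothetical toy values, not anything read from this server.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Toy "disks" with made-up contents, 4 bytes each.
disk1 = b"\x01\x02\x03\x04"
disk2 = b"\x10\x20\x30\x40"
disk4 = b"\xaa\xbb\xcc\xdd"
parity = xor_blocks([disk1, disk2, disk4])

# One invalid disk: XOR of parity and the surviving disks gives it back.
rebuilt_disk4 = xor_blocks([parity, disk1, disk2])

# Two invalid disks: the same XOR only yields disk2 XOR disk4, one
# equation with two unknowns, so neither disk can be recovered alone.
mixed = xor_blocks([parity, disk1])
```

In the toy case `rebuilt_disk4` equals `disk4`, while `mixed` matches neither missing disk, which is roughly the situation a rebuild ends up in when parity and a data disk are both invalid.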
nickro8303 Posted February 9, 2019 Could this be due to my motherboard going bad? I just replaced all the SATA cables a few months ago and the parity drive is also new. I can't see how all these things could be going wrong at the same time.
JorgeB Posted February 9, 2019
1 minute ago, nickro8303 said: Could this be due to my motherboard going bad?
It's possible. Replacing the cables one more time would be the easier way to rule them out; if issues persist, it could be the board.
nickro8303 Posted February 10, 2019 I found one new SATA cable and used it to replace the one on the parity drive, and it's still showing a red X. I'm pretty sure it's dead as I know I heard a clicking sound the last time I booted up. Disk 4 is now showing up as "Unmountable: No file system". I ordered a set of new SATA cables but they won't be here for a few days. I'm almost certain this is not due to the cables though. How should I handle replacing the parity drive with Disk 4 in the state it's in? Am I just looking at losing the data on disk 4? tower-diagnostics-20190210-0944.zip Edited February 10, 2019 by nickro8303
trurl Posted February 10, 2019 Parity disk looks OK in those. I am guessing you will have to do the "invalidslot" command to rebuild disk4 again but wait till @johnnie.black replies.
JorgeB Posted February 10, 2019
3 hours ago, nickro8303 said: I'm pretty sure it's dead as I know I heard a clicking sound the last time I booted up.
If parity is really failing you'll have a problem rebuilding disk4, but no harm in trying:
- Tools -> New Config -> Retain current configuration: All -> Apply
- All disks should remain assigned, but re-assign any missing disk(s) if needed
- Important: after checking the assignments, leave the browser on that page, the "Main" page.
- Open an SSH session/use the console and type: mdcmd set invalidslot 4 29
- Back on the GUI, and without refreshing the page, just start the array. Do not check the "parity is already valid" box. Disk4 will start rebuilding. The disk should mount immediately, but if it's unmountable don't format; wait for the rebuild to finish and then run a filesystem check.
If there are any issues or errors during the rebuild, grab and post new diags.
nickro8303 Posted February 11, 2019
20 hours ago, johnnie.black said: If parity is really failing you'll have a problem rebuilding disk4, but no harm in trying:
- Tools -> New Config -> Retain current configuration: All -> Apply
- All disks should remain assigned, but re-assign any missing disk(s) if needed
- Important: after checking the assignments, leave the browser on that page, the "Main" page.
- Open an SSH session/use the console and type: mdcmd set invalidslot 4 29
- Back on the GUI, and without refreshing the page, just start the array. Do not check the "parity is already valid" box. Disk4 will start rebuilding. The disk should mount immediately, but if it's unmountable don't format; wait for the rebuild to finish and then run a filesystem check.
If there are any issues or errors during the rebuild, grab and post new diags.
Ok, I followed the directions and started the array after applying the new config and running the command. It's now showing that disks 2 and 4 are unmountable. Parity-Sync/Data rebuild is in progress. Should I let it complete? tower-diagnostics-20190211-1003.zip
JorgeB Posted February 11, 2019
6 minutes ago, nickro8303 said: Should I let it complete?
Cancel now. You didn't use the right command, and you are also rebuilding, i.e. overwriting, disk 2:
Feb 11 10:00:09 TOWER kernel: md: recovery thread: recon D2 D4 ...
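As a rough illustration of the check described above, the sketch below pulls the rebuild targets out of a syslog line shaped like the one quoted. The function name is hypothetical, and it assumes data-disk targets are always written as "D" plus a slot number, as in that line; other target labels may exist and are simply ignored here.

```python
import re


def recon_targets(syslog_line):
    """Return the data-disk labels listed after 'recon' in an md recovery line."""
    match = re.search(r"recovery thread: recon\s+(.*)", syslog_line)
    if match is None:
        return []
    # Assumption: data disks appear as D<slot>, e.g. D2, D4.
    return re.findall(r"\bD\d+\b", match.group(1))


line = "Feb 11 10:00:09 TOWER kernel: md: recovery thread: recon D2 D4 ..."
targets = recon_targets(line)
print(targets)
if len(targets) > 1:
    print("More than one data disk is listed; with single parity, stop and re-check.")
```

Checking this line right after starting the array is a quick way to confirm that only the intended disk is being rebuilt.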
JorgeB Posted February 11, 2019 Looks like you typed:
mdcmd set invalidslot 4 2
instead of:
mdcmd set invalidslot 4 29
Actually, it's not that: it looks like a copy/paste issue. I see in the log:
Feb 11 09:59:33 TOWER kernel: mdcmd (32): set invalidslot 4 29
but if I copy/paste from the syslog I get:
Feb 11 09:59:33 TOWER kernel: mdcmd (32): set invalidslot 4 2 9
This resulted in disks 4 and 2 being set invalid, instead of disk4 and parity2 as it should be.
Edited February 11, 2019 by johnnie.black
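One way to guard against this kind of paste problem is to inspect the command text before running it. The sketch below is only an illustration: the helper names are hypothetical, and the zero-width space in the example is an assumed stand-in, since the thread does not say which character was actually hidden in the pasted text.

```python
import unicodedata

EXPECTED = "mdcmd set invalidslot 4 29"  # command as quoted in the thread


def suspicious_chars(text):
    """List (index, codepoint, name) for each character that is not printable ASCII."""
    found = []
    for index, char in enumerate(text):
        if not (" " <= char <= "~"):
            name = unicodedata.name(char, "UNKNOWN")
            found.append((index, f"U+{ord(char):04X}", name))
    return found


def safe_to_run(pasted):
    """True only if the paste is byte-for-byte the expected command."""
    return pasted == EXPECTED and not suspicious_chars(pasted)


# Hypothetical paste with a zero-width space hidden between "2" and "9".
pasted = "mdcmd set invalidslot 4 2\u200b9"
print(suspicious_chars(pasted))
print(safe_to_run(pasted))
```

Typing the command by hand avoids the problem entirely; a check like this is mainly useful when a command has to be copied from a web page or a log.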
nickro8303 Posted February 11, 2019
1 hour ago, johnnie.black said: Looks like you typed:
mdcmd set invalidslot 4 2
instead of:
mdcmd set invalidslot 4 29
Actually, it's not that: it looks like a copy/paste issue. I see in the log:
Feb 11 09:59:33 TOWER kernel: mdcmd (32): set invalidslot 4 29
but if I copy/paste from the syslog I get:
Feb 11 09:59:33 TOWER kernel: mdcmd (32): set invalidslot 4 2 9
This resulted in disks 4 and 2 being set invalid, instead of disk4 and parity2 as it should be.
I see that. That's what I get for copying and pasting instead of typing, I guess. Can I go ahead and start the process over again, seeing as the parity drive seems to be working again?
trurl Posted February 11, 2019 Could be johnnie has gone to test what happens when you force it to rebuild 2 data disks with only one parity. I don't recall seeing this before, and the fact that both disks are unmountable doesn't seem like a good sign. Do you have backups?
nickro8303 Posted February 11, 2019
4 minutes ago, trurl said: Could be johnnie has gone to test what happens when you force it to rebuild 2 data disks with only one parity. I don't recall seeing this before, and the fact that both disks are unmountable doesn't seem like a good sign. Do you have backups?
I do have backups of the important stuff, but the majority of my data, movies and TV shows, can be recreated from physical media. I'm not really worried about losing that data. I just want to get the server back to stable.
JorgeB Posted February 11, 2019
8 minutes ago, nickro8303 said: Can I go ahead and start the process over again, seeing as the parity drive seems to be working again?
You can try again, but data on both disks will have some (or a lot) of damage, most likely unfixable by xfs_repair. You might still be able to recover some data with a file recovery utility, like UFS Explorer.
nickro8303 Posted February 11, 2019
4 minutes ago, johnnie.black said: You can try again, but data on both disks will have some (or a lot) of damage, most likely unfixable by xfs_repair. You might still be able to recover some data with a file recovery utility, like UFS Explorer.
OK, the question is which disk do I rebuild then, 2 or 4? I guess it doesn't really matter at this point.
JorgeB Posted February 11, 2019
17 minutes ago, trurl said: Could be johnnie has gone to test what happens when you force it to rebuild 2 data disks with only one parity.
I tried to replicate this and Unraid crashed; it didn't start the rebuild. But looking at the screenshot it did start for the OP, so depending on how long it ran, I fear it overwrote part of disk2.
14 minutes ago, nickro8303 said: OK, the question is which disk do I rebuild then, 2 or 4? I guess it doesn't really matter at this point.
I would try disk4, because after the overwritten part disk2 will be OK, whereas we don't know what state disk4 is in.
nickro8303 Posted February 11, 2019 I started the process again with disk 4 but 2 is still showing up as "Unmountable: No file system". tower-diagnostics-20190211-1153.zip