October 8, 2020 Hello All, I have an array of 22 drives and 1 parity drive, all xfs formatted. Simultaneously, Disk 2 is showing "Unmountable: no filesystem" and Disk 17 is showing disabled, with contents emulated. I would like to try to restore Disk 2 and also replace Disk 17. I would appreciate any feedback on the order in which I should resolve these issues, or on what approaches I might take. I have attached a syslog I just captured after stopping and restarting the array. Please let me know what additional information I can provide. syslog.txt
October 8, 2020 The syslog is good, but the full diagnostics would be better for giving proper advice. Please attach them to your next post (Tools/Diagnostics).
October 8, 2020 Author Thank you, ChatNoir. I have attached my diagnostics here. tower-diagnostics-20201008-0840.zip
October 8, 2020 Community Expert

13 hours ago, assur191 said: "I have an array of 22 drives and 1 parity drive"

With that many, you should consider dual parity.

13 hours ago, assur191 said: "Disk 2 is showing "Unmountable: no filesystem""

Disk3 is the unmountable disk according to those diagnostics.

13 hours ago, assur191 said: "Disk 17 is showing disabled"

Not getting SMART for disk17. Do you have some reason to think it is actually bad and not just a bad connection? The syslog also indicated corruption on disk17 before it became disabled, but since the emulated disk is mounting, maybe it is OK.

You could check connections and see if we can get SMART for disk17. If it is OK, you could rebuild to the same disk. Rebuilding to a new disk and keeping the original is always a good approach as well: if the original disk is good, you might be able to recover something from it if there are any problems with the rebuild.

I would be inclined to do the rebuild first so you at least get back to parity protection, then repair the disk3 filesystem after that.

Also, you should turn off mover logging in Scheduler, since those log entries are not anonymized. Unless you are trying to diagnose a problem with the mover, it is best not to log them, and the syslog is easier to read without them.

And I see you have Marvell controllers; those might be the root of your trouble.
October 8, 2020 Community Expert 40 minutes ago, trurl said: "you have Marvell controllers" Yep, two SASLP. You should definitely get rid of those. It's not clear the disk is OK; you need to reboot and post new diags. Either way, those controllers can cause more trouble when there are errors, because the driver crashes. Also make sure scheduled parity checks are set to non-correcting.
October 9, 2020 Author Thanks all for your feedback. I rebooted the server and checked connections, and now I'm showing the message: Quote Unraid array errors: 08-10-2020 22:41 Notice [TOWER] - array turned good Array has 0 disks with read errors The drive is now showing "healthy" under SMART, where it was "error" before. Is there any way I can just re-enable the drive without rebuilding, then fix the filesystem on drive 3? I have attached the new diagnostics here. Also, regarding the Marvell controllers, are there any suggestions on what I should replace them with? tower-diagnostics-20201009-1018.zip
October 9, 2020 Community Expert

20 minutes ago, assur191 said: "The drive is now showing "healthy" under SMART,"

It's not that healthy. In fact, it appears to be failing; you can confirm by running an extended SMART test.

21 minutes ago, assur191 said: "then fix the filesystem on drive 3?"

You can do that now.

21 minutes ago, assur191 said: "Also, regarding the Marvell controllers, are there any suggestions on what I should replace them with?"

Any LSI with a SAS2008/2308/3008/3408 chipset in IT mode, e.g., 9201-8i, 9211-8i, 9207-8i, 9300-8i, 9400-8i, etc., and clones such as the Dell H200/H310 and IBM M1015; the clones need to be crossflashed.
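The extended SMART test mentioned above can also be started from a terminal with smartctl (part of smartmontools). The following is a minimal Python sketch that only assembles the command lines; the helper name `smart_commands` and the device path `/dev/sdX` are placeholders chosen for illustration and do not come from this thread. Substitute the device that actually corresponds to disk17 on your system.

```python
import shlex
from typing import List


def smart_commands(device: str) -> List[List[str]]:
    """Build smartctl invocations for an extended self-test on one drive.

    `device` is a placeholder such as /dev/sdX; nothing is executed here.
    """
    if not device.startswith("/dev/"):
        raise ValueError(f"expected a /dev/ path, got {device!r}")
    return [
        ["smartctl", "-t", "long", device],      # start the extended (long) self-test
        ["smartctl", "-l", "selftest", device],  # show the self-test log once it finishes
        ["smartctl", "-A", device],              # print the SMART attribute table
    ]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before running them by hand.
    for cmd in smart_commands("/dev/sdX"):
        print(shlex.join(cmd))
```

The extended test runs in the background on the drive itself and can take many hours on a large disk, so the log and attribute commands are only meaningful after it has completed.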
October 9, 2020 Community Expert 37 minutes ago, assur191 said: "Is there any way I can just re-enable the drive without rebuilding" You should rebuild unless you have some good reason to suspect the rebuild will not produce a good result. A disabled disk is out of sync, and rebuilding will get it back in sync. The alternative is to rebuild parity instead to get the array back in sync, but the disabled disk is the one that is out of sync. See this recent post for more details about this:
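To illustrate why the disabled disk, rather than parity, is the part that is out of sync, here is a toy Python model of single XOR parity. It is a sketch under simplifying assumptions (three tiny in-memory "disks", one parity block, made-up byte values) and is not Unraid's actual implementation; the function names `xor_blocks` and `rebuild_missing` are hypothetical.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length byte blocks together, position by position."""
    if not blocks:
        raise ValueError("need at least one block")
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving: List[bytes], parity: bytes) -> bytes:
    """Reconstruct the one missing data block from the others plus parity."""
    return xor_blocks(surviving + [parity])


if __name__ == "__main__":
    disks = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]  # made-up contents
    parity = xor_blocks(disks)

    # Disk index 1 gets disabled. A later write to it only reaches parity,
    # so the physical disk keeps its old contents.
    new_contents = b"\xaa\xbb"
    parity = xor_blocks([disks[0], new_contents, disks[2]])

    emulated = rebuild_missing([disks[0], disks[2]], parity)
    print(emulated == new_contents)  # emulated disk reflects the new write
    print(emulated == disks[1])      # the stale physical disk does not match
```

In this model, re-enabling the stale disk without a rebuild would leave it disagreeing with parity, whereas rebuilding writes the emulated contents back to the disk.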
October 9, 2020 Author Thanks again for the responses. I will go ahead and rebuild disk 17 and repair the filesystem on disk 3. However, I just want to confirm that I should first rebuild 17, then repair 3. Will that order result in the least amount of lost data?
October 10, 2020 Community Expert 12 hours ago, assur191 said: "Will that order result in the least amount of lost data?" I would repair the filesystem first, since that should be quick and will make the data on that disk available now, then rebuild. Neither order should be more or less risky than the other.
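For the filesystem repair step, the underlying tool for an xfs disk is xfs_repair. Below is a minimal Python sketch that only builds the command line, with a read-only check as the default. The helper name `xfs_repair_command` is hypothetical, and the device path `/dev/md3` is an assumption for disk3 that is not confirmed anywhere in this thread; the correct device name can differ between Unraid versions, and the array is normally started in maintenance mode for this.

```python
from typing import List


def xfs_repair_command(device: str, dry_run: bool = True) -> List[str]:
    """Build an xfs_repair command line; nothing is executed here.

    `device` is an assumed path (for example /dev/md3 for disk3).
    """
    cmd = ["xfs_repair"]
    if dry_run:
        cmd.append("-n")  # no-modify mode: report problems without changing anything
    cmd.append("-v")      # verbose output
    cmd.append(device)
    return cmd


if __name__ == "__main__":
    print(" ".join(xfs_repair_command("/dev/md3")))                 # check only
    print(" ".join(xfs_repair_command("/dev/md3", dry_run=False)))  # actual repair
```

Running the check-only form first shows what the repair would change before anything is written to the disk.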
This topic is now archived and is closed to further replies.