LFletcher Posted November 23, 2022
Hi,
I'm helping a friend who has some issues with his unRAID server. My understanding is that a parity drive (Parity 2) had issues first and went offline. Then there were issues with a data drive (disk 3, sdd), which appears to have UDMA CRC errors. I ran SMART checks on the other drives and it appears disk 5 (sdh) has issues as well, although the drive hasn't been failed (yet).
I've got three 14TB drives, which I had originally planned to use to replace both parity drives and disk 3. Now that disk 5 also has potential issues, I can get another 14TB drive to replace that.
My question is: can the data from disk 3 be recovered, or is it lost? If it can be recovered, what's the correct order to do things in? Do I need to copy the data from disk 5 before it has any more issues?
I've attached the diagnostics file as well. Thanks in advance.
tower-diagnostics-20221123-1310.zip
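For reference, the UDMA CRC count mentioned above is normally reported as SMART attribute 199 in the attribute table printed by `smartctl -A`. The following is a minimal sketch of pulling out its raw value, assuming smartmontools is installed and the drive reports the usual ten-column ATA attribute table. The helper name `crc_error_count` and the sample row are made up for illustration and do not come from the poster's diagnostics.

```shell
#!/bin/bash
# crc_error_count (hypothetical helper): read `smartctl -A` output on stdin
# and print the raw value of attribute 199 (UDMA_CRC_Error_Count).
crc_error_count() {
    # Attribute rows carry the ID in column 1 and the raw value in column 10.
    awk '$1 == 199 { print $10 }'
}

# Made-up sample row in smartctl's attribute-table layout (not real data).
sample='199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       12'
printf '%s\n' "$sample" | crc_error_count

# Real use would look like this (the device name is an assumption):
#   smartctl -A /dev/sdd | crc_error_count
```

A raw value that keeps increasing between checks is the thing to watch; a count that stays constant only records past errors.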
JorgeB Posted November 23, 2022
Parity2 also appears to be failing; disk3 looks OK. Do you know if something was written to the emulated disk3 after it got disabled?
LFletcher Posted November 23, 2022
Parity 2 passes a SMART test, but unRAID isn't happy with the results.
Disk 3 was unassigned when I booted the server up. When I allocated the drive, that's when the UDMA CRC message popup came up. The drives are on a miniSAS backplane, so it's unlikely to be a cable issue causing the CRC errors.
To the best of my knowledge, nothing has been written to the emulated disk 3.
Thanks
trurl Posted November 23, 2022
1 hour ago, LFletcher said: "Disk 3 was unassigned when I booted the server up. When I allocated the drive, that's when the UDMA CRC message popup came up."
Instead of "allocated", do you really mean you reassigned the disk? If you reassigned the disk and started the array, it would have started rebuilding the disk, which seems to agree with your first screenshot, since it was showing the drive as invalid instead of disabled.
Your diagnostics and that screenshot indicate the array is not currently started. Since you still have parity, it should be able to emulate disk3. Unassign disk3, start the array, and post new diagnostics.
LFletcher Posted November 23, 2022
27 minutes ago, trurl said: "Instead of 'allocated', do you really mean you reassigned the disk? If you reassigned the disk and started the array, it would have started rebuilding the disk, which seems to agree with your first screenshot, since it was showing the drive as invalid instead of disabled."
Yes, the disk was unassigned when I started the server, so I reassigned it, but I hadn't started the array until now.
I have unassigned disk 3 and started the array. Disk 3 now resides in the unassigned devices section.
Disk 5 isn't happy, though, and is reported as unmountable.
I've attached the updated diagnostics file.
tower-diagnostics-20221123-1623.zip
JorgeB Posted November 23, 2022
Check the filesystem on disk5, but note that xfs_repair will abort if there's a read error.
trurl Posted November 23, 2022
3 hours ago, LFletcher said: "Disk 3 now resides in the unassigned devices section"
The point of that exercise was to see whether emulated disk3 mounts, which it does. It has 2.5TB of data, which is what will be rebuilt, at least ideally.
But rebuilding disk3 requires disk5 to be working well. It could be that you need to rebuild disk5 to a new disk instead, which is more complicated. How well that can work depends on the question asked earlier:
5 hours ago, JorgeB said: "if something was written to the emulated disk3 after it got disabled"
Do you have backups of anything important and irreplaceable?
Do you have Notifications set up to alert you immediately by email or other agent as soon as a problem is detected?
LFletcher Posted November 23, 2022
OK, so I restarted the array in maintenance mode and ran the check with -nv, and this was the output:
[screenshot]
I assume I now need to run it with: -v /dev/md5
LFletcher Posted November 23, 2022 (edited November 23, 2022)
1 hour ago, trurl said: "Do you have backups of anything important and irreplaceable? Do you have Notifications set up to alert you immediately by email or other agent as soon as a problem is detected?"
It's not my machine (I'm trying to sort it out for a friend), but it's safe to assume there won't be any backups. I know there are photos on the array, but I don't know where they are or whether they are likely to be on any of the affected drives.
Obviously, in an ideal world we'll be able to restore all of the drives without losing any data, but in an ideal world he would also have paid more attention when the box started having issues (and given it to me sooner).
What options do we have, assuming there are no backups to rely on and I need to save as much of the data as possible?
All of the assistance I have been given so far is very much appreciated.
trurl Posted November 24, 2022
Check the filesystem as before, but without -n. If it asks for it, also use -L.
You don't necessarily need to know which disk anything is on if you have some idea of which user shares contain important data. If you think anything needs to be backed up before proceeding, it's best to copy the data somewhere off the array so that nothing is changed on the array.
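The check-then-repair sequence discussed so far can be sketched roughly as follows. This is only an illustration, assuming the array is started in maintenance mode and that disk5 is /dev/md5 as in the earlier post; the `run` and `xfs_step` helpers are hypothetical names, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/bash
# Dry-run by default: commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# xfs_step STEP DEVICE (hypothetical helper)
xfs_step() {
    case "$1" in
        check)    run xfs_repair -nv "$2" ;;  # -n: report problems only, change nothing
        repair)   run xfs_repair -v "$2" ;;   # same check, but actually repair
        zero-log) run xfs_repair -vL "$2" ;;  # -L: zero the log, only if xfs_repair asks for it
        *)        echo "usage: xfs_step check|repair|zero-log DEVICE" >&2; return 2 ;;
    esac
}

# /dev/md5 is the device named earlier in the thread for disk5.
xfs_step check /dev/md5
```

The three steps are kept separate on purpose: -L discards the filesystem log, so it is the last resort rather than something to run automatically after a failed repair.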
LFletcher Posted November 24, 2022
I have copied the important stuff onto another (external) drive.
Ran the -v command and got this:
[screenshot]
So I then ran -vL and got this:
[screenshot]
And also these notifications:
[screenshot]
JorgeB Posted November 24, 2022
Disk problem. IMHO the best bet is to clone that disk with ddrescue, then run xfs_repair again; you can then use the cloned disk with old disk3, since that one looks healthy, and re-sync parity.
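A rough sketch of the cloning step, assuming GNU ddrescue. The device names /dev/sdX (failing disk5) and /dev/sdY (new disk) are placeholders, not taken from the thread, and must be checked against the real system before anything is run; the mapfile path /boot/ddrescue.log is the one that appears later in this thread. The `run` and `clone_disk` helpers are hypothetical, and the script only prints the command unless DRY_RUN=0 is set.

```shell
#!/bin/bash
# Dry-run by default: the ddrescue command is printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

SRC="${SRC:-/dev/sdX}"            # placeholder: the failing source disk
DST="${DST:-/dev/sdY}"            # placeholder: the new disk, which will be overwritten
MAP="${MAP:-/boot/ddrescue.log}"  # mapfile recording which areas have been read

# clone_disk SOURCE DESTINATION MAPFILE (hypothetical helper)
clone_disk() {
    if [ "$1" = "$2" ]; then
        echo "source and destination must differ" >&2
        return 1
    fi
    # -f (--force) is needed because the output is a block device.
    run ddrescue -f "$1" "$2" "$3"
}

clone_disk "$SRC" "$DST" "$MAP"
```

Because progress is recorded in the mapfile, re-running the same command with the same mapfile resumes the copy instead of starting again.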
LFletcher Posted November 25, 2022
Is there any way to speed up the ddrescue process? It's been running for about 30 hours and it's less than 70% of the way through pass 1. (Ignore the run time on the screenshot; I had to restart it after 24 hours, so this is the second run.)
JorgeB Posted November 25, 2022
That will depend on the state of the disk; there's not much more you can do other than wait.
LFletcher Posted November 27, 2022
ddrescue has now finished. I then ran xfs_repair against the cloned drive:
[screenshot]
I've now run the following commands from the ddrescue FAQ:
printf "unRAID " >~/fill.txt
ddrescue -f --fill=- ~/fill.txt /dev/sdd /boot/ddrescue.log
find /mnt/disks/Z2GBNVET -type f -exec grep -l "unRAID" '{}' ';'
The last command is still running.
When I look at the data on the mounted cloned drive, everything now appears to be in a lost+found directory.
Shouldn't the cloned drive have a directory structure that mirrors the original disk? I assumed that after the check I would be able to unassign the old bad drive (disk 5), assign the cloned drive in its place, restart the array, and this part of the issue would be resolved. With just the lost+found folder I guess that is not going to be the case, or am I missing something?
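The idea behind those three commands is that ddrescue's fill mode overwrites the areas it could not read on the clone with a recognisable marker string, so any file that contains the marker afterwards overlaps an unreadable area and is probably damaged. The search half of that can be tried safely on throwaway files, as in the sketch below; the helper name `list_marked_files` and the two demo files are made up for illustration, and no disk is touched.

```shell
#!/bin/bash
# list_marked_files DIR MARKER (hypothetical helper)
# Prints the path of every regular file under DIR that contains MARKER,
# using the same find/grep form as the command in the post above.
list_marked_files() {
    find "$1" -type f -exec grep -l "$2" '{}' ';'
}

# Demo on a temporary directory.
demo="$(mktemp -d)"
printf 'intact data\n' > "$demo/good.bin"
printf 'data unRAID unRAID data\n' > "$demo/damaged.bin"

# Only damaged.bin contains the marker, so only its path is listed.
list_marked_files "$demo" "unRAID"

rm -rf "$demo"
```

On the real clone the directory would be the mount point of the cloned disk, and the scan has to read every file, which is why it takes a long time on a multi-terabyte drive.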
trurl Posted November 27, 2022
Filesystem repair will often result in some files in lost+found that it couldn't figure out. There could be a lot in lost+found, depending on how bad the corruption is.
Have you examined the files/folders in lost+found? Since it is a top-level folder, it is also a user share you can access over the network.
LFletcher Posted November 27, 2022 (edited November 27, 2022)
I've been having a look. In reality it shouldn't be too difficult to work out what goes where, as it's either a movie or a TV series/episode. I suppose I didn't expect everything to be in a lost+found directory, but as I've never done this before, you live and learn.
I'll wait for the scan for corrupt files to finish, but what is my next step? Is it to unassign disk 5 (and physically remove it from the server) and then assign the cloned disk in its place? If I do that, will unRaid recreate the old folder structure, so that I just have to manually move things into the correct place, or will I have to do something else?
Also, what are the next steps for sorting out the issues with disk 3 (which we unassigned earlier) and the Parity 2 drive, which still has issues?
Thanks
trurl Posted November 27, 2022
1 hour ago, LFletcher said: "will unRaid recreate the old folder structure"
There is nothing Unraid can do to improve your disk5 repair results.
On 11/24/2022 at 6:12 AM, JorgeB said: "best bet is to clone that disk with ddrescue, then run xfs_repair again; you can then use the cloned disk with old disk3, since that one looks healthy, and re-sync parity."
That seems best to me also. Do a New Config with all disks assigned, including the cloned disk5, and rebuild parity.