tcharron Posted March 27, 2016
So, I had a bad disk. I let the array run with that disk emulated from parity for a while. Then I shut down and brought the system back up, and at that point the array was empty. The /boot/super.dat file is 0 bytes and is dated almost two weeks ago.
What I want to do is use the good disks and the parity drive to rebuild the missing data drive. I know which drive is parity. How do I do this?
RobJ Posted March 27, 2016
You do need to find a valid super.dat with the same drive configuration. It shouldn't matter how old it is, so long as it has the exact drive configuration, including the missing disk. Search those flash backups you made!
tcharron (Author) Posted March 27, 2016
I don't think one exists. About 10 days ago I upgraded a drive from 3TB to 4TB, and that is the day the 0-byte super.dat file was created. I also replaced a failed drive. I haven't taken a backup since then. I do have a screen print of the array configuration from just before I did all this.
RobJ Posted March 27, 2016
This is not good. You can't build a new config with a missing disk, so there's no way to rebuild the drive. The 2 options I can think of:
* Email LimeTech support, explain the situation, include the screen print with the desired array config, and ask if Tom can generate a super.dat from it.
* Re-install the bad or failed drive. Such drives are usually still usable and readable, if not perfect. That would hopefully let you copy all or most of the data off to a safe place. Then create your new array, rebuild parity, and move the data back wherever you want.
By the way, you may also have issues with your flash drive if super.dat is corrupted. Try running ChkDsk (Check Disk, ScanDisk, etc.) on it.
tcharron (Author) Posted March 27, 2016
Well, I figure I can get the data off the drives (even the one that failed, as I suspect it was just a bad cable). The problem is that the disk failure happened a while ago (maybe 10 days or so), so the 'bad' drive will have stale data. I guess I'll email LimeTech and see if there are any workarounds.
Super.dat is quite the weak point. It was clearly corrupted 10 days ago, but the array carried on blindly, just waiting for a shutdown/reboot. It would have been nice if the problem had been detected at some point during those 10 days.
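A super.dat that silently drops to 0 bytes is something a periodic check could catch before the next reboot. The sketch below is a hypothetical helper, not an unRAID feature. It assumes super.dat lives at /boot/config/super.dat (the usual unRAID 6 location; the first post mentions /boot/super.dat, so pass whichever path applies on your system).

```shell
#!/bin/sh
# Hypothetical helper: warn if unRAID's super.dat is missing or 0 bytes.
# Assumption: default path /boot/config/super.dat; pass another path as $1.
check_superdat() {
  f="${1:-/boot/config/super.dat}"
  if [ ! -e "$f" ]; then
    echo "MISSING: $f"
    return 2
  fi
  # test -s is true only when the file exists and is larger than 0 bytes
  if [ ! -s "$f" ]; then
    echo "EMPTY: $f"
    return 1
  fi
  echo "OK: $f"
  return 0
}
```

Run from a scheduled job, a non-zero exit status (1 for empty, 2 for missing) signals that super.dat needs attention before the array is stopped.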
John_M Posted March 28, 2016
This is not as uncommon as you might hope. This thread might help a little: https://lime-technology.com/forum/index.php?topic=47486.0 and it's worth searching for others.
JorgeB Posted March 28, 2016
Those linked instructions should work, and you can try them if LT doesn't provide a better way. Since you didn't post a syslog for me to check, note that there are a couple of requirements:
1- You have to be on v6.1.5 or above.
2- The new replacement disk should be precleared/unformatted and has to be the same size as the disk it's replacing (it may work with a larger disk, but I didn't test it).
Don't use the old disk for this; keep it intact in case you need it.
You have an added difficulty since you don't know which one is your parity disk. Don't try to identify it by setting all disks as data disks and looking for the unmountable one: I tested this a while ago, and in most cases it changed parity enough that this method would then fail.
This should work instead. On the console, type:
    blkid /dev/sdX1
replacing X with each of your disks except the new replacement and the old failed disk. Output for data disks will look something like this:
    /dev/sdm1: UUID="93d5bddd-5460-4c06-9799-887e731b132e" TYPE="xfs"
The disk with no output, or with no filesystem "TYPE" in its output, is your parity disk.
If you've identified your parity disk and want to continue, then:
- Assign all disks, including parity and the new replacement disk; double-check that the parity disk is in the parity slot.
- Very important: check the box "parity is already valid" before starting the array.
- Start the array. The new replacement disk is going to appear as unmountable; that's OK for now.
- Stop the array, unassign the replacement disk (select "no device"), and start the array.
- The missing disk is going to be emulated and should now mount and have all data available.
- If all looks good, stop the array, reassign the replacement disk, and start the array to begin the rebuild.
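The per-disk blkid check can also be looped over several partitions at once. A minimal sketch, assuming your partitions follow the /dev/sdX1 pattern from the post; find_parity_candidates is a hypothetical name. It uses blkid's `-s TYPE -o value` options, which print only the filesystem type, or nothing when no filesystem signature is found.

```shell
#!/bin/sh
# Print every partition for which blkid reports no filesystem TYPE.
# On a single-parity unRAID v6 array, that should be the parity disk.
# Pass all disks EXCEPT the new replacement and the old failed disk.
find_parity_candidates() {
  for part in "$@"; do
    # -s TYPE -o value: print only the filesystem type (e.g. "xfs"),
    # or nothing at all when blkid finds no filesystem signature.
    fstype=$(blkid -s TYPE -o value "$part" 2>/dev/null)
    if [ -z "$fstype" ]; then
      echo "$part"
    fi
  done
}
```

For example, `find_parity_candidates /dev/sdb1 /dev/sdc1 /dev/sdd1` (run as root). Exactly one result is what you'd hope for with single parity; no result or several results means stop and re-check before assigning anything.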
tcharron (Author) Posted March 28, 2016
Thanks for taking the time, Johnny! I'm running 6.1.7, so this sounds like it could work. The other thread seems like the exact same problem I had.
You said:
"You have an added difficulty since you don't know which one is your parity disk, don't try to identify it by setting all disks as data disks and looking for the unmountable one, I tested this a while ago and in most cases this changed parity enough so that this option would then be unsuccessful."
I do know which is the parity drive, but I may have broken things anyway... When I looked at the empty array the first time, I momentarily thought I'd reassign the drives. I assigned the first data slot to the correct drive. When it didn't show up with the correct FS/size/used/free data (I noticed this after about 3 seconds), I immediately changed it back to 'unassigned' and came to the forums. Hopefully that lapse in judgement didn't cause anything to be written to the drive?
I don't think I have a spare drive matching the size of the one that was marked as failed (4TB). What I do have is a second unRAID server with 10TB of free storage. The failed drive was probably only about 25% used. How can I mount that drive read-only so that I can copy its contents off (and then use the drive as if it were a new one)? It was formatted as XFS.
JorgeB Posted March 28, 2016
If I remember correctly, starting the array with the parity drive in a data slot writes something to it that makes this method fail. But you can still try it once you've copied all the data off the old disk; there's nothing to lose. Just make sure you don't assign any data drive to the parity slot.
Also, if the old disk is bad, don't try to rebuild onto it; if this works, just copy the files you need from the emulated disk.
As for mounting the old disk, you can assign it to any data slot on this server and start the array without assigning parity, then copy over the LAN, or mount it with the Unassigned Devices plugin on the other server. Another option is booting a Windows PC from an Ubuntu live USB drive; you'll have access to both the XFS disk and your Windows disks, so you can easily copy all the data.
tcharron (Author) Posted March 28, 2016
Quoting JorgeB: "If I remember correctly starting the array with the parity drive on a disk slot writes something to it that makes this option fail, but you can still try it when you copy all data from the old disk, there's nothing to lose, just make sure you don't assign any data drive to the parity slot."
I don't think I did that. I'm pretty sure that when I momentarily added the drive to the array, it was the old data drive #1 in slot #1.
Quoting JorgeB: "Also, if the old disk is bad don't try to rebuild, if it works just copy the files you need from the emulated disk."
I'm in the process of copying that data to my second server. However, the old disk is stale. My goal here is to get what I can off the drive in case my rebuild from parity has some kind of problem.
Quoting JorgeB: "As for mounting the old disk you can just assign it to any data slot on this server and start array, without assigning parity, and copy over LAN or mount it with unassigned devices plugin on the other server. Another option is booting from a Ubuntu live usb drive on a windows pc, you will have access to the xfs disk and your windows disks, so you can easily copy all data."
I ended up using a command-line mount statement to mount it read-only, and I'm copying its contents to my second unRAID server. Even though it's stale, it's better than nothing.
Once this drive's contents are copied off, am I required to preclear it? I really just want to start the rebuild; preclearing seems like a waste of time. Well, maybe a preclear would find a problem, but given that my issue was a cable problem, I don't see much value in spending a day or two on a preclear. I'd rather start the rebuild from parity. If it finds an error, I won't be any worse off. But if it doesn't, I'll have avoided a slow preclear.
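The exact mount command isn't given in the thread, so the following is only a sketch under stated assumptions. It mounts with `ro,norecovery`, because a plain `ro` mount of XFS can still write to the disk if the journal needs replaying; the trade-off is that a dirty filesystem may show some inconsistent files. `rsync -a` is one reasonable way to copy. The device name and paths are hypothetical placeholders, and with DRY_RUN=1 the helper only prints the commands.

```shell
#!/bin/sh
# Sketch: mount a stale XFS data disk read-only and copy its contents off.
# Device and paths are hypothetical; set DRY_RUN=1 to only print commands.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

copy_stale_disk() {
  dev="$1"   # e.g. /dev/sdX1, the old data disk's partition
  mnt="$2"   # temporary mount point
  dest="$3"  # destination, e.g. a share from the second server
  run mkdir -p "$mnt" || return 1
  # ro,norecovery: do not replay the XFS log, so nothing is written to $dev
  run mount -t xfs -o ro,norecovery "$dev" "$mnt" || return 1
  # trailing slashes copy the contents of $mnt into $dest
  run rsync -a "$mnt/" "$dest/"
  status=$?
  run umount "$mnt"
  return "$status"
}
```

For example, `DRY_RUN=1 copy_stale_disk /dev/sdX1 /mnt/olddisk /mnt/backup` previews the four commands; drop DRY_RUN once the device name has been double-checked.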
JorgeB Posted March 28, 2016
Quoting tcharron: "Once this drive's contents are copied off of it, am I required to preclear it? ..."
There's no need to preclear; you can use a previously formatted disk. And since you're using a known bad disk, don't try to rebuild onto it:
- Assign all disks, including parity and the old bad disk; double-check that the parity disk is in the parity slot.
- Very important: check the box "parity is already valid" before starting the array.
- Start the array. The old disk should mount and show whatever data it has on it.
- Stop the array, unassign the old disk (select "no device"), and start the array.
- The missing disk is going to be emulated and should now have all data available, including the data missing from the old disk.
- If all looks good, start copying data from the emulated disk.
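Why unassigning the old disk brings back the newer data: single parity (as in unRAID v6.1) is a bitwise XOR across all data disks, and an unassigned disk is emulated by XORing parity with every remaining data disk, so the physical old disk is never read. A toy illustration with one made-up byte per disk:

```shell
#!/bin/sh
# Toy model of single (XOR) parity: three data disks, one made-up byte each.
d1_new=0x3C    # disk1 as parity knows it (includes writes made while emulated)
d1_stale=0x30  # what the physical old disk still holds (it missed those writes)
d2=0xA5
d3=0x0F

# Parity was last updated while disk1 was emulated, so it encodes d1_new.
parity=$(( $d1_new ^ $d2 ^ $d3 ))

# "Parity is already valid" means no parity sync, so parity is left as is.
# With disk1 unassigned, its contents are computed from parity + other disks.
emulated_d1=$(( $parity ^ $d2 ^ $d3 ))

printf 'parity=0x%02X emulated_d1=0x%02X stale_d1=0x%02X\n' \
  "$parity" "$emulated_d1" "$(( $d1_stale ))"
# prints: parity=0x96 emulated_d1=0x3C stale_d1=0x30
```

The emulated byte matches the newer data (0x3C), not what the stale disk holds, which is why copying from the emulated disk recovers the writes the old disk missed.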
tcharron (Author) Posted March 28, 2016
(Quoting JorgeB's steps above.)
I don't see how unRAID will know which of my data disks is the 'bad' one, since it is stale but has a valid filesystem on it. The moment I assign all drives including parity and bring the array up, won't something write to the parity drive (as soon as any process anywhere writes to the array) and destroy the data I need to recover the bad drive? (For clarity, I'm referring to the moment between steps 3 and 4 above.)
More specifically, if any data gets written to the bad/stale drive, the matching parity updates will be written to the parity drive, which means I won't be able to restore the drive properly.
Maybe I should first start a preclear on the stale drive (and cancel it after a minute or two). That should ensure unRAID doesn't recognize the old/stale filesystem, which will keep it from writing to that drive and destroying the related parity data.
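The concern can be checked with the same toy XOR model. This sketch assumes read-modify-write parity updates (new parity = old parity XOR old data XOR new data); a reconstruct-style write that recomputes parity from all disks would behave differently and is not modeled here. Values are made up; d2 and d3 stand for the other data disks.

```shell
#!/bin/sh
# Toy model: what a write to the *assigned* stale disk does to XOR parity.
# Assumption: read-modify-write parity updates (P' = P ^ old D ^ new D).
d1_new=0x3C; d1_stale=0x30; d2=0xA5; d3=0x0F
parity=$(( $d1_new ^ $d2 ^ $d3 ))   # still valid for the newer data

write=0x55                          # a network client writes this byte to disk1
# The "old D" read back is the stale byte actually on disk, not d1_new.
parity_after=$(( $parity ^ $d1_stale ^ $write ))

# Later, disk1 is unassigned and emulated from parity + the other disks.
emulated_after=$(( $parity_after ^ $d2 ^ $d3 ))
printf 'emulated_d1=0x%02X (newer data was 0x3C, written byte was 0x55)\n' \
  "$emulated_after"
# prints: emulated_d1=0x59 (newer data was 0x3C, written byte was 0x55)
```

In this model only the blocks that were written come back wrong; untouched blocks still emulate correctly, because their parity was never changed.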
JorgeB Posted March 28, 2016
While I would prefer to use a precleared disk, I did try with a formatted one and it worked as well: no data from any of the disks is written to the parity disk when using the "parity is already valid" option on v6.1.5 or newer. The only thing I don't recommend is using a larger disk. I didn't test that enough, and while it also appears to work, I remember there were some minor file system issues, fixed by running xfs_repair, but I didn't check with checksums whether all files were 100% correct.
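For checking whether every file survived intact, one generic approach with GNU coreutils is a sha256sum manifest: build it on one copy, then verify another copy against it. The function names are hypothetical, and the manifest path should be absolute and outside the tree, because both functions cd into the directory they check.

```shell
#!/bin/sh
# Sketch: verify a copied tree file-by-file with SHA-256 (GNU coreutils).

# Write "hash  ./relative/path" lines for every regular file under $1 into $2.
make_manifest() {
  src="$1"; manifest="$2"   # manifest: absolute path outside $src
  (cd "$src" && find . -type f -exec sha256sum {} +) > "$manifest"
}

# Re-hash the same relative paths under $1. --quiet prints only failures;
# the return status is non-zero if any file is missing or differs.
verify_copy() {
  dest="$1"; manifest="$2"
  (cd "$dest" && sha256sum --quiet -c "$manifest")
}
```

For example, `make_manifest /mnt/disk1 /root/disk1.sha256` followed by `verify_copy /mnt/backup/disk1 /root/disk1.sha256` (paths are placeholders).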
JorgeB Posted March 28, 2016
Quoting tcharron: "I don't see how unraid will know which of my data disks is the 'bad' one... since it is stale but has a valid filesystem on it."
Just one more thing: unRAID will never know which disk is the bad one. That's why you have to unassign the correct one yourself; it doesn't matter whether it's empty or contains data.
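With single XOR parity, a parity check can tell that something disagrees but not where: the mismatch value (syndrome) is identical whichever disk holds the wrong byte. A toy illustration with made-up values:

```shell
#!/bin/sh
# Toy model: with single XOR parity, a mismatch cannot be traced to one disk.
d1=0x3C; d2=0xA5; d3=0x0F
parity=$(( $d1 ^ $d2 ^ $d3 ))
err=0x0C   # the same made-up corruption, applied to one disk at a time

# Syndrome = parity XOR all data disks; 0 means parity agrees with the data.
s0=$(( $parity ^ $d1 ^ $d2 ^ $d3 ))
s1=$(( $parity ^ ($d1 ^ $err) ^ $d2 ^ $d3 ))
s2=$(( $parity ^ $d1 ^ ($d2 ^ $err) ^ $d3 ))
s3=$(( $parity ^ $d1 ^ $d2 ^ ($d3 ^ $err) ))
printf 'clean: %d, disk1 bad: %d, disk2 bad: %d, disk3 bad: %d\n' \
  "$s0" "$s1" "$s2" "$s3"
# prints: clean: 0, disk1 bad: 12, disk2 bad: 12, disk3 bad: 12
```

Since every case yields the same syndrome, parity alone cannot say which disk is stale; that knowledge has to come from you when you unassign the disk.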
tcharron (Author) Posted March 28, 2016
Quoting JorgeB: "While I would prefer to use a precleared disk, I did try with a formatted one and it worked as well, no data from any of the disks is written to the parity disk when using the 'parity is already valid' option on v6.1.5 or newer."
Well, I can't guarantee that there isn't some device on my network that will send a file to unRAID once it's up. If that happens, unRAID could try to write to my stale/bad drive. I think I'll do a partial clear to ensure that the drive can't be written to.
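The poster used a briefly started preclear for this. A different technique, named plainly here as a swap-in rather than what was done, is util-linux wipefs: run with no options it lists filesystem signatures, and `-a` erases them, so blkid should no longer see an XFS filesystem on the partition. Whether that is sufficient for unRAID in this procedure is an assumption. It is destructive, so use it only after the data has been copied off and the disk is unmounted. The device name is a placeholder, CONFIRM=yes is required to erase, and DRY_RUN=1 only prints the commands.

```shell
#!/bin/sh
# Sketch: remove the XFS signature from the stale disk's partition so it no
# longer mounts. DESTRUCTIVE. Device is hypothetical; copy the data off first.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

wipe_signature() {
  part="$1"   # e.g. /dev/sdX1 (hypothetical)
  # With no options, wipefs only lists the signatures it can see.
  run wipefs "$part" || return 1
  if [ "${CONFIRM:-no}" != yes ]; then
    echo "not erasing $part (set CONFIRM=yes)"
    return 0
  fi
  # -a: erase all detected signatures on this partition
  run wipefs -a "$part"
}
```

Afterwards, `blkid /dev/sdX1` should print nothing for that partition, just like the parity disk in the earlier identification step.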
JorgeB Posted March 28, 2016
Quoting tcharron: "I think I'll do a partial clear to ensure that the drive can't be written to."
If you can, I think you should; it's safer.
tcharron (Author) Posted March 29, 2016
FYI, this approach worked: my server was rebuilt from parity and I didn't lose any data.
The only strange/worrisome moments were the two array shutdowns I had to do; each took about 10 minutes to complete. I think the delay was caused by a network share issue with the temporary mount I had created on the problem server to access data on my other unRAID server. For some reason, that mount became unusable when the array was taken offline. The web GUI became unresponsive, and all filesystem commands (including lsof, mount, and df) were very slow.
In any event, I'm a happy camper this morning.
Oh, a scan of the USB key did not find any errors, so I'm not sure what caused the 0-byte super.dat file. Anyway, now that it's all working well, I made sure to take a copy of the super.dat file! Thanks!
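Taking that copy regularly is what would have made RobJ's first suggestion possible. A minimal sketch of a dated backup that refuses to copy an empty file. The function name and the default backup directory are hypothetical, and /boot/config/super.dat is assumed as the usual unRAID 6 location.

```shell
#!/bin/sh
# Sketch: keep dated copies of super.dat, refusing to back up an empty file.
# Assumptions: source /boot/config/super.dat; backup dir is hypothetical.
backup_superdat() {
  src="${1:-/boot/config/super.dat}"
  dest_dir="${2:-/mnt/user/backups/flash}"   # hypothetical share path
  if [ ! -s "$src" ]; then
    echo "refusing to back up missing or empty $src" >&2
    return 1
  fi
  mkdir -p "$dest_dir" || return 1
  stamp=$(date +%Y%m%d-%H%M%S)
  # -p keeps the original timestamp, which helps match a copy to a config
  cp -p "$src" "$dest_dir/super.dat.$stamp" && echo "$dest_dir/super.dat.$stamp"
}
```

Running it after every change to the array configuration (such as a drive upgrade or replacement) keeps a super.dat that matches the current drive layout.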