December 12, 2006
I have run across this a few times (release 3.1 r2): after Unraid has been online for about 48 hours, access to the drives becomes very sluggish at first and then stops entirely. The web page does not load, but Telnet works. I tried the following commands, with no response: reboot, shutdown, poweroff. Previously I've done a cold reset, and Unraid then goes into an 8-hour resync (I have 12 disks, so unfortunately I now just accept this).
December 12, 2006 (Author)
Update: Tom, I emailed you my syslog.
Update: Disk 2 is inaccessible from multiple computers. I ran reiserfsck and found no errors on that disk, and the web interface shows the disk is online with 0 errors. Any ideas what is going on here?
Update 2: Started a re-sync; 3 sync errors so far.
Update 3: The sync completed with 24 "errors" in the end. The web interface still shows the disk as good (green light), but it is in fact still inaccessible! Did I just lose all my data on that disk? What is the best course of action at this point? Tom, anyone?
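To make the "sync errors" above a little more concrete: as I understand it, unRAID's single parity disk holds the byte-wise XOR of all data disks, and a sync error is a stripe where the stored parity does not match the XOR of the data. The following is a minimal Python sketch under that assumption; the function names and the two-byte stripes are made up for illustration and are not unRAID code.

```python
from functools import reduce
from operator import xor


def xor_parity(blocks: list[bytes]) -> bytes:
    """Single parity: byte-wise XOR of equally sized data blocks."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


def count_sync_errors(stripes: list[tuple[list[bytes], bytes]]) -> int:
    """Count stripes whose stored parity disagrees with the XOR of their data."""
    return sum(1 for data, parity in stripes if xor_parity(data) != parity)


# Hypothetical stripes: (data blocks from each disk, stored parity block).
stripes = [
    ([b"\x01\x02", b"\x04\x08"], b"\x05\x0a"),  # consistent: 01^04=05, 02^08=0a
    ([b"\xff\x00", b"\x0f\x0f"], b"\x00\x00"),  # parity does not match the data
]
print(count_sync_errors(stripes))  # 1
```

Note that a mismatch only says the stripe is inconsistent. It does not say whether a data disk or the parity disk holds the bad bytes, which may be why a disk can show a green light while sync errors are still being reported.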
December 13, 2006
Looking at the syslog you sent, those are tell-tale signs of a corrupted reiserfs on disk2. How did you run reiserfsck? The proper way is to stop the array and, via Telnet, type:
reiserfsck <device>
where <device> is the device assigned to disk2 on the Devices page, e.g.:
reiserfsck /dev/hdb1
What rate are you getting for the parity-sync?
December 13, 2006 (Author)
I ran reiserfsck just like that, with the --fix-fixable option, on all disks. One disk had errors, but not disk 2. The parity sync finished in the usual time, 8 hours, at about 10-11 kBps. Correction: the re-sync may have been running at the time I issued the command.
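The rate Tom asks about and the 8-hour duration are two sides of the same arithmetic: to a first approximation, a parity sync has to walk the entire parity disk, so its duration is roughly the parity disk size divided by the sync rate. A minimal Python sketch, assuming decimal units (1 GB = 1000 MB) and a constant rate, which real drives do not sustain across the whole platter; the 300 GB disk size is hypothetical, not a figure from this thread.

```python
def sync_hours(parity_disk_gb: float, rate_mb_per_s: float) -> float:
    """Rough parity-sync duration in hours, assuming a constant rate."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate must be positive")
    megabytes = parity_disk_gb * 1000  # decimal units, as drive vendors use
    return megabytes / rate_mb_per_s / 3600


def implied_rate_mb_per_s(parity_disk_gb: float, hours: float) -> float:
    """Average rate implied by a sync that took `hours` to cover the disk."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return parity_disk_gb * 1000 / (hours * 3600)


# Hypothetical example: a 300 GB parity disk synced at 10 MB/s.
print(f"{sync_hours(300, 10):.1f} h")  # 8.3 h
```

Running it the other way, `implied_rate_mb_per_s(size, 8)` gives the average rate a known disk size and an 8-hour sync imply, which is a quick sanity check on whatever rate the web interface reports.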
December 13, 2006
The only help I can offer: if the disc looks like it's failing and/or has some corruption, I would highly recommend GRC.com's SpinRite low-level disc test and recovery utility. It works underneath the file system, is non-destructive, and has been known to work wonders. It can sometimes recover dodgy blocks/sectors, mark them as bad, and turn a crashed disc back into a working one. It does a read/write test across the whole surface, so it is very thorough! HTH! Matt
December 13, 2006 (Author)
Matt, is this something I can install in the Unraid OS? Thanks for the suggestion. Update: I read up on it and it sounds cool. I think I'll try everything Tom recommends first and then try this out. Hopefully it won't get to that point, but I guess I can just replace the Unraid USB with the SpinRite DOS OS, have it boot my server box without the parity drive, and then recover disk 2. Is that the best way, or should I pull the drive and do it in another PC? Does SpinRite typically need any drivers? I'm running the classic Unraid config with an Intel board and the Promise SATA kit, along with 12 WD drives.
Tom, should I run reiserfsck with --rebuild-sb/--rebuild-tree on disk 2? Or should I take the drive out, pop in a new one, and have parity rebuild it (will the sync errors affect the recovery)? Or is there yet another option from within the console? On reboot I can access all drives except the second, but since this started the web interface always stalls after an hour or two (Telnet does not), and none of the shutdown commands seem to work. I'm getting really nervous; how can I get a better grasp of what is corrupted? Thanks, Alex
December 14, 2006
I think you would need to temporarily install a floppy drive in your unRaid server and boot SpinRite from grc.com on it. Then it can be pointed at a given drive and instructed to do its analysis/repair. While it is running, you will not be running unRaid at all or have access to the files in it. Be aware that with the large disks we typically have in our servers, the analysis/repair will likely take many, many hours.
Or... if you suspect a specific drive, you could unplug it from your unRaid array and plug it into a machine that has a floppy, then run the SpinRite analysis/repair on that machine. In the meantime, you can restart unRaid; it should detect the missing disk and come up in a mode where it uses its RAID functionality to supply the contents of the missing disk (treating it as failed) until you put it back in your array after the analysis/repair. Joe L.
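The "RAID functionality" Joe mentions is what lets unRaid serve a missing disk's contents: assuming single XOR parity (my understanding of how unRaid's parity disk works), any one missing data block equals the parity block XORed with every surviving data block in the same stripe. A minimal Python sketch; the function names and two-byte "disks" are hypothetical illustrations, not unRaid code.

```python
from functools import reduce
from operator import xor


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of any number of equally sized blocks."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


def reconstruct_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """Rebuild a missing disk's block: parity XOR every surviving data block."""
    return xor_blocks(parity, *surviving)


# Hypothetical three-disk stripe; parity is the XOR of all data blocks.
disk1, disk2, disk3 = b"\x01\x02", b"\x04\x08", b"\x10\x20"
parity = xor_blocks(disk1, disk2, disk3)
print(reconstruct_missing([disk1, disk3], parity) == disk2)  # True
```

This also bears on Alex's question about whether sync errors affect a rebuild: if the parity block for a stripe is wrong, the reconstructed block for that stripe is wrong in exactly the same bits, so the rebuild is only as good as the parity it reads.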
December 14, 2006 (Author)
Thanks for the response, Joe. So you don't think this is resolvable with reiserfsck? I've been using your script to take a look at things via the Telnet menu. The weird thing is that everything reads fine: 0 errors and such on all disks. --fix-fixable did nothing for disk 2. It's true I can access all drives except disk 2, yet disk 2 itself reads perfectly. The only telling fact is the 24 sync errors from yesterday's re-sync. I'm also concerned that I can't actually turn the machine off and basically have to crash it with a power cycle.
SpinRite can boot off USB, correct? If so, I can just boot it without a floppy and point it at all the disks. Is there a particular scan/correction I should use in SpinRite? I'd also like to understand how the data corruption happened. Maybe it's the RAM? My drives?
December 14, 2006
SpinRite does not do the same thing as reiserfsck. Reiserfsck checks the file system structure: it attempts to fix any corruption in the structures describing how data is stored on the disk. SpinRite works at a much lower level. It reads blocks of data from a disk and does not care how they are used. It will do repeated reads on a failing block while moving the head inward and outward, trying to gather a statistical sample large enough to deduce what was in a block that cannot be read otherwise. It then rewrites the block if it can, or uses a spare block from the drive's pool of spare blocks to replace the failed one. It works on blocks of data and does not care how the data is used or how the disk was formatted.
This process of reading and re-reading the data can take many hours, but in some cases it can make an unreadable drive usable again. One example at grc.com described a failing 80 GB drive whose priceless data was recovered after a 22-hour analysis/recovery by SpinRite. I do not own SpinRite, and when I saw it used (a number of years ago), it ran from a floppy disk. From what I just read on grc.com, you can install it on a USB drive and run it from there, so no floppy is needed as I initially thought. Joe L.
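The idea of deducing a block's contents from many repeated reads can be illustrated with a toy model. This is not SpinRite's actual method (which is proprietary and not described in this thread); it simply assumes each read returns the whole sector with a few independent bit errors and recovers each bit by majority vote. The function name and sample reads are hypothetical.

```python
def majority_vote(reads: list[bytes]) -> bytes:
    """Per-bit majority across repeated reads of the same sector.

    Ties (possible with an even number of reads) resolve to 0.
    """
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("all reads must be the same length")
    threshold = len(reads) / 2
    result = bytearray(length)
    for i in range(length):
        value = 0
        for bit in range(8):
            ones = sum((r[i] >> bit) & 1 for r in reads)
            if ones > threshold:  # strictly more than half the reads saw a 1
                value |= 1 << bit
        result[i] = value
    return bytes(result)


# Five noisy reads of a one-byte "sector" whose true value is 0xAA.
reads = [b"\xaa", b"\xab", b"\xaa", b"\x2a", b"\xaa"]
print(majority_vote(reads).hex())  # aa
```

The more reads you take, the less likely a random error wins the vote, which is one reason this style of recovery can take so many hours on a large drive.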
December 22, 2006 (Author)
Resolved. I ran SpinRite and swapped my RAM, then did a re-sync, and now everything is operational. It took me 28 hours! But drive two is back, and now I can shut down as well. I'm still not sure how the shutdown problem is connected to the data corruption, but oh well, I won't look a gift horse in the mouth. Thank you all for the help.