May 24, 2010 This is really weird. I no longer have a button to start a parity check. Did something change when I wasn't looking? I'm running 4.5.1. The only options I have are Refresh, Stop, Spin Up, Spin Down, and Clear Statistics. That's it. I checked all the tabs and there's no Parity Check. Help!
May 24, 2010 Author Yup, here's a screenshot. http://img405.imageshack.us/img405/5736/parity.jpg
May 24, 2010 You have a failed drive (drive4, shown in red), so there is no way to do a parity check. You must correct the failed drive first. It is ONLY because of the parity drive in combination with all the other drives that you can get to drive4's contents. DO NOT press the button labeled "restore": it will immediately invalidate parity, making re-construction of drive4 difficult or impossible. Post a syslog. Get a SMART report from drive4. It will let you know if the drive is alive at all.
May 24, 2010 Author Hi Jo, how do I get my syslog and get a SMART report? I suspect I probably have a loose cable. Would it be okay to shut down and recheck the cables?
May 24, 2010 When you last rebooted, there were a huge number of transactions in the file-system journals that needed to be replayed before the file-systems could be mounted. Disk4 was just the one that took the longest (857 seconds = over 14 minutes). This was back on May 12th.

May 12 19:31:13 Tower kernel: REISERFS (device md3): replayed 328 transactions in 645 seconds
May 12 19:31:15 Tower kernel: REISERFS (device md3): Using r5 hash to sort names
May 12 19:32:30 Tower kernel: REISERFS (device md6): replayed 348 transactions in 722 seconds
May 12 19:32:31 Tower kernel: REISERFS (device md6): Using r5 hash to sort names
May 12 19:32:36 Tower kernel: REISERFS (device md2): replayed 292 transactions in 728 seconds
May 12 19:32:36 Tower kernel: REISERFS (device md2): Using r5 hash to sort names
May 12 19:33:29 Tower kernel: REISERFS (device md1): replayed 450 transactions in 781 seconds
May 12 19:33:29 Tower kernel: REISERFS (device md1): Using r5 hash to sort names
May 12 19:34:41 Tower kernel: REISERFS (device md5): replayed 843 transactions in 853 seconds
May 12 19:34:41 Tower kernel: REISERFS (device md5): Using r5 hash to sort names
May 12 19:34:45 Tower kernel: REISERFS (device md4): replayed 732 transactions in 857 seconds
May 12 19:34:45 Tower kernel: REISERFS (device md4): Using r5 hash to sort names

That is most unusual. How did you stop the array before the reboot? You must have been writing heavily to most of your disks and stopped the array without doing a file-system sync that would have committed all those transactions.

I don't see evidence of disk4 being out of service, though. That confuses me (and obviously, unRAID's management console might have been confused too). Is disk4 still showing "red" in your web-management display?

Joe L.
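For anyone who wants to pull the replay times out of a syslog themselves, here is a minimal sketch in Python. It only parses the "replayed N transactions in S seconds" lines quoted above; the regular expression and the helper name `replay_times` are illustrative assumptions, not part of unRAID.

```python
import re

# Replay lines copied from the syslog excerpt in the post above.
LOG = """\
May 12 19:31:13 Tower kernel: REISERFS (device md3): replayed 328 transactions in 645 seconds
May 12 19:32:30 Tower kernel: REISERFS (device md6): replayed 348 transactions in 722 seconds
May 12 19:32:36 Tower kernel: REISERFS (device md2): replayed 292 transactions in 728 seconds
May 12 19:33:29 Tower kernel: REISERFS (device md1): replayed 450 transactions in 781 seconds
May 12 19:34:41 Tower kernel: REISERFS (device md5): replayed 843 transactions in 853 seconds
May 12 19:34:45 Tower kernel: REISERFS (device md4): replayed 732 transactions in 857 seconds
"""

# Hypothetical pattern for this one message format; other kernels may word it differently.
PATTERN = re.compile(
    r"REISERFS \(device (md\d+)\): replayed (\d+) transactions in (\d+) seconds"
)


def replay_times(log_text):
    """Return {device: (transactions, seconds)} for each journal replay line."""
    result = {}
    for line in log_text.splitlines():
        match = PATTERN.search(line)
        if match:
            result[match.group(1)] = (int(match.group(2)), int(match.group(3)))
    return result


times = replay_times(LOG)
# Pick the device whose journal replay took the most seconds.
slowest = max(times, key=lambda dev: times[dev][1])
print(slowest, times[slowest])
```

On the excerpt above this reports md4 as the slowest replay, which matches the remark that disk4 took the longest.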
May 24, 2010 Author Yup, still showing red. I haven't rebooted though. I've been copying a bunch of music to that disk today. Perhaps I had a power outage here some time while I was at work... not sure. Should I stop the server and do a reboot? Would that be okay?
May 24, 2010 Before you do, can you run a quick command and post the output? Log in and type:

/root/mdcmd status | strings
May 24, 2010 Author Here you go.

Tower login: root
Linux 2.6.31.12-unRAID.
root@Tower:~# /root/mdcmd status | strings
cmdOper=status
cmdResult=ok
sbName=/boot/config/super.dat
sbVersion=0.92.0
sbCreated=1178340828
sbUpdated=1274620159
sbEvents=217
sbState=0
sbNumDisks=7
sbSynced=1271873894
sbSyncErrs=0
mdVersion=0.95.4
mdState=STARTED
mdNumProtected=7
mdNumDisabled=1
mdDisabledDisk=4
mdNumInvalid=1
mdInvalidDisk=4
mdNumMissing=0
mdMissingDisk=0
mdNumNew=0
mdResync=0
diskNumber.0=0
diskName.0=
diskSize.0=976762552
diskState.0=7
diskModel.0=WDC WD10EARS-00Y
diskSerial.0=WD-WMAV50800824
diskId.0=WDC_WD10EARS-00Y_WD-WMAV50800824
rdevNumber.0=0
rdevStatus.0=DISK_OK
rdevName.0=sdf
rdevSize.0=976762552
rdevModel.0=WDC WD10EARS-00Y
rdevSerial.0=WD-WMAV50800824
rdevId.0=WDC_WD10EARS-00Y_WD-WMAV50800824
rdevNumErrors.0=0
rdevLastIO.0=0
rdevSpinupGroup.0=4
diskNumber.1=1
diskName.1=md1
diskSize.1=976762552
diskState.1=7
diskModel.1=WDC WD10EARS-00Y
diskSerial.1=WD-WMAV50750561
diskId.1=WDC_WD10EARS-00Y_WD-WMAV50750561
rdevNumber.1=1
rdevStatus.1=DISK_OK
rdevName.1=sdd
rdevSize.1=976762552
rdevModel.1=WDC WD10EARS-00Y
rdevSerial.1=WD-WMAV50750561
rdevId.1=WDC_WD10EARS-00Y_WD-WMAV50750561
rdevNumErrors.1=0
rdevLastIO.1=0
rdevSpinupGroup.1=32
diskNumber.2=2
diskName.2=md2
diskSize.2=976762552
diskState.2=7
diskModel.2=WDC WD10EARS-00Y
diskSerial.2=WD-WMAV50750600
diskId.2=WDC_WD10EARS-00Y_WD-WMAV50750600
rdevNumber.2=2
rdevStatus.2=DISK_OK
rdevName.2=sde
rdevSize.2=976762552
rdevModel.2=WDC WD10EARS-00Y
rdevSerial.2=WD-WMAV50750600
rdevId.2=WDC_WD10EARS-00Y_WD-WMAV50750600
rdevNumErrors.2=0
rdevLastIO.2=0
rdevSpinupGroup.2=1
diskNumber.3=3
diskName.3=md3
diskSize.3=488386552
diskState.3=7
diskModel.3=ST3500630A
diskSerial.3=3QG04V5R
diskId.3=ST3500630A_3QG04V5R
rdevNumber.3=3
rdevStatus.3=DISK_OK
rdevName.3=hda
rdevSize.3=488386552
rdevModel.3=ST3500630A
rdevSerial.3=3QG04V5R
rdevId.3=ST3500630A_3QG04V5R
rdevNumErrors.3=0
rdevLastIO.3=0
rdevSpinupGroup.3=16
diskNumber.4=4
diskName.4=md4
diskSize.4=488386552
diskState.4=4
diskModel.4=Maxtor 7H500F0
diskSerial.4=H81AX7RH
diskId.4=Maxtor_7H500F0_H81AX7RH
rdevNumber.4=4
rdevStatus.4=DISK_DSBL
rdevName.4=sdb
rdevSize.4=488386552
rdevModel.4=Maxtor 7H500F0
rdevSerial.4=H81AX7RH
rdevId.4=Maxtor_7H500F0_H81AX7RH
rdevNumErrors.4=0
rdevLastIO.4=0
rdevSpinupGroup.4=8
diskNumber.5=5
diskName.5=md5
diskSize.5=312571192
diskState.5=7
diskModel.5=ST3320620AS
diskSerial.5=3QF0M8SG
diskId.5=ST3320620AS_3QF0M8SG
rdevNumber.5=5
rdevStatus.5=DISK_OK
rdevName.5=sdc
rdevSize.5=312571192
rdevModel.5=ST3320620AS
rdevSerial.5=3QF0M8SG
rdevId.5=ST3320620AS_3QF0M8SG
rdevNumErrors.5=0
rdevLastIO.5=0
rdevSpinupGroup.5=2
diskNumber.6=6
diskName.6=md6
diskSize.6=488386552
diskState.6=7
diskModel.6=ST3500630AS
diskSerial.6=6QG07FCL
diskId.6=ST3500630AS_6QG07FCL
rdevNumber.6=6
rdevStatus.6=DISK_OK
rdevName.6=sda
rdevSize.6=488386552
rdevModel.6=ST3500630AS
rdevSerial.6=6QG07FCL
rdevId.6=ST3500630AS_6QG07FCL
rdevNumErrors.6=0
rdevLastIO.6=0
rdevSpinupGroup.6=0
root@Tower:~#
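A dump like the one above is easier to scan with a few lines of Python. The sketch below assumes the output has been saved with one key=value pair per line (the sample is a small subset of the fields shown above) and simply flags any slot whose `rdevStatus` is not `DISK_OK`. The helper names are hypothetical, and treating every non-`DISK_OK` value as a problem is an assumption, since this thread only shows the values `DISK_OK` and `DISK_DSBL`.

```python
# A small subset of the status fields posted above, one pair per line.
SAMPLE = """\
mdState=STARTED
mdNumDisabled=1
mdDisabledDisk=4
diskName.3=md3
rdevStatus.3=DISK_OK
rdevName.3=hda
diskName.4=md4
rdevStatus.4=DISK_DSBL
rdevName.4=sdb
rdevModel.4=Maxtor 7H500F0
"""


def parse_status(text):
    """Parse key=value lines into a dict; values may contain spaces."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip anything that is not a key=value line (prompts, banners)
            fields[key.strip()] = value.strip()
    return fields


def disabled_slots(fields):
    """Return slot numbers whose rdevStatus is anything other than DISK_OK (assumption)."""
    slots = []
    for key, value in fields.items():
        if key.startswith("rdevStatus.") and value != "DISK_OK":
            slots.append(int(key.split(".", 1)[1]))
    return sorted(slots)


fields = parse_status(SAMPLE)
for slot in disabled_slots(fields):
    print(slot, fields.get(f"rdevName.{slot}"), fields.get(f"rdevModel.{slot}"))
```

Run against the sample, this singles out slot 4 (sdb, the Maxtor drive), the same disk named by `mdDisabledDisk=4`.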
May 24, 2010 The output supports exactly what you are seeing on the management console. I just don't see in the syslog where it shows as disabled.
Here it is initialized:

May 12 19:20:24 Tower kernel: ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
May 12 19:20:24 Tower kernel: ata4.00: ATA-7: Maxtor 7H500F0, HA431DN0, max UDMA/133
May 12 19:20:24 Tower kernel: ata4.00: 976773168 sectors, multi 0: LBA48 NCQ (depth 0/32)
May 12 19:20:24 Tower kernel: ata4.00: configured for UDMA/133
May 12 19:20:24 Tower kernel: ata4: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0xf t4
May 12 19:20:24 Tower kernel: ata4: hotplug_status 0x44
May 12 19:20:24 Tower kernel: ata4: hard resetting link
May 12 19:20:24 Tower kernel: ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
May 12 19:20:24 Tower kernel: ata4.00: configured for UDMA/133
May 12 19:20:24 Tower kernel: ata4: EH complete
May 12 19:20:24 Tower kernel: scsi 3:0:0:0: Direct-Access ATA Maxtor 7H500F0 HA43 PQ: 0 ANSI: 5
May 12 19:20:24 Tower kernel: sd 3:0:0:0: [sdb] 976773168 512-byte logical blocks: (500 GB/465 GiB)
May 12 19:20:24 Tower kernel: sd 3:0:0:0: [sdb] Write Protect is off
May 12 19:20:24 Tower kernel: sd 3:0:0:0: [sdb] Mode Sense: 00 3a 00 00
May 12 19:20:24 Tower kernel: sd 3:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
May 12 19:20:24 Tower kernel: sdb: sdb1
May 12 19:20:24 Tower kernel: sd 3:0:0:0: [sdb] Attached SCSI disk

To me, it all looks pretty normal back on the 12th. I don't see it going off-line.

All I can suggest is to stop the array, un-assign disk4, start the array with it un-assigned (this will cause it to forget the serial number of the existing disk4, so it will treat the disk as a replacement), stop the array once more, re-assign disk4, and let it re-construct the contents of disk4 onto itself.

Joe L.
May 24, 2010 Author What happens if I haven't run a parity check in a while? I've copied data to the other drives since my last parity check. Will I lose any data on any of the other disks? Will my disk 4 be properly rebuilt if the parity is outdated? What happens to the data I wrote to the drive yesterday? Should I do a restore instead?
May 24, 2010 DO NOT PRESS RESTORE!!!
May 24, 2010 A parity check does two things:

(1) It ensures that parity is valid and that a disk (should one fail) would be recoverable.
(2) It causes each and every sector on each and every disk to be read. Without going into the technical details, this allows the drives' internal mechanisms to "remap" marginal sectors. Reading each and every sector is a good way of verifying that all of the disks are good.

If you have not done a parity check in a long time, there is a slight risk that parity could be wrong. But if you have run at least one good parity check (not counting the original parity build), there is a good chance parity is fine. Note that the only time parity errors normally creep into the equation is when you have a dirty shutdown (which it looks like you have had). You really want a clean parity check AFTER the last dirty shutdown to have strong confidence in the parity.

The restore button has been removed from the GUI in the latest release. It does not restore data; it restores unRAID to a vanilla state in which no array is defined. If you press it, the ability to recover data is lost.
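To make the two jobs of a parity check concrete, here is a minimal sketch in Python. It assumes single parity computed as a byte-wise XOR across the data disks, which this thread does not spell out; the tiny in-memory "disks" and the names `xor_blocks` and `parity_check` are hypothetical illustrations, not unRAID code.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_check(data_disks, parity):
    """Read every block on every disk and count positions where parity disagrees."""
    errors = 0
    for i, stored in enumerate(parity):
        blocks = [disk[i] for disk in data_disks]  # reads block i from every disk
        if xor_blocks(blocks) != stored:
            errors += 1
    return errors


# Three toy data disks, two blocks each (assumed layout for illustration).
disks = [
    [b"\x01\x02", b"\x10\x20"],
    [b"\x0f\x0f", b"\x00\xff"],
    [b"\xaa\x55", b"\x33\x33"],
]
parity = [xor_blocks([d[i] for d in disks]) for i in range(2)]
clean_errors = parity_check(disks, parity)

# Simulate a dirty shutdown: data reached a disk, but parity was never updated.
disks[1][0] = b"\xff\xff"
dirty_errors = parity_check(disks, parity)

print(clean_errors, dirty_errors)
```

The second count is non-zero because one block changed without a matching parity update, which is the kind of mismatch a check after a dirty shutdown is meant to surface.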
May 24, 2010 DO NOT PRESS RESTORE. It will immediately invalidate your parity and cause unRAID to save a new initial disk configuration based on the assigned and working drives. It will then begin a new initial parity calculation when you next start the array.

Using the "restore" button, as you thought might help, is one of the few ways that unRAID has had data loss. The button itself is poorly named and, in fact, has been completely removed from the web-interface in the most current 4.5.4 release of unRAID. Think of it as a "Delete Disk Configuration and Invalidate Parity" button. It is only used to store a new initial configuration of disks. In unRAID 4.5.4 it was replaced with a command-line command named "initconfig".

So... DO NOT PRESS RESTORE or you will lose the contents of what had been on the failed disk. Any files you've written to it while it was failed are correctly represented in the parity disk, so the steps I gave earlier will get the files as written to the failed drive, but only if you let unRAID re-construct the contents onto it.

Did we mention you should NOT press the button labeled as "restore", as it would eliminate any chance of re-constructing the failed disk?

Joe L.
May 24, 2010 Author I guess that's my question. Have I lost data that requires a rebuild of parity? I am a little confused, I have to admit. For example, I copied a few hundred MB to disk 4 yesterday. Did they actually copy there? I can browse my network folders and see the contents of disk 4 and see the files I copied yesterday. So if I can see the data, does that mean data was not lost and I can do a Restore with no fear of having lost any data? Or... is it that I'm seeing the contents of disk 4 because the parity is rebuilding it when I browse? How come I was able to copy data to disk 4 and not get an error while the copying was going on? I'm soooo confused... I feel LOST... ;-)
May 24, 2010

"For ex, I copied a few hundred MB to disk 4 yesterday. Did they actually copy there?"
Yes, they were copied. Not to the physical disk4, but to the "simulated disk4", which is simulated by parity in combination with the other disks in your array.

"I can browse my network folders and see the contents of disk 4 and see the files I copied yesterday."
That indicates that parity was written to reflect the files you copied there. It is good; it is why you have a parity-protected array.

"So if I can see the data does that mean data was not lost"
Yes, the data is not lost.

"and I can do a Restore with no fear of having lost any data?"
You can re-construct the data onto a replacement drive. Just do not use the word "Restore" in this context, as it is TOO easy to confuse with the button labeled "restore". You want to re-construct the contents of the old disk onto the new. Just pretend the word "restore" does not exist...

"Or...is it I'm seeing the contents of disk 4 because the parity is rebuilding it when I browse?"
Exactly. The contents are being re-constructed on the fly from parity in combination with all your other data disks.

"How come I was able to copy data to disk 4 and not get an error while the copying was going on?"
The disk is simulated for both reading AND writing. The failure indication is on the web-management console, with the disk being shown with a "red" indicator.

"I'm soooo confused...I feel LOST... ;-)"
The final episode of LOST was last night... that is an entirely different problem.

Just pretend the "restore" button does not exist. Don't press it, don't even think of pressing it. Remove that word from your vocabulary. Use "re-construct" the contents of the failed drive onto a replacement. That is done by pressing the button marked "Start" after either replacing the drive, or going through the un-assign/re-assign process I described earlier and then pressing "Start".

Did we mention you should not press the button labeled as "restore", since it has NOTHING to do with re-construction of a failed drive and will instead prevent the re-construction?

Joe L.
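The "simulated disk" idea can be sketched in a few lines of Python. This assumes single XOR parity, which the thread does not state explicitly; the byte strings and the helper `xor_bytes` are made up for illustration and are not how unRAID is implemented internally.

```python
def xor_bytes(*blocks):
    """XOR any number of equal-length byte strings together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, value in enumerate(block):
            out[i] ^= value
    return bytes(out)


# Healthy data disks and the content the failed disk held (toy values).
others = [b"\x01\x02\x03", b"\x10\x20\x30"]
failed_content = b"\xaa\xbb\xcc"
parity = xor_bytes(failed_content, *others)

# Reading the failed disk: its content is computed from parity plus the other disks.
simulated_read = xor_bytes(parity, *others)

# Writing to the failed disk: nothing reaches the dead drive, only parity is updated
# so that it reflects the new data.
new_data = b"\x11\x22\x33"
parity = xor_bytes(new_data, *others)

# Re-construction onto a replacement drive reads back exactly what was "written".
rebuilt = xor_bytes(parity, *others)

print(simulated_read == failed_content, rebuilt == new_data)
```

Under these assumptions the sketch also shows why discarding the existing parity is so costly here: once parity no longer matches the other disks, the last two computations have nothing valid to work from.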
May 24, 2010 Author LOL. Okay... so I shouldn't press the Restore button, I guess, lol. OK, clicked the Start button and a data rebuild is in progress. Gee, this is the first time in about 3-4 years I've ever had a problem with unRAID. I'll be quite impressed if everything comes back 100%! Although that's why I'm using unRAID. Thanks for all the help, guys! Very much appreciated!
May 24, 2010 I guess you can teach an "old dog" some new tricks. Let us know how it turns out. You'll probably want to post a syslog later, just in case any errors are present. Joe L.
May 25, 2010 Author Well, everything completed fine. I now have my Parity Check button back. Here's a copy of my syslog. (Attachment: syslog.txt)
May 26, 2010 Glad it worked out OK in the end. Now, please, upgrade to release 4.5.4. That way I won't have to keep reminding you not to click the button labeled "restore", since it is actually a "Delete Disk Configuration and Invalidate Parity" button. Joe L.