FrankS Posted August 28, 2018

After a power failure, performance was so bad that we could not watch a movie (it would play a couple of frames, then pause for 20-30 seconds, over and over). I got the syslog and SMART reports for all of the drives (1 parity and 6 data). The SMART report for the parity drive says it will fail in less than 24 hours. One of the data drives has Current_Pending_Sector = 1 and Offline_Uncorrectable = 1; I ran short and long tests on this drive. None of the other drives show any errors (I ran long tests on all of them). The drives show up in the management console with no errors. It says parity is valid, and shows 8 parity errors found when the last check ran after the power failure.

I think I need to replace the parity drive, but I'm concerned about what will happen with the data drive that has the pending sector. I'm guessing that I should do the following:

1. Replace the parity drive.
2. Bring up unRAID and assign the new drive as the parity drive.
3. Rebuild the parity drive.

Will I be able to rebuild the parity drive when the data drive has these errors? I am new to all of this. Any help would be appreciated.

Thanks,
Frank

Attachments: smartrepg_long.txt, smartreph_long.txt, Syslog.txt
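For readers who want to scan a SMART report for the attributes discussed in this thread, here is a minimal Python sketch. It assumes the attribute table layout printed by `smartctl -A` (ID, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE). The function name `flag_attributes` is illustrative, and the sample rows are hypothetical: they combine counts mentioned in this thread into one report rather than reproducing the attached files.

```python
# Attributes discussed in this thread; a non-zero raw value deserves a closer look.
WATCH = {"Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable"}


def flag_attributes(smart_text):
    """Return {name: raw_value} for watched attributes whose raw value is non-zero."""
    flagged = {}
    for line in smart_text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID; RAW_VALUE is the 10th column.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        if name in WATCH and raw.isdigit() and int(raw) > 0:
            flagged[name] = int(raw)
    return flagged


# Hypothetical rows in smartctl -A layout, using counts mentioned in this thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       3800
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       1
"""

print(flag_attributes(SAMPLE))
# {'Reallocated_Sector_Ct': 3800, 'Current_Pending_Sector': 1, 'Offline_Uncorrectable': 1}
```

Only the first token of RAW_VALUE is used, which sidesteps the extra parenthesized details some drives print after the raw number.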
trurl Posted August 28, 2018

Do you have good backups of any important and irreplaceable data? That should be your first priority.

I doubt any of us remember much about your version, and most of us, myself included, never used it at all. But we can leave that discussion for later.
Squid Posted August 28, 2018

34 minutes ago, FrankS said: "I think I need to replace the parity drive,"

With 3800+ reallocated sectors (and it has gone through its pool of spare sectors available for reallocation, hence the failure prediction now), it's a definite replace.

35 minutes ago, FrankS said: "Will I be able to rebuild the parity drive when the data drive has these errors?"

Since it has only 1 pending sector, parity will rebuild fine. But if/when you choose to replace the data drive, you *may* end up with a corrupted file, because you're rebuilding the parity drive with data from a drive that may or may not return the correct contents of that sector. After rebuilding the data drive, run another parity check to see if parity still matches.

BTW, after you handle the drive replacements, you should really consider upgrading to 6.5.3+. 3800 reallocated sectors is a definite no-no, and unRAID would have warned you about this drive well before it reached that point on any unRAID 6.x release. I don't mind a few reallocated sectors, but some users insist that any drive with even a single reallocated sector should be replaced.
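Squid's point about a possibly corrupted file can be illustrated with a toy model. Conceptually, unRAID's single parity disk holds the byte-wise XOR of all data disks at each position, so a missing disk can be reconstructed by XORing parity with the surviving disks. The sketch below uses hypothetical three-byte "disks" and an illustrative helper named `xor_blocks`; it is not unRAID's code. It also shows the catch: if a data disk returns wrong bytes while parity is being built, parity encodes those wrong bytes, and a later rebuild of that disk reproduces them.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three tiny hypothetical "data disks".
disks = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xaa\xbb\xcc"]

# Building parity: XOR of every data disk.
parity = xor_blocks(disks)

# Rebuilding disk 1: XOR parity with the surviving disks.
rebuilt = xor_blocks([parity, disks[0], disks[2]])
print(rebuilt == disks[1])  # True

# If disk 1 misreads its middle byte while parity is built, parity encodes
# the wrong byte, and a later rebuild of disk 1 reproduces it faithfully.
misread = [disks[0], b"\x10\x00\x30", disks[2]]
bad_parity = xor_blocks(misread)
print(xor_blocks([bad_parity, disks[0], disks[2]]) == disks[1])  # False
```

Parity has no way to tell good data from bad; it only guarantees that the rebuilt disk matches whatever was read when parity was last in sync.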
FrankS Posted August 28, 2018

Thanks for responding. Most of the data is media (movies, TV series, etc.), so I don't have backups because of the size. Losing it would suck, but it wouldn't be the end of the world! I have some other data (family photos) that I need to check I have copies of, and copy off if I don't.
FrankS Posted August 28, 2018

9 minutes ago, Squid said: "Since it has 1 pending sector, parity will rebuild fine. [...]"

Thank you! I was worried that the rebuild would fail and I would lose everything. I guess I could handle losing a file or two. I knew I needed to replace the parity drive, assumed the same for the data drive, and had already ordered replacements. I was just unclear whether I needed to do something special because of the errors on the data drive.

Also, do I need to preclear the new drives? I've read through the documentation and forum posts and am confused about this (both whether I need to do it and how to do it).

I will definitely look into upgrading to unRAID 6. It looks like there have been many useful changes.
Squid Posted August 28, 2018

Just now, FrankS said: "Also, do I need to preclear the new drives?"

It certainly doesn't hurt. But in my mind, a rebuild of the parity/data drive, followed by a non-correcting parity check (not sure whether 4.x lets you run a non-correcting check, though) and then checking SMART, is effectively the same as a single pass of preclear, and saves you time and power.
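A minimal sketch of the difference between a correcting and a non-correcting parity check, using a toy XOR model of single parity (helper names are hypothetical; this is not unRAID's implementation). Both kinds of check compare stored parity against the XOR of the data disks and count mismatches, but only the correcting check rewrites parity. This toy counts mismatching bytes; a real check reports its own units, such as the 8 parity errors mentioned at the start of the thread.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_check(disks, parity, correct=False):
    """Compare stored parity with the XOR of the data disks.

    Returns (mismatch_count, parity_after). A non-correcting check
    (correct=False) only counts and leaves parity as it was; a correcting
    check also replaces parity with the computed value.
    """
    expected = xor_blocks(disks)
    mismatches = sum(1 for e, p in zip(expected, parity) if e != p)
    return mismatches, (expected if correct else parity)


disks = [b"\x01\x02", b"\x10\x20"]  # hypothetical data disks
stale = b"\x11\x00"                  # stored parity; second byte is out of sync

count, after = parity_check(disks, stale)
print(count, after == stale)              # 1 True  (non-correcting: parity untouched)
count, after = parity_check(disks, stale, correct=True)
print(count, after == xor_blocks(disks))  # 1 True  (correcting: parity rewritten)
```

A non-correcting check is the safer choice right after a rebuild: it reports whether anything disagrees without deciding on your behalf which side is right.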
FrankS Posted August 28, 2018

2 minutes ago, Squid said: "It certainly doesn't hurt. [...]"

Thanks again. But what about the data drive?
Squid Posted August 28, 2018

Same thing. IMO a rebuild plus a parity check equals a preclear pass. But it's all up to you. Preclearing a drive is definitely a plus.
FrankS Posted August 28, 2018

5 minutes ago, Squid said: "Same thing. [...] Preclearing a drive is definitely a plus."

Thanks. I'll see if I can figure out how to do it for the data drive.
trurl Posted August 28, 2018

45 minutes ago, FrankS said: "Thanks. I'll see if I can figure out how to do it for the data drive."

For your very old version you would have to go back to the original Joe L. script. You can ignore any of the more recent discussions of preclear.

https://lime-technology.com/forums/topic/2732-preclear_disksh-a-new-utility-to-burn-in-and-pre-clear-disks-for-quick-add/

I tend to agree with Squid about preclearing a replacement disk. A rebuild followed by a parity check is a pretty good test, and arguably better than leaving a drive with known issues in the array while you do a lengthy preclear. Preclearing makes more sense for testing drives that will be added to a new slot in the array.
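Why clearing matters for a new slot, sketched in a toy XOR model of single parity (hypothetical data, illustrative helper): XOR with zero leaves a value unchanged, so adding an all-zero disk keeps existing parity valid. That is why unRAID clears a disk added to a new slot, and why a disk that has already been precleared can be added quickly, as the script's title suggests. A replacement disk, by contrast, is simply overwritten by the rebuild, so its prior contents do not matter.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


disks = [b"\x01\x02\x03", b"\x10\x20\x30"]  # hypothetical existing data disks
parity = xor_blocks(disks)

zeroed = bytes(len(parity))  # new disk cleared to all zeros
leftover = b"\xff\x00\x00"   # new disk with old, non-zero contents

print(xor_blocks(disks + [zeroed]) == parity)    # True: parity still valid
print(xor_blocks(disks + [leftover]) == parity)  # False: parity would be wrong
```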
FrankS Posted August 28, 2018

1 hour ago, trurl said: "[...] Preclearing makes more sense for testing drives that will be added to a new slot in the array."

That makes sense. Thanks.
trurl Posted August 29, 2018

Let us know how it goes, and if you need help upgrading.
FrankS Posted August 29, 2018

OK, so I did the following:

1. Stopped the array and shut down the system.
2. Replaced the 2TB parity drive.
3. Restarted unRAID. The management console showed a blue orb next to the parity drive and green for the data drives. I checked the Devices page and everything was correct.
4. Checked the box under the Start button and clicked Start.

The management console became unresponsive. I used the /root/mdcmd status command to see the progress, and it shows the following:

mdState=STARTED
mdNumProtected=7
mdNumDisabled=0
mdNumInvalid=1
mdInvalidDisk=0
mdResync=1953514552
mdResyncPos=512
mdResyncPrcnt=0.0
mdResyncFinish=591840.2
mdResyncSpeed=0

and for the new disk:

diskState.0=6
rdevStatus.0=DISK_INVALID
rdevNumErrors.0=0
rdevLastIO.0=1535561073

I've run this several times, and the only value that changes is mdResyncFinish.

I also checked the syslog (attached as a zip file), and it is logging these lines over and over:

Aug 29 12:55:50 Tower kernel: ata1: failed to resume link (SControl FFFFFFFF)
Aug 29 12:55:51 Tower kernel: ata2: failed to resume link (SControl FFFFFFFF)
Aug 29 12:55:51 Tower kernel: ata4: failed to resume link (SControl FFFFFFFF)

What should I do?

Attachment: syslog_newparity.zip
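A minimal sketch for reading `/root/mdcmd status` output like the one quoted above. It assumes only that each line is a `key=value` pair and that `mdResync` and `mdResyncPos` are the total and current resync positions in the same units; the helper names are hypothetical. With the values from the post, progress is a tiny fraction of a percent, and a position that stays at 512 across repeated runs (with mdResyncSpeed=0) points to a stalled resync rather than a slow one.

```python
def parse_mdcmd_status(text):
    """Turn 'key=value' lines into a dict of strings."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep:  # skip blank or malformed lines
            status[key] = value
    return status


def resync_percent(status):
    """Percent done, assuming mdResyncPos and mdResync use the same units."""
    total = int(status.get("mdResync", "0"))
    pos = int(status.get("mdResyncPos", "0"))
    return 100.0 * pos / total if total else 0.0


def looks_stalled(snapshots):
    """True if mdResyncPos is identical across two or more successive snapshots."""
    positions = {s.get("mdResyncPos") for s in snapshots}
    return len(snapshots) > 1 and len(positions) == 1


# Values copied from the status output in the post above.
SNAPSHOT = """\
mdState=STARTED
mdNumInvalid=1
mdInvalidDisk=0
mdResync=1953514552
mdResyncPos=512
mdResyncSpeed=0
rdevStatus.0=DISK_INVALID
"""

status = parse_mdcmd_status(SNAPSHOT)
print(f"{resync_percent(status):.6f}%")  # 0.000026%
print(looks_stalled([status, status]))    # True
```

In practice you would collect one snapshot per run of the command a few minutes apart and pass the list to `looks_stalled`.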
trurl Posted August 29, 2018

Obviously it is having trouble communicating with at least 3 disks. That's a hardware problem of some kind: bad connections, cables, ports, controllers, PSU? Bad connections (all SATA and power, both ends) are a good place to start, especially when you have been in the case messing around. SATA cables should be unbundled so they don't crosstalk, and each connector should sit squarely on its port with no strain that might cause it to shift or come loose.
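To see which links are affected, here is a short sketch that counts the `failed to resume link` kernel messages per `ataN` link in a syslog. The regex and function name are illustrative assumptions, and mapping an `ataN` link to a physical drive or controller port still requires the kernel's device-detection messages elsewhere in the log. Run on the three lines quoted earlier in the thread, it reports ata1, ata2 and ata4.

```python
import re
from collections import Counter

# Matches kernel lines such as "kernel: ata1: failed to resume link (SControl FFFFFFFF)".
LINK_FAIL = re.compile(r"kernel: (ata\d+): failed to resume link")


def link_failures(syslog_text):
    """Count 'failed to resume link' messages per ataN link."""
    return Counter(m.group(1) for m in LINK_FAIL.finditer(syslog_text))


# The three syslog lines quoted earlier in the thread.
LOG = """\
Aug 29 12:55:50 Tower kernel: ata1: failed to resume link (SControl FFFFFFFF)
Aug 29 12:55:51 Tower kernel: ata2: failed to resume link (SControl FFFFFFFF)
Aug 29 12:55:51 Tower kernel: ata4: failed to resume link (SControl FFFFFFFF)
"""

print(sorted(link_failures(LOG)))  # ['ata1', 'ata2', 'ata4']
```

If every failing link turns out to sit on the same controller, that narrows the suspects to the card, its slot, or its cables.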
trurl Posted August 29, 2018

Is there a controller card involved? If so, you might try reseating that as well.
FrankS Posted August 29, 2018

Thanks trurl. The 3 drives getting errors are all on a PCIe controller card. I tried reseating the card and checked all of the cables, with no change. The card is a Rosewill RC-218 4-port PCIe x4 SATA card, and I'm not sure what to replace it with (it is no longer available). I'm looking at the unRAID Hardware Compatibility page to see if I can figure out what to get.
trurl Posted August 30, 2018

4 hours ago, FrankS said: "The card is a Rosewill RC-218 4-port PCIe x4 SATA"

That same card has been working well for me for many years, but I suppose nothing lasts forever. I especially like the ability to set some ports as eSATA; I use that feature in V6 with Unassigned Devices to create backup disks for offsite storage. Not sure what to suggest as a replacement these days, though.
FrankS Posted September 3, 2018

Thanks everyone for your help. Once I got past the hardware problems, I got the server back up. I performed the following steps:

1. Replaced the parity drive and let unRAID rebuild it.
2. Ran a parity check.
3. Replaced the data drive and let unRAID rebuild it.
4. Ran a parity check.

The system is running fine with no errors.
trurl Posted September 4, 2018

Here is the wiki page on upgrading: https://wiki.unraid.net/Upgrading_to_UnRAID_v6
FrankS Posted September 7, 2018

Thanks. After reading that, I do have some questions about the upgrade. I'll start another thread.
Archived
This topic is now archived and is closed to further replies.