Nomad32 Posted January 31

I don't check the dashboard every day - but I checked a couple of weeks ago and found that one of my parity drives had an error and was disabled. I started an extended self-test and these are the results. I'm looking for advice on whether I should ignore the error or see if the drive is covered under warranty for replacement. Thanks in advance:

ATA Error Count: 11733 (device log contains only the most recent five errors)
  CR = Command Register [HEX]
  FR = Features Register [HEX]
  SC = Sector Count Register [HEX]
  SN = Sector Number Register [HEX]
  CL = Cylinder Low Register [HEX]
  CH = Cylinder High Register [HEX]
  DH = Device/Head Register [HEX]
  DC = Device Command Register [HEX]
  ER = Error register [HEX]
  ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes, SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 11733 occurred at disk power-on lifetime: 18079 hours (753 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 53 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time   Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 50 ff ff ff ef 00  19d+15:20:45.234  READ DMA EXT
  25 00 08 ff ff ff ef 00  19d+15:20:45.234  READ DMA EXT
  35 00 88 ff ff ff ef 00  19d+15:20:45.231  WRITE DMA EXT
  35 00 30 ff ff ff ef 00  19d+15:20:45.229  WRITE DMA EXT
  35 00 40 ff ff ff ef 00  19d+15:20:45.227  WRITE DMA EXT

Error 11732 occurred at disk power-on lifetime: 18079 hours (753 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 53 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time   Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 c8 ff ff ff ef 00  19d+15:18:55.847  READ DMA EXT
  35 00 18 ff ff ff ef 00  19d+15:18:55.841  WRITE DMA EXT
  35 00 40 ff ff ff ef 00  19d+15:18:55.837  WRITE DMA EXT
  35 00 40 ff ff ff ef 00  19d+15:18:55.833  WRITE DMA EXT
  35 00 40 ff ff ff ef 00  19d+15:18:55.829  WRITE DMA EXT

Error 11731 occurred at disk power-on lifetime: 18059 hours (752 days + 11 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 53 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time   Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 f8 ff ff ff ef 00  18d+19:49:36.163  READ DMA EXT
  25 00 88 ff ff ff ef 00  18d+19:49:36.161  READ DMA EXT
  25 00 40 ff ff ff ef 00  18d+19:49:36.158  READ DMA EXT
  25 00 40 ff ff ff ef 00  18d+19:49:36.140  READ DMA EXT
  25 00 80 ff ff ff ef 00  18d+19:49:36.139  READ DMA EXT

Error 11730 occurred at disk power-on lifetime: 18059 hours (752 days + 11 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 53 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time   Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 40 ff ff ff ef 00  18d+19:48:32.973  READ DMA EXT
  25 00 40 ff ff ff ef 00  18d+19:48:32.970  READ DMA EXT
  25 00 40 ff ff ff ef 00  18d+19:48:32.940  READ DMA EXT
  35 00 b0 ff ff ff ef 00  18d+19:48:32.939  WRITE DMA EXT
  ea 00 00 00 00 00 a0 00  18d+19:48:32.873  FLUSH CACHE EXT

Error 11729 occurred at disk power-on lifetime: 18059 hours (752 days + 11 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 53 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time   Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 40 ff ff ff ef 00  18d+19:47:56.650  READ DMA EXT
  ea 00 00 00 00 00 a0 00  18d+19:47:56.636  FLUSH CACHE EXT
  35 00 08 ff ff ff ef 00  18d+19:47:56.636  WRITE DMA EXT
  25 00 40 ff ff ff ef 00  18d+19:47:56.615  READ DMA EXT
  25 00 08 ff ff ff ef 00  18d+19:47:56.614  READ DMA EXT
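For anyone trying to read a log like the one above, here is a rough Python sketch that sanity-checks a few of its numbers. It is not part of smartctl. The function names are made up for illustration, the 32-bit millisecond counter is an assumption that happens to match the "wraps after 49.710 days" note, and the error-register bit names follow the usual ATA convention.

```python
# Sketch only: helper names are hypothetical, not smartctl functions.

MS_PER_DAY = 24 * 60 * 60 * 1000

# Error register bits as commonly defined for ATA (assumed here).
ERROR_BITS = {7: "ICRC", 6: "UNC", 4: "IDNF", 2: "ABRT"}


def wrap_days(counter_bits: int = 32) -> float:
    """Days until a millisecond counter of this width overflows.

    A 32-bit counter is an assumption that matches the 49.710-day note.
    """
    return (2 ** counter_bits) / MS_PER_DAY


def hours_to_days(hours: int) -> tuple[int, int]:
    """Split power-on hours into (days, remaining hours)."""
    return divmod(hours, 24)


def decode_error_register(er: int) -> list[str]:
    """Return the names of the bits set in the ER byte, highest bit first."""
    return [name for bit, name in sorted(ERROR_BITS.items(), reverse=True)
            if er & (1 << bit)]


if __name__ == "__main__":
    print(round(wrap_days(), 3))        # wrap interval in days
    print(hours_to_days(18079))         # power-on lifetime of error 11733
    print(int("0fffffff", 16))          # the LBA reported in every entry
    print(decode_error_register(0x40))  # ER value from the log
```

The ER value 0x40 has only bit 6 set, which lines up with the "Error: UNC" (uncorrectable) text smartctl printed for every entry.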
trurl Posted January 31

Nomad32 said: I don't check the dashboard everyday

You must set up Notifications to alert you immediately by email or other agent as soon as a problem is detected. Don't let one unnoticed problem become many and lead to data loss.

That is only a part of the SMART report, and it doesn't include the results of the extended test.

Attach Diagnostics to your NEXT post in this thread. That will give us that information and a lot more, so we can get a more complete understanding of your situation.
Nomad32 (Author) Posted January 31

Sorry - attached is the SMART report and the diagnostics report.

tower-smart-20240131-1218.zip
tower-diagnostics-20240131-1220.zip
trurl Posted January 31

Nomad32 said: attached is the SMART report and the diagnostics report.

Diagnostics already includes SMART for all attached disks.
trurl Posted January 31 (Solution)

ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  5 Reallocated_Sector_Ct   PO--CK   001   001   010    NOW  0 (0 6)

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                       Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: unknown failure   10%        18606            0

Both of these are a little unusual. The RAW_VALUE on reallocated isn't a simple count like we usually see, but the normalized VALUE (001) is below THRESH (010) and the FAIL column says NOW, so I would say it has failed. The extended self-test also says failed, but it doesn't know why. Usually we get something like read failure.

In any case, yes, it should be replaced.

You are also having problems with disk4, and it has a rather high Reallocated count. In fact, I would say that one needs replacing as well.

Do any of your other disks show SMART ( 👎 ) warnings on the Dashboard page? What do you check on the Dashboard on those rare occasions?

Unrelated: your system share has files on the array, and you should update your plugins.
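To make the VALUE / THRESH / FAIL reading above concrete, here is a small Python sketch that parses one attribute row in the column layout quoted in this post and applies the usual rule that an attribute is failing when its normalized VALUE is at or below a non-zero THRESH. The class and function names are hypothetical, and the fixed eight-column layout is an assumption based only on the row shown here.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    attr_id: int
    name: str
    flags: str
    value: int    # normalized value, higher is healthier
    worst: int
    thresh: int
    fail: str     # "NOW", "Past", or "-" in smartctl output
    raw: str      # left as text because it is not always a plain count


def parse_attribute(line: str) -> SmartAttribute:
    """Parse a row laid out as: ID NAME FLAGS VALUE WORST THRESH FAIL RAW."""
    # Split into at most 8 fields so a RAW_VALUE with spaces stays intact.
    parts = line.split(None, 7)
    if len(parts) != 8:
        raise ValueError(f"unexpected attribute row: {line!r}")
    attr_id, name, flags, value, worst, thresh, fail, raw = parts
    return SmartAttribute(int(attr_id), name, flags,
                          int(value), int(worst), int(thresh), fail, raw)


def is_failing_now(attr: SmartAttribute) -> bool:
    """True when the normalized value has dropped to or below the threshold."""
    return attr.thresh > 0 and attr.value <= attr.thresh


if __name__ == "__main__":
    row = "5 Reallocated_Sector_Ct PO--CK 001 001 010 NOW 0 (0 6)"
    attr = parse_attribute(row)
    print(attr.name, attr.value, attr.thresh, is_failing_now(attr))
```

The raw value is kept as a string on purpose, since, as noted above, it isn't always a simple count.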
Nomad32 (Author) Posted January 31

OK - thank you. Disk 4 has shown errors for a while now, but never an issue bad enough to crap out on me. I think I'll pick up a couple of larger drives to replace the parity drives, and then use the good parity drive to replace disk 4. Oh well - thank you so much for letting me know.
Nomad32 (Author) Posted February 15

I ended up buying two new 14TB drives. I started by removing the dead 12TB parity drive and replacing it with one of the 14TB drives. Unraid automagically started to rebuild parity on the new parity disk. Once that completed, I followed the steps on the parity swap procedure page: https://docs.unraid.net/legacy/FAQ/parity-swap-procedure/

For anyone curious - that process basically works through adding a new drive that is larger than your current parity drive. You remove the data drive you're replacing, change the current parity drive to be the data drive, and then assign the new disk as parity. After rebuilding parity, the system will start up the array and rebuild the new data disk (old parity drive converted to data).
trurl Posted February 15

Nomad32 said: After rebuilding parity

Parity swap copies parity, it doesn't rebuild it. There aren't enough disks to rebuild it since there is already another disk to be rebuilt. The array is offline during the parity copy so nothing can change that would make that copy out-of-sync.
Nomad32 (Author) Posted February 15

Understood. And it copied directly from my other parity drive. I apologize - when I typed that, I essentially believed that copying parity to the new parity drive counted as rebuilding it.
trurl Posted February 15

Rebuild means you get the data from the parity calculation by reading all other disks.
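As a toy illustration of the copy versus rebuild distinction discussed above, here is a minimal Python sketch. It assumes plain single XOR parity over equally sized blocks. That is a simplification chosen for illustration, not a description of Unraid's actual implementation, and the second parity disk in a dual-parity array is not modeled. All function names are made up.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equally sized blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def compute_parity(data_disks: list[bytes]) -> bytes:
    """Build parity by reading every data disk (assumed single XOR parity)."""
    return xor_blocks(data_disks)


def rebuild_disk(surviving_disks: list[bytes], parity: bytes) -> bytes:
    """Rebuild one missing disk from parity plus all the other disks."""
    return xor_blocks(surviving_disks + [parity])


def copy_parity(old_parity: bytes) -> bytes:
    """Parity swap style: duplicate existing parity, no other disk is read."""
    return bytes(old_parity)


if __name__ == "__main__":
    disks = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    parity = compute_parity(disks)
    # Pretend the middle disk is gone and rebuild it from the rest.
    recovered = rebuild_disk([disks[0], disks[2]], parity)
    print(recovered == disks[1])
    print(copy_parity(parity) == parity)
```

In this sketch `rebuild_disk` needs every other disk to be readable, which is why parity cannot be recalculated while a data disk is still waiting to be rebuilt, whereas `copy_parity` only touches the old parity disk.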