toohai Posted October 1, 2017

I have a system with 8 data drives and 2 parity drives. One data drive has failed and I'm preparing to replace it, but one of the parity drives is also showing read errors during parity checks. What's the best way to proceed with replacing the failed data drive? Will there be any issues with the rebuild if one parity drive shows read errors and the other is fine?

The parity drive with errors is from the same batch as the data drive I'm replacing, and it shows errors every time I run a parity check. Most probably it needs to be replaced ASAP too. Below is what SMART is showing for the parity drive that's having issues.

ATA Error Count: 11 (device log contains only the most recent five errors)
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 11 occurred at disk power-on lifetime: 9338 hours (389 days + 2 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 48 6d 1c e0  Error: UNC 8 sectors at LBA = 0x001c6d48 = 1862984

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 48 6d 1c e0 08   4d+14:26:03.950  READ DMA
  c8 00 08 40 6d 1c e0 08   4d+14:26:03.949  READ DMA
  c8 00 08 38 6d 1c e0 08   4d+14:26:03.949  READ DMA
  c8 00 08 30 6d 1c e0 08   4d+14:26:03.949  READ DMA
  c8 00 20 10 6d 1c e0 08   4d+14:26:03.949  READ DMA

Error 10 occurred at disk power-on lifetime: 9338 hours (389 days + 2 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 58 49 1c e0  Error: UNC at LBA = 0x001c4958 = 1853784

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 00 f8 48 1c e0 08   4d+14:25:57.826  READ DMA
  c8 00 00 f8 47 1c e0 08   4d+14:25:56.953  READ DMA
  c8 00 08 a0 46 1c e0 08   4d+14:25:56.936  READ DMA
  c8 00 08 98 46 1c e0 08   4d+14:25:56.920  READ DMA

Error 9 occurred at disk power-on lifetime: 9338 hours (389 days + 2 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 d8 e8 1b e0  Error: UNC at LBA = 0x001be8d8 = 1829080

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 00 f8 e7 1b e0 08   4d+14:25:40.675  READ DMA
  c8 00 00 f8 e6 1b e0 08   4d+14:25:40.646  READ DMA
  c8 00 00 f8 e5 1b e0 08   4d+14:25:40.636  READ DMA
  c8 00 00 f8 e4 1b e0 08   4d+14:25:40.632  READ DMA
  c8 00 00 f8 e3 1b e0 08   4d+14:25:40.628  READ DMA

Error 8 occurred at disk power-on lifetime: 9022 hours (375 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 88 28 18 1d e0  Error: UNC 136 sectors at LBA = 0x001d1828 = 1906728

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 88 c0 17 1d e0 08      20:23:18.312  READ DMA

Error 7 occurred at disk power-on lifetime: 9022 hours (375 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 40 d0 25 1d e0  Error: UNC 64 sectors at LBA = 0x001d25d0 = 1910224

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 40 c0 25 1d e0 08      20:13:14.801  READ DMA
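For anyone reading the error log above: the LBA printed on each "Error: UNC" line can be recovered from the SN, CL, CH and DH registers, which makes it easy to check whether the failures are clustered. A minimal Python sketch, assuming standard 28-bit LBA addressing (the low four bits of DH hold the top LBA bits); the function name `lba28_from_registers` is made up for illustration and is not part of smartctl:

```python
def lba28_from_registers(sn, cl, ch, dh):
    """Combine ATA task-file registers into a 28-bit LBA.

    SN holds bits 0-7, CL bits 8-15, CH bits 16-23, and the low
    nibble of DH holds bits 24-27 (LBA28 addressing is assumed).
    """
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# (SN, CL, CH, DH) copied from the "registers were" rows of errors 11..7
errors = {
    11: (0x48, 0x6D, 0x1C, 0xE0),
    10: (0x58, 0x49, 0x1C, 0xE0),
    9: (0xD8, 0xE8, 0x1B, 0xE0),
    8: (0x28, 0x18, 0x1D, 0xE0),
    7: (0xD0, 0x25, 0x1D, 0xE0),
}

lbas = {num: lba28_from_registers(*regs) for num, regs in errors.items()}
for num, lba in lbas.items():
    print(f"Error {num}: LBA 0x{lba:08x} = {lba}")

# Distance between the lowest and highest failing LBA, in sectors
span_sectors = max(lbas.values()) - min(lbas.values())
print(f"Span of failing LBAs: {span_sectors} sectors")
```

The decoded values match the LBAs smartctl printed, and all five logged errors sit within roughly 81,000 sectors of each other near the start of the disk, which suggests one localized bad region rather than errors scattered across the drive.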
JorgeB Posted October 1, 2017

1 hour ago, toohai said: "Will there be any issues with the rebuild if 1 parity drive shows read errors and the other is fine?"

If it's parity1 that has issues and there are errors during the rebuild, it should use parity2, but that's something I've never seen happen, so I'm not quite sure. You could also disable the known bad parity disk and rebuild using only the good one, but I'd first like to see an attempt to rebuild with both parities. You can always disable it and rebuild again if needed. If there are any errors, don't forget to grab the diagnostics before rebooting so we can see what happened.
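To illustrate why a single missing data disk can be rebuilt from either parity, here is a toy Python sketch of a RAID-6-style scheme, where P is a plain XOR and Q is a weighted sum in GF(2^8). This is an assumption made for illustration only: it is not Unraid's actual code, all function names are hypothetical, and it works on one byte per disk.

```python
from functools import reduce


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) using the reduction polynomial 0x11d."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow(a, n):
    """Raise a to the n-th power in GF(2^8) by repeated multiplication."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a):
    """Inverse of a nonzero element: a^254, because a^255 == 1."""
    return gf_pow(a, 254)


def compute_p(data):
    """P parity: XOR of all data bytes."""
    return reduce(lambda x, y: x ^ y, data, 0)


def compute_q(data):
    """Q parity: XOR of 2^i * D_i over all disks i, computed in GF(2^8)."""
    return reduce(lambda acc, item: acc ^ gf_mul(gf_pow(2, item[0]), item[1]),
                  enumerate(data), 0)


def rebuild_from_p(data, missing, p):
    """Recover the byte at index `missing` using only P."""
    return p ^ compute_p(d for i, d in enumerate(data) if i != missing)


def rebuild_from_q(data, missing, q):
    """Recover the byte at index `missing` using only Q."""
    partial = 0
    for i, d in enumerate(data):
        if i != missing:
            partial ^= gf_mul(gf_pow(2, i), d)
    return gf_mul(q ^ partial, gf_inv(gf_pow(2, missing)))


# Eight data disks, one byte each; disk index 3 has failed.
data = [0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88]
p, q = compute_p(data), compute_q(data)
damaged = data[:3] + [None] + data[4:]

from_p = rebuild_from_p(damaged, 3, p)
from_q = rebuild_from_q(damaged, 3, q)
print(hex(from_p), hex(from_q))
```

Both calls return the lost byte, so in a scheme like this a sector that cannot be read from one parity disk can still be reconstructed from the other, provided every remaining data disk reads cleanly at that position.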
toohai Posted October 1, 2017

10 minutes ago, johnnie.black said: "You could also disable the known bad parity disk and rebuild using only the good one, but I'd like to see first an attempt to rebuild with both paritys, you can always disable it and rebuild again if needed, but don't forget to grab the diagnostics before rebooting if there are any errors so we can see what happened."

I can do this, but I want to check whether Unraid is able to rebuild as it is without writing any corrupted data back to my data disk. Removing the parity disk would also leave the array prone to failure if another disk shows errors during the rebuild.

Edited October 1, 2017 by toohai
toohai Posted October 2, 2017

In the end, I'm replacing the failed data disk and running the rebuild as it is. I'm assuming Unraid is built such that the read data is reconstructed before being supplied to the parity rebuild. Some read errors are showing up on the second parity drive, but the parity rebuild is going forward without issues. I would still like someone to confirm that this is how it works.
JorgeB Posted October 2, 2017

30 minutes ago, toohai said: "I'm assuming unraid is build such that the read data is reconstructed before being supplied for parity rebuild"

Not quite sure what you mean here.

31 minutes ago, toohai said: "some read errors are showing up on the second parity drive. but parity rebuild is going forward without issues. I would still like it if someone would confirm that this is how it is."

If the read errors are only on the second parity disk, the rebuilt disk will be fine, but grab and post your diagnostics when it finishes.
toohai Posted October 4, 2017

The disk rebuild finished without further issues. There were no new errors on the 2nd parity disk after the rebuild either, but I will keep a close eye on it.