banterer Posted January 28, 2023
Ok, so this is odd. My parity drive is disabled. I tried running a SMART test, and it said it completed without error. But when I look in the log I see the following. What's going on here?
ATA Error Count: 165 (device log contains only the most recent five errors)
  CR = Command Register [HEX]
  FR = Features Register [HEX]
  SC = Sector Count Register [HEX]
  SN = Sector Number Register [HEX]
  CL = Cylinder Low Register [HEX]
  CH = Cylinder High Register [HEX]
  DH = Device/Head Register [HEX]
  DC = Device Command Register [HEX]
  ER = Error register [HEX]
  ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 165 occurred at disk power-on lifetime: 62237 hours (2593 days + 5 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 ff ff ff 4f 00  16d+13:14:37.016  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00  16d+13:14:37.016  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00  16d+13:14:37.016  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00  16d+13:14:37.015  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00  16d+13:14:37.015  READ FPDMA QUEUED

Error 164 occurred at disk power-on lifetime: 61901 hours (2579 days + 5 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 ff ff ff 4f 00   2d+13:31:25.905  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00   2d+13:31:25.905  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00   2d+13:31:25.905  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00   2d+13:31:25.905  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00   2d+13:31:25.905  READ FPDMA QUEUED

Error 163 occurred at disk power-on lifetime: 61847 hours (2576 days + 23 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 ff ff ff 4f 00      06:41:38.399  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00      06:41:38.396  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00      06:41:38.393  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00      06:41:38.389  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00      06:41:38.387  READ FPDMA QUEUED

Error 162 occurred at disk power-on lifetime: 61844 hours (2576 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: WP at LBA = 0x0fffffff = 268435455
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 00 08 ff ff ff 4f 00      03:58:29.699  WRITE FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      03:58:29.698  READ FPDMA QUEUED
  60 00 40 ff ff ff 4f 00      03:58:29.698  READ FPDMA QUEUED
  60 00 40 ff ff ff 4f 00      03:58:29.698  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      03:58:29.698  READ FPDMA QUEUED

Error 161 occurred at disk power-on lifetime: 61844 hours (2576 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 ff ff ff 4f 00      03:58:18.632  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00      03:58:18.628  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00      03:58:18.625  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00      03:58:18.622  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00      03:58:18.619  READ FPDMA QUEUED
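For anyone decoding the register rows in a log like this by hand, here is a minimal Python sketch. It assumes the classic 28-bit ATA task-file layout (SN = LBA bits 0-7, CL = bits 8-15, CH = bits 16-23, low nibble of DH = bits 24-27) and reproduces the numbers smartctl printed: 0x0fffffff is simply the largest value 28 bits can hold, ST 0x51 is DRDY + DSC + ERR, and ER 0x40 is bit 6, which smartctl reported as UNC on the reads and WP on the write. The function names are illustrative, not part of smartctl.

```python
"""Sketch: decode one row of a smartctl ATA error log by hand.

Assumes the classic 28-bit task-file layout: SN holds LBA bits 0-7,
CL bits 8-15, CH bits 16-23, and the low nibble of DH bits 24-27.
"""

# ATA status-register bits (obsolete bits 1-2 omitted)
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x10: "DSC", 0x08: "DRQ", 0x01: "ERR"}


def lba28(sn: int, cl: int, ch: int, dh: int) -> int:
    """Reassemble a 28-bit LBA from the SN/CL/CH/DH registers."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


def decode_status(st: int) -> list[str]:
    """Names of the status bits that are set, highest bit first."""
    return [name for bit, name in STATUS_BITS.items() if st & bit]


def power_on_split(hours: int) -> tuple[int, int]:
    """Split power-on hours into (days, hours), as smartctl prints them."""
    return divmod(hours, 24)


# Powered_Up_Time counts milliseconds; a 32-bit counter wraps after 2**32 ms.
WRAP_DAYS = 2**32 / (1000 * 60 * 60 * 24)

if __name__ == "__main__":
    # Register row from Error 165: ER=40 ST=51 SC=00 SN=ff CL=ff CH=ff DH=0f
    er, st = 0x40, 0x51
    lba = lba28(0xFF, 0xFF, 0xFF, 0x0F)
    print(f"LBA = {lba:#010x} = {lba}")
    print("ST bits:", decode_status(st))
    print("ER bit 6 set:", bool(er & 0x40))
    print("62237 h =", power_on_split(62237), "(days, hours)")
    print(f"Powered_Up_Time wraps after {WRAP_DAYS:.3f} days")
```

Since every entry decodes to the same all-ones LBA, the 28-bit summary log most likely could not hold the real address of these queued (48-bit) reads. Where the drive supports it, `smartctl -l xerror` (the extended comprehensive error log) may show the full LBA.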
trurl Posted January 28, 2023
Attach diagnostics to your NEXT post in this thread.
banterer Posted January 28, 2023
48 minutes ago, trurl said: "attach diagnostics to your NEXT post in this thread"
tower-diagnostics-20230128-2131.zip
trurl Posted January 28, 2023
These diagnostics are from after a reboot, so I can't see what happened before.
Check connections, power and SATA, both ends, including splitters.
Run an extended SMART self-test on parity.
banterer Posted January 29, 2023
On 1/28/2023 at 9:45 PM, trurl said: "Diagnostics after reboot, can't see what happened before. Check connections, power and SATA, both ends, including splitters. Run an extended SMART self-test on parity."
Connections checked; SMART report and full diags attached. Again, it says passed with no errors. The extended test took a long time. I think I ran it twice, though!
tower-smart-20230129-1932.zip
tower-diagnostics-20230129-2145.zip
trurl Posted January 29, 2023
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%            62559  -
# 2  Extended offline    Completed without error       00%            62545  -
You can rebuild to the same disk.
https://wiki.unraid.net/Manual/Storage_Management#Rebuilding_a_drive_onto_itself
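The self-test log quoted here can also be checked programmatically. A minimal sketch, assuming the columns are separated by runs of two or more spaces as in this log; `SAMPLE` is the log pasted in, and in practice you would substitute the output of `smartctl -l selftest` for the parity device. The helper names are hypothetical.

```python
"""Sketch: check a smartctl self-test log for failed tests."""
import re

# The two-entry log quoted above; replace with your own smartctl output.
SAMPLE = """\
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%            62559  -
# 2  Extended offline    Completed without error       00%            62545  -
"""


def parse_selftests(text: str) -> list[dict]:
    """Turn self-test log rows into dictionaries (assumes 2+ spaces between columns)."""
    rows = []
    for line in text.splitlines():
        if not line.startswith("#"):
            continue  # skip the header row and blank lines
        num, test, status, remaining, hours, lba = re.split(r"\s{2,}", line.strip())
        rows.append({
            "num": int(num.lstrip("# ")),
            "test": test,
            "status": status,
            "remaining": remaining,
            "hours": int(hours),
            "first_error_lba": None if lba == "-" else lba,
        })
    return rows


def all_passed(rows: list[dict]) -> bool:
    """True if every logged test completed without error."""
    return all(r["status"] == "Completed without error" for r in rows)


if __name__ == "__main__":
    tests = parse_selftests(SAMPLE)
    for t in tests:
        print(t["num"], t["test"], t["status"], t["hours"])
    print("all passed:", all_passed(tests))
```

For this log, both extended tests passed and were logged 14 power-on hours apart (62559 - 62545); the older one was logged 308 hours after Error 165 (62545 - 62237).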
banterer Posted February 1, 2023
On 1/29/2023 at 10:15 PM, trurl said: "[...] You can rebuild to the same disk. https://wiki.unraid.net/Manual/Storage_Management#Rebuilding_a_drive_onto_itself"
Ok, so parity is rebuilding. Meanwhile, *another* disk now shows "Unmountable: Unsupported partition layout". I'm getting fed up with this.
1. How do I know what I have lost, given I have no parity disk and another disk won't mount?
2. What's going on? This chassis worked fine before Unraid (please don't flame me, I'm just telling it like it is). Since Unraid, my disks have been dropping like flies: NVMe, HDD, one by one they are all going!
trurl Posted February 1, 2023
Parity won't help with an unmountable disk anyway; you need to repair the filesystem.
Attach diagnostics to your NEXT post in this thread.
trurl Posted February 1, 2023
You might need to take a closer look at how you are powering all these disks. Better not to put more than 4 disks on a single power cable. If using splitters, crimped (not molded) MOLEX splitters are preferred.
Don't bundle data cables. Make sure each cable, power or SATA, has enough slack for the connector to sit squarely on the connection without any tension.
You might need to replace the SATA cable if it continues to give problems.
banterer Posted February 1, 2023
19 hours ago, trurl said: "You might need to take a closer look at how you are powering all these disks. [...]"
tower-diagnostics-20230201-2012.zip
Diags attached.
Would parity not let me restore the missing disk if it was not recoverable, though? I don't really know what was on that disk, although presumably a number of my files!
I'll check all the cables again and look at how they're routed, etc. But I've lost 2 x NVMe drives to the same fault as well, and they are both on PCIe cards, so no cables are involved!
trurl Posted February 1, 2023
3 minutes ago, banterer said: "Would parity not let me restore the missing disk, if it was not recoverable, though? I don't really know what was on that disk, although presumably a number of my files!"
19 hours ago, trurl said: "Parity won't help with unmountable anyway, you need to repair the filesystem."
Check filesystem on disk3.
trurl Posted February 1, 2023
6 minutes ago, banterer said: "lost (due to same fault) 2 x NVME drives"
What exactly do you mean by "same fault"? nvme0 is assigned as cache, and cache is mounted. The only other one I see is nvme1, and it isn't assigned.
banterer Posted February 1, 2023
1 hour ago, trurl said: "What exactly do you mean 'same fault'? [...]"
As in 'unmountable partition'. nvme0 and nvme1 were originally a BTRFS pool. When that went bad, I switched the cache to XFS on one of them. That then became unmountable too. So I've reformatted it (having lost my live appdata, etc.) to start again. Don't know how long it will last this time! And those are just the PCIe-mounted SSDs!
trurl Posted February 2, 2023
3 hours ago, trurl said: "Check filesystem on disk3"
Be sure to capture the results so you can post them.
banterer Posted February 14, 2023
On 2/2/2023 at 12:14 AM, trurl said: "Be sure to capture the results so you can post them."
Ok, so parity is currently OK, but I have one disk unmountable and one disabled and 'emulated'. And 'Stop' is disabled, so I can't stop the array and start it in maintenance mode. Please advise?
banterer Posted February 14, 2023
Update: now I've lost access to the GUI. I really don't know what's going on here. I can still access the terminal; how can I cleanly shut down and reboot from there? I've tried the instructions here, but even the first command, `/root/samba stop`, isn't recognised.
trurl Posted February 15, 2023
Can you get diagnostics?
banterer Posted February 15, 2023
This is all I can get from the GUI. I still have access to the terminal.
trurl Posted February 15, 2023
40 minutes ago, trurl said: "Can you get diagnostics?"
From the terminal. Click the link.
banterer Posted February 15, 2023
23 minutes ago, trurl said: "from the terminal. Click the link"
The link, in the terminal? Are you trying to be funny?
trurl Posted February 15, 2023
The word diagnostics in this post, and everywhere it appears in posts on this forum, is a link to instructions on how to get diagnostics, including how to get them from the terminal.
banterer Posted February 15, 2023
I have read that link. I don't see instructions for the terminal. Maybe it's late, maybe it's my fault. Your curt replies seem to suggest so.
banterer Posted February 15, 2023
Ok, found it, running...
banterer Posted February 15, 2023
tower-diagnostics-20230215-0203.zip
trurl Posted February 15, 2023
08:00.0 SATA controller [0106]: Marvell Technology Group Ltd. 88SE9172 SATA 6Gb/s Controller [1b4b:9172] (rev 11)
Marvell controllers are not recommended, but it doesn't look like you are using it. Disks 3, 4 and all the others appear to be using this controller instead:
02:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] [1000:0072] (rev 02)
No SMART report for disk3; it looks like it has disconnected. It was already unmountable when you booted on Feb 1. It needs to be repaired.
No SMART report for disk4; it looks like it has disconnected. Emulated disk4 is mounted and 80% full. It needs to be rebuilt.
Log space is completely full.
There are no syslogs in those diagnostics since Feb 1. The parity rebuild had not finished when you were having problems with those 2 data disks, so it's not clear the parity build would have been good. But parity is emulating disk4, so that's a good sign. It might be a good idea to rebuild disk4 to a spare just in case.
Not clear why the webUI isn't working now.
Check connections on disks 3 and 4, both ends, including power and splitters. Reboot and post new diagnostics.
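To check for a Marvell controller yourself, the `[vendor:device]` IDs in `lspci -nn` output are enough; 1b4b is the vendor ID shown for the Marvell 88SE9172 above. A minimal sketch with hypothetical helper names, using the two lines from this post as sample input. It only lists controllers; it does not tell you which disks are attached to which controller.

```python
"""Sketch: flag Marvell storage controllers in `lspci -nn`-style lines."""
import re

MARVELL_VENDOR_ID = "1b4b"  # vendor ID shown for the 88SE9172 above
# Matches "[vvvv:dddd]"; the class code "[0106]" has no colon, so it is skipped.
ID_RE = re.compile(r"\[([0-9a-f]{4}):([0-9a-f]{4})\]")

SAMPLE = [
    "08:00.0 SATA controller [0106]: Marvell Technology Group Ltd. 88SE9172 SATA 6Gb/s Controller [1b4b:9172] (rev 11)",
    "02:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] [1000:0072] (rev 02)",
]


def parse_lspci(lines: list[str]) -> list[dict]:
    """Extract slot, vendor ID and device ID from each line that has them."""
    devices = []
    for line in lines:
        match = ID_RE.search(line)
        if not match:
            continue
        devices.append({
            "slot": line.split()[0],
            "vendor": match.group(1),
            "device": match.group(2),
        })
    return devices


def marvell_slots(devices: list[dict]) -> list[str]:
    """PCI slots of devices whose vendor ID is Marvell's."""
    return [d["slot"] for d in devices if d["vendor"] == MARVELL_VENDOR_ID]


if __name__ == "__main__":
    devs = parse_lspci(SAMPLE)
    for d in devs:
        print(d["slot"], f"{d['vendor']}:{d['device']}")
    print("Marvell controllers at:", marvell_slots(devs))
```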