SYMYAY Posted August 31, 2021

Three of my drives recently went offline, and I get the prompt "Errors occurred - Check SMART report". When I check the SMART report, I don't see anything out of the ordinary, and the "SMART health status" area shows as passed. I doubt that all three drives failed at the same time. Not sure what happened here, any ideas? Thanks in advance!

Below is the SMART report for one of the drives; I redacted the serial number.

smartctl 7.1 2019-12-30 r5022 [x86_64-linux-4.19.107-Unraid] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               HP
Product:              MB6000JEFND
Revision:             HPD2
Compliance:           SPC-4
User Capacity:        6,001,175,126,016 bytes [6.00 TB]
Logical block size:   512 bytes
Physical block size:  4096 bytes
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000c500834bf0df
Serial number:        Z4D1M3SA00***********
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Tue Aug 31 15:15:03 2021 CST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled
Read Cache is:        Enabled
Writeback Cache is:   Disabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     57 C
Drive Trip Temperature:        60 C

Manufactured in week 13 of year 2015
Specified cycle count over device lifetime:  10000
Accumulated start-stop cycles:  123
Specified load-unload count over device lifetime:  300000
Accumulated load-unload cycles:  1977
Elements in grown defect list: 0

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0     325690.146           0
write:         0        0         0         0          0     256955.992           0
verify:        0        0         0         0          0      53703.942           0

Non-medium error count: 22626511

SMART Self-test log
Num  Test              Status      segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                   number   (hours)
# 1  Background short  Completed         -     32803              - [-   -    -]
# 2  Background short  Completed         -        11              - [-   -    -]
# 3  Background short  Completed         -         3              - [-   -    -]
# 4  Background short  Completed         -         2              - [-   -    -]

Long (extended) Self-test duration: 36900 seconds [615.0 minutes]

Background scan results log
  Status: waiting until BMS interval timer expires
    Accumulated power on time, hours:minutes 42973:56 [2578436 minutes]
    Number of background scans performed: 129,  scan progress: 0.00%
    Number of background medium scans performed: 129

Protocol Specific port log page for SAS SSP
relative target port id = 1
  generation code = 0
  number of phys = 1
  phy identifier = 0
    attached device type: expander device
    attached reason: unknown
    reason: power on
    negotiated logical link rate: phy enabled; 6 Gbps
    attached initiator port: ssp=0 stp=0 smp=1
    attached target port: ssp=0 stp=0 smp=1
    SAS address = 0x5000c500834bf0dd
    attached SAS address = 0x500262d0cd7a5b20
    attached phy identifier = 9
    Invalid DWORD count = 159
    Running disparity error count = 160
    Loss of DWORD synchronization = 30
    Phy reset problem = 36
    Phy event descriptors:
     Invalid word count: 159
     Running disparity error count: 160
     Loss of dword synchronization count: 30
     Phy reset problem count: 36
relative target port id = 2
  generation code = 0
  number of phys = 1
  phy identifier = 1
    attached device type: no device attached
    attached reason: unknown
    reason: unknown
    negotiated logical link rate: phy enabled; unknown
    attached initiator port: ssp=0 stp=0 smp=0
    attached target port: ssp=0 stp=0 smp=0
    SAS address = 0x5000c500834bf0de
    attached SAS address = 0x0
    attached phy identifier = 0
    Invalid DWORD count = 0
    Running disparity error count = 0
    Loss of DWORD synchronization = 0
    Phy reset problem = 0
    Phy event descriptors:
     Invalid word count: 0
     Running disparity error count: 0
     Loss of dword synchronization count: 0
     Phy reset problem count: 0
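In the report above, the media error counters are all zero, while the SAS phy counters on port 1 (Invalid DWORD count, Running disparity error count, Loss of DWORD synchronization, Phy reset problem) are not. The following is a minimal Python sketch for pulling those phy counters out of saved report text. It assumes the "name = value" layout with one field per line, as smartctl prints it; the function names are hypothetical and this is not an official smartmontools interface.

```python
import re

# Counter names as they appear in the "Protocol Specific port log page
# for SAS SSP" section of the report above.
PHY_COUNTERS = (
    "Invalid DWORD count",
    "Running disparity error count",
    "Loss of DWORD synchronization",
    "Phy reset problem",
)


def parse_phy_counters(report: str) -> dict[int, dict[str, int]]:
    """Return {relative target port id: {counter name: value}}."""
    ports: dict[int, dict[str, int]] = {}
    current = None  # port id of the section being read, if any
    for raw in report.splitlines():
        line = raw.strip()
        port = re.match(r"relative target port id = (\d+)$", line)
        if port:
            current = int(port.group(1))
            ports[current] = {}
            continue
        if current is None:
            continue  # still in the sections before the SAS port log
        for name in PHY_COUNTERS:
            # Only the "name = value" lines are matched; the repeated
            # "Phy event descriptors" lines use "name: value" and are skipped.
            hit = re.match(re.escape(name) + r" = (\d+)$", line)
            if hit:
                ports[current][name] = int(hit.group(1))
    return ports


def ports_with_link_errors(ports: dict[int, dict[str, int]]) -> list[int]:
    """List the port ids that have at least one non-zero phy counter."""
    return [pid for pid, counters in ports.items()
            if any(value > 0 for value in counters.values())]
```

Fed the text of the report above, `ports_with_link_errors(parse_phy_counters(text))` would flag port 1 and not port 2. These counters are cumulative, so comparing two reports taken some time apart says more than a single snapshot.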
JorgeB Posted August 31, 2021

47 minutes ago, SYMYAY said: "Errors occurred - Check SMART report"

This is normal with SAS devices; SMART in the GUI only works correctly with SATA. Diagnostics taken before rebooting might give some clues.
SYMYAY Posted August 31, 2021

18 minutes ago, JorgeB said: "This is normal with SAS devices, SMART on the GUI only works correctly with SATA, diagnostics before rebooting might give some clues."

Thanks for your advice. I downloaded the diagnostics, but I have no idea what everything is supposed to mean. Would you mind taking a look? Thanks a lot.

The ones that are currently down are Parity 2 and Disk 3.

tower-diagnostics-20210831-1650.zip
SYMYAY Posted August 31, 2021

3 minutes ago, SYMYAY said: "The ones that are currently down are Parity 2 and Disk 3."

Serial numbers:
MB6000JEFND_Z4D1M3SA0000R5266KCC
MB6000JEFND_Z4D26ZJY0000R541KXTR
JorgeB Posted August 31, 2021

The disks look OK. The diags were taken after rebooting, so we can't see what happened, but errors on multiple disks at once are usually a power, connection, or controller problem. One thing you should do is update the LSI to the latest firmware, since it's on a very old one. Then, if it happens again, grab the diags before rebooting.
SYMYAY Posted August 31, 2021

2 hours ago, JorgeB said: "Disks look OK, diags are after rebooting so we can't see what happened, but multiple disk errors are usually a power/connection/controller problem, one thing you should do is update the LSI to latest firmware, since it's on a very old one, then and if it happens again grab diags before rebooting."

Thank you so much! I'll update the LSI firmware right away and check the cables in the case. I really appreciate your help! Viva Unraid!
SYMYAY Posted December 19, 2021 (Solution)

Just a quick update on the issue I had. Updating the firmware was only a temporary fix. I figured out the root cause last week when I heard clicking noises from the disks. I was using a single cable from the power supply to power six enterprise 6 TB disks, and apparently that cable couldn't deliver enough power for all of them. Switching to a 1000 W modular power supply with plenty of SATA connectors, and connecting at most four drives per cable, solved my problem completely. Hopefully this helps anyone else who has the same issue.
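The fix above comes down to spreading the drives over more power cables. As a rough sketch, here is a small Python helper that splits a list of drives into per-cable groups as evenly as possible. The four-drive limit is simply what worked in this thread, not a rating from any power supply or drive datasheet, and the function name and drive labels are hypothetical.

```python
def plan_power_cables(drives: list[str], max_per_cable: int = 4) -> list[list[str]]:
    """Split drives into the fewest cable groups that respect max_per_cable,
    keeping the groups as even in size as possible."""
    if max_per_cable < 1:
        raise ValueError("max_per_cable must be at least 1")
    # Ceiling division: the smallest number of cables that can hold all drives.
    n_cables = -(-len(drives) // max_per_cable)
    # Deal the drives out round-robin so no cable carries more than its share.
    return [drives[i::n_cables] for i in range(n_cables)]
```

For the six drives in this thread and a limit of four per cable, this gives two cables with three drives each rather than one cable with four and one with two.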