February 26
Hello, hoping to get some guidance on what to do from here.
I had a ZFS RAIDZ2 pool of 6 disks that ran fine for quite a while. One drive died with SMART errors, and when I went to replace it, another died with SMART errors. I was in the middle of a move, so I left it for a few months. Now I'm back at it again. When I went to check, another drive just wasn't showing, and it reported "disk has size zero". I figured there was no way that many died in that short a time, so I went and got a replacement HBA card. The issues persisted. So I bought another 2 drives, but then another drive died with size zero. So of the original 6 disks, 2 died with SMART errors and 2 are showing size zero.
There is a backplane, but it is split into 3 separate rows, so if one section were bad, a whole row should fail. Instead, the disks that work will work in all 3 rows, and the ones that don't, won't. I'm clueless here. Any ideas?
tower-diagnostics-20260226-1441.zip
February 26, Community Expert
13 minutes ago, cjkuhlenbeck said: "2 are showing size zero."
Where are you seeing that?
Also, SSDs in the array cannot be trimmed. Unraid V7 doesn't require an array.
February 26, Community Expert
1 minute ago, trurl said: "SSDs in the array cannot be trimmed. Unraid V7 doesn't require an array."
It would probably make more sense to put those in another multi-disk pool.
February 26, Author
6 minutes ago, trurl said: "Where are you seeing that? Also, SSDs in the array cannot be trimmed. Unraid V7 doesn't require an array."
Hey, thanks for the fast reply. Regarding the SSDs, this was set up before v7. I did see that mentioned in the updates and I'm excited to try it out, but I wanted to get the server up again before I messed with any of that.
Regarding where I see the errors, some are in the diagnostics system log. I found the size-zero errors in the disk log under the Unassigned Devices plugin. The SMART errors are from long tests.
February 27, Author
6 hours ago, JorgeB said: "Post the output from smartctl -x /dev/sdj"
Smartctl open device: /dev/sdj failed: No such device
I currently have only one of the faulted drives connected, as I was trying it in different bays right before posting. It is one that shows size zero, and it looks like it's displayed as sdc. I tried the command with that one and got the result in the screenshot.
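For anyone wanting to confirm which devices the kernel itself reports as size zero, here is a minimal sketch. It assumes a Linux system where each block device exposes its size (in 512-byte sectors) at /sys/block/<dev>/size; the function name `zero_size_disks` is hypothetical and not part of Unraid or any plugin.

```python
from pathlib import Path


def zero_size_disks(sys_block="/sys/block"):
    """Return names of sd* block devices whose kernel-reported size is 0 sectors.

    Assumption: `sys_block` points at a sysfs-style directory in which each
    device has a `size` file holding the sector count as plain text.
    """
    zero = []
    for dev in sorted(Path(sys_block).glob("sd*")):
        try:
            sectors = int((dev / "size").read_text().strip())
        except (OSError, ValueError):
            # Device disappeared mid-scan or the file was unreadable; skip it.
            continue
        if sectors == 0:
            zero.append(dev.name)
    return zero
```

Calling `zero_size_disks()` on the server returns a list such as the device names that currently report no capacity; an empty list means every sd* device reports a non-zero size. It only tells you what the kernel sees, not whether the disk, cable, or controller is at fault.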
February 27, Community Expert
Not even smartctl is working, which suggests a disk issue, or a problem with the controller it's connected to.
February 27, Author
1 hour ago, JorgeB said: "Not even smartctl is working, which suggests a disk issue, or a problem with the controller it's connected to."
It's not probable, but not impossible, that 4 disks died in 6 months. Crazy though. One of the other drives shows partial SMART errors. When I try to format it in a pool, I think it gives errors, or something is preventing formatting in the pool. That one gives this on smartctl:

@Tower:~# smartctl -x /dev/sdb
smartctl 7.5 2025-04-30 r5714 [x86_64-linux-6.12.54-Unraid] (local build)
Copyright (C) 2002-25, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:
Product:              OOS22000G
Revision:             OOS1
Compliance:           SPC-5
User Capacity:        22,000,969,973,760 bytes [22.0 TB]
Logical block size:   512 bytes
Physical block size:  4096 bytes
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000c500da7b0e93
Serial number:        00009JGJ
Device type:          disk
Transport protocol:   SAS (SPL-4)
Local Time is:        Fri Feb 27 09:58:30 2026 CST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled
Read Cache is:        Enabled
Writeback Cache is:   Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK
Format status indicates no format since manufacture
Current temperature = 29
Lifetime maximum temperature = 34
Lifetime minimum temperature = 20
Maximum temperature since power on = 34
Minimum temperature since power on = 20
Relative humidity = 0
Lifetime maximum relative humidity = 0
Lifetime minimum relative humidity = 0
Maximum relative humidity since power on = 0
Minimum relative humidity since power on = 0
Manufactured in week 46 of year 2023
Specified cycle count over device lifetime: 50000
Accumulated start-stop cycles: 1955
Specified load-unload count over device lifetime: 600000
Accumulated load-unload cycles: 3811
Elements in grown defect list: 0

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        3         0         3          3          0.028           0
write:         0        0         0         0          0         21.067           0

Non-medium error count: 0

scsiPrintPendingDefectsLPage Failed [device not ready]
[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']

SMART Self-test log
Num  Test              Status                     segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                                  number   (hours)
# 1  Background long   Failed in segment -->           -   12962         336190904 [0x3 0x11 0x0]
# 2  Background long   Failed in segment -->           -   12785         336190904 [0x3 0x11 0x0]
# 3  Background short  Completed                       -   12284                 - [-   -    -]
# 4  Background long   Aborted (device reset ?)        -    2310                 - [-   -    -]
# 5  Background long   Aborted (device reset ?)        -    2271                 - [-   -    -]
# 6  Background long   Aborted (by user command)       -    2271                 - [-   -    -]

Long (extended) Self-test duration: 114660 seconds [31.9 hours]

Background scan results log
  Status: no scans active
    Accumulated power on time, hours:minutes 13238:19 [794299 minutes]
    Number of background scans performed: 0,  scan progress: 0.00%
    Number of background medium scans performed: 0

   #  when        lba(hex)          [sk,asc,ascq]  reassign_status
   1  7452:29  0000000119aad0b8  [3,11,0]  Recovered via rewrite in-place
   2  7452:29  0000000119aad0c0  [3,11,0]  Recovered via rewrite in-place
   3  7452:30  0000000119aad0c8  [3,11,0]  Recovered via rewrite in-place
   4  7452:30  0000000119aad0d0  [3,11,0]  Recovered via rewrite in-place
   5  7452:30  0000000119aad0d8  [3,11,0]  Recovered via rewrite in-place
   6  7452:30  0000000119aad0e0  [3,11,0]  Recovered via rewrite in-place
   7  7452:30  0000000119aad0e8  [3,11,0]  Recovered via rewrite in-place
   8  7452:30  0000000119aad0f0  [3,11,0]  Recovered via rewrite in-place
   9  7452:30  0000000119aae0c8  [3,11,0]  Recovered via rewrite in-place
  10  7500:39  0000000119aad738  [3,11,0]  Recovered via rewrite in-place
  11  7500:39  0000000119aad748  [3,11,0]  Recovered via rewrite in-place
  12  7500:39  0000000119aad750  [3,11,0]  Recovered via rewrite in-place
  13  7500:39  0000000119aad758  [3,11,0]  Recovered via rewrite in-place
  14  7500:39  0000000119aad760  [3,11,0]  Recovered via rewrite in-place
  15  7500:39  0000000119aad768  [3,11,0]  Recovered via rewrite in-place
  16  7500:39  0000000119aad770  [3,11,0]  Recovered via rewrite in-place
  17  7500:39  0000000119aad7b8  [3,11,0]  Recovered via rewrite in-place
  18  7500:39  0000000119aad7c8  [3,11,0]  Recovered via rewrite in-place
  19  7500:39  0000000119aad7d0  [3,11,0]  Recovered via rewrite in-place
  20  7500:39  0000000119aad7d8  [3,11,0]  Recovered via rewrite in-place
  21  7500:39  0000000119aad7e0  [3,11,0]  Recovered via rewrite in-place
  22  7500:39  0000000119aad7e8  [3,11,0]  Recovered via rewrite in-place
  23  7500:40  0000000119aad7f0  [3,11,0]  Recovered via rewrite in-place
  24  7500:40  0000000119aad8c8  [3,11,0]  Recovered via rewrite in-place
  25  7500:40  0000000119aad8d0  [3,11,0]  Recovered via rewrite in-place
  26  7500:40  0000000119aad8f0  [3,11,0]  Recovered via rewrite in-place
  27  7500:41  0000000119aace80  [3,11,0]  Recovered via rewrite in-place
  28  7500:41  0000000119aacfc0  [3,11,0]  Recovered via rewrite in-place
  29  7500:43  0000000119aad280  [3,11,0]  Recovered via rewrite in-place
  30  7500:45  0000000119aadd40  [3,11,0]  Recovered via rewrite in-place
  31  7500:45  0000000119aadf80  [3,11,0]  Recovered via rewrite in-place
  32  7503:16  0000000119aad6a8  [3,11,0]  Recovered via rewrite in-place
  33  7503:16  0000000119aadb58  [3,11,0]  Recovered via rewrite in-place
  34  7503:18  0000000119aadfb8  [3,11,0]  Recovered via rewrite in-place
  35  7503:18  0000000119aae1e0  [3,11,0]  Recovered via rewrite in-place
  36  7506:50  0000000119aae488  [3,11,0]  Recovered via rewrite in-place
  37  7661:35  0000000119abfc20  [3,11,0]  Recovered via rewrite in-place
  38  7661:35  0000000119abfc28  [3,11,0]  Recovered via rewrite in-place
  39  7661:35  0000000119abfc30  [3,11,0]  Recovered via rewrite in-place
  40  7661:35  0000000119abfc38  [3,11,0]  Recovered via rewrite in-place
  41  7661:35  0000000119abfc40  [3,11,0]  Recovered via rewrite in-place
  42  7661:35  0000000119abfc48  [3,11,0]  Recovered via rewrite in-place
  43  7661:35  0000000119abfc50  [3,11,0]  Recovered via rewrite in-place
  44  7661:35  0000000119abfc58  [3,11,0]  Recovered via rewrite in-place
  45  8056:58  0000000161aae200  [3,11,0]  Recovered via rewrite in-place
  46  8102:03  0000000161aadc38  [3,11,0]  Recovered via rewrite in-place
  47  8319:06  0000000161aad150  [3,11,0]  Recovered via rewrite in-place
  48  8497:29  0000000161aaddc8  [3,11,0]  Recovered via rewrite in-place
  49  8721:25  0000000161aae068  [3,11,0]  Recovered via rewrite in-place
  50  8770:06  0000000161aacc70  [3,11,0]  Recovered via rewrite in-place

Device does not support General statistics and performance logging

Protocol Specific port log page for SAS SSP
relative target port id = 1
  generation code = 14
  number of phys = 1
  phy identifier = 0
    attached device type: SAS or SATA device
    attached reason: unknown
    reason: unknown
    negotiated logical link rate: phy enabled; 12 Gbps
    attached initiator port: ssp=1 stp=1 smp=1
    attached target port: ssp=0 stp=0 smp=0
    SAS address = 0x5000c500da7b0e91
    attached SAS address = 0x500605b010a1766c
    attached phy identifier = 11
    Invalid DWORD count = 0
    Running disparity error count = 0
    Loss of DWORD synchronization count = 22
    Phy reset problem count = 0
relative target port id = 2
  generation code = 14
  number of phys = 1
  phy identifier = 1
    attached device type: no device attached
    attached reason: unknown
    reason: unknown
    negotiated logical link rate: phy enabled; unknown
    attached initiator port: ssp=0 stp=0 smp=0
    attached target port: ssp=0 stp=0 smp=0
    SAS address = 0x5000c500da7b0e92
    attached SAS address = 0x0
    attached phy identifier = 0
    Invalid DWORD count = 0
    Running disparity error count = 0
    Loss of DWORD synchronization count = 0
    Phy reset problem count = 0
February 27, Author
So what should I do in this situation, or what would you do? Should I buy another drive to replace unlucky number 4? Two bad HBAs? Did something maybe fry these?
February 27, Community Expert (Solution)
41 minutes ago, cjkuhlenbeck said: "Background long Failed in segment -->"
This disk is failing the long SMART test. This is a physical disk problem, and it should be replaced.
Note that bad power can cause disks to fail, so it's worth considering when many disks are failing. That said, these look like white-label disks, so they may be refurbished, and in my experience those disks fail a lot.
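To make the "failing the long SMART test" reading easier to repeat on other drives, here is a rough sketch that scans `smartctl -x` text for failed self-test rows. It assumes the SCSI/SAS self-test log layout shown in the output posted earlier in this thread (SATA drives use a different layout), and the names `ROW`, `KNOWN_SENSE`, and `failed_self_tests` are hypothetical helpers, not part of smartmontools. The one sense triple it decodes, [0x3 0x11 0x0], is SCSI sense key MEDIUM ERROR with "unrecovered read error".

```python
import re

# Only the triple seen in this thread is decoded; extend as needed.
KNOWN_SENSE = {(0x3, 0x11, 0x0): "medium error: unrecovered read error"}

# One self-test row, e.g.:
# "# 1  Background long  Failed in segment -->  -  12962  336190904 [0x3 0x11 0x0]"
ROW = re.compile(
    r"#\s*(?P<num>\d+)\s+Background\s+(?P<kind>long|short)\s+"
    r"(?P<status>.+?)\s+-\s+(?P<hours>\d+)\s+(?P<lba>\d+|-)\s+"
    r"\[(?P<sense>[^\]]*)\]"
)


def failed_self_tests(smartctl_text):
    """Return a list of dicts, one per self-test row whose status starts with 'Failed'."""
    failures = []
    for m in ROW.finditer(smartctl_text):
        if not m["status"].startswith("Failed"):
            continue
        try:
            sense = tuple(int(x, 16) for x in m["sense"].split())
        except ValueError:
            sense = None  # sense fields were "-" or otherwise not hex
        failures.append({
            "num": int(m["num"]),
            "kind": m["kind"],
            "hours": int(m["hours"]),
            "lba": int(m["lba"]) if m["lba"].isdigit() else None,
            "meaning": KNOWN_SENSE.get(sense, "unknown"),
        })
    return failures
```

Fed the output from /dev/sdb above, this would flag the two long tests that failed at the same LBA, which matches the conclusion that the disk has a physical media problem. Aborted and completed tests are ignored on purpose.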
February 27, Author
3 hours ago, JorgeB said: "This disk is failing the long SMART test. This is a physical disk problem, and it should be replaced. Note that bad power can cause disks to fail, so it's worth considering if plenty of disks are failing, but these look like white-label disks, so they can be refurbished, and in my experience those disks fail a lot."
I appreciate the guidance. The original disks were refurbished, so I'm not too surprised, even though they ran a year without issue. I'd be more surprised if it were a PSU issue, as that wasn't a cheap one. I'm replacing the drives with new (and warrantied, lol) drives, so hopefully things look good from here. I'll replace the PSU if one of the new drives dies on me. Thanks again!
Edited February 27 by cjkuhlenbeck