Disk and pool issues; unable to format, disk has size zero, smart errors

February 26

Hello, hoping to get some guidance on what to do from here.

I had a ZFS RAIDZ2 pool of 6 disks that ran fine for quite a while. One drive died with SMART errors, and when I went to replace it, another died with SMART errors. I was in the middle of a move, so I left it for a few months.

Now I’m back at it again. When I went to check, another drive just wasn’t showing up and reported "disk has size zero". I figured there was no way that many had died in that short a time, so I went and got a replacement HBA card. The issues persisted. So I bought another 2 drives, but then another drive died with the size zero error. So of the original 6 disks, 2 died with SMART errors and 2 are showing size zero.

There is a backplane, but it’s split into 3 separate rows, so if one row were bad, only the disks in that row should be affected. But the disks that work will work in all 3 rows, and the ones that don’t, won’t. I’m clueless here. Any ideas?

tower-diagnostics-20260226-1441.zip

February 26

Community Expert

13 minutes ago, cjkuhlenbeck said:
2 are showing size zero.

Where are you seeing that?

Also, SSDs in the array cannot be trimmed. Unraid V7 doesn't require an array.

February 26

Community Expert

1 minute ago, trurl said:
SSDs in the array cannot be trimmed. Unraid V7 doesn't require an array.

It would probably make more sense to put those in another multi-disk pool.

February 26

Author

6 minutes ago, trurl said:
Where are you seeing that?
Also, SSDs in the array cannot be trimmed. Unraid V7 doesn't require an array.

Hey, thanks for the fast reply. Regarding the SSDs, this was set up before v7. I did see that mentioned in the updates and I’m excited to try it out. But I wanted to get the server up again before I messed with any of that.

Regarding where I see the errors, some are in the system log in the diagnostics. But I found the size zero errors in the disk log under the Unassigned Devices plugin. The SMART errors are from long tests.

February 27

Community Expert

Post the output from smartctl -x /dev/sdj

February 27

Author

6 hours ago, JorgeB said:
Post the output from smartctl -x /dev/sdj

Smartctl open device: /dev/sdj failed: No such device

I currently only have one of the faulted drives connected, as I was trying it in different bays right before posting. It is one that shows size zero, and it looks like it’s displayed as sdc. I tried the command with that one and got the output in the screenshot.

February 27

Community Expert

Not even smartctl is working, which suggests a disk issue, or a problem with the controller it's connected to.
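
When smartctl itself fails like this, its exit status gives a rough idea of the stage at which it gave up. The following is a minimal sketch, not part of the original reply, that decodes the exit status bitmask as described in the EXIT STATUS section of the smartctl man page. The function name `decode_smartctl_exit` and the shortened descriptions are illustrative assumptions.

```python
# Bit meanings paraphrased from the EXIT STATUS section of the smartctl
# man page. The dictionary and function names are hypothetical helpers.
SMARTCTL_EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed, or device did not identify itself",
    2: "a SMART or other command to the disk failed",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes at or below threshold",
    5: "attributes were at or below threshold at some time in the past",
    6: "device error log contains records of errors",
    7: "self-test log contains records of errors",
}


def decode_smartctl_exit(status: int) -> list[str]:
    """Return the meaning of every bit that is set in a smartctl exit status."""
    return [text for bit, text in SMARTCTL_EXIT_BITS.items() if status & (1 << bit)]


if __name__ == "__main__":
    # Example values only; use the real status from `echo $?` after smartctl.
    for status in (0, 2, 0x84):
        meanings = decode_smartctl_exit(status) or ["no error bits set"]
        print(status, meanings)
```

Run `smartctl -x /dev/sdX` and then `echo $?` in the same shell to get the value to decode. A status with bit 1 set points at the open or identify step, which fits either a dead disk or a controller path problem, so it does not distinguish between the two on its own.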

February 27

Author

1 hour ago, JorgeB said:
Not even smartctl is working, which suggests a disk issue, or a problem with the controller it's connected to.

It's not probable, but not impossible, that 4 disks died in 6 months. Crazy though. One of the other drives shows partial SMART errors. When I try to format it in a pool I think it's giving errors, or something is preventing formatting in the pool. That one gives this output from smartctl:

@Tower:~# smartctl -x /dev/sdb
smartctl 7.5 2025-04-30 r5714 [x86_64-linux-6.12.54-Unraid] (local build)
Copyright (C) 2002-25, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               
Product:              OOS22000G
Revision:             OOS1
Compliance:           SPC-5
User Capacity:        22,000,969,973,760 bytes [22.0 TB]
Logical block size:   512 bytes
Physical block size:  4096 bytes
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000c500da7b0e93
Serial number:        00009JGJ
Device type:          disk
Transport protocol:   SAS (SPL-4)
Local Time is:        Fri Feb 27 09:58:30 2026 CST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled
Read Cache is:        Enabled
Writeback Cache is:   Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Format status indicates no format since manufacture
Current temperature = 29
Lifetime maximum temperature = 34
Lifetime minimum temperature = 20
Maximum temperature since power on = 34
Minimum temperature since power on = 20
Relative humidity = 0
Lifetime maximum relative humidity = 0
Lifetime minimum relative humidity = 0
Maximum relative humidity since power on = 0
Minimum relative humidity since power on = 0
Manufactured in week 46 of year 2023
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  1955
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  3811
Elements in grown defect list: 0

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        3         0         3          3          0.028           0
write:         0        0         0         0          0         21.067           0

Non-medium error count:        0

scsiPrintPendingDefectsLPage Failed [device not ready]

[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Failed in segment -->       -   12962         336190904 [0x3 0x11 0x0]
# 2  Background long   Failed in segment -->       -   12785         336190904 [0x3 0x11 0x0]
# 3  Background short  Completed                   -   12284                 - [-   -    -]
# 4  Background long   Aborted (device reset ?)    -    2310                 - [-   -    -]
# 5  Background long   Aborted (device reset ?)    -    2271                 - [-   -    -]
# 6  Background long   Aborted (by user command)   -    2271                 - [-   -    -]

Long (extended) Self-test duration: 114660 seconds [31.9 hours]

Background scan results log
  Status: no scans active
    Accumulated power on time, hours:minutes 13238:19 [794299 minutes]
    Number of background scans performed: 0,  scan progress: 0.00%
    Number of background medium scans performed: 0

   #  when        lba(hex)    [sk,asc,ascq]    reassign_status
   1 7452:29  0000000119aad0b8  [3,11,0]   Recovered via rewrite in-place
   2 7452:29  0000000119aad0c0  [3,11,0]   Recovered via rewrite in-place
   3 7452:30  0000000119aad0c8  [3,11,0]   Recovered via rewrite in-place
   4 7452:30  0000000119aad0d0  [3,11,0]   Recovered via rewrite in-place
   5 7452:30  0000000119aad0d8  [3,11,0]   Recovered via rewrite in-place
   6 7452:30  0000000119aad0e0  [3,11,0]   Recovered via rewrite in-place
   7 7452:30  0000000119aad0e8  [3,11,0]   Recovered via rewrite in-place
   8 7452:30  0000000119aad0f0  [3,11,0]   Recovered via rewrite in-place
   9 7452:30  0000000119aae0c8  [3,11,0]   Recovered via rewrite in-place
  10 7500:39  0000000119aad738  [3,11,0]   Recovered via rewrite in-place
  11 7500:39  0000000119aad748  [3,11,0]   Recovered via rewrite in-place
  12 7500:39  0000000119aad750  [3,11,0]   Recovered via rewrite in-place
  13 7500:39  0000000119aad758  [3,11,0]   Recovered via rewrite in-place
  14 7500:39  0000000119aad760  [3,11,0]   Recovered via rewrite in-place
  15 7500:39  0000000119aad768  [3,11,0]   Recovered via rewrite in-place
  16 7500:39  0000000119aad770  [3,11,0]   Recovered via rewrite in-place
  17 7500:39  0000000119aad7b8  [3,11,0]   Recovered via rewrite in-place
  18 7500:39  0000000119aad7c8  [3,11,0]   Recovered via rewrite in-place
  19 7500:39  0000000119aad7d0  [3,11,0]   Recovered via rewrite in-place
  20 7500:39  0000000119aad7d8  [3,11,0]   Recovered via rewrite in-place
  21 7500:39  0000000119aad7e0  [3,11,0]   Recovered via rewrite in-place
  22 7500:39  0000000119aad7e8  [3,11,0]   Recovered via rewrite in-place
  23 7500:40  0000000119aad7f0  [3,11,0]   Recovered via rewrite in-place
  24 7500:40  0000000119aad8c8  [3,11,0]   Recovered via rewrite in-place
  25 7500:40  0000000119aad8d0  [3,11,0]   Recovered via rewrite in-place
  26 7500:40  0000000119aad8f0  [3,11,0]   Recovered via rewrite in-place
  27 7500:41  0000000119aace80  [3,11,0]   Recovered via rewrite in-place
  28 7500:41  0000000119aacfc0  [3,11,0]   Recovered via rewrite in-place
  29 7500:43  0000000119aad280  [3,11,0]   Recovered via rewrite in-place
  30 7500:45  0000000119aadd40  [3,11,0]   Recovered via rewrite in-place
  31 7500:45  0000000119aadf80  [3,11,0]   Recovered via rewrite in-place
  32 7503:16  0000000119aad6a8  [3,11,0]   Recovered via rewrite in-place
  33 7503:16  0000000119aadb58  [3,11,0]   Recovered via rewrite in-place
  34 7503:18  0000000119aadfb8  [3,11,0]   Recovered via rewrite in-place
  35 7503:18  0000000119aae1e0  [3,11,0]   Recovered via rewrite in-place
  36 7506:50  0000000119aae488  [3,11,0]   Recovered via rewrite in-place
  37 7661:35  0000000119abfc20  [3,11,0]   Recovered via rewrite in-place
  38 7661:35  0000000119abfc28  [3,11,0]   Recovered via rewrite in-place
  39 7661:35  0000000119abfc30  [3,11,0]   Recovered via rewrite in-place
  40 7661:35  0000000119abfc38  [3,11,0]   Recovered via rewrite in-place
  41 7661:35  0000000119abfc40  [3,11,0]   Recovered via rewrite in-place
  42 7661:35  0000000119abfc48  [3,11,0]   Recovered via rewrite in-place
  43 7661:35  0000000119abfc50  [3,11,0]   Recovered via rewrite in-place
  44 7661:35  0000000119abfc58  [3,11,0]   Recovered via rewrite in-place
  45 8056:58  0000000161aae200  [3,11,0]   Recovered via rewrite in-place
  46 8102:03  0000000161aadc38  [3,11,0]   Recovered via rewrite in-place
  47 8319:06  0000000161aad150  [3,11,0]   Recovered via rewrite in-place
  48 8497:29  0000000161aaddc8  [3,11,0]   Recovered via rewrite in-place
  49 8721:25  0000000161aae068  [3,11,0]   Recovered via rewrite in-place
  50 8770:06  0000000161aacc70  [3,11,0]   Recovered via rewrite in-place
Device does not support General statistics and performance logging

Protocol Specific port log page for SAS SSP
relative target port id = 1
  generation code = 14
  number of phys = 1
  phy identifier = 0
    attached device type: SAS or SATA device
    attached reason: unknown
    reason: unknown
    negotiated logical link rate: phy enabled; 12 Gbps
    attached initiator port: ssp=1 stp=1 smp=1
    attached target port: ssp=0 stp=0 smp=0
    SAS address = 0x5000c500da7b0e91
    attached SAS address = 0x500605b010a1766c
    attached phy identifier = 11
    Invalid DWORD count = 0
    Running disparity error count = 0
    Loss of DWORD synchronization count = 22
    Phy reset problem count = 0
relative target port id = 2
  generation code = 14
  number of phys = 1
  phy identifier = 1
    attached device type: no device attached
    attached reason: unknown
    reason: unknown
    negotiated logical link rate: phy enabled; unknown
    attached initiator port: ssp=0 stp=0 smp=0
    attached target port: ssp=0 stp=0 smp=0
    SAS address = 0x5000c500da7b0e92
    attached SAS address = 0x0
    attached phy identifier = 0
    Invalid DWORD count = 0
    Running disparity error count = 0
    Loss of DWORD synchronization count = 0
    Phy reset problem count = 0
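
To make the "Background scan results log" in the output above easier to read, here is a rough sketch that parses those rows, groups them by power-on hour, and converts the hex LBAs to byte offsets. It assumes the row layout shown in this report and the 512-byte logical block size the report states. The names `parse_scan_rows` and `summarize` are made up for illustration, and only a handful of rows are embedded as sample data.

```python
import re
from collections import Counter

# A few rows copied from the background scan results log in this post.
SAMPLE = """\
   1 7452:29  0000000119aad0b8  [3,11,0]   Recovered via rewrite in-place
  10 7500:39  0000000119aad738  [3,11,0]   Recovered via rewrite in-place
  37 7661:35  0000000119abfc20  [3,11,0]   Recovered via rewrite in-place
  45 8056:58  0000000161aae200  [3,11,0]   Recovered via rewrite in-place
  50 8770:06  0000000161aacc70  [3,11,0]   Recovered via rewrite in-place
"""

# index, hours:minutes, 16-digit hex LBA, [sk,asc,ascq], reassign status
ROW = re.compile(
    r"^\s*(\d+)\s+(\d+):(\d+)\s+([0-9a-fA-F]{16})\s+"
    r"\[(\w+),(\w+),(\w+)\]\s+(.*)$"
)

LOGICAL_BLOCK = 512  # bytes, taken from "Logical block size" in the report


def parse_scan_rows(text):
    """Turn each matching log row into a dict; skip lines that do not match."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append({
                "hour": int(m.group(2)),
                "lba": int(m.group(4), 16),
                "status": m.group(8).strip(),
            })
    return rows


def summarize(rows):
    """Count events per power-on hour and report the affected offset range."""
    lbas = [r["lba"] for r in rows]
    return {
        "count": len(rows),
        "per_hour": dict(Counter(r["hour"] for r in rows)),
        "first_offset_bytes": min(lbas) * LOGICAL_BLOCK,
        "last_offset_bytes": max(lbas) * LOGICAL_BLOCK,
    }


if __name__ == "__main__":
    print(summarize(parse_scan_rows(SAMPLE)))
```

Feeding the full log to `parse_scan_rows` instead of `SAMPLE` would show how the recovered sectors cluster by power-on hour and by region of the disk.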

February 27

Author

So what should I do in this situation, or what would you do? Should I buy another drive to replace unlucky number 4? Could it be two bad HBAs? Did something maybe fry these?

February 27

Community Expert
Solution

41 minutes ago, cjkuhlenbeck said:
Background long   Failed in segment -->  

This disk is failing the long SMART test. This is a physical disk problem, and it should be replaced.

Note that bad power can cause disks to fail, so it's worth considering if many disks are failing, but these look like white-label disks, so they may be refurbished, and in my experience those disks fail a lot.
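
For anyone reading the self-test log by hand, the bracketed triple after the failing LBA is SCSI sense data: sense key 0x3 is MEDIUM ERROR, and ASC/ASCQ 0x11/0x00 is UNRECOVERED READ ERROR, which is what makes this a media problem rather than a cabling one. Below is a small sketch that pulls the failed tests out of the log lines posted above. The lookup tables hold only the codes seen in this thread, and `failed_self_tests` is a hypothetical helper name.

```python
import re

# Self-test log rows copied from the smartctl output in this thread.
SAMPLE = """\
# 1  Background long   Failed in segment -->       -   12962         336190904 [0x3 0x11 0x0]
# 2  Background long   Failed in segment -->       -   12785         336190904 [0x3 0x11 0x0]
# 3  Background short  Completed                   -   12284                 - [-   -    -]
# 4  Background long   Aborted (device reset ?)    -    2310                 - [-   -    -]
"""

# Only the codes that appear in this report; the full tables are in the SCSI spec.
SENSE_KEYS = {0x3: "MEDIUM ERROR"}
ASC_ASCQ = {(0x11, 0x00): "UNRECOVERED READ ERROR"}

# num, test type, status, segment "-", lifetime hours, first-error LBA, [SK ASC ASQ]
LINE = re.compile(
    r"^#\s*(\d+)\s+Background (long|short)\s+(.+?)\s+-\s+(\d+)\s+(\d+|-)\s+"
    r"\[(\S+)\s+(\S+)\s+(\S+)\]"
)


def failed_self_tests(text, block_size=512):
    """Return one dict per self-test row that recorded a first-error LBA."""
    failures = []
    for line in text.splitlines():
        m = LINE.match(line)
        if not m or m.group(5) == "-":
            continue  # not a test row, or the test recorded no failing LBA
        sk, asc, ascq = (int(m.group(i), 16) for i in (6, 7, 8))
        lba = int(m.group(5))
        failures.append({
            "num": int(m.group(1)),
            "type": m.group(2),
            "hours": int(m.group(4)),
            "lba": lba,
            "byte_offset": lba * block_size,
            "sense": SENSE_KEYS.get(sk, hex(sk)),
            "detail": ASC_ASCQ.get((asc, ascq), f"{asc:#x}/{ascq:#x}"),
        })
    return failures


if __name__ == "__main__":
    for failure in failed_self_tests(SAMPLE):
        print(failure)
```

Both failed long tests stop at the same LBA, roughly 180 power-on hours apart, so the bad spot is repeatable rather than a one-off glitch.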

February 27

Author

3 hours ago, JorgeB said:
This disk is failing the long SMART test. This is a physical disk problem, and it should be replaced.
Note that bad power can cause disks to fail, so it's worth considering if many disks are failing, but these look like white-label disks, so they may be refurbished, and in my experience those disks fail a lot.

I appreciate the guidance. The original disks were refurbs, so I’m not too surprised, even though they ran a year without issue. I’d be more surprised if it was a PSU issue, as that wasn’t a cheap one. I’m replacing the drives with new (and warrantied lol) drives, so hopefully it's looking good from here. I’ll replace the PSU if one of the new drives dies on me. Thanks again!

Edited February 27 by cjkuhlenbeck
