Disk Errors

March 5

I've got several drives in my array, six of which are the same model from the same seller, and I'm seeing a lot of errors:

Mar  2 11:42:21 Innsmouth kernel: md: disk8 write error, sector=2000
Mar  2 11:42:21 Innsmouth kernel: md: disk8 write error, sector=2008
Mar  2 11:42:21 Innsmouth kernel: md: disk8 write error, sector=2016
Mar  2 11:42:21 Innsmouth kernel: md: disk8 write error, sector=2024
Mar  2 11:42:21 Innsmouth kernel: md: disk8 write error, sector=2032
Mar  2 11:45:29 Innsmouth kernel: Buffer I/O error on dev md8p1, logical block 0, async page read
Mar  2 11:46:02 Innsmouth kernel: Buffer I/O error on dev md8p1, logical block 0, async page read
Mar  2 11:47:02 Innsmouth kernel: Buffer I/O error on dev md8p1, logical block 0, async page read
Mar  2 11:48:02 Innsmouth kernel: Buffer I/O error on dev md8p1, logical block 0, async page read
Mar  2 11:49:02 Innsmouth kernel: Buffer I/O error on dev md8p1, logical block 0, async page read
Mar  2 11:49:57 Innsmouth kernel: Buffer I/O error on dev md8p1, logical block 0, async page read
Mar  2 11:50:00 Innsmouth kernel: Buffer I/O error on dev md8p1, logical block 0, async page read
Mar  2 11:50:11 Innsmouth kernel: critical target error, dev sdh, sector 1856 op 0x1:(WRITE) flags 0x4000 phys_seg 64 prio class 0
Mar  2 11:50:11 Innsmouth kernel: md: disk8 write error, sector=1792
Mar  2 11:50:11 Innsmouth kernel: md: disk8 write error, sector=1800
Mar  2 11:50:11 Innsmouth kernel: md: disk8 write error, sector=1808
Mar  2 11:50:11 Innsmouth kernel: md: disk8 write error, sector=1816
Mar  2 11:50:11 Innsmouth kernel: md: disk8 write error, sector=1824
Mar  2 11:50:11 Innsmouth kernel: md: disk8 write error, sector=1832
Mar  2 11:50:11 Innsmouth kernel: md: disk8 write error, sector=1840

Here is the SMART report for one of the drives:

smartctl 7.5 2025-04-30 r5714 [x86_64-linux-6.12.54-Unraid] (local build)
Copyright (C) 2002-25, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               IBM-XIV
Product:              ST6000NM0054  D5
Revision:             EC6D
Compliance:           SPC-4
User Capacity:        6,001,175,122,432 bytes [6.00 TB]
Logical block size:   512 bytes
Physical block size:  4096 bytes
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000c500845ddc87
Serial number:        Z4D39C310000R608MHK6
Device type:          disk
Transport protocol:   SAS (SPL-4)
Local Time is:        Tue Mar  3 01:29:25 2026 PST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled
Read Cache is:        Enabled
Writeback Cache is:   Disabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Grown defects during certification = 0
Total blocks reassigned during format = 0
Total new blocks reassigned = 0
Power on minutes since format = 1690834
Current Drive Temperature:     29 C
Drive Trip Temperature:        65 C

Elements in grown defect list: 0

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   1949493561        0         0  1949493561          0     850636.856           0
write:         0        0         2         2          2     286897.910           0
verify: 178262958        0         0  178262958          0       8444.860           0

Non-medium error count:      523

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Completed                   -   64410                 - [-   -    -]
# 2  Background short  Completed                   -   64368                 - [-   -    -]
# 3  Background long   Completed                   -   64285                 - [-   -    -]
# 4  Background short  Completed                   -   64257                 - [-   -    -]
# 5  Background short  Completed                   -   64246                 - [-   -    -]
# 6  Background short  Completed                   -   36637                 - [-   -    -]
# 7  Background short  Completed                   -   36565                 - [-   -    -]
# 8  Background short  Completed                   -   36469                 - [-   -    -]
# 9  Background short  Completed                   -   36416                 - [-   -    -]
#10  Background long   Completed                   -      81                 - [-   -    -]
#11  Background short  Completed                   -      67                 - [-   -    -]
#12  Background short  Aborted (by user command)   -      45                 - [-   -    -]

Long (extended) Self-test duration: 38632 seconds [10.7 hours]

Background scan results log
  Status: waiting until BMS interval timer expires
    Accumulated power on time, hours:minutes 64524:51 [3871491 minutes]
    Number of background scans performed: 329,  scan progress: 0.00%
    Number of background medium scans performed: 329
Device does not support General statistics and performance logging

Protocol Specific port log page for SAS SSP
relative target port id = 1
  generation code = 0
  number of phys = 1
  phy identifier = 0
    attached device type: expander device
    attached reason: SMP phy control function
    reason: loss of dword synchronization
    negotiated logical link rate: phy enabled; 6 Gbps
    attached initiator port: ssp=0 stp=0 smp=0
    attached target port: ssp=0 stp=0 smp=1
    SAS address = 0x5000c500845ddc85
    attached SAS address = 0x500056b39f5e9dff
    attached phy identifier = 6
    Invalid DWORD count = 28
    Running disparity error count = 28
    Loss of DWORD synchronization count = 14
    Phy reset problem count = 10
relative target port id = 2
  generation code = 0
  number of phys = 1
  phy identifier = 1
    attached device type: no device attached
    attached reason: unknown
    reason: unknown
    negotiated logical link rate: phy enabled; unknown
    attached initiator port: ssp=0 stp=0 smp=0
    attached target port: ssp=0 stp=0 smp=0
    SAS address = 0x5000c500845ddc86
    attached SAS address = 0x0
    attached phy identifier = 0
    Invalid DWORD count = 0
    Running disparity error count = 0
    Loss of DWORD synchronization count = 0
    Phy reset problem count = 0

I don't believe I see anything out of place in the SMART report.
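For comparing the report across all six SAS drives, it can help to pull out the same few counters from each one. The following is a minimal Python sketch, assuming the report has been saved as plain text (for example from `smartctl -x`); the function name `link_counters` and the result layout are made up for illustration. It only extracts the "Non-medium error count" and the per-phy counters printed in the SAS SSP port log section and does not judge whether the values are a problem.

```python
import re

# Counter names exactly as they appear in the SAS SSP port log section.
COUNTERS = (
    "Invalid DWORD count",
    "Running disparity error count",
    "Loss of DWORD synchronization count",
    "Phy reset problem count",
)


def link_counters(report):
    """Extract non-medium errors and per-phy counters from smartctl text."""
    result = {"non_medium_errors": None, "phys": {}}
    phy = None
    for raw in report.splitlines():
        line = raw.strip()
        m = re.match(r"Non-medium error count:\s+(\d+)", line)
        if m:
            result["non_medium_errors"] = int(m.group(1))
            continue
        # Anchored at line start, so "attached phy identifier" is not matched.
        m = re.match(r"phy identifier = (\d+)", line)
        if m:
            phy = int(m.group(1))
            result["phys"][phy] = {}
            continue
        m = re.match(r"(.+?) = (\d+)$", line)
        if m and phy is not None and m.group(1) in COUNTERS:
            result["phys"][phy][m.group(1)] = int(m.group(2))
    return result


# Excerpt copied from the report above.
sample_report = """
Non-medium error count:      523
  phy identifier = 0
    attached phy identifier = 6
    Invalid DWORD count = 28
    Running disparity error count = 28
    Loss of DWORD synchronization count = 14
    Phy reset problem count = 10
"""

print(link_counters(sample_report))
```

Running this against a saved report from each drive and comparing the results over a few days would show whether any of the counters are still increasing.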

These are all 6 TB SAS drives; the rest are SATA of varying sizes. I have them installed in an R730XD.

One drive has been kicked from the array twice, and two of them were kicked a couple days ago.

There are no errors in the lifecycle logs on the iDRAC and all drives are showing as good.

All of the SAS drives have given errors, but only three have been kicked out of the array (one of them twice now). Swapping a SATA drive into the slot of one of the failed SAS drives gives no errors, so it does not appear to be specific to a slot or cable, unless there is an issue that a SAS drive would hit but a SATA drive wouldn't.

Can you help shed a little light on what may be happening here? I get the feeling the seller is going to push back a bit on replacing 6 drives.

March 5

Community Expert

Attach Diagnostics ZIP to your NEXT post in this thread.

March 5

Author

Diagnostics attached.

innsmouth-diagnostics-20260305-1340.zip

March 5

Community Expert

Errors on multiple disks at the same time are typically a power/connection issue:

Mar 5 00:36:09 Innsmouth kernel: md: disk3 read error, sector=11669581064
Mar 5 00:36:09 Innsmouth kernel: md: disk1 read error, sector=11669580824
Mar 5 00:36:09 Innsmouth kernel: md: disk4 read error, sector=11669580824

Do they share anything in common, like a power splitter or miniSAS cable?
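To see how often this pattern occurs across the whole syslog, the md error lines can be grouped by timestamp. A minimal Python sketch, assuming the log lines follow the same format as the ones quoted above; the function name `simultaneous_errors` is made up for illustration, and it only reports seconds in which two or more disks logged an md error.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   Mar 5 00:36:09 Innsmouth kernel: md: disk3 read error, sector=11669581064
MD_ERROR = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"md:\s+(?P<disk>disk\d+)\s+(?P<op>read|write)\s+error,"
    r"\s+sector=(?P<sector>\d+)"
)


def simultaneous_errors(lines):
    """Return {timestamp: sorted disk names} for seconds where 2+ disks erred."""
    by_stamp = defaultdict(set)
    for line in lines:
        m = MD_ERROR.match(line.strip())
        if m:
            # Collapse "Mar  5" and "Mar 5" into one spelling before grouping.
            stamp = " ".join(m.group("stamp").split())
            by_stamp[stamp].add(m.group("disk"))
    return {s: sorted(d) for s, d in by_stamp.items() if len(d) > 1}


# Lines copied from this thread.
sample = [
    "Mar 5 00:36:09 Innsmouth kernel: md: disk3 read error, sector=11669581064",
    "Mar 5 00:36:09 Innsmouth kernel: md: disk1 read error, sector=11669580824",
    "Mar 5 00:36:09 Innsmouth kernel: md: disk4 read error, sector=11669580824",
    "Mar  2 11:42:21 Innsmouth kernel: md: disk8 write error, sector=2000",
]

for stamp, disks in simultaneous_errors(sample).items():
    print(stamp, disks)
```

Seconds where only a single disk logged errors (like the disk8 line in the sample) are left out, so whatever remains is the set of events worth checking against shared cables, backplane ports, or power.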

March 5

Author

I have 12 slots in the front, and I believe there are two power cables and two SAS cables from the backplane to the mainboard. Initially, all the SAS drives were in slots 0-5, which would have shared a connection, but I moved one to slot 6 and still had issues.

If it was an issue with a cable, shouldn't the SATA drives throw errors too?

March 6

Community Expert

Could be a backplane issue, or just a bad PSU as well.
