Disk failed repeatedly

October 30, 2023

Hi UNRAID Community,

I have an issue with different drives failing, but the SMART health reports don't show any errors.

Attached is the diagnostics report taken just after disk3 failed at around 19:00 and got disabled.

Disk3 and disk4 also failed several days ago, but the SMART values look good in my opinion. I did some testing on my workbench PC and disk3 did not show any signs of failures or errors.

It might be a controller/cable issue (LSI SAS2308, FWVersion(20.00.07.00), ChipRevision(0x05), BiosVersion(07.39.02.00)), but the system has been running for more than 3 years now without major issues.

I have now replaced disk3 with a new (tested) one, but I don't think the disk is the root cause here.

Please help me analyse the diagnostics logs.

alpha-unraid-diagnostics-20231029-1938.zip

October 30, 2023

Community Expert

The issue is happening with multiple devices. First I would try disabling spin down to see if it's related to that; if that doesn't help, power/cables would be the next suspects.

October 30, 2023

Author

Should I disable spin-down altogether or just increase the timing?

I'm using a FANTEC SRC-4240X07 with a SAS/SATA backplane. It should affect all disks and not just 1-2, right?

October 30, 2023

Community Expert

To see if it makes a difference you need to disable it completely.

October 30, 2023

I had some disk issues that I believe I narrowed down to spin down. The disk I was spinning down held only my media library, so it wasn't used all that often. In a fit of frustration I also swapped out the SATA cables, as I was using those thin cables that are bound together. So somewhere between replacing the cables and turning off spin down, the errors went away.

October 30, 2023

Author

I really like the spin down functionality to reduce power consumption...

October 30, 2023

Community Expert

First confirm whether that is the problem. If it is, using a different controller (or different disks) should allow you to use spin down again.

November 16, 2023

Author

Spin-down seems to be the issue here.

The array has now been running for 2 weeks with spin-down set to never.

Yesterday I started setting individual disks to spin down after 15 minutes.

Almost all of my disks work with this.

Only my Exos X18 disks (e.g. Disk11, /dev/sdk) don't.

Nov 16 10:19:32 Alpha-Unraid kernel: mdcmd (74): set md_num_stripes 1280
Nov 16 10:19:32 Alpha-Unraid kernel: mdcmd (75): set md_queue_limit 80
Nov 16 10:19:32 Alpha-Unraid kernel: mdcmd (76): set md_sync_limit 5
Nov 16 10:19:32 Alpha-Unraid kernel: mdcmd (77): set md_write_method
Nov 16 10:19:32 Alpha-Unraid emhttpd: spinning down /dev/sdk
Nov 16 10:20:01 Alpha-Unraid flash_backup: adding task: /usr/local/emhttp/plugins/dynamix.my.servers/scripts/UpdateFlashBackup update
Nov 16 10:20:47 Alpha-Unraid kernel: sd 11:0:5:0: attempting task abort!scmd(0x000000000658e7e8), outstanding for 7050 ms & timeout 7000 ms
Nov 16 10:20:47 Alpha-Unraid kernel: sd 11:0:5:0: [sdk] tag#6626 CDB: opcode=0x85 85 06 20 00 d8 00 00 00 00 00 4f 00 c2 00 b0 00
Nov 16 10:20:47 Alpha-Unraid kernel: scsi target11:0:5: handle(0x000e), sas_address(0x5001517e3b0bf0aa), phy(10)
Nov 16 10:20:47 Alpha-Unraid kernel: scsi target11:0:5: enclosure logical id(0x5001e677b6dbbfff), slot(10)
Nov 16 10:20:47 Alpha-Unraid kernel: sd 11:0:5:0: device_block, handle(0x000e)
Nov 16 10:20:49 Alpha-Unraid kernel: sd 11:0:5:0: device_unblock and setting to running, handle(0x000e)
Nov 16 10:20:50 Alpha-Unraid kernel: sd 11:0:5:0: task abort: SUCCESS scmd(0x000000000658e7e8)
Nov 16 10:20:50 Alpha-Unraid kernel: sd 11:0:5:0: [sdk] Synchronizing SCSI cache
Nov 16 10:20:50 Alpha-Unraid kernel: sd 11:0:5:0: [sdk] Synchronize Cache(10) failed: Result: hostbyte=0x01 driverbyte=DRIVER_OK
Nov 16 10:20:50 Alpha-Unraid kernel: mpt2sas_cm0: mpt3sas_transport_port_remove: removed: sas_addr(0x5001517e3b0bf0aa)
Nov 16 10:20:50 Alpha-Unraid kernel: mpt2sas_cm0: removing handle(0x000e), sas_addr(0x5001517e3b0bf0aa)
Nov 16 10:20:50 Alpha-Unraid kernel: mpt2sas_cm0: enclosure logical id(0x5001e677b6dbbfff), slot(10)
Nov 16 10:20:50 Alpha-Unraid emhttpd: read SMART /dev/sdk
Nov 16 10:20:50 Alpha-Unraid kernel: mpt2sas_cm0: handle(0xe) sas_address(0x5001517e3b0bf0aa) port_type(0x1)
Nov 16 10:20:51 Alpha-Unraid kernel: scsi 11:0:15:0: Direct-Access ATA ST16000NM000J-2T SC02 PQ: 0 ANSI: 6
Nov 16 10:20:51 Alpha-Unraid kernel: scsi 11:0:15:0: SATA: handle(0x000e), sas_addr(0x5001517e3b0bf0aa), phy(10), device_name(0x0000000000000000)
Nov 16 10:20:51 Alpha-Unraid kernel: scsi 11:0:15:0: enclosure logical id (0x5001e677b6dbbfff), slot(10)
Nov 16 10:20:51 Alpha-Unraid kernel: scsi 11:0:15:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Nov 16 10:20:51 Alpha-Unraid kernel: scsi 11:0:15:0: qdepth(32), tagged(1), scsi_level(7), cmd_que(1)
Nov 16 10:20:51 Alpha-Unraid kernel: sd 11:0:15:0: Attached scsi generic sg10 type 0
Nov 16 10:20:51 Alpha-Unraid kernel: sd 11:0:15:0: Power-on or device reset occurred
Nov 16 10:20:51 Alpha-Unraid kernel: end_device-11:0:11: add: handle(0x000e), sas_addr(0x5001517e3b0bf0aa)
Nov 16 10:20:51 Alpha-Unraid kernel: sd 11:0:15:0: [sdt] 31251759104 512-byte logical blocks: (16.0 TB/14.6 TiB)
Nov 16 10:20:51 Alpha-Unraid kernel: sd 11:0:15:0: [sdt] 4096-byte physical blocks
Nov 16 10:20:51 Alpha-Unraid kernel: sd 11:0:15:0: [sdt] Write Protect is off
Nov 16 10:20:51 Alpha-Unraid kernel: sd 11:0:15:0: [sdt] Mode Sense: 7f 00 10 08
Nov 16 10:20:51 Alpha-Unraid kernel: sd 11:0:15:0: [sdt] Write cache: enabled, read cache: enabled, supports DPO and FUA
Nov 16 10:20:51 Alpha-Unraid kernel: sdt: sdt1
Nov 16 10:20:51 Alpha-Unraid kernel: sd 11:0:15:0: [sdt] Attached SCSI disk
Nov 16 10:20:52 Alpha-Unraid unassigned.devices: Disk with ID 'ST16000NM000J-2TW103_ZR5AGC4V ()' is not set to auto mount.
Nov 16 10:20:53 Alpha-Unraid emhttpd: error: hotplug_devices, 1706: No such file or directory (2): tagged device ST16000NM000J-2TW103_ZR5AGC4V was (sdk) is now (sdt)
Nov 16 10:20:53 Alpha-Unraid emhttpd: read SMART /dev/sdt
Nov 16 10:20:53 Alpha-Unraid kernel: emhttpd[10420]: segfault at 67c ip 000056528418775f sp 00007ffd34b8b620 error 4 in emhttpd[565284172000+24000] likely on CPU 8 (core 10, socket 0)
Nov 16 10:20:53 Alpha-Unraid kernel: Code: c4 36 01 00 48 89 45 f8 48 8d 05 f9 23 01 00 48 89 45 f0 e9 79 01 00 00 8b 45 ec 89 c7 e8 1a 88 ff ff 48 89 45 d8 48 8b 45 d8 <8b> 80 7c 06 00 00 85 c0 0f 94 c0 0f b6 c0 89 45 d4 48 8b 45 e0 48
Nov 16 10:21:01 Alpha-Unraid flash_backup: adding task: /usr/local/emhttp/plugins/dynamix.my.servers/scripts/UpdateFlashBackup update
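
For anyone who wants to check their own syslog for the same pattern (a spin-down request followed shortly by a task abort and the disk coming back under a new device name), here is a minimal Python sketch. It is not an Unraid tool. The function name, the 120-second window and the year default are my own assumptions, and the regular expressions are written only against the message formats visible in the excerpt above, so other kernel or emhttpd versions may word these lines differently.

```python
import re
from datetime import datetime

# Patterns taken from the message formats in the log excerpt above.
SPIN_DOWN = re.compile(r"emhttpd: spinning down /dev/(\w+)")
ABORT = re.compile(r"kernel: sd [\d:]+: attempting task abort")
RENAMED = re.compile(r"was \((\w+)\) is now \((\w+)\)")


def parse_time(line, year=2023):
    # Classic syslog timestamps carry no year, so it must be supplied (assumption).
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def find_spin_down_faults(lines, window_s=120, year=2023):
    """Return spin-downs that were followed by a task abort within window_s seconds."""
    faults = []
    last_spin = None  # (device, time) of the most recent spin-down request
    for line in lines:
        m = SPIN_DOWN.search(line)
        if m:
            last_spin = (m.group(1), parse_time(line, year))
            continue
        if ABORT.search(line) and last_spin:
            dev, t0 = last_spin
            delay = (parse_time(line, year) - t0).total_seconds()
            if 0 <= delay <= window_s:
                faults.append({"device": dev, "delay_s": delay, "renamed_to": None})
            last_spin = None
            continue
        m = RENAMED.search(line)
        # Attach the new device name if the disk re-enumerated after the fault.
        if m and faults and faults[-1]["device"] == m.group(1):
            faults[-1]["renamed_to"] = m.group(2)
    return faults


# Three lines copied from the excerpt above, used as a small self-check.
SAMPLE = [
    "Nov 16 10:19:32 Alpha-Unraid emhttpd: spinning down /dev/sdk",
    "Nov 16 10:20:47 Alpha-Unraid kernel: sd 11:0:5:0: attempting task abort!"
    "scmd(0x000000000658e7e8), outstanding for 7050 ms & timeout 7000 ms",
    "Nov 16 10:20:53 Alpha-Unraid emhttpd: error: hotplug_devices, 1706: "
    "No such file or directory (2): tagged device "
    "ST16000NM000J-2TW103_ZR5AGC4V was (sdk) is now (sdt)",
]

for fault in find_spin_down_faults(SAMPLE):
    print(fault)
```

The abort is attributed to whichever disk was spun down most recently, which is a simplification: on a system where several disks spin down at nearly the same time, the SCSI address in the abort line would have to be mapped to the device name instead.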

November 16, 2023

Community Expert

There are some known spin down issues with LSI controllers and Seagate disks. This is for a different disk model, but see if it helps:

November 16, 2023

Author

Thanks, I'll try this as soon as the parity rebuild is done.

I have two IronWolf disks in the array that don't have the issue.

When this issue first came up, the failing disks were Western Digital disks.

I replaced the WD disks with new Exos X18 disks.

I'll update the post as soon as it's done and tested.

November 18, 2023

Author

The EPC and lowCurrentSpinup settings did not solve the issue.

I noticed this last time too but did not mention it here: when a disk goes into the error state, the web GUI becomes more or less read-only. The Stop Array button does not work and no settings can be changed anymore. The containers and VMs are still up and I can access shares. I have to push the power button to shut down and then restart the array.

Very strange.

alpha-unraid-diagnostics-20231118-1739.zip

November 18, 2023

Community Expert

The diagnostics show that disk12 dropped offline, and then later reconnected. How is this connected? Have you checked its power and SATA cabling?

November 18, 2023

Author

I'm using a FANTEC SRC-4240X07, which has a 24-port SAS/SATA backplane. I really doubt this is a "cable" issue. At first I had the issue with disk3 and disk4, and now I'm reproducing it with disk11 because there is no data on it. I don't know why disk12 disconnected.

November 19, 2023

Community Expert

If it works without spinning down the disks, I suggest doing that for now, or try a different controller.
