mic.88 Posted October 30, 2023

Hi UNRAID Community,

I have an issue with different drives failing, but their SMART health reports don't show any errors. Attached is the diagnostics report, taken just after disk3 failed at around 19:00 and got disabled. Disk3 and disk4 also failed several days ago, but in my opinion their SMART values look good. I did some testing on my workbench PC and disk3 did not show any signs of failures or errors.

It might be a controller/cable issue (LSI SAS2308, FWVersion(20.00.07.00), ChipRevision(0x05), BiosVersion(07.39.02.00)), but the system has been running for more than 3 years without major issues.

I have now replaced disk3 with a new (tested) one, but I don't think this is the root cause here. Please help me analyse the diagnostics logs.

alpha-unraid-diagnostics-20231029-1938.zip
JorgeB Posted October 30, 2023

The issue is happening with multiple devices. First I would try disabling spin-down to see if it's related to that; if that doesn't help, power/cables would be the next suspects.
mic.88 (Author) Posted October 30, 2023

Should I disable spin-down altogether or just increase the timing? I'm using a FANTEC SRC-4240X07 with a SAS/SATA backplane. It should affect all disks and not just 1-2, right?
JorgeB Posted October 30, 2023

To see if it makes a difference you need to disable it completely.
mellow65 Posted October 30, 2023

I had some disk issues that I believe I narrowed down to spin-down. I had one disk that I would spin down because it held only my media library, so it wasn't used all that often. In a fit of frustration I also swapped out the SATA cables, since I was using those thin cables that are bound together. So somewhere between the new cables and turning off spin-down, the errors have gone away.
mic.88 (Author) Posted October 30, 2023

I really like the spin-down functionality to reduce power consumption...
JorgeB Posted October 30, 2023

First confirm whether that is the problem. If it is, using a different controller (or different disks) should allow you to use it again.
mic.88 (Author) Posted November 16, 2023

Spin-down seems to be the issue here. The array has now run for 2 weeks with spin-down set to never. Yesterday I started setting individual disks to spin down after 15 minutes. Almost all of my disks work with this; only my Exos X18 disks (Disk11, /dev/sdk) don't.

Nov 16 10:19:32 Alpha-Unraid kernel: mdcmd (74): set md_num_stripes 1280
Nov 16 10:19:32 Alpha-Unraid kernel: mdcmd (75): set md_queue_limit 80
Nov 16 10:19:32 Alpha-Unraid kernel: mdcmd (76): set md_sync_limit 5
Nov 16 10:19:32 Alpha-Unraid kernel: mdcmd (77): set md_write_method
Nov 16 10:19:32 Alpha-Unraid emhttpd: spinning down /dev/sdk
Nov 16 10:20:01 Alpha-Unraid flash_backup: adding task: /usr/local/emhttp/plugins/dynamix.my.servers/scripts/UpdateFlashBackup update
Nov 16 10:20:47 Alpha-Unraid kernel: sd 11:0:5:0: attempting task abort!scmd(0x000000000658e7e8), outstanding for 7050 ms & timeout 7000 ms
Nov 16 10:20:47 Alpha-Unraid kernel: sd 11:0:5:0: [sdk] tag#6626 CDB: opcode=0x85 85 06 20 00 d8 00 00 00 00 00 4f 00 c2 00 b0 00
Nov 16 10:20:47 Alpha-Unraid kernel: scsi target11:0:5: handle(0x000e), sas_address(0x5001517e3b0bf0aa), phy(10)
Nov 16 10:20:47 Alpha-Unraid kernel: scsi target11:0:5: enclosure logical id(0x5001e677b6dbbfff), slot(10)
Nov 16 10:20:47 Alpha-Unraid kernel: sd 11:0:5:0: device_block, handle(0x000e)
Nov 16 10:20:49 Alpha-Unraid kernel: sd 11:0:5:0: device_unblock and setting to running, handle(0x000e)
Nov 16 10:20:50 Alpha-Unraid kernel: sd 11:0:5:0: task abort: SUCCESS scmd(0x000000000658e7e8)
Nov 16 10:20:50 Alpha-Unraid kernel: sd 11:0:5:0: [sdk] Synchronizing SCSI cache
Nov 16 10:20:50 Alpha-Unraid kernel: sd 11:0:5:0: [sdk] Synchronize Cache(10) failed: Result: hostbyte=0x01 driverbyte=DRIVER_OK
Nov 16 10:20:50 Alpha-Unraid kernel: mpt2sas_cm0: mpt3sas_transport_port_remove: removed: sas_addr(0x5001517e3b0bf0aa)
Nov 16 10:20:50 Alpha-Unraid kernel: mpt2sas_cm0: removing handle(0x000e), sas_addr(0x5001517e3b0bf0aa)
Nov 16 10:20:50 Alpha-Unraid kernel: mpt2sas_cm0: enclosure logical id(0x5001e677b6dbbfff), slot(10)
Nov 16 10:20:50 Alpha-Unraid emhttpd: read SMART /dev/sdk
Nov 16 10:20:50 Alpha-Unraid kernel: mpt2sas_cm0: handle(0xe) sas_address(0x5001517e3b0bf0aa) port_type(0x1)
Nov 16 10:20:51 Alpha-Unraid kernel: scsi 11:0:15:0: Direct-Access ATA ST16000NM000J-2T SC02 PQ: 0 ANSI: 6
Nov 16 10:20:51 Alpha-Unraid kernel: scsi 11:0:15:0: SATA: handle(0x000e), sas_addr(0x5001517e3b0bf0aa), phy(10), device_name(0x0000000000000000)
Nov 16 10:20:51 Alpha-Unraid kernel: scsi 11:0:15:0: enclosure logical id (0x5001e677b6dbbfff), slot(10)
Nov 16 10:20:51 Alpha-Unraid kernel: scsi 11:0:15:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Nov 16 10:20:51 Alpha-Unraid kernel: scsi 11:0:15:0: qdepth(32), tagged(1), scsi_level(7), cmd_que(1)
Nov 16 10:20:51 Alpha-Unraid kernel: sd 11:0:15:0: Attached scsi generic sg10 type 0
Nov 16 10:20:51 Alpha-Unraid kernel: sd 11:0:15:0: Power-on or device reset occurred
Nov 16 10:20:51 Alpha-Unraid kernel: end_device-11:0:11: add: handle(0x000e), sas_addr(0x5001517e3b0bf0aa)
Nov 16 10:20:51 Alpha-Unraid kernel: sd 11:0:15:0: [sdt] 31251759104 512-byte logical blocks: (16.0 TB/14.6 TiB)
Nov 16 10:20:51 Alpha-Unraid kernel: sd 11:0:15:0: [sdt] 4096-byte physical blocks
Nov 16 10:20:51 Alpha-Unraid kernel: sd 11:0:15:0: [sdt] Write Protect is off
Nov 16 10:20:51 Alpha-Unraid kernel: sd 11:0:15:0: [sdt] Mode Sense: 7f 00 10 08
Nov 16 10:20:51 Alpha-Unraid kernel: sd 11:0:15:0: [sdt] Write cache: enabled, read cache: enabled, supports DPO and FUA
Nov 16 10:20:51 Alpha-Unraid kernel: sdt: sdt1
Nov 16 10:20:51 Alpha-Unraid kernel: sd 11:0:15:0: [sdt] Attached SCSI disk
Nov 16 10:20:52 Alpha-Unraid unassigned.devices: Disk with ID 'ST16000NM000J-2TW103_ZR5AGC4V ()' is not set to auto mount.
Nov 16 10:20:53 Alpha-Unraid emhttpd: error: hotplug_devices, 1706: No such file or directory (2): tagged device ST16000NM000J-2TW103_ZR5AGC4V was (sdk) is now (sdt)
Nov 16 10:20:53 Alpha-Unraid emhttpd: read SMART /dev/sdt
Nov 16 10:20:53 Alpha-Unraid kernel: emhttpd[10420]: segfault at 67c ip 000056528418775f sp 00007ffd34b8b620 error 4 in emhttpd[565284172000+24000] likely on CPU 8 (core 10, socket 0)
Nov 16 10:20:53 Alpha-Unraid kernel: Code: c4 36 01 00 48 89 45 f8 48 8d 05 f9 23 01 00 48 89 45 f0 e9 79 01 00 00 8b 45 ec 89 c7 e8 1a 88 ff ff 48 89 45 d8 48 8b 45 d8 <8b> 80 7c 06 00 00 85 c0 0f 94 c0 0f b6 c0 89 45 d4 48 8b 45 e0 48
Nov 16 10:21:01 Alpha-Unraid flash_backup: adding task: /usr/local/emhttp/plugins/dynamix.my.servers/scripts/UpdateFlashBackup update
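The command that timed out on sdk in the Nov 16 log is printed as `CDB: opcode=0x85 ...`. Opcode 0x85 is the SAT ATA PASS-THROUGH (16) command, so decoding its 16 bytes shows which ATA command was in flight when the abort happened. Below is a minimal Python sketch, assuming the standard SAT byte layout (features in byte 4, LBA mid/high in bytes 10 and 12, ATA command in byte 14); the SMART subcommand table is deliberately partial, and `decode_ata_pt16` is a hypothetical helper name.

```python
# Sketch: decode an ATA PASS-THROUGH (16) CDB as printed by the kernel.
# Assumes the SAT byte layout described above; tables are partial.
ATA_PT16_OPCODE = 0x85

# SAT "protocol" field values (subset)
PROTOCOLS = {3: "Non-data", 4: "PIO Data-In", 5: "PIO Data-Out", 6: "DMA"}

# SMART (ATA command 0xB0) subcommands selected by the features register (subset)
SMART_SUBCOMMANDS = {
    0xD0: "READ DATA",
    0xD5: "READ LOG",
    0xD8: "ENABLE OPERATIONS",
    0xD9: "DISABLE OPERATIONS",
    0xDA: "RETURN STATUS",
}


def decode_ata_pt16(hex_bytes: str) -> dict:
    """Decode a space-separated hex CDB such as '85 06 20 00 d8 ...'."""
    cdb = bytes.fromhex(hex_bytes)  # fromhex ignores the spaces
    if len(cdb) != 16 or cdb[0] != ATA_PT16_OPCODE:
        raise ValueError("not an ATA PASS-THROUGH (16) CDB")
    protocol = (cdb[1] >> 1) & 0x0F  # bits 4..1 of byte 1
    features = cdb[4]                # features (7:0)
    command = cdb[14]                # ATA command register
    out = {
        "protocol": PROTOCOLS.get(protocol, protocol),
        "command": hex(command),
        "features": hex(features),
        "check_condition": bool(cdb[2] & 0x20),  # CK_COND bit
    }
    if command == 0xB0:  # SMART
        out["smart_subcommand"] = SMART_SUBCOMMANDS.get(features, "unknown")
        # SMART commands carry the 0x4F/0xC2 signature in LBA mid/high
        out["smart_signature_ok"] = (cdb[10], cdb[12]) == (0x4F, 0xC2)
    return out


if __name__ == "__main__":
    print(decode_ata_pt16("85 06 20 00 d8 00 00 00 00 00 4f 00 c2 00 b0 00"))
```

If the layout assumed here is right, the logged bytes decode to ATA command 0xb0 (SMART) with features 0xd8 (SMART ENABLE OPERATIONS), i.e. a SMART request rather than a data read or write was pending when the drive stopped answering after spin-down.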
JorgeB Posted November 16, 2023

There are some known spin-down issues with LSI controllers and Seagate disks. This is for a different disk model, but see if it helps:
mic.88 (Author) Posted November 16, 2023

Thanks, I'll try this as soon as the parity rebuild is done. I have two IronWolf disks in the array that don't have the issue. When this issue first came up, the failing disks were Western Digital disks; I replaced the WD disks with new Exos X18s. I'll update the post as soon as it's done and tested.
mic.88 (Author) Posted November 18, 2023

The EPC and lowCurrentSpinup settings did not solve the issue.

I noticed this last time too but didn't mention it here: when a disk goes into error mode, the web GUI becomes more or less read-only. The Stop Array button does not work and no settings can be changed anymore. The containers and VMs are still up and I can access shares. I have to push the power button to shut down and restart the array. Very strange.

alpha-unraid-diagnostics-20231118-1739.zip
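To check whether each drop still follows a spin-down after the EPC/lowCurrentSpinup change, one could pair every `emhttpd: spinning down /dev/sdX` line with a later emhttpd `tagged device ... was (sdX) is now (sdY)` rename in the syslog. Below is a minimal Python sketch, assuming the lines look like the Nov 16 excerpt; because syslog timestamps carry no year, the year is a caller-supplied assumption, and the 5-minute window and the name `drops_after_spindown` are hypothetical choices.

```python
# Sketch: find disks that were re-enumerated (renamed) shortly after a spin-down.
# Assumes syslog lines formatted like the Nov 16 excerpt in this thread.
import re
import sys
from datetime import datetime, timedelta

# Leading syslog timestamp, e.g. "Nov 16 10:19:32" (no year in syslog)
TS_RE = re.compile(r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) ")
SPINDOWN_RE = re.compile(r"emhttpd: spinning down /dev/(sd[a-z]+)")
RENAME_RE = re.compile(r"tagged device (\S+) was \((sd[a-z]+)\) is now \((sd[a-z]+)\)")


def parse_ts(line, year):
    """Return a datetime for the line's timestamp, or None if there is none."""
    m = TS_RE.match(line)
    if not m:
        return None
    return datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")


def drops_after_spindown(lines, window=timedelta(minutes=5), year=2023):
    """Return (disk_id, old_dev, new_dev, seconds) for renames within `window` of a spin-down."""
    last_spindown = {}  # device name -> time of its most recent spin-down
    hits = []
    for line in lines:
        ts = parse_ts(line, year)
        if ts is None:
            continue
        m = SPINDOWN_RE.search(line)
        if m:
            last_spindown[m.group(1)] = ts
            continue
        m = RENAME_RE.search(line)
        if m:
            disk_id, old_dev, new_dev = m.groups()
            t0 = last_spindown.get(old_dev)
            if t0 is not None and ts - t0 <= window:
                hits.append((disk_id, old_dev, new_dev, (ts - t0).total_seconds()))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python spindown_drops.py <syslog file>  (file location is up to you)
    with open(sys.argv[1], errors="replace") as fh:
        for disk_id, old, new, secs in drops_after_spindown(fh):
            print(f"{disk_id}: {old} -> {new}, {secs:.0f} s after spin-down")
```

On the two relevant lines from the Nov 16 excerpt this would report the ST16000NM000J-2TW103_ZR5AGC4V disk going from sdk to sdt 81 seconds after its spin-down. Keying on the rename message rather than the kernel abort lines is a design choice: the rename line names both the old and new device, which the abort lines do not.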
itimpi Posted November 18, 2023

The diagnostics show that disk12 dropped offline and then later reconnected. How is it connected? Have you checked its power and SATA cabling?
mic.88 (Author) Posted November 18, 2023

I'm using a FANTEC SRC-4240X07; it has a 24-port SAS/SATA backplane. I really doubt this is a "cable" issue. At first I had the issue with disks 3 and 4, and now I'm reproducing it with disk11 because there is no data on it. I don't know why disk12 disconnected.
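Whether this is a backplane/cabling problem or a per-disk one partly comes down to whether the drops cluster on one port. One way to check is to count the mpt2sas/mpt3sas remove and re-add events per SAS address in the syslog and map each address to the phy number the driver prints. Below is a minimal Python sketch, assuming the message formats from the Nov 16 excerpt (other driver versions may word them differently); `flapping_ports` is a hypothetical helper name.

```python
# Sketch: count SAS port removals/re-adds per SAS address and map them to phys.
# Assumes mpt2sas/mpt3sas messages formatted like the Nov 16 excerpt.
import re
from collections import Counter

REMOVED_RE = re.compile(r"mpt3sas_transport_port_remove: removed: sas_addr\((0x[0-9a-f]+)\)")
ADDED_RE = re.compile(r"end_device-\S+: add: handle\(0x[0-9a-f]+\), sas_addr\((0x[0-9a-f]+)\)")
# Matches both "sas_addr(...), phy(N)" and "sas_address(...), phy(N)"
PHY_RE = re.compile(r"sas_addr(?:ess)?\((0x[0-9a-f]+)\), phy\((\d+)\)")


def flapping_ports(lines):
    """Return {sas_addr: {"phy": int|None, "removed": n, "re_added": m}} for removed ports."""
    removed, added, phy_of = Counter(), Counter(), {}
    for line in lines:
        if m := REMOVED_RE.search(line):
            removed[m.group(1)] += 1
        elif m := ADDED_RE.search(line):
            added[m.group(1)] += 1
        if m := PHY_RE.search(line):
            phy_of[m.group(1)] = int(m.group(2))
    return {
        addr: {"phy": phy_of.get(addr), "removed": n, "re_added": added[addr]}
        for addr, n in removed.items()
    }


if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as fh:
            for addr, info in flapping_ports(fh).items():
                print(addr, info)
```

Run over a full syslog, repeated removals that always land on the same phy would point more toward that bay or its lane, while removals spread across several phys and only on one disk model would fit the spin-down explanation better. This is a heuristic, not a diagnosis.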
JorgeB Posted November 19, 2023

If it works without spinning down the disks, I suggest doing that for now, or trying a different controller.