jpimlott Posted September 10, 2021 System was up and running fine. One drive (drive 2) failed with its drive light stuck on. Within a minute or so parity 1 dropped out. I removed the bad drive, reset the parity, and started rebuilding the array. It got to about 10% and drive 3 dropped out. I think it is still good, but now I can't rebuild as I have 3 disks missing. I think drive 3 is good; I would like to mark it good and start again. The array is in that state right now and I have done nothing to it. tower-diagnostics-20210910-0044.zip
JorgeB Posted September 10, 2021 There are issues with multiple disks on multiple controllers:
Sep 9 23:07:49 Tower kernel: ata4: hard resetting link
Sep 9 23:07:55 Tower kernel: ata4: link is slow to respond, please be patient (ready=0)
Sep 9 23:07:59 Tower kernel: ata4: COMRESET failed (errno=-16)
Sep 9 23:07:59 Tower kernel: ata4: hard resetting link
Sep 9 23:08:01 Tower kernel: ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Sep 9 23:08:01 Tower kernel: ata4.00: configured for UDMA/33
Sep 9 23:08:42 Tower kernel: ata8: failed to read log page 10h (errno=-5)
Sep 9 23:08:42 Tower kernel: ata8.00: exception Emask 0x1 SAct 0x1000 SErr 0x0 action 0x6
Sep 9 23:08:42 Tower kernel: ata8.00: failed command: READ FPDMA QUEUED
Sep 9 23:08:42 Tower kernel: ata8.00: cmd 60/00:00:c8:6a:14/01:00:96:00:00/40 tag 12 ncq dma 131072 in
Sep 9 23:08:42 Tower kernel: res 01/04:60:c8:6a:14/00:00:96:00:00/40 Emask 0x3 (HSM violation)
Sep 9 23:08:42 Tower kernel: ata8.00: status: { ERR }
Sep 9 23:08:42 Tower kernel: ata8.00: error: { ABRT }
Sep 9 23:10:34 Tower kernel: ata13.00: exception Emask 0x1 SAct 0x1000 SErr 0x0 action 0x6
Sep 9 23:10:34 Tower kernel: ata13.00: failed command: WRITE FPDMA QUEUED
Sep 9 23:10:34 Tower kernel: ata13.00: cmd 61/00:00:f8:39:97/01:00:96:00:00/40 tag 12 ncq dma 131072 out
Sep 9 23:10:34 Tower kernel: res 01/04:58:f8:38:97/00:00:96:00:00/40 Emask 0x3 (HSM violation)
Sep 9 23:10:34 Tower kernel: ata13.00: status: { ERR }
Sep 9 23:10:34 Tower kernel: ata13.00: error: { ABRT }
Sep 9 23:10:34 Tower kernel: ata13: hard resetting link
Sep 9 23:11:03 Tower kernel: sd 6:0:3:0: [sdi] tag#803 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00 cmd_age=0s
Sep 9 23:11:03 Tower kernel: sd 6:0:3:0: [sdi] tag#803 CDB: opcode=0x88 88 00 00 00 00 01 93 77 f4 70 00 00 00 a0 00 00
Sep 9 23:11:03 Tower kernel: blk_update_request: I/O error, dev sdi, sector 6769079408 op 0x0:(READ) flags 0x0 phys_seg 20 prio class 0
Two of the controllers are Marvell based and have known issues, but ata4 is the onboard SATA, so there might be a power or connection issue. I would recommend replacing the SASLP/SAS2LP controllers with LSI anyway, and then checking all connections and/or testing with a different PSU.
jpimlott Posted September 10, 2021 Author Thank you for the response. Is there a way to make disk 3 show good again in the system after a reboot and wiring check? I think disk 2 has failed. I would like to rebuild it on the missing cache drive.
trurl Posted September 10, 2021 30 minutes ago, jpimlott said: "I think disk 2 has failed" Why? Seems more likely that it just suffered the same connection problems as the others.
jpimlott Posted September 10, 2021 Author When the drives stopped working the drive light was stuck on. I also tried reseating the drives, removed them from the array, restarted, and re-added them. After doing that, disk 2 had real issues writing: it would go slow, then stop, then go back to medium speed. I later tried to just rebuild parity 1, and it was building fast at 140 MB/s, then disk 3 dropped off. I am copying data off disk 2 now to a Linux machine and so far so good.
jpimlott Posted September 10, 2021 Author How do I get the system back up with disk 3 marked as good, and disk 2 as well? So far I have copied most of the data off of it with no trouble.
JorgeB Posted September 10, 2021 Power down, check/replace the cables/slot on disk 2, and see if it comes back online.
trurl Posted September 10, 2021 You have apparently done some things (not entirely clear exactly what) since those earlier diagnostics, so attach new diagnostics to your next post.
jpimlott Posted September 10, 2021 Author It has been left running since drive 3 dropped out. tower-diagnostics-20210910-1019.zip
trurl Posted September 10, 2021 Where is disk2?
jpimlott Posted September 10, 2021 Author Not in the system, as I am copying it to another disk. I tried rebuilding parity without it, but disk 3 dropped.
trurl Posted September 10, 2021 According to those diagnostics, only the missing disk2 is disabled, and the emulated disk2 is mounted, as are all other disks. Parity1 is invalid because you were rebuilding it, and the disk3 problems, probably connection related, are interfering with the parity1 rebuild. Parity2 will allow you to rebuild the emulated disk2 (and parity1) if you can get your connections fixed. Just to confirm, post a screenshot of Main - Array Devices.
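To illustrate why a second parity can still rebuild one missing data disk while the first parity is invalid, here is a minimal sketch. It assumes a RAID-6-style scheme (P is a plain XOR, Q is a weighted sum over GF(2^8)); the thread does not say that Unraid's parity2 is built exactly this way, and all function names here are hypothetical.

```python
# Sketch only: RAID-6-style P/Q parity over GF(2^8).
# Assumption: parity2 behaves like a Q syndrome; not confirmed by the thread.

def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) using reduction polynomial 0x11D."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result

def gf_pow(a: int, n: int) -> int:
    """Raise a to the n-th power in GF(2^8) by repeated multiplication."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result

def gf_inv(a: int) -> int:
    """Multiplicative inverse of a nonzero byte: a^254 == a^-1 in GF(2^8)."""
    return gf_pow(a, 254)

def p_parity(disks):
    """First parity: byte-wise XOR of all data disks."""
    out = bytearray(len(disks[0]))
    for data in disks:
        for k, byte in enumerate(data):
            out[k] ^= byte
    return bytes(out)

def q_parity(disks):
    """Second parity: XOR of (2^i * disk_i) with products taken in GF(2^8)."""
    out = bytearray(len(disks[0]))
    for i, data in enumerate(disks):
        coeff = gf_pow(2, i)
        for k, byte in enumerate(data):
            out[k] ^= gf_mul(coeff, byte)
    return bytes(out)

def rebuild_from_q(disks, missing, q):
    """Rebuild disks[missing] from Q and the surviving data disks only.

    The entry at index `missing` is never read, so it may be None.
    """
    partial = bytearray(q)
    for i, data in enumerate(disks):
        if i == missing:
            continue
        coeff = gf_pow(2, i)
        for k, byte in enumerate(data):
            partial[k] ^= gf_mul(coeff, byte)
    # What is left is 2^missing * disk_missing; divide the coefficient out.
    inv = gf_inv(gf_pow(2, missing))
    return bytes(gf_mul(inv, byte) for byte in partial)
```

In this sketch, the missing disk is recovered from Q and the surviving disks first; once it is back, P can simply be recomputed with `p_parity`. That mirrors the order described above: the emulated disk comes from the valid parity, and the invalid parity is regenerated afterwards.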
jpimlott Posted September 10, 2021 Author Is this a decent replacement card for the controllers? https://www.newegg.com/p/17Z-010M-00012
trurl Posted September 10, 2021 Also, your log has filled up. You will have to reboot to fix that, but you will have to shut down anyway to fix your connection issues.
trurl Posted September 10, 2021 1 minute ago, jpimlott said: "Is this a decent replacement card for the controllers? https://www.newegg.com/p/17Z-010M-00012" Are you just asking about that model, or are you also considering buying from that source? It says it ships from China in 5 to 32 days.
jpimlott Posted September 10, 2021 Author I redid all the power and SATA wiring, making sure that each 5in3 and 3in2 gets power from 2 different home cables. I found one questionable port/cable and changed it to one on the Marvell controllers. Disk 3 is happy and the array is rebuilding disk 2 and parity 1 at about 95 MB/s. That is slower than normal, but I assume it is because one parity and one disk need building, and the disk data needs to be generated before the parity can be calculated.
JorgeB Posted September 11, 2021 Monitor the log for similar errors to the ones posted above to make sure all is good.
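One way to do that kind of monitoring is a small script that counts link and I/O errors per device. The following is a rough sketch, not an official tool: the pattern list is taken only from the messages quoted earlier in this thread, and the function name `count_link_errors` is made up for illustration.

```python
import re
from collections import Counter

# Message fragments seen in the log excerpt quoted in this thread.
# This list is an assumption and is certainly not exhaustive.
ERROR_PATTERNS = [
    r"hard resetting link",
    r"COMRESET failed",
    r"failed command",
    r"I/O error",
]

_ERROR_RE = re.compile("|".join(ERROR_PATTERNS))
# Device names as they appear in kernel messages, e.g. "ata4", "ata8.00", "sdi".
_DEVICE_RE = re.compile(r"\b(ata\d+|sd[a-z]+)\b")

def count_link_errors(lines):
    """Count matching error lines per device (ataN or sdX)."""
    counts = Counter()
    for line in lines:
        if _ERROR_RE.search(line):
            match = _DEVICE_RE.search(line)
            counts[match.group(1) if match else "unknown"] += 1
    return counts
```

To use it, pass in the lines of the syslog file (assumed here to be /var/log/syslog) and check whether any counts keep growing during the rebuild. Errors spread across several ports would again suggest power or cabling rather than a single bad disk.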