trumpets Posted August 2, 2020
We set up a new Unraid system using a repurposed motherboard with 2 new Seagate IronWolf drives. After the setup was done, we started to copy data from a Windows Server to Unraid. After the first drive was copied, we mounted an old second data drive as Disk 2 and, after formatting it, started to copy again. However, in the middle of the copy the system seems to have crashed, and the original Disk 1 was empty and showed the Unmountable message.
Making a rookie mistake, we just formatted Disk 1 again and restarted the copy. Since copying over the network was slow, we decided to mount the source drive as an Unassigned drive and copy from an ssh terminal using Midnight Commander. The copying started out fine, one directory at a time, until we reached a directory with about 130 GB of data. We left the system to copy, and when I came back I saw an error screen. After aborting, I was returned to the copy screen, which indicated a segmentation fault.
I tried to stop the array, but that doesn't seem to work. I tried to reboot, but even though I was kicked out of the ssh session, I could log back in with the system still running. It has not actually rebooted. The load on the system is currently still at 3.00.
We cannot do a hard reset since we are accessing the system remotely. We are also worried that after a reboot we might again lose a disk or, worse, data.
Any advice on what we might be doing wrong? The diagnostics file is attached. Appreciate any advice. TIA.
tower-diagnostics-20200802-1400.zip
JorgeB Posted August 2, 2020
You might need to force a reboot. There are also lots of ATA errors from disk1; start by replacing cables.
Aug 2 13:12:38 Tower kernel: ata2.00: exception Emask 0x50 SAct 0x7e000 SErr 0x4090800 action 0xe frozen
Aug 2 13:12:38 Tower kernel: ata2.00: irq_stat 0x00400040, connection status changed
Aug 2 13:12:38 Tower kernel: ata2: SError: { HostInt PHYRdyChg 10B8B DevExch }
Aug 2 13:12:38 Tower kernel: ata2.00: failed command: READ FPDMA QUEUED
Aug 2 13:12:38 Tower kernel: ata2.00: cmd 60/40:68:40:e2:83/05:00:06:00:00/40 tag 13 ncq dma 688128 in
Aug 2 13:12:38 Tower kernel: res 40/00:08:a0:f5:3a/00:00:3a:00:00/40 Emask 0x50 (ATA bus error)
Aug 2 13:12:38 Tower kernel: ata2.00: status: { DRDY }
Aug 2 13:12:38 Tower kernel: ata2.00: failed command: READ FPDMA QUEUED
Aug 2 13:12:38 Tower kernel: ata2.00: cmd 60/40:70:80:e7:83/05:00:06:00:00/40 tag 14 ncq dma 688128 in
Aug 2 13:12:38 Tower kernel: res 40/00:08:a0:f5:3a/00:00:3a:00:00/40 Emask 0x50 (ATA bus error)
Aug 2 13:12:38 Tower kernel: ata2.00: status: { DRDY }
Aug 2 13:12:38 Tower kernel: ata2.00: failed command: READ FPDMA QUEUED
Aug 2 13:12:38 Tower kernel: ata2.00: cmd 60/40:78:c0:ec:83/05:00:06:00:00/40 tag 15 ncq dma 688128 in
Aug 2 13:12:38 Tower kernel: res 40/00:08:a0:f5:3a/00:00:3a:00:00/40 Emask 0x50 (ATA bus error)
Aug 2 13:12:38 Tower kernel: ata2.00: status: { DRDY }
Aug 2 13:12:38 Tower kernel: ata2.00: failed command: READ FPDMA QUEUED
Aug 2 13:12:38 Tower kernel: ata2.00: cmd 60/40:80:00:f2:83/05:00:06:00:00/40 tag 16 ncq dma 688128 in
Aug 2 13:12:38 Tower kernel: res 40/00:08:a0:f5:3a/00:00:3a:00:00/40 Emask 0x50 (ATA bus error)
Aug 2 13:12:38 Tower kernel: ata2.00: status: { DRDY }
Aug 2 13:12:38 Tower kernel: ata2.00: failed command: READ FPDMA QUEUED
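For anyone wanting to see quickly which ATA port is throwing errors in a syslog like the excerpt above, a small script can tally the "failed command" lines per port. This is only a rough sketch: the function name `count_failed_commands` and the regular expression are illustrative assumptions, not part of Unraid or its diagnostics tooling, and the sample text is a shortened copy of the lines quoted in this thread.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   "Aug 2 13:12:38 Tower kernel: ata2.00: failed command: READ FPDMA QUEUED"
# Group 1 is the port (e.g. "ata2"), group 2 is the failed command name.
FAILED_CMD = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: failed command: (.+)$")


def count_failed_commands(log_text):
    """Count 'failed command' lines per (port, command) pair (hypothetical helper)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = FAILED_CMD.search(line.strip())
        if match:
            counts[(match.group(1), match.group(2).strip())] += 1
    return counts


# Shortened sample taken from the log lines quoted in the thread.
sample = """\
Aug 2 13:12:38 Tower kernel: ata2.00: exception Emask 0x50 SAct 0x7e000 SErr 0x4090800 action 0xe frozen
Aug 2 13:12:38 Tower kernel: ata2: SError: { HostInt PHYRdyChg 10B8B DevExch }
Aug 2 13:12:38 Tower kernel: ata2.00: failed command: READ FPDMA QUEUED
Aug 2 13:12:38 Tower kernel: ata2.00: status: { DRDY }
Aug 2 13:12:38 Tower kernel: ata2.00: failed command: READ FPDMA QUEUED
"""

if __name__ == "__main__":
    for (port, command), n in count_failed_commands(sample).items():
        print(port, command, n)
```

If all the failures land on a single port, as they do here with ata2, that points at one cable, port, or drive rather than a system-wide problem.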
trumpets Posted August 2, 2020
Waiting for the cables, then we will give it another go. Thank you.
trumpets Posted August 19, 2020
We are under lockdown due to the pandemic, so I was only able to replace all the cables yesterday. I ran SMART tests and the drives appear to PASS. Now that I have started copying the data off the unassigned drive to Disk 1, the logs are flooding again with the kernel: ata2.00 failed command messages.
The latest diagnostics file is attached. Thanks in advance.
tower-diagnostics-20200802-1400.zip
JorgeB Posted August 19, 2020
Those are CRC errors, usually a bad SATA cable, but could also be the controller or in extremely rare cases the disk itself.
trumpets Posted August 19, 2020
I replaced it with new cables yesterday. I'm beginning to suspect it's the drive. Previously it had 166 UDMA CRC errors. Now, after copying 130 GB of data, I got this notice:
Tower: Unraid Parity disk SMART health [199] Warning [TOWER] - udma crc error count is 2096 ST1000VN002-2EY102_Z9CBWT4L (sdc)
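Since SMART attribute 199 (UDMA_CRC_Error_Count) is a cumulative counter, what matters is how much it grows between checks rather than its absolute value. A minimal sketch of reading that value from `smartctl -A` output follows. The helper name `udma_crc_count` is made up for illustration, and the sample attribute row is an assumption: only the raw value 2096 and the earlier value 166 come from this thread, while the other columns are placeholders in the usual `smartctl -A` table layout.

```python
def udma_crc_count(smartctl_output):
    """Return the raw value of SMART attribute 199, or None if it is not listed.

    Assumes the usual `smartctl -A` table, where each attribute row has ten
    whitespace-separated columns and the raw value is the last one.
    """
    for line in smartctl_output.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None


# Illustrative row only: the raw value 2096 is from the post above,
# the remaining columns are placeholders, not taken from the diagnostics.
sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       2096
"""

previous = 166  # count reported before the latest copy
current = udma_crc_count(sample)

if __name__ == "__main__":
    print("new CRC errors since last check:", current - previous)
```

A counter that keeps climbing after a cable swap suggests the link is still bad somewhere (cable, port, or controller); one that stays flat after a change suggests the change fixed it.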
JorgeB Posted August 19, 2020
Still much more likely to be the cable (or port) than the drive. You should try another cable in a different port.
trumpets Posted August 19, 2020
Ah, I didn't think of the port. Will try that tomorrow. Can I just shut it down, change the ports, and turn it on again, or are there other steps I need to do as well, like removing the drive from the array?
JorgeB Posted August 19, 2020
trumpets said: "Can I just shut it down and change the ports and turn on again"
Just this.
trumpets Posted August 29, 2020
As recommended, I have moved the drive to a different slot and also used a different cable. The system has been stable so far and is not showing any of the ATA errors. Thank you.