[SOLVED] Stuck on stopping array (6.8.3 trial)



We set up a new Unraid system using a repurposed motherboard with 2 new Seagate IronWolf drives. After the setup was done, we started to copy data from a Windows Server to Unraid. After the first drive was copied, we mounted an old second data drive as Disk 2 and, after formatting it, started to copy again. However, in the middle of the copy the system seems to have crashed, and the original Disk 1 showed up empty with the Unmountable message. Making a rookie mistake, we just formatted Disk 1 again and restarted the copy. Since copying over the network was slow, we decided to mount the drive as an Unassigned drive and copy from an SSH terminal using Midnight Commander.

 

The copying started out fine, one directory at a time, until we reached a directory with about 130 GB of data. We left the system to copy, and when I came back I saw the error screen below.

 

image.png.414964b40a58c00a851f7c65b7a02506.png

After aborting, I was returned to the copy screen, which showed a segmentation fault error.

 

image.png.02461c816bbfc5116973f253f0fe0f35.png
 

So I tried to stop the array, but it doesn't seem to work. I tried to reboot, but although I was kicked out of the SSH session, when I logged back in the system was still running; it had not actually rebooted. The load on the system is currently still at 3.00.
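For anyone hitting the same hang: on Linux the load average counts tasks in uninterruptible sleep (state `D`) as well as runnable ones, so a steady load of 3.00 on a box that should be idle may point to processes blocked on disk I/O. The sketch below lists such tasks by reading `/proc/<pid>/stat`. It is only a rough diagnostic aid, written on the assumption that Python 3 is available on the machine it runs on; the function names are made up for illustration and nothing here is an Unraid tool.

```python
import os

def task_state(stat_text):
    """Extract the one-letter state from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST closing parenthesis.
    return stat_text[stat_text.rfind(")") + 1:].split()[0]

def task_name(stat_text):
    """Extract the command name (the text between the outer parentheses)."""
    return stat_text[stat_text.find("(") + 1:stat_text.rfind(")")]

def blocked_tasks(proc_root="/proc"):
    """Return (pid, name) for every process currently in state D."""
    found = []
    if not os.path.isdir(proc_root):
        return found  # not a Linux-style /proc; nothing to scan
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat"), errors="replace") as fh:
                text = fh.read()
        except OSError:
            continue  # the process exited while we were scanning
        if task_state(text) == "D":
            found.append((int(entry), task_name(text)))
    return found

if __name__ == "__main__":
    for pid, name in blocked_tasks():
        print(pid, name)
```

If the copy process or an unmount shows up in that list, it is waiting on the kernel and cannot be killed from userspace, which would be consistent with the array refusing to stop.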

 

image.png.8bfb86b43d770d1a9cbcbc3bf08c9019.png

 

We cannot do a hard reset since we are accessing it remotely while copying.
We are also worried that after the reboot we might again lose a disk or, worse, data.

Any advice on what we might be doing wrong?

 

The diagnostics file is also attached.

 

Appreciate any advice.  TIA.

 

tower-diagnostics-20200802-1400.zip

 

 

 

 

Link to comment

You might need to force a reboot. There are also lots of ATA errors from disk1; start by replacing the cables.

 

Aug  2 13:12:38 Tower kernel: ata2.00: exception Emask 0x50 SAct 0x7e000 SErr 0x4090800 action 0xe frozen
Aug  2 13:12:38 Tower kernel: ata2.00: irq_stat 0x00400040, connection status changed
Aug  2 13:12:38 Tower kernel: ata2: SError: { HostInt PHYRdyChg 10B8B DevExch }
Aug  2 13:12:38 Tower kernel: ata2.00: failed command: READ FPDMA QUEUED
Aug  2 13:12:38 Tower kernel: ata2.00: cmd 60/40:68:40:e2:83/05:00:06:00:00/40 tag 13 ncq dma 688128 in
Aug  2 13:12:38 Tower kernel:         res 40/00:08:a0:f5:3a/00:00:3a:00:00/40 Emask 0x50 (ATA bus error)
Aug  2 13:12:38 Tower kernel: ata2.00: status: { DRDY }
Aug  2 13:12:38 Tower kernel: ata2.00: failed command: READ FPDMA QUEUED
Aug  2 13:12:38 Tower kernel: ata2.00: cmd 60/40:70:80:e7:83/05:00:06:00:00/40 tag 14 ncq dma 688128 in
Aug  2 13:12:38 Tower kernel:         res 40/00:08:a0:f5:3a/00:00:3a:00:00/40 Emask 0x50 (ATA bus error)
Aug  2 13:12:38 Tower kernel: ata2.00: status: { DRDY }
Aug  2 13:12:38 Tower kernel: ata2.00: failed command: READ FPDMA QUEUED
Aug  2 13:12:38 Tower kernel: ata2.00: cmd 60/40:78:c0:ec:83/05:00:06:00:00/40 tag 15 ncq dma 688128 in
Aug  2 13:12:38 Tower kernel:         res 40/00:08:a0:f5:3a/00:00:3a:00:00/40 Emask 0x50 (ATA bus error)
Aug  2 13:12:38 Tower kernel: ata2.00: status: { DRDY }
Aug  2 13:12:38 Tower kernel: ata2.00: failed command: READ FPDMA QUEUED
Aug  2 13:12:38 Tower kernel: ata2.00: cmd 60/40:80:00:f2:83/05:00:06:00:00/40 tag 16 ncq dma 688128 in
Aug  2 13:12:38 Tower kernel:         res 40/00:08:a0:f5:3a/00:00:3a:00:00/40 Emask 0x50 (ATA bus error)
Aug  2 13:12:38 Tower kernel: ata2.00: status: { DRDY }
Aug  2 13:12:38 Tower kernel: ata2.00: failed command: READ FPDMA QUEUED
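To get a feel for how widespread errors like these are without scrolling through the whole syslog, a small script can tally the "failed command" lines per ATA device. This is a minimal sketch, assuming the log lines follow the format shown above; the function name and regular expression are illustrative, and the sample lines are copied from the excerpt.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   "Aug  2 13:12:38 Tower kernel: ata2.00: failed command: READ FPDMA QUEUED"
FAILED_CMD = re.compile(r"kernel: (ata\d+(?:\.\d+)?): failed command: (.+)$")

def count_failed_commands(lines):
    """Return a Counter keyed by (ata device, command) for failed-command lines."""
    counts = Counter()
    for line in lines:
        match = FAILED_CMD.search(line.rstrip())
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts

# A few lines taken from the excerpt above.
sample = [
    "Aug  2 13:12:38 Tower kernel: ata2.00: exception Emask 0x50 SAct 0x7e000 SErr 0x4090800 action 0xe frozen",
    "Aug  2 13:12:38 Tower kernel: ata2.00: failed command: READ FPDMA QUEUED",
    "Aug  2 13:12:38 Tower kernel: ata2.00: status: { DRDY }",
    "Aug  2 13:12:38 Tower kernel: ata2.00: failed command: READ FPDMA QUEUED",
]

for (device, command), n in count_failed_commands(sample).most_common():
    print(device, command, n)
```

To run it against a real log, pass an open file object instead of `sample`. If every failure lands on the same `ataN` port, that narrows the search to one cable, port, or drive.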

 

 

 

Link to comment
  • 3 weeks later...

We are under lockdown due to the pandemic, so I was only able to replace all the cables yesterday.
I ran SMART tests and the drives appear to pass.


Now that I have started copying the data off the unassigned drive to Disk 1, the logs have started to flood again with the kernel: ata2.00 failed command messages.

 

image.png.f6094e5a56f2913bdf7f19c74d876c32.png

 

The latest diagnostics file is also attached.

 

Thanks in advance.
 

tower-diagnostics-20200802-1400.zip

Link to comment

I replaced the cables with new ones yesterday. I'm beginning to suspect it's the drive. Previously it had 166 UDMA CRC errors. Now, after copying 130 GB of data, I got a notice:

Tower: Unraid Parity disk SMART health [199]
Warning [TOWER] - udma crc error count is 2096
ST1000VN002-2EY102_Z9CBWT4L (sdc)
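Since attribute 199 is a running counter, what matters is how much it grows between two readings. The sketch below pulls the raw value out of `smartctl -A` style text and computes the increase. It assumes the usual ten-column attribute table that smartctl prints; the sample row is illustrative, with only the raw values 166 and 2096 taken from this thread, and the function name is made up.

```python
def udma_crc_raw(smartctl_output):
    """Return the raw value of SMART attribute 199, or None if it is absent."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with the numeric ID; the raw value is column 10.
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None

# Illustrative row in the usual `smartctl -A` layout. The flag and the
# normalized value columns are placeholders, not readings from this drive.
after_copy = "199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 2096"

before = 166                      # earlier count reported in this thread
after = udma_crc_raw(after_copy)  # count from the notification above
print("new CRC errors during the copy:", after - before)
```

A counter that keeps climbing during a copy suggests the link is still dropping data. Taking a reading, swapping one thing (cable, SATA port, or drive), and reading again is one way to narrow down which part is responsible.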

 

Link to comment
  • trumpets changed the title to Stuck on stopping array (6.8.3 trial) (Solved)
  • JorgeB changed the title to [SOLVED] Stuck on stopping array (6.8.3 trial)
