
Disk array in the disk cabinet randomly drops disks during verification.




The disk array in my external disk cabinet randomly drops disks during verification. However, the disks check out fine, and the array starts normally after a reboot.

Disks 9 and 11 are in an external disk cabinet. Even when I use a 'new configuration', these two disks randomly drop out during the synchronization process: their temperature can no longer be read, and read errors keep accumulating.

After the error occurs, I stop the array and reboot. The affected disks then mount automatically, the array starts, and everything appears normal. The disks have been checked and no issues were found.

I started the array and let it sync automatically three times. Each time, these two disks had issues, whereas the disks outside the external disk cabinet had no problems.

My server is a Gen8 ml310e v2, with a P222 serving as the HBA card connecting the internal hard drives and the external disk cabinet.

I'm seeking advice on where the problem might be occurring.

Thank you.

20240111103829.png

 

unraid-diagnostics-20240111-1051.zip

Edited by yuelpl

I don't see any I/O errors logged during the parity sync:

Jan 11 09:44:18 UNRAID kernel: md: recovery thread: recon P ...
Jan 11 09:44:21 UNRAID tips.and.tweaks: Tweaks Applied
Jan 11 09:44:21 UNRAID sudo:     root : PWD=/ ; USER=root ; COMMAND=/bin/bash -c '/usr/local/emhttp/plugins/unbalance/unbalance -port 6237'
Jan 11 09:44:21 UNRAID sudo: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=0)
Jan 11 09:44:26 UNRAID kernel: eth0: renamed from veth32e14d3
Jan 11 09:44:26 UNRAID kernel: IPv6: ADDRCONF(NETDEV_CHANGE): vethd9736a9: link becomes ready
Jan 11 09:44:26 UNRAID kernel: docker0: port 1(vethd9736a9) entered blocking state
Jan 11 09:44:26 UNRAID kernel: docker0: port 1(vethd9736a9) entered forwarding state
Jan 11 09:44:26 UNRAID kernel: mdcmd (37): nocheck cancel
Jan 11 09:44:26 UNRAID kernel: md: recovery thread: exit status: -4

Did you cancel it or did it just stop?

 

How is the external cabinet powered?
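
For anyone who wants to repeat the syslog check above on their own diagnostics, here is a minimal Python sketch. It only looks for the md recovery thread and mdcmd kernel lines, using the exact line shapes from the excerpt in this post. The function names `sync_events` and `was_cancelled` are made up for illustration, and the "cancel seen before exit" test is just a heuristic, not an official interpretation of the exit status.

```python
import re

# Matches kernel lines shaped like the excerpt above, e.g.
# "Jan 11 09:44:18 UNRAID kernel: md: recovery thread: recon P ..."
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+kernel:\s+(?P<msg>.*)$"
)


def sync_events(lines):
    """Return (timestamp, message) pairs for md recovery thread and mdcmd lines."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a kernel line (sudo, plugins, etc.)
        msg = m.group("msg")
        if msg.startswith("md: recovery thread:") or msg.startswith("mdcmd "):
            events.append((m.group("ts"), msg))
    return events


def was_cancelled(events):
    """Heuristic: True if a 'nocheck cancel' command precedes a thread exit line."""
    cancel_seen = False
    for _, msg in events:
        if msg.startswith("mdcmd ") and "nocheck cancel" in msg:
            cancel_seen = True
        elif msg.startswith("md: recovery thread: exit status") and cancel_seen:
            return True
    return False


# Lines copied from the syslog excerpt in this post.
SAMPLE = [
    "Jan 11 09:44:18 UNRAID kernel: md: recovery thread: recon P ...",
    "Jan 11 09:44:21 UNRAID tips.and.tweaks: Tweaks Applied",
    "Jan 11 09:44:26 UNRAID kernel: eth0: renamed from veth32e14d3",
    "Jan 11 09:44:26 UNRAID kernel: mdcmd (37): nocheck cancel",
    "Jan 11 09:44:26 UNRAID kernel: md: recovery thread: exit status: -4",
]

if __name__ == "__main__":
    found = sync_events(SAMPLE)
    for ts, msg in found:
        print(ts, "|", msg)
    print("cancel command seen before exit:", was_cancelled(found))
```

To run it against a real log, pass the lines of the syslog file from the diagnostics zip to `sync_events` instead of `SAMPLE`.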


After the error, the read error count keeps accumulating, and many of my services become inaccessible. Therefore, I clicked 'Cancel' to stop the verification and restarted the array after rebooting, which restored normal operation of services like Docker.

During a previous attempt, I also tried to directly stop the array, but the UI froze, and I ultimately had to resort to a hard power reset.

The external disk cabinet has its own power supply, a Sea Sonic 350W SS-350M1U. It is powered on and off together with the main server through a UPS. This power supply has been in use for less than 2 years.


I have now paused the verification. When I click the arrow in front of the disk to view its contents, it shows 'Invalid Path'.

Strangely, it appears to be readable on the UI. After stopping the array, the disk shows as missing.

Read errors always occur on either disk 9 or disk 11. In 5 verification attempts, the two have never had read errors at the same time.

After a reboot, the disk list appears normal, but I have to manually stop the verification to prevent disks 9 and 11 from getting read errors again.

Diagnostic logs and syslog have been uploaded above for your review. Thank you.

3.png

4.png

5.png

6.png

7.png

Edited by yuelpl
9 minutes ago, JorgeB said:

Disk is dropping offline, this is most often a power/connection issue, try replacing cables or connecting that disk to a different controller.

I have purchased a new cable and power supply, and I will swap them in once they arrive.
