January 11, 2024
Disk array: disks in the disk cabinet randomly drop out during verification
The disks themselves check out fine, however, and the array starts normally after a reboot. Disks 9 and 11 are in an external disk cabinet. Even after using 'new configuration', these two disks randomly drop out during the synchronization process: their temperature can no longer be read, and read errors keep accumulating. When the error occurs, I stop the array and reboot. The affected disks then mount automatically, the array starts, and everything appears normal. The disks have been tested and no problems were found. I have started the array and let it sync automatically three times. Each time these two disks had issues, whereas the disks outside the external cabinet had none. My server is a Gen8 ML310e v2, with a P222 acting as the HBA for both the internal hard drives and the external disk cabinet. Can anyone advise where the problem might be? Thank you.
unraid-diagnostics-20240111-1051.zip
Edited January 11, 2024 by yuelpl
January 11, 2024 · Community Expert
I don't see any I/O errors logged during the parity sync:
Jan 11 09:44:18 UNRAID kernel: md: recovery thread: recon P ...
Jan 11 09:44:21 UNRAID tips.and.tweaks: Tweaks Applied
Jan 11 09:44:21 UNRAID sudo: root : PWD=/ ; USER=root ; COMMAND=/bin/bash -c '/usr/local/emhttp/plugins/unbalance/unbalance -port 6237'
Jan 11 09:44:21 UNRAID sudo: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=0)
Jan 11 09:44:26 UNRAID kernel: eth0: renamed from veth32e14d3
Jan 11 09:44:26 UNRAID kernel: IPv6: ADDRCONF(NETDEV_CHANGE): vethd9736a9: link becomes ready
Jan 11 09:44:26 UNRAID kernel: docker0: port 1(vethd9736a9) entered blocking state
Jan 11 09:44:26 UNRAID kernel: docker0: port 1(vethd9736a9) entered forwarding state
Jan 11 09:44:26 UNRAID kernel: mdcmd (37): nocheck cancel
Jan 11 09:44:26 UNRAID kernel: md: recovery thread: exit status: -4
Did you cancel it, or did it just stop? How is the external cabinet powered?
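The check described above (did the sync end with errors, or was it cancelled?) can be sketched in Python. This is a minimal sketch, assuming the syslog has been saved as text and that kernel read failures contain the substring "I/O error" (the exact wording varies by kernel version). The marker strings `recovery thread: recon`, `nocheck cancel` and `exit status` are copied from the excerpt above; the helper name `summarize_parity_sync` is hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Marker strings copied from the syslog excerpt quoted above.
SYNC_START = "md: recovery thread: recon"
SYNC_CANCEL = "nocheck cancel"
# Assumption: kernel block-layer read failures contain "I/O error".
IO_ERROR = "I/O error"
EXIT_RE = re.compile(r"md: recovery thread: exit status: (-?\d+)")


@dataclass
class SyncSummary:
    started: bool = False
    cancelled: bool = False
    exit_status: Optional[int] = None
    io_errors: List[str] = field(default_factory=list)


def summarize_parity_sync(lines):
    """Summarize one parity sync/check from syslog lines (hypothetical helper)."""
    summary = SyncSummary()
    for line in lines:
        if SYNC_START in line:
            summary.started = True
        elif SYNC_CANCEL in line:
            summary.cancelled = True
        elif IO_ERROR in line and summary.started:
            # Only count I/O errors logged after the sync began.
            summary.io_errors.append(line.rstrip())
        match = EXIT_RE.search(line)
        if match:
            summary.exit_status = int(match.group(1))
    return summary


excerpt = """\
Jan 11 09:44:18 UNRAID kernel: md: recovery thread: recon P ...
Jan 11 09:44:26 UNRAID kernel: mdcmd (37): nocheck cancel
Jan 11 09:44:26 UNRAID kernel: md: recovery thread: exit status: -4
""".splitlines()

s = summarize_parity_sync(excerpt)
print(s.started, s.cancelled, s.exit_status, len(s.io_errors))
# prints: True True -4 0
```

On Linux, errno 4 is EINTR, so an exit status of -4 is consistent with the sync having been interrupted, which matches the `nocheck cancel` line just before it.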
January 11, 2024 · Author
After the error, the read error count keeps climbing and many of my services become inaccessible. So I clicked 'Cancel' to stop the verification, rebooted, and restarted the array, which restored normal operation of services such as Docker. In a previous attempt I tried stopping the array directly, but the UI froze and I eventually had to do a hard power reset. The external disk cabinet has its own power supply, a Sea Sonic 350W SS-350M1U; it is powered on and off together with the main server, both through a UPS. This power supply has been in use for less than 2 years.
January 11, 2024 · Author
The cabinet's hard drive backplane is connected to the P222 in the Gen8 through an 8087 to 8088 cable.
January 11, 2024 · Author
Could the diagnostics from the earlier errors have been lost because I rebooted the server?
January 11, 2024 · Community Expert
Diagnostics can only show how things have been since the last reboot. Set up the syslog server.
January 11, 2024 · Author
I set the local syslog server to "Enable". Will I be able to view the logs from the past few days after a reboot? If so, I will reboot the server now.
Edited January 11, 2024 by yuelpl
January 11, 2024 · Author
If not, is my only option to start the verification again, wait for the problem to occur, and download the diagnostics before rebooting?
January 11, 2024 · Community Expert
The syslog is kept in RAM, just like the rest of the OS. Unless the syslog server is set up to store it somewhere, it is gone after a reboot. Better post a screenshot of your syslog server settings; they can be confusing.
January 11, 2024 · Author
I have about 500 GB of free space on my cache. Could logging there fill up the cache and cause Docker to misbehave?
January 11, 2024 · Author
Are the settings correct? If so, I will start the verification again and download the diagnostics as soon as the error appears.
January 11, 2024 · Community Expert
5 minutes ago, yuelpl said: Screenshot as follows.
Can't read that. Are you logging to the flash drive? Or what have you set as the remote server? Also, the diagnostics will not include the syslog from the syslog server; you have to get it from wherever it is stored and post it.
January 11, 2024 · Author
Sorry, here is the screenshot again. I have set the logs to be written to my cache disk. Could you check whether this setup is correct?
January 11, 2024 · Author
Should I reboot right away, or start a new verification and wait for the error to occur?
January 11, 2024 · Community Expert
6 minutes ago, yuelpl said: the screenshot again
You have to tell it which remote server to log to. Enter your server's own IP address so that it sends the log to itself.
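Why entering the server's own IP works can be illustrated with a small Python sketch. This is a conceptual illustration only, not how Unraid implements its syslog server: a remote syslog server is essentially a listener at an IP address and port, and a sender pointed at the machine's own address simply delivers to itself. The demo uses Python's standard `logging.handlers.SysLogHandler`, which sends records as UDP datagrams when given a `(host, port)` tuple. Port 514 is the conventional syslog port, but binding it needs root, so the receiver here (a hypothetical stand-in for the syslog server) binds to an OS-chosen port on loopback.

```python
import logging
import logging.handlers
import socket

# Hypothetical stand-in for the syslog server: a UDP socket on loopback.
# Port 0 lets the OS pick a free port, so no root privileges are needed.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2.0)
host, port = receiver.getsockname()

# The sender: with a (host, port) address, SysLogHandler sends each record
# as one UDP datagram. Pointing it at this machine makes it log to itself.
logger = logging.getLogger("syslog-loopback-demo")
logger.setLevel(logging.INFO)
handler = logging.handlers.SysLogHandler(address=(host, port))
logger.addHandler(handler)

logger.warning("disk11 read error (test message)")

# Receive the datagram and strip the trailing NUL the handler appends.
datagram, _ = receiver.recvfrom(4096)
received = datagram.decode().rstrip("\x00")
print(received)
# prints: <12>disk11 read error (test message)

handler.close()
receiver.close()
```

The `<12>` prefix is the syslog priority: facility "user" (1) times 8, plus severity "warning" (4).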
January 11, 2024 · Author
Is that right? Then should I reboot the server and wait for log files to appear in the folder?
January 11, 2024 · Author
OK, I will start a new verification and update this topic when the error occurs. Thank you.
January 11, 2024 · Author
It happened again 😵
unraid-diagnostics-20240111-1544.zip
syslog-10.0.0.10.zip
Edited January 11, 2024 by yuelpl
January 11, 2024 · Author
I have now paused the verification. When I click the arrow next to the disk to view its contents, it shows 'Invalid Path', yet strangely the disk still appears readable in the UI. After I stop the array, the disk shows as missing. One of disks 9 and 11 always gets read errors; in 5 verification attempts, they have never both had read errors at the same time. After a reboot the disk list looks normal, but I have to stop the verification manually to keep disks 9 and 11 from getting read errors again. I have uploaded the diagnostics and the syslog above for your review. Thank you.
Edited January 11, 2024 by yuelpl
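To confirm the pattern described above (only one of disks 9 and 11 erroring per run), the saved syslog could be tallied per block device. This is a minimal Python sketch, assuming kernel read failures are logged in a form containing "I/O error, dev <name>"; the exact prefix differs between kernel versions, and a disk that drops offline may log other messages instead. The helper name `count_io_errors_by_device` and the sample lines are hypothetical, and mapping device names such as `sdl` to Unraid disk slots would still require the diagnostics.

```python
import re
from collections import Counter

# Assumption: failed block reads are logged with "I/O error, dev <name>";
# only that part is matched because the surrounding text varies by kernel.
IO_ERROR_RE = re.compile(r"I/O error, dev (\w+)")


def count_io_errors_by_device(lines):
    """Count I/O error lines per block device (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = IO_ERROR_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample lines; the device name and sectors are made up.
sample = [
    "Jan 11 15:30:01 UNRAID kernel: I/O error, dev sdl, sector 1024 op 0x0:(READ)",
    "Jan 11 15:30:01 UNRAID kernel: I/O error, dev sdl, sector 2048 op 0x0:(READ)",
    "Jan 11 15:30:02 UNRAID kernel: docker0: port 1(vethd9736a9) entered forwarding state",
]
print(count_io_errors_by_device(sample))
# prints: Counter({'sdl': 2})
```

Running this over each saved syslog would show whether every failed run involves a single device.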
January 11, 2024 · Community Expert
The disk is dropping offline. This is most often a power or connection issue; try replacing the cables or connecting that disk to a different controller.
January 11, 2024 · Author
9 minutes ago, JorgeB said: Disk is dropping offline, this is most often a power/connection issue, try replacing cables or connecting that disk to a different controller.
I have purchased a new cable and power supply, and I will replace them once they arrive.