yuelpl (posted and edited January 11)
Disks in the disk cabinet randomly drop out of the array during verification, yet the disks check out fine and the array starts normally after a reboot.
Disks 9 and 11 are in an external disk cabinet. Even after using 'New Config', these two disks randomly drop out during synchronization: their temperature can no longer be read, and read errors keep accumulating. After the error occurs, I stop the array and reboot. The affected disks then mount automatically, the array starts, and everything appears normal. The disks have been checked and no problems were found. I have started the array and let it sync automatically three times. Each time, these two disks had problems, while the disks outside the external cabinet had none.
My server is a Gen8 ML310e v2, with a P222 acting as the HBA for both the internal drives and the external disk cabinet. I would appreciate any advice on where the problem might be. Thank you.
Attachment: unraid-diagnostics-20240111-1051.zip
trurl (posted January 11)
Attach Diagnostics to your NEXT post in this thread.
yuelpl (author, posted January 11)
Thank you.
Attachment: unraid-diagnostics-20240111-1051.zip
trurl (posted January 11)
I don't see any I/O errors logged during the parity sync:
Jan 11 09:44:18 UNRAID kernel: md: recovery thread: recon P ...
Jan 11 09:44:21 UNRAID tips.and.tweaks: Tweaks Applied
Jan 11 09:44:21 UNRAID sudo: root : PWD=/ ; USER=root ; COMMAND=/bin/bash -c '/usr/local/emhttp/plugins/unbalance/unbalance -port 6237'
Jan 11 09:44:21 UNRAID sudo: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=0)
Jan 11 09:44:26 UNRAID kernel: eth0: renamed from veth32e14d3
Jan 11 09:44:26 UNRAID kernel: IPv6: ADDRCONF(NETDEV_CHANGE): vethd9736a9: link becomes ready
Jan 11 09:44:26 UNRAID kernel: docker0: port 1(vethd9736a9) entered blocking state
Jan 11 09:44:26 UNRAID kernel: docker0: port 1(vethd9736a9) entered forwarding state
Jan 11 09:44:26 UNRAID kernel: mdcmd (37): nocheck cancel
Jan 11 09:44:26 UNRAID kernel: md: recovery thread: exit status: -4
Did you cancel it, or did it just stop?
How is the external cabinet powered?
yuelpl (author, posted January 11)
After the error, the read error count keeps climbing and many of my services become inaccessible. So I clicked 'Cancel' to stop the verification, rebooted, and restarted the array, which restored services such as Docker. During a previous attempt I also tried stopping the array directly, but the UI froze and I ultimately had to do a hard power reset.
The external disk cabinet has its own power supply, a Sea Sonic SS-350M1U (350 W), which powers on and off together with the main server through a UPS. This power supply has been in use for less than two years.
yuelpl (author, posted January 11)
The hard drive backplane is connected through an 8087-to-8088 cable to the P222 in the Gen8.
yuelpl (author, posted January 11)
Could the earlier error diagnostics have been lost because I rebooted the server?
trurl (posted January 11)
Diagnostics can only show how things are since the last reboot. Set up the syslog server.
yuelpl (author, posted and edited January 11)
I have set the local syslog server to "Enable". Will I be able to view the logs from the past few days after a reboot? If so, I will reboot the server now.
yuelpl (author, posted January 11)
If not, is my only option to start the disk verification again, wait for the problem to recur, and download the diagnostics before rebooting?
trurl (posted January 11)
The syslog is kept in RAM, just like the rest of the OS. Unless you have the syslog server set up to store it somewhere, it is gone. Better post a screenshot of your syslog server settings; they can be confusing.
yuelpl (author, posted January 11)
I have about 500 GB of free space left on my cache. Could logging there fill up the cache and make Docker misbehave?
yuelpl (author, posted January 11)
Are the settings correct? If so, I will start the disk verification again and download the diagnostics as soon as the error appears.
trurl (posted January 11)
Quoting yuelpl: "Screenshot as follows."
Can't read that. Are you logging to the flash drive? Or what do you have set as the remote server?
Also, diagnostics will not include the syslog from the syslog server. You have to get it from wherever it is stored and post it.
yuelpl (author, posted January 11)
Sorry, I have taken the screenshot again. I have set the logs to be written to my cache disk. Please check whether this setup is correct.
yuelpl (author, posted January 11)
Should I reboot now, or start a new verification and wait for the error to occur?
trurl (posted January 11)
Quoting yuelpl: "the screenshot again"
You have to tell it which remote server to log to. Enter your server's own IP address to make it send the log to itself.
yuelpl (author, posted January 11)
Is that so? Then should I reboot the server and wait for log files to appear in the folder?
trurl (posted January 11)
Wait for the error to occur, then get the log, zip it, and post it.
yuelpl (author, posted January 11)
OK, I will start a new verification and update this topic when the error occurs. Thank you.
yuelpl (author, posted and edited January 11)
It happened again 😵
Attachments: unraid-diagnostics-20240111-1544.zip, syslog-10.0.0.10.zip
yuelpl (author, posted and edited January 11)
I have now paused the verification. Clicking the arrow next to the disk to browse its contents shows 'Invalid Path', yet strangely the disk still appears readable in the UI. After I stop the array, the disk shows as missing.
Disk 9 or disk 11 always gets read errors, but in five verification attempts they have never both had read errors at the same time. After a reboot the disk list looks normal, but I have to stop the verification manually to keep disks 9 and 11 from getting read errors again.
The diagnostics and syslog are attached above for your review. Thank you.
JorgeB (posted January 11)
The disk is dropping offline. This is most often a power or connection issue; try replacing the cables or connecting that disk to a different controller.
yuelpl (author, posted January 11)
Quoting JorgeB: "Disk is dropping offline, this is most often a power/connection issue, try replacing cables or connecting that disk to a different controller."
I have ordered a new cable and power supply, and I will replace them once they arrive.