JorgeB Posted April 9, 2020

Swap cables (both) or slot with another disk and try again.
Dissones4U Posted April 9, 2020

Your 8TB parity disk (sn#ZCT0LHSD) went from Seek_Error_Rate 120587123 on Sun Mar 29 to 133347312 on Thu Apr 9.
Your 4TB disk (sn#WFN1DP18) went from Seek_Error_Rate 34284786 on Sun Mar 29 to 41689377 on Thu Apr 9.

14 hours ago, aurevo said:
> I only get these errors in the array, everything else is working fine?

If the disks work in another machine, then replace your power and data cables. Otherwise, these numbers going up is not good and indicates a need to replace the disk.

On 4/1/2020 at 7:07 PM, aurevo said:
> I have removed the (possibly) defective 2TB and 4TB hard disks, as they have caused problems in the past.

Why are you trying to add a disk back into the array when you say that it has caused problems in the past?
JorgeB Posted April 9, 2020

1 hour ago, Dissones4U said:
> Your 8TB parity disk (sn#ZCT0LHSD) went from Seek_Error_Rate 120587123 on Sun Mar 29 to 133347312 on Thu Apr 9.
> Your 4TB disk (sn#WFN1DP18) went from Seek_Error_Rate 34284786 on Sun Mar 29 to 41689377 on Thu Apr 9.

Seagate drives use a multibit value for that attribute (and a few others), so you can't just look at the RAW value totals:
https://forums.unraid.net/topic/86337-are-my-smart-reports-bad/?do=findComment&comment=800888
Dissones4U Posted April 9, 2020

2 hours ago, johnnie.black said:
> you can't just look at the RAW value totals:

I knew that was true for the raw read errors, but I didn't realize that it applied to all of the attributes, mainly because I didn't understand why it held true for the read errors attribute. Anyway, understanding now that they use a multibit value for all of the attributes helps, thanks.

I plugged those into Google and they both converted to 0. It seems counter-intuitive that they would appear to go up rather than appearing static, but now I know.

Edited April 9, 2020 by Dissones4U
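To make the "converted to 0" remark concrete, here is a minimal sketch of the split. It assumes the commonly cited layout for Seagate's Seek_Error_Rate raw value (a 48-bit number whose upper 16 bits are the error count and whose lower 32 bits are the total number of seeks). That layout is an assumption and not something stated in this thread, and the function name `split_seagate_raw` is made up for illustration.

```python
def split_seagate_raw(raw: int) -> tuple[int, int]:
    """Split a 48-bit raw SMART value into (errors, operations).

    Assumption (not confirmed in the thread): upper 16 bits hold the
    error count, lower 32 bits hold the total number of seeks.
    """
    errors = (raw >> 32) & 0xFFFF      # upper 16 bits
    operations = raw & 0xFFFFFFFF      # lower 32 bits
    return errors, operations


# Raw Seek_Error_Rate values quoted earlier in the thread.
readings = {
    "ZCT0LHSD (8TB parity), Mar 29": 120587123,
    "ZCT0LHSD (8TB parity), Apr 9": 133347312,
    "WFN1DP18 (4TB), Mar 29": 34284786,
    "WFN1DP18 (4TB), Apr 9": 41689377,
}

for label, raw in readings.items():
    errors, seeks = split_seagate_raw(raw)
    print(f"{label}: errors={errors}, seeks={seeks}")
```

All four values are smaller than 2^32, so under this assumption the error part is 0 in every case and the growing number is just the seek counter ticking up with normal use.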
aurevo (Author) Posted April 9, 2020

6 hours ago, johnnie.black said:
> Swap cables (both) or slot with another disk and try again.

I have already switched the cables and the power supply before.

6 hours ago, Dissones4U said:
> Your 8TB parity disk (sn#ZCT0LHSD) went from Seek_Error_Rate 120587123 on Sun Mar 29 to 133347312 on Thu Apr 9.
> Your 4TB disk (sn#WFN1DP18) went from Seek_Error_Rate 34284786 on Sun Mar 29 to 41689377 on Thu Apr 9.
> If the disks work in another machine, then replace your power and data cables. Otherwise, these numbers going up is not good and indicates a need to replace the disk.
> Why are you trying to add a disk back into the array when you say that it has caused problems in the past?

I meant the errors on the Main page, in the Errors column. There the number of errors exploded after starting the array and rose to over one million. I did not mean the values on the SMART screen. My parity disk is the newest disk of all and had no errors in the past.

I tried it again because the disk had no errors in any SMART check or extended SMART check, and I was able to copy all of its data to another disk without any error while it was mounted via UD outside of the array.

Edited April 9, 2020 by aurevo
Dissones4U Posted April 9, 2020

3 hours ago, aurevo said:
> I tried it again because the disk had no errors in any smart check, extended smart check and I was able to copy all data to another disk without any error mounted via UD outside of array.

3 hours ago, aurevo said:
> However, I was able to copy all the data until just before the end (because the disk I wanted to clone to was too small), and for the second time, and copy all the data that was on the 4TB disk to the array.

This isn't super clear, but it sounds like you eventually succeeded in copying all of the data.

If I understand you correctly (honestly, the thread was tough to follow):
- The 4TB disk in question is the only disk in the array throwing errors at this time
- The same 4TB disk seems to work perfectly when outside of the array
- You want to add it back into the array but it begins to error again

Assuming that everything above is correct, there are only a few points of possible failure, and they all sit between the disk and the mobo:
- Data and/or power cables
- Data port
- Power supply (possibly the breakout if not all disks are involved), or you're splitting power from one point to too many disks

If there is more than one disk involved, then the point of failure is tougher to troubleshoot:
- It could be the controller
- It could be RAM
- It could be the power supply
- It could be the CPU

Only change one thing at a time between reboots, and make sure that your syslog server is set up so that it is persistent.
aurevo (Author) Posted May 5, 2020

On 4/10/2020 at 12:39 AM, Dissones4U said:
> If I understand you correctly (honestly, the thread was tough to follow):
> - The 4TB disk in question is the only disk in the array throwing errors at this time
> - The same 4TB disk seems to work perfectly when outside of the array
> - You want to add it back into the array but it begins to error again
>
> Assuming that everything above is correct, there are only a few points of possible failure, and they all sit between the disk and the mobo:
> - Data and/or power cables
> - Data port
> - Power supply (possibly the breakout if not all disks are involved), or you're splitting power from one point to too many disks
>
> If there is more than one disk involved, then the point of failure is tougher to troubleshoot:
> - It could be the controller
> - It could be RAM
> - It could be the power supply
> - It could be the CPU
>
> Only change one thing at a time between reboots, and make sure that your syslog server is set up so that it is persistent.

Hello, it's me again.

Once I thought I had removed the broken 2TB and 4TB hard disks, I created a new config without them. After a few days without problems I bought a new hard disk (HGST_HUS726060ALE614_W2503880), installed it, and had the system rebuild. The system ran for about 10 days and then the new hard disk failed without warning.

Since I couldn't imagine that three hard drives had failed so shortly after each other, I removed the new hard drive (6TB HGST) from the array, installed another new hard drive (the warranty replacement for the 4TB ST4000DM004, which was allegedly broken before), changed the power cable, and connected the hard drive to an onboard port instead of the SAS controller. I also took the power from a different cable instead of the shared cable used before.

Unfortunately, the rebuild was then very slow. After that I started a rebuild of the array without the new hard disk (4TB), and the speed is still extremely slow.
The power supply was changed a few weeks ago. I also changed the power cables and SATA cables.

The crazy thing is that the hard disk (HGST_HUS726060ALE614_W2503880) was displayed as disabled from one day to the next, but the Docker containers and the apps were still able to write to and access this disk. I can still mount, write, read and do anything else on the disk via UD, but the disk was shown as disabled at that time and the parity check was supposed to be done again.

Tvheadend and the SMB file system looked like they were working, with correct file dates and so on, but at the moment I am not able to find the correct files with dates in May. Files that looked available yesterday in Tvheadend are lost today, even after mounting the HGST 6TB HDD.

tower-diagnostics-20200505-0259.zip

Edited May 5, 2020 by aurevo
trurl Posted May 5, 2020

Your syslog is being spammed with these:
May 5 02:53:19 Tower root: error: /plugins/unassigned.devices/UnassignedDevices.php: wrong csrf_token

Here is the FAQ about that:
aurevo (Author) Posted May 6, 2020

On 5/5/2020 at 4:16 AM, trurl said:
> Your syslog is being spammed with these:
> May 5 02:53:19 Tower root: error: /plugins/unassigned.devices/UnassignedDevices.php: wrong csrf_token
> Here is the FAQ about that:

Fixed that. New diags attached.

This seems to be the error that goes with the slow rates:
May 5 12:07:33 Tower kernel: mpt2sas_cm0: log_info(0x31110d01): originator(PL), code(0x11), sub_code(0x0d01)
May 5 12:07:33 Tower kernel: sd 1:0:0:0: Power-on or device reset occurred

Could you tell me whether this is the SAS controller, or what else the problem is supposed to be?

tower-diagnostics-20200506-1301.zip
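One way to see whether the resets cluster on a single device is to tally them per SCSI address. This is a rough sketch that assumes the syslog line format quoted above. The function name `count_resets` is hypothetical, and mapping an address such as `1:0:0:0` back to a disk slot still has to be done from the diagnostics.

```python
import re
from collections import Counter

# Matches kernel lines of the form quoted in the thread, e.g.
#   "... kernel: sd 1:0:0:0: Power-on or device reset occurred"
RESET_RE = re.compile(
    r"kernel: sd (\d+:\d+:\d+:\d+): Power-on or device reset occurred"
)


def count_resets(lines):
    """Count 'Power-on or device reset occurred' events per SCSI address."""
    counts = Counter()
    for line in lines:
        match = RESET_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# The two lines posted in the thread, used here as sample input.
sample = [
    "May 5 12:07:33 Tower kernel: mpt2sas_cm0: log_info(0x31110d01): "
    "originator(PL), code(0x11), sub_code(0x0d01)",
    "May 5 12:07:33 Tower kernel: sd 1:0:0:0: Power-on or device reset occurred",
]

for address, n in count_resets(sample).items():
    print(f"sd {address}: {n} reset(s)")
```

On a live system the same function can be fed an open log file instead of the sample list. If nearly all resets land on one address, that points at that disk's cabling or port rather than at the controller as a whole.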
JorgeB Posted May 6, 2020

Replace/swap both cables (power + SATA) on disk7 and try again.