Disk offline - removed - Parity rebuild slow


aurevo


Your 8TB parity disk (sn#ZCT0LHSD) went from Seek_Error_Rate 120587123 on Sun Mar 29 to ===> 133347312 on Thu Apr  9

Your 4TB disk (sn#WFN1DP18) went from Seek_Error_Rate 34284786 on Sun Mar 29 to ===> 41689377 on Thu Apr  9

14 hours ago, aurevo said:

I only get these errors in the array, everything else is working fine?

If the disks work in another machine, then replace your power and data cables. Otherwise, these numbers going up is not a good sign and indicates that the disk needs replacing.

On 4/1/2020 at 7:07 PM, aurevo said:

I have removed the (possibly) defective 2TB and 4TB hard disks, as they have caused problems in the past.

Why are you trying to add a disk back into the array when you say that it has caused problems in the past?

 

1 hour ago, Dissones4U said:

Your 8TB parity disk (sn#ZCT0LHSD) went from Seek_Error_Rate 120587123 on Sun Mar 29 to ===> 133347312 on Thu Apr  9

Your 4TB disk (sn#WFN1DP18) went from Seek_Error_Rate 34284786 on Sun Mar 29 to ===> 41689377 on Thu Apr  9

Seagate drives use a multibit value for that attribute (and a few others), so you can't just look at the RAW value totals:

https://forums.unraid.net/topic/86337-are-my-smart-reports-bad/?do=findComment&comment=800888

 

2 hours ago, johnnie.black said:

you can't just look at the RAW value totals:

I knew that was true for the raw read errors, but I didn't realize that it applied to other attributes as well, mainly because I didn't understand why it held true for the read errors attribute. Anyway, understanding now that they use a multibit value for these attributes helps, thanks. I plugged those into Google and they both converted to 0. It seems counter-intuitive that the values would appear to go up rather than stay static, but now I know.
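For anyone who wants to check the conversion by hand, here is a minimal Python sketch. It assumes the commonly reported Seagate layout for Seek_Error_Rate, where the 48-bit raw value holds the error count in the upper 16 bits and the total number of seeks in the lower 32 bits. That layout is an assumption here, not something the drive reports itself, and the function name `decode_seagate_raw` is made up for illustration.

```python
from typing import Tuple


def decode_seagate_raw(raw: int) -> Tuple[int, int]:
    """Split a 48-bit raw value into (error_count, operation_count).

    Assumed layout: upper 16 bits = errors, lower 32 bits = operations.
    """
    errors = (raw >> 32) & 0xFFFF
    operations = raw & 0xFFFFFFFF
    return errors, operations


# Raw Seek_Error_Rate values quoted in this thread (Mar 29 -> Apr 9).
readings = {
    "ZCT0LHSD": (120587123, 133347312),
    "WFN1DP18": (34284786, 41689377),
}

for serial, (before, after) in readings.items():
    err_before, seeks_before = decode_seagate_raw(before)
    err_after, seeks_after = decode_seagate_raw(after)
    print(
        f"{serial}: errors {err_before} -> {err_after}, "
        f"seeks {seeks_before} -> {seeks_after}"
    )
```

Under that assumption, all four values are below 2^32, so the error part is 0 in every case and only the seek counter has grown, which would explain why the raw totals rise on a healthy disk.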

 

Edited by Dissones4U
6 hours ago, johnnie.black said:

Swap cables (both) or slot with another disk and try again.

 

I have already swapped the cables and the power supply in the past.

 

6 hours ago, Dissones4U said:

Your 8TB parity disk (sn#ZCT0LHSD) went from Seek_Error_Rate 120587123 on Sun Mar 29 to ===> 133347312 on Thu Apr  9

Your 4TB disk (sn#WFN1DP18) went from Seek_Error_Rate 34284786 on Sun Mar 29 to ===> 41689377 on Thu Apr  9

If the disks work in another machine, then replace your power and data cables. Otherwise, these numbers going up is not a good sign and indicates that the disk needs replacing.

Why are you trying to add a disk back into the array when you say that it has caused problems in the past?

 

 

I meant the errors on the Main page under Errors. There, the number of errors exploded after starting the array and rose to over one million. I did not mean the values on the SMART screen.

 

My parity disk is the newest of all the disks and has had no errors in the past.

 

I tried it again because the disk showed no errors in any SMART check, including the extended one, and I was able to copy all of its data to another disk without any errors while it was mounted via UD outside of the array.

Edited by aurevo
3 hours ago, aurevo said:

I tried it again because the disk showed no errors in any SMART check, including the extended one, and I was able to copy all of its data to another disk without any errors while it was mounted via UD outside of the array.

3 hours ago, aurevo said:

However, I was able to copy all the data until just before the end (because the disk I wanted to clone to was too small), and for the second time, and copy all the data that was on the 4TB disk to the array. <=== This isn't super clear but it sounds like you eventually succeeded in copying all of the data

If I understand you correctly (honestly, the thread was tough to follow) :

  • The 4TB disk in question is the only disk in the array throwing errors at this time
  • The same 4TB disk seems to work perfectly when outside of the array
  • You want to add it back into the array but it begins to error again

Assuming that everything above is correct, there are only a few points of possible failure and they all sit between the disk and the mobo:

  • Data and / or power cables
  • Data port
  • Power supply (possibly just the breakout cable if not all disks are affected), or too many disks powered from a single split

If there is more than one disk involved, then the point of failure is tougher to troubleshoot:

  • It could be the controller
  • It could be RAM
  • It could be the power supply
  • It could be the CPU

Only change one thing at a time between reboots and make sure that your syslog server is set up so that it is persistent.

  • 4 weeks later...
On 4/10/2020 at 12:39 AM, Dissones4U said:

If I understand you correctly (honestly, the thread was tough to follow) :

  • The 4TB disk in question is the only disk in the array throwing errors at this time
  • The same 4TB disk seems to work perfectly when outside of the array
  • You want to add it back into the array but it begins to error again

Assuming that everything above is correct, there are only a few points of possible failure and they all sit between the disk and the mobo:

  • Data and / or power cables
  • Data port
  • Power supply (possibly just the breakout cable if not all disks are affected), or too many disks powered from a single split

If there is more than one disk involved, then the point of failure is tougher to troubleshoot:

  • It could be the controller
  • It could be RAM
  • It could be the power supply
  • It could be the CPU

Only change one thing at a time between reboots and make sure that your syslog server is set up so that it is persistent.

 

Hello, it's me again.

 

Thinking that I had removed the broken 2TB and 4TB hard disks, I created a new config without them.


After a few days without problems, I bought a new hard disk (HGST_HUS726060ALE614_W2503880), installed it, and had the system rebuilt.

The system ran for about 10 days and then the new hard disk failed without warning.

 

Since I couldn't imagine three hard drives failing so shortly after one another, I removed the new hard drive (6TB HGST) from the array and installed another new hard drive (the warranty replacement for the 4TB ST4000DM004, which had allegedly been broken before). I also changed the power cable, connected the drive to an onboard port instead of the SAS controller, and powered it from a different cable instead of the shared one used before.

 

Unfortunately the speed of the rebuild was then very slow.

After that, I started a rebuild of the array without the new hard disk (4TB), and the speed is still extremely slow.

 

The power supply was changed a few weeks ago, along with the power cables and SATA cables.

 

The crazy thing is that the hard disk (HGST_HUS726060ALE614_W2503880) was displayed as disabled from one day to the next, but the docker containers and the apps were still able to write to and access this disk.

I can still mount, write, read and do anything else on the disk via UD, but the disk was shown as disabled at that time and the parity check should be done again.

 

Okay, TVheadend and the SMB file system looked like they were working, with correct file dates and so on, but at the moment I cannot find the files dated in May. Files that looked available in TVheadend yesterday are missing today, even after mounting the HGST 6TB HDD.

 

tower-diagnostics-20200505-0259.zip

Edited by aurevo
On 5/5/2020 at 4:16 AM, trurl said:

Your syslog is being spammed with these:


May  5 02:53:19 Tower root: error: /plugins/unassigned.devices/UnassignedDevices.php: wrong csrf_token

Here is the FAQ about that:

 

 

Fixed that.

 

New diags attached.

 

This should be the error that comes with the slow rates:

May 5 12:07:33 Tower kernel: mpt2sas_cm0: log_info(0x31110d01): originator(PL), code(0x11), sub_code(0x0d01)
May 5 12:07:33 Tower kernel: sd 1:0:0:0: Power-on or device reset occurred
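To see whether these messages cluster on one device or are spread across the controller, the syslog can be tallied with a small script. The following is only a sketch: the regular expressions are written against the two message shapes shown above, the helper name `summarize` is hypothetical, and other kernel versions may format these lines differently.

```python
import re
from collections import Counter

# Patterns modelled on the two kernel messages quoted in this post.
RESET_RE = re.compile(
    r"kernel: sd (\d+:\d+:\d+:\d+): Power-on or device reset occurred"
)
LOG_INFO_RE = re.compile(r"kernel: (mpt2sas_cm\d+): log_info\((0x[0-9a-f]+)\)")


def summarize(lines):
    """Count device resets per SCSI address and log_info codes per controller."""
    resets = Counter()
    log_infos = Counter()
    for line in lines:
        reset = RESET_RE.search(line)
        if reset:
            resets[reset.group(1)] += 1
        info = LOG_INFO_RE.search(line)
        if info:
            log_infos[(info.group(1), info.group(2))] += 1
    return resets, log_infos


sample = [
    "May 5 12:07:33 Tower kernel: mpt2sas_cm0: log_info(0x31110d01): "
    "originator(PL), code(0x11), sub_code(0x0d01)",
    "May 5 12:07:33 Tower kernel: sd 1:0:0:0: Power-on or device reset occurred",
]

resets, log_infos = summarize(sample)
print(dict(resets))
print(dict(log_infos))
```

To run it against a real log, pass an open file instead of `sample`, for example `summarize(open("syslog.txt"))`, where `syslog.txt` stands in for the syslog file extracted from the diagnostics zip. If resets pile up on a single SCSI address, that points at one disk or its cabling; resets across many addresses on the same controller would point more toward the controller or power.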

 

Could you tell me whether this is the SAS controller, or what else the problem is likely to be?

tower-diagnostics-20200506-1301.zip
