Disk offline - removed - Parity rebuild slow

JorgeB · April 9, 2020

Swap cables (both) or slot with another disk and try again.

Dissones4U · April 9, 2020

Your 8TB parity disk (sn#ZCT0LHSD) went from Seek_Error_Rate 120587123 on Sun Mar 29 to ===> 133347312 on Thu Apr 9

Your 4TB disk (sn#WFN1DP18) went from Seek_Error_Rate 34284786 on Sun Mar 29 to ===> 41689377 on Thu Apr 9

14 hours ago, aurevo said:

I only get these errors in the array, everything else is working fine?

If the disks work in another machine, then replace your power and data cables; otherwise, these numbers going up is not good and indicates a need to replace the disk.

On 4/1/2020 at 7:07 PM, aurevo said:

I have removed the (possibly) defective 2TB and 4TB hard disks, as they have caused problems in the past.

Why are you trying to add a disk back into the array when you say that it has caused problems in the past?

JorgeB · April 9, 2020

1 hour ago, Dissones4U said:

Your 8TB parity disk (sn#ZCT0LHSD) went from Seek_Error_Rate 120587123 on Sun Mar 29 to ===> 133347312 on Thu Apr 9

Your 4TB disk (sn#WFN1DP18) went from Seek_Error_Rate 34284786 on Sun Mar 29 to ===> 41689377 on Thu Apr 9

Seagate drives use a multibit value for that attribute (and a few others), so you can't just look at the RAW value totals:

https://forums.unraid.net/topic/86337-are-my-smart-reports-bad/?do=findComment&comment=800888
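For illustration, here is a minimal Python sketch of the commonly cited way to read these Seagate raw values. It assumes the 48-bit raw value packs an error count in the upper 16 bits and a running operation (seek) count in the lower 32 bits. That layout is an assumption taken from community knowledge, not something stated in this thread, and the function name is made up for the example.

```python
def split_seagate_raw(raw: int) -> tuple:
    """Split a 48-bit Seagate raw value into (error_count, operation_count).

    Assumption: upper 16 bits = errors, lower 32 bits = total operations.
    """
    errors = (raw >> 32) & 0xFFFF
    operations = raw & 0xFFFFFFFF
    return errors, operations


# Seek_Error_Rate raw values quoted in this thread (Mar 29 -> Apr 9)
readings = {
    "ZCT0LHSD": (120587123, 133347312),
    "WFN1DP18": (34284786, 41689377),
}

for serial, (before, after) in readings.items():
    err_before, ops_before = split_seagate_raw(before)
    err_after, ops_after = split_seagate_raw(after)
    print(f"{serial}: errors {err_before} -> {err_after}, "
          f"seeks +{ops_after - ops_before}")
```

Under that assumption, the error part is zero for both disks on both dates. Only the seek counter in the low bits grew, which is why the raw totals rise even though no seek errors were logged.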

Dissones4U · April 9, 2020

2 hours ago, johnnie.black said:

you can't just look at the RAW value totals:

I knew that was true for the raw read errors, but I didn't realize that it applied to other attributes as well, mainly because I didn't understand why it held true for the read errors attribute. Anyway, understanding now that they use a multibit value for these attributes helps, thanks. I plugged those into Google and they both converted to 0. It seems counter-intuitive that they would appear to go up rather than appearing static, but now I know.

Edited April 9, 2020 by Dissones4U

aurevo · April 9, 2020

6 hours ago, johnnie.black said:

Swap cables (both) or slot with another disk and try again.

I have already switched the cables and the power supply before.

6 hours ago, Dissones4U said:

Your 8TB parity disk (sn#ZCT0LHSD) went from Seek_Error_Rate 120587123 on Sun Mar 29 to ===> 133347312 on Thu Apr 9

Your 4TB disk (sn#WFN1DP18) went from Seek_Error_Rate 34284786 on Sun Mar 29 to ===> 41689377 on Thu Apr 9

If the disks work in another machine, then replace your power and data cables; otherwise, these numbers going up is not good and indicates a need to replace the disk.

Why are you trying to add a disk back into the array when you say that it has caused problems in the past?

I meant the errors on the Main page under Errors. There, the number of errors exploded after starting the array and rose to over one million. I did not mean the values on the SMART screen.

My parity disk is the newest disk of all and has had no errors in the past.

I tried it again because the disk showed no errors in any SMART check or extended SMART check, and I was able to copy all of its data to another disk without any error while it was mounted via UD outside of the array.

Edited April 9, 2020 by aurevo

Dissones4U · April 9, 2020

3 hours ago, aurevo said:

I tried it again because the disk had no errors in any smart check, extended smart check and I was able to copy all data to another disk without any error mounted via UD outside of array.

3 hours ago, aurevo said:

However, I was able to copy all the data until just before the end (because the disk I wanted to clone to was too small), and for the second time, and copy all the data that was on the 4TB disk to the array. <=== This isn't super clear but it sounds like you eventually succeeded in copying all of the data

If I understand you correctly (honestly, the thread was tough to follow) :

The 4TB disk in question is the only disk in the array throwing errors at this time
The same 4TB disk seems to work perfectly when outside of the array
You want to add it back into the array but it begins to error again

Assuming that everything above is correct, there are only a few points of possible failure and they all sit between the disk and the mobo:

Data and / or power cables
Data port
Power supply (possibly the breakout if not all disks involved) or you're splitting power from one point to too many disks

If there is more than one disk involved, then the point of failure is tougher to troubleshoot:

It could be the controller
It could be RAM
It could be the power supply
It could be the CPU

Only change one thing at a time between reboots and make sure that your syslog server is set up so that it is persistent.

aurevo · May 5, 2020

On 4/10/2020 at 12:39 AM, Dissones4U said:

If I understand you correctly (honestly, the thread was tough to follow) :

The 4TB disk in question is the only disk in the array throwing errors at this time

The same 4TB disk seems to work perfectly when outside of the array

You want to add it back into the array but it begins to error again

Assuming that everything above is correct, there are only a few points of possible failure and they all sit between the disk and the mobo:

Data and / or power cables

Data port

Power supply (possibly the breakout if not all disks involved) or you're splitting power from one point to too many disks

If there is more than one disk involved, then the point of failure is tougher to troubleshoot:

It could be the controller

It could be RAM

It could be the power supply

It could be the CPU

Only change one thing at a time between reboots and make sure that your syslog server is set up so that it is persistent.

Hello, it's me again.

After removing what I thought were the broken 2TB and 4TB hard disks, I created a new config without these disks.

After a few days without problems, I bought a new hard disk (HGST_HUS726060ALE614_W2503880), installed it, and had the system rebuilt.

The system ran for about 10 days and then the new hard disk failed without warning.

Since I couldn't imagine that three hard drives had failed so shortly after each other, I removed the new hard drive (6TB HGST) from the array, installed another new hard drive (the warranty replacement for the 4TB ST4000DM004, which had allegedly been broken before), changed the power cable, and connected the hard drive to an onboard port instead of the SAS controller. It also gets its power from a separate cable now, instead of the shared cable used before.

Unfortunately, the rebuild was then very slow.

After that, I started a rebuild of the array without the new hard disk (4TB), and the speed is still extremely slow.

The power supply was changed a few weeks ago. I also changed the power cables and SATA cables.

The crazy thing is that the hard disk (HGST_HUS726060ALE614_W2503880) was displayed as disabled from one day to the next, but the docker containers and the apps were still able to write to and access this disk.

I can still mount, write, read and do anything else on the disk via UD, but the disk was shown as disabled at that time and the parity check should be done again.

Okay, Tvheadend and the SMB file system looked like they worked, with correct file dates and so on, but at the moment I am not able to find the correct files dated in May. Files that looked available yesterday in Tvheadend are gone today, even after mounting the HGST 6TB HDD.

tower-diagnostics-20200505-0259.zip

Edited May 5, 2020 by aurevo

trurl · May 5, 2020

Your syslog is being spammed with these:

May  5 02:53:19 Tower root: error: /plugins/unassigned.devices/UnassignedDevices.php: wrong csrf_token

Here is the FAQ about that:

aurevo · May 6, 2020

On 5/5/2020 at 4:16 AM, trurl said:
Your syslog is being spammed with these:
May  5 02:53:19 Tower root: error: /plugins/unassigned.devices/UnassignedDevices.php: wrong csrf_token
Here is the FAQ about that:

Fixed that.

New diags attached.

This should be the error that goes with the slow rates:

May 5 12:07:33 Tower kernel: mpt2sas_cm0: log_info(0x31110d01): originator(PL), code(0x11), sub_code(0x0d01)
May 5 12:07:33 Tower kernel: sd 1:0:0:0: Power-on or device reset occurred
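To see how often these events occur in a full syslog, a small Python sketch like the following could be used. It extracts each `log_info` value and counts the device reset messages. The bit layout used to split the value (bus type, originator, code, sub-code) and the originator names are assumptions, chosen because they reproduce the fields the kernel prints in the line above. The function names are hypothetical.

```python
import re

# Assumed meaning of the 4-bit originator field (0 = IOP, 1 = PL, 2 = IR).
ORIGINATORS = {0: "IOP", 1: "PL", 2: "IR"}

LOG_INFO_RE = re.compile(r"log_info\((0x[0-9a-fA-F]+)\)")


def decode_log_info(value: int) -> dict:
    """Split a 32-bit log_info value into its assumed bit fields."""
    return {
        "bus_type": (value >> 28) & 0xF,
        "originator": ORIGINATORS.get((value >> 24) & 0xF, "unknown"),
        "code": (value >> 16) & 0xFF,
        "sub_code": value & 0xFFFF,
    }


def scan(lines):
    """Return (decoded log_info events, number of device reset messages)."""
    events, resets = [], 0
    for line in lines:
        match = LOG_INFO_RE.search(line)
        if match:
            events.append(decode_log_info(int(match.group(1), 16)))
        if "Power-on or device reset occurred" in line:
            resets += 1
    return events, resets


# The two syslog lines quoted above
sample = [
    "May 5 12:07:33 Tower kernel: mpt2sas_cm0: log_info(0x31110d01): "
    "originator(PL), code(0x11), sub_code(0x0d01)",
    "May 5 12:07:33 Tower kernel: sd 1:0:0:0: Power-on or device reset occurred",
]
events, resets = scan(sample)
print(events, resets)
```

Running `scan` over the lines of the real syslog instead of `sample` would show whether the resets cluster on one device address (here `sd 1:0:0:0`) or are spread across several disks.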

Could you say whether this is the SAS controller, or what else might be the problem?

tower-diagnostics-20200506-1301.zip

JorgeB · May 6, 2020

Replace/swap both cables (power + SATA) on disk7 and try again.
