Replaced failed drive, now Parity seems to have problems


aurevo


9 minutes ago, trurl said:

I see now after reviewing thread.

 

You can assign it back, but don't format if it gives you that option.

 

And you will have to rebuild parity.

 

Since I have to rebuild parity in any case, would it be possible to go directly to dual parity with two new 8TB hard disks, or is there anything that rules this out?

Link to comment
47 minutes ago, aurevo said:

I changed the SATA cable and connected the HDD to the onboard controller instead of the other one.

 

Does this look better in the logs, or are there still errors? Should these errors appear in the system log or in another one?

Looks fine now, no more ATA errors and SMART looks OK.

 

 

38 minutes ago, aurevo said:

But a few posts ago I tried to mount the "old previous disk" via Unassigned Devices and that worked. So I think I can assign it back and will have access to the data, correct?

If the old disk is mounting and SMART looks OK you can resync parity (including parity2 at the same time if you want), then copy the data back.

Link to comment
9 minutes ago, JorgeB said:

Looks fine now, no more ATA errors and SMART looks OK.

 

 

If the old disk is mounting and SMART looks OK you can resync parity (including parity2 at the same time if you want), then copy the data back.

 

So, just to double-check:

My plan would be to assign all drives as before, swapping Disk 3 (the empty new one) for the old one with the data on it, and changing from single parity to dual parity.

 

Would this be an option?

Link to comment
On 1/29/2024 at 3:50 PM, trurl said:

Correct. And

 

 

So I replaced all SATA data cables with new ones and started a parity rebuild to dual parity, following your tips.

 

The rebuild was successful, but last night the system became unreachable and I had to do an unclean restart.

 

Right now I see several errors in the system log. Can you check whether this could be a connection problem, or whether there are possible hardware failures?

 

For now I don't know what I should do next. Maybe replace an HDD, replace the SATA adapter, or something else.

And I don't know why the system froze or became unresponsive last night/this morning.

backup-diagnostics-20240204-1443.zip syslog

Link to comment

Errors like "Feb 4 15:00:14 Backup kernel: ata12.02:" refer to the device listed under System Devices as "[12:0:0:0] disk ATA HGST HUS726060AL WD05 /dev/sdi 6.00TB", correct?

 

So only one HDD is throwing these error messages at the moment, right?
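
One way to check how many links are logging messages is to count kernel lines per ATA label in the syslog. The following is only a minimal Python sketch, assuming the "Backup kernel: ataN[.MM]:" line format seen in this thread; the function name `count_ata_messages` is made up for illustration, and the sample lines are the ata1 messages posted later in this thread. It counts every kernel message carrying an ATA label, not only errors, and it does not map a label to a /dev/sdX device.

```python
import re
from collections import Counter

# Matches kernel lines such as "... kernel: ata12.02: ..." or "... kernel: ata1: ..."
ATA_LABEL = re.compile(r"kernel: (ata\d+(?:\.\d+)?):")

def count_ata_messages(lines):
    """Return a Counter of kernel messages per ATA label (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = ATA_LABEL.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Sample lines copied from the syslog excerpt posted in this thread.
SAMPLE = [
    "Feb 14 17:14:18 Backup kernel: ata1: link is slow to respond, please be patient (ready=0)",
    "Feb 14 17:14:22 Backup kernel: ata1: softreset failed (device not ready)",
    "Feb 14 17:14:23 Backup kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)",
    "Feb 14 17:14:23 Backup kernel: ata1.00: configured for UDMA/133",
]

for label, n in count_ata_messages(SAMPLE).most_common():
    print(label, n)
```

To run it against a real log, pass an open file instead of SAMPLE, for example `count_ata_messages(open("syslog", errors="replace"))`. If more than one label shows up with many messages, more than one link is affected.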

Link to comment
On 1/29/2024 at 7:34 AM, JorgeB said:

Still having ATA errors, note that using a Marvell controller and a controller with SATA port multipliers is not recommended, especially both together.

Looks like this controller is causing problems for disks 1 and 2.

 

Link to comment
On 2/5/2024 at 2:02 AM, trurl said:

Looks like this controller is causing problems for disks 1 and 2.

 

 

I changed the adapter to a crossflashed D2607-A21.

After a few hours one of the parity disks showed a red X (faulty/missing), so I replaced it today with a new HDD.

 

I started the parity rebuild, but a few moments ago Disk 1 got the same red X, so I shut the server down to work out what to do next. It's so annoying.

backup-diagnostics-20240212-1704.zip

Link to comment
On 2/12/2024 at 6:02 PM, JorgeB said:

Though in the syslog it still looks more like a power/connection issue, disk1 may be failing; run an extended SMART test on that disk.

 

 

 

I moved the disk to a power cable on a different lead from the PSU and moved its SATA cable from the D2607 to the onboard controller.

 

Some time after I started the extended SMART test, the system hung and became unavailable.
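
For reference, the extended test can also be started and checked from a shell with smartctl instead of the GUI. This is a rough Python sketch, not what the GUI does internally as far as I know; the helper names are hypothetical and /dev/sdX is a placeholder for the actual device. It only relies on the standard smartctl options `-t long` (start an extended self-test) and `-l selftest` (print the self-test log).

```python
import subprocess

def extended_test_command(device):
    """Build the smartctl call that starts an extended (long) self-test."""
    return ["smartctl", "-t", "long", device]

def selftest_log_command(device):
    """Build the smartctl call that prints the drive's self-test log."""
    return ["smartctl", "-l", "selftest", device]

def run(command):
    """Run a command and return its text output (smartctl normally needs root)."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.stdout + result.stderr

# Example (placeholder device, not executed here):
#   print(run(extended_test_command("/dev/sdX")))
#   ... wait for the test to finish ...
#   print(run(selftest_log_command("/dev/sdX")))
```

The self-test log shows whether a test completed, was interrupted, or failed, which is useful when the GUI loses track of a test after a hang.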

 

Does it look like a defective HDD, or still a cable or power problem? And what could be the reason for the whole system hanging?

backup-diagnostics-20240213-1813.zip syslog syslog-previous

Link to comment
9 minutes ago, JorgeB said:

This happens if something else accesses the disk, try again.

 

Tried it several times. Same error each time.

 

Feb 14 17:14:18 Backup kernel: ata1: link is slow to respond, please be patient (ready=0)
Feb 14 17:14:22 Backup kernel: ata1: softreset failed (device not ready)
Feb 14 17:14:23 Backup kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb 14 17:14:23 Backup kernel: ata1.00: configured for UDMA/133

 

Also, after a reboot, the array was stopped and the disk was missing from its slot.

Link to comment
2 minutes ago, aurevo said:

Feb 14 17:14:18 Backup kernel: ata1: link is slow to respond, please be patient (ready=0)
Feb 14 17:14:22 Backup kernel: ata1: softreset failed (device not ready)
Feb 14 17:14:23 Backup kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb 14 17:14:23 Backup kernel: ata1.00: configured for UDMA/133

These can also interrupt the test, replace/swap cables and try again.

Link to comment
3 hours ago, JorgeB said:

These can also interrupt the test, replace/swap cables and try again.

 

I switched the SATA cable to one from the crossflashed adapter and used a different power cable too.

 

The SMART test was at 20% when the system became partially unavailable again.

 

The web interface is unavailable and ping works, but no SSH session is possible; it gets stuck at the prompt.
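
To tell "the host answers ping" apart from "services still accept connections", the TCP ports can be probed directly from another machine. A minimal Python sketch follows; the function name `tcp_port_open` is hypothetical, and ports 22 (SSH) and 80 (web interface) are assumed defaults that may differ on a given server.

```python
import socket

def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port is established within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, `tcp_port_open("10.10.10.21", 22)` and `tcp_port_open("10.10.10.21", 80)`. A True result only means the TCP handshake completed; the service behind the port can still be hung, which would match a session that connects but then sticks at the prompt.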

2024-02-14 21_01_49-10.10.10.21 - PuTTY.png

syslog-10.10.10.21.log

Link to comment
3 hours ago, JorgeB said:

If the server is hanging/crashing you will need to try and fix that first, extremely unlikely that a SMART test is crashing the server.

 

Are there any hints in the logs as to what could be causing the crash or the system hang?

 

The server ran for months without any problems or dropouts, I just installed the same components in a new case and replaced the hard disks.

 

In the course of this I only updated UnRAID to the latest version, but at least I had no problems with this on my other system.

Link to comment
On 2/15/2024 at 6:22 PM, trurl said:

Have you done memtest?

 

Yes, the attached image shows the third run. I let the automatic run complete twice, then selected every test that could be selected and let that run.

 

The image is from yesterday. The test has kept running since then, still with no errors.

photo_2024-02-18_23-39-46.jpg

Link to comment
On 2/15/2024 at 1:08 PM, JorgeB said:

Nothing obvious that I can see, a couple of smartctl segfaults and some ATA errors, if you leave the server idle without doing anything does it still crash?

 

Yes. I restarted the server several times after it hung, and even without doing anything else after the restart, it froze again.

 

Interestingly, everything was fine while memtest was running. But maybe that was a coincidence.

syslog-10.10.10.21.log

Link to comment
