Data rebuild stopping

January 25, 2023

Hi everyone, I had a drive fail and I'm trying to replace it. However, the data rebuild essentially stops, and I've had 4 failed attempts at it. Each time it stops, always at a different point, the rebuild speed reads < 1 MB/s and the array shows no activity at all on any drive. In this state, the rebuild won't pause or cancel, nor will the system shut down or reboot. Pulling the cable is the only option.

In my last attempt, I was booted in safe mode with the array in maintenance mode and still had the same result.

I read a suggestion on another thread that 'new config' might fix the problem, but considering the failed drive and it being emulated, I would think that would result in data loss.

My diag is attached, and I'm open to any advice.

Thanks!

diagnostics-20230125-1403.zip

January 25, 2023

Community Expert

9 minutes ago, PZ303 said:

I read a suggestion on another thread that 'new config' might fix the problem, but considering the failed drive and it being emulated, I would think that would result in data loss.

New Config would make it impossible to rebuild. You misunderstood that other thread, or it doesn't apply to your situation, or someone is giving bad advice.

January 25, 2023

Author

@trurl that's what I thought. The other thread had some more nuance to it, since it wasn't a replacement and rebuild.

January 25, 2023

Community Expert

Jan 25 11:55:21 SilentCricket kernel: md: recovery thread: multiple disk errors, sector=5291262368

Disable Autostart in Disk Settings.

Shutdown, check all disk connections, SATA and power, both ends, including splitters.

Unassign disk1, start the array in normal mode (not maintenance mode) and post new diagnostics.
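For anyone wanting to check their own syslog for the line quoted at the top of this post, here is a minimal Python sketch. The regular expression is modeled only on that one quoted line, and the function name `find_md_recovery_errors` is made up for illustration; it is not part of Unraid or its diagnostics tooling.

```python
import re

# Pattern modeled on the single syslog line quoted in this thread (an assumption:
# other Unraid releases may word this message differently).
MD_ERROR = re.compile(
    r"kernel: md: recovery thread: multiple disk errors, sector=(?P<sector>\d+)"
)


def find_md_recovery_errors(syslog_text):
    """Return (line_number, sector) for each md recovery error line found."""
    hits = []
    for number, line in enumerate(syslog_text.splitlines(), start=1):
        match = MD_ERROR.search(line)
        if match:
            hits.append((number, int(match.group("sector"))))
    return hits


if __name__ == "__main__":
    sample = (
        "Jan 25 11:55:21 SilentCricket kernel: md: recovery thread: "
        "multiple disk errors, sector=5291262368\n"
    )
    print(find_md_recovery_errors(sample))
```

Feeding it the text of the syslog file from a diagnostics zip should list every place the recovery thread reported this error, which makes it easier to see what was logged just before and after.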

January 25, 2023

Author

Thanks for the quick reply, @trurl. This is in a racked array with a backplane and I did a shuffle of which bay the disks live in. Disk 1 is unassigned now and that was the replacement disk. The previous occupant of disk 1 threw tons of errors.

New diag attached.

diagnostics-20230125-1643.zip

January 25, 2023

Community Expert

2 minutes ago, PZ303 said:

The previous occupant of disk 1 threw tons of errors

Do you still have that disk? Possibly nothing wrong with it. Hang on to it in case we need its contents.

January 25, 2023

Community Expert

Do any of your disks have SMART warnings on the Dashboard page? (save me the trouble of examining every SMART report).

January 25, 2023

Author

I have the original disk. I just mounted it and its replacement outside the array to SMART test them, and it's erroring pretty badly. It's running an extended SMART test now.

I ran an extended SMART test on all drives yesterday to troubleshoot and found no issues, including on the new disk.

January 25, 2023

Community Expert

Emulated disk1 mounted and shows plenty of data.

Didn't see much in that previous syslog except that one line. Makes me wonder if the controller is dead or needs reseating. Usually you would see multiple lines showing the multiple disks involved.

06:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI SAS3008 PCI-Express Fusion-MPT SAS-3 [1000:0097] (rev 02)
	Subsystem: Super Micro Computer Inc AOC-S3008L-L8e [15d9:0808]
	Kernel driver in use: mpt3sas
	Kernel modules: mpt3sas

January 25, 2023

Community Expert

Looks like all of your array is on that same controller. Can you read the other disks OK?

January 25, 2023

Community Expert

Other disks must be readable or disk1 couldn't be emulated.

Did you check the connections?

January 25, 2023

Community Expert

NVM, I see what happened. md driver crashed right after that line in syslog.

Have you done memtest recently?

January 25, 2023

Author

I just shut down, checked and reseated the LSI (correct, there is only 1) and rebooted. But like you mentioned, all other disks have been fine, and Disk 1 has been moved to 2 or 3 locations on the backplane to validate that wasn't the issue.

I did run a memtest as a part of this a few days ago. I only let it do 1 full pass because there were zero errors. And it wasn't the included memtest; I downloaded memtest onto a new USB and ran it that way.

January 25, 2023

Author

One more thing worth mentioning is that both of these drives, the failed one and its replacement, are Renewed drives from Amazon. https://www.amazon.com/gp/product/B0BLJVKQKJ/ref=ppx_yo_dt_b_asin_title_o00_s00?ie=UTF8&psc=1 The first one failed SMART the second I installed it, but I foolishly continued with it, and within a month the system got completely unstable with it just being present. It's the one with the SMART errors I posted.

The replacement is the warranty replacement, still renewed, but it's passing SMART. It is throwing some errors though.

I'm starting to wonder if I just got double unlucky with renewed drives.

January 25, 2023

Community Expert

1 hour ago, trurl said:

Do any of your disks have SMART warnings on the Dashboard page? (save me the trouble of examining every SMART report).

January 26, 2023

Author

Yeah, sorry. I hadn't seen these errors, and the dashboard shows it as having passed SMART. It doesn't show the drive as healthy because it shows it as emulated. Are you also thinking it's just a bad drive?

January 26, 2023

Community Expert

Suggest trying a different Unraid release, like v6.10.3; the current kernel might not like your hardware. If the same happens with v6.10.3, that would suggest a hardware problem.

January 26, 2023

Community Expert

Are you sure you're talking about the Dashboard page? You might have to spin up the drives to see the thumbs-up/thumbs-down indicator.

January 26, 2023

Community Expert

SMART report for all array disks looks fine.

January 26, 2023

Author

@trurl - correct. All green checks. I was confusing the Emulated orange dot with the green check, but it doesn't report as failed, and when I ran the SMART test it was, and still is, saying passed. Despite that, the drive is racking up high raw read error rate and seek error rate values. The other drives all show zero while the numbers on this drive are climbing.

@JorgeB I was on 6.11.5 and it gives me a downgrade path to 6.11.4, so I'll start there. This is an upgraded build with a new 13700K and ASRock Z690 Steel Legend with the latest BIOS to support the new 13th-gen chips. 128 GB of RAM, memtested with zero errors on just one pass. These drives are both Seagate Exos from 2018, so I doubt they are unsupported.

I've ordered other drives to replace these, so in a few days I'll be able to tell if I just had 2 bad renewed drives. I guess one question is: could data corruption be causing this issue on rebuild?

image.png.a522441ba2896cdf45eee90593518ce0.png
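To put numbers on "climbing", one option is to save the attribute table twice and compare the raw values. The sketch below assumes text in the usual ten-column `smartctl -A` layout (ID#, name, flag, value, worst, thresh, type, updated, when failed, raw value). The helper names and the sample rows are invented for illustration and are not taken from the diagnostics in this thread.

```python
def parse_smart_attributes(smartctl_output):
    """Map attribute name -> raw value (int) from `smartctl -A` style text."""
    attributes = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns:
        # ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit():
            raw = fields[9]
            if raw.isdigit():
                attributes[fields[1]] = int(raw)
    return attributes


def climbing(before, after):
    """Return {name: increase} for attributes whose raw value went up."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] > before[name]
    }


if __name__ == "__main__":
    # Hypothetical sample rows, not real data from this server.
    first = "  1 Raw_Read_Error_Rate 0x000f 080 064 044 Pre-fail Always - 100\n"
    second = "  1 Raw_Read_Error_Rate 0x000f 080 064 044 Pre-fail Always - 250\n"
    print(climbing(parse_smart_attributes(first), parse_smart_attributes(second)))
```

Some vendors pack more than one counter into a raw value, so a rising number is a prompt to look closer and not proof of failure on its own.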

January 26, 2023

Community Expert

Just now, PZ303 said:

a downgrade path to 6.11.4

You can download the zip of any current release and just replace all the bz* files on flash to get that version. Of course you should back up the flash first.
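As a rough sketch of that procedure in Python: back up the flash contents, then copy the bz* files from the extracted release over the ones on flash. The function name and all three directory arguments are placeholders you would fill in yourself (the flash is assumed to be reachable as an ordinary directory, for example with the USB stick mounted on another machine); nothing here is an official Unraid tool.

```python
import shutil
from pathlib import Path


def replace_bz_files(release_dir, flash_dir, backup_dir):
    """Back up flash_dir to backup_dir, then copy bz* files from release_dir."""
    release_dir, flash_dir, backup_dir = map(
        Path, (release_dir, flash_dir, backup_dir)
    )
    # Full copy of the flash first; copytree raises an error if backup_dir
    # already exists, so an earlier backup is never silently overwritten.
    shutil.copytree(flash_dir, backup_dir)
    copied = []
    for source in sorted(release_dir.glob("bz*")):
        if source.is_file():
            shutil.copy2(source, flash_dir / source.name)
            copied.append(source.name)
    return copied
```

A call would look like `replace_bz_files("extracted_release", "path_to_flash", "flash_backup")`, with all three paths being hypothetical. Only files whose names start with bz are touched, so the config folder on flash is left as it was.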

January 26, 2023

Author

Thanks. I'll try that next. I had seen that option, but the in-GUI downgrade was a bit quicker and I'm going to be too busy with work today to mess with the flash. I presume the minor version downgrade won't really change things, but I figured why not while I'm busy today.

January 26, 2023

Author

So I just downgraded to 6.10.3 and the issue persists. The rebuild got unstable and stopped within 5 minutes.

I'll wait for the (now third) replacement drive from a different manufacturer and vendor and see if that solves the problem. 2 bad drives, while wildly bad luck, seem to be the most likely scenario here.

January 27, 2023

Community Expert

Post new diags from v6.10.3 to see if the Unraid driver is still crashing.

January 28, 2023

Author

Here is the diag when I try the rebuild on 6.10.3

silentcricket-diagnostics-20230127-2037.zip


Solved by PZ303
