[email protected] Posted July 14
My logs are suddenly filling with errors:
kernel: BTRFS error (device md8p1): bdev /dev/md8p1 errs: wr 0, rd 0, flush 0, corrupt 32996, gen 0
and
failed command: READ FPDMA QUEUED
Something very odd is that I cannot find what device md8p1 is; it's not listed on the Tools -> System Devices page. I can find ata32, which I think is the [32:0:0:0] disk ATA ST4000NE001-2MA1 EN01 /dev/sdi 4.00TB, but I have no idea what md8p1 could be. Also, a parity sync is running at the moment and the speed varies from 140 MB/sec to under 5 MB/sec. Any help figuring out what the problem is would be greatly appreciated. Diagnostics are attached. cerint-diagnostics-20240714-1923.zip
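As a side note, the per-device error counters in that kernel line (the same counters `btrfs device stats` reports on a mounted filesystem) can be pulled out with a quick shell sketch; the log line is hard-coded here purely for illustration:

```shell
# Extract the "corrupt" counter from the BTRFS error line quoted above.
# 32996 is the running count of checksum failures btrfs has recorded
# for this device since the counters were last reset.
line='BTRFS error (device md8p1): bdev /dev/md8p1 errs: wr 0, rd 0, flush 0, corrupt 32996, gen 0'
corrupt=$(echo "$line" | sed -n 's/.*corrupt \([0-9]*\).*/\1/p')
echo "$corrupt"   # 32996
```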
itimpi Posted July 14
md8 is disk8 on the Main tab. The /dev/md? type devices are created by the Unraid driver for each disk in the main array when it is started.
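A minimal sketch of the naming convention itimpi describes: Unraid creates /dev/mdXp1 for array slot X, so the slot number can be read straight out of the device name. The mapping from that slot to a physical /dev/sdX disk is shown on the Main tab; it is not derivable from the name alone.

```shell
# Pull the array slot number out of an Unraid md device name.
dev=md8p1
slot=$(echo "$dev" | sed -n 's/^md\([0-9]*\)p1$/\1/p')
echo "disk${slot}"   # disk8, i.e. slot 8 on the Main tab
```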
JorgeB Posted July 15
Replace the cables for that disk or swap with another one, then run a scrub.
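For reference, a scrub on an Unraid array disk is normally run against its /mnt/diskN mount. A minimal sketch, assuming disk8 is the affected slot; the helper function is hypothetical and only prints the commands rather than running them, since a real scrub needs the array started and mounted:

```shell
# Hypothetical helper: print the scrub commands for a given array slot.
# `btrfs scrub start -B` runs in the foreground; `btrfs scrub status`
# reports progress and any uncorrectable errors found.
scrub_cmds() {
  local slot="$1"
  printf 'btrfs scrub start -B /mnt/disk%s\n' "$slot"
  printf 'btrfs scrub status /mnt/disk%s\n' "$slot"
}
scrub_cmds 8
```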
[email protected] Posted July 15
I am using Mini SAS HD SFF-8643 to SFF-8087 cables. Each one is connected to 4 drives. If it was an issue with the cable, shouldn't more than one drive have errors?
itimpi Posted July 15
Just now, [email protected] said:
I am using Mini SAS HD SFF-8643 to SFF-8087 cables. Each one is connected to 4 drives. If it was an issue with the cable, shouldn't more than one drive have errors?
I have some cables like that where I can only get 3 of the 4 SATA connectors to work correctly, so it is definitely possible for not all the drives to be affected.
[email protected] Posted July 15
1 minute ago, itimpi said:
I have some cables like that where I can only get 3 of the 4 SATA connectors to work correctly, so it is definitely possible for not all the drives to be affected.
These cables connect to a backplane, not to individual drives. https://www.amazon.co.uk/ipolex-Internal-SFF-8643-SFF-8087-Foldable/dp/B0868H6L9D/ref=sr_1_4 I would expect things to happen in groups of 4 drives. I will move the cables around, see if another drive on another row starts throwing read errors, and report back.
[email protected] Posted July 25
So I removed all the drives and started adding them back in one by one, each time doing a "new config" and re-creating the parity. I am at the point where I added drive number 6, and during the parity sync it started showing read errors. I moved the drive to a different slot in the case and got read errors again. So it's not the cables, as each slot is connected to a different port/cable. Any other thoughts? Thanks, G
[email protected] Posted July 25
Pausing and resuming the sync, then pausing again, produced the following log: sdq is the drive in the array that now has the read errors. I have no idea what this log actually means, or why it is saying that sdq is now sdv when in the array it is still sdq.
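On the sdq/sdv point: Linux hands out /dev/sdX letters in detection order, so a disk that drops off the bus and is re-detected comes back under a new letter even though the array still tracks it by its original assignment. The stable handle is the serial-number symlink under /dev/disk/by-id/. A minimal sketch, with the symlink target hard-coded since the real listing depends on the hardware:

```shell
# On a live system: ls -l /dev/disk/by-id/ maps drive serial numbers to
# their current sdX kernel names. Here we just resolve a sample symlink
# target (as that listing would show it) to its kernel name.
target='../../sdv'
echo "current kernel name: $(basename "$target")"
```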
JorgeB Posted July 25
If you have used different cables/slots and the errors persist, it may be the disk.
[email protected] Posted July 27
OK, more updates. I added 3 x 4TB drives to my array using the new config tool. I started a parity sync and almost immediately one of the existing 16TB drives and one of the newly added 4TB drives started with the read errors. I stopped the sync, removed 2 of the 4TB drives and started the sync again. The 16TB drive did NOT throw any errors and the sync completed. I have no idea why the 16TB gave read errors when the 3 x 4TB drives were added at the same time. I have ordered some replacement drives and will remove the last 4TB drive. Very odd behaviour.
[email protected] Posted July 27
I did a reboot and got some new errors now. For some reason, disk 1 is now read-only.
[email protected] Posted July 27
Moving the data out of disk1 causes more errors:
JorgeB Posted July 27
All those disk errors are likely going to cause issues for btrfs.
[email protected] Posted July 27
I am trying to move back to XFS, but it's going to be a slow process. I need to empty each drive, remove it from the array, and add it back formatted as XFS. It's going to take weeks...
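The per-drive migration step described above can be sketched as follows. The /mnt/disk1 and /mnt/disk2 paths are assumptions following the usual Unraid mount convention, and this runnable version uses throwaway temporary directories in place of the real mounts so it can be tried safely:

```shell
# Sketch of one migration step: copy everything off the source disk onto
# another disk, then verify before reformatting the source as XFS.
# On the real array the copy would be something like:
#   rsync -avX /mnt/disk1/ /mnt/disk2/
src=$(mktemp -d)
dst=$(mktemp -d)
echo hello > "$src/file.txt"
cp -a "$src/." "$dst/"        # stand-in for the rsync above, so this runs anywhere
cat "$dst/file.txt"
```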
JonathanM Posted July 27
7 hours ago, [email protected] said:
I have no idea why the 16TB gave read errors when the 3 x 4TB drives were added at the same time.
Maybe the power got overloaded? How many drives do you have on each PSU lead? What size is your PSU? Any splitters? What kind? Molex-SATA or SATA-SATA?
[email protected] Posted July 28
20 hours ago, JonathanM said:
Maybe the power got overloaded? How many drives do you have on each PSU lead? What size is your PSU? Any splitters? What kind? Molex-SATA or SATA-SATA?
Got a brand new Corsair 850 watt. In total it's 12 SSDs and 12 HDDs, on 2 power leads from the PSU: 1 Molex and 1 SATA with an adaptor. Also, I ran a parity sync the day before and it was fine (no read errors and no errors in the logs), and today, with the same drives and the same config, I ran a parity sync again and I am getting:
kernel: I/O error, dev sdk, sector 31251758424 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
on an array drive that never had any errors before. I can't figure this out. From the moment I built this new server it has been a pain. I may need to scrap it and start fresh at this point.
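As a sanity check on that log line, the failing sector can be converted to a byte offset (kernel I/O errors report 512-byte sectors), which places it roughly 16.0 TB into the device, i.e. within the last stretch of a typical 16TB drive if sdk is one of the 16TB disks:

```shell
# Sector number from the I/O error, times 512 bytes per sector.
offset=$((31251758424 * 512))
echo "$offset bytes"   # 16000900313088, roughly 16.0 TB into the device
```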
JonathanM Posted July 28
12 minutes ago, [email protected] said:
12 SSDs and 12 HDDs, on 2 power leads from the PSU: 1 Molex and 1 SATA with an adaptor.
Is there any way you can use more leads? Preferably 4-pin Molex style; they can handle much more current. Each SATA connector can really only handle 2 drives, so if you try to use SATA splitters it's very easy to run out of current. It's much safer to use Molex -> SATA splitters if you really have to use splitters at all. Ideally you shouldn't use any splitters; rather, source a PSU with enough connections available. Many modular PSUs have extra cables available specifically for them.
[email protected] Posted July 28
It's a modular PSU, so I'll get a second PSU-to-Molex cable and remove the SATA one completely.
JonathanM Posted July 28
18 minutes ago, [email protected] said:
It's a modular PSU, so I'll get a second PSU-to-Molex cable and remove the SATA one completely.
Make sure you either source it directly from the manufacturer for that SPECIFIC power supply, or verify the pinouts with a tester of some sort before attaching any drives. There have been multiple accounts on this forum alone of people frying multiple hard drives with cables that physically fit perfectly but were pinned differently, feeding the drives 12V where they were expecting 5V.
Solution: [email protected] Posted August 3
It was the SAS ports on the motherboard. If I had drives connected to those ports, the array would become unstable. What was odd was that it was not only the drives connected to those 2 ports that would throw errors. As soon as I connected the drives to a new PCIe HBA it all sorted itself out. No more read errors, parity sync completes fine, etc... Thanks everyone for the help, G