[email protected] Posted July 14
My logs are suddenly filling with errors:
kernel: BTRFS error (device md8p1): bdev /dev/md8p1 errs: wr 0, rd 0, flush 0, corrupt 32996, gen 0
and
failed command: READ FPDMA QUEUED
Something very odd is that I cannot find what device md8p1 is; it's not listed on the Tools -> System Devices page. I can find ata32, which I think is the [32:0:0:0] disk ATA ST4000NE001-2MA1 EN01 /dev/sdi 4.00TB, but I have no idea what md8p1 could be. Also, a parity sync is running at the moment and the speed varies from 140 MB/sec to under 5 MB/sec. Any help figuring out what the problem is would be greatly appreciated. Diagnostics are attached. cerint-diagnostics-20240714-1923.zip
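As a side note, the per-device error counters in that kernel line (the same counters `btrfs device stats` reports on a mounted filesystem) can be pulled out with a quick shell sketch; the log line is hard-coded here purely for illustration:

```shell
# Extract the "corrupt" counter from the BTRFS error line quoted above.
# 32996 is the running count of checksum failures btrfs has recorded
# for this device since the counters were last reset.
line='BTRFS error (device md8p1): bdev /dev/md8p1 errs: wr 0, rd 0, flush 0, corrupt 32996, gen 0'
corrupt=$(echo "$line" | sed -n 's/.*corrupt \([0-9]*\).*/\1/p')
echo "$corrupt"   # 32996
```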
itimpi Posted July 14
md8 is disk8 on the Main tab. The /dev/md? type devices are created by the Unraid driver for each disk in the main array when it is started.
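A minimal sketch of the naming convention itimpi describes: Unraid creates /dev/mdXp1 for array slot X, so the slot number can be read straight out of the device name. The mapping from that slot to a physical /dev/sdX disk is shown on the Main tab; it is not derivable from the name alone.

```shell
# Pull the array slot number out of an Unraid md device name.
dev=md8p1
slot=$(echo "$dev" | sed -n 's/^md\([0-9]*\)p1$/\1/p')
echo "disk${slot}"   # disk8, i.e. slot 8 on the Main tab
```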
JorgeB Posted July 15
Replace the cables for that disk or swap with another one, then run a scrub.
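For reference, a scrub on an Unraid array disk is normally run against its /mnt/diskN mount. A minimal sketch, assuming disk8 is the affected slot; the helper function is hypothetical and only prints the commands rather than running them, since a real scrub needs the array started and mounted:

```shell
# Hypothetical helper: print the scrub commands for a given array slot.
# `btrfs scrub start -B` runs in the foreground; `btrfs scrub status`
# reports progress and any uncorrectable errors found.
scrub_cmds() {
  local slot="$1"
  printf 'btrfs scrub start -B /mnt/disk%s\n' "$slot"
  printf 'btrfs scrub status /mnt/disk%s\n' "$slot"
}
scrub_cmds 8
```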
[email protected] Posted July 15
I am using Mini SAS HD SFF-8643 to SFF-8087 cables. Each one is connected to 4 drives. If it was an issue with the cable, shouldn't more than one drive have errors?
itimpi Posted July 15
Just now, [email protected] said:
I am using Mini SAS HD SFF-8643 to SFF-8087 cables. Each one is connected to 4 drives. If it was an issue with the cable, shouldn't more than one drive have errors?
I have some cables like that where I can only get 3 of the 4 SATA connectors to work correctly, so it is definitely possible for not all the drives to be affected.
[email protected] Posted July 15
1 minute ago, itimpi said:
I have some cables like that where I can only get 3 of the 4 SATA connectors to work correctly, so it is definitely possible for not all the drives to be affected.
These cables connect to a backplane, not to individual drives. https://www.amazon.co.uk/ipolex-Internal-SFF-8643-SFF-8087-Foldable/dp/B0868H6L9D/ref=sr_1_4 I would expect things to happen in groups of 4 drives. I will move the cables around, see if another drive on another row starts throwing read errors, and report back.
[email protected] Posted July 25
So I removed all the drives and started adding them back in one by one, each time doing a "new config" and re-creating the parity. I am at the point where I added drive number 6, and during the parity sync it started showing read errors. I moved the drive to a different slot in the case and got read errors again. So it's not the cables, as each slot is connected to a different port/cable. Any other thoughts? Thanks, G
[email protected] Posted July 25
Pausing and resuming the sync, then pausing again, produced the following log: sdq is the drive in the array that now has the read errors. I have no idea what this log actually means, or why it is saying that sdq is now sdv when in the array it is still sdq.
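On the sdq/sdv point: Linux hands out /dev/sdX letters in detection order, so a disk that drops off the bus and is re-detected comes back under a new letter even though the array still tracks it by its original assignment. The stable handle is the serial-number symlink under /dev/disk/by-id/. A minimal sketch, with the symlink target hard-coded since the real listing depends on the hardware:

```shell
# On a live system: ls -l /dev/disk/by-id/ maps drive serial numbers to
# their current sdX kernel names. Here we just resolve a sample symlink
# target (as that listing would show it) to its kernel name.
target='../../sdv'
echo "current kernel name: $(basename "$target")"
```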
JorgeB Posted July 25
If you have used different cables/slots and the errors persist, it may be the disk.
[email protected] Posted July 27
OK, more updates. I added 3 x 4TB drives to my array using the new config tool. I started a parity sync and almost immediately one of the existing 16TB drives and one of the newly added 4TB drives started with the read errors. I stopped the sync, removed 2 of the 4TB drives and started the sync again. The 16TB drive did NOT throw any errors and the sync completed. I have no idea why the 16TB gave read errors when the 3 x 4TB drives were added at the same time. I have ordered some replacement drives and will remove the last 4TB drive. Very odd behaviour.
[email protected] Posted July 27
I did a reboot and got some new errors now. For some reason, disk 1 is now read-only.
[email protected] Posted July 27
Moving the data out of disk1 causes more errors:
JorgeB Posted July 27
All those disk errors are likely going to cause issues for btrfs.
[email protected] Posted July 27
I am trying to move back to XFS, but it's going to be a slow process. I need to empty each drive, remove it from the array, and add it back formatted as XFS. It's going to take weeks...
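The per-drive migration step described above can be sketched as follows. The /mnt/disk1 and /mnt/disk2 paths are assumptions following the usual Unraid mount convention, and this runnable version uses throwaway temporary directories in place of the real mounts so it can be tried safely:

```shell
# Sketch of one migration step: copy everything off the source disk onto
# another disk, then verify before reformatting the source as XFS.
# On the real array the copy would be something like:
#   rsync -avX /mnt/disk1/ /mnt/disk2/
src=$(mktemp -d)
dst=$(mktemp -d)
echo hello > "$src/file.txt"
cp -a "$src/." "$dst/"        # stand-in for the rsync above, so this runs anywhere
cat "$dst/file.txt"
```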
JonathanM Posted July 27
7 hours ago, [email protected] said:
I have no idea why the 16TB gave read errors when the 3 x 4TB drives were added at the same time.
Maybe the power got overloaded? How many drives do you have on each PSU lead? What size is your PSU? Any splitters? What kind? Molex-SATA or SATA-SATA?
[email protected] Posted July 28
20 hours ago, JonathanM said:
Maybe the power got overloaded? How many drives do you have on each PSU lead? What size is your PSU? Any splitters? What kind? Molex-SATA or SATA-SATA?
Got a brand new Corsair 850 watt. In total it's 12 SSDs and 12 HDDs, on 2 power leads from the PSU: 1 Molex and 1 SATA with an adaptor. Also, I ran a parity sync the day before and it was fine (no read errors and no errors in the logs), and today, with the same drives and the same config, I ran a parity sync again and I am getting:
kernel: I/O error, dev sdk, sector 31251758424 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
on an array drive that never had any errors before. I can't figure this out. From the moment I built this new server it has been a pain. I may need to scrap it and start fresh at this point.
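As a sanity check on that log line, the failing sector can be converted to a byte offset (kernel I/O errors report 512-byte sectors), which places it roughly 16.0 TB into the device, i.e. within the last stretch of a typical 16TB drive if sdk is one of the 16TB disks:

```shell
# Sector number from the I/O error, times 512 bytes per sector.
offset=$((31251758424 * 512))
echo "$offset bytes"   # 16000900313088, roughly 16.0 TB into the device
```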
JonathanM Posted July 28
12 minutes ago, [email protected] said:
12 SSDs and 12 HDDs, on 2 power leads from the PSU: 1 Molex and 1 SATA with an adaptor.
Is there any way you can use more leads? Preferably 4-pin Molex style; they can handle much more current. Each SATA connector can really only handle 2 drives, so if you try to use SATA splitters it's very easy to run out of current. It's much safer to use Molex -> SATA splitters if you really have to use splitters at all. Ideally you shouldn't use any splitters; rather, source a PSU with enough connections available. Many modular PSUs have extra cables available specifically for them.
[email protected] Posted July 28
It's a modular PSU, so I'll get a second PSU-to-Molex cable and remove the SATA one completely.
JonathanM Posted July 28
18 minutes ago, [email protected] said:
It's a modular PSU, so I'll get a second PSU-to-Molex cable and remove the SATA one completely.
Make sure you either source it directly from the manufacturer for that SPECIFIC power supply, or verify the pinouts with a tester of some sort before attaching any drives. There have been multiple accounts on this forum alone of people frying multiple hard drives with cables that physically fit perfectly but were pinned differently, feeding the drives 12V where they were expecting 5V.
Solution: [email protected] Posted August 3
It was the SAS ports on the motherboard. If I had drives connected to those ports, the array would become unstable. What was odd was that it was not only the drives connected to those 2 ports that would throw errors. As soon as I connected the drives to a new PCIe HBA it all sorted itself out. No more read errors, parity sync completes fine, etc... Thanks everyone for the help, G