Cannot build parity on new hard drives


Stubbs


Consider this a semi-continuation from my previous thread (https://forums.unraid.net/topic/128808-need-help-upgrading-my-hdds/#comment-1173836)

But this is a separate problem.

 

To summarize:

I bought four new 10TB Seagate Ironwolf hard drives to upgrade the storage of my current array, which is: three 3TB WD Reds (one parity) and two 2TB WD Reds.

As people on this forum instructed me, I started by replacing the WD Red 3TB parity drive with one of the Ironwolf 10TB drives. I shut my server down, removed the WD Red, installed the Seagate in its place, and booted the server back up.

I headed to the "Main" menu, confirmed there was no parity drive, and found my Seagate under Unassigned Devices. I stopped the array, assigned the Seagate Ironwolf as parity, started the array and... errors.

 

Straight away, I ran diagnostics (see attachment: first attempt). It took a few minutes for the array to even start, but when it finally did, parity almost immediately started returning errors and a read-check was initiated.

 

Before making this thread, I thought I'd do some extra tests. I powered down the server, took the Seagate Ironwolf out and replaced it with another of the brand new Seagate Ironwolfs I bought. I powered the server back on, tried to build parity with this second Seagate, and it returned the same errors (see attachment: second attempt).

 

Finally, I put my old WD Red 3TB parity drive back in. Once again I triggered a parity rebuild and... it worked fine; parity started rebuilding without any errors.

 

Can anyone explain to me what the problem is? Is it another case of hardware connectivity issues? Is it something to do with it being a different brand of HDD? (I thought Unraid didn't care about this.) Did I somehow buy two dud HDDs?

 

Here are some log snippets (not that it matters much):

 

Sep 28 23:52:19 Tower  avahi-daemon[9851]: Interface vethec6bd6e.IPv6 no longer relevant for mDNS.
Sep 28 23:52:19 Tower  avahi-daemon[9851]: Leaving mDNS multicast group on interface vethec6bd6e.IPv6 with address fe80::704d:adff:fe0f:2f34.
Sep 28 23:52:19 Tower kernel: docker0: port 10(vethec6bd6e) entered disabled state
Sep 28 23:52:19 Tower kernel: device vethec6bd6e left promiscuous mode
Sep 28 23:52:19 Tower kernel: docker0: port 10(vethec6bd6e) entered disabled state
Sep 28 23:52:19 Tower  avahi-daemon[9851]: Withdrawing address record for fe80::704d:adff:fe0f:2f34 on vethec6bd6e.
Sep 28 23:52:20 Tower kernel: ata6: found unknown device (class 0)
Sep 28 23:52:25 Tower kernel: ata6: softreset failed (1st FIS failed)
Sep 28 23:52:25 Tower kernel: ata6: hard resetting link
Sep 28 23:52:30 Tower kernel: ata6: found unknown device (class 0)
Sep 28 23:52:30 Tower kernel: ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Sep 28 23:52:31 Tower kernel: ata6.00: configured for UDMA/133
Sep 28 23:52:31 Tower kernel: sd 6:0:0:0: [sdh] tag#3 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=16s
Sep 28 23:52:31 Tower kernel: sd 6:0:0:0: [sdh] tag#3 Sense Key : 0x5 [current] 
Sep 28 23:52:31 Tower kernel: sd 6:0:0:0: [sdh] tag#3 ASC=0x21 ASCQ=0x4 
Sep 28 23:52:31 Tower kernel: sd 6:0:0:0: [sdh] tag#3 CDB: opcode=0x88 88 00 00 00 00 00 00 00 00 10 00 00 00 08 00 00
Sep 28 23:52:31 Tower kernel: I/O error, dev sdh, sector 16 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
Sep 28 23:52:31 Tower kernel: sd 6:0:0:0: [sdh] tag#7 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=16s
Sep 28 23:52:31 Tower kernel: sd 6:0:0:0: [sdh] tag#7 Sense Key : 0x5 [current] 
Sep 28 23:52:31 Tower kernel: sd 6:0:0:0: [sdh] tag#7 ASC=0x21 ASCQ=0x4 
Sep 28 23:52:31 Tower kernel: sd 6:0:0:0: [sdh] tag#7 CDB: opcode=0x88 88 00 00 00 00 00 00 00 01 08 00 00 00 f8 00 00
Sep 28 23:52:31 Tower kernel: I/O error, dev sdh, sector 264 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 0
Sep 28 23:52:31 Tower kernel: ata6: EH complete
Sep 28 23:52:32 Tower kernel: ata6.00: exception Emask 0x50 SAct 0x400 SErr 0xb0802 action 0xe frozen
Sep 28 23:52:32 Tower kernel: ata6.00: irq_stat 0x00400000, PHY RDY changed
Sep 28 23:52:32 Tower kernel: ata6: SError: { RecovComm HostInt PHYRdyChg PHYInt 10B8B }
Sep 28 23:52:32 Tower kernel: ata6.00: failed command: READ FPDMA QUEUED
Sep 28 23:52:32 Tower kernel: ata6.00: cmd 60/08:50:40:20:00/00:00:00:00:00/40 tag 10 ncq dma 4096 in
Sep 28 23:52:32 Tower kernel:         res 40/00:50:40:20:00/00:00:00:00:00/40 Emask 0x50 (ATA bus error)
Sep 28 23:52:32 Tower kernel: ata6.00: status: { DRDY }
Sep 28 23:52:32 Tower kernel: ata6: hard resetting link
Sep 28 23:52:32 Tower kernel: ata6: SATA link down (SStatus 0 SControl 300)
Sep 28 23:52:33 Tower kernel: ata6: hard resetting link
Sep 28 23:52:39 Tower kernel: ata6: found unknown device (class 0)
Sep 28 23:52:43 Tower kernel: ata6: softreset failed (1st FIS failed)
Sep 28 23:52:43 Tower kernel: ata6: hard resetting link
Sep 28 23:52:49 Tower kernel: ata6: found unknown device (class 0)
Sep 28 23:52:49 Tower kernel: ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Sep 28 23:52:49 Tower kernel: ata6.00: configured for UDMA/133
Sep 28 23:52:49 Tower kernel: sd 6:0:0:0: [sdh] tag#10 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=17s
Sep 28 23:52:49 Tower kernel: sd 6:0:0:0: [sdh] tag#10 Sense Key : 0x5 [current] 
Sep 28 23:52:49 Tower kernel: sd 6:0:0:0: [sdh] tag#10 ASC=0x21 ASCQ=0x4 
Sep 28 23:52:49 Tower kernel: sd 6:0:0:0: [sdh] tag#10 CDB: opcode=0x88 88 00 00 00 00 00 00 00 20 40 00 00 00 08 00 00
Sep 28 23:52:49 Tower kernel: I/O error, dev sdh, sector 8256 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
Sep 28 23:52:49 Tower kernel: ata6: EH complete
Sep 28 23:52:49 Tower  emhttpd: error: hotplug_devices, 1730: No such file or directory (2): Error: tagged device ST10000VN000-3AK101_WWY036M2 was (sde) is now (sdh)
Sep 28 23:52:49 Tower  emhttpd: read SMART /dev/sdh
Sep 28 23:52:49 Tower kernel: emhttpd[5074]: segfault at 674 ip 0000000000413f90 sp 00007ffcc22ab490 error 4 in emhttpd[403000+1d000]
Sep 29 00:29:47 Tower kernel: SVM: TSC scaling supported
Sep 29 00:29:47 Tower kernel: kvm: Nested Virtualization enabled
Sep 29 00:29:47 Tower kernel: SVM: kvm: Nested Paging enabled
Sep 29 00:29:47 Tower kernel: SEV supported: 16 ASIDs
Sep 29 00:29:47 Tower kernel: SEV-ES supported: 4294967295 ASIDs
Sep 29 00:29:47 Tower kernel: SVM: Virtual VMLOAD VMSAVE supported
Sep 29 00:29:47 Tower kernel: SVM: Virtual GIF supported
Sep 29 00:29:47 Tower kernel: SVM: LBR virtualization supported
Sep 29 00:29:47 Tower kernel: IPv6: ADDRCONF(NETDEV_CHANGE): veth9d6e238: link becomes ready
Sep 29 00:29:47 Tower kernel: docker0: port 9(veth9d6e238) entered blocking state
Sep 29 00:29:47 Tower kernel: docker0: port 9(veth9d6e238) entered forwarding state
Sep 29 00:29:47 Tower kernel: tun: Universal TUN/TAP device driver, 1.6
Sep 29 00:29:47 Tower kernel: mdcmd (36): check 
Sep 29 00:29:47 Tower kernel: md: recovery thread: recon P ...
Sep 29 00:29:47 Tower kernel: md: disk0 write error, sector=0
Sep 29 00:29:47 Tower kernel: md: disk0 write error, sector=8
Sep 29 00:29:47 Tower kernel: md: disk0 write error, sector=16
Sep 29 00:29:47 Tower kernel: md: disk0 write error, sector=24
Sep 29 00:29:47 Tower kernel: md: disk0 write error, sector=32
Sep 29 00:29:47 Tower kernel: md: disk0 write error, sector=40
Sep 29 00:29:47 Tower kernel: md: disk0 write error, sector=48
Sep 29 00:29:47 Tower kernel: md: disk0 write error, sector=56
Sep 29 00:29:47 Tower kernel: md: disk0 write error, sector=64
Sep 29 00:29:47 Tower kernel: md: disk0 write error, sector=72
Sep 29 00:29:47 Tower kernel: md: disk0 write error, sector=80
Sep 29 00:29:47 Tower kernel: md: disk0 write error, sector=88
Sep 29 00:29:47 Tower kernel: md: disk0 write error, sector=96
Sep 29 00:29:47 Tower kernel: md: disk0 write error, sector=104
Sep 29 00:29:47 Tower kernel: md: disk0 write error, sector=112
Sep 29 00:29:47 Tower kernel: md: disk0 write error, sector=120
Sep 29 00:29:47 Tower kernel: md: disk0 write error, sector=128
Sep 29 00:29:47 Tower kernel: md: disk0 write error, sector=136
Sep 29 00:29:47 Tower kernel: md: disk0 write error, sector=144
Sep 29 00:29:47 Tower kernel: md: disk0 write error, sector=152
Sep 29 00:29:47 Tower kernel: md: disk0 write error, sector=160
Sep 29 00:29:47 Tower kernel: md: disk0 write error, sector=168
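For anyone following along, this is roughly how I pulled those snippets out of the full syslog (a rough sketch: the path `/var/log/syslog` and the `ata6`/`sdh` names are specific to my box, and the demo below runs the grep on a small sample copied from the log above rather than on the live file):

```shell
# Pull the SATA-related error lines out of an Unraid-style syslog.
# Sketch only: tweak the pattern for your own ataN/sdX identifiers.
filter_ata_errors() {
  grep -E 'ata[0-9]+(\.[0-9]+)?: (softreset failed|hard resetting|SError|failed command|exception)|I/O error, dev sd' "$1"
}

# Demo on a few lines copied from the log above:
cat > /tmp/sample_syslog.txt <<'EOF'
Sep 28 23:52:25 Tower kernel: ata6: softreset failed (1st FIS failed)
Sep 28 23:52:31 Tower kernel: I/O error, dev sdh, sector 16 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
Sep 28 23:52:49 Tower emhttpd: read SMART /dev/sdh
EOF
filter_ata_errors /tmp/sample_syslog.txt   # prints the two error lines, skips the SMART line
```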

 

(second attempt) tower-diagnostics-20220928-2354.zip
(first attempt) tower-diagnostics-20220929-0030.zip

7 hours ago, JorgeB said:

Try swapping both the power and SATA cables with an existing disk, then see if the issue follows the disk.

I already did. As stated in the OP, I put my old WD Red Parity back in the slot, and there were no errors. The parity build started, and worked just fine.

I then cancelled the rebuild. I put the new Seagate back in that slot; exact same cables, exact same screws and everything. Same errors.

I then tried putting the new Seagate in a different slot with different cables. Again, same errors.

 

Then I connected the Seagate to my Windows 10 PC via a docking station/toaster. I formatted it with NTFS and it's working fine.

 

The new Seagate drives seem to be working fine. For some reason Unraid doesn't want to build parity on them.

15 hours ago, Stubbs said:

I bought four new 10TB Seagate Ironwolf Hard Drives.

Does the same problem happen with the other Seagate disk? The problem disk negotiates at 1.5Gbps instead of 6Gbps, which also indicates a problem.

 

I notice you have two SATA controllers. The B450 has only 4 SATA ports; the two additional ports should come from the CPU. Either way, the problem disk should be connected to the B450 chipset.

 

So, if both Seagate disks have the problem, please try connecting one to a CPU SATA port, i.e. where the disks below are connected. Just swap it and verify.

[9:0:0:0]    disk    ATA      WDC WD20EFRX-68E 0A82  /dev/sdf   /dev/sg5 
[10:0:0:0]   disk    ATA      WDC WD30EFRX-68E 0A82  /dev/sdg   /dev/sg6 
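To check the negotiated link speed without digging through dmesg, `smartctl -x` reports both the drive's maximum and currently negotiated SATA speed on its "SATA Version is" line. A sketch below parses a sample of that line rather than live output, since the device path and values differ per system:

```shell
# smartctl -x /dev/sdX prints a line like the sample below; the value after
# "current:" is the negotiated link speed. (Sample line, not live output.)
sample='SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 1.5 Gb/s)'
current_speed=$(printf '%s\n' "$sample" | sed -n 's/.*current: \([0-9.]* Gb\/s\).*/\1/p')
echo "$current_speed"   # 1.5 Gb/s
```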

34 minutes ago, Vr2Io said:

Does the same problem happen with the other Seagate disk? The problem disk negotiates at 1.5Gbps instead of 6Gbps, which also indicates a problem.

I notice you have two SATA controllers. The B450 has only 4 SATA ports; the two additional ports should come from the CPU. Either way, the problem disk should be connected to the B450 chipset.

So, if both Seagate disks have the problem, please try connecting one to a CPU SATA port, i.e. where the disks below are connected. Just swap it and verify.

[9:0:0:0]    disk    ATA      WDC WD20EFRX-68E 0A82  /dev/sdf   /dev/sg5 
[10:0:0:0]   disk    ATA      WDC WD30EFRX-68E 0A82  /dev/sdg   /dev/sg6 

Yes, on both of the ones I tested. I attached diagnostics for both in the OP.

 

[edit] Also my motherboard has six SATA ports:

https://www.asus.com/au/motherboards-components/motherboards/prime/prime-b450m-a/

 

I connected them via my hotswap bay mounted to the front of the case. This bay has three SATA ports, and is powered by two SATA power connectors.

I will try connecting the new drives directly to the motherboard at the next opportunity I get. That said, I find it a bit strange that the WD Red drives work perfectly fine in the bay yet the Seagates do not. Could this have something to do with the WD Reds being 5400RPM and the Seagates 7200RPM? The hotswap bay (which includes a fan) is powered by only two SATA power connectors; could a higher-RPM drive draw more power than they can supply?

17 hours ago, Vr2Io said:

That's not related, and you should try bypassing the hotswap enclosure.

 

17 hours ago, JorgeB said:

This looks more like a compatibility issue. Look for a BIOS update for the board, and also for a firmware update for the disks. Failing that, your best bet would be trying a different controller (or board). Many users run the same model disks with Unraid, so it's not an Unraid/Linux issue.

Alright, I gave it a try and got some mixed results.

 

I updated the BIOS on my Asus B450M-A first because that was the simplest. It didn't fix it. There were also no firmware updates available for these disks.

 

I proceeded to change some of the SATA cabling around. I swapped one of my internal drives' data cables with the Seagate's; I think it was connected to SATA2 on the motherboard. The errors persisted, but the parity check started. It just ran extremely slowly (it would've taken a year to build), and the log was full of errors, so something was wrong. It looked like this:

https://i.imgur.com/jTvI3JR.png

Diagnostics attached (fourth attempt).

 

So I shut down and tried a different cable setup. I plugged the Seagate into SATA2 using its original cable (not the one I swapped in). I started a parity check and it worked; parity actually started building normally.

 

But my server case was still open and not in its usual resting spot. So I cancelled the parity check, shut down the server, put the case cover back on and moved the server back into its original position. I powered back on, started the parity check and... back to really slow speeds, with 300+ days to rebuild. Diagnostics attached (fifth attempt).
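Those ETAs line up with plain arithmetic: divide the drive size by the transfer rate (taking 10TB as 10^13 decimal bytes; a rough sanity check, nothing more):

```shell
# Days to cover a whole drive at a given rate: bytes / (bytes per second) / 86400.
days_at() {  # usage: days_at <drive_bytes> <bytes_per_second>
  awk -v b="$1" -v s="$2" 'BEGIN { printf "%.0f\n", b / s / 86400 }'
}
days_at 10000000000000 300000     # 300KB/s  -> 386, i.e. the "300+ days" Unraid showed
days_at 10000000000000 90000000   # 90MB/s   -> 1 (really about 1.3 days)
```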

 

I just don't understand. It went from working fine to not working. It's using the same cable, same port, but after one shutdown cycle, the parity build decided not to work anymore.

(fifth) tower-diagnostics-20220930-1141.zip
(fourth attempt) tower-diagnostics-20220930-1108.zip

1 hour ago, JorgeB said:

If I understand correctly, the 4th attempt was the one going well? The 4th and 5th have the disk on a different port; it was ATA10 on the fourth and ATA2 on the 5th. See if you can connect it to ATA9/10, which should be ports 5 and 6.

Unfortunately no. The 4th attempt was the first time the parity check actually started running, but it ran at 300KB/s and errors kept repeating in the log.

 

I shut down the server, plugged the Seagate into SATA port 2 with its original cable, powered the server on, and parity started building properly. I didn't save a diagnostics file because I thought the problem was gone.

 

I cancelled the parity build and shut down the server again (the case was open and lying on the floor). The only other thing I did was swap the SATA4 cable for a newer one. That cable was connected to a data drive and was unrelated to the Seagate.

 

(fifth) is the most recent diagnostics. Exact same cable and port for the Seagate as when the parity build worked. For some reason it went back to this: https://i.imgur.com/jTvI3JR.png

39 minutes ago, JorgeB said:

The fact that it worked once makes me think it could still be a power/cable issue, but I understand you already tested with different power and SATA cables?

 

Do you have another controller you could test with? A cheap 2-port ASMedia/JMB controller would do.

I agree, which is why I'm going to keep trying to get it working the next chance I get (I can't power it off right now). I'm hoping if I can get parity built, the storage drives will be less trouble.

 

I have an HBA card, but I really don't want to install it yet. The only PCIe x16 slot is currently occupied by a NIC, which I use for a virtualized router, and I'm really not keen on using a backup router without all my firewall stuff set up. This is the main reason I'm going to buy a new motherboard on Black Friday/Cyber Monday.


The frustration continues.

 

First, I tried connecting the Seagate to SATA_1, the port my cache drive was connected to. I connected the Seagate to it WITH the cache drive's cable. Same errors (1st FIS failed, hard resetting link), along with a 150KB/s parity rebuild. Diagnostics attached (sixth attempt).

 

Then I reverted to the exact same configuration where the parity build had appeared to work: Seagate connected to SATA_2 on the motherboard with its original cable, and the old orange SATA cable back on the data drive at SATA_4. I powered the server back on, and it wouldn't even start the rebuild. The parity disk (Seagate) was stuck in a disabled state. I stopped and started the array, and it was still disabled. Diagnostics attached (seventh) (disabled).

 

Finally, I tried a different cable and a different port again, this time SATA_5. I powered the server back on, and initially it started rebuilding just fine at 90MB/s. But after about 30 seconds, the same errors showed up again.

 

Oct  1 08:42:59 Tower kernel: ata9: softreset failed (1st FIS failed)
Oct  1 08:43:05 Tower kernel: ata9: found unknown device (class 0)
Oct  1 08:43:06 Tower kernel: ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Oct  1 08:43:06 Tower kernel: ata9.00: configured for UDMA/133
Oct  1 08:43:06 Tower kernel: ata9.00: exception Emask 0x10 SAct 0x60000000 SErr 0x90202 action 0xe frozen
Oct  1 08:43:06 Tower kernel: ata9.00: failed command: WRITE FPDMA QUEUED
Oct  1 08:43:06 Tower kernel: ata9.00: cmd 61/c8:f0:c8:c5:14/00:00:00:00:00/40 tag 30 ncq dma 102400 out
Oct  1 08:43:06 Tower kernel:         res 40/00:e8:88:c0:14/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Oct  1 08:43:06 Tower kernel: ata9.00: status: { DRDY }
Oct  1 08:43:06 Tower kernel: ata9: hard resetting link

 

And the parity build got slower and slower: 90MB/s, to 30MB/s, to 13MB/s and so on, so I cancelled it. Diagnostics attached (eighth attempt).

 

(eighth) tower-diagnostics-20221001-0843.zip
(seventh) (disabled) tower-diagnostics-20221001-0755.zip
(sixth) tower-diagnostics-20221001-0732.zip

