Stubbs Posted September 28, 2022 (edited)

Consider this a semi-continuation of my previous thread (https://forums.unraid.net/topic/128808-need-help-upgrading-my-hdds/#comment-1173836), but this is a separate problem.

To summarize: I bought four new 10TB Seagate IronWolf hard drives to upgrade the storage of my current array, which is three 3TB WD Reds (one parity) and two 2TB WD Reds.

As people on this forum instructed me, I started by replacing the WD Red 3TB parity drive with one of the IronWolf 10TB drives. I shut my server down, removed the WD Red, installed the Seagate in its place, and booted the server back up. I headed to the "Main" menu, confirmed there was no parity drive, and my Seagate was there under unassigned devices. I stopped the array, assigned the Seagate IronWolf as parity, started the array and... errors. Straight away, I ran diagnostics (see attachment: (first attempt)). It took a few minutes for the array to even boot up, but when it finally did, parity almost immediately started returning errors and a read-check was initiated.

Before making this thread, I thought I'd do some extra tests. I powered down the server, took the Seagate IronWolf out and replaced it with one of the other brand-new Seagate IronWolfs I bought. I powered the server back on, tried to build parity with the next Seagate, and it returned the same errors (see attachment: (second attempt)).

Finally, I put my old WD Red 3TB parity drive back in. Once again, I triggered a parity rebuild and... it worked fine; parity started rebuilding without any errors.

Can anyone explain to me what the problem is? Is it another case of hardware connectivity issues? Is it something to do with it being a different brand of HDD? (I thought Unraid didn't care about this.) Did I somehow buy two dud HDDs?

Here are some log snippets (not that it matters much):

Sep 28 23:52:19 Tower avahi-daemon[9851]: Interface vethec6bd6e.IPv6 no longer relevant for mDNS.
Sep 28 23:52:19 Tower avahi-daemon[9851]: Leaving mDNS multicast group on interface vethec6bd6e.IPv6 with address fe80::704d:adff:fe0f:2f34.
Sep 28 23:52:19 Tower kernel: docker0: port 10(vethec6bd6e) entered disabled state
Sep 28 23:52:19 Tower kernel: device vethec6bd6e left promiscuous mode
Sep 28 23:52:19 Tower kernel: docker0: port 10(vethec6bd6e) entered disabled state
Sep 28 23:52:19 Tower avahi-daemon[9851]: Withdrawing address record for fe80::704d:adff:fe0f:2f34 on vethec6bd6e.
Sep 28 23:52:20 Tower kernel: ata6: found unknown device (class 0)
Sep 28 23:52:25 Tower kernel: ata6: softreset failed (1st FIS failed)
Sep 28 23:52:25 Tower kernel: ata6: hard resetting link
Sep 28 23:52:30 Tower kernel: ata6: found unknown device (class 0)
Sep 28 23:52:30 Tower kernel: ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Sep 28 23:52:31 Tower kernel: ata6.00: configured for UDMA/133
Sep 28 23:52:31 Tower kernel: sd 6:0:0:0: [sdh] tag#3 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=16s
Sep 28 23:52:31 Tower kernel: sd 6:0:0:0: [sdh] tag#3 Sense Key : 0x5 [current]
Sep 28 23:52:31 Tower kernel: sd 6:0:0:0: [sdh] tag#3 ASC=0x21 ASCQ=0x4
Sep 28 23:52:31 Tower kernel: sd 6:0:0:0: [sdh] tag#3 CDB: opcode=0x88 88 00 00 00 00 00 00 00 00 10 00 00 00 08 00 00
Sep 28 23:52:31 Tower kernel: I/O error, dev sdh, sector 16 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
Sep 28 23:52:31 Tower kernel: sd 6:0:0:0: [sdh] tag#7 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=16s
Sep 28 23:52:31 Tower kernel: sd 6:0:0:0: [sdh] tag#7 Sense Key : 0x5 [current]
Sep 28 23:52:31 Tower kernel: sd 6:0:0:0: [sdh] tag#7 ASC=0x21 ASCQ=0x4
Sep 28 23:52:31 Tower kernel: sd 6:0:0:0: [sdh] tag#7 CDB: opcode=0x88 88 00 00 00 00 00 00 00 01 08 00 00 00 f8 00 00
Sep 28 23:52:31 Tower kernel: I/O error, dev sdh, sector 264 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 0
Sep 28 23:52:31 Tower kernel: ata6: EH complete
Sep 28 23:52:32 Tower kernel: ata6.00: exception Emask 0x50 SAct 0x400 SErr 0xb0802 action 0xe frozen
Sep 28 23:52:32 Tower kernel: ata6.00: irq_stat 0x00400000, PHY RDY changed
Sep 28 23:52:32 Tower kernel: ata6: SError: { RecovComm HostInt PHYRdyChg PHYInt 10B8B }
Sep 28 23:52:32 Tower kernel: ata6.00: failed command: READ FPDMA QUEUED
Sep 28 23:52:32 Tower kernel: ata6.00: cmd 60/08:50:40:20:00/00:00:00:00:00/40 tag 10 ncq dma 4096 in
Sep 28 23:52:32 Tower kernel: res 40/00:50:40:20:00/00:00:00:00:00/40 Emask 0x50 (ATA bus error)
Sep 28 23:52:32 Tower kernel: ata6.00: status: { DRDY }
Sep 28 23:52:32 Tower kernel: ata6: hard resetting link
Sep 28 23:52:32 Tower kernel: ata6: SATA link down (SStatus 0 SControl 300)
Sep 28 23:52:33 Tower kernel: ata6: hard resetting link
Sep 28 23:52:39 Tower kernel: ata6: found unknown device (class 0)
Sep 28 23:52:43 Tower kernel: ata6: softreset failed (1st FIS failed)
Sep 28 23:52:43 Tower kernel: ata6: hard resetting link
Sep 28 23:52:49 Tower kernel: ata6: found unknown device (class 0)
Sep 28 23:52:49 Tower kernel: ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Sep 28 23:52:49 Tower kernel: ata6.00: configured for UDMA/133
Sep 28 23:52:49 Tower kernel: sd 6:0:0:0: [sdh] tag#10 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=17s
Sep 28 23:52:49 Tower kernel: sd 6:0:0:0: [sdh] tag#10 Sense Key : 0x5 [current]
Sep 28 23:52:49 Tower kernel: sd 6:0:0:0: [sdh] tag#10 ASC=0x21 ASCQ=0x4
Sep 28 23:52:49 Tower kernel: sd 6:0:0:0: [sdh] tag#10 CDB: opcode=0x88 88 00 00 00 00 00 00 00 20 40 00 00 00 08 00 00
Sep 28 23:52:49 Tower kernel: I/O error, dev sdh, sector 8256 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
Sep 28 23:52:49 Tower kernel: ata6: EH complete
Sep 28 23:52:49 Tower emhttpd: error: hotplug_devices, 1730: No such file or directory (2): Error: tagged device ST10000VN000-3AK101_WWY036M2 was (sde) is now (sdh)
Sep 28 23:52:49 Tower emhttpd: read SMART /dev/sdh
Sep 28 23:52:49 Tower kernel: emhttpd[5074]: segfault at 674 ip 0000000000413f90 sp 00007ffcc22ab490 error 4 in emhttpd[403000+1d000]
Sep 29 00:29:47 Tower kernel: SVM: TSC scaling supported
Sep 29 00:29:47 Tower kernel: kvm: Nested Virtualization enabled
Sep 29 00:29:47 Tower kernel: SVM: kvm: Nested Paging enabled
Sep 29 00:29:47 Tower kernel: SEV supported: 16 ASIDs
Sep 29 00:29:47 Tower kernel: SEV-ES supported: 4294967295 ASIDs
Sep 29 00:29:47 Tower kernel: SVM: Virtual VMLOAD VMSAVE supported
Sep 29 00:29:47 Tower kernel: SVM: Virtual GIF supported
Sep 29 00:29:47 Tower kernel: SVM: LBR virtualization supported
Sep 29 00:29:47 Tower kernel: IPv6: ADDRCONF(NETDEV_CHANGE): veth9d6e238: link becomes ready
Sep 29 00:29:47 Tower kernel: docker0: port 9(veth9d6e238) entered blocking state
Sep 29 00:29:47 Tower kernel: docker0: port 9(veth9d6e238) entered forwarding state
Sep 29 00:29:47 Tower kernel: tun: Universal TUN/TAP device driver, 1.6
Sep 29 00:29:47 Tower kernel: mdcmd (36): check
Sep 29 00:29:47 Tower kernel: md: recovery thread: recon P ...
Sep 29 00:29:47 Tower kernel: md: disk0 write error, sector=0
Sep 29 00:29:47 Tower kernel: md: disk0 write error, sector=8
Sep 29 00:29:47 Tower kernel: md: disk0 write error, sector=16
(the same write error repeats for every 8-sector stripe through sector=168)

(second attempt) tower-diagnostics-20220928-2354.zip
(first attempt) tower-diagnostics-20220929-0030.zip

Edited September 28, 2022 by Stubbs
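For anyone triaging a log like the one above, here is a small sketch that tallies link resets and bus errors per ATA port. The helper name and grep pattern are my own (not an Unraid tool); the pattern just matches the kernel lines quoted in this post.

```shell
# Hypothetical helper: count link-reset / bus-error events per ATA port.
# Feed it syslog text on stdin, e.g.:  count_ata_events < /var/log/syslog
count_ata_events() {
  grep -oE 'ata[0-9]+(\.[0-9]+)?: (hard resetting link|softreset failed|exception Emask)' |
    sed -E 's/^(ata[0-9]+).*/\1/' |   # keep just the port name (ata6, ata9, ...)
    sort | uniq -c | sort -rn         # count occurrences, busiest port first
}
```

A port that dominates the tally (like ata6 here) points at one specific cable/port/drive path rather than a general controller problem.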
JorgeB Posted September 28, 2022

Try swapping both the power and SATA cables with an existing disk, then see if the issue follows the disk.
Stubbs Posted September 28, 2022 (Author)

7 hours ago, JorgeB said:
Try swapping both the power and SATA cables with an existing disk, then see if the issue follows the disk.

I already did. As stated in the OP, I put my old WD Red parity drive back in the slot and there were no errors; the parity build started and worked just fine, and I then cancelled the rebuild. I put the new Seagate back in that slot with the exact same cables, exact same screws and everything. Same errors. I then tried putting the new Seagate in a different slot with different cables. Again, same errors.

Then I connected the Seagate to my Windows 10 PC via a docking station/toaster. I formatted it with NTFS and it's working fine. The new Seagate drives seem to be working fine; for some reason Unraid just doesn't want to build parity on them.
Vr2Io Posted September 29, 2022 (edited)

15 hours ago, Stubbs said:
I bought four new 10TB Seagate Ironwolf Hard Drives.

Does the same problem happen on the other Seagate disks? The problem disk negotiates at 1.5Gbps instead of 6Gbps, which also indicates a problem. I notice you have two SATA controllers: the B450 chipset has only 4 SATA ports, so the two additional ports should come from the CPU. Either way, the problem disk appears to be connected to the B450 chipset. So if both Seagate disks have the problem, please try connecting one to a CPU SATA port, i.e. one of the ports the disks below are on. Just swap it and verify.

[9:0:0:0] disk ATA WDC WD20EFRX-68E 0A82 /dev/sdf /dev/sg5
[10:0:0:0] disk ATA WDC WD30EFRX-68E 0A82 /dev/sdg /dev/sg6

Edited September 29, 2022 by Vr2Io
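To check what speed each link actually negotiated without digging through a full diagnostics bundle, a sketch follows. The helper name is mine; the message format it matches is the one quoted in the OP's log ("SATA link up 6.0 Gbps ...").

```shell
# Pull the negotiated SATA speed per port out of kernel log text on stdin.
# Usage sketch:  link_speeds < /var/log/syslog
link_speeds() {
  grep -oE 'ata[0-9]+: SATA link (up [0-9.]+ Gbps|down)' | sort -u
}
```

A 6Gbps-capable drive that repeatedly renegotiates, or settles at 1.5Gbps, usually means signal-integrity trouble (cable, backplane, port) rather than a dead disk.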
Stubbs Posted September 29, 2022 (Author, edited)

34 minutes ago, Vr2Io said:
Does same problem happen on other Seagate disk? [...] Just swap it and verify.

Yes, on both of the two I tested. I attached diagnostics for both in the OP.

[edit] Also, my motherboard has six SATA ports: https://www.asus.com/au/motherboards-components/motherboards/prime/prime-b450m-a/

I connected the drives via my hotswap bay mounted to the front of the case. This bay has three SATA ports and is powered by two SATA power connectors. I will try connecting the new drives directly to the motherboard the next opportunity I get. That being said, I find it a bit strange that the WD Red drives work perfectly fine in the bay, yet the Seagates do not. Could this have something to do with the WD Reds being 5400RPM and the Seagates being 7200RPM? The hotswap bay (which includes a fan) is powered by only two SATA power cables. Could that introduce some kind of bottleneck if a drive with a higher RPM was installed?

Edited September 29, 2022 by Stubbs
Vr2Io Posted September 29, 2022 Share Posted September 29, 2022 1 hour ago, Stubbs said: being 5400RPM, and the Seagates being 7200RPM? Doesn't relate, and you should try bypass the hotswap enclosure. Quote Link to comment
JorgeB Posted September 29, 2022

This looks more like a compatibility issue. Look for a BIOS update for the board, and also for a firmware update for the disks. Failing that, the best bet would be trying a different controller (or board). Many users run the same model disks with Unraid, so it's not an Unraid/Linux issue.
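Before hunting for disk firmware updates, it helps to know what firmware the drives are on. smartmontools (shipped with Unraid) reports this via `smartctl -i`; the parsing helper below is my own sketch, and the device name in the usage comment is just an example.

```shell
# Print model and firmware from `smartctl -i` output fed on stdin.
# Live usage sketch (device name is an example):
#   smartctl -i /dev/sdh | fw_info
fw_info() {
  awk -F': *' '/^Device Model|^Firmware Version/ {print $1 ": " $2}'
}
```

Comparing the reported firmware against the version listed on Seagate's support page for the model tells you whether an update is even available.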
Stubbs Posted September 30, 2022 (Author)

17 hours ago, Vr2Io said:
you should try bypass[ing] the hotswap enclosure.

17 hours ago, JorgeB said:
look for a BIOS update for the board, also for a firmware update for the disks [...]

Alright, I gave it a try and got some mixed results.

I updated the BIOS on my Asus B450M-A first because that was the simplest. It didn't fix it. There were also no firmware updates available for these disks.

I proceeded to change some of the SATA cabling around. I swapped one of my internal drives' data cables with the Seagate's, and I think it was connected to SATA2 on the motherboard. The errors persisted but the parity check started. It just ran extremely slowly (it would've taken a year to build) and the log was full of errors, so something was wrong. It looked like this: https://i.imgur.com/jTvI3JR.png

Diagnostics attached (fourth attempt).

So I shut down and tried a different cable setup. I plugged the Seagate into SATA2 using its original cable (not the one I swapped in). Started a parity check and it worked; parity actually started building normally. But my server case was still open and not in its usual resting spot, so I cancelled the parity check, shut down the server, put the case cover back on, and moved the server back into its original position. I powered back on, started the parity check and... back to really slow speeds, taking 300+ days to rebuild. Diagnostics attached (fifth attempt).

I just don't understand. It went from working fine to not working. It's using the same cable and same port, but after one shutdown cycle the parity build decided not to work anymore.

(fifth) tower-diagnostics-20220930-1141.zip
(fourth attempt) tower-diagnostics-20220930-1108.zip
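As a way to watch the sync rate numerically instead of eyeballing the webGUI estimate, Unraid's custom /proc/mdstat exposes key=value resync counters. The interpretation below (mdResyncDb sectors moved over mdResyncDt seconds, 512-byte sectors) is my best-guess reading of that format and should be treated as an assumption.

```shell
# Rough parity-sync speed from Unraid's key=value /proc/mdstat.
# Assumption: mdResyncDb = sectors moved in the last mdResyncDt seconds.
# Live usage sketch:  sync_speed < /proc/mdstat
sync_speed() {
  awk -F= '/^mdResyncDb/ {db=$2} /^mdResyncDt/ {dt=$2}
           END {if (dt > 0) printf "%.0f KB/s\n", db / dt / 2; else print "idle"}'
}
```

A healthy build on a 7200RPM drive should report well over 100000 KB/s; the 300KB/s seen here is orders of magnitude below that, consistent with the link constantly resetting.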
JorgeB Posted September 30, 2022

If I understand correctly, the 4th attempt was the one going well? The 4th and 5th have the disk on a different port; it was ATA10 on the fourth and ATA2 on the fifth. See if you can connect it to ATA9/10, which should be ports 5 and 6.
Stubbs Posted September 30, 2022 (Author)

1 hour ago, JorgeB said:
If I understand correctly the 4th attempt was the one going well? [...]

Unfortunately no. The 4th attempt was the first time the parity check actually started running, but it ran at 300KB/s and errors kept repeating in the log. I then shut down the server, plugged the Seagate into SATA port 2 with its original cable, powered the server on, and parity started building properly. I didn't save a diagnostics file because I thought the problem was gone. I cancelled the parity build and shut down the server again (the case was open and lying on the floor). The only other thing I did was swap the SATA4 cable for a newer cable; this was connected to a data drive and was unrelated to the Seagate.

(fifth) is the most recent diagnostics. Exact same cable and port for the Seagate as when the parity build worked. For some reason it went back to the same errors.
JorgeB Posted September 30, 2022 Share Posted September 30, 2022 The fact that it worked once makes me thing it could still be a power/cable issue, but I understand you already tested with different power and SATA cable? Do you have another controller you could test with? A cheap 2 port Asmedia/JMB controller would do. Quote Link to comment
Stubbs Posted September 30, 2022 (Author)

39 minutes ago, JorgeB said:
Do you have another controller you could test with? A cheap 2 port Asmedia/JMB controller would do.

I agree, which is why I'm going to keep trying to get it working the next chance I get (I can't power it off right now). I'm hoping that if I can get parity built, the storage drives will be less trouble.

I have an HBA card, but I really don't want to install it yet. The only PCIe x16 slot is currently occupied by a NIC, which I use for a virtualized router, and I'm really not keen on using a backup router without all my firewall stuff set up. This is the main reason I'm going to buy a new motherboard on Black Friday/Cyber Monday.
Stubbs Posted September 30, 2022 (Author)

The frustration continues.

First, I tried connecting the Seagate to SATA_1. This was the port my cache drive was connected to, and I connected the Seagate to it WITH the cache drive's cable. Same errors (1st FIS failed, hard resetting link), along with a 150KB/s parity rebuild. Diagnostics attached (sixth attempt).

Then I reverted back to the exact same configuration where the parity build appeared to be working: Seagate connected to SATA_2 on the motherboard with its original cable, and I brought back the old orange SATA cable to connect the data drive to SATA_4. I powered the server back on, and it wouldn't even start the rebuild. The parity disk (the Seagate) was stuck in a disabled state. I stopped and started the array, and it was still disabled. Diagnostics attached (seventh) (disabled).

Finally, I tried a different cable and a different port again, this time SATA_5. I powered the server back on, and initially it started rebuilding just fine at 90MB/s. But after about 30 seconds, the same errors showed up again:

Oct 1 08:42:59 Tower kernel: ata9: softreset failed (1st FIS failed)
Oct 1 08:43:05 Tower kernel: ata9: found unknown device (class 0)
Oct 1 08:43:06 Tower kernel: ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Oct 1 08:43:06 Tower kernel: ata9.00: configured for UDMA/133
Oct 1 08:43:06 Tower kernel: ata9.00: exception Emask 0x10 SAct 0x60000000 SErr 0x90202 action 0xe frozen
Oct 1 08:43:06 Tower kernel: ata9.00: failed command: WRITE FPDMA QUEUED
Oct 1 08:43:06 Tower kernel: ata9.00: cmd 61/c8:f0:c8:c5:14/00:00:00:00:00/40 tag 30 ncq dma 102400 out
Oct 1 08:43:06 Tower kernel: res 40/00:e8:88:c0:14/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Oct 1 08:43:06 Tower kernel: ata9.00: status: { DRDY }
Oct 1 08:43:06 Tower kernel: ata9: hard resetting link

And the parity build got slower and slower: 90MB/s to 30MB/s to 13MB/s and so on, so I cancelled it. Diagnostics attached (eighth attempt).

(eighth) tower-diagnostics-20221001-0843.zip
(seventh) (disabled) tower-diagnostics-20221001-0755.zip
(sixth) tower-diagnostics-20221001-0732.zip
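A quick way to see whether it is reads or writes being rejected in logs like the ones above (the helper name is mine):

```shell
# Tally which ATA commands the drive is failing (READ vs WRITE FPDMA QUEUED).
# Usage sketch:  failed_cmds < /var/log/syslog
failed_cmds() {
  grep -oE 'failed command: [A-Z ]+' | sort | uniq -c | sort -rn
}
```

The first-attempt log showed failing READs and this one failing WRITEs, both rejected at the bus level ("ATA bus error"), which fits the cable/port/controller theory better than bad media on the disk itself.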
JorgeB Posted October 1, 2022

You've also tried with a different power cable, correct? If yes, I think trying a different controller (or board) would be the next step.