
Parity Check hung up at 15%



I had been transferring files from portable drives to the array, and moving files off a drive in the array that has issues, before shutting the array down to fiddle with the failed drive. After a break, I found that a parity check had started. The CPU was at nearly 100% utilization, which slowed everything down, so I paused the parity check to finish what I was doing and then resumed it. It now says it is running but is making no progress, and the CPU utilization graphics on the Dashboard don't seem to be working any more. I tried to pause the parity check and even cancel it, but after I confirm, the status of the parity check doesn't change. I am including the current diagnostics file. I didn't want to try a reboot until I had checked here.

tower-diagnostics-20240618-1058.zip


I shut down from the Array Operation menu on the Main tab and swapped the cables on disk 3. When I powered up, it indicated that it was not a clean shutdown. Interestingly, when I checked the notices, it reported a successful parity check from a couple of days ago. Now disks 3 and 4 show failures, and the array just says "Starting" but has not started. Fix Common Problems just says disk 4 is disabled and refers me to the Main tab.

tower-diagnostics-20240618-1808.zip


Now there are constant errors with sdj. It's currently unassigned; was this an array disk?

Model Family:     Western Digital Blue (SMR)
Device Model:     WDC WD60EZAZ-00ZGHB0
Serial Number:    WD-WX11D19A4C2V

 

Jun 18 16:10:06 Tower kernel: sd 10:0:32:0: Power-on or device reset occurred
Jun 18 16:10:06 Tower kernel: sd 10:0:32:0: [sdj] 11721045168 512-byte logical blocks: (6.00 TB/5.46 TiB)
Jun 18 16:10:06 Tower kernel: sd 10:0:32:0: [sdj] 4096-byte physical blocks
Jun 18 16:10:06 Tower kernel: sd 10:0:32:0: [sdj] Write Protect is off
Jun 18 16:10:06 Tower kernel: sd 10:0:32:0: [sdj] Mode Sense: 7f 00 10 08
Jun 18 16:10:06 Tower kernel: sd 10:0:32:0: [sdj] Write cache: enabled, read cache: enabled, supports DPO and FUA
Jun 18 16:10:06 Tower kernel: sd 10:0:32:0: [sdj] Attached SCSI disk
Jun 18 16:10:07 Tower kernel: sd 10:0:32:0: device_block, handle(0x000d)
Jun 18 16:10:08 Tower kernel: sd 10:0:32:0: device_unblock and setting to running, handle(0x000d)
Jun 18 16:10:10 Tower kernel: sd 10:0:32:0: [sdj] Synchronizing SCSI cache
Jun 18 16:10:10 Tower kernel: sd 10:0:32:0: [sdj] Synchronize Cache(10) failed: Result: hostbyte=0x01 driverbyte=DRIVER_OK
Jun 18 16:10:10 Tower kernel: mpt2sas_cm0: mpt3sas_transport_port_remove: removed: sas_addr(0x4433221106000000)
Jun 18 16:10:10 Tower kernel: mpt2sas_cm0: removing handle(0x000d), sas_addr(0x4433221106000000)
Jun 18 16:10:10 Tower kernel: mpt2sas_cm0: enclosure logical id(0x5b8ca3a0fb5ac600), slot(5)
Jun 18 16:10:12 Tower SysDrivers: SysDrivers Build Complete
Jun 18 16:10:20 Tower kernel: mpt2sas_cm0: handle(0xd) sas_address(0x4433221106000000) port_type(0x1)
Jun 18 16:10:20 Tower kernel: scsi 10:0:33:0: Direct-Access     ATA      WDC WD60EZAZ-00Z 0A80 PQ: 0 ANSI: 6
Jun 18 16:10:20 Tower kernel: scsi 10:0:33:0: SATA: handle(0x000d), sas_addr(0x4433221106000000), phy(6), device_name(0x0000000000000000)
Jun 18 16:10:20 Tower kernel: scsi 10:0:33:0: enclosure logical id (0x5b8ca3a0fb5ac600), slot(5) 
Jun 18 16:10:20 Tower kernel: scsi 10:0:33:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Jun 18 16:10:20 Tower kernel: scsi 10:0:33:0: qdepth(32), tagged(1), scsi_level(7), cmd_que(1)
Jun 18 16:10:20 Tower kernel: sd 10:0:33:0: Attached scsi generic sg9 type 0
Jun 18 16:10:20 Tower kernel: end_device-10:33: add: handle(0x000d), sas_addr(0x4433221106000000)
Jun 18 16:10:20 Tower kernel: sd 10:0:33:0: Power-on or device reset occurred
Jun 18 16:10:20 Tower kernel: sd 10:0:33:0: [sdj] 11721045168 512-byte logical blocks: (6.00 TB/5.46 TiB)
Jun 18 16:10:20 Tower kernel: sd 10:0:33:0: [sdj] 4096-byte physical blocks
Jun 18 16:10:20 Tower kernel: sd 10:0:33:0: [sdj] Write Protect is off
Jun 18 16:10:20 Tower kernel: sd 10:0:33:0: [sdj] Mode Sense: 7f 00 10 08
Jun 18 16:10:20 Tower kernel: sd 10:0:33:0: [sdj] Write cache: enabled, read cache: enabled, supports DPO and FUA
Jun 18 16:10:20 Tower kernel: sd 10:0:33:0: [sdj] Attached SCSI disk

 

Jun 19 10:34:54 Tower kernel: ata2: link is slow to respond, please be patient (ready=0)
Jun 19 10:34:58 Tower kernel: ata2: found unknown device (class 0)
Jun 19 10:34:58 Tower kernel: ata2: softreset failed (device not ready)
Jun 19 10:34:59 Tower kernel: ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Jun 19 10:34:59 Tower kernel: ata2.00: configured for UDMA/133
Jun 19 10:36:00 Tower kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jun 19 10:36:00 Tower kernel: ata3.00: failed command: FLUSH CACHE EXT
Jun 19 10:36:00 Tower kernel: ata3.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 17
Jun 19 10:36:00 Tower kernel:         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jun 19 10:36:00 Tower kernel: ata3.00: status: { DRDY }
Jun 19 10:36:00 Tower kernel: ata3: hard resetting link
Jun 19 10:36:01 Tower kernel: ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Jun 19 10:36:01 Tower kernel: ata3.00: configured for UDMA/133
Jun 19 10:36:01 Tower kernel: ata3.00: retrying FLUSH 0xea Emask 0x4
Jun 19 10:36:01 Tower kernel: ata3.00: device reported invalid CHS sector 0
Jun 19 10:36:01 Tower kernel: ata3: EH complete

 

There are still issues with multiple disks, this time disks 1 and 5, so you may have a power or connection problem. Are any power splitters in use?
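Errors showing up on several disks at once, as in the excerpt above, are what points toward a shared power or connection problem rather than one failing drive. If you want to tally these messages across a long syslog instead of eyeballing them, here is a minimal Python sketch. It assumes a plain-text syslog (such as the one inside the diagnostics zip) and counts libata reset and error messages per `ataN` port. The event names and the regex list are hypothetical choices for illustration, not an Unraid tool. Mapping an `ataN` port to a disk slot still has to be done from the rest of the syslog.

```python
import re
from collections import Counter

# Hypothetical event names mapped to regexes for libata messages seen in this
# thread. The list is illustrative, not exhaustive.
EVENT_PATTERNS = {
    "slow_link": re.compile(r"kernel: (ata\d+): link is slow to respond"),
    "softreset_failed": re.compile(r"kernel: (ata\d+): softreset failed"),
    "hard_reset": re.compile(r"kernel: (ata\d+): hard resetting link"),
    "exception": re.compile(r"kernel: (ata\d+)\.\d+: exception Emask"),
    "failed_command": re.compile(r"kernel: (ata\d+)\.\d+: failed command"),
}


def count_link_events(lines):
    """Count matching libata events, keyed by (port, event name)."""
    counts = Counter()
    for line in lines:
        for event, pattern in EVENT_PATTERNS.items():
            match = pattern.search(line)
            if match:
                counts[(match.group(1), event)] += 1
    return counts


def totals_by_port(counts):
    """Collapse (port, event) counts into one total per ataN port."""
    totals = Counter()
    for (port, _event), n in counts.items():
        totals[port] += n
    return totals


# A few lines copied from the excerpt above, used as sample input.
SAMPLE = """\
Jun 19 10:34:54 Tower kernel: ata2: link is slow to respond, please be patient (ready=0)
Jun 19 10:34:58 Tower kernel: ata2: softreset failed (device not ready)
Jun 19 10:36:00 Tower kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jun 19 10:36:00 Tower kernel: ata3.00: failed command: FLUSH CACHE EXT
Jun 19 10:36:00 Tower kernel: ata3: hard resetting link
""".splitlines()

if __name__ == "__main__":
    counts = count_link_events(SAMPLE)
    for (port, event), n in sorted(counts.items()):
        print(f"{port:6} {event:18} {n}")
    print(dict(totals_by_port(counts)))
```

To scan a real log, pass in the lines of the extracted syslog file instead of `SAMPLE`. Disks on an HBA (the mpt2sas-attached sdj above, for example) report resets through the SCSI layer rather than libata, so this sketch will not count them.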

EH5 said:

Yes, there are power splitters. It is a 600W power supply, but it doesn't have enough HDD power plugs.

You need to be careful not to split a connector on a PSU cable too many ways. A SATA->SATA splitter should not be split more than 2 ways if you want it to be reliable, while a Molex->SATA splitter can normally get away with 4 ways.
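The rule of thumb above can be written down as a quick check against an inventory of the splitters in a build. This is a minimal sketch under stated assumptions: the 2-way and 4-way limits are the guideline from this reply, not a PSU or connector specification, and the inventory format and labels are hypothetical.

```python
# Rule-of-thumb fan-out limits quoted in the reply above: how many SATA
# outputs a single splitter should feed, keyed by the splitter's input plug.
MAX_WAYS = {"sata": 2, "molex": 4}


def check_splitters(splitters):
    """Return a warning for each splitter that exceeds the rule of thumb.

    `splitters` is a list of (label, input_type, ways) tuples, where
    input_type is "sata" or "molex" and ways is the number of SATA outputs.
    """
    warnings = []
    for label, input_type, ways in splitters:
        if input_type not in MAX_WAYS:
            raise ValueError(f"unknown splitter input type: {input_type!r}")
        limit = MAX_WAYS[input_type]
        if ways > limit:
            warnings.append(
                f"{label}: {input_type}->SATA split {ways} ways (limit {limit})"
            )
    return warnings


if __name__ == "__main__":
    # Hypothetical inventory; replace with the splitters actually in the case.
    inventory = [
        ("PSU cable 1", "sata", 3),
        ("PSU cable 2", "molex", 4),
    ]
    for warning in check_splitters(inventory):
        print(warning)
```

Passing the check does not guarantee enough power reaches the drives; it only flags splitters that go beyond the fan-out suggested here.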


I moved everything off the emulated disks and unplugged them. When I reconfigured the array without them and tried to restart it and rebuild parity, it showed that another disk was disconnected, the Docker containers didn't start, and I had the same issue where the CPU usage graphics don't show and the parity rebuild is not advancing. Is it possible that my Unraid USB flash drive is corrupted somehow?

tower-diagnostics-20240621-1847.zip


The syslog shows what look like connection issues (power or SATA) for disk4 and disk5 (in particular disk4), and eventually you started getting write errors on disk4. The SMART information for those drives looks OK. My suspicion would be insufficient power reaching the drives.

I do not see how the flash drive could be causing this issue.


I upgraded my power supply to get enough SATA power connections and switched to a case with easier access for swapping out the drives. Things appeared fine, but I got a memory error. I will start a new thread for that, as I think all the issues I was dealing with in this thread are fixed.
