Parity rebuild incredibly slow



My problem started when my USB stick crapped out about a week ago. I got that fixed using a two-week-old backup I had made. When the server first came back online, it started a parity check that ran at about 700 KBps. I tried rebooting a couple of times and stopping and starting the array, and nothing made it faster.

Since then I've tried different things. I tried taking one of the parity drives offline and using just one for the initial parity check. In the process, it reported that disk 8 was unmountable and needed formatting. I attempted to format it a few times, but each time it would format for a while and then stop. Every time I came back to the computer, disk 8 showed as unmountable again and it asked me to format it.

I decided to remove disk 8 from the array temporarily. With the second parity disk and disk 8 removed, the array ran well and didn't try to do a parity check or rebuild. I then added the second parity disk back, and it did a parity rebuild onto that disk, which took the usual 30 hours (8 TB WD Red). While that was running, I also ran a preclear on disk 8, which held 150 MBps through the whole process.

Now I've tried to add disk 8 back into the array, and the parity rebuild is incredibly slow: it's running at 600-700 KBps, just like it did initially. When it first starts, the speed briefly shoots up to 20-30 MBps and then drops back to 0. After that it climbs to 1.5-2 MBps and drops back to 0 again. It's like something is holding it back from going any faster.
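
For scale, here is a quick back-of-the-envelope estimate of how long an 8 TB pass takes at the speeds mentioned in this thread. It is only a rough sketch: it assumes decimal units (1 KB = 1,000 bytes, 1 TB = 10^12 bytes) and a constant speed for the whole pass, and `rebuild_hours` is a hypothetical helper name.

```python
# Rough rebuild-time estimate: capacity divided by sustained throughput.
# Assumes decimal units and a constant speed for the whole pass,
# which a real rebuild never quite has.

KB = 10**3
MB = 10**6


def rebuild_hours(capacity_tb: float, speed_bytes_per_s: float) -> float:
    """Return the hours needed to cover `capacity_tb` at a fixed speed."""
    capacity_bytes = capacity_tb * 10**12
    return capacity_bytes / speed_bytes_per_s / 3600


slow = rebuild_hours(8, 700 * KB)    # speed seen during the stalled rebuild
normal = rebuild_hours(8, 150 * MB)  # speed the preclear sustained

print(f"at 700 KBps: {slow:,.0f} hours (~{slow / 24:.0f} days)")
print(f"at 150 MBps: {normal:.1f} hours")

# Average speed implied by the usual 30-hour parity sync on an 8 TB disk
print(f"30 h for 8 TB = {8 * 10**12 / (30 * 3600) / MB:.0f} MBps average")
```

At 700 KBps the rebuild would take roughly 3,175 hours (about 132 days), versus under 15 hours at the preclear's 150 MBps; the usual 30-hour parity sync works out to about 74 MBps on average.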

 

I ran SMART checks on the disks and can't find a bad one. I put the server into maintenance mode and checked the file system on each disk. I couldn't find any errors, but to be honest I don't know what I'm looking for there. I also tried shutting off all my Docker containers, and that made no difference.

My system is running UnRaid 6.4 on a new i5-8400 with an ASRock motherboard. It has a mix of 8 TB and 4 TB WD Red disks; disk 8 and both parity disks are 8 TB. The disks are connected to two LSI RAID controllers through 4-to-1 breakout cables. There are 13 hard drives total, plus 3 cache drives (240, 120, 120).

I'm really lost as to what could be causing this issue. I'd like some guidance on what to do next.

I'm on my phone now. I'll post the logs when I get home.

5 hours ago, go69cars said:

I'm a bit confused about what the array is writing to Disk 8, then, if it's not trying to rebuild it.


If the system saw you format disk8, then it will try to restore a formatted disk8: a disk with zero files, since the format command overwrites the index of what is stored on the disk. Format is, by definition, a lossy operation intended to be performed before you start storing files on the disk.


Constant timeout errors on two disks (disk8 and parity2):

Apr 14 04:41:12 Tower kernel: sd 8:0:1:0: task abort: SUCCESS scmd(ffff880351682148)
Apr 14 04:41:44 Tower kernel: sd 8:0:1:0: attempting task abort! scmd(ffff8800590f9148)
Apr 14 04:41:44 Tower kernel: sd 8:0:1:0: [sdk] tag#5 CDB: opcode=0x88 88 00 00 00 00 00 00 9b bf 08 00 00 04 00 00 00
Apr 14 04:41:44 Tower kernel: scsi target8:0:1: handle(0x0009), sas_address(0x4433221100000000), phy(0)
Apr 14 04:41:44 Tower kernel: scsi target8:0:1: enclosure_logical_id(0x500605b0060ad510), slot(0)
Apr 14 04:41:44 Tower kernel: sd 8:0:1:0: task abort: SUCCESS scmd(ffff8800590f9148)
Apr 14 04:41:45 Tower kernel: mpt2sas_cm1: log_info(0x31120100): originator(PL), code(0x12), sub_code(0x0100)
Apr 14 04:41:45 Tower kernel: mpt2sas_cm1: log_info(0x31120100): originator(PL), code(0x12), sub_code(0x0100)
Apr 14 04:41:57 Tower kernel: mpt2sas_cm1: log_info(0x31120100): originator(PL), code(0x12), sub_code(0x0100)
Apr 14 04:42:28 Tower kernel: sd 8:0:4:0: attempting task abort! scmd(ffff88037f3f5d48)
Apr 14 04:42:28 Tower kernel: sd 8:0:4:0: [sdn] tag#7 CDB: opcode=0x8a 8a 00 00 00 00 00 00 9c b3 10 00 00 04 00 00 00
Apr 14 04:42:28 Tower kernel: scsi target8:0:4: handle(0x0010), sas_address(0x4433221103000000), phy(3)
Apr 14 04:42:28 Tower kernel: scsi target8:0:4: enclosure_logical_id(0x500605b0060ad510), slot(3)
Apr 14 04:42:28 Tower kernel: sd 8:0:4:0: task abort: SUCCESS scmd(ffff88037f3f5d48)
Apr 14 04:42:59 Tower kernel: sd 8:0:1:0: attempting task abort! scmd(ffff880059193548)
Apr 14 04:42:59 Tower kernel: sd 8:0:1:0: [sdk] tag#1 CDB: opcode=0x88 88 00 00 00 00 00 00 9d 77 18 00 00 04 00 00 00
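
The `CDB:` field in those lines is the raw SCSI command that timed out. As a sketch (assuming the standard 16-byte READ(16)/WRITE(16) layout; `decode_cdb16` is a hypothetical helper, not an Unraid tool), decoding the two commands shows sdk timing out on a read (opcode 0x88) and sdn on a write (opcode 0x8a), each for 1024 blocks. Which of sdk/sdn is disk8 and which is parity2 is not shown in the excerpt.

```python
# Decode a 16-byte SCSI CDB as printed by the kernel ("CDB: opcode=0x88 88 00 ...").
# Only READ(16) (0x88) and WRITE(16) (0x8a) are named; other opcodes are
# reported as unknown. Layout: byte 0 opcode, bytes 2-9 LBA,
# bytes 10-13 transfer length in logical blocks (both big-endian).

OPCODES = {0x88: "READ(16)", 0x8A: "WRITE(16)"}


def decode_cdb16(hex_bytes: str) -> dict:
    data = bytes.fromhex(hex_bytes)  # fromhex skips the spaces between bytes
    if len(data) != 16:
        raise ValueError(f"expected 16 bytes, got {len(data)}")
    return {
        "command": OPCODES.get(data[0], f"unknown (0x{data[0]:02x})"),
        "lba": int.from_bytes(data[2:10], "big"),
        "blocks": int.from_bytes(data[10:14], "big"),
    }


# The two CDBs from the log excerpt above
for dev, cdb in [
    ("sdk", "88 00 00 00 00 00 00 9b bf 08 00 00 04 00 00 00"),
    ("sdn", "8a 00 00 00 00 00 00 9c b3 10 00 00 04 00 00 00"),
]:
    info = decode_cdb16(cdb)
    print(dev, info["command"], f"LBA {info['lba']:#x}", f"{info['blocks']} blocks")
```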

Check the cables (both power and SATA) or the enclosure.
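
To check whether the timeouts stop after reseating or swapping cables, a rough sketch like this counts `attempting task abort!` lines per SCSI address in a saved syslog and maps each address to its `sdX` name. It assumes the kernel log format shown in the excerpt above; `count_aborts` and the script name are hypothetical, and the log path you pass in (for example `/var/log/syslog` on the server, or a copy you saved) is an assumption about your setup.

```python
import re
import sys
from collections import Counter

# "sd 8:0:1:0: attempting task abort! ..." -> capture host:channel:target:lun
ABORT_RE = re.compile(r"sd (\d+:\d+:\d+:\d+): attempting task abort!")
# "sd 8:0:1:0: [sdk] tag#5 ..." -> capture the address and the block device name
NAME_RE = re.compile(r"sd (\d+:\d+:\d+:\d+): \[(sd[a-z]+)\]")


def count_aborts(lines):
    """Return a Counter of task aborts keyed by 'address (sdX)'."""
    names = {}
    aborts = Counter()
    for line in lines:
        if m := NAME_RE.search(line):
            names[m.group(1)] = m.group(2)
        if m := ABORT_RE.search(line):
            aborts[m.group(1)] += 1
    # Names are resolved at the end, since a device's name line
    # usually appears just after its first abort line.
    return Counter({f"{addr} ({names.get(addr, '?')})": n for addr, n in aborts.items()})


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_aborts.py /path/to/syslog
    with open(sys.argv[1], errors="replace") as f:
        for dev, n in count_aborts(f).most_common():
            print(f"{dev}: {n} task aborts")
```

Running it on a log captured before and after the cable check gives a simple per-disk comparison; a count that drops to zero for both devices would point at the cabling rather than the disks.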
