Parity Check - System fails


Solved by Soulflyzz.


Hello,

I have migrated from my old Unraid server to a new Dell r720 system and have just finished getting everything live and running.

After my parity check I had 450 errors: all my shares disappeared, my dockers won't start, and every disk showed errors in the Fix Common Problems plugin.

 

I restarted the server and everything worked fine for 24 hours, until I ran another parity check and the same thing happened.

I have not had time to reboot the server yet to see if it comes back.

Is this going to be an issue moving forward? If so, I have no idea how to fix it. I have attached a diagnostics file in the hope of figuring out the issue.

 

r720-diagnostics-20210624-0810.zip


It looks like you had multiple disks drop offline simultaneously. I would suspect that you have some sort of hardware-related issue. Since a parity check is when the system is under maximum load, it could be power related. Other possibilities that spring to mind are something more obscure, like an improperly seated HBA.
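One way to check for the "several disks dropped at the same moment" pattern in a syslog is to group disk-related lines by timestamp. The Python sketch below is a hypothetical helper (simultaneous_drops is not part of Unraid or any plugin); it assumes classic syslog lines shaped like the excerpts in this thread and only recognises device names written as /dev/sdX or [sdX]. Several distinct disks failing within the same second points more toward a shared component (power, HBA, cabling, enclosure) than toward individual drives, although it does not prove it.

```python
import re
from collections import defaultdict

# Assumed line shape (classic syslog, as in this thread):
#   "Jun 29 07:30:40 r720 emhttpd: device /dev/sdt problem getting id"
# Group 1 = timestamp to the second, group 2 = first sdX device named
# either as /dev/sdX or as [sdX] (the kernel's style).
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s.*?(?:/dev/|\[)(sd[a-z]+)")


def simultaneous_drops(lines, keyword, threshold=3):
    """Return {timestamp: [devices]} for every second in which at least
    `threshold` distinct disks logged a line containing `keyword`."""
    by_second = defaultdict(set)
    for line in lines:
        if keyword not in line:
            continue
        match = LINE_RE.match(line)
        if match:
            timestamp, device = match.groups()
            by_second[timestamp].add(device)
    # Keep only the seconds where enough different disks were affected.
    return {
        ts: sorted(devices)
        for ts, devices in by_second.items()
        if len(devices) >= threshold
    }
```

For example, running it over the syslog from the diagnostics with keyword="problem getting id" would list each second in which at least three disks hit that error; raising threshold filters out isolated single-disk events.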


See if disabling spin down helps with the drives getting disabled. Also make sure only one cable goes to the MD1200: Unraid doesn't support SAS multipath, and these errors are likely the result of that:

 

Jun 29 07:30:40 r720 emhttpd: device /dev/sdt problem getting id
Jun 29 07:30:40 r720 emhttpd: device /dev/sdz problem getting id
Jun 29 07:30:40 r720 emhttpd: device /dev/sdu problem getting id
Jun 29 07:30:40 r720 emhttpd: device /dev/sdr problem getting id
Jun 29 07:30:40 r720 emhttpd: device /dev/sdx problem getting id
Jun 29 07:30:40 r720 emhttpd: device /dev/sdv problem getting id
Jun 29 07:30:40 r720 emhttpd: device /dev/sds problem getting id
Jun 29 07:30:41 r720 emhttpd: device /dev/sdah problem getting id
Jun 29 07:30:41 r720 emhttpd: device /dev/sdy problem getting id
Jun 29 07:30:41 r720 emhttpd: device /dev/sdal problem getting id
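Because Unraid doesn't support SAS multipath, a dual-cabled MD1200 would typically expose each disk twice, under two different /dev/sdX names. A rough way to check for that is to look for serial numbers that appear on more than one device node. The sketch below parses the text output of "lsblk -dn -o NAME,SERIAL"; find_duplicate_serials is a hypothetical name, and the sketch assumes lsblk reports the same serial for both paths to a disk and that serials contain no spaces.

```python
from collections import defaultdict


def find_duplicate_serials(lsblk_output):
    """Parse `lsblk -dn -o NAME,SERIAL` output and return
    {serial: [device names]} for serials seen on more than one device."""
    devices_by_serial = defaultdict(list)
    for line in lsblk_output.splitlines():
        fields = line.split()
        if len(fields) != 2:
            # Blank line, or a device for which lsblk reported no serial.
            continue
        name, serial = fields
        devices_by_serial[serial].append(name)
    # A disk reachable over two SAS paths would show up here twice.
    return {s: names for s, names in devices_by_serial.items() if len(names) > 1}
```

On the server itself, the input could be captured with subprocess.run(["lsblk", "-dn", "-o", "NAME,SERIAL"], capture_output=True, text=True).stdout. An empty result is consistent with a single path per disk.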

 


Yes, my r720 has an LSI 9201-16e card running to my MD1200 with two e-SAS cables. I also have e-SAS to 4x SATA cables for 8 more drives (currently not in the array).

Hope this info helps.

38 minutes ago, trurl said:

Still having multiple disk communication issues. Are all those disks on this controller?


Broadcom / LSI SAS2116 PCI-Express Fusion-MPT SAS-2

 

 

I do have two lines going from the MD1200 to my r720; I will fix that tonight.

17 minutes ago, JorgeB said:

See if disabling spin down helps with the drives getting disabled. Also make sure only one cable goes to the MD1200: Unraid doesn't support SAS multipath, and these errors are likely the result of that.

Also, I plan on changing my cache drives from btrfs to xfs. I have a feeling that might also be the culprit.

 


So what should be the next plan of attack? Uninstall the "Spin Down SAS Drives" plugin, or just disable spin down on my drives?

I have the "Dynamix Cache Directories" plugin installed so that my disks can spin down to save wear and power.

If spin down is the issue, is there a way to get it working with the mover? Actually, let's leave that until I test it. I will report back in 24-48 hours after testing these theories.

 

Thank you for the help, JorgeB, trurl and itimpi.


*Update*

I disabled my 5 AM mover.

I ran a repair with -L in maintenance mode on all 20 drives.

I removed the second e-SAS cable between the r720 and the MD1200.

I started a parity check.

*Forgot to disable spin down on the drives

 

I woke up this morning to find the parity check 60% complete, with only 2 errors on the first 10 drives.

All my dockers appear to be running, but all my shares are missing again this morning.

Also, when I checked, all the drives were spun down.

 

r720-diagnostics-20210630-0806.zip

11 minutes ago, Soulflyzz said:

*Forgot to disable spin down on the drives

As mentioned, that would be the first thing you should try. It still looks like a spin-up related error; these disks spun down:

 

Jun 29 23:06:05 r720 SAS Assist v0.85: Spinning down device /dev/sdk
Jun 29 23:06:05 r720 SAS Assist v0.85: Spinning down device /dev/sdd
Jun 29 23:06:05 r720 SAS Assist v0.85: Spinning down device /dev/sdj
Jun 29 23:06:05 r720 SAS Assist v0.85: Spinning down device /dev/sdc
Jun 29 23:06:05 r720 SAS Assist v0.85: Spinning down device /dev/sdo
Jun 29 23:06:05 r720 SAS Assist v0.85: Spinning down device /dev/sdl
Jun 29 23:06:05 r720 SAS Assist v0.85: Spinning down device /dev/sdq
Jun 29 23:06:05 r720 SAS Assist v0.85: Spinning down device /dev/sdn
Jun 29 23:06:05 r720 SAS Assist v0.85: Spinning down device /dev/sdb
Jun 29 23:06:05 r720 SAS Assist v0.85: Spinning down device /dev/sdp

 

Then CA Backup ran:

Jun 30 02:01:15 r720 CA Backup/Restore: Backing Up appData from /mnt/user/appdata/ to /[email protected]

That woke up the disks. Note that the problem was in the same 10 disks that were spun down before:

 

Jun 30 02:01:20 r720 kernel: sd 7:0:0:0: [sdb] tag#6560 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Jun 30 02:01:20 r720 kernel: sd 7:0:1:0: [sdc] tag#6561 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Jun 30 02:01:20 r720 kernel: sd 7:0:8:0: [sdj] tag#6563 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Jun 30 02:01:20 r720 kernel: sd 7:0:10:0: [sdl] tag#6565 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Jun 30 02:01:20 r720 kernel: sd 7:0:13:0: [sdo] tag#6567 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Jun 30 02:01:20 r720 kernel: sd 7:0:15:0: [sdq] tag#6569 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Jun 30 02:01:20 r720 kernel: sd 7:0:2:0: [sdd] tag#6562 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Jun 30 02:01:20 r720 kernel: sd 7:0:9:0: [sdk] tag#6564 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Jun 30 02:01:20 r720 kernel: sd 7:0:14:0: [sdp] tag#6568 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Jun 30 02:01:20 r720 kernel: sd 7:0:12:0: [sdn] tag#6566 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
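The comparison made above by eye (disks that were spun down versus disks that logged errors after CA Backup woke them) can be scripted. This is a minimal sketch assuming the log lines look exactly like the SAS Assist and kernel excerpts in this thread; correlate_spindown_errors is a hypothetical name, and the regexes would need adjusting for other formats. It also records the SCSI host number (the first field of "sd 7:0:0:0"), since every failing disk sitting behind one host is consistent with a shared HBA or enclosure path.

```python
import re

# Assumed formats, copied from the excerpts in this thread:
#   "... SAS Assist v0.85: Spinning down device /dev/sdk"
#   "... kernel: sd 7:0:0:0: [sdb] tag#6560 UNKNOWN(0x2003) Result: ..."
SPINDOWN_RE = re.compile(r"Spinning down device /dev/(sd[a-z]+)")
KERNEL_ERR_RE = re.compile(r"kernel: sd (\d+):\d+:\d+:\d+: \[(sd[a-z]+)\].*Result:")


def correlate_spindown_errors(lines):
    """Compare disks that were spun down with disks that later logged
    kernel 'Result:' lines, and list the SCSI hosts those came from."""
    spun_down = set()
    error_host = {}  # device name -> SCSI host number
    for line in lines:
        spin = SPINDOWN_RE.search(line)
        if spin:
            spun_down.add(spin.group(1))
            continue
        err = KERNEL_ERR_RE.search(line)
        if err:
            error_host[err.group(2)] = int(err.group(1))
    failed = set(error_host)
    return {
        "failed_after_spindown": sorted(failed & spun_down),
        "failed_never_spun_down": sorted(failed - spun_down),
        "scsi_hosts": sorted(set(error_host.values())),
    }
```

Fed the full excerpts above, it should put all ten disks in the first list, leave the second list empty and report a single SCSI host, which matches the reading in the post.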

 
