Parity Check - System fails - General Support

June 24, 2021

Hello,

I have migrated from my old Unraid server to a new Dell r720 system.

Through the process I just got it all live and running.

After my parity check I had 450 errors, all my shares disappeared, none of my Docker containers would start, and every disk had errors in the Fix Common Problems plugin.

I restarted the server and everything worked fine for 24 hours till I did another Parity check and then the same thing happened.

I have not had time to reboot the server to see if it came back.

Is this going to be an issue moving forward? If so, I have no idea how to fix it. I have attached diagnostics in the hope of figuring out the issue.

r720-diagnostics-20210624-0810.zip


June 24, 2021

Community Expert

Looks like you had multiple disks drop offline simultaneously. I would suspect that you have some sort of hardware-related issue. Since a parity check is when the system is under maximum load, it could be power related. Other possibilities that spring to mind are something more obscure, like an improperly seated HBA.


June 24, 2021

Author

I have it plugged into a backup battery. I will power it down and try it on a different power bar tonight.

Both my r720 and my MD1420 have redundant power supplies.


June 29, 2021

Author

Thank you for the assistance. I am still working on this issue and have narrowed it down to when Mover starts.

I repaired all of the drives through maintenance mode with -L, then checked parity twice, 16+ hours each time, with 0 errors.

The moment I invoked Mover, all my shares crashed again.

Any ideas? I have attached a new log.

r720-diagnostics-20210629-1029.zip


June 29, 2021

Community Expert

Still having multiple disk communication issues. Are all those disks on this controller?

Broadcom / LSI SAS2116 PCI-Express Fusion-MPT SAS-2


June 29, 2021

Community Expert

See if disabling spin down helps with the drives getting disabled. Also make sure just one cable goes to the MD1200: Unraid doesn't support SAS multi-path, and these errors are likely the result of that:

Jun 29 07:30:40 r720 emhttpd: device /dev/sdt problem getting id
Jun 29 07:30:40 r720 emhttpd: device /dev/sdz problem getting id
Jun 29 07:30:40 r720 emhttpd: device /dev/sdu problem getting id
Jun 29 07:30:40 r720 emhttpd: device /dev/sdr problem getting id
Jun 29 07:30:40 r720 emhttpd: device /dev/sdx problem getting id
Jun 29 07:30:40 r720 emhttpd: device /dev/sdv problem getting id
Jun 29 07:30:40 r720 emhttpd: device /dev/sds problem getting id
Jun 29 07:30:41 r720 emhttpd: device /dev/sdah problem getting id
Jun 29 07:30:41 r720 emhttpd: device /dev/sdy problem getting id
Jun 29 07:30:41 r720 emhttpd: device /dev/sdal problem getting id
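
For anyone who wants to pull the affected devices out of their own syslog, here is a minimal Python sketch. It assumes the emhttpd line format shown above; the function name `devices_with_id_problems` is made up for illustration and is not part of Unraid.

```python
import re

# Matches emhttpd lines such as:
#   Jun 29 07:30:40 r720 emhttpd: device /dev/sdt problem getting id
ID_PROBLEM = re.compile(r"emhttpd: device (/dev/sd[a-z]+) problem getting id")


def devices_with_id_problems(lines):
    """Return the sorted, de-duplicated device paths emhttpd could not identify."""
    found = set()
    for line in lines:
        match = ID_PROBLEM.search(line)
        if match:
            found.add(match.group(1))
    return sorted(found)


# Three of the lines quoted above, used as sample input.
sample = [
    "Jun 29 07:30:40 r720 emhttpd: device /dev/sdt problem getting id",
    "Jun 29 07:30:40 r720 emhttpd: device /dev/sdz problem getting id",
    "Jun 29 07:30:41 r720 emhttpd: device /dev/sdah problem getting id",
]

if __name__ == "__main__":
    for device in devices_with_id_problems(sample):
        print(device)
```

To run it against a real log, pass the lines of the syslog file from the diagnostics zip instead of `sample`. If the same set of devices shows up every time, that points at a shared cable, enclosure, or controller rather than at individual disks.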


June 29, 2021

Author

Yes. My r720 has an LSI 9201-16e card connected to my MD1200 with 2 e-SAS cables, then e-SAS to 4x SATA for 8 more drives (currently not in the array).

Hope this info helps.

38 minutes ago, trurl said:
Still having multiple disk communication issues. Are all those disks on this controller?
Broadcom / LSI SAS2116 PCI-Express Fusion-MPT SAS-2

I do have two lines going from the MD1200 to my r720. I will fix that tonight.

17 minutes ago, JorgeB said:

See if disabling spin down helps with the drives getting disabled, also make sure just one cable goes to the MD1200, Unraid doesn't support SAS multi-path, these errors are likely the result of that:


Jun 29 07:30:40 r720 emhttpd: device /dev/sdt problem getting id
Jun 29 07:30:40 r720 emhttpd: device /dev/sdz problem getting id
Jun 29 07:30:40 r720 emhttpd: device /dev/sdu problem getting id
Jun 29 07:30:40 r720 emhttpd: device /dev/sdr problem getting id
Jun 29 07:30:40 r720 emhttpd: device /dev/sdx problem getting id
Jun 29 07:30:40 r720 emhttpd: device /dev/sdv problem getting id
Jun 29 07:30:40 r720 emhttpd: device /dev/sds problem getting id
Jun 29 07:30:41 r720 emhttpd: device /dev/sdah problem getting id
Jun 29 07:30:41 r720 emhttpd: device /dev/sdy problem getting id
Jun 29 07:30:41 r720 emhttpd: device /dev/sdal problem getting id

Also, I plan on changing my cache drives from btrfs to xfs. I have a feeling that might also be the culprit.


June 29, 2021

Community Expert

2 minutes ago, Soulflyzz said:

Also, I plan on changing my cache drives from btrfs to xfs. I have a feeling that might also be the culprit.

It's not. It might be related to spin down, i.e., the problem occurs when the drives wake up, e.g., when the mover wakes them up to run.


June 29, 2021

Author

So what should be the next plan of attack? Uninstall the "Spin Down SAS Drives" plugin, or just disable sleep on my drives?

I have the "Dynamix Cache Directories" plugin installed so that my disks can spin down to save life and power.

If spin down is the issue, is there a way to get it working with Mover? Actually, let's leave that until I test it. I will be in contact in 24-48 hours after I test these theories.

Thank you for the help "JorgeB" "trurl" and "itimpi"


June 30, 2021

Author

*Update*

I have disabled my 5 am Mover.

I did a Maintenance mode repair -L on all 20 drives.

Removed the second e-sas cord between the r720 and md1200

Started a parity Check

*Forgot to disable spin down on the drives

I woke up this morning with the parity check at 60% complete with only 2 errors on the first 10 drives.

All my Docker containers look to be running, but all my shares are missing again this morning.

Also, when I checked, all the drives were spun down.

r720-diagnostics-20210630-0806.zip


June 30, 2021

Community Expert

11 minutes ago, Soulflyzz said:

*Forgot to disable spin down on the drives

As mentioned, that would be the first thing you should try. This still looks like a spin up related error. These disks spun down:

Jun 29 23:06:05 r720 SAS Assist v0.85: Spinning down device /dev/sdk
Jun 29 23:06:05 r720 SAS Assist v0.85: Spinning down device /dev/sdd
Jun 29 23:06:05 r720 SAS Assist v0.85: Spinning down device /dev/sdj
Jun 29 23:06:05 r720 SAS Assist v0.85: Spinning down device /dev/sdc
Jun 29 23:06:05 r720 SAS Assist v0.85: Spinning down device /dev/sdo
Jun 29 23:06:05 r720 SAS Assist v0.85: Spinning down device /dev/sdl
Jun 29 23:06:05 r720 SAS Assist v0.85: Spinning down device /dev/sdq
Jun 29 23:06:05 r720 SAS Assist v0.85: Spinning down device /dev/sdn
Jun 29 23:06:05 r720 SAS Assist v0.85: Spinning down device /dev/sdb
Jun 29 23:06:05 r720 SAS Assist v0.85: Spinning down device /dev/sdp

Then CA backup ran:

Jun 30 02:01:15 r720 CA Backup/Restore: Backing Up appData from /mnt/user/appdata/ to /[email protected]

That woke up the disks. Note that the problem was in the same 10 disks that were spun down before:

Jun 30 02:01:20 r720 kernel: sd 7:0:0:0: [sdb] tag#6560 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Jun 30 02:01:20 r720 kernel: sd 7:0:1:0: [sdc] tag#6561 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Jun 30 02:01:20 r720 kernel: sd 7:0:8:0: [sdj] tag#6563 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Jun 30 02:01:20 r720 kernel: sd 7:0:10:0: [sdl] tag#6565 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Jun 30 02:01:20 r720 kernel: sd 7:0:13:0: [sdo] tag#6567 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Jun 30 02:01:20 r720 kernel: sd 7:0:15:0: [sdq] tag#6569 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Jun 30 02:01:20 r720 kernel: sd 7:0:2:0: [sdd] tag#6562 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Jun 30 02:01:20 r720 kernel: sd 7:0:9:0: [sdk] tag#6564 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Jun 30 02:01:20 r720 kernel: sd 7:0:14:0: [sdp] tag#6568 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Jun 30 02:01:20 r720 kernel: sd 7:0:12:0: [sdn] tag#6566 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
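
The comparison above (same disks spun down, same disks erroring on wake) can also be checked mechanically. Below is a rough Python sketch that assumes the two log line formats quoted in this post; the function name `compare_spin_down_to_errors` and the regular expressions are my own illustration, not anything shipped with Unraid or the SAS plugin.

```python
import re

# Spin-down lines written by the plugin, e.g.:
#   Jun 29 23:06:05 r720 SAS Assist v0.85: Spinning down device /dev/sdk
SPIN_DOWN = re.compile(r"SAS Assist v[\d.]+: Spinning down device /dev/(sd[a-z]+)")

# Kernel error lines, e.g.:
#   Jun 30 02:01:20 r720 kernel: sd 7:0:9:0: [sdk] tag#6564 UNKNOWN(0x2003) Result: ...
KERNEL_ERR = re.compile(r"kernel: sd [\d:]+: \[(sd[a-z]+)\] tag#\d+ UNKNOWN\(0x2003\)")


def compare_spin_down_to_errors(lines):
    """Group device names by whether they were spun down, errored, or both."""
    spun_down, errored = set(), set()
    for line in lines:
        match = SPIN_DOWN.search(line)
        if match:
            spun_down.add(match.group(1))
            continue
        match = KERNEL_ERR.search(line)
        if match:
            errored.add(match.group(1))
    return {
        "both": sorted(spun_down & errored),
        "spun_down_only": sorted(spun_down - errored),
        "errored_only": sorted(errored - spun_down),
    }


# Four of the lines quoted above, used as sample input.
sample = [
    "Jun 29 23:06:05 r720 SAS Assist v0.85: Spinning down device /dev/sdk",
    "Jun 29 23:06:05 r720 SAS Assist v0.85: Spinning down device /dev/sdd",
    "Jun 30 02:01:20 r720 kernel: sd 7:0:2:0: [sdd] tag#6562 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s",
    "Jun 30 02:01:20 r720 kernel: sd 7:0:9:0: [sdk] tag#6564 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s",
]

if __name__ == "__main__":
    print(compare_spin_down_to_errors(sample))
```

If every errored device lands in `both` and `errored_only` stays empty, that supports the spin up theory. A non-empty `errored_only` would suggest looking at cabling or power as well.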


July 1, 2021

Author
Solution

I uninstalled the SAS spin down plugin this morning after a fresh reboot.
I will run it for 24 hours and let you know my results.


July 5, 2021

Author

Hello,

Update from last week: I have not had a single crash since uninstalling the "sas sleep plugin". I am in the process of running another parity check and will run Mover after that is done as the final test.


July 5, 2021

Community Expert

Yeah, some SAS devices are known to not spin down/up correctly. You can post in the plugin support thread for more info.


July 12, 2021

Author

Thank you for all the help. I pushed the server over the weekend and could not make it crash. Thank you for the help correcting this issue.



Solved by Soulflyzz
