Soulflyzz Posted June 24, 2021
Hello, I have migrated from my old Unraid server to a new Dell r720 system and just got it all live and running. After my parity check I had 450 errors, all my shares disappeared, none of my Dockers would start, and every disk had errors in the Fix Common Problems plugin. I restarted the server and everything worked fine for 24 hours, until I ran another parity check and the same thing happened. I have not had time to reboot the server to see if it comes back. Is this going to be an issue moving forward? If so, I have no idea how to fix it. I have attached diagnostics in the hope of figuring out the issue: r720-diagnostics-20210624-0810.zip
itimpi Posted June 24, 2021
Looks like you had multiple disks drop offline simultaneously. I would suspect that you have some sort of hardware-related issue. Since a parity check is when the system is under maximum load, it could be power related. Other possibilities that spring to mind are something more obscure, like an improperly seated HBA.
Soulflyzz Posted June 24, 2021
I have it plugged into a backup battery. I will power it down and try it on a different power bar tonight. Both my r720 and my MD1420 have redundant power supplies.
Soulflyzz Posted June 29, 2021
Thank you for the assistance. I am still working on this issue and have narrowed it down to when Mover starts. I repaired all of the drives through maintenance mode with -L and checked parity twice, 16+ hours each time, with 0 errors. The moment I invoked Mover, all my shares crashed again. Any ideas? I have attached a new log: r720-diagnostics-20210629-1029.zip
trurl Posted June 29, 2021
Still having multiple disk communication issues. Are all those disks on this controller?
Broadcom / LSI SAS2116 PCI-Express Fusion-MPT SAS-2
JorgeB Posted June 29, 2021
See if disabling spin down helps with the drives getting disabled. Also make sure just one cable goes to the MD1200: Unraid doesn't support SAS multi-path, and these errors are likely the result of that:
Jun 29 07:30:40 r720 emhttpd: device /dev/sdt problem getting id
Jun 29 07:30:40 r720 emhttpd: device /dev/sdz problem getting id
Jun 29 07:30:40 r720 emhttpd: device /dev/sdu problem getting id
Jun 29 07:30:40 r720 emhttpd: device /dev/sdr problem getting id
Jun 29 07:30:40 r720 emhttpd: device /dev/sdx problem getting id
Jun 29 07:30:40 r720 emhttpd: device /dev/sdv problem getting id
Jun 29 07:30:40 r720 emhttpd: device /dev/sds problem getting id
Jun 29 07:30:41 r720 emhttpd: device /dev/sdah problem getting id
Jun 29 07:30:41 r720 emhttpd: device /dev/sdy problem getting id
Jun 29 07:30:41 r720 emhttpd: device /dev/sdal problem getting id
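For anyone reading a similar syslog, the list of affected devices can be pulled out with a few lines of Python instead of by eye. This is only a minimal sketch: the function name `devices_with_id_errors` and the regular expression are illustrative assumptions, not part of Unraid or its diagnostics tooling, and the sample contains three lines copied from the excerpt above.

```python
import re

# Three sample lines copied from the syslog excerpt quoted in this thread.
SAMPLE = """\
Jun 29 07:30:40 r720 emhttpd: device /dev/sdt problem getting id
Jun 29 07:30:40 r720 emhttpd: device /dev/sdz problem getting id
Jun 29 07:30:41 r720 emhttpd: device /dev/sdah problem getting id
"""

# Assumed pattern, written to match the message format shown above.
ID_ERROR = re.compile(r"emhttpd: device (/dev/sd[a-z]+) problem getting id")


def devices_with_id_errors(log_text):
    """Return the sorted, de-duplicated device paths that logged the error."""
    return sorted({match.group(1) for match in ID_ERROR.finditer(log_text)})


if __name__ == "__main__":
    for device in devices_with_id_errors(SAMPLE):
        print(device)
```

To run it against a real log, read the syslog file from the extracted diagnostics zip into a string and pass that string to the function in place of `SAMPLE`.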
Soulflyzz Posted June 29, 2021
Yes, I have my r720 with an LSI 9201-16e card run to my MD1200 with 2 e-SAS cords, then from e-SAS to 4x SATA for 8 more drives (currently not in the array). Hope this info helps.
38 minutes ago, trurl said: Still having multiple disk communication issues. Are all those disks on this controller? Broadcom / LSI SAS2116 PCI-Express Fusion-MPT SAS-2
I do have two lines going from the MD1200 to my r720; I will fix that tonight.
17 minutes ago, JorgeB said: See if disabling spin down helps with the drives getting disabled, also make sure just one cable goes to the MD1200, Unraid doesn't support SAS multi-path, these errors are likely the result of that: [...]
Also, I plan on changing my cache drives from btrfs to xfs. I have a feeling that might also be the culprit.
JorgeB Posted June 29, 2021
2 minutes ago, Soulflyzz said: Also, I plan on changing my cache drives from btrfs to xfs. I have a feeling that might also be the culprit.
It's not. It might be related to spin down, i.e., the problem occurs when the drives wake up, e.g., when the mover wakes them up to run.
Soulflyzz Posted June 29, 2021
So what should be the next plan of attack? Uninstall the "Spin Down SAS Drives" plugin? Or just disable sleep on my drives? I have the "Dynamix Cache Directories" plugin installed so my disks can spin down to save life and power. If spin down is the issue, is there a way to get it working with Mover? Actually, let's leave that until I test it. I will be in contact in 24-48 hours after I test these theories. Thank you for the help, "JorgeB", "trurl" and "itimpi".
Soulflyzz Posted June 30, 2021
*Update*
I have disabled my 5 am Mover.
I did a Maintenance mode repair -L on all 20 drives.
Removed the second e-SAS cord between the r720 and MD1200.
Started a parity check.
*Forgot to disable spin down on the drives.
I woke up this morning with the parity check at 60% complete, with only 2 errors on the first 10 drives. All my Dockers look to be running, but all my shares are missing again this morning. Also, when I checked, all the drives were spun down.
r720-diagnostics-20210630-0806.zip
JorgeB Posted June 30, 2021
11 minutes ago, Soulflyzz said: *Forgot to disable spin down on the drives
As mentioned, that would be the first thing you should try. This still looks like a spin-up related error. These disks spun down:
Jun 29 23:06:05 r720 SAS Assist v0.85: Spinning down device /dev/sdk
Jun 29 23:06:05 r720 SAS Assist v0.85: Spinning down device /dev/sdd
Jun 29 23:06:05 r720 SAS Assist v0.85: Spinning down device /dev/sdj
Jun 29 23:06:05 r720 SAS Assist v0.85: Spinning down device /dev/sdc
Jun 29 23:06:05 r720 SAS Assist v0.85: Spinning down device /dev/sdo
Jun 29 23:06:05 r720 SAS Assist v0.85: Spinning down device /dev/sdl
Jun 29 23:06:05 r720 SAS Assist v0.85: Spinning down device /dev/sdq
Jun 29 23:06:05 r720 SAS Assist v0.85: Spinning down device /dev/sdn
Jun 29 23:06:05 r720 SAS Assist v0.85: Spinning down device /dev/sdb
Jun 29 23:06:05 r720 SAS Assist v0.85: Spinning down device /dev/sdp
Then CA Backup ran:
Jun 30 02:01:15 r720 CA Backup/Restore: Backing Up appData from /mnt/user/appdata/ to /[email protected]
And woke up the disks. Note that the problem was in the same 10 disks that were spun down before:
Jun 30 02:01:20 r720 kernel: sd 7:0:0:0: [sdb] tag#6560 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Jun 30 02:01:20 r720 kernel: sd 7:0:1:0: [sdc] tag#6561 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Jun 30 02:01:20 r720 kernel: sd 7:0:8:0: [sdj] tag#6563 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Jun 30 02:01:20 r720 kernel: sd 7:0:10:0: [sdl] tag#6565 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Jun 30 02:01:20 r720 kernel: sd 7:0:13:0: [sdo] tag#6567 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Jun 30 02:01:20 r720 kernel: sd 7:0:15:0: [sdq] tag#6569 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Jun 30 02:01:20 r720 kernel: sd 7:0:2:0: [sdd] tag#6562 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Jun 30 02:01:20 r720 kernel: sd 7:0:9:0: [sdk] tag#6564 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Jun 30 02:01:20 r720 kernel: sd 7:0:14:0: [sdp] tag#6568 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Jun 30 02:01:20 r720 kernel: sd 7:0:12:0: [sdn] tag#6566 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
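The comparison made above (the disks that spun down are the same ones that threw errors on wake-up) can also be checked mechanically. The following is a rough sketch, not an Unraid tool: the function name `spun_down_vs_errored` and both regular expressions are assumptions written against the log formats shown in this post, and the sample uses four lines copied from it.

```python
import re

# Assumed patterns, written against the two message formats quoted above.
SPIN_DOWN = re.compile(r"SAS Assist v[\d.]+: Spinning down device /dev/(sd[a-z]+)")
KERNEL_ERR = re.compile(r"kernel: sd [\d:]+ \[(sd[a-z]+)\] tag#\d+ UNKNOWN")

# Four sample lines copied from the syslog excerpt in this thread.
SAMPLE = """\
Jun 29 23:06:05 r720 SAS Assist v0.85: Spinning down device /dev/sdk
Jun 29 23:06:05 r720 SAS Assist v0.85: Spinning down device /dev/sdd
Jun 30 02:01:20 r720 kernel: sd 7:0:2:0: [sdd] tag#6562 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Jun 30 02:01:20 r720 kernel: sd 7:0:9:0: [sdk] tag#6564 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
"""


def spun_down_vs_errored(log_text):
    """Return (devices that were spun down, devices with kernel errors)."""
    spun_down = set(SPIN_DOWN.findall(log_text))
    errored = set(KERNEL_ERR.findall(log_text))
    return spun_down, errored


if __name__ == "__main__":
    spun_down, errored = spun_down_vs_errored(SAMPLE)
    print("spun down and errored:", sorted(spun_down & errored))
    print("errored without spin down:", sorted(errored - spun_down))
```

If the second list comes back empty on a full log, every errored disk had been spun down first, which is consistent with the spin-up explanation but does not prove it on its own.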
Solution: Soulflyzz Posted July 1, 2021
I uninstalled the SAS spin down plugin this morning after a fresh reboot. I will run it for 24 hours and let you know my results.
Soulflyzz Posted July 5, 2021
Hello, update from last week. I have not had a single crash since uninstalling the SAS spin down plugin. I am in the process of running another parity check and will run Mover after that is done as the final test.
JorgeB Posted July 5, 2021
Yeah, some SAS devices are known to not spin down/up correctly. You can post in the plugin support thread for more info.
Soulflyzz Posted July 12, 2021
Thank you for all the help. I pushed the server over the weekend and could not make it crash. Thank you for helping me correct this issue.