BlackMagicCoffee Posted February 27, 2023 (edited)

EDIT: As of 2023-02-28 I have removed the old diagnostics and added a new one after removing redundant SAS cables.

I have been having an issue for some time where every few weeks my Unraid server freezes and nothing works. I can sometimes navigate parts of the WebUI, but it is mostly unresponsive, with broken or missing elements. When this happens I sometimes see errors written in the WebUI (over random elements on random pages) stating things like 'fork failed', 'resource temporarily unavailable', 'Unregistered - flash device error', or something to do with Docker containers. (I don't quite remember, as I have no screenshot and can't reproduce it at will.) The only way I have been able to recover from this error is to power off the server manually.

I have managed to get a log from when this happens by enabling syslog writing to the USB. It is attached (I truncated logs older than 1 day before the issue occurred).

Please help me stop the issue from recurring. I am also open to trying other diagnostic steps or taking any reasonable advice on improving my setup.

Unraid 6.11.5 (this issue happened on prior versions)

Hardware:
Model: Dell R730 (I think)
M/B: 0JP31P Version A10
BIOS: Version 2.7.0, dated 05/23/2018
CPU: Intel® Xeon® CPU E5-2670 0 @ 2.60GHz
HVM: Enabled
IOMMU: Enabled
Cache: 256 KiB, 2 MB, 20 MB, 256 KiB, 2 MB, 20 MB
Memory: 64 GiB DDR3 Multi-bit ECC (max. installable capacity 1536 GiB)
LSI® SAS 9206-16e
(JBOD) EMC KTN-STL3

Plugins:
- Community Applications
- Compose Manager
- Dynamix System Temperature
- Nerd Tools

Things I have tried to fix this issue:
- Stop some Docker images I don't actively use (I have not tried all of them though)
- Stop VMs I don't actively use
- Update Unraid and plugins
- Transfer the license to a brand-new USB device
- Tried running the new USB device on a different USB port
- Searched the forum on a few occasions for similar issues, but haven't really seen anything that looks like my issue

Additional notes:
- My server does a memory test on every boot. I am not sure how reliable this is for ruling out faulty memory.
- This only started happening recently (within the past 4-6 months); the server had run without issues for probably close to a year before that.
- The error occurs every 2 weeks, give or take a few days. I usually discover the server has already been in this state when Jellyfin fails to load, and it doesn't seem to be triggered by any direct action I take on the server.
- Since I initially put the server together I have had "udma crc error count" warnings for some disks. The count has always remained the same (196). These drives came used with my JBOD. I have not seen any reason to believe these errors matter or are related, but I think it's worth mentioning. I am under the impression they show up because of some feature, or lack of a feature, on the drives that I read about somewhere when I first installed the JBOD.

Attached are my diagnostics from after the last crash, and the syslog from the day before up to the error, plus logs from the reboot.

syslog.txt
fatman-diagnostics-20230228-1607.zip

Edited February 28, 2023 by BlackMagicCoffee: Removed redundant SAS cables and uploaded new diagnostics zip
JorgeB Posted February 28, 2023

The log is spammed with disk ID errors. Unraid doesn't support SAS multipath; connect only one cable from the HBA to the enclosure and post new diags after array start.
BlackMagicCoffee (Author) Posted February 28, 2023

12 hours ago, JorgeB said:
> Log is spammed with disk ID errors, Unraid doesn't support SAS multipath, connect one cable only from the HBA to the enclosure and post new diags after array start.

I have removed the redundant SAS cables and uploaded a new copy of the diagnostics file.
JorgeB Posted March 1, 2023

10 hours ago, BlackMagicCoffee said:
> I have removed the redundant SAS cables and uploaded a new copy of the diagnostics file.

Still seeing the same errors. Is there only one cable from one HBA going to the enclosure?
BlackMagicCoffee (Author) Posted March 6, 2023

On 3/1/2023 at 2:30 AM, JorgeB said:
> Still seeing the same errors, there's only one cable from one HBA going to the enclosure?

Sorry for the delayed response, I got married! I have 1 cable per controller (the JBOD has 2 controllers). I can try again with only one controller plugged in, but I assume that would not allow all my drives to work. I will remove one of the remaining cables tonight so that there is only 1 cable from the JBOD to the HBA, and post an update.
JorgeB Posted March 6, 2023

28 minutes ago, BlackMagicCoffee said:
> I have 1 cable per controller ( the JBOD has 2 controllers ).

On 2/28/2023 at 8:36 AM, JorgeB said:
> Unraid doesn't support SAS multipath, connect one cable only from the HBA to the enclosure
BlackMagicCoffee (Author) Posted March 7, 2023

23 hours ago, JorgeB said: [quote empty]

Attached is an image of what happens when I remove one of the remaining two cables. There is a single cable to each of the 2 controllers on my JBOD; disconnecting one makes half the drives inaccessible. Should I plug one controller into the other controller?
JorgeB Posted March 7, 2023

Let's start over. You have two LSI controllers and one enclosure, correct? There should be a single cable from one HBA to one controller in the enclosure. If you do that and reboot, you only see half of the disks?
BlackMagicCoffee (Author) Posted March 7, 2023 (edited)

26 minutes ago, JorgeB said:
> Lets start over, you have two LSI controllers and one enclosure correct? There should be a single cable from one HBA to one controller in the enclosure, if you do that and reboot you only see half of the disks?

Attached is an image of my setup in its current configuration (half the drives are missing). I have 1x LSI HBA; when I say "controller" I mean the JBOD has 2x modules that provide a SAS interface. The image shows how it is configured right now.

I previously had the following configurations:
- Configuration 1 (originally): a cable from the primary and expansion port of each JBOD controller/module connected to the HBA, totalling 4 SAS cables from the JBOD to the HBA.
- Configuration 2 (when told to remove cables because of SAS multipath): a cable from the primary port of each JBOD controller/module connected to the HBA, totalling 2 SAS cables from the JBOD to the HBA.
- Configuration 3 (current): a single SAS cable from the HBA to the JBOD; only half the drives are detected. As pictured.

Sorry if I have been unclear or am making this difficult. I really do appreciate your help with this.

Edited March 7, 2023 by BlackMagicCoffee: Typo, additional details
JorgeB Posted March 7, 2023

OK, and are you sure the 2nd diags posted were the correct ones? Something is missing here: the latest diags still show duplicate devices, so it doesn't make sense that devices are now missing. Can you post new diags with configuration 2?
BlackMagicCoffee (Author) Posted March 7, 2023

17 minutes ago, JorgeB said:
> OK, and are you sure the 2nd diags posted were the correct ones? Something is missing here, since latest diags still show duplicate devices, so it doesn't make sense that now devices are missing, can you post new diags with configuration 2?

Here are the new diagnostics, using configuration 2. Both 'primary' SAS ports on the JBOD are connected to my HBA, as per the picture. It looks like those 'cannot get id' issues are still present. I restarted the server at 2023-03-07 14:20; the timestamp in the logs confirms that this is the correct diagnostics zip.

fatman-diagnostics-20230307-1425.zip
JorgeB Posted March 8, 2023

13 hours ago, BlackMagicCoffee said:
> it looks like those 'cannot get id' issues are still present.

Yep, meaning that the disks are being detected twice. I don't know that enclosure, but multiple controllers are usually there for redundancy, and one controller should give access to all the disks. When you tried the single cable, did you try it connected to either controller?
BlackMagicCoffee (Author) Posted March 8, 2023

6 hours ago, JorgeB said:
> Yep, meaning that the disks are being detected twice, I don't know that enclosure but usually multiple controllers are for redundancy, one controller should give access to all the disks, when you tried the single cable did you try it connected to either controller?

I had not tried a single cable with the other controller. I will attempt to do so. I do have some manuals, and they do show 2 cables, but it's not very clear what information they are trying to convey, and they only show the controllers linked to each other.
BlackMagicCoffee (Author) Posted March 17, 2023

It looks like using 1 cable on the primary port of the other controller has resolved the issue with the IDs.

fatman-diagnostics-20230316-2154.zip
BlackMagicCoffee (Author) Posted March 17, 2023

Do you think that had anything to do with the crashing?
JorgeB Posted March 17, 2023

Unlikely, but at least there is no log spam now. Enable the syslog server and post that log after a crash.
BlackMagicCoffee (Author) Posted March 25, 2023 (edited)

Well, it crashed again, and despite my enabling the syslog server to write into one of my shares, it apparently didn't write anything. I have now enabled copying the syslog to flash, because I know that worked before. Attached is my diagnostics zip from after the restart (which I assume is probably useless because it has no crash info).

I should also add that when it crashed I was able to see the error messages I mentioned before, showing up over random spots in the UI. Here are the ones I saw:

Warning: exec(): Unable to fork [docker network ls --format='{{.Name}}={{.Driver}}' 2>/dev/null] in /usr/local/emhttp/plugins/dynamix.docker.manager/include/DockerClient.php on line 1014
Warning: exec(): Unable to fork [pgrep 'rc.docker'] in /usr/local/emhttp/plugins/dynamix/include/Helpers.php on line 225

Compose
Warning: Invalid argument supplied for foreach() in /usr/local/emhttp/plugins/compose.manager/php/compose_manager_main.php on line 33
Warning: Invalid argument supplied for foreach() in /usr/local/emhttp/plugins/compose.manager/php/compose_manager_main.php on line 33
Warning: Invalid argument supplied for foreach() in /usr/local/emhttp/plugins/compose.manager/php/compose_manager_main.php on line 33
Warning: Invalid argument supplied for foreach() in /usr/local/emhttp/plugins/compose.manager/php/compose_manager_main.php on line 33
Warning: Invalid argument supplied for foreach() in /usr/local/emhttp/plugins/compose.manager/php/compose_manager_main.php on line 33
Warning: Invalid argument supplied for foreach() in /usr/local/emhttp/plugins/compose.manager/php/compose_manager_main.php on line 33
Warning: Invalid argument supplied for foreach() in /usr/local/emhttp/plugins/compose.manager/php/compose_manager_main.php on line 33

fatman-diagnostics-20230325-1524.zip

Edited March 25, 2023 by BlackMagicCoffee: More details
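A note on the "Unable to fork" warnings: they mean PHP could not start a child process. One possible cause (an assumption, nothing in the posted diagnostics confirms it) is the kernel running out of process/thread slots. The following is a minimal Python sketch for checking that on a Linux box. It reads the standard `/proc/loadavg`, `/proc/sys/kernel/threads-max` and `/proc/sys/kernel/pid_max` files; the function names are made up for illustration.

```python
from __future__ import annotations

from pathlib import Path


def parse_loadavg(text: str) -> tuple[int, int]:
    """Return (runnable, total) scheduling entities from /proc/loadavg text.

    The fourth field of /proc/loadavg looks like "1/80": currently runnable
    entities / entities (processes and threads) that currently exist.
    """
    runnable, total = text.split()[3].split("/")
    return int(runnable), int(total)


def usage_ratio(total: int, limit: int) -> float:
    """Fraction of the limit that is currently in use."""
    return total / limit


def snapshot(proc: Path = Path("/proc")) -> dict:
    """Compare the current task count with the kernel-wide limits."""
    _, total = parse_loadavg((proc / "loadavg").read_text())
    threads_max = int((proc / "sys/kernel/threads-max").read_text())
    pid_max = int((proc / "sys/kernel/pid_max").read_text())
    limit = min(threads_max, pid_max)  # whichever limit is hit first
    return {
        "tasks": total,
        "limit": limit,
        "ratio": usage_ratio(total, limit),
    }
```

Calling `snapshot()` periodically (for example from a scheduled script that appends the result to a file on the flash drive) would show whether the task count climbs toward the limit in the days before a freeze. A ratio that stays low points away from this explanation, for instance toward memory exhaustion or per-user or per-container limits, which this sketch does not check.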
BlackMagicCoffee (Author) Posted April 3, 2023

On 3/26/2023 at 6:35 AM, JorgeB said:
> Post the mirrored log after the next crash.

The issue happened again, and I was able to get the diagnostics after a few tries. (The generate-diagnostics modal would sometimes fail and say:

/boot/logs Warning: shell_exec(): Unable to execute 'logger error: '/webGui/include/Download.php': missing csrf_token' in /usr/local/emhttp/plugins/dynamix/include/local_prepend.php on line 18

)

Attached is the diagnostics zip. I also have a copy of the log from my flash device that I made just before generating the diagnostics.

fatman-diagnostics-20230403-1138.zip
JorgeB Posted April 3, 2023

You are having flash drive issues:

Apr 3 01:01:13 fatman emhttpd: Unregistered - flash device error (ENOFLASH7)
Apr 3 01:01:14 fatman emhttpd: Pro key detected, GUID: 0..3 FILE: /boot/config/Pro.key
Apr 3 01:09:08 fatman emhttpd: Unregistered - flash device error (ENOFLASH7)
Apr 3 01:09:09 fatman emhttpd: Pro key detected, GUID: 0..3 FILE: /boot/config/Pro.key
Apr 3 01:09:17 fatman emhttpd: Unregistered - flash device error (ENOFLASH7)
Apr 3 01:09:18 fatman emhttpd: Pro key detected, GUID: 0..3 FILE: /boot/config/Pro.key
Apr 3 01:11:19 fatman emhttpd: Unregistered - flash device error (ENOFLASH7)
Apr 3 01:11:20 fatman emhttpd: Pro key detected, GUID: 0..3 FILE: /boot/config/Pro.key

Make sure it's using a USB 2.0 port and/or try a different port; if issues continue, replace it. Unrelated, but you are also having multiple disk issues.
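For anyone who wants to see when the flash errors start in a mirrored syslog, here is a small Python sketch that pulls out the "flash device error" lines shown above. The regular expression is modelled only on the excerpt in this thread, so treat the line format as an assumption; the function name `flash_errors` is made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as:
#   Apr 3 01:01:13 fatman emhttpd: Unregistered - flash device error (ENOFLASH7)
FLASH_ERROR = re.compile(
    r"^(?P<when>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+\S+\s+emhttpd:\s+"
    r"Unregistered - flash device error \((?P<code>\w+)\)"
)


def flash_errors(syslog_text):
    """Return a list of (timestamp, error_code) for each flash device error."""
    events = []
    for line in syslog_text.splitlines():
        match = FLASH_ERROR.match(line.strip())
        if match:
            events.append((match.group("when"), match.group("code")))
    return events


# Sample taken from the excerpt quoted in this thread.
SAMPLE = """\
Apr 3 01:01:13 fatman emhttpd: Unregistered - flash device error (ENOFLASH7)
Apr 3 01:01:14 fatman emhttpd: Pro key detected, GUID: 0..3 FILE: /boot/config/Pro.key
Apr 3 01:09:08 fatman emhttpd: Unregistered - flash device error (ENOFLASH7)
"""

events = flash_errors(SAMPLE)
print(len(events), Counter(code for _, code in events))
```

Running the same function over the full syslog copy from the flash drive gives the first timestamp at which the flash device dropped out, which can then be compared with the time the WebUI became unresponsive.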
BlackMagicCoffee (Author) Posted April 3, 2023

I did actually replace the USB with a brand-new flash drive and transfer my license when this issue first started happening. I can check the USB port again later, but I'm pretty sure it's in a USB 2.0 port right now.
BlackMagicCoffee (Author) Posted April 3, 2023

15 minutes ago, JorgeB said:
> You are having flash drive issues:
> Apr 3 01:01:13 fatman emhttpd: Unregistered - flash device error (ENOFLASH7)
> Apr 3 01:01:14 fatman emhttpd: Pro key detected, GUID: 0..3 FILE: /boot/config/Pro.key
> Apr 3 01:09:08 fatman emhttpd: Unregistered - flash device error (ENOFLASH7)
> Apr 3 01:09:09 fatman emhttpd: Pro key detected, GUID: 0..3 FILE: /boot/config/Pro.key
> Apr 3 01:09:17 fatman emhttpd: Unregistered - flash device error (ENOFLASH7)
> Apr 3 01:09:18 fatman emhttpd: Pro key detected, GUID: 0..3 FILE: /boot/config/Pro.key
> Apr 3 01:11:19 fatman emhttpd: Unregistered - flash device error (ENOFLASH7)
> Apr 3 01:11:20 fatman emhttpd: Pro key detected, GUID: 0..3 FILE: /boot/config/Pro.key
> Make sure it's using a USB 2.0 port and/or try a different port, if issues continue replace it, unrelated but you are also having multiple disk issues.

I realized I could check from the terminal; here is the result. I'm pretty sure EHCI is USB 2.0?
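On the EHCI question: yes, EHCI is the USB 2.0 host controller interface, UHCI and OHCI are the USB 1.x ones, and xHCI is the USB 3.x one. As a rough illustration, the Python sketch below maps a controller description (such as a line of terminal output naming the host controller) to a USB generation. The helper name and the sample strings are hypothetical, since the actual terminal output from the post is not reproduced in the thread.

```python
import re

# Host controller interface name -> USB generation it was introduced for.
CONTROLLER_GENERATION = {
    "uhci": "USB 1.x",
    "ohci": "USB 1.x",
    "ehci": "USB 2.0",
    "xhci": "USB 3.x",
}


def usb_generation(description):
    """Return the USB generation named in a controller description, or None."""
    match = re.search(r"(uhci|ohci|ehci|xhci)", description, re.IGNORECASE)
    if match is None:
        return None
    return CONTROLLER_GENERATION[match.group(1).lower()]


# Hypothetical description strings, for illustration only.
for text in ("USB controller: Example EHCI Host Controller", "xhci_hcd"):
    print(text, "->", usb_generation(text))
```

Note that an xHCI controller also handles USB 2.0 and 1.x devices, so the controller name only tells you which kind of port the stick is attached to, not how the stick itself negotiates.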