Dr_Rak Posted January 23

Hey guys, new to Unraid and have been loving it so far, but for some reason my server started losing it yesterday. No major changes: I had NZB downloading stuff and checked my panel to find 2 drives offline with read errors. Tried a restart through the Unraid dashboard; it never came back up and the CLI said “bzmodules checksum error”. After some googling I copied the config file off my flash drive, remade it with the Unraid tool, and re-added my config file; same issue. Also tried multiple USB ports, both motherboard and front panel, both USB 2 and 3, with no luck. Next I remade the flash drive from a backup I had from yesterday morning, when the server was fully functional, and it booted with no issues. It prompted me to run a parity check, so I stopped almost all my Docker containers, started the check, then went to bed. This morning I’ve now got read errors on 1 drive, with the 2 that previously had errors running fine.

I’m running Unraid 6.12.6 on a Ryzen 3700X with 64 GB of RAM. The array is 6x 8 TB Exos drives from Rhinotech through an Adaptec HBA, with 2 as parity. Logs attached. I would be incredibly grateful for any advice!

http://cloud.tapatalk.com/s/65afba6ba2ed2/tower-diagnostics-20240123-0754.zip
itimpi Posted January 23

8 minutes ago, Dr_Rak said: I copied the config file off my flash drive,

Did you mean this? You should have a config folder, not a file.

8 minutes ago, Dr_Rak said: “bzmodules checksum error”.

This indicates a problem reading some of the archive files off the flash drive. Normally the easiest fix is simply to rewrite all the bz* type files on the flash drive. If this does not work, it could mean that the flash drive is starting to fail.
Dr_Rak (Author) Posted January 23

1 minute ago, itimpi said: Did you mean this? You should have a config folder, not a file.

Sorry, I did mean config FOLDER.

1 minute ago, itimpi said: This indicates a problem reading some of the archive files off the flash drive. Normally the easiest fix is simply to rewrite all the bz* type files on the flash drive. If this does not work, it could mean that the flash drive is starting to fail.

That was my concern, but it seems like it only happened when I was using the "latest" config folder, even with fresh bz* files from the Unraid flash drive tool. When I copied over a full backup from earlier that day (config + bz* files, etc.) it booted first time without issues.
itimpi Posted January 23

1 minute ago, Dr_Rak said: Sorry, I did mean config FOLDER. That was my concern, but it seems like it only happened when I was using the "latest" config folder, even with fresh bz* files from the Unraid flash drive tool. When I copied over a full backup from earlier that day (config + bz* files, etc.) it booted first time without issues.

Messages about the bz* files are independent of the config, as those files are loaded and checked before the later stages of the boot process, when the 'config' folder starts coming into play.
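Editor's sketch: since the bz* files are checked on their own before the config folder is read, they can also be checked by hand. The Python below is a minimal sketch, not an official Unraid tool. It assumes each bz* file on the flash drive has a sibling `<name>.sha256` file whose first whitespace-separated token is the expected hex digest; that layout is an assumption and should be confirmed against your own flash drive. The function name `check_bz_files` is hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_bz_files(flash_root: str) -> dict[str, str]:
    """Compare every bz* file with its assumed <name>.sha256 sidecar file.

    Returns file name -> "ok", "MISMATCH", or "no checksum file".
    """
    results = {}
    for bz_file in sorted(Path(flash_root).glob("bz*")):
        # Skip the sidecar files themselves and anything that is not a file.
        if bz_file.suffix == ".sha256" or not bz_file.is_file():
            continue
        sidecar = bz_file.with_name(bz_file.name + ".sha256")
        tokens = sidecar.read_text().split() if sidecar.is_file() else []
        if not tokens:
            results[bz_file.name] = "no checksum file"
            continue
        # Assumption: the first token in the sidecar is the expected digest.
        expected = tokens[0].lower()
        actual = sha256_of(bz_file)
        results[bz_file.name] = "ok" if actual == expected else "MISMATCH"
    return results
```

On a running server the flash drive is mounted at /boot, so `check_bz_files("/boot")` would be the call; with the flash drive plugged into another machine, pass its mount point instead. A mismatch that persists after rewriting the bz* files would point at the flash drive (or the USB path to it) rather than the config folder.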
Dr_Rak (Author) Posted January 23

itimpi said: Messages about the bz* files are independent of the config, as those files are loaded and checked before the later stages of the boot process, when the 'config' folder starts coming into play.

Could a failing flash drive trigger read error warnings in multiple different drives in my array?

It seems like trying a different flash drive might be the right move. I see in the guide that the maximum recommended size is 32GB, but I can’t find many reputable drives of that capacity (and my existing flash is 64GB). I know it’s a waste of space, but the only other flash drive I have is a Samsung BAR 256GB. Could I use that as a temporary solution and, if that solves the issue, replace it with a smaller capacity of the same model (looks like 64GB is as low as they go)? Samsung is about the only trustworthy brand I can find on Amazon, and the BAR has a full metal body, which I understand can help with heat dissipation.
itimpi Posted January 23

1 hour ago, Dr_Rak said: Could a failing flash drive trigger read error warnings in multiple different drives in my array?

Not as far as I know.
Dr_Rak (Author) Posted January 24

Update for anyone who may come across this thread in the future:

It occurred to me that my issues started right after I finished some CPU-intensive tasks, so I wondered if Ryzen C-states might be the culprit, and that the flash drive corruption was a byproduct of dirty shutdowns (maybe CPU stays in low power mode -> shutdown is forced due to time limit?).

C-state TL;DR: Ryzen has 3 C-states (power management states a core can have under low/no load, aimed at decreasing idle power draw):

C0 = active
C1 = halted
C6 = deep sleep

There have been mixed reports of Linux kernels handling C-states, particularly C6, very poorly. The reports seem somewhat hardware dependent, and many have alleged that various BIOS updates may have resolved the issue, but there is no real consensus.

For reference, I am running a 3700X on a Gigabyte B550 Aorus Pro AC using the latest BIOS (F16h at the time of writing, with AMD AGESA V2 1.2.0.B). I changed my BIOS settings as follows:

Global C-states: Enabled -> Enabled (testing if I can utilize the C1 "halted" state for idle power savings without compromising stability)
Power supply idle control: Low idle current -> Typical idle current (this change disables the C6 "deep sleep" state, which seems to be the most problematic with Linux)
AMD Cool'n'quiet: Enabled -> Disabled (undervolts/underclocks the CPU during decreased utilization, but some have indicated that it does not play well with Linux)
CPPC: Enabled -> Disabled (CPPC helps the OS prioritize higher-performance cores for task scheduling, but some reports suggest instability with Linux)

I immediately noticed that my server became more responsive and boot time was significantly quicker. I will continue to update once I have more stability information to share.
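Editor's sketch: one way to see which idle states the Linux kernel is actually using after BIOS changes like these is to read the cpuidle entries under sysfs. The Python below is a minimal sketch under stated assumptions: the function name `read_cpuidle_states` is hypothetical, it relies on the standard `/sys/devices/system/cpu/cpuN/cpuidle/stateM/` layout (files `name`, `usage`, `time`, `disable`), and the state names the kernel reports (often POLL, C1, C2 and so on from the ACPI idle driver) may not match BIOS labels such as C6 one to one.

```python
from pathlib import Path


def read_cpuidle_states(cpu: int = 0,
                        sysfs_root: str = "/sys/devices/system/cpu") -> list[dict]:
    """List the idle states the kernel exposes for one CPU, with usage counters."""
    cpuidle_dir = Path(sysfs_root) / f"cpu{cpu}" / "cpuidle"
    if not cpuidle_dir.is_dir():
        # No cpuidle data exposed (no idle driver loaded, or not a Linux host).
        return []
    # Sort numerically so that state10 comes after state9.
    state_dirs = sorted(cpuidle_dir.glob("state[0-9]*"),
                        key=lambda p: int(p.name[len("state"):]))
    states = []
    for state_dir in state_dirs:
        states.append({
            "index": int(state_dir.name[len("state"):]),
            "name": (state_dir / "name").read_text().strip(),
            # Number of times this state has been entered since boot.
            "usage": int((state_dir / "usage").read_text()),
            # Total time spent in this state, in microseconds.
            "time_us": int((state_dir / "time").read_text()),
            # Any value other than "0" means the state is disabled for this CPU.
            "disabled": (state_dir / "disable").read_text().strip() != "0",
        })
    return states
```

Printing the result of `read_cpuidle_states(0)` before and after a BIOS change, and comparing how the `usage` and `time_us` counters grow for the deepest state, gives a rough indication of whether that state is still being entered. It does not prove or disprove that C-states caused the instability described above.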