Everything posted by HatSolo
-
Missing Cache Pool and New Config Broke Array After 7.2.3 Update – Need Recovery Help
Haha well that ended up being a simple fix. Array started without issue. I've spot checked a few things and seem to have access to everything with no issues but let me know if you'd recommend checking anything else. Again I really appreciate the help, you all are lifesavers! Below are the btrfs fi show results if that is still helpful.
root@TowerArchives:~# btrfs fi show
Label: none uuid: 4ec52a98-d674-4da6-997e-0a4b9d027f30
  Total devices 1 FS bytes used 144.00KiB
  devid 1 size 931.51GiB used 2.02GiB path /dev/sdm1
Label: none uuid: 1a45a4a6-246b-4f7c-83cf-9388f7c27b6c
  Total devices 1 FS bytes used 128.59GiB
  devid 1 size 465.76GiB used 160.02GiB path /dev/sdb1
Label: none uuid: f7effdea-9855-4791-831c-8501b27a9d69
  Total devices 1 FS bytes used 9.34GiB
  devid 1 size 20.00GiB used 17.02GiB path /dev/loop2
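For anyone wanting to compare two btrfs fi show pastes like the ones in this thread, here is a rough Python sketch that turns the text into a list of filesystems. It is only an illustration: the function names parse_btrfs_fi_show and pools_missing_devices are made up for this example, and treating "Total devices" larger than the number of devid lines as a sign of a missing pool member is an assumption on my part, not something Unraid or btrfs reports this way.

```python
import re

# Output pasted in the post above; whitespace layout does not matter to the parser.
SAMPLE = """
Label: none uuid: 4ec52a98-d674-4da6-997e-0a4b9d027f30
  Total devices 1 FS bytes used 144.00KiB
  devid 1 size 931.51GiB used 2.02GiB path /dev/sdm1
Label: none uuid: 1a45a4a6-246b-4f7c-83cf-9388f7c27b6c
  Total devices 1 FS bytes used 128.59GiB
  devid 1 size 465.76GiB used 160.02GiB path /dev/sdb1
Label: none uuid: f7effdea-9855-4791-831c-8501b27a9d69
  Total devices 1 FS bytes used 9.34GiB
  devid 1 size 20.00GiB used 17.02GiB path /dev/loop2
"""


def parse_btrfs_fi_show(text):
    """Return one dict per filesystem found in `btrfs fi show` style text."""
    filesystems = []
    # Split just before each "Label:" so every chunk covers one filesystem.
    for chunk in re.split(r"(?=Label:)", text):
        head = re.search(r"Label:\s*(\S+)\s+uuid:\s*([0-9a-f-]+)", chunk)
        if not head:
            continue  # text before the first "Label:" or an empty chunk
        total = re.search(r"Total devices\s+(\d+)", chunk)
        devices = re.findall(
            r"devid\s+(\d+)\s+size\s+(\S+)\s+used\s+(\S+)\s+path\s+(\S+)", chunk
        )
        filesystems.append({
            "label": head.group(1),
            "uuid": head.group(2),
            "total_devices": int(total.group(1)) if total else None,
            "devices": [
                {"devid": int(d), "size": s, "used": u, "path": p}
                for d, s, u, p in devices
            ],
        })
    return filesystems


def pools_missing_devices(filesystems):
    """UUIDs where fewer devid lines are listed than "Total devices" (assumed heuristic)."""
    return [
        fs["uuid"]
        for fs in filesystems
        if fs["total_devices"] is not None
        and fs["total_devices"] > len(fs["devices"])
    ]


filesystems = parse_btrfs_fi_show(SAMPLE)
for fs in filesystems:
    paths = ", ".join(dev["path"] for dev in fs["devices"])
    print(fs["uuid"], fs["total_devices"], paths)
print("possibly incomplete:", pools_missing_devices(filesystems))
```

Running the same function over the earlier paste and this one makes it easy to see that the /dev/loop2 filesystem only shows up once the array has started.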
- Missing Cache Pool and New Config Broke Array After 7.2.3 Update – Need Recovery Help
Here's the updated diagnostics, thanks for the help! towerarchives-diagnostics-20260109-1600.zip- Missing Cache Pool and New Config Broke Array After 7.2.3 Update – Need Recovery Help
Excellent, the assignments you listed match what I have from my notes so high confidence that they're correct. Unfortunately, when I tried to start the array (with drives assigned and Parity Valid box clicked) I still get the "Wrong Pool State Cache - missing devices" error. Unsure if this is related but my Pool Drives does list an unused Cache device with nothing assigned to it. I'm unsure what it is and can remove it if we think that may address the issue. Here is the output from btrfs fi show:
root@TowerArchives:~# btrfs fi show
Label: none uuid: 1a45a4a6-246b-4f7c-83cf-9388f7c27b6c
  Total devices 1 FS bytes used 125.70GiB
  devid 1 size 465.76GiB used 159.02GiB path /dev/sdb1
Label: none uuid: 4ec52a98-d674-4da6-997e-0a4b9d027f30
  Total devices 1 FS bytes used 144.00KiB
  devid 1 size 931.51GiB used 2.02GiB path /dev/sdm1
- Missing Cache Pool and New Config Broke Array After 7.2.3 Update – Need Recovery Help
I have the drive assignments written down from when I last added a drive to the array but I believe it's from over a year ago. Is it possible to pull the drive assignments from the diagnostics file from prior to them being unassigned? I'm fairly confident my notes are still accurate but validating would be nice if possible. Thanks for all the help- Missing Cache Pool and New Config Broke Array After 7.2.3 Update – Need Recovery Help
Hello, I updated my UnRAID server from 7.2.0 to 7.2.3. The update completed successfully with no issue but when I restarted I was unable to start the Array and was given an error message stating "Wrong Pool State Cache - missing devices" I foolishly referenced ChatGPT which assured me that this was a known issue and simply needed the cache device manually mounted to a temp_cache directory then to use the New Config tool to rebuild the cache pool. Upon running the New Config all the devices in my Array were unmounted along with a variety of UDMA CRC errors flagging. I do not believe the UDMA CRC errors are hardware related as the system had been running stably for months prior. With all my drives now unmounted and no clear way to address the cache pool issue I was hoping to get some help putting everything back together. I have a recent diagnostics log below. Thanks, Kurt towerarchives-diagnostics-20260108-1303.zip- Data Loss After "New Config" - Seeking Recovery Advice
Excellent! I started the rebuild and everything looks to be working as expected. Rebuild running at 230 MB/s (as expected) for 20 minutes so far and no issues. Thank you again so much for the help!- Data Loss After "New Config" - Seeking Recovery Advice
Sorry about that. I reran Steps 1 & 2, the attached diagnostics log should be the array running in normal mode with Disk6 unassigned. The array did recognize it as missing and was emulating the data. Thanks. towerarchives-diagnostics-20250421-1748.zip- Data Loss After "New Config" - Seeking Recovery Advice
Awesome, thanks! Yeah I changed the power so now the 11 drives are evenly split between 3 different cables so hopefully that's sorted. I went ahead and completed Step 1 and 2 as described above. I also started on Step 3 but ran into issues. When I started the Parity-Check in normal mode (with Disk6 unassigned) it immediately listed a "Sync errors corrected: 1" in the details section which I did not expect given that I had unselected "Write corrections to parity" so I canceled the check. Wondering best course of action from here. I ran a diagnostics pull at this point and figure I have four options unless you can think of a better course of action. 1.) Let the Parity-Check finish and assume the "Sync errors corrected" was related to running a Parity-Check with a disk missing. 2.) Stop the Parity-Check and rerun in Maintenance Mode to try and force it to avoid writing. 3.) Run a separate stress test if you have one you'd recommend. 4.) Scrap the Parity-Check and prioritize rebuilding the missing disk while monitoring for connection issues. The good news is I had the array running for 3-4 hours earlier (it was just idle but positive result nonetheless). I really appreciate all the help, thanks! towerarchives-diagnostics-20250421-1449.zip- Parity-Check following Connection Issue
Running Unraid 6.12.14. I've been having system stability issues related to the power supply or raid controller. I think I solved the stability issues but am now trying to run a Parity-Check with "Write corrections to parity" unselected so that I can assess if any drives have issues before committing to changes. This will also give me a chance to lightly stress test the system to validate the stability issues are resolved before changing data. I already know I have to replace and rebuild Disk6 from parity so I excluded it from the parity test by leaving the HDD unassigned. When I started the Parity-Check it immediately listed "Sync errors corrected: 1" which I did not expect given that "Write corrections to parity" was unselected so I paused the check. Wondering if anyone has had a similar experience or recommendations on how to proceed? Below are the options I'm considering. 1.) Just let it run and assume the "Sync errors corrected" was related to running a Parity-Check with a disk missing. 2.) Stop Parity-Check and rerun in Maintenance Mode to try and force it to avoid writing. 3.) I could also scrap the Parity-Check and prioritize rebuilding the missing disk while monitoring to make sure connection issues were resolved. Thanks for the help!- Data Loss After "New Config" - Seeking Recovery Advice
Sorry that was probably an overly broad response on my part. I spent the weekend reading up more and trying to understand the issues I've run into. One possible problem I noticed was all the HDDs were daisy chained on the same cable coming off the power supply. Unsure if there are separate wattage limits to the individual cables but, just in case, I spread out the HDDs to the other cables coming off the power supply to try and better distribute the load. As for the array here's the order of operations I'm planning on pursuing now. 1.) Re-run your instructions for "New Config" with retain all and parity valid then start array in Maint Mode to get all disks recognized within the array. I don't want to run a parity rebuild on Disk4 or Disk5 right now because they "should" be correct while I know Disk6 needs to be rebuilt. 2.) Stop the array, unassign Disk6 and restart the array in normal mode to validate that Disk6 data is emulated. 3.) Stress test the system by running a (non-correcting) Parity Check in normal mode with Disk6 unassigned (would love any recommendations if you know of a better way to test the system under load, this was just the safest option I could think of). I planned on running the Parity Check with Disk6 unassigned because I trust the emulated data and Disk6 is currently empty. 4.) Stop array and reassign Disk6 to force a parity rebuild of the missing data onto Disk6. 5.) If the earlier Parity Check flagged any issues with Disk5, Disk4, or any other disks then I can determine what likely caused the variance and run a parity rebuild or (correcting) parity check based on need. If I run into any stability issues as I move through the process I will assume it's related to the power supply and prioritize that.- Data Loss After "New Config" - Seeking Recovery Advice
Alright, I've replaced the raid controller and SATA cables so hopefully that solves the connection issues. Unsure if there is any stress testing we can do prior to attempting a parity rebuild? I never intentionally turned on turbo mode; within settings the "Tunables (md_write_method)" is set to auto so unsure if that means turbo can be utilized under certain conditions? I'm hoping that the connection issues will be sorted by the raid controller replacement but if I have troubles again the power supply will definitely be the next area of focus. Current Status: Raid controller has been replaced and the system powered on with the array stopped and all HDD currently identified including Disk 4 and the two in the unassigned disk devices section (following the connection issue Drive 6 is now listed as Unassigned instead of Not Installed). Drive 4: Listed as "Device is disabled, contents emulated" but the correct HDD is assigned under the Identification column. Drive 6: "Device is missing, contents emulated" and the Identification column lists it as unassigned. If I assign ZRS1NZ48 or ZRS1NXQA from the unassigned disk devices section then it updates to a blue square with a "New Device" designation. All Other Array Disks: Listed as "Normal Operation, Device is active" This is where I ran into trouble originally. Unsure on how to get Drive 4 re-enabled then initiate a parity rebuild on Drive 6? I think I was running into issues where it was trying to parity rebuild Drive 4 (even though it shouldn't be necessary, unless you think it is necessary in the context of the situation) and/or it would try to disk-clear Drive 6 rather than running a parity rebuild. Thoughts and should I try any stress testing before troubleshooting the drives more to confirm I fixed the connection/power issue? Unsure if another diagnostics pull is helpful but I attached a current one just in case. Thanks for the help! 
towerarchives-diagnostics-20250418-1037.zip- Data Loss After "New Config" - Seeking Recovery Advice
Sounds good, I went ahead and ordered a replacement raid controller and replacement cables which should be here tomorrow morning. I will plan on swapping everything and powering back on to check connections once I have them. Assuming there are no glaring issues once I power on do you have any recommendations on what to look for or just wait a bit to see if issues arise? Sorry, an important clarification is that I switched Disk 5 from the PCI controller to the motherboard controller after it disabled on the 2nd parity rebuild attempt. My thought process was that swapping the problematic Disk between controllers would help isolate the issue in the event a drive was disabled again. As it happens Disk 5 has had no issues on the motherboard controller and Disk 4 (on the PCI controller) has now been disabled. So while Disk 5 currently shows up on the motherboard controller, both disabling events actually occurred when the drives in question were connected through the PCI controller. No I hadn't considered power but that's a great thought! My power supply is just over a year old and 600w which, with my ballpark math, feels like it should be enough to handle 11 spinning drives? I also wasn't running a rebuild or parity check when Drive 4 disabled. But I suppose it's always possible that something in the background could have caused a power spike without me realizing. My thought is to go ahead with the PCI controller swap and test, if I'm still seeing issues then I can look at power supply replacement. That is unless you feel 600w is close enough to the limit that you think it's worth pursuing concurrently?- Data Loss After "New Config" - Seeking Recovery Advice
Alright looks like we took 1 step forward and 2 steps back haha.
Good News: I ran through your instructions and when I unassigned Disk 6 and started the array, Disk 6 remained in the array list and is now listed as "Not Installed" AND the array started emulating the missing data! I did a quick spot check which confirmed ALL missing files appear to be visible (all data now available through emulation). I also pulled a diagnostics report and have it attached, it is the one from 07:50. I am almost positive that this diagnostics report finished pulling prior to the bad news.
Bad News: As I was spot checking (maybe 2-3 minutes after starting the array) I had a repeat of the connection issue that I had on my 2nd Parity Rebuild attempt (see above). This time Disk 4 was disabled and had its contents emulated (instead of Disk 5 which was disabled during the 2nd Parity Attempt) along with Parity2 having a "UDMA CRC error count: 5" (Parity2 had a UDMA CRC error count: 1 when Disk 5 was disabled). I pulled a second diagnostic report at 07:57 then stopped the array and shut down. I have not powered on since. So more troubleshooting but unsure how to proceed.
Connection Options: Addressing the connection issue seems like the top priority. I only noticed the problems start when I upgraded to an LSI 9207-8i raid controller so I am fairly confident that is the root cause of the connection issue. I figure I have 3 options...
1.) I could replace the raid controller with a brand new one. (leaning towards this option to rule out as much as possible)
2.) I could change the controller to a different PCI slot and purchase higher quality 8087 SATA cables to see if that solves the issue.
3.) Unfortunately, I ran out of ports on my old raid controller so I can't revert back to one that I know works without disconnecting drives. But I do have a GLOTRENDS SA3120J SATA multiplier that I could try (although being a SATA multiplier it is likely too slow for a parity rebuild but could be used for troubleshooting)
Array Options: Once I get the connection issue sorted then here's where I stand with the array. Parity2 has had two UDMA CRC errors but I think they are both related to physical connection issues. The parity drives should still be valid once the connection issue is fixed. The data on Disk 4 should be fine but the drive is currently disabled/emulated and needs to be reconnected without triggering a parity rebuild or a disk clear. Disk 6 is empty (as it's a new drive) and the data does appear to be emulating so it "just" needs a full parity rebuild started. Unsure how to proceed as last time this is where I "misplaced" Disk 6 while getting Disk 5 back into the array so my confidence is totally shot lol.
towerarchives-diagnostics-20250417-0750.zip towerarchives-diagnostics-20250417-0757.zip
- Data Loss After "New Config" - Seeking Recovery Advice
Sounds good, and thanks for all the help with this! I started the array in normal mode with Disk6 unassigned. Unfortunately it did not emulate the data, Disk6 just disappeared from the array list. I pulled the diagnostics and attached it below though. Thanks again! towerarchives-diagnostics-20250417-0711.zip- Data Loss After "New Config" - Seeking Recovery Advice
I have attached the two most recent diagnostics logs that I have. The one from 4/16 is the current state and the one from 4/10 is the second most recent log I have available. I have not used the disks since I ran the New Config. The only changes to the disks that may have occurred was Unraid attempting to disk-clear Disk 6 when I started the array in an attempt to get it to rebuild from parity. However, disk 6 was empty and the two parity drives and disks 1-5, 7, 8 should not have any changes. I did run a number of read/write operations between the 4/10 diagnostic log and my first parity rebuild attempt on 4/15 but I don't think that matters. I had also recently added drives to the array but I am almost positive that this occurred prior to the 4/10 diagnostics log (but memory is starting to fail me with all the troubleshooting haha). I do have the original Disk 6. It's sitting in the unassigned drives section with the 3 TB of data it had before the initial loss of connection due to the faulty SATA cable. I started using a fresh hard drive to save the 3 TB in case I was unable to get parity to restore the remaining 12 TB. But it's there if needed. towerarchives-diagnostics-20250410-1540.zip towerarchives-diagnostics-20250416-1034.zip- Data Loss After "New Config" - Seeking Recovery Advice
Unraid Version: 6.12.4
Hello Unraid community, Before I begin it’s worth mentioning that I'm already resigned to the idea that I’ve lost the data in question. I am reaching out to the forum just to make sure I’ve exhausted every avenue prior to writing the data off as lost and to learn as much as I can from the numerous mistakes. I am currently using Unraid 6.12.4 and lost access to about 12 TB of emulated data due to failed drive connections during a parity rebuild which culminated in an ill-advised "Tools > New Config" operation. Here's the situation:
Original Disk 6 Failure: I was in the process of an array expansion and had installed a 6th disk into my array. I had added about 3 TB of data before it failed due to a physical connection issue. Unraid emulated Disk 6 using dual parity drives and I foolishly put off fixing the connection issue until after 12 TB of additional data had accumulated.
First Parity Rebuild Attempt: I replaced Disk 6 with a new hard drive to preserve the 3 TB I had while I was troubleshooting. I started the parity rebuild, but was encountering slow rebuild speeds (approx. 20 MB/s). Because of this I stopped the rebuild and upgraded the raid controller which was the most likely bottleneck.
Second Parity Rebuild Attempt: After the controller install I restarted the parity rebuild but again had slow speeds (approx. 400 MB/s). Within 60 to 90 seconds of starting the rebuild (and before I had a chance to troubleshoot) Parity 2 flagged a UDMA CRC error and Disk 5 lost connection and was disabled.
Third Parity Rebuild Attempt: I believe the issues with Parity 2 and Disk 5 were both again related to physical connection issues. So after I burned all my existing SATA cables and replaced them with new ones I have had no connection issues. Unfortunately, following the second rebuild attempt, I was left with a new issue of Disk 5 being connected but remaining in a “Device Disabled” status.
"New Config" Disaster: Sadly in my haste to fix Disk 5 so I could get back to fixing Disk 6 (my fault for rushing), I followed a recommendation from ChatGPT to run "Tools > New Config" with "Retain current assignments" and "Parity is already valid." This fixed the Disk 5 issue but cleared the association between Disk 6 and its emulated data which is no longer available. Now, Unraid sees the new hard drive assigned to Disk 6 as a brand new drive, and won’t rebuild from Parity.
Recovery Attempts: I've tried various mdcmd commands and examined disk.cfg, but Unraid still wants to format the drive.
My Goal: To recover the 12TB of emulated data from the parity drives and restore Disk 6 if possible.
My Question: Again I’m resigned to the fact that I’ve probably lost the data but wanted to ask if anyone has any thoughts or options to try? Is there a way to force Unraid to recognize the parity information and rebuild Disk 6 correctly? I'm open to any suggestions, but mostly just want to ensure I’ve tried everything before calling it a loss. I can provide any additional information or logs as needed. Thank you for your time and expertise!
Crucial Information: Unraid version: 6.12.4. Parity drives are healthy. I do NOT have a complete backup of the emulated 12TB. I have a disk.old file, but it's identical to my current disk.cfg. Diagnostics logs show "Invalid superblock magic number" errors for Disk 6. I have a diagnostics zip from shortly before the "New Config" operation (April 10, 2025, if relevant).
- Missing Cache Pool and New Config Broke Array After 7.2.3 Update – Need Recovery Help
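On the question in the "Data Loss After New Config" post about rebuilding Disk 6 from parity, a toy Python sketch may help show why the emulated data depends on parity plus every other disk reading correctly. It assumes the first parity disk is a plain byte-wise XOR of the data disks, which is my understanding of single parity; the second parity disk uses a different calculation that is not modelled here. The disk names and byte values are invented for illustration and have nothing to do with the real array.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Made-up four-byte "disks" standing in for the same sector on each drive.
disk1 = b"\x10\x20\x30\x40"
disk2 = b"\x0f\x0e\x0d\x0c"
disk6 = b"\xaa\xbb\xcc\xdd"  # the disk that later goes missing

# Parity as it would be written while all disks are present.
parity = xor_blocks([disk1, disk2, disk6])

# Disk 6 missing: its contents are recomputed from parity and the survivors.
emulated_disk6 = xor_blocks([parity, disk1, disk2])
print(emulated_disk6 == disk6)

# If another disk returns bad data (for example during a connection fault),
# the emulated contents come out wrong as well.
disk2_bad_read = b"\x0f\x0e\x0d\xff"
corrupted = xor_blocks([parity, disk1, disk2_bad_read])
print(corrupted == disk6)
```

This is why the connection problems mattered so much in this thread: while Disk 6 exists only as emulated data, a bad read from any other disk or from parity feeds straight into what gets rebuilt.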