
Is Unraid corrupted? Can it be recovered without starting from scratch?


Big Ry


Unraid version: 6.12.4

Plex version: Latest Default linuxserver

Radarr/Sonarr version: Latest Dev linuxserver

SAB version: Latest default linuxserver

Hardware: ASRock Z790 Steel Legend, i7-12700K (no OC), 2x32GB G.Skill DDR5 (no XMP), 2x Samsung 980 Pro 2TB NVMe (ZFS cache mirror), 3x Seagate Exos X18 12TB (XFS single parity array)

 

I built my first Unraid server in July.  In late August to early September, I began having some odd issues.  After several days of troubleshooting, I eventually determined that I had a memory stick failure.  The server kept running right up until I identified the memory as the cause.

 

I RMA'd the memory kit, which I just got back this week.  I ran Memtest again on the new kit, and it cleared all 4 passes with 0 errors.  So I rebooted Unraid.  Everything seemed to be working fine initially, but I noticed I was having issues with Plex.  Plex wasn't recognizing new media that I downloaded with Radarr, despite the files being renamed/moved correctly by Radarr.  I also noticed a weird mismatch for a movie that I've had on my disk for a long time.  Plex would not let me fix the match.

 

I checked to make sure Unraid and all apps were up to date.  I ran Fix Common Problems, which found no problems.  I restarted Plex.  All to no avail.  I was thinking I might need to reinstall Plex, but then I checked the Unraid system logs and saw several errors and warnings that sound pretty serious.  I am very concerned that the memory stick failure, and my continued operation of Unraid while troubleshooting, may have caused Unraid OS corruption and/or app corruption / data corruption.

 

Can anyone make sense of the attached logs and diagnostics?  I downloaded these prior to rebooting, and it looks like they cover about 2 days (new memory kit was installed 3 or maybe 4 days ago max).  Is there a way to confirm whether or not Unraid is corrupted? And if it is in fact corrupted, is there any way to fix it without having to rebuild from scratch?  I did not get backups configured in Unraid yet, so I do not have a restore point to use (even if I knew a safe point to go back to).  I know I should have prioritized backing up the system, but I never would have guessed that I'd have a major failure a mere month after setting everything up.  I'm new to Unraid, so I am learning as I go here.

funraid-syslog-20230930-1203.zip funraid-diagnostics-20230930-0826.zip

Edited by Big Ry
Link to comment
1 minute ago, itimpi said:

A point to note is that Unraid itself effectively cannot get corrupted as a fresh copy is loaded into RAM every time the system is booted.    I did not see any obvious signs of file level corruption on the drives either.

Well that is fantastic news then!  But what else could it be?  USB stick corruption?  Or just simply app corruption?

Link to comment
3 hours ago, JorgeB said:

So it looks like, despite your resistance, I was right about bad RAM; glad that's fixed. As mentioned, the OS itself cannot get corrupted, but other flash drive files can, not to mention any data written to the array during that time. Ideally you'd start over with a new Unraid config to make sure no corrupt settings files remain, and restore all the data from a backup if available.

I'm sorry that I thought it was highly unlikely that 1 month old RAM would take a dump right after it tested good on 4 passes of Memtest. That's a pretty unlikely occurrence by anyone's standard. 

 

Now your suggestion is just to start completely from scratch with no attempt to diagnose anything? That's a nightmare that I'm trying to avoid.  I only had this server up for a month and was learning everything as I went, so I hadn't gotten backups set up yet and I have none.  Never in a million years would I have expected it to take a dump so quickly. 

 

Any data can be corrupted, including the Unraid OS. This suggestion that the system files for Unraid cannot be corrupted is nonsensical. What basis do you have for that claim? The system resides on a USB flash drive, which can wear out or fail, so of course it can be corrupted. Though the storage medium does not really matter: any system files for any OS can be corrupted at any time. The fact that Unraid loads into and runs in memory means nothing for whether or not the system files can be corrupted. They can be corrupted on the flash drive or corrupted in memory. The latter is not a big deal, since a reboot loads a fresh copy; the former is not going to fix itself without intervention by the user. There's no such thing as a file that can't be corrupted. 

 

Instead of me potentially wasting a whole bunch of time rebuilding everything from scratch, I want to actually run some diagnostics to rule out individual components. I've already tried reinstalling Plex twice, and I'm still having issues. So it's not that. I have all other containers off, so presumably it can't be them causing issues. My media *may* be partially corrupted, but since that's not system or config files, my assumption is that I can deal with that issue later. Actually, this is easy enough to test. So maybe I'll just create a whole new TV folder with new files and see if it'll load in Plex. If it does, then the existing media is the likely culprit, and it's going to be a heavy lift to replace it all. 

 

So what does that leave as far as diagnostics?

1. Test the flash drive hardware (never done this before, but I know there's software available) 

2. Test PSU output (annoying but relatively easy) 

3. Further test the NVMe drives (not sure if there's any other way to do this besides pulling them, which would be a challenge since I don't have anything to test them in) 

4. Rebuild the cache pool from scratch (a lot of work. Might as well rebuild Unraid while at it) 

5. Test the motherboard (presumably a massive headache. Probably easier to try reinstalling Unraid first) 

 

It goes without saying that the first thing I plan to do after finally fixing everything is to set up backups. But it would be pointless to do this until I have confirmed that the hardware isn't defective and that the system is running as intended. So that is the necessary next step, and doing the quickest/easiest tests first is the obvious order to follow, as it is with any system. 

Link to comment
2 hours ago, Big Ry said:

Any data can be corrupted, including unraid OS. This suggestion that the system files for unraid cannot be corrupted is nonsensical

I think it is really trying to say that if Unraid boots successfully, the Unraid OS files cannot be corrupt, because the boot process checks that the archive files match their checksums while loading the OS into RAM.    If the archive files actually DO get corrupted and the boot process thus fails, then it is easy to rewrite good copies to the flash.
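
To make the checksum idea concrete, here is a minimal Python sketch of the same kind of check done by hand. It assumes each OS archive on the flash drive sits next to a companion `<name>.sha256` file whose first token is the expected hex digest; that layout is an assumption and is not confirmed anywhere in this thread. `sha256_of` and `verify_checksums` are hypothetical helper names, not Unraid tools, and this is not the actual boot-time code.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(directory: Path) -> dict[str, bool]:
    """Compare each FILE in `directory` against the digest in FILE.sha256.

    Assumption: the checksum file holds the hex digest as its first
    whitespace-separated token. A missing FILE counts as a failure.
    """
    results = {}
    for checksum_file in sorted(directory.glob("*.sha256")):
        target = checksum_file.with_suffix("")  # drop the ".sha256" suffix
        tokens = checksum_file.read_text().split()
        expected = tokens[0].lower() if tokens else ""
        results[target.name] = target.is_file() and sha256_of(target) == expected
    return results
```

If the flash drive is mounted at `/boot` (the usual location on a running Unraid server, but check your own system), `verify_checksums(Path("/boot"))` would return a name-to-result mapping where `False` means a mismatch or a missing file. Note that this only covers files that have a checksum file; it says nothing about config files.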

Link to comment
48 minutes ago, itimpi said:

I think it is really trying to say that if Unraid boots successfully, the Unraid OS files cannot be corrupt, because the boot process checks that the archive files match their checksums while loading the OS into RAM.    If the archive files actually DO get corrupted and the boot process thus fails, then it is easy to rewrite good copies to the flash.

If that is the case, then why would anyone ever rebuild Unraid? That is what everyone is telling me to do right now, but if Unraid cannot be corrupted then why would I waste my time rebuilding it? It seems contradictory to say Unraid can't be corrupted but I still need to rebuild it.  What am I missing here? 

Link to comment
34 minutes ago, Big Ry said:

If that is the case, then why would anyone ever rebuild Unraid? That is what everyone is telling me to do right now, but if Unraid cannot be corrupted then why would I waste my time rebuilding it? It seems contradictory to say Unraid can't be corrupted but I still need to rebuild it.  What am I missing here? 


The flash drive can become corrupted - but that can easily be rebuilt with the standard OS files.   
 

There is always the possibility of configuration files becoming corrupted.

Link to comment
40 minutes ago, Big Ry said:

If that is the case, then why would anyone ever rebuild unraid? 

There is no reason to "rebuild unraid". As mentioned the OS itself can't become corrupted without it being extremely obvious (i.e. it won't boot), but any user config and data can be and that's what may need rebuilding/restoring to a known good state before hardware issues occurred.  

Edited by Kilrah
Link to comment
2 minutes ago, itimpi said:


The flash drive can become corrupted - but that can easily be rebuilt with the standard OS files.   
 

There is always the possibility of configuration files becoming corrupted.

Well that's basically what I've been trying to say. Maybe I'm not using the correct terminology, but I've been saying that it's absolutely possible for any file on that flash drive to be corrupted or otherwise become unusable. Or the drive itself could have a hardware failure.  This may prevent the OS from loading, or it may not. If the checksums are stored on the USB, then they're also vulnerable.  So anything can happen. The checksum process itself could fail. Without intimate knowledge of the code and the built-in failsafes, nobody can say for sure what might happen in any given situation of file corruption.  I'm not saying it's likely at all, just saying it's possible. 

 

But forget that whole argument anyway. I just want to know what will be the most likely cause of my issues if nobody can derive anything from my logs and diagnostics. If this stick failure could have 'messed my system up' (for lack of more precise terminology), then it would be reasonable to say that I might need to rebuild Unraid, right? That's a bit of work on my end, but if this is the most likely cause of my problems then I'll do it. I was just trying to check off the easy stuff first, like checking the USB stick or PSU voltages or whatever else. 

Link to comment
18 minutes ago, Kilrah said:

There is no reason to "rebuild unraid". As mentioned the OS itself can't become corrupted without it being extremely obvious (i.e. it won't boot), but any user config and data can be and that's what may need rebuilding/restoring to a known good state before hardware issues occurred.  

How would I restore config without rebuilding? I don't have backups. So if I can't at a minimum verify config in the GUI, then I have to rebuild, right? 

Link to comment

If you don't have data backups, keep the data. Most of it should be fine, and if they are media files, corruption could just be a small or even imperceptible glitch during playback.

 

For the rest, and to make sure there are no corrupt config files and no Docker or VM service corruption, I would delete the Docker/VM images and the appdata folder, then redo the flash drive. From the old flash drive, restore only the key, super.dat and the pools folder (for the device assignments), and re-configure everything else. This would basically guarantee that nothing can be corrupt except possibly the existing data.
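
The selective restore described above could be scripted along these lines. This is only a sketch: it assumes the licence key is a `*.key` file and that it, `super.dat` and the `pools` folder all live under `config/` on the flash drive, which the post does not state. `restore_minimal` and the `KEEP_*` names are hypothetical, and the script expects a copy of the old flash contents to be available in a folder.

```python
import shutil
from pathlib import Path

# Assumed locations, relative to the root of the flash drive.
KEEP_FILES = ["config/super.dat"]  # device assignments
KEEP_GLOBS = ["config/*.key"]      # licence key file
KEEP_DIRS = ["config/pools"]       # pool definitions


def restore_minimal(old_flash: Path, new_flash: Path) -> list[str]:
    """Copy only the key, super.dat and pools folder to the new flash.

    Everything else on the old flash (other settings files) is left
    behind on purpose so it has to be re-configured by hand.
    Returns the relative paths that were copied.
    """
    copied = []
    sources = [old_flash / rel for rel in KEEP_FILES]
    for pattern in KEEP_GLOBS:
        sources.extend(sorted(old_flash.glob(pattern)))
    for src in sources:
        if src.is_file():
            rel = src.relative_to(old_flash)
            dest = new_flash / rel
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)
            copied.append(rel.as_posix())
    for rel in KEEP_DIRS:
        src_dir = old_flash / rel
        if src_dir.is_dir():
            shutil.copytree(src_dir, new_flash / rel, dirs_exist_ok=True)
            copied.append(rel + "/")
    return copied
```

Calling `restore_minimal(Path("old_flash_copy"), Path("new_flash"))` with your own paths returns the list of items it copied, so you can confirm that nothing beyond those three made it across.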

Link to comment
