Send notification for file system corruption



34 minutes ago, itimpi said:

This sounds like it is something that might be quite easy to do (at least for SATA drives) in a plugin.  Does anyone know if there is already a plugin (or docker) that attempts this? If not, I might look into trying to put together a plugin myself.   Having said that, I am sure it will end up being harder than it sounds - these things nearly always are.

 

Thankfully, it's even easier than that, at least for the 'short' tests - you just need to run:

smartctl --smart=on --offlineauto=on --saveauto=on /dev/sdX

Output looks like:

smartctl 7.3 2022-02-28 r5338 [x86_64-linux-5.19.17-Unraid] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF ENABLE/DISABLE COMMANDS SECTION ===
SMART Enabled.
SMART Attribute Autosave Enabled.
SMART Automatic Offline Testing Enabled every four hours.
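
To apply the same settings to several drives in one go, something like the following could work. It's only a sketch: `enable_smart` and the `SMARTCTL` variable are names made up for this example, and the flags are exactly the ones from the command above.

```shell
#!/bin/bash
# Hypothetical helper: run the smartctl command from the post once per device.
# SMARTCTL is overridable so the loop can be dry-run (e.g. SMARTCTL=echo).
SMARTCTL="${SMARTCTL:-smartctl}"

enable_smart() {
    local dev
    for dev in "$@"; do
        "$SMARTCTL" --smart=on --offlineauto=on --saveauto=on "$dev"
    done
}
```

Usage would be along the lines of `enable_smart /dev/sdb /dev/sdc`, with your own device names substituted.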

 

As for the 'long' tests, I'd not really looked at my disk monitoring setup since the 6.9 (?) changes to how modprobe functioned in UnRAID (not that it's directly related lol), but this should *REALLY* simplify things, since smartd comes with the base OS as part of smartmontools. Copy /etc/smartd.conf to your flash drive, then list the drives to run against and when you'd like the tests run. Then link the file back to its original location and you're done (I added it to my list of symlinks which get generated on first array start).
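
A rough sketch of those steps, under some assumptions: the flash copy is assumed to live at /boot/config/smartd.conf, the schedule (short test daily at 02:00, long test Saturdays at 03:00) is just an example, and the function names are invented for illustration. The `-a` and `-s` directives are standard smartd.conf syntax, where the `-s` regex has the form T/MM/DD/d/HH.

```shell
#!/bin/bash
# Assumed paths; override via environment if yours differ.
FLASH_CONF="${FLASH_CONF:-/boot/config/smartd.conf}"
SYSTEM_CONF="${SYSTEM_CONF:-/etc/smartd.conf}"

# Print one smartd.conf line for the given device.
#   -a            monitor all SMART attributes
#   S/../.././02  short self-test every day at 02:00
#   L/../../6/03  long self-test every Saturday (day 6) at 03:00
smartd_directive() {
    printf '%s -a -s (S/../.././02|L/../../6/03)\n' "$1"
}

# Copy the config to flash once, then symlink it back into place.
persist_smartd_conf() {
    [ -f "$FLASH_CONF" ] || cp "$SYSTEM_CONF" "$FLASH_CONF"
    ln -sf "$FLASH_CONF" "$SYSTEM_CONF"
}
```

For example, `smartd_directive /dev/sdb >> /boot/config/smartd.conf` would append a line for that drive, and `persist_smartd_conf` would be the part to re-run after each boot. smartd has to be restarted (or told to reload) before it picks up the changed file.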

 

I just removed my original scripts for this, much cleaner now! I'd imagine there's a relatively easy way to populate the conf file from the disk settings, given that the user already chooses the controller type for their drives as part of the disk config within UnRAID, so finding / sorting / populating based on that UnRAID config should make it pretty much fully hands off for the user 🎉

Edited by BVD

Also wanted to note - while a plugin version would be great as a bandaid, this kind of thing really should be part of the core OS longer term imo (so it doesn't require the user to seek out the plugin to monitor their drives)... It's one of the few areas where I feel that UnRAID lags significantly behind as a platform.  It's one of the most basic functions of a NAS, so hopefully this can eventually make it in to the OS - no new packages needed, just management / UI thankfully!

 

My justification here is mainly that every other NAS OS out there, free / open source and commercial alike, has this baked in (all I've ever put my hands on, at least!):

 

OpenMediaVault:

[image: 18472-smart2-jpg]

TrueNAS:

[image: TasksSMARTTestsAdd.png]

Rockstor - it requires manual input, but is still all UI with tooltips, so... I guess that counts? lol:

[image: devicescan_a_directive.png]

 

(etc etc)

16 minutes ago, BVD said:

Also wanted to note - while a plugin version would be great as a bandaid, this kind of thing really should be part of the core OS

I agree. Before, the main storage was mostly on the array, and since running a scheduled parity check accomplishes the same as an extended SMART test, it wasn't really needed. But now, with zfs and people possibly having large raidz pools, scheduled SMART tests would be a good idea, since a scrub only checks the used part of the disk, not the full disk as a parity check or extended SMART test does.


Fix Common Problems will not solve it because, from what I have read, it will only trigger if the FS is unmountable.

That will only happen when you start/stop the array, which isn't done very often.

 

And in my case even with the metadata corruption the FS was mountable.

 

Also, a SMART check would not have caught the error; the disks are fine, and no, they are not 10 years old.

 

Would a regular parity check catch the problem? I don't know, but in the worst case you carry the metadata corruption around for four weeks before you find out, if you run them monthly.

 

On the other hand, the solution would be simple: include something like the syslog notification script directly in unRaid. I mean, the solution is there.
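
To give an idea of how little such a check needs, here is a minimal sketch (not the actual script mentioned above). The pattern and the function name are assumptions for illustration; the pattern only covers messages such as the XFS "Metadata corruption detected" kernel line and would need extending for other cases.

```shell
#!/bin/bash
# Assumed pattern: XFS or BTRFS kernel lines mentioning corruption or an internal error.
FS_ERROR_PATTERN='(XFS|BTRFS).*(corrupt|internal error)'

# Print every line of the given log file that matches the pattern.
# $1: path to the log file, e.g. /var/log/syslog
fs_corruption_lines() {
    grep -Ei "$FS_ERROR_PATTERN" "$1"
}
```

Run on a schedule against /var/log/syslog, any output from `fs_corruption_lines` could then be handed to whatever notification mechanism you use; that part is deliberately left out here.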

 

  • 1 month later...
  • 5 weeks later...

Btw, the HDD that had the corrupted XFS FS (which I never got a notification about) was also the cause of my current data loss under unRaid.

The HDD seems to have had some silent corruption problem which led to parity sync errors, which invalidate parity during a correcting check.

 

That means data loss is inevitable from then on, because you can only rebuild the HDD in an unclean state. I don't know if dual parity etc. would have changed anything here.

But I know now: if you have FS problems or sync errors in your unRaid, you are pretty lost.

Sync errors especially mean that you have to check every component with two parity checks, which put a lot of load on all components.

On 9/1/2023 at 6:24 PM, itimpi said:

if not I might look into trying to put together a plugin myself.

I'm a former OpenMediaVault user and the SMART checking feature is one of the features I focus on using.
So it would be very helpful if someone could make a plugin that can schedule tasks to run SMART checks.

  • 1 month later...
