[Plugin] ZFS-companion



What? Why?

Think of this plugin as a topping on steini84's ZFS Plugin.

 

I love how Unraid makes it easy to run Docker and VMs and to expand with mismatched drives, but coming from other software I learned to trust ZFS more than other filesystems. If you're reading this, I guess you prefer it too. The ZFS Plugin brings our beloved filesystem, and I fully understand and share steini84's opinion about keeping that plugin pure and simple, with just the binaries; still, I missed a way to keep an eye on the status of my pools without resorting to shell commands or copy-pasted scripts. In fact, I didn't fully trust the pool simply because I wasn't monitoring it adequately. Judging by some threads, I was not the only one, so...

 

Enter ZFS-companion.

 

What does it do?

Right now it's just a dashboard widget. It shows the general health of all your pools, plus a list of each zpool with its status and last scrub information.

[Screenshot: dashboard widget showing overall pool health, plus per-pool status and last scrub info]
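
Under the hood there's nothing exotic: the widget essentially parses the standard ZFS tooling. As a rough illustration of the kind of data gathering involved (my own sketch, not the plugin's actual code):

#!/bin/bash
# Sketch only: collect per-pool health and last-scrub info from standard zpool commands.
zpool list -H -o name,health,size,allocated,free | while IFS=$'\t' read -r name health size alloc free; do
    # The "scan:" line of zpool status holds the last scrub result (or "none requested").
    scrub=$(zpool status "$name" | sed -n 's/^[[:space:]]*scan: //p')
    echo "pool=$name health=$health size=$size used=$alloc free=$free scrub=${scrub:-unknown}"
done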

 

I don't have ETAs, but I have some ideas for what could be added to make it more useful (not necessarily in order):

  • A full (secondary?) widget in the disks section of the dashboard
  • A section on the Main screen, similar to what Unassigned Devices does for other filesystems
  • Integrated scripts for scrubbing and error reporting, to avoid copy-pasting from different places
  • Share management
  • Maybe a detail page with more in-depth info (pool properties? snapshot list?)

 

How to install

Install it directly (Plugins -> Install Plugin -> enter the URL, then click INSTALL):

https://raw.githubusercontent.com/GiorgioAresu/ZFS-companion-unraid/main/ZFS-companion.plg

 

 

 

If you have suggestions or issues you can post them below.

If you can provide examples of the different messages for pool status, scrub results, errors, and so on, please post them (PM me if you want), because I'm having difficulties finding all the possible values.

 

Troubleshooting

If you're having issues or the state is not what you'd expect, please post the output of the following commands:

zpool status -x

zpool status -v

zpool list
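
If it's easier, something like this (just a convenience wrapper around the commands above, nothing plugin-specific) collects everything in one go so it can be pasted as a single block:

# Gather all three outputs into one file for posting.
{
  echo "===== zpool status -x ====="; zpool status -x
  echo "===== zpool status -v ====="; zpool status -v
  echo "===== zpool list =====";      zpool list
} > /tmp/zfs-companion-debug.txt
cat /tmp/zfs-companion-debug.txt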

 


Thanks. Updated, and it's already better, but only the first of the 3 pools now shows info.
The last 2 (virtuals and virtuals2) keep showing no info.
If there's any info I can get you to help debug, just let me know.

p.s.
> zpool version
zfs-2.0.3-1
zfs-kmod-2.0.3-1

 

edit: 

Looks like something is off with how you're grabbing fields/delimiters.

[Screenshot: widget with mis-parsed fields for the remaining pools]


The plugin shows the pool status as "unhealthy".

[Screenshot: widget reporting the SSD pool as unhealthy]

On the shell I've got:

root@server:~# zpool status -v
  pool: SSD
 state: ONLINE
status: One or more devices are configured to use a non-native block size.
        Expect reduced performance.
action: Replace affected devices with devices that support the
        configured block size, or migrate data to a properly configured
        pool.
  scan: scrub repaired 0B in 00:02:14 with 0 errors on Wed Apr 14 16:23:24 2021
config:

        NAME                                STATE     READ WRITE CKSUM
        SSD                                 ONLINE       0     0     0
          sdi                               ONLINE       0     0     0  block size: 512B configured, 4096B native
          ata-INTENSO_270E0782016C00812123  ONLINE       0     0     0

errors: No known data errors
root@server:~# 

 

Any ideas how to fix the status message so the pool becomes healthy again (without losing data, of course)?


What is the output of

zpool status -x

and

zpool list

?

On mine it says

root@Unraid:~# zpool status -x
all pools are healthy

 

I'll look into it to see why it's reported as "unhealthy" even though the pool is in fact online.

14 hours ago, campusantu said:

What is the output of

zpool status -x

root@server:~# zpool status -x
  pool: SSD
 state: ONLINE
status: One or more devices are configured to use a non-native block size.
        Expect reduced performance.
action: Replace affected devices with devices that support the
        configured block size, or migrate data to a properly configured
        pool.
  scan: scrub repaired 0B in 00:02:14 with 0 errors on Wed Apr 14 16:23:24 2021
config:

        NAME                                STATE     READ WRITE CKSUM
        SSD                                 ONLINE       0     0     0
          sdi                               ONLINE       0     0     0  block size: 512B configured, 4096B native
          ata-INTENSO_270E0782016C00812123  ONLINE       0     0     0

errors: No known data errors
root@server:~#

 

14 hours ago, campusantu said:

and

zpool list

?

root@server:~# zpool list
NAME   SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
SSD    349G  47.9G   301G        -         -    19%    13%  1.00x    ONLINE  -
root@server:~#

 

 


Personally I think the plugin is correct, as your pool is not 100% healthy, which is exactly what the message indicates. It's working, but not as it should be.

This is how it should look:

[Screenshot: widget showing all pools healthy]

 

But what would be good is that if the status is not 100% healthy, the plugin also shows the status message, so you know why and can act on it, or ignore it if it's OK with you. That's the whole purpose of a dashboard.

So your situation should not be reported as healthy, but maybe as a warning or attention state.


I'm no ZFS expert, so I'm open to discussion.

From what I found, zpool status -x is the preferred way of getting a synthetic status report. I was thinking of providing an alternative method that checks whether all pools are reported as ONLINE, but I found examples of pools with errors that are still reported as ONLINE (see https://docs.oracle.com/cd/E19253-01/819-5461/gavwg/index.html, under "Determining the Type of Device Failure"; that's Oracle's ZFS documentation, not the one we're using, but I assume they behave the same way). So I don't think that's a good idea, because you would have no idea something is wrong with the pools.

So I would agree with glennv here about not saying the pool is healthy.

I'm OK with introducing a "warning" state instead of just healthy/unhealthy, but what would the criteria be for the pools reported by zpool status -x? (A rough sketch of the idea follows the list.)

  1. The pools are ONLINE
  2. status is ok-ish (we would need to define a whitelist)
  3. errors is "No known data errors"
  4. something else?
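
To make the question concrete, here's a rough sketch of criteria 1 and 3 (illustration only, not actual plugin code; criterion 2's whitelist is left out on purpose):

#!/bin/bash
# Illustration only: classify each pool as healthy / warning / unhealthy from plain zpool output.
for pool in $(zpool list -H -o name); do
    state=$(zpool status "$pool" | awk '$1 == "state:" {print $2}')
    errors=$(zpool status "$pool" | sed -n 's/^errors: //p')
    status_msg=$(zpool status "$pool" | sed -n 's/^status: //p')

    if [ "$state" = "ONLINE" ] && [ "$errors" = "No known data errors" ]; then
        if [ -n "$status_msg" ]; then
            echo "$pool: WARNING - $status_msg"   # online, no data errors, but ZFS has something to say
        else
            echo "$pool: healthy"
        fi
    else
        echo "$pool: UNHEALTHY - state=$state, errors=$errors"
    fi
done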

I would not overthink it.

My 2 cents:

If zpool status -x does not show that all is healthy, just flag it as not healthy and additionally show the contents of the status and action fields, which are designed to tell you what is going on.

So rather than trying to interpret and grade the level of severity, you just spit out what ZFS gives us.
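
To be clear, I just mean echoing what zpool already prints, per pool; something as dumb as this (purely illustrative shell, not a suggestion for the actual plugin code):

# Pass through whatever zpool flags, keeping only the status/action sections per pool.
zpool status -x | awk '
    /^[[:space:]]*pool:/   { pool = $2; show = 0 }
    /^[[:space:]]*status:/ { show = 1 }
    /^[[:space:]]*action:/ { show = 1 }
    /^[[:space:]]*scan:/   { show = 0 }
    /^[[:space:]]*config:/ { show = 0 }
    show { print pool ": " $0 }
'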

 


I added a tooltip.

I might add the ability to ignore an unhealthy status (that would reset when the status changes). While the pool may not be 100% healthy, in JoergHH's case he may choose not to resolve the issue but the persistent unhealthy status could lead to him not noticing should a different problem/warning arise. What do you think?

 

On 4/14/2021 at 6:54 PM, JoergHH said:

Any ideas how to fix the status message so the pool becomes healthy again (without losing data, of course)?

Sorry for ignoring your question; from what I found, you would need to move the data away, recreate the pool, and move it back, as the block size cannot be changed.

14 hours ago, campusantu said:

I added a tooltip.

I might add the ability to ignore an unhealthy status (that would reset when the status changes). While the pool may not be 100% healthy, in JoergHH's case he may choose not to resolve the issue but the persistent unhealthy status could lead to him not noticing should a different problem/warning arise. What do you think?

 

An ignore option for a specific state/status message that resets when the state and/or status message changes. That is an interesting idea. That way you will still notice when it changes to a different state than the one you ignored.

 

15 hours ago, campusantu said:

Sorry for ignoring your question; from what I found, you would need to move the data away, recreate the pool, and move it back, as the block size cannot be changed.

Never mind. I've already found out myself that SSDs under ZFS "lie" to you about block size. See https://github.com/openzfs/zfs/issues/6373

I will rebuild the pool soon, but then with ashift=12.
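
For reference, the rebuild I have in mind looks roughly like this (device names copied from my zpool status above; the "backup" pool is just a placeholder for wherever the data gets parked; don't run this without checking it against your own setup):

# Rough outline only; destroys the existing pool, so back up first and verify every name.
zfs snapshot -r SSD@migrate                                          # snapshot all datasets
zfs send -R SSD@migrate | zfs receive -F backup/SSD                  # park the data elsewhere
zpool destroy SSD                                                    # destroys the old pool!
zpool create -o ashift=12 SSD sdi ata-INTENSO_270E0782016C00812123   # recreate with 4K alignment
zfs send -R backup/SSD@migrate | zfs receive -F SSD                  # bring the data back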

 

And the pool is not actually "unhealthy", but rather just unfavorably configured, because it runs error-free. I would appreciate it if, in such cases, the plugin displayed the pool as "healthy" but with a warning, or even better an info note.

Otherwise it seems as if you have to intervene, which is not absolutely necessary.


Thanks for this. I lost passwordless SSH due to some permission issue, and since it happened right after installing the plugin I suspected it was the cause. I just tested on another server and it happened again; this was the error logged:

 

Apr 20 18:27:30 Tower15 sshd[15259]: Authentication refused: bad ownership or modes for directory /

 

Rebooting solves the problem, but it should be fixed in the plugin install. I can't really help with what's needed to fix it, but @Squid should be able to help you if needed.
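
In case someone doesn't want to reboot: sshd's StrictModes check fails when / is not root-owned with mode 755, so restoring that should also bring key auth back without a reboot (untested on my end, adjust as needed):

# Restore sane ownership/modes on / so sshd's StrictModes check passes again.
chown root:root /
chmod 755 /
ls -ld /    # should now show: drwxr-xr-x ... root root /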

 


Me too. I thought it couldn't possibly be the plugin, but it seems that when plugins are extracted the filesystem permissions get overwritten with those stored in the package.

 

 

It should be fixed now; the problem was that I forgot to run the build command as root, so it couldn't set the right permissions before packaging. Sorry for that! :)
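
If anyone wants to double-check a package before it gets installed, the ownership and modes it carries can be listed without extracting it (the path below is just an example of where Unraid keeps downloaded plugin packages):

# List ownership/permissions stored in the plugin package (GNU tar auto-detects the xz compression).
tar tvf /boot/config/plugins/ZFS-companion/ZFS-companion-*.txz | head -n 20
# Entries should be owned by root/root with normal modes (e.g. drwxr-xr-x for directories).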

 

Unrelated: @JoergHH the ability to ignore a specific unhealthy status is in the works; I'm working out how plugins are supposed to use settings, so hang on a couple more days ;) You will be able to flag your current status and it will report as healthy until the status of any pool changes, so you won't miss any warnings/errors/etc.
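
The mechanism I have in mind is roughly this (a sketch of the idea, not the final code): remember a fingerprint of the current status output and only suppress the warning while it still matches.

# Sketch of "ignore until something changes": hash the current status and compare.
IGNORE_FILE=/tmp/zfs-companion-ignored.hash            # hypothetical location
current=$(zpool status -x | md5sum | cut -d' ' -f1)

if [ -f "$IGNORE_FILE" ] && [ "$(cat "$IGNORE_FILE")" = "$current" ]; then
    echo "healthy (current status ignored by user)"
else
    rm -f "$IGNORE_FILE"                               # status changed: the old flag no longer applies
    zpool status -x
fi
# "Ignore this status" would then simply write the hash:
#   zpool status -x | md5sum | cut -d' ' -f1 > /tmp/zfs-companion-ignored.hash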


Aha, so that's why I suddenly had SSH issues as well.
Fixing the mentioned permissions on the flash drive (found after a long stretch of fruitless SSH troubleshooting) solved it, but I had no idea why this suddenly happened after years of working fine without issues or changes.
But indeed, I had also recently installed this plugin.
Good to know. I don't like unsolved mysteries ;-)

18 hours ago, campusantu said:

Unrelated: @JoergHH the ability to ignore a specific unhealthy status is in the works; I'm working out how plugins are supposed to use settings, so hang on a couple more days ;) You will be able to flag your current status and it will report as healthy until the status of any pool changes, so you won't miss any warnings/errors/etc.

All right. Don't worry, I can wait, especially since I'm going to recreate the pool with a different ashift value in the future anyway, as described above.

