[Plugin] ZFS-companion

campusantu · April 12, 2021

What? Why?

Consider this plugin a topping on steini84's ZFS Plugin.

I love how Unraid makes it easy to run Docker and VMs and to expand with mismatched drives, but coming from other software I learned to trust ZFS more than other filesystems. If you're reading this, I guess you prefer it too. The ZFS Plugin brings our beloved filesystem, and I fully understand and share steini84's opinion about keeping the plugin pure and simple with just the binaries. Still, I missed a way to keep an eye on the status of my pool without resorting to shell commands or copy-pasted scripts. In fact, I was not fully trusting the pool simply because I was not monitoring it adequately. Judging by some threads I was not the only one, so...

Enter ZFS-companion.

What does it do?

Right now it's just a dashboard widget. It shows the general health of all your pools, plus a list of all the zpools with their status and last scrub information.

image.png.ae1f92b8ec6a8023c91b0f3886843502.png

I don't have ETAs, but I have some ideas of what could be added to make it more useful (not necessarily in order):

Full (secondary?) widget in the disks section of the dashboard
Section in the Main screen, something like Unassigned Devices does for other filesystems.
Integrated scripts for scrubbing and error reporting, to avoid copy-pasting from different places
Shares management
Maybe a dedicated page with more detailed info (pool properties? snapshot list?)

How to install

Install it directly (Plugins -> Install Plugin -> Enter Url then click INSTALL):

https://raw.githubusercontent.com/GiorgioAresu/ZFS-companion-unraid/main/ZFS-companion.plg

If you have suggestions or issues you can post them below.

If you can provide examples of different messages for pool status, scrub results, errors, and so on, please post them (PM if you want), because I'm having difficulty finding all the possible values.

Troubleshooting

If you're having issues or the state is not what you'd expect, please post the output of the following commands:

zpool status -x

zpool status -v

zpool list

Edited April 14, 2021 by campusantu
Troubleshooting info

glennv · April 13, 2021

Cool, thanks. But I don't see anything yet other than the names of my ZFS pools:

image.png.260efe1084fe3145efb7cd4bc8124689.png

Here the output of zpool status:

image.png.9ef4ead8b3185bf5824edd22e2cb54a5.png

p.s. ZFS is compiled into the kernel with the ich777 kernel build docker

Edited April 13, 2021 by glennv

campusantu · April 13, 2021

It should be fixed now. My pool is not upgraded to the latest features so it was showing "status" and "action" even if healthy

glennv · April 13, 2021

Thanks. Updated, and it's already better, but only the first of the 3 pools now shows info.
The last 2 (virtuals and virtuals2) still show no info.
If there is any info I can get you to help debug, just let me know.

p.s.
> zpool version
zfs-2.0.3-1
zfs-kmod-2.0.3-1

edit:

Looks like something is off with how you grab the fields/delimiters

image.png.c64fb91a054e52e326c07c47d6907eb2.png

Edited April 13, 2021 by glennv

campusantu · April 13, 2021

Could you post the output of zpool status as text (or file) instead of image to better test the regex please?

thanks :)

glennv · April 13, 2021

sure. Here you go.

zpool_status.txt

campusantu · April 13, 2021

I made a new version, let me know if it fixes it

glennv · April 13, 2021

51 minutes ago, campusantu said:

I made a new version, let me know if it fixes it

Great job. Working perfectly fine. Thanks for the quick support.

campusantu · April 13, 2021

1 hour ago, glennv said:

Great job. Working perfectly fine. Thanks for the quick support.

You're welcome, glad it works now. Have a nice day!

JoergHH · April 14, 2021

Plugin shows pool status as "unhealthy".

grafik.png.d0e2138174476a5a08c2b724390124a6.png

On the shell I get:

root@server:~# zpool status -v
  pool: SSD
 state: ONLINE
status: One or more devices are configured to use a non-native block size.
        Expect reduced performance.
action: Replace affected devices with devices that support the
        configured block size, or migrate data to a properly configured
        pool.
  scan: scrub repaired 0B in 00:02:14 with 0 errors on Wed Apr 14 16:23:24 2021
config:

        NAME                                STATE     READ WRITE CKSUM
        SSD                                 ONLINE       0     0     0
          sdi                               ONLINE       0     0     0  block size: 512B configured, 4096B native
          ata-INTENSO_270E0782016C00812123  ONLINE       0     0     0

errors: No known data errors
root@server:~#

Any ideas on how to clear the status message so that the pool is reported as healthy (without losing data, of course)?

campusantu · April 14, 2021

What is the output of

zpool status -x

and

zpool list

?

On mine it says

root@Unraid:~# zpool status -x
all pools are healthy

I'll look into it, to see why it reports it as "not healthy" even if the pool is in fact online

Edited April 14, 2021 by campusantu
Ask for other command

JoergHH · April 15, 2021

14 hours ago, campusantu said:
What is the output of
zpool status -x

root@server:~# zpool status -x
  pool: SSD
 state: ONLINE
status: One or more devices are configured to use a non-native block size.
        Expect reduced performance.
action: Replace affected devices with devices that support the
        configured block size, or migrate data to a properly configured
        pool.
  scan: scrub repaired 0B in 00:02:14 with 0 errors on Wed Apr 14 16:23:24 2021
config:

        NAME                                STATE     READ WRITE CKSUM
        SSD                                 ONLINE       0     0     0
          sdi                               ONLINE       0     0     0  block size: 512B configured, 4096B native
          ata-INTENSO_270E0782016C00812123  ONLINE       0     0     0

errors: No known data errors
root@server:~#

14 hours ago, campusantu said:
and
zpool list
?

root@server:~# zpool list
NAME   SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
SSD    349G  47.9G   301G        -         -    19%    13%  1.00x    ONLINE  -
root@server:~#

Edited April 15, 2021 by JoergHH
Typo

glennv · April 15, 2021

Personally I think the plugin is correct: as the message indicates, your pool is not 100% healthy. It's working, but it is not as it should be.

This is how it should look:

image.png.3bf026d5947dcc220e076022e8ad1060.png

But what would be good is that "if" the status is not 100% healthy, the plugin shows the status message, so you know why and can act on it, or ignore it if it's OK with you. That is the whole purpose of a dashboard.

So your situation should "not" be reported as healthy, but maybe as warning or attention.

Edited April 15, 2021 by glennv

campusantu · April 15, 2021

I'm no ZFS expert, so I'm open to discussion.

From what I found, zpool status -x is the preferred way of getting a concise status report. I was thinking of providing an alternative method of checking whether all pools are reported as ONLINE, but I found examples of pools with errors still being reported as ONLINE (see: https://docs.oracle.com/cd/E19253-01/819-5461/gavwg/index.html, under "Determining the Type of Device Failure". That's Oracle's ZFS documentation, not the one we're using, but I assume they work the same way). Hence I think it's not a good idea, because you would have no idea something is wrong with the pools.

So I would agree with glennv here about not saying the pool is healthy.

I'm OK with introducing a "warning" state instead of just healthy/unhealthy, but what would the criteria be for the pools reported by zpool status -x?

The pools are ONLINE
status is ok-ish (we would need to define a whitelist)
errors is "No known data errors"
something else?
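To make the criteria above concrete, here is a minimal Python sketch of a three-state check. It is not the plugin's code: the function name classify, the three state labels and the whitelist content are all assumptions made for illustration, and the whitelist holds only the one status message that appears in this thread.

```python
import re

# Hypothetical whitelist: status messages we accept as "warning" instead of "unhealthy".
# The single entry is the message posted earlier in this thread.
STATUS_WHITELIST = (
    "One or more devices are configured to use a non-native block size.",
)

def classify(status_x_output):
    """Return 'healthy', 'warning' or 'unhealthy' from the text of `zpool status -x`."""
    text = status_x_output.strip()
    if text == "all pools are healthy":
        return "healthy"

    # Field lines start with optional indentation, then "name:".
    states = re.findall(r"^[ \t]*state:[ \t]*(\S+)", text, flags=re.MULTILINE)
    statuses = re.findall(r"^[ \t]*status:[ \t]*(.+)$", text, flags=re.MULTILINE)
    errors = re.findall(r"^[ \t]*errors:[ \t]*(.+)$", text, flags=re.MULTILINE)

    all_online = bool(states) and all(s == "ONLINE" for s in states)
    no_errors = bool(errors) and all(e.strip() == "No known data errors" for e in errors)
    # Only the first line of each status message is compared against the whitelist.
    whitelisted = all(
        any(s.strip().startswith(w) for w in STATUS_WHITELIST) for s in statuses
    )

    if all_online and no_errors and whitelisted:
        return "warning"
    return "unhealthy"
```

Anything the sketch does not recognise falls through to "unhealthy", so an unexpected message is never hidden behind a milder state.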

glennv · April 15, 2021

I would not overthink it.

My 2 cents:

If zpool status -x does not show that all is healthy, just flag it as not healthy and additionally show the contents of the status and action fields, which are designed to tell you what is going on.

So rather than trying to interpret and grade the level of severity, you just spit out what ZFS gives us.
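As a rough illustration of "just show the status and action fields", the following Python sketch pulls those fields out of zpool status text, joining the indented continuation lines. The function name parse_pools and the list of field names are assumptions based only on the output posted in this thread, not on the plugin's actual parser.

```python
import re

# Field names seen in the output posted in this thread, plus "see" (an assumption).
FIELD = re.compile(r"^\s*(pool|state|status|action|see|scan|config|errors):\s*(.*)$")

def parse_pools(status_output):
    """Return one dict per pool with the fields found in `zpool status` text."""
    pools = []
    current_key = None
    for line in status_output.splitlines():
        match = FIELD.match(line)
        if match:
            key, value = match.group(1), match.group(2).strip()
            if key == "pool":
                pools.append({})
            if not pools:
                continue  # field line before any "pool:" line, ignore it
            if key == "config":
                current_key = None  # the device table follows, keep it out of the fields
            else:
                pools[-1][key] = value
                current_key = key
        elif current_key in ("status", "action") and line.strip() and pools:
            # Indented continuation of a multi-line status/action message.
            pools[-1][current_key] += " " + line.strip()
        else:
            current_key = None
    return pools
```

A tooltip could then show pool["status"] and pool["action"] verbatim for every pool that has them, without grading severity.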

campusantu · April 15, 2021

I added a tooltip.

I might add the ability to ignore an unhealthy status (that would reset when the status changes). While the pool may not be 100% healthy, in JoergHH's case he may choose not to resolve the issue but the persistent unhealthy status could lead to him not noticing should a different problem/warning arise. What do you think?

On 4/14/2021 at 6:54 PM, JoergHH said:

Any ideas how to fix the status message to become the pool healthy (without data loosing, of course)?

Sorry for ignoring your question, from what I found you would need to move the data away, reformat the pool, and move it back, as the block size cannot be changed.

loomitz · April 15, 2021

Working All good :), thanks

glennv · April 16, 2021

14 hours ago, campusantu said:

I added a tooltip.

I might add the ability to ignore an unhealthy status (that would reset when the status changes). While the pool may not be 100% healthy, in JoergHH's case he may choose not to resolve the issue but the persistent unhealthy status could lead to him not noticing should a different problem/warning arise. What do you think?

An ignore option for a specific state/status message that resets when the state and/or status message changes. That is an interesting idea. That way you will still notice when it changes to a different state than the one you ignored.
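One possible way to implement such a self-resetting ignore is to store a fingerprint of the acknowledged state and compare it on every refresh. The Python sketch below is only an illustration: the names fingerprint and effective_health and the "ignored" label are made up, and it deliberately hashes only pool name, state and status text, because the scan line changes after every scrub.

```python
import hashlib

def fingerprint(pools):
    """Hash the (name, state, status) tuples of all pools into one hex string."""
    canonical = "\n".join(
        f"{name}\t{state}\t{status}" for name, state, status in sorted(pools)
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def effective_health(pools, ignored_fingerprint):
    """Return 'healthy', 'ignored' or 'unhealthy' for the current pools.

    ignored_fingerprint is the value saved when the user chose to ignore
    the current condition, or None if nothing is being ignored.
    """
    if all(state == "ONLINE" and not status for _, state, status in pools):
        return "healthy"
    if ignored_fingerprint is not None and fingerprint(pools) == ignored_fingerprint:
        return "ignored"
    # Any change in name, state or status text produces a new fingerprint,
    # so the ignore flag stops applying without any explicit reset.
    return "unhealthy"

# Usage: acknowledge the current condition, then detect a later change.
pools = [("SSD", "ONLINE", "One or more devices are configured to use a non-native block size.")]
saved = fingerprint(pools)
print(effective_health(pools, saved))
print(effective_health([("SSD", "DEGRADED", "One or more devices could not be used.")], saved))
```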

JoergHH · April 16, 2021

15 hours ago, campusantu said:

Sorry for ignoring your question, from what I found you would need to move the data away, reformat the pool, and move it back, as the block size cannot be changed.

Never mind. I have already found out myself that SSDs under ZFS "lie" to you about block size. See https://github.com/openzfs/zfs/issues/6373

I will rebuild the pool soon, this time with ashift=12.

And the pool is actually not really "unhealthy", but rather only unfavorably configured, because it runs error-free. I would appreciate it if the plugin displayed the pool as "healthy" in such cases, but with a warning or, even better, an info note.

Otherwise, it seems as if you have to intervene, which is not absolutely necessary.
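On the ashift=12 mentioned above: ashift is the base-2 logarithm of the block size, so the 512B and 4096B values in the status message correspond to ashift 9 and 12. A small Python sketch of that arithmetic (the helper name ashift_for is made up for illustration):

```python
def ashift_for(block_size_bytes):
    """Return the ashift value (log2 of the block size) for a power-of-two size."""
    if block_size_bytes <= 0 or block_size_bytes & (block_size_bytes - 1):
        raise ValueError("block size must be a positive power of two")
    # For a power of two, bit_length() - 1 is exactly log2.
    return block_size_bytes.bit_length() - 1

print(ashift_for(512))   # 9, the "configured" size in the status message
print(ashift_for(4096))  # 12, the "native" size
```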

JorgeB · April 20, 2021

Thanks for this. I lost passwordless ssh due to some permission issue, and since it happened right after installing the plugin I suspected it was the cause. I just tested on another server and it happened again. This was the error logged:

Apr 20 18:27:30 Tower15 sshd[15259]: Authentication refused: bad ownership or modes for directory /

Rebooting solves the problem, but it should be fixed in the plugin install. I can't really help with what's needed to fix it, but @Squid should be able to help you if needed.

JoergHH · April 20, 2021

@JorgeB

Are you sure you are in the right thread?

I don't see any connection between your problem and the ZFS-Companion plugin discussed here.

campusantu · April 20, 2021

Me too, I thought it couldn't possibly be the plugin but it seems that when extracting plugins it overwrites the filesystem permissions with those from the package:

It should be fixed now, the problem was I forgot to run the build command as root so it couldn't change permissions before packaging. Sorry for that!
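For anyone who wants to check for this kind of problem themselves, here is a rough Python sketch of the check that, as far as I understand OpenSSH's strict mode checking, sits behind the "bad ownership or modes for directory" message: the directory must be owned by root or the login user and must not be writable by group or others. The function names are made up, and the thread does not say what the actual ownership or mode of / was, so treat this as an assumption-based diagnostic only.

```python
import os
import stat

def is_sshd_safe(mode, owner_uid, user_uid=0):
    """True if a directory with this mode and owner would pass the assumed check."""
    owned_ok = owner_uid in (0, user_uid)  # owned by root or by the login user
    not_writable_by_others = (mode & (stat.S_IWGRP | stat.S_IWOTH)) == 0
    return owned_ok and not_writable_by_others

def check_path(path, user_uid=0):
    """Apply the same check to a real directory on disk."""
    info = os.stat(path)
    return is_sshd_safe(info.st_mode, info.st_uid, user_uid)

# Usage: a False result for "/" would be consistent with the logged error.
print(check_path("/"))
```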

Unrelated: @JoergHH the ability to ignore a specific unhealthy status is in the works. I'm working out how plugins are supposed to use settings, so hang on a couple more days. You will be able to flag your current status and it will report as healthy until the status of any pool changes, so you won't miss any warnings/errors/etc.

Edited April 20, 2021 by campusantu
Fix link and add progress update

glennv · April 21, 2021

Aha, so that's why I suddenly had ssh issues as well.
Fixing the mentioned permissions on the flash drive solved it, once I found them after a long and fruitless round of ssh troubleshooting, but I had no idea why this suddenly happened after years of working fine without issues or changes.
But indeed I had also recently installed this plugin.
Good to know. I don't like unsolved mysteries ;-)

JorgeB · April 21, 2021

12 hours ago, campusantu said:

It should be fixed now

Thanks, happy to report it is fixed.

JoergHH · April 21, 2021

18 hours ago, campusantu said:

Unrelated: @JoergHH the ability to ignore a specific unhealthy status is in the works. I'm working out how plugins are supposed to use settings, so hang on a couple more days. You will be able to flag your current status and it will report as healthy until the status of any pool changes, so you won't miss any warnings/errors/etc.

All right. Don't worry, I can wait, especially since I'm going to recreate the pool with a different ashift value in the future anyway, as described above.
