
ZFS failed drive - no notification - failed drive has matching serial number


Go to solution Solved by JorgeB,


Okay, long story short, I hope. I have a ZFS cache pool with 2 NVMe drives. I had never noticed it before, but both drives show up with what looks like the same serial number, like this:

image.thumb.png.a6fa2951e313fe8ecb43761724cbd238.png

 

I have been fighting some high IOWAIT times for a while now. I finally bit the bullet and, while backing up and taking screenshots of things, I noticed that the serial numbers matched. I thought that was weird, so I went to look into the cache drive, and I found this:

image.png.3b39e441c64a007071360dc08c8494f1.png

 

How could a drive with the same serial propagate into both drive slots? How could an array with a failed drive NOT SAY THERE IS A PROBLEM?

For the record, this system has been up for over a year on this hardware. Here is the config for the mirror settings:

image.thumb.png.0da1ee812f5d35d0e223a733f4c396b1.png

 

Going forward, my plan to fix this is to LEAVE the drive with the serial ending in 1102 and PULL the other drive, then let a rebuild occur after swapping the drive out.

 

Any other ideas? It is wild that I never got a notification about a degraded drive, or even an obvious error anywhere. I can only see the issue on the cache drive config page.

  • Solution
1 hour ago, Cody Peters said:

I noticed that the serial numbers matched.

They are similar but don't match; they are two different devices. Also note nvme0n1 and nvme1n1.

 

1 hour ago, Cody Peters said:

How could an array with a failed drive NOT SAY THERE IS A PROBLEM?

At the moment there are no notifications for pool issues. This is a long-standing request of mine, maybe soon. For now, see here:

 

https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=700582
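
Until built-in notifications exist, a scheduled check can fill the gap. The following is only a minimal sketch, not the script from the linked FAQ: it assumes the `zpool` command is on the PATH and relies on `zpool list -H -o name,health`, which prints one tab-separated `name<TAB>health` line per pool. The function names are hypothetical, and how you deliver the alert (email, notification agent, etc.) is left to you.

```python
import subprocess


def parse_pool_health(listing: str) -> dict[str, str]:
    """Parse `zpool list -H -o name,health` output into {pool_name: health}."""
    pools = {}
    for line in listing.splitlines():
        # -H gives tab-separated fields with no header row.
        fields = line.split("\t")
        if len(fields) >= 2:
            pools[fields[0].strip()] = fields[1].strip()
    return pools


def unhealthy_pools(listing: str) -> dict[str, str]:
    """Return only the pools whose health is anything other than ONLINE."""
    return {
        name: health
        for name, health in parse_pool_health(listing).items()
        if health != "ONLINE"
    }


def check_pools() -> dict[str, str]:
    """Run zpool and return the pools that need attention (assumes zpool is installed)."""
    result = subprocess.run(
        ["zpool", "list", "-H", "-o", "name,health"],
        capture_output=True,
        text=True,
        check=True,
    )
    return unhealthy_pools(result.stdout)
```

Calling `check_pools()` from a scheduled job and alerting whenever it returns a non-empty dict would have flagged a pool like the one above as soon as it reported DEGRADED.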

 

2 minutes ago, JorgeB said:

They are similar but don't match; they are two different devices. Also note nvme0n1 and nvme1n1.

 

At the moment there are no notifications for pool issues. This is a long-standing request of mine, maybe soon. For now, see here:

 

https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=700582

 

Haha, good catch... what luck to buy 2 drives at the same time with the last 5 digits of the serial matching. Okay, so I will pull the identified failed drive, MKM0042000204P1102, and let the server rebuild before doing anything else.

 

I am dumbfounded that this has been an issue since 2018. Why, why, why... I thought this was specific to me, and I am very unhappy to see that it is an OS issue.
