[found to most likely be SSD failure] Help - fix common issue reports "unraid Drive mounted read-only or completely full" -- found to be btrfs cache nvme issue


Solved by JorgeB


Hello,

 

Sorry, I'm a user but not a very technical one. This has just happened and I'm not sure how to resolve it yet.

The Fix Common Problems plugin has reported that a drive (not my default cache drive, but a second one) is read-only. As in the title: "unraid Drive mounted read-only or completely full. Begin Investigation Here"

 

I think I found the error it's referring to in the system log, but I don't know how to decipher it.

 

I attached the diagnostics zip, as well as a txt file with what I see when I click on the disk log info for the drive.

I don't think it helps at all, but I made sure the drive was balanced and scrubbed, and I also ran a filesystem check (--readonly), which returned the following:

Quote

[1/7] checking root items
[2/7] checking extents
Error reading 1162723328, -1
Error reading 1162723328, -1
bad tree block 1162723328, bytenr mismatch, want=1162723328, have=0
owner ref check failed [1162723328 16384]
ERROR: errors found in extent allocation tree or chunk allocation
[3/7] checking free space tree
[4/7] checking fs roots
Error reading 1162723328, -1
Error reading 1162723328, -1
bad tree block 1162723328, bytenr mismatch, want=1162723328, have=0
[5/7] checking only csums items (without verifying data)
Error reading 1162723328, -1
Error reading 1162723328, -1
bad tree block 1162723328, bytenr mismatch, want=1162723328, have=0
Error going to next leaf -5
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
Opening filesystem to check...
Checking filesystem on /dev/nvme0n1p1
UUID: 4ad3bcf9-112e-4303-b54e-ca4ba41c8365
found 287609282560 bytes used, error(s) found
total csum bytes: 279989224
total tree bytes: 895320064
total fs tree bytes: 493961216
total extent tree bytes: 48168960
btree space waste bytes: 206223021
file data blocks allocated: 288377303040
 referenced 286613630976
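
Every read error in the output above names the same block address. A minimal sketch for confirming that from a saved copy of the output, assuming it has been saved to a text file (the function name `bad_tree_blocks` and the file path are made up for illustration):

```shell
# Hypothetical helper: list each distinct block address that btrfs check
# reported as a bad tree block, given a file holding the saved output.
bad_tree_blocks() {
    # Keep only the "bad tree block <address>" part of each matching line,
    # print the address (4th word), and collapse repeats.
    grep -oE 'bad tree block [0-9]+' "$1" | awk '{ print $4 }' | sort -u
}

# Example (the path is a placeholder, not a file from this thread):
#   bad_tree_blocks /tmp/btrfs-check.txt
```

For the output quoted above this would print only 1162723328, which is consistent with a single unreadable metadata block, though it says nothing about why the block cannot be read.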



I was thinking of running a filesystem check with "--repair", but from what I could find, that is strongly discouraged. I also wouldn't know how to enter it, because I think it wants confirmation afterwards, so I would either need to do it from the console or use something like "--repair -y".

 

My apologies if this seems like an easy thing, I'm just not the most technical. I'm also worried it's the drive: it's a Samsung 970 EVO Plus 1TB, a bit over a year old. I'm planning on adding another for parity this year; I just figured they'd last longer (I did read recently about Samsung drives failing, but thought my model was in the clear).

 

Thanks in advance for any assistance provided, though I'll thank you again in the replies.

oracle-diagnostics-20230301-2036.zip nvme0n1p1.txt

Edited by Naustradamus
  • Naustradamus changed the title to Help - fix common issue reports "unraid Drive mounted read-only or completely full"

While looking into it, I found this post, which links to this.

 

I'm having a problem with step 2. As dumb as it sounds, I don't know how to create a folder on a single drive outside of a share. I'm assuming it must be done while Unraid is in maintenance mode, but I'm not sure of that either.

 

On a side note, I found some btrfs commands and tried "btrfs rescue fix-device-size"; it reported no device size related problems.

Below is the second command I tried and what it returned.

 

Quote

btrfs rescue chunk-recover /dev/nvme0n1p1
Scanning: DONE in dev0                
corrupt leaf: root=1 block=1311653888 slot=0, unexpected item end, have 16283 expect 0
Couldn't read tree root
open with broken chunk error

 

I also tried clear-space-cache v1 and v2. It was a long shot, as I think it has nothing to do with the error message, and btrfs check still reports the same error.

 

I'm not sure whether a --repair would fix this type of issue, or whether the NVMe is going bad. Unless someone advises me to try the repair, I think my next step is the restore method, but me being me, I'm not getting the steps needed to get things ready before running 'btrfs restore -v'.
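
For the restore route mentioned above, a rough sketch of the preparation from the console. It assumes the damaged pool device is /dev/nvme0n1p1 (as in the check output), that it is not mounted, and that /mnt/disk1 is an array disk with enough free space. The destination path, the function name `restore_from_pool` and the `RUN` variable are all made up for illustration and are not part of any official procedure.

```shell
# Hypothetical helper: create the destination folder, then print (or run)
# the btrfs restore command that copies files off the damaged device.
restore_from_pool() {
    dev="$1"    # damaged btrfs device, e.g. /dev/nvme0n1p1 (assumption)
    dest="$2"   # folder on a healthy disk, e.g. /mnt/disk1/restore (assumption)

    # The destination folder is created in both modes.
    mkdir -p "$dest" || return 1

    # RUN is unset by default, so the command is only printed via echo.
    # Calling the function with RUN set to an empty value executes it.
    ${RUN-echo} btrfs restore -v "$dev" "$dest"
}

# Print the command without running it:
#   restore_from_pool /dev/nvme0n1p1 /mnt/disk1/restore
# Actually run it:
#   RUN= restore_from_pool /dev/nvme0n1p1 /mnt/disk1/restore
```

Printing the command first is a deliberate choice here: it lets the device and destination be double-checked before anything is read from the failing drive.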

 

Just an update on where I'm at currently.

  • Naustradamus changed the title to Help - fix common issue reports "unraid Drive mounted read-only or completely full" -- found to be btrfs cache nvme issue - not fixed yet

Thanks for replying!
Sorry for the delay, I had gone to bed.

 

Posted below. I restarted and kept the Docker containers off, as the error seems to happen after a certain container starts and attempts to use the NVMe.

 

For further info, this NVMe is dedicated to Plex. I was trying to get the thumbnails, scroll previews, metadata and so on onto the SSD to make navigation very responsive. Not sure if that info helps. I didn't see any errors in the new diagnostics, but most likely I don't know what I'd be looking for.

 

Thanks again for the support.

oracle-diagnostics-20230302-0928.zip

Link to comment

Thanks!

 

Dang, looks like I might be out of luck, because the error happens again once I start the container.

To be fair, I don't think the container was the problem. With the setup described above, it was simply the only container that accessed the SSD.

 

For the time being, I'll move the files onto my main cache drive and run the container from there. If that works, I can try to reformat the NVMe and see where it goes from there. Not the vacation I was expecting, oh well. The drive is only a bit over a year old, so it's still under warranty if needed.

 

Thanks again. In case you notice anything further, I'm adding one more diagnostics file, taken after the container runs and causes the issue.

oracle-diagnostics-20230302-1010.zip

  • Naustradamus changed the title to [found to most likely be SSD failure] Help - fix common issue reports "unraid Drive mounted read-only or completely full" -- found to be btrfs cache nvme issue
  • Solution
Mar  2 10:08:17 Oracle kernel: nvme0n1: I/O Cmd(0x2) @ LBA 2272992, 32 blocks, I/O Error (sct 0x2 / sc 0x81) MORE DNR
Mar  2 10:08:17 Oracle kernel: critical medium error, dev nvme0n1, sector 2272992 op 0x0:(READ) flags 0x1000 phys_seg 1 prio class 0

 

Yep, looks like a failing device.
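
The two kernel lines quoted above are the kind of entry to look for. In the NVMe specification, status code type 0x2 covers media and data integrity errors, and status code 0x81 within that type is an unrecovered read error, which matches the "critical medium error" on the following line. A small sketch for pulling such lines out of a log file; the function name `nvme_media_errors` is hypothetical, and /var/log/syslog is assumed to be where the running system log lives:

```shell
# Hypothetical helper: print log lines that mention an NVMe namespace
# (nvme0n1, nvme1n1, ...) together with an I/O error or a medium error.
nvme_media_errors() {
    # First filter keeps lines naming an NVMe namespace anywhere in the line;
    # second filter keeps only those that also report one of the two errors.
    grep -E 'nvme[0-9]+n[0-9]+' "$1" | grep -E 'I/O Error|critical medium error'
}

# Example (the log path is an assumption):
#   nvme_media_errors /var/log/syslog
```

Two separate filters are used because the device name can appear either before or after the error text, as the two quoted lines show.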

