CRC Errors on cache drive


nmills3

I just started getting CRC errors on one of my cache drives. I've never had issues with this drive before, but in the last hour or so it's racked up about 4000 errors. I'm assuming I need to replace the drive, but I'm not sure how to replace a cache drive. It's part of a 3-drive RAID 1 pool, so am I safe to just stop the array, rip out the old drive and throw a new one in, or do I need to move all the files off the cache drives first?


So I just tried stopping the array so that it wouldn't keep getting errors while I wait for a replacement cable. The server just sat on the loading indicator for a while and wouldn't load any other pages. Now the web UI is responsive again, but the array still isn't stopped and the log is filled with this message:

 

Nov  8 19:32:50 Tower kernel: BTRFS error (device sdf1): error writing primary super block to device 2

 

sdf is the first drive in the cache pool, but sdg, which is the second drive in that pool, is the one that was giving the CRC errors.
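For anyone else puzzling over the same message: as far as I understand the btrfs kernel log format, the name in "(device sdf1)" just identifies the filesystem by one of its member devices, while the number at the end of the line is the btrfs devid of the member the write failed on. That would be consistent with the second pool member being the one with problems. A minimal Python sketch that pulls both values out of a syslog line is below. The regex and the function name `parse_super_block_error` are made up for illustration, and the devid interpretation is an assumption on my part.

```python
import re

# Pattern for the kernel line quoted in this post. The "(device X)" part names
# the filesystem; the trailing number is ASSUMED to be the btrfs devid.
BTRFS_SB_ERROR = re.compile(
    r"BTRFS error \(device (?P<fs_dev>\w+)\): "
    r"error writing primary super block to device (?P<devid>\d+)"
)


def parse_super_block_error(line):
    """Return (fs_device, devid) for a matching syslog line, else None."""
    match = BTRFS_SB_ERROR.search(line)
    if match is None:
        return None
    return match.group("fs_dev"), int(match.group("devid"))


# The exact line from the syslog above.
line = (
    "Nov  8 19:32:50 Tower kernel: BTRFS error (device sdf1): "
    "error writing primary super block to device 2"
)
print(parse_super_block_error(line))
```

The devid can then be matched against the "devid N ... path /dev/..." lines from "btrfs filesystem show" to see which physical drive it refers to.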


So I just reseated the cables and restarted the server, and now most of my Docker containers using appdata won't start, and a few of them show errors about a read-only file system. The syslog also has some checksum errors and a lot of I/O errors from the same drive as before. I'll attach a new diagnostic to this post. So on a scale of 1-10, how screwed am I in terms of the data on the cache pool?

 

tower-diagnostics-20231108-2045.zip


OK, I've disconnected both of the bad drives and it seems to be working now. The cache pool had enough space to just convert into a 2-drive RAID 1, and the main array has been running with a missing disk for about 2 months already. So everything seems to be working now. I'm going to try to reinstall the SSD once the replacement cable shows up, but until then, at least the server can limp along.

I'll attach new diagnostics anyway, just in case there's anything useful.

 

tower-diagnostics-20231109-1056.zip


@JorgeB so my situation has gotten even stranger. I just replaced the cable and transferred the server to a new case without a cheap hot-swap backplane. The normal drives all seem to be working fine, but now my cache pool is completely dead.

 

I moved everything, checked that all the disks showed up, re-assigned a disk to the main array that had a faulty cable, and then started the array. Now the cache that was working this morning as a 2-drive RAID 1 is just showing 2 disks with the error "Unmountable: Unsupported or no file system". I have no idea what could have happened to them. I did a clean shutdown beforehand, and everything seemed to be working fine since my last update with nothing corrupted, and now this has happened.

 

tower-diagnostics-20231111-1817.zip


So I think I found a bug in Unraid. It seems the SSD that I removed a few days ago was preventing the cache from mounting properly. With that drive installed but not assigned to the cache pool (it was a 3-drive pool with the 2nd slot unassigned), the pool would fail to mount. There was also an issue with the superblock size, but that was fixed with the command "btrfs rescue fix-device-size" after running a check on the pool.

Then, when running "btrfs filesystem show", I noticed it was listing the installed but unassigned drive as part of the pool, but with a different storage size (about 80 GB less than the drives that were actually assigned to the pool). So I shut down and ripped out that drive again, and now the server starts fine and the pool seems to be functioning properly. I think that when Unraid removed the disk from the pool the first time I pulled it, it didn't actually remove it properly and was still trying to use it as part of the pool.

Nov 11 15:42:30 Tower emhttpd:  Total devices 3 FS bytes used 847.91GiB
Nov 11 15:42:30 Tower emhttpd:  devid    1 size 931.51GiB used 787.03GiB path /dev/sdc1
Nov 11 15:42:30 Tower emhttpd:  devid    2 size 931.51GiB used 718.03GiB path /dev/sdd1
Nov 11 15:42:30 Tower emhttpd:  devid    3 size 931.51GiB used 787.03GiB path /dev/sdb1
Nov 11 15:42:30 Tower emhttpd: cache: invalid config: total_devices 3 num_misplaced 1 num_missing 0

 

The pool currently consists of 3 devices, but only two are assigned, so it doesn't mount. This is not a bug but by design.
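The check described above can be sketched in a few lines of Python: parse the "devid ... path ..." lines from the log, then compare the member paths against the devices assigned to the pool. This is only an illustration, not how emhttpd does it. The function names are made up, and the `assigned` set is a hypothetical example, since the log excerpt alone doesn't say which two devices were assigned.

```python
import re

# Matches lines such as:
#   devid    1 size 931.51GiB used 787.03GiB path /dev/sdc1
DEVID_LINE = re.compile(
    r"devid\s+(\d+)\s+size\s+(\S+)\s+used\s+(\S+)\s+path\s+(\S+)"
)


def pool_members(log_text):
    """Map each btrfs devid found in the text to its device path."""
    members = {}
    for line in log_text.splitlines():
        match = DEVID_LINE.search(line)
        if match:
            members[int(match.group(1))] = match.group(4)
    return members


def unassigned_members(log_text, assigned_paths):
    """Return pool member paths that are not in the assigned set."""
    members = pool_members(log_text)
    return sorted(p for p in members.values() if p not in assigned_paths)


# Log excerpt copied from the diagnostics quoted above.
LOG = """\
Nov 11 15:42:30 Tower emhttpd:  Total devices 3 FS bytes used 847.91GiB
Nov 11 15:42:30 Tower emhttpd:  devid    1 size 931.51GiB used 787.03GiB path /dev/sdc1
Nov 11 15:42:30 Tower emhttpd:  devid    2 size 931.51GiB used 718.03GiB path /dev/sdd1
Nov 11 15:42:30 Tower emhttpd:  devid    3 size 931.51GiB used 787.03GiB path /dev/sdb1
"""

# HYPOTHETICAL assignment, for illustration only.
assigned = {"/dev/sdc1", "/dev/sdb1"}

print(pool_members(LOG))
print(unassigned_members(LOG, assigned))
```

If the second list is not empty, the filesystem still counts a device as a member that is not assigned to the pool, which matches the "invalid config" line in the log.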
