
Unraid


CRC Errors on cache drive


I just started getting CRC errors on one of my cache drives. I've never had issues with this drive before, but in the last hour or so it has racked up about 4000 errors. I'm assuming I need to replace the drive, but I'm not sure how to replace a cache drive. It's part of a 3-drive RAID 1 pool, so am I safe to just stop the array, pull the old drive and put a new one in, or do I need to move all the files off the cache drives first?

  • Community Expert

Replace the SATA cable first.

  • Author

Well, it's using one of those SAS to 4x SATA adapters from the HBA. Is it likely that the cable suddenly went bad?

  • Author

Also, once I replace the cable, do I need to do anything else? If it were the array I'd run a parity check afterwards, but I don't know if the cache needs anything like that.

  • Community Expert

UDMA CRC errors are almost always a SATA cable problem, and any cable can go bad at any time.
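
As a rough way to check whether a cable swap helped, the CRC counter can be read from the SMART attribute table before and after. The sketch below assumes the text comes from `smartctl -A` and that the drive reports the counter as attribute ID 199 (the attribute name varies between vendors, so the match is on the ID). The sample table is illustrative, not taken from the diagnostics in this thread. The raw count normally does not reset, so what matters is whether it keeps climbing.

```python
import re

# Illustrative attribute rows in the layout printed by `smartctl -A`.
# The values are made up for this example (4000 echoes the count reported above).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       5120
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       4000
"""


def crc_error_count(smart_text):
    """Return the raw value of SMART attribute 199, or None if it is absent."""
    for line in smart_text.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns; the raw value is the 10th.
        if len(fields) >= 10 and fields[0] == "199":
            match = re.match(r"\d+", fields[9])
            return int(match.group()) if match else None
    return None


def still_rising(count_before, count_after):
    """True if the counter grew between two readings (the link is still bad)."""
    return count_after > count_before


print(crc_error_count(SAMPLE))
```

If the count stays flat after the new cable is in, the old errors can simply be acknowledged.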

  • Author

So I just tried stopping the array so that it wouldn't keep getting errors while I wait for a replacement cable. The server just sat on the loading indicator for a while and wouldn't load any other pages. Now the web UI is responsive again, but the array still isn't stopped and the log is filled with this message:

 

Nov  8 19:32:50 Tower kernel: BTRFS error (device sdf1): error writing primary super block to device 2

 

sdf is the first drive in the cache pool, but sdg, the second drive in that pool, is the one that was giving the CRC errors.
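
One possible reading of that log line, offered as an assumption rather than something confirmed in this thread: the name in parentheses identifies the filesystem (btrfs labels a whole pool by one member device), while the trailing "device 2" is the btrfs device ID of the member that could not be written. That would be consistent with the second pool member being the problem drive. A minimal sketch that pulls both fields out of such a line:

```python
import re

LINE = ("Nov  8 19:32:50 Tower kernel: BTRFS error (device sdf1): "
        "error writing primary super block to device 2")

# Filesystem label and message body of a BTRFS kernel log line.
BTRFS_LINE = re.compile(
    r"BTRFS (?P<level>\w+) \(device (?P<fs_dev>\w+)\): (?P<msg>.*)"
)
# Device ID at the end of a "... to device N" message.
DEVID = re.compile(r"to device (?P<devid>\d+)$")


def parse_btrfs_line(line):
    """Return level, fs_dev, msg and devid (or None) for a BTRFS log line."""
    match = BTRFS_LINE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    devid = DEVID.search(info["msg"])
    info["devid"] = int(devid.group("devid")) if devid else None
    return info


print(parse_btrfs_line(LINE))
```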

  • Author

OK, the array finally stopped. Hopefully my cache data still exists.

  • Author

So I just reseated the cables and restarted the server, and now most of my Docker containers that use appdata won't start, and a few of them show errors about a read-only file system. The syslog also has some checksum errors and a lot of I/O errors from the same drive as before. I'll attach a new diagnostic to this post. So on a scale of 1-10, how screwed am I in terms of the data on the cache pool?

 

tower-diagnostics-20231108-2045.zip

  • Author

Also, ignore the drive errors from sde. That's an unassigned drive in a slot that I know is bad.

  • Author

@JorgeB any other advice you have for what I can do would be appreciated

  • Community Expert

Looks like the pool is corrupt, but there's a lot of log spam because of the bad drive. Disconnect it and post new diags after array start.

  • Author

OK, I've disconnected both of the bad drives and it seems to be working now. The cache pool had enough space to just convert into a 2-drive RAID 1, and the main array has already been running with a missing disk for about 2 months. So everything seems to be working now. I'm going to try to reinstall the SSD once the replacement cable shows up, but until then, at least the server can limp along.

I'll attach new diagnostics anyway, just in case there's anything useful.

 

tower-diagnostics-20231109-1056.zip
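
Whether a btrfs raid1 pool can shrink by one device comes down to simple arithmetic: every block is stored twice on two different devices, so usable space is roughly the smaller of half the total and the total minus the largest device. The sketch below uses the device size and usage figures from the pool listing posted later in this thread (931.51 GiB per device, 847.91 GiB used); it ignores metadata overhead and assumes usage was similar at this point, so treat it as an estimate.

```python
def raid1_usable_gib(sizes_gib):
    """Approximate usable capacity of a two-copy btrfs raid1 pool, in GiB."""
    if len(sizes_gib) < 2:
        return 0.0  # raid1 needs at least two devices
    total = sum(sizes_gib)
    # Each block needs a second copy on a different device.
    return min(total / 2, total - max(sizes_gib))


DEVICE_GIB = 931.51   # per-device size from the pool listing in this thread
USED_GIB = 847.91     # "FS bytes used" from the same listing

three_drive = raid1_usable_gib([DEVICE_GIB] * 3)
two_drive = raid1_usable_gib([DEVICE_GIB] * 2)

print(f"3 drives: {three_drive:.2f} GiB usable")
print(f"2 drives: {two_drive:.2f} GiB usable")
print("fits on 2 drives:", USED_GIB <= two_drive)
```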

  • Author

@JorgeB so my situation has gotten even stranger. I just replaced the cable and transferred the server to a new case without a cheap hot-swap backplane. The normal drives all seem to be working fine, but now my cache pool is completely dead.

 

So I moved everything, checked that all the disks showed up, re-assigned a disk to the main array that had had a faulty cable, and then started the array. Now the cache that was working this morning as a 2-drive RAID 1 is just showing 2 disks with the error "Unmountable: Unsupported or no file system". I have no idea what could have happened to them. I did a clean shutdown beforehand, and since my last update nothing seemed to be corrupted and everything was working, and now this has happened.

 

tower-diagnostics-20231111-1817.zip

  • Author

So I think I found a bug in Unraid. It seems like the SSD that I removed a few days ago was causing the cache to not mount properly. With that drive installed but not assigned to the cache pool (it was a 3-drive pool with the 2nd slot unassigned), the pool would fail to mount. There was also an issue with the superblock size, but that was fixed with the command "btrfs rescue fix-device-size" after running a check on the pool.

Then, when running a "btrfs filesystem show" command, I noticed it was listing the installed but unassigned drive as part of the pool, but with a different storage size (about 80 GB less than the drives that were actually assigned to the pool). So I shut down and pulled that drive out again, and now the server starts fine and the pool seems to be functioning properly. So I think that when Unraid removed the disk from the pool the first time I pulled it, it didn't actually remove it properly and was still trying to use it as part of the pool.

  • Author

Should I make this into a new post? I think this might be a bit outside the scope of the original issue at this point.

  • Community Expert

Nov 11 15:42:30 Tower emhttpd:  Total devices 3 FS bytes used 847.91GiB
Nov 11 15:42:30 Tower emhttpd:  devid    1 size 931.51GiB used 787.03GiB path /dev/sdc1
Nov 11 15:42:30 Tower emhttpd:  devid    2 size 931.51GiB used 718.03GiB path /dev/sdd1
Nov 11 15:42:30 Tower emhttpd:  devid    3 size 931.51GiB used 787.03GiB path /dev/sdb1
Nov 11 15:42:30 Tower emhttpd: cache: invalid config: total_devices 3 num_misplaced 1 num_missing 0

 

The pool currently consists of 3 devices, but only two are assigned, so it doesn't mount. This is not a bug but by design.
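
The mismatch described above can be spotted by comparing the devices btrfs lists for the pool against the devices assigned to pool slots. This is only a sketch: the log text is copied from this thread, but the set of assigned devices is hypothetical (the thread does not say which two were assigned), and it does not reproduce how Unraid itself computes `num_misplaced`.

```python
import re

LOG = """\
Nov 11 15:42:30 Tower emhttpd:  Total devices 3 FS bytes used 847.91GiB
Nov 11 15:42:30 Tower emhttpd:  devid    1 size 931.51GiB used 787.03GiB path /dev/sdc1
Nov 11 15:42:30 Tower emhttpd:  devid    2 size 931.51GiB used 718.03GiB path /dev/sdd1
Nov 11 15:42:30 Tower emhttpd:  devid    3 size 931.51GiB used 787.03GiB path /dev/sdb1
"""


def pool_members(log_text):
    """Return (total_devices, {devid: path}) parsed from the pool listing."""
    total = None
    members = {}
    for line in log_text.splitlines():
        m = re.search(r"Total devices (\d+)", line)
        if m:
            total = int(m.group(1))
        m = re.search(r"devid\s+(\d+) size \S+ used \S+ path (\S+)", line)
        if m:
            members[int(m.group(1))] = m.group(2)
    return total, members


def unassigned_members(members, assigned_paths):
    """Paths that btrfs counts as pool members but that have no pool slot."""
    return sorted(p for p in members.values() if p not in assigned_paths)


# Hypothetical assignment for illustration only.
ASSIGNED = {"/dev/sdc1", "/dev/sdb1"}

total, members = pool_members(LOG)
print(total, len(ASSIGNED), unassigned_members(members, ASSIGNED))
```

Any path in the result is a device the filesystem still expects, which is why the pool refuses to mount with only two slots filled.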

  • Author

But when I removed the SSD and started the array a few days ago, it asked me if I wanted to remove the device and I said yes. Shouldn't that mean it was removed from the pool?

  • Community Expert

Possibly it should have been, but it was not, so all three are required now. You can re-import the pool with all 3 (not just by adding the unassigned device; if you need help re-importing, let me know), then post new diags.
