Cache Drive: Adding 2 more disks to a 2 disk existing cache

December 14, 2017

I guess I'm not understanding what's going on under the hood in Unraid.

I have 2x 500 GB Samsung Evo SSDs in my cache pool right now. They have been operating as expected and total capacity is 500 GB.

I bought 2 more 500 GB Samsung Evo SSDs and put them in. I expected the cache pool to expand to at least 1TB, if not 1.5TB... but that's not the case. The cache pool is still only using 2 devices and the size remains the same. All 4 cache drives are listed under Cache Devices, labeled Cache, Cache 2, Cache 3, Cache 4.

There's no configuration option to change things up... so I'm not really sure what I should do at this point to capture that additional space. Ideally, this would be running as a RAID 5 and I could use 1.5 TB and maintain some redundancy. But if it needs to run RAID 1 without any option to do RAID 5, that's ok, as it should give me 1TB of space at least.

Can anyone help?

Thanks

December 14, 2017

Community Expert

Post your diagnostics.

December 14, 2017

Author

Which diagnostics do you mean?

Apparently, I have to rebalance the cache pool? Upon reading more about btrfs raid 5, it seems it's unstable and prone to data loss, is that correct? So doing a raid 5 with btrfs is a bad idea?

December 14, 2017

Community Expert

10 minutes ago, joshz said:

Which diagnostics do you mean?

There is only one: Tools -> Diagnostics

11 minutes ago, joshz said:

Apparently, I have to rebalance the cache pool?

It should add the new devices automatically in raid1 mode, totaling 1TB usable; that's why I'd like to see the diags.

11 minutes ago, joshz said:

Upon reading more about btrfs raid 5, it seems it's unstable and prone to data loss, is that correct? So doing a raid 5 with btrfs is a bad idea?

Yes, it should only be used for testing. Here are the available profiles; with 4 devices I'd use raid10:

https://forums.lime-technology.com/topic/46802-faq-for-unraid-v6/?do=findComment&comment=480421
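For the capacity figures mentioned in this thread (500 GB, 1 TB, 1.5 TB), here is a rough sketch of the arithmetic. It assumes equal-sized devices and ignores metadata overhead; the function name `usable_gb` is made up for illustration and is not part of any btrfs or Unraid tool.

```python
def usable_gb(profile: str, devices: int, size_gb: float) -> float:
    """Approximate usable space for a pool of equal-sized devices.

    Simplified model (an assumption, not btrfs's exact allocator):
    - single/raid0: all raw space is usable, no redundancy
    - raid1/raid10: every block is stored twice, so half the raw space
    - raid5/raid6: one or two devices' worth of space goes to parity
    """
    raw = devices * size_gb
    if profile in ("single", "raid0"):
        return raw
    if profile in ("raid1", "raid10"):
        return raw / 2
    if profile == "raid5":
        return raw - size_gb
    if profile == "raid6":
        return raw - 2 * size_gb
    raise ValueError(f"unknown profile: {profile}")


if __name__ == "__main__":
    # The pool from this thread: 500 GB SSDs, first 2 devices, then 4.
    for profile, count in [("raid1", 2), ("raid1", 4), ("raid10", 4), ("raid5", 4)]:
        print(profile, count, usable_gb(profile, count, 500))
```

With four 500 GB devices this gives 1000 GB for raid1 or raid10 and 1500 GB for raid5, which matches the numbers discussed above.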

December 14, 2017

Author

I rebalanced with the parameters --dconvert=raid10 --mconvert=raid10 and it's showing 1TB now. It most definitely did not do it automatically though; it required a forced rebalance.

All appears to be well at this point, but the additional drives still don't show up in the diagnostics or listing.

newmediaserver-diagnostics-20171214-1634.zip

December 14, 2017

Community Expert

8 minutes ago, joshz said:

All appears to be well at this point,

No, not well: the pool is using the single profile with only 2 drives.

Data, single: total=200.00GiB, used=92.63GiB
System, single: total=64.00MiB, used=48.00KiB
Metadata, single: total=2.00GiB, used=506.22MiB
GlobalReserve, single: total=111.39MiB, used=0.00B

Label: none  uuid: 94563407-72d4-414d-94b1-174f382195f5
    Total devices 2 FS bytes used 93.12GiB
    devid    1 size 465.76GiB used 101.00GiB path /dev/sde1
    devid    2 size 465.76GiB used 101.06GiB path /dev/sdf1
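As a small illustration of how the allocation lines above can be read, the sketch below extracts the profile of each block group type and lists the ones without redundancy. The helper names are hypothetical, the regular expression is only matched against the line format shown in this thread, and skipping GlobalReserve (which appears to be an internal reservation rather than a separately allocated block group) is an assumption.

```python
import re

# Allocation lines copied from the output above.
SAMPLE = """\
Data, single: total=200.00GiB, used=92.63GiB
System, single: total=64.00MiB, used=48.00KiB
Metadata, single: total=2.00GiB, used=506.22MiB
GlobalReserve, single: total=111.39MiB, used=0.00B
"""

LINE_RE = re.compile(r"^\s*(\w+), (\w+): total=")


def parse_profiles(text: str) -> dict:
    """Map each block group type (Data, System, ...) to its profile name."""
    profiles = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            profiles[match.group(1)] = match.group(2)
    return profiles


def non_redundant(text: str) -> list:
    """List block group types stored with a profile that keeps only one copy."""
    return [
        kind
        for kind, profile in parse_profiles(text).items()
        if kind != "GlobalReserve" and profile in ("single", "raid0")
    ]


if __name__ == "__main__":
    print(non_redundant(SAMPLE))
```

For the output in this post it reports Data, System and Metadata, i.e. nothing in the pool is mirrored.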

It is also corrupt:

Dec 14 14:08:24 NewMediaServer kernel: BTRFS info (device sdf1): read error corrected: ino 1 off 1805320192 (dev /dev/sde1 sector 3526016)
Dec 14 14:08:24 NewMediaServer kernel: BTRFS info (device sdf1): read error corrected: ino 1 off 1805324288 (dev /dev/sde1 sector 3526024)
Dec 14 14:08:24 NewMediaServer kernel: BTRFS info (device sdf1): read error corrected: ino 1 off 1805328384 (dev /dev/sde1 sector 3526032)
Dec 14 14:08:24 NewMediaServer kernel: BTRFS info (device sdf1): read error corrected: ino 1 off 1805332480 (dev /dev/sde1 sector 3526040)
Dec 14 14:08:24 NewMediaServer kernel: BTRFS error (device sdf1): parent transid verify failed on 1805352960 wanted 114187 found 113260
Dec 14 14:08:24 NewMediaServer kernel: BTRFS info (device sdf1): read error corrected: ino 1 off 1805352960 (dev /dev/sde1 sector 3526080)
Dec 14 14:08:24 NewMediaServer kernel: BTRFS info (device sdf1): read error corrected: ino 1 off 1805357056 (dev /dev/sde1 sector 3526088)
Dec 14 14:08:24 NewMediaServer kernel: BTRFS error (device sdf1): parent transid verify failed on 1805369344 wanted 114187 found 113260
Dec 14 14:08:24 NewMediaServer kernel: BTRFS error (device sdf1): parent transid verify failed on 1805467648 wanted 114187 found 113260
Dec 14 14:08:24 NewMediaServer kernel: BTRFS error (device sdf1): parent transid verify failed on 1805484032 wanted 114187 found 113260
Dec 14 14:08:24 NewMediaServer kernel: BTRFS error (device sdf1): parent transid verify failed on 1805500416 wanted 114187 found 113260
Dec 14 14:08:24 NewMediaServer kernel: BTRFS error (device sdf1): parent transid verify failed on 1805516800 wanted 114187 found 113260
Dec 14 14:08:24 NewMediaServer kernel: BTRFS error (device sdf1): parent transid verify failed on 1805598720 wanted 114187 found 113260
Dec 14 14:08:24 NewMediaServer kernel: BTRFS error (device sdf1): parent transid verify failed on 1806237696 wanted 114187 found 113261
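To make the "wanted" versus "found" numbers in these log lines easier to compare, here is a minimal sketch that pulls them out and computes the difference. The function name is hypothetical and the pattern is only written against the message format shown above. Reading a large gap as "the block on disk is much older than the tree expects" is an interpretation, not something the log states directly.

```python
import re

# Two of the kernel log lines from above, copied verbatim.
SAMPLE = """\
Dec 14 14:08:24 NewMediaServer kernel: BTRFS error (device sdf1): parent transid verify failed on 1805352960 wanted 114187 found 113260
Dec 14 14:08:24 NewMediaServer kernel: BTRFS error (device sdf1): parent transid verify failed on 1806237696 wanted 114187 found 113261
"""

TRANSID_RE = re.compile(
    r"parent transid verify failed on (\d+) wanted (\d+) found (\d+)"
)


def transid_gaps(log_text: str) -> list:
    """Return (block address, wanted - found) for each transid failure line."""
    gaps = []
    for line in log_text.splitlines():
        match = TRANSID_RE.search(line)
        if match:
            address, wanted, found = (int(g) for g in match.groups())
            gaps.append((address, wanted - found))
    return gaps


if __name__ == "__main__":
    for address, gap in transid_gaps(SAMPLE):
        print(f"block {address}: found copy is {gap} generations behind")
```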

You'll want to back up, re-format and restore the data.

December 14, 2017

Author

Hmm... well, BTRFS strikes again. It's a real shame you're forced into using it. It's not production ready and I really detest it as a file system for production use. It's great for hobbyists and testing, but it's garbage when it comes to a live environment.

Any idea why Unraid forces the cache drives to be btrfs?

December 14, 2017

Community Expert

32 minutes ago, joshz said:

BTRFS strikes again. It's a real shame you're forced into using it. It's not production ready and I really detest it as a file system for production use.

btrfs works very well (except raid5/raid6) in a stable server. It has issues when there are hardware problems in a multi-device pool. Can you post the output of:

btrfs dev stats /mnt/cache

December 15, 2017

Author

That's the problem with BTRFS: it has no graceful recovery from problems like production file systems do. All software works great when there are no problems. Good software is differentiated from bad software when it can handle issues and not crap the bed. BTRFS craps the bed at the slightest provocation, as evidenced here.

The fact that I have to literally blow out the whole raid, reformat, and recreate it is indicative of not being ready for prime time. Is there any way to switch my cache drive to something more stable?

Anyway, here is the requested output:

[/dev/sde1].write_io_errs 0
[/dev/sde1].read_io_errs 0
[/dev/sde1].flush_io_errs 0
[/dev/sde1].corruption_errs 0
[/dev/sde1].generation_errs 0
[/dev/sdf1].write_io_errs 0
[/dev/sdf1].read_io_errs 0
[/dev/sdf1].flush_io_errs 0
[/dev/sdf1].corruption_errs 0
[/dev/sdf1].generation_errs 0
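For anyone who wants to check this kind of output automatically, here is a minimal sketch that parses the `[device].counter value` lines and lists devices with any non-zero counter. The function names are hypothetical, and the parser assumes only the line format shown in this post.

```python
import re

# A shortened copy of the output above (two counters per device).
SAMPLE = """\
[/dev/sde1].write_io_errs 0
[/dev/sde1].corruption_errs 0
[/dev/sdf1].write_io_errs 0
[/dev/sdf1].corruption_errs 0
"""

STAT_RE = re.compile(r"^\[(.+?)\]\.(\w+)\s+(\d+)$")


def parse_dev_stats(text: str) -> dict:
    """Return {device: {counter_name: value}} from dev stats style output."""
    stats = {}
    for line in text.splitlines():
        match = STAT_RE.match(line.strip())
        if match:
            device, counter, value = match.groups()
            stats.setdefault(device, {})[counter] = int(value)
    return stats


def devices_with_errors(stats: dict) -> list:
    """List devices that have at least one non-zero error counter."""
    return sorted(dev for dev, counters in stats.items() if any(counters.values()))


if __name__ == "__main__":
    parsed = parse_dev_stats(SAMPLE)
    print(devices_with_errors(parsed) or "all counters are zero")
```

All counters in the posted output are zero, so the sketch would report no devices.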

December 15, 2017

Community Expert

Stats look normal. You can change to xfs, but you will be limited to a single device.

December 15, 2017

7 hours ago, joshz said:

That's the problem with BTRFS: it has no graceful recovery from problems like production file systems do. All software works great when there are no problems. Good software is differentiated from bad software when it can handle issues and not crap the bed. BTRFS craps the bed at the slightest provocation, as evidenced here.

I have several thousand BTRFS file systems installed in vehicles and haven't seen any evidence that they are unusually fragile. When the vehicle power is cut, the devices die without any chance to perform an orderly shutdown. However, the units seem to recover OK.

I have also been using BTRFS in quite a number of server installations with no bad outcome.

BTRFS isn't perfect, but at least a notch or two better than your post suggests.

December 15, 2017

Community Expert

22 minutes ago, pwm said:

BTRFS isn't perfect, but at least a notch or two better than your post suggests.

Agree, I've been using it as the only filesystem (array + cache + unassigned devices) on all my servers for some time without any major issues.

Without the logs showing the start of the issues I can't guess what happened to your filesystem. The stats are OK, so there are no apparent hardware device issues, but most likely something serious happened.

Edited December 15, 2017 by johnnie.black

December 15, 2017

All file systems are quite vulnerable to transfer errors, or to software/hardware issues that make the machine send bad data.

Journaling, copy-on-write, etc. are great for recovery after partial writes but can't protect against garbage writes. Garbage writes can corrupt not just file data but also the internal structures the file system relies on to recover from a crash or power loss. That's also why critical servers use ECC memory, and why the internal caches and buses of server-class processors use ECC.

December 15, 2017

Community Expert

27 minutes ago, pwm said:

That's also why critical servers are making use of ECC memory, and why the internal cache and busses of server-class processors are making use of ECC.

Agree again, ECC is definitely recommended for a storage server and what I use on all my servers.
