
Many errors after adding drive to cache pool - BTRFS read only



Hi Guys,

Hope you're doing fine!

I ran into a problem that I can't solve on my own. Need help!

 

I recently added one more disk to my cache pool to get more free space, but since then I've been having problems that I can't solve...

I had a 2TB NVMe as my cache drive so I could run my VMs off of it, but it got full. The VMs began pausing and causing issues for my users.

To add some free space, I added a 500GB SSD to the pool.

It took a while, but it finished.


When the process finished, I started observing really weird behavior in my VMs and logs, as the cache drive seemed to be read-only... At this point the VMs were locking up and BTRFS errors were appearing in the log.

 

In addition to that (at the same time, of course) my USB stick failed (bad luck, I guess?).

I got a new one and replaced the key. The data was safe! Nothing was lost, as far as I can tell for now.

 

BUT I see a lot of these in my logs:
Jun 27 11:52:33 VEYRON kernel: blk_update_request: I/O error, dev loop2, sector 78080 op 0x1:(WRITE) flags 0x1800 phys_seg 16 prio class 0
Jun 27 11:52:33 VEYRON kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 3, rd 0, flush 0, corrupt 0, gen 0
Jun 27 11:52:33 VEYRON kernel: blk_update_request: I/O error, dev loop2, sector 602368 op 0x1:(WRITE) flags 0x1800 phys_seg 16 prio class 0
Jun 27 11:52:33 VEYRON kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 4, rd 0, flush 0, corrupt 0, gen 0
Jun 27 11:52:33 VEYRON kernel: BTRFS: error (device loop2) in btrfs_commit_transaction:2438: errno=-5 IO failure (Error while writing out transaction)
Jun 27 11:52:33 VEYRON kernel: BTRFS info (device loop2): forced readonly
Jun 27 11:52:33 VEYRON kernel: BTRFS warning (device loop2): Skipping commit of aborted transaction.
Jun 27 11:52:33 VEYRON kernel: BTRFS: error (device loop2) in cleanup_transaction:2011: errno=-5 IO failure
Jun 27 11:52:33 VEYRON kernel: BTRFS error (device loop2): commit super ret -5

 

Any advice on how I can stop these errors? I guess the cache is still read-only.

Thank you!

 

veyron-diagnostics-20220627-1634.zip

11 minutes ago, CDebarry said:

I had a 2TB NVMe as my cache drive so I could run my VMs off of it, but it got full. The VMs began pausing and causing issues for my users.

To add some free space, I added a 500GB SSD to the pool.

By default, 2 members in a pool will be configured as RAID1, which means the total space available is limited to the capacity of the smaller of the two drives. You went from a pool with 2TB capacity to a redundant pool with 500GB capacity. I do not know what happens to the data beyond the 500GB limit. Hopefully it's still there, and balancing to the single profile, which spreads the data across both drives instead of mirroring it, may correct the issue.
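If you want to try that from the console, a minimal sketch would be something like the following (assuming the pool is mounted at /mnt/cache; keeping metadata at raid1 is the usual choice, and the filesystem has to be writable for the balance to run):

btrfs balance start -dconvert=single -mconvert=raid1 /mnt/cache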


Also, that original 2TB cache should have been plenty. You must have had some shares set up wrong, so it got filled.

 

Your diagnostics showed all shares configured to not use cache, but of course that isn't how they were configured when you filled it.

 

Typically, you want the appdata, domains, and system shares on cache and set to stay on cache (cache-prefer or cache-only), so your Docker/VM performance won't be impacted by parity, and so the array disks can spin down, since these files are always open.

 

Other shares should be cache-yes so they get moved to the array, or cache-no so they get written directly to the array.

 

Probably you had some of your shares set to stay on cache (cache-prefer) when they should have been moved to the array (cache-yes).

Jun 27 10:01:34 VEYRON kernel: BTRFS: error (device nvme0n1p1) in btrfs_run_delayed_refs:2150: errno=-28 No space left

 

As mentioned, the pool ran out of space; that's why it went read-only. There are other options, but the easiest way to get out of this would be to back up the pool to the array (or use existing backups) and re-format.
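For the backup step, a rough sketch from the console (the mount points here are assumptions; adjust the pool name and destination disk/folder to your setup):

rsync -avh --progress /mnt/cache/ /mnt/disk1/cache_backup/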


Hi everyone,

Thanks for the replies.
Yesterday I wrote a reply, but it got lost... sorry.

 

The parity rebuild just finished. The data seems to be safe now! Great news.

 

Data on the cache drive does not get transferred to the array when the mover runs. It doesn't recognize it as belonging to the shares on the array.

I can SSH in and copy all the data off the cache drive (VMs and some folders), so all the cache data is safe now.

 

I'm moving the VMs to another box so I can test them. I'm also going to keep it as a live backup or an alternative to the main server.

 

After copying all the data, how can I clear the array and start over?

Is it possible to use the second drive as additional space for the array?
Are there any options in the GUI that I can use? I'm kind of a newbie on Linux.

 


Thanks again!

5 minutes ago, CDebarry said:

After copying all the data, how can I clear the array and start over?

You mean the pool, right? You can clear it by wiping both devices with 'blkdiscard -f /dev/sdX' and 'blkdiscard -f /dev/nvme0n1', then starting the array and formatting the pool.
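Since blkdiscard destroys everything on the device, it's worth double-checking which device is which before wiping, for example with:

lsblk -o NAME,SIZE,MODEL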

 

7 minutes ago, CDebarry said:

Is it possible to use the second drive as additional space for the array?

Again assuming you mean the pool, yes, use the single profile:

https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=480421

 

3 hours ago, JorgeB said:

You mean the pool, right? You can clear it by wiping both devices with 'blkdiscard -f /dev/sdX' and 'blkdiscard -f /dev/nvme0n1', then starting the array and formatting the pool.

 

Again assuming you mean the pool, yes, use the single profile:

https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=480421

 

 

Thank you!!!!!!!!!!!!

It worked!

 

Now I have two pools.

One, the NVMe drive, will be used only for VMs, and the other (the SSD) will be used as a temporary cache for some shares on the array.

 

 


Here you go. New diagnostics.

 

I guess everything is working fine.
One thing that bothers me is these lines in the log:

 

Jun 29 09:19:51 VEYRON nginx: 2022/06/29 09:19:51 [error] 1572#1572: *605973 limiting requests, excess: 20.587 by zone "authlimit", client: 192.168.1.99, server: , request: "PROPFIND /login HTTP/1.1", host: "192.168.1.174"
Jun 29 09:19:51 VEYRON nginx: 2022/06/29 09:19:51 [error] 1572#1572: *605975 limiting requests, excess: 20.571 by zone "authlimit", client: 192.168.1.99, server: , request: "PROPFIND /login HTTP/1.1", host: "192.168.1.174"
Jun 29 09:19:51 VEYRON nginx: 2022/06/29 09:19:51 [error] 1572#1572: *605977 limiting requests, excess: 20.557 by zone "authlimit", client: 192.168.1.99, server: , request: "PROPFIND /login HTTP/1.1", host: "192.168.1.174"

 

The .99 machine is mine. Should I worry about them?

veyron-diagnostics-20220629-1242.zip

4 hours ago, CDebarry said:

One thing that bothers me is these lines in the log

Do you have any idea what is causing that? Do you have multiple browsers open to your server? Is something on your PC constantly trying to connect to it? It's better not to have things spamming syslog; it makes it difficult to see other things that might be going on, and it will eventually fill the log space.
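PROPFIND is a WebDAV request, so it may be something like a mapped network drive or a sync app on that PC rather than a browser. One way to see what that client is actually sending is to capture its traffic from the server console, for example (the interface name is an assumption, and port 80 assumes the web UI is on the default HTTP port):

tcpdump -i br0 -nn -A host 192.168.1.99 and port 80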

 

4 hours ago, CDebarry said:

I guess everything is working fine.

Your appdata, domains, and system shares have files on the array. Typically you want these all on a fast pool, and set to stay there. With them on the array, your Docker/VM performance will be impacted by the slower array, and the array disks can't spin down since these files are always open.

 

To get them moved off the array, you will have to set them to prefer one of your pools, and disable Docker and VM Manager in Settings, since nothing can move open files.
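Once the shares are set to prefer the pool and those services are disabled, run the mover from the Main page (if I remember right, it can also be started from the console with the mover command), then re-enable Docker and VM Manager after the files have moved.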

57 minutes ago, trurl said:

Do you have any idea what is causing that? Do you have multiple browsers open to your server? Is something on your PC constantly trying to connect to it? It's better not to have things spamming syslog; it makes it difficult to see other things that might be going on, and it will eventually fill the log space.

 

Your appdata, domains, and system shares have files on the array. Typically you want these all on a fast pool, and set to stay there. With them on the array, your Docker/VM performance will be impacted by the slower array, and the array disks can't spin down since these files are always open.

 

To get them moved off the array, you will have to set them to prefer one of your pools, and disable Docker and VM Manager in Settings, since nothing can move open files.

 

Thanks for the advice! I'll move everything to the NVMe cache pool tonight.

 

I don't know what's trying to connect to the server from my PC...
I usually have only one browser open.

 

Is there any way to figure this out?


Thanks again!!

