
Cache drive deletes all data on its own


Go to solution Solved by JorgeB,


I'm not sure what is happening. It started yesterday, and when it first occurred I pretty much just wiped and reformatted my cache drives. Today the same thing happened. I looked at the logs and I think it may be an issue with one of my drives, but I'm not sure. I did install a 2TB SSD last week, but I don't think that could be the cause. What happens is that my cache pool reads as full and all of its contents appear to be wiped out. All Docker containers freeze; they still appear in the list, but I can't access them. Is there any chance I can still recover the data?

 

image.thumb.png.9ce9b09b78e56ca5cbaf341ef7bd51ed.png

 

May 11 08:24:27 UNRAID kernel: ------------[ cut here ]------------
May 11 08:24:27 UNRAID kernel: BTRFS: Transaction aborted (error -28)
May 11 08:24:27 UNRAID kernel: WARNING: CPU: 0 PID: 13355 at fs/btrfs/inode.c:3329 btrfs_finish_ordered_io.isra.0+0x67e/0x69b
May 11 08:24:27 UNRAID kernel: Modules linked in: xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat vhost_net tun vhost vhost_iotlb tap xt_connmark xt_mark iptable_mangle xt_comment iptable_raw xt_nat xt_tcpudp veth xt_conntrack nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype br_netfilter xfs md_mod tcp_diag inet_diag efivarfs iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables bridge stp llc bonding tls wmi_bmof x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd rapl intel_cstate intel_uncore i915 mpt3sas iosf_mbi drm_buddy i2c_algo_bit ttm i2c_i801 raid_class nvme i2c_smbus
May 11 08:24:27 UNRAID kernel: BTRFS: error (device nvme0n1p1: state A) in btrfs_finish_ordered_io:3329: errno=-28 No space left
May 11 08:24:27 UNRAID kernel: scsi_transport_sas igc drm_display_helper nvme_core btusb
May 11 08:24:27 UNRAID kernel: BTRFS info (device nvme0n1p1: state EA): forced readonly
May 11 08:24:27 UNRAID kernel: btrtl btbcm drm_kms_helper btintel ahci libahci input_leds bluetooth drm joydev led_class ecdh_generic ecc intel_gtt agpgart i2c_core vmd syscopyarea sysfillrect sysimgblt fb_sys_fops fan thermal wmi video backlight tpm_crb tpm_tis tpm_tis_core tpm acpi_tad acpi_pad button unix [last unloaded: md_mod]
May 11 08:24:27 UNRAID kernel: CPU: 0 PID: 13355 Comm: kworker/u40:4 Not tainted 5.19.17-Unraid #2
May 11 08:24:27 UNRAID kernel: Hardware name: ASUS System Product Name/TUF GAMING Z690-PLUS WIFI, BIOS 1601 07/07/2022
May 11 08:24:27 UNRAID kernel: Workqueue: btrfs-endio-write btrfs_work_helper
May 11 08:24:27 UNRAID kernel: RIP: 0010:btrfs_finish_ordered_io.isra.0+0x67e/0x69b
May 11 08:24:27 UNRAID kernel: Code: e8 20 b1 4f 00 0f 0b 89 e9 ba 13 0d 00 00 e9 e6 fd ff ff 83 fd e2 0f 84 d6 fd ff ff 89 ee 48 c7 c7 dc 1b 0f 82 e8 fb b0 4f 00 <0f> 0b e9 c1 fd ff ff 48 81 c4 98 00 00 00 5b 5d 41 5c 41 5d 41 5e
May 11 08:24:27 UNRAID kernel: RSP: 0018:ffffc90003347d78 EFLAGS: 00010286
May 11 08:24:27 UNRAID kernel: RAX: 0000000000000000 RBX: ffff888035af12e8 RCX: 0000000000000027
May 11 08:24:27 UNRAID kernel: RDX: 0000000000000001 RSI: ffffffff820d7be1 RDI: 00000000ffffffff
May 11 08:24:27 UNRAID kernel: RBP: 00000000ffffffe4 R08: 0000000000000000 R09: ffffffff828653f0
May 11 08:24:27 UNRAID kernel: R10: 00003fffffffffff R11: ffff88887f7c880f R12: ffff88817a41d888
May 11 08:24:27 UNRAID kernel: R13: 0000000000000020 R14: ffff88872502b480 R15: 0000000000003000
May 11 08:24:27 UNRAID kernel: FS:  0000000000000000(0000) GS:ffff88885f200000(0000) knlGS:0000000000000000
May 11 08:24:27 UNRAID kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 11 08:24:27 UNRAID kernel: CR2: 0000150e71062000 CR3: 00000001d28a4004 CR4: 0000000000770ef0
May 11 08:24:27 UNRAID kernel: PKRU: 55555554
May 11 08:24:27 UNRAID kernel: Call Trace:
May 11 08:24:27 UNRAID kernel: <TASK>
May 11 08:24:27 UNRAID kernel: ? newidle_balance+0x289/0x30a
May 11 08:24:27 UNRAID kernel: btrfs_work_helper+0x111/0x2a5
May 11 08:24:27 UNRAID kernel: process_one_work+0x1a8/0x295
May 11 08:24:27 UNRAID kernel: worker_thread+0x18b/0x244
May 11 08:24:27 UNRAID kernel: ? rescuer_thread+0x281/0x281
May 11 08:24:27 UNRAID kernel: kthread+0xe4/0xef
May 11 08:24:27 UNRAID kernel: ? kthread_complete_and_exit+0x1b/0x1b
May 11 08:24:27 UNRAID kernel: ret_from_fork+0x1f/0x30
May 11 08:24:27 UNRAID kernel: </TASK>
May 11 08:24:27 UNRAID kernel: ---[ end trace 0000000000000000 ]---
May 11 08:24:27 UNRAID kernel: BTRFS: error (device nvme0n1p1: state EA) in btrfs_finish_ordered_io:3329: errno=-28 No space left
May 11 08:24:27 UNRAID kernel: BTRFS warning (device nvme0n1p1: state EA): Skipping commit of aborted transaction.
May 11 08:24:28 UNRAID kernel: BTRFS error (device nvme0n1p1: state EA): parent transid verify failed on 569036668928 wanted 2264 found 2212
May 11 08:24:28 UNRAID kernel: BTRFS error (device nvme0n1p1: state EA): parent transid verify failed on 569036668928 wanted 2264 found 2212
May 11 08:24:28 UNRAID kernel: BTRFS error (device nvme0n1p1: state EA): parent transid verify failed on 569036668928 wanted 2264 found 2212
May 11 08:24:28 UNRAID kernel: BTRFS error (device nvme0n1p1: state EA): parent transid verify failed on 569036668928 wanted 2264 found 2212
May 11 08:24:28 UNRAID kernel: BTRFS error (device nvme0n1p1: state EA): parent transid verify failed on 569036668928 wanted 2264 found 2212
May 11 08:24:28 UNRAID kernel: BTRFS error (device nvme0n1p1: state EA): parent transid verify failed on 569036668928 wanted 2264 found 2212
May 11 08:24:28 UNRAID kernel: BTRFS error (device nvme0n1p1: state EA): parent transid verify failed on 569036668928 wanted 2264 found 2212
May 11 08:24:28 UNRAID kernel: BTRFS error (device nvme0n1p1: state EA): parent transid verify failed on 569036668928 wanted 2264 found 2212
May 11 08:24:28 UNRAID kernel: BTRFS error (device nvme0n1p1: state EA): parent transid verify failed on 569036668928 wanted 2264 found 2212
May 11 08:24:28 UNRAID kernel: BTRFS error (device nvme0n1p1: state EA): parent transid verify failed on 569036668928 wanted 2264 found 2212
May 11 08:24:31 UNRAID kernel: PMS Logger[11073]: segfault at 1464653ba61c ip 0000146482ccc4b3 sp 000014647dffb940 error 4 in ld-musl-x86_64.so.1[146482c91000+53000]
May 11 08:24:31 UNRAID kernel: Code: e9 37 ff ff ff 55 41 57 41 56 41 55 41 54 53 50 49 89 cc 49 89 d7 49 89 f6 49 89 fd 48 89 d3 48 0f af de 48 85 f6 4c 0f 44 fe <83> b9 8c 00 00 00 00 78 29 4c 89 e7 e8 c5 de ff ff 89 c5 4c 89 ef
May 11 08:24:45 UNRAID kernel: verify_parent_transid: 13523 callbacks suppressed
May 11 08:24:45 UNRAID kernel: BTRFS error (device nvme0n1p1: state EA): parent transid verify failed on 568932679680 wanted 2264 found 1516
May 11 08:24:45 UNRAID kernel: BTRFS error (device nvme0n1p1: state EA): parent transid verify failed on 568932679680 wanted 2264 found 1516
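
For reference, the `errno=-28` in the log above is the kernel's negated form of `ENOSPC`, which matches the "No space left" text in the same message. A quick way to look up such codes is Python's standard `errno` module. This is only a lookup aid with a helper name (`describe_errno`) made up for illustration, not part of the diagnostics:

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Map a (possibly negated) kernel errno to its symbolic name and message."""
    n = abs(code)  # kernel logs print errno values as negative numbers
    name = errno.errorcode.get(n, "UNKNOWN")
    return f"{name}: {os.strerror(n)}"


print(describe_errno(-28))
```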

 

unraid-diagnostics-20230511-0835.zip

  • Solution

Filesystem is completely full:

 

                  Data      Metadata System                               
Id Path           RAID0     RAID1    RAID1     Unallocated Total     Slack
-- -------------- --------- -------- --------- ----------- --------- -----
 1 /dev/nvme0n1p1 231.88GiB  1.00GiB         -     1.05MiB 232.88GiB     -
 2 /dev/nvme1n1p1 231.85GiB  1.00GiB  32.00MiB     1.05MiB 232.88GiB     -
 3 /dev/nvme2n1p1 231.88GiB  1.00GiB         -     1.05MiB 232.88GiB     -
 4 /dev/sdb1      223.57GiB        -         -     1.02MiB 223.57GiB     -
 5 /dev/sdc1      119.24GiB        -         -     1.02MiB 119.24GiB     -
 6 /dev/sde1      119.24GiB        -         -     1.02MiB 119.24GiB     -
 7 /dev/sdd1      673.88GiB  3.00GiB  32.00MiB     1.16TiB   1.82TiB     -
-- -------------- --------- -------- --------- ----------- --------- -----
   Total            1.79TiB  3.00GiB  32.00MiB     1.16TiB   2.95TiB 0.00B
   Used             1.78TiB  2.98GiB 128.00KiB     

 

There's one device with >1TB free, but you are using raid0, so you need at least two devices with free space.
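
To illustrate why the unallocated space on a single device does not help, here is a minimal Python sketch of striped allocation. It is a simplified model, not the real btrfs allocator: the fixed 1 GiB chunk size, the function name `allocatable_mib`, and the rounded device sizes are assumptions, and `min_devices=2` is taken from the statement above rather than from the btrfs source.

```python
GIB = 1024  # MiB per GiB


def allocatable_mib(unallocated_mib, min_devices=2, chunk_mib=GIB):
    """Simplified striped allocation: each round takes one chunk from every
    device that still has room, and stops once fewer than `min_devices`
    devices can supply a chunk."""
    free = list(unallocated_mib)
    total = 0
    while True:
        ready = [i for i, f in enumerate(free) if f >= chunk_mib]
        if len(ready) < min_devices:
            return total
        for i in ready:
            free[i] -= chunk_mib
        total += chunk_mib * len(ready)


# Unallocated space per device from the table above, rounded to whole MiB
# (1.16 TiB is approximated as 1187 GiB).
before = [1, 1, 1, 1, 1, 1, 1187 * GIB]
# Hypothetical: a second device with about the same free space is added.
after = before + [1187 * GIB]

print(allocatable_mib(before))        # 0
print(allocatable_mib(after) // GIB)  # 2374
```

Under this model the pool can allocate nothing while only one device has room, and the same space becomes usable as soon as a second device with free space joins the pool.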


I notice that you have the Minimum Free Space setting for the cache pool set to 0. It should be set larger than the biggest file you expect to cache, so that Unraid knows when to stop writing new files to the cache and instead bypass it and write directly to the array. Btrfs file systems seem to misbehave when they get too full, so setting this may well help.
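
The effect of that setting can be sketched in a few lines of Python. This is only a simplified model of the behaviour described above, not Unraid's actual code; the function name `choose_write_target` and the 50 GiB threshold are hypothetical.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def choose_write_target(cache_free_bytes: int, min_free_bytes: int) -> str:
    """Return "cache" while the pool has more free space than the configured
    minimum, otherwise "array" (bypass the cache)."""
    return "cache" if cache_free_bytes > min_free_bytes else "array"


# With the minimum set to 0, a nearly full pool is still chosen for new writes.
print(choose_write_target(1 * MIB, 0))         # cache
# With a hypothetical 50 GiB minimum, the same pool is bypassed.
print(choose_write_target(1 * MIB, 50 * GIB))  # array
```

With the minimum at 0, new files keep going to the pool until a write fails, which is the situation the log shows.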

9 minutes ago, JorgeB said:

There's one device with >1TB free, but you are using raid0, so you need at least two devices with free space.

 

Sorry for my confusion here, but how is it that my 2TB drive still has 1.16TiB of unallocated space? Shouldn't it be in use? Would I be able to allocate that space, or will I have to get another drive with at least 1TB?

17 minutes ago, itimpi said:

I think this is what is needed.

 

Awesome. I just have a few quick questions here.

  1. When I add another 2TB SSD, will the data "recover" itself? Right now it looks like everything that was on my cache has been wiped out.
  2. When installing the second 2TB SSD, will I need to do anything special? I want to be sure that all of its space gets used.
  3. Is there a way to reclaim the unallocated space on the current 2TB SSD once I add the new one?

It looks like adding another 2TB SSD got me back in. I took a look at the usage and it appears that there's still a lot of space that's unallocated. I assume the best approach here is to downsize or separate these drives into different pools. How would I go about that without breaking anything?

 

                  Data      Metadata System                               
Id Path           RAID0     RAID1    RAID1     Unallocated Total     Slack
-- -------------- --------- -------- --------- ----------- --------- -----
 1 /dev/nvme0n1p1 199.55GiB        -         -    33.33GiB 232.88GiB     -
 2 /dev/nvme1n1p1 199.52GiB        -         -    33.36GiB 232.88GiB     -
 3 /dev/nvme2n1p1 199.55GiB        -         -    33.33GiB 232.88GiB     -
 4 /dev/sdb1      192.24GiB        -         -    31.33GiB 223.57GiB     -
 5 /dev/sdc1      104.24GiB        -         -    15.00GiB 119.24GiB     -
 6 /dev/sde1      104.24GiB        -         -    15.00GiB 119.24GiB     -
 7 /dev/sdd1      451.55GiB  4.00GiB  32.00MiB     1.37TiB   1.82TiB     -
 8 /dev/sdh1      451.55GiB  4.00GiB  32.00MiB     1.37TiB   1.82TiB     -
-- -------------- --------- -------- --------- ----------- --------- -----
   Total            1.86TiB  4.00GiB  32.00MiB     2.91TiB   4.77TiB 0.00B
   Used             1.85TiB  3.09GiB 128.00KiB                            

 

