LOG 100%, Syslog.2 108MB, BTRFS Errors

YourNightmar3 · December 7, 2022

Today I noticed that the LOG usage on the dashboard of my Unraid server shows 100%:

[Screenshot: dashboard showing LOG at 100%]

I used this command to find out which log file is taking up all this space:

du -sm /var/log/*

And this is the result:

0       /var/log/btmp
1       /var/log/btmp.1
0       /var/log/cron
0       /var/log/debug
1       /var/log/dmesg
1       /var/log/docker.log
0       /var/log/faillog
1       /var/log/lastlog
1       /var/log/libvirt
0       /var/log/maillog
0       /var/log/mcelog
0       /var/log/messages
0       /var/log/nfsd
0       /var/log/nginx
0       /var/log/packages
1       /var/log/pkgtools
0       /var/log/plugins
0       /var/log/pwfail
0       /var/log/removed_packages
0       /var/log/removed_scripts
0       /var/log/removed_uninstall_scripts
0       /var/log/sa
3       /var/log/samba
1       /var/log/scan
0       /var/log/scripts
0       /var/log/secure
0       /var/log/setup
0       /var/log/spooler
0       /var/log/swtpm
4       /var/log/syslog
14      /var/log/syslog.1
108     /var/log/syslog.2
0       /var/log/vfio-pci
1       /var/log/wtmp
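Piping the same `du` check through `sort` puts the biggest entries last, so a runaway rotated file like `syslog.2` stands out without scanning the whole list. A minimal sketch; `largest_logs` is a hypothetical helper name, and GNU `du`/`sort` behaviour is assumed.

```sh
# Hypothetical helper: show the N largest entries in a directory, sizes in MB.
# Usage: largest_logs [directory] [count]   (defaults: /var/log, 5)
largest_logs() {
    # -s: one total per entry, -m: sizes in megabytes (rounded up)
    # sort -n orders by the size column, so the biggest entries come last
    du -sm "${1:-/var/log}"/* 2>/dev/null | sort -n | tail -n "${2:-5}"
}
```

For example, `largest_logs /var/log 3` shows the three biggest entries, and `df -h /var/log` shows how full the log filesystem itself is.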

When I run

cat syslog.2

I see a bunch of BTRFS errors. I have no idea what's going on: I don't know what's causing this or what I should do. Any support with this is much appreciated. I have attached my diagnostics, which I checked also contain the logs. Thanks in advance.
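With a 108 MB file, `cat` just floods the terminal. A minimal sketch that condenses the BTRFS lines into counts instead; both function names are hypothetical. Note that the `(device …)` tag in a BTRFS message identifies the filesystem by one of its member devices, which is not necessarily the failing one; the failing member is named in the message text, e.g. `IO error on /dev/sdc1` in the lines JorgeB quotes in his reply.

```sh
# Hypothetical helper: count BTRFS messages by level and "(device ...)" tag.
btrfs_summary() {
    # -o prints only the matched part, e.g. "BTRFS warning (device sdb1)"
    grep -o 'BTRFS [a-z]* (device [^)]*)' "${1:-/var/log/syslog}" |
        sort | uniq -c | sort -rn
}

# Hypothetical helper: count "IO error on /dev/..." messages per device.
io_error_devices() {
    grep -o 'IO error on /dev/[a-z0-9]*' "${1:-/var/log/syslog}" |
        sort | uniq -c | sort -rn
}
```

Usage: `btrfs_summary /var/log/syslog.2` and `io_error_devices /var/log/syslog.2`.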

tower-diagnostics-20221207-1923.zip

Edited December 7, 2022 by YourNightmar3

JorgeB · December 7, 2022

Problems with the cache2 device:

Dec  5 04:40:02 Tower kernel: BTRFS warning (device sdb1): lost page write due to IO error on /dev/sdc1 (-5)
Dec  5 04:40:02 Tower kernel: BTRFS error (device sdb1): error writing primary super block to device 2
Dec  5 04:40:02 Tower kernel: BTRFS warning (device sdb1): lost page write due to IO error on /dev/sdc1 (-5)

It dropped offline; check/replace the cables, then run a scrub on the pool.
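Besides the scrub, btrfs keeps per-device error counters that show whether a member is still producing errors after the cables have been checked. `btrfs device stats <mountpoint>` prints them as lines like `[/dev/sdc1].write_io_errs 0`. A minimal sketch of a filter that keeps only the non-zero counters; `nonzero_stats` is a hypothetical name, and `/mnt/cache` is assumed to be the pool's mountpoint (as in the `lsblk` output later in this thread).

```sh
# Hypothetical filter: read `btrfs device stats` output on stdin and print
# only the counters whose value (second column) is not zero.
nonzero_stats() {
    awk '$2 != 0'
}

# Usage on the server (pool mountpoint assumed to be /mnt/cache):
#   btrfs device stats /mnt/cache | nonzero_stats
# The counters are cumulative; `btrfs device stats -z /mnt/cache` resets them.
```

Empty output means no errors have been recorded since the counters were last reset.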

YourNightmar3 · December 7, 2022

12 hours ago, JorgeB said:

Problems with the cache2 device:

Dec  5 04:40:02 Tower kernel: BTRFS warning (device sdb1): lost page write due to IO error on /dev/sdc1 (-5)
Dec  5 04:40:02 Tower kernel: BTRFS error (device sdb1): error writing primary super block to device 2
Dec  5 04:40:02 Tower kernel: BTRFS warning (device sdb1): lost page write due to IO error on /dev/sdc1 (-5)

It dropped offline; check/replace the cables, then run a scrub on the pool.

Thanks a lot! Can this cause any serious data issues, or is cache 1 my saviour here? I'll check the cable ASAP. How do I perform a scrub on the pool?

Edited December 8, 2022 by YourNightmar3

ChatNoir · December 8, 2022

7 hours ago, YourNightmar3 said:

I'll check the cable ASAP. How do I perform a scrub on the pool?

Click on 'cache', then scroll down to the Scrub section.
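For reference, the same scrub can also be started from the terminal with btrfs-progs. A minimal sketch, assuming the pool is mounted at `/mnt/cache`; `scrub_pool` is a hypothetical wrapper name, not an Unraid command.

```sh
# Hypothetical wrapper: scrub a btrfs pool and print the result.
# Usage: scrub_pool [mountpoint]   (default: /mnt/cache)
scrub_pool() {
    # -B keeps the scrub in the foreground, so this returns only when it ends
    btrfs scrub start -B "${1:-/mnt/cache}" || return 1
    # Print the recorded result of the most recent scrub on this pool
    btrfs scrub status "${1:-/mnt/cache}"
}
```

Without `-B` the scrub runs in the background, and `btrfs scrub status` can be run repeatedly to follow its progress.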

YourNightmar3 · December 11, 2022

OK, so I first unplugged and replugged the cables on that SSD, but when I booted unRAID again it said the entire SSD was missing this time. To rule out a cable problem, I then swapped its power and SATA cables with those of the other SSD, which is still working, and the same SSD was still reported missing. I then plugged it into my Windows computer and formatted it as exFAT, because otherwise I couldn't do anything with it. A quick read/write test gave a normal 400 MB/s, and then I ran chkdsk on it from the command prompt, with this result:

The type of the file system is exFAT.
Volume Serial Number is D4AA-EF58
Windows is verifying files and folders...
Volume label is New Volume.
Windows is verifying file allocations...
File and folder verification is complete.
Windows is verifying free space...
  3815278 free clusters processed.
Free space verification is complete.
Bad sectors were found and tested while examining free space on the volume.

Windows has made corrections to the file system.
No further action is required.

 976712704 KB total disk space.
       256 KB in 1 files.
       512 KB in 2 indexes.
      1536 KB in bad sectors.
       768 KB in use by the system.
 976711168 KB available on disk.

    262144 bytes in each allocation unit.
   3815284 total allocation units on disk.
   3815272 allocation units available on disk.

It seems to work fine on my Windows PC, although chkdsk does report bad sectors. I then plugged it back into my unRAID server to see if it would show up, but it still cannot find the disk at all. Is that normal? How come my Windows PC can see and use it seemingly fine (even though it may be dying) while unRAID doesn't see it at all? And does this mean the SSD is just dead?

Edited December 11, 2022 by YourNightmar3

JorgeB · December 12, 2022

Try a different port/controller; it would be strange for it to work on Windows and not in Unraid.

YourNightmar3 · December 12, 2022

13 hours ago, JorgeB said:

Try a different port/controller; it would be strange for it to work on Windows and not in Unraid.

Something I noticed when I run the "lsblk" command in the terminal is that (if I'm reading this right) I get three 1TB drives as a result:

root@Tower:~# lsblk
NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
loop0     7:0    0 116.8M  1 loop /lib/firmware
loop1     7:1    0  19.4M  1 loop /lib/modules
loop2     7:2    0    80G  0 loop /var/lib/docker/btrfs
                                  /var/lib/docker
loop3     7:3    0     1G  0 loop /etc/libvirt
sda       8:0    1   7.5G  0 disk 
└─sda1    8:1    1   7.5G  0 part /boot
sdb       8:16   0 931.5G  0 disk 
└─sdb1    8:17   0 931.5G  0 part 
sdc       8:32   0 931.5G  0 disk 
└─sdc1    8:33   0 931.5G  0 part /mnt/cache
sdd       8:48   0 931.5G  0 disk 
└─sdd1    8:49   0 931.5G  0 part 
sde       8:64   0 465.8G  0 disk 
└─sde1    8:65   0 465.8G  0 part 
sdf       8:80   0   3.6T  0 disk 
└─sdf1    8:81   0   3.6T  0 part 
sdg       8:96   0   3.6T  0 disk 
└─sdg1    8:97   0   3.6T  0 part 
sdh       8:112  0   7.3T  0 disk 
└─sdh1    8:113  0   7.3T  0 part /mnt/disks/VDK2E7KK
sdi       8:128  0   7.3T  0 disk 
└─sdi1    8:129  0   7.3T  0 part 
sdj       8:144  0   7.3T  0 disk 
└─sdj1    8:145  0   7.3T  0 part 
md1       9:1    0   7.3T  0 md   /mnt/disk1
md2       9:2    0   3.6T  0 md   /mnt/disk2
md3       9:3    0   3.6T  0 md   /mnt/disk3
md4       9:4    0 931.5G  0 md   /mnt/disk4
md5       9:5    0 465.8G  0 md   /mnt/disk5
sr0      11:0    1  1024M  0 rom  
nvme0n1 259:0    0 931.5G  0 disk 
nvme1n1 259:1    0 931.5G  0 disk

Is that correct? I see sdb, sdc, and sdd as 1TB drives.

sdd is a 1TB array drive, sdc is my remaining working cache SSD, and sdb doesn't show up anywhere on the unRAID Main page. Could that be the other 1TB cache SSD? If so, why is it not showing up in the GUI? If not, then what is it?
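One way to answer "which physical drive is sdb?" is to list whole disks with their model and serial numbers and compare them with the drive labels or the Unraid GUI. `lsblk -d -o NAME,SIZE,MODEL,SERIAL` does this directly (and `ls -l /dev/disk/by-id/` shows a similar mapping). A minimal sketch that narrows that list to one size; `disks_of_size` is a hypothetical helper.

```sh
# Hypothetical helper: list whole disks (-d, no partitions) of a given size,
# with model and serial, without a header line (-n).
# Usage: disks_of_size 931.5G
disks_of_size() {
    # SIZE is the second column; MODEL may contain spaces, so match on $2 only
    lsblk -d -n -o NAME,SIZE,MODEL,SERIAL | awk -v want="$1" '$2 == want'
}
```

Sizes must be given exactly as `lsblk` prints them (e.g. `931.5G`), and NVMe devices of the same size will be listed too.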

I have attached my server's diagnostics (once again) in case this might help answer the question/solve the problem.

tower-diagnostics-20221213-0059.zip

Edited December 13, 2022 by YourNightmar3

JorgeB · December 13, 2022

sdb was a 1TB SanDisk device that dropped offline:

Dec 12 20:59:19 Tower kernel: ata4.00: disable device
Dec 12 20:59:19 Tower kernel: sd 5:0:0:0: [sdb] tag#24 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=90s
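Lines like the `ata4.00: disable device` one above come from the kernel's ATA layer. A minimal sketch for pulling just these kinds of events out of a large syslog, which helps show when a port started misbehaving and whether moving the drive to another port/controller made a difference; `ata_drop_events` is a hypothetical name, and the pattern covers only a few common libata messages, not every failure mode. The `ata4` port number and the `sd 5:0:0:0` SCSI address use different numbering, so don't expect them to match.

```sh
# Hypothetical helper: pull libata events that typically accompany a SATA
# drive dropping offline out of a (possibly very large) syslog.
# Usage: ata_drop_events [logfile]   (default: /var/log/syslog)
ata_drop_events() {
    grep -E 'kernel: ata[0-9]+(\.[0-9]+)?: .*(disable device|hard resetting link|SATA link down)' \
        "${1:-/var/log/syslog}"
}
```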
