May 6, 2019

I have had an Unraid server for one month, and I suddenly couldn't write to the cache drive because it had gone read-only. I restarted the server and now the drive shows "Unmountable: No file system".

The cache drive is a 1TB Samsung SSD, about a year old. It was previously used as an OS drive and worked fine. The file system is BTRFS, formatted 3 weeks ago.

When I run the check command (sdc = cache drive):

# btrfs check /dev/sdc1
checksum verify failed on 376520704 found E1688036 wanted 6AD0A710
checksum verify failed on 376520704 found E1688036 wanted 6AD0A710
Csum didn't match
ERROR: cannot open file system

Syslog errors from sdc:

May 6 20:32:24 ALPHA-UNRAID01 emhttpd: shcmd (46): mount -t btrfs -o noatime,nodiratime /dev/sdc1 /mnt/cache
May 6 20:32:24 ALPHA-UNRAID01 kernel: BTRFS info (device sdc1): disk space caching is enabled
May 6 20:32:24 ALPHA-UNRAID01 kernel: BTRFS info (device sdc1): has skinny extents
May 6 20:32:24 ALPHA-UNRAID01 kernel: BTRFS warning (device sdc1): sdc1 checksum verify failed on 376520704 wanted 6AD0A710 found E1688036 level 0
May 6 20:32:24 ALPHA-UNRAID01 kernel: BTRFS error (device sdc1): failed to read block groups: -5
May 6 20:32:24 ALPHA-UNRAID01 root: mount: /mnt/cache: wrong fs type, bad option, bad superblock on /dev/sdc1, missing codepage or helper program, or other error.
May 6 20:32:24 ALPHA-UNRAID01 emhttpd: shcmd (46): exit status: 32
May 6 20:32:24 ALPHA-UNRAID01 emhttpd: /mnt/cache mount error: No file system
May 6 20:32:24 ALPHA-UNRAID01 emhttpd: shcmd (47): umount /mnt/cache
May 6 20:32:24 ALPHA-UNRAID01 kernel: BTRFS error (device sdc1): open_ctree failed

There was not much data on the cache drive, so reformatting it is not a problem. But since the installation is new, does this mean I have a drive problem? How can I find the cause of the problem?

Thank you

alpha-unraid01-diagnostics-20190506-2102.zip

Edited May 6, 2019 by yoshi_26_02: Attached diagnostic
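To see whether more than one block is affected, the syslog can be scanned for every BTRFS checksum warning rather than reading it by eye. The following is only a rough sketch: the function name `find_checksum_failures` and the regular expression are my own, written against the single warning format shown in the excerpt above, so other kernel versions may word the message differently.

```python
import re

# Assumption: the pattern mirrors the one kernel warning format quoted in
# this thread; it is not taken from any btrfs documentation.
CSUM_RE = re.compile(
    r"BTRFS warning \(device (?P<device>\w+)\): .*?"
    r"checksum verify failed on (?P<block>\d+) "
    r"wanted (?P<wanted>[0-9A-Fa-f]+) found (?P<found>[0-9A-Fa-f]+)"
)


def find_checksum_failures(lines):
    """Return one dict per BTRFS checksum warning found in the log lines."""
    failures = []
    for line in lines:
        match = CSUM_RE.search(line)
        if match:
            failures.append({
                "device": match.group("device"),
                "block": int(match.group("block")),
                "wanted": match.group("wanted").upper(),
                "found": match.group("found").upper(),
            })
    return failures


# Two lines copied from the syslog excerpt in the post above.
sample = [
    "May 6 20:32:24 ALPHA-UNRAID01 kernel: BTRFS info (device sdc1): has skinny extents",
    "May 6 20:32:24 ALPHA-UNRAID01 kernel: BTRFS warning (device sdc1): "
    "sdc1 checksum verify failed on 376520704 wanted 6AD0A710 found E1688036 level 0",
]

for failure in find_checksum_failures(sample):
    print(failure)
```

Feeding it the lines of a full syslog file (for example via `open(path).readlines()`) would show whether the same block keeps failing or several different blocks do.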
May 7, 2019

11 hours ago, yoshi_26_02 said: "does this mean I have a drive problem?"

Not likely. You could have a RAM problem though, since this type of corruption is mostly caused by one or more bit flips in RAM.
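As a rough illustration of why a single flipped bit is enough to produce a "checksum verify failed" message, here is a small sketch. It is not how btrfs is implemented: it uses `zlib.crc32` (plain CRC-32) as a stand-in for the CRC32C that btrfs uses by default, and the 16 KiB block and the helper `flip_bit` are made up for the example.

```python
import zlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


# Hypothetical 16 KiB block standing in for a piece of file system metadata.
block = bytes(range(256)) * 64

# Checksum recorded while the block was still intact.
# Assumption: plain CRC-32 here, only to show the principle.
stored_checksum = zlib.crc32(block)

# Simulate one bit flipping in memory after the checksum was recorded.
damaged = flip_bit(block, 12345)
found_checksum = zlib.crc32(damaged)

print(f"wanted {stored_checksum:08X} found {found_checksum:08X}")
print("checksums match:", found_checksum == stored_checksum)
```

A CRC detects every single-bit error, so the two values always differ here. Note that the checksum only says the block changed somewhere between being checksummed and being read back; it cannot tell whether RAM, the drive, or something else along the path was responsible.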
May 7, 2019 (Author)

Thank you for your answer. I will try dialing my RAM back to 2666 MHz (currently 3000 MHz).

By the way, Unraid runs as a virtual machine under KVM; I forgot to mention that in the post. The host is Proxmox VE.
May 7, 2019

18 minutes ago, yoshi_26_02 said: "I will try dialing my RAM back to 2666 MHz (currently 3000 MHz)"

That's a good idea. Ryzen with overclocked RAM is known to corrupt data in at least some cases, which shows up as sync errors during a parity check. Threadripper could well behave the same.

19 minutes ago, yoshi_26_02 said: "By the way, Unraid runs as a virtual machine under KVM; I forgot to mention that in the post."

I noticed, but it should be unrelated to the current issue.
May 7, 2019 (Author)

26 minutes ago, johnnie.black said: "That's a good idea. Ryzen with overclocked RAM is known to corrupt data in at least some cases, which shows up as sync errors during a parity check. Threadripper could well behave the same."

Yes. My RAM is rated for 3400 MHz, but I had to configure it at 3000 MHz in the first place because the system was unstable. However, MemTest86 ran successfully for 24 hours at 3000 MHz, so I thought it was stable.

Anyway, I will go back to 2666 MHz and reformat the cache drive. I will write big files to a cache-enabled share and see if there are any problems.

I had no problems running Unraid without a cache drive and saw no corruption (about 8TB written to the pool, all drives in BTRFS). If the problem is RAM, is that because only the cache drive uses RAM to transfer data? I don't know how Unraid uses RAM.
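For the "write big files and see if there are any problems" test, one possible approach is to hash the data while writing it and again while reading it back. The sketch below is just that, a sketch: `write_and_verify` is a name I made up, and the temporary directory is a placeholder for a directory on a cache-enabled share.

```python
import hashlib
import os
import tempfile

CHUNK = 1024 * 1024  # write and read in 1 MiB pieces


def write_and_verify(path: str, size_mib: int) -> bool:
    """Write size_mib MiB of random data to path, read it back, compare SHA-256."""
    written = hashlib.sha256()
    with open(path, "wb") as f:
        for _ in range(size_mib):
            chunk = os.urandom(CHUNK)
            written.update(chunk)
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to the device

    read_back = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK), b""):
            read_back.update(chunk)

    return written.hexdigest() == read_back.hexdigest()


if __name__ == "__main__":
    # Assumption: a temp directory stands in for the real share; point the
    # path at the share and raise size_mib for an actual test.
    with tempfile.TemporaryDirectory() as share:
        ok = write_and_verify(os.path.join(share, "testfile.bin"), size_mib=8)
        print("data intact" if ok else "MISMATCH between written and read data")
```

One caveat: a read immediately after a write is often served from the page cache in RAM rather than from the drive, so a pass here does not prove much on its own. Repeating the read after a reboot, with a file larger than the installed RAM, exercises more of the path.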
May 7, 2019

If the problem is RAM, it can affect cache or data disks the same way, but if the error is sporadic it can take some time before issues are noticed.

14 minutes ago, yoshi_26_02 said: "However, MemTest86 ran successfully for 24 hours at 3000 MHz, so I thought it was stable."

In the Ryzen cases I mentioned, memtest also didn't detect any errors, but a parity sync would.
This topic is now archived and is closed to further replies.