September 24, 2023
Hi there,

For the past few days (after four years of solid running), my Docker containers have been stopping one by one, and it's very hard to get everything running properly again (and the same errors come back shortly afterwards :/). When I try to restart them, Unraid shows an "Execution Error", so it's a dead end.

I searched and found some posts pointing to bad RAM, so I ran MemTest86 twice (parallel and non-parallel), and both runs passed. I also found a topic about a full cache, but that doesn't seem to be the case for me either, so I'm a little lost.

root@Tower:~# btrfs dev stats /mnt/cache
[/dev/mapper/sdk1].write_io_errs 192912
[/dev/mapper/sdk1].read_io_errs 145891
[/dev/mapper/sdk1].flush_io_errs 1086
[/dev/mapper/sdk1].corruption_errs 628
[/dev/mapper/sdk1].generation_errs 0
[/dev/mapper/sdj1].write_io_errs 0
[/dev/mapper/sdj1].read_io_errs 0
[/dev/mapper/sdj1].flush_io_errs 0
[/dev/mapper/sdj1].corruption_errs 0
[/dev/mapper/sdj1].generation_errs 0
[/dev/mapper/sdl1].write_io_errs 0
[/dev/mapper/sdl1].read_io_errs 0
[/dev/mapper/sdl1].flush_io_errs 0
[/dev/mapper/sdl1].corruption_errs 0
[/dev/mapper/sdl1].generation_errs 0

I'm attaching diagnostics and some logs, in case anyone can point me to where I should head next to fix this issue.

Thank you very much in advance!

logs.txt
tower-diagnostics-20230924-1631.zip
Edited October 29, 2023 by AinzOolGown (typo)
September 24, 2023
Not necessarily your problem, but corruption issues are generally related to bad RAM (as your searching has already indicated). FWIW, your particular RAM is not on the motherboard's memory QVL, and G.Skill doesn't list that motherboard on the memory's QVL either. Personally, I only ever buy RAM from the MB QVL for the most trouble-free experience. But take it with a grain of salt: just because neither the memory nor the MB says that they are compatible with each other doesn't mean that they are not.
September 25, 2023 Author
Hi Squid, thank you.
Yep, I figured that out and ran tests when I built the server, but all the tests were successful, so I decided to keep the RAM. It's been 4 years and I've never had a RAM issue since. If it really is a RAM problem, then I don't understand how it could be fine for 4 years and suddenly not be anymore. MemTest86 also tells me everything's fine 😭
September 25, 2023 Community Expert
20 hours ago, AinzOolGown said:
[/dev/mapper/sdk1].write_io_errs 192912
[/dev/mapper/sdk1].read_io_errs 145891
[/dev/mapper/sdk1].flush_io_errs 1086
These suggest the device dropped offline at some point in the past; see here for more info and better pool monitoring.
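
The counters quoted above can be checked at a glance by filtering the `btrfs dev stats` output down to its non-zero lines. A minimal sketch, assuming the output format shown in the first post (one `[device].counter value` pair per line); the function name `nonzero_btrfs_counters` is made up for illustration:

```shell
#!/bin/sh
# Print only the non-zero counters from `btrfs dev stats` output read on stdin.
# Expected input lines look like: [/dev/mapper/sdk1].write_io_errs 192912
# (awk splits on any run of whitespace, so padded columns are fine too.)
nonzero_btrfs_counters() {
    awk '$2 ~ /^[0-9]+$/ && $2 > 0 { print $1, $2 }'
}

# Hypothetical usage on the server (pool path taken from the thread):
#   btrfs dev stats /mnt/cache | nonzero_btrfs_counters
```

With the figures from the first post, this would list only the four sdk1 counters (write, read, flush and corruption errors); the sdj1 and sdl1 lines are all zero and would be dropped.
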
September 25, 2023 Author
Thanks JorgeB.
That reminds me... Sorry, I forgot to tell you: just a few days before this behavior started, one disk of my array went offline for 2 or 3 days. Unraid emulated it, and I only noticed some hours later, by chance. I changed a cable and rebuilt the array. All seemed good to me, but could it in fact have corrupted the array?
If that's it, how can I correct it, please? I read about a "scrub" command; I have it for the cache pool but not for the data array. Will it erase the whole cache? I won't lose any data, right?
Thank you.
P.S.: I installed Squid's script from your link, thanks again! I've included the most recent logs; this is starting to worry me. In the meantime, I got a notification: "Error on cache pool - No description".
LastLogs.txt
Edited September 25, 2023 by AinzOolGown
September 25, 2023 Community Expert
23 minutes ago, AinzOolGown said:
If that's it, how can I correct it, please?
See the link above.
September 30, 2023 Author
Hi,
So, I followed this process and changed all shares that use the cache to "Yes". VMs and Docker are disabled, but the mover has finished and some shares (appdata/system) remain on the cache.
I've opted to completely replace my cache drives and SATA cables. Can I just save the remaining content elsewhere and move it back when the new cache pool is operational? Is there a better way to move the remaining files to the array? And how do I do this the right way (to preserve permissions and such)? I have a Mac and PathFinder ready for that.
Also, I have a share with cache set to "No", but strangely the last files created on it are on the cache O_o
Thank you.
Edited September 30, 2023 by AinzOolGown
September 30, 2023 Community Expert
1 hour ago, AinzOolGown said:
but the mover has finished and some shares (appdata/system) remain on the cache.
Enable mover logging, run the mover, and post new diags.
September 30, 2023 Community Expert
The filesystem is read-only, so the data cannot be moved; it can be copied, though.
September 30, 2023 Author
Can I just copy the content via SMB and do the opposite once the new drives are installed? (Sorry, I prefer to double-check: there are some production Dockers for my business on there, and it would be hell to lose anything.)
Edited September 30, 2023 by AinzOolGown
September 30, 2023 Community Expert
21 minutes ago, AinzOolGown said:
Can I just copy the content via SMB and do the opposite once the new drives are installed?
That should work; you can also use rsync, for example.
September 30, 2023 Author
Sorry JorgeB, rsync gives me errors too. The command was:
rsync -a /mnt/cache/appdata/ /mnt/user/Backup-Saves/TowerDockers/Cache/appdata/
I tried a CA Backup and got the same result: it gives errors 😱
I ran a filesystem check on the cache; logs attached.
rsync errors.txt
Cache filesystem check.txt
Edited September 30, 2023 by AinzOolGown
October 1, 2023 Community Expert
Any corrupt files will fail to copy. You can try btrfs restore, described here; it will ignore the corruption and still copy those files, but they will still be corrupt.
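
Because corrupt files fail to copy, an rsync run against a damaged pool usually ends as a partial transfer rather than a clean one. A rough sketch of a wrapper that tells the two apart, assuming plain `rsync -a` as used earlier in the thread; the function name `copy_readable` and the error-log argument are hypothetical. rsync documents exit code 23 as a partial transfer due to errors and 24 as a partial transfer due to vanished source files:

```shell
#!/bin/sh
# Copy the contents of a directory with rsync and classify the outcome.
# Arguments: source dir, destination dir, file that receives rsync's stderr.
copy_readable() {
    src=$1 dst=$2 errlog=$3
    # Trailing slashes: copy the contents of src into dst, as in the thread's command.
    rsync -a "$src"/ "$dst"/ 2>"$errlog"
    rc=$?
    case $rc in
        0)     echo "complete" ;;
        23|24) echo "partial: see $errlog" ;;
        *)     echo "failed: rsync exit code $rc" ;;
    esac
    return "$rc"
}
```

For example, `copy_readable /mnt/cache/appdata /mnt/user/Backup-Saves/TowerDockers/Cache/appdata /tmp/rsync-errors.log` would copy whatever is readable and leave rsync's error messages for the unreadable files in the log (the log path here is an assumption).
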
October 1, 2023 Author
Hi,
I copied what I could, deleted the cache pool, plugged in new SSDs, then recreated the cache pool and assigned the new SSDs, with new cables. Then I switched appdata back to cache "Only" and did a CA plugin restore. The plugin said the restore was complete, but I got a notification saying an error occurred, and the syslog shows a lot of btrfs errors.
I switched to Discord somewhere in between, and it seems that my RAID controller card is the culprit. Synd & Kilrah helped me and advised replacing my card with a 9300-8i, so I ordered one; now I'm waiting for it to arrive before retrying a CA restore.
Thank you JorgeB for your help and patience. When you're stressed, it's a big help to have support!
I'll post the next steps when I try again with the new card.
October 13, 2023 Author
Hi,
So I've received the HBA card and was wondering how to proceed in the cleanest way. Do I scrub the SSDs as they are now (while they're still connected to the RAID controller) before replacing the RAID controller with the HBA card, or:
1/ Replace the RAID controller with the HBA card
2/ Scrub the SSDs
3/ Delete and recreate the cache pool
4/ Restore the backup?
Please advise.
Edited October 13, 2023 by AinzOolGown
October 13, 2023 Community Expert
If you are going to reformat the pool, there's not much point in scrubbing, but you could do it before or after.
October 21, 2023 Author
Hi,
So, I replaced the card just now and tried to re-enable VMs, but the list is empty. Maybe it's linked to the cache settings on the shares? I changed them, but maybe I have to invoke the mover?
Thanks!
P.S. I have a backup of libvirt.img.
EDIT: Never mind, I replaced libvirt.img and all the VMs reappeared.
Edited October 21, 2023 by AinzOolGown
October 21, 2023 Author
To follow up: I'm seeing more btrfs errors after invoking the mover. I don't understand; the SSDs are new, and the cables and HBA card are new too. I'm in despair.
Here are the fresh diagnostics:
tower-diagnostics-20231021-1207.zip
When I deleted and recreated the cache pool, no reformatting was proposed or done by the system. Do I need to force one? The scrub reported "no error found" before I invoked the mover.
Edited October 21, 2023 by AinzOolGown
October 21, 2023 Community Expert Solution
There are issues with both cache devices. Since they are MX500s, see if this helps:
https://forums.unraid.net/topic/134954-warning-crucial-mx500-ssds-world-of-pain-stay-away-from-these/?do=findComment&comment=1255816
October 21, 2023 Author
Thank you very much JorgeB.
I think I'm cursed. Snolly's post says that to view the firmware information, we need to "click on the drive's name and go to identity tab". Mine looks like this: (screenshot not included)
Also, my Crucial SSDs are not listed in /dev/. Maybe Unraid can't see past the HBA card? That doesn't make sense, though, since it can mount them successfully.
Edited October 21, 2023 by AinzOolGown
October 22, 2023 Community Expert
16 hours ago, AinzOolGown said:
Mine looks like this:
The devices dropped offline; power cycling the server should bring them back.
October 23, 2023 Author
Hi JorgeB,
Thanks for bearing with me ^^
I'm thinking of changing the SSDs. This time I'm searching for possible Unraid incompatibilities for each SSD I think could be good, but every model returns plenty of bad results... Do you have a brand or model to recommend, please?
October 23, 2023 Community Expert
I've been happy with my MX500; other models I've been using without issues so far are the 860 and 870 EVO.