October 13, 2020 Hey everyone. Last night my unRAID server got all crazy on me: it started throwing read errors and messing with my Docker containers and VMs. I have no idea where to start on figuring out what caused this. My cache is also apparently full of BTRFS errors:
Oct 13 07:39:02 Jinx kernel: print_req_error: I/O error, dev sdb, sector 5663456
Oct 13 07:39:02 Jinx kernel: BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 350, rd 905675, flush 0, corrupt 0, gen 0
If anyone has the knowledge and time to look through my diagnostics and figure out the root cause, it would be greatly appreciated. Here are a few screenshots of the problem from the GUI, covering Array, Docker, and VMs. I had 5-6 VMs yesterday.... I kind of regret switching to Ryzen. I have had so many problems with the system since I switched over. Does anyone know if this could be related? Are there compatibility problems on the AMD side? Thanks in advance, lovely community. Best Regards, Baskedk. Attachment: jinx-diagnostics-20201013-0728.zip
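As a rough aid for reading the BTRFS line quoted above, here is a minimal Python sketch that pulls the per-device counters out of a syslog line. The function name `parse_btrfs_errs` and the regular expression are illustrative assumptions, written only against the exact log format shown in this post. The five fields correspond to write I/O, read I/O, flush, corruption, and generation error counts.

```python
import re

# Matches only the BTRFS counter format quoted in the post above (assumption:
# other kernel versions may word this line differently).
BTRFS_ERR = re.compile(
    r"BTRFS error \(device (?P<device>\w+)\): bdev (?P<bdev>\S+) errs: "
    r"wr (?P<wr>\d+), rd (?P<rd>\d+), flush (?P<flush>\d+), "
    r"corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def parse_btrfs_errs(line):
    """Return the device names and integer error counters, or None if no match."""
    match = BTRFS_ERR.search(line)
    if match is None:
        return None
    result = {"device": match["device"], "bdev": match["bdev"]}
    for key in ("wr", "rd", "flush", "corrupt", "gen"):
        result[key] = int(match[key])
    return result


if __name__ == "__main__":
    sample = (
        "Oct 13 07:39:02 Jinx kernel: BTRFS error (device sdb1): "
        "bdev /dev/sdb1 errs: wr 350, rd 905675, flush 0, corrupt 0, gen 0"
    )
    print(parse_btrfs_errs(sample))
```

In the quoted line the read and write counters are high while `corrupt` and `gen` are zero, which may point to failed I/O requests rather than checksum mismatches, though that reading is only a hint and not a diagnosis.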
October 13, 2020 Author A reboot of the server got everything working again. At least it looks like it for now. I am still very curious about what could cause this. If it can happen out of the blue like that, it can happen again...... And that kind of instability is not something I'm a big fan of.
October 13, 2020 Your BTRFS errors are a different problem from the array errors you are highlighting. Your BTRFS errors seem to be on sdb and/or sdc (your cache drives), while your array is formatted in XFS. I cannot see SMART data in the diagnostics for any drive I checked; the reports only contain this: "A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options." Can you run SMART manually on the drives with errors? I'd say sdd and sde, plus maybe sdb and sdc to be sure. There are certainly a lot of errors, not just the ones you quoted. I cannot help you much on this, so let's see what the others can propose.
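For running SMART manually as suggested above, one possible approach is sketched below in Python. It builds a `smartctl -a` call with the `-T permissive` tolerance option that the error message itself recommends. The helper names `smartctl_command` and `run_smart` are hypothetical, the device list simply mirrors the drives named in this post, and the sketch assumes `smartctl` is installed and that the script runs with enough privileges to query the disks.

```python
import subprocess


def smartctl_command(device, permissive=True):
    """Build the argument list for a full SMART report on one device."""
    cmd = ["smartctl", "-a"]
    if permissive:
        # Tolerate a failed mandatory SMART command instead of exiting early.
        cmd += ["-T", "permissive"]
    cmd.append(device)
    return cmd


def run_smart(device):
    """Run smartctl and return its text output (assumes smartctl is on PATH)."""
    result = subprocess.run(
        smartctl_command(device), capture_output=True, text=True, check=False
    )
    return result.stdout


if __name__ == "__main__":
    # Only print the commands here; call run_smart(dev) on the server itself.
    for dev in ("/dev/sdb", "/dev/sdc", "/dev/sdd", "/dev/sde"):
        print(" ".join(smartctl_command(dev)))
```

`check=False` is deliberate: smartctl uses a non-zero exit status to signal several conditions, so the output is usually worth reading even when the exit code is not zero.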
October 13, 2020 Community Expert This is a problem with the onboard SATA controller:
Oct 12 23:36:45 Jinx kernel: ahci 0000:01:00.1: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000da8d0000 flags=0x0000]
It is quite common with Ryzen boards. There are reports that updating to the latest beta helps because of its newer kernel. You can also disable IOMMU if you don't need it.
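To check how often the controller logs this event, a small Python sketch like the following could tally `IO_PAGE_FAULT` lines per driver and PCI address. The function name `count_page_faults` is illustrative, and the pattern is an assumption based solely on the single log line quoted above, so other kernels may need a different pattern.

```python
import re
from collections import Counter

# Pattern derived from the one IO_PAGE_FAULT line quoted in this thread.
PAGE_FAULT = re.compile(
    r"kernel: (?P<driver>\S+) "
    r"(?P<pci>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]): "
    r"Event logged \[IO_PAGE_FAULT domain=(?P<domain>0x[0-9a-fA-F]+) "
    r"address=(?P<address>0x[0-9a-fA-F]+) flags=(?P<flags>0x[0-9a-fA-F]+)\]"
)


def count_page_faults(lines):
    """Count IO_PAGE_FAULT events, keyed by (driver, PCI address)."""
    counts = Counter()
    for line in lines:
        match = PAGE_FAULT.search(line)
        if match:
            counts[(match["driver"], match["pci"])] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Oct 12 23:36:45 Jinx kernel: ahci 0000:01:00.1: Event logged "
        "[IO_PAGE_FAULT domain=0x0000 address=0x00000000da8d0000 flags=0x0000]",
        "Oct 13 07:39:02 Jinx kernel: print_req_error: I/O error, dev sdb, sector 5663456",
    ]
    for (driver, pci), n in count_page_faults(sample).items():
        print(driver, pci, n)
```

Passing an open syslog file to `count_page_faults` works as well, since the function only needs an iterable of lines. A count that drops to zero after a kernel update or after disabling IOMMU would be consistent with the explanation given in this post.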
October 13, 2020 Community Expert 1 hour ago, JorgeB said: "Quite common with Ryzen boards, there are reports that updating to the latest beta helps, due to newer kernel, you can also disable IOMMU if not needed." Here is a video on that issue, with some possible fixes linked in their forum thread (in the description).
October 13, 2020 Author I don't even know what IOMMU is, so I have no idea if I need it, hehe. I cannot seem to find a newer version of my BIOS, and there are no beta versions at all. It's an 'Asus Prime B450M-A'. But if all this nonsense is SATA controller related, can anyone recommend a good, reliable AM4 motherboard to use for unRAID? My life is too short for this kind of random data failure all the time. And if this can be solved with another mobo, that's my go-to.
October 13, 2020 Author Just now, tjb_altf4 said: "Video on that issue, with some possible fixes linked in their forum thread (in description)" Thx, I'll give it a watch 👍
October 13, 2020 Author 1 hour ago, JorgeB said: "Problem with the onboard SATA controller:
Oct 12 23:36:45 Jinx kernel: ahci 0000:01:00.1: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000da8d0000 flags=0x0000]
Quite common with Ryzen boards, there are reports that updating to the latest beta helps, due to newer kernel, you can also disable IOMMU if not needed." Ahh, you meant the unRAID OS beta, and not a beta BIOS 🙄 I'll try that out and see if the kernel makes the difference 👍
October 20, 2020 Author Just to follow up on my issue: it seems that the beta update stopped my disk errors. My log is not flooded with errors anymore, and everything seems to be in order. Glad I didn't move to Ryzen earlier, when that option was not around 👍 Thanks @JorgeB for pointing me in the right direction 😎