Changed SAS card, random disks disabled, log is full, I don't know what to do :( Any help appreciated :)

nlink0714 · February 7, 2023

Hi all

Noob Unraider here, but I could use some help with some issues I've been having over the past 6 weeks.

I have a 10-drive system with 8 data drives and 2 parity drives. 8 of the drives are identical (Seagate Exos 18TB) and connected via an LSI SAS 2308 PCIe card. The other 2 data drives are WD Red 18TB drives connected via SATA to my motherboard (ASUS Z390M). I also have 2 SSDs connected via SATA and an NVMe drive (which I put in because I literally couldn't find a use for it; I don't use it, and quite frankly I think I'm gonna get rid of it because…)

Last month I had an issue where random drives would get disabled with either read or write errors. I took the suggestion of a few fellow Unraiders and would stop the array -> "remove" the drive -> start in maintenance mode -> stop the array again -> "replace" the drive, and then do a data rebuild. But whenever I did this, another drive would throw errors and get disabled. Finally, I took a suggestion on Reddit to check all the cables, and I also replaced my LSI SAS card with an identical one that had been sitting in an old, unused Unraid server of mine. During this time I had multiple panic attacks that the server was going to fail and my data would be lost, but everything worked out.

Things were working well until this week, when my Parity 2 drive logged 2500 read errors and was then disabled. I'm not sure what caused it. I'm not good at reading logs, but I also noticed that my system log (held in memory) is at 100% and shows a lot of errors.

I'm very hesitant to go through the rebuild procedure described above again, and I'm wondering if some savvy users here can help me understand what to do and how to fix this, or whether I need to replace hardware. I also want to fix the log issue; Fix Common Problems directed me here. My thought was to replace the motherboard, suspecting a PCIe issue, but I want to make sure it's not something I can fix here before trying to convince my spouse to let me buy a new motherboard (she thinks I spend too much money on this as it is :P).

I'm attaching everything I think will be helpful, including images.

Attachments: ST18000NM000J-2TV103_ZR54ZAC2-20230202-1653.txt, syslog.txt

Vr2Io · February 7, 2023

The log is full of "nchan" errors. Given those errors/crashes, the disk errors may not be related to a disk controller problem. Are any VMs or Docker containers running on the system?

Please run a memory test first to make sure the system is healthy, and post the full diagnostics. Also try stopping all unnecessary Docker containers, VMs, and applications to rule them out as the cause.
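As a rough way to confirm what is filling the log, the sketch below tallies syslog lines per program and counts the lines that mention "nchan". It assumes the classic BSD-style syslog layout (`Mon DD HH:MM:SS host program[pid]: message`) and a syslog saved as a local text file such as the attached syslog.txt; the script name and function names are hypothetical. It only counts lines and does not diagnose the underlying fault.

```python
import re
import sys
from collections import Counter

# Assumed BSD-style syslog layout, e.g. "Feb  7 16:53:01 Tower kernel: message".
# Captures the program tag, with or without a "[pid]" suffix.
LINE_RE = re.compile(
    r"^\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\S+\s+([^\s:\[]+)(?:\[\d+\])?:"
)


def count_programs(lines):
    """Count lines per program tag; lines that don't match go under '<unparsed>'."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        counts[match.group(1) if match else "<unparsed>"] += 1
    return counts


def count_keyword(lines, keyword):
    """Count lines containing a keyword such as 'nchan' (case-insensitive)."""
    keyword = keyword.lower()
    return sum(1 for line in lines if keyword in line.lower())


def main(path):
    # errors="replace" keeps the script going if the log has odd bytes in it
    with open(path, encoding="utf-8", errors="replace") as fh:
        lines = fh.readlines()
    print(f"{len(lines)} lines total")
    for program, n in count_programs(lines).most_common(10):
        print(f"{n:8d}  {program}")
    print(f"lines mentioning 'nchan': {count_keyword(lines, 'nchan')}")


if __name__ == "__main__":
    if len(sys.argv) > 1:
        main(sys.argv[1])
    else:
        print("usage: python3 count_syslog.py <path-to-syslog.txt>")
```

Run it as `python3 count_syslog.py syslog.txt`. If one program accounts for most of the lines, that is the first place to look before blaming the disks or the controller.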

Edited February 7, 2023 by Vr2Io

GRRRRRRR · February 7, 2023

Yes, the log is full. Fix that as well.

When I run into issues like this, I double-check these:

OC (overclocking)
Cooling
Power and power cables
SATA cables
HBA card firmware

After fussing with these, such issues disappear until the next time I touch something, such as adding a disk.

Booting into Memtest is not a real memory test, because the power management is different from that of the loaded OS. He he he.

Edited February 7, 2023 by GRRRRRRR
