nlink0714 Posted February 7, 2023

Hi all,

Noob unraider here, but I could use some help with some issues I've been having over the past 6 weeks.

I have a 10-drive system with 8 data drives and 2 parity drives. 8 of the drives are the same (Seagate Exos 18TB) and connected via an LSI SAS 2308 PCIe card. The other 2 data drives are WD Red 18TB drives connected via SATA to my motherboard (ASUS Z390M). I also have 2 SSDs connected via SATA and an NVMe drive (which I put in because I literally couldn't find a use for it; I don't use it, and quite frankly I think I'm going to get rid of it).

Last month I had an issue where random drives would get disabled with either read or write errors. I took the suggestion of a few fellow unraiders and would stop the array -> "remove" the drive -> start in maintenance mode -> stop the array again -> "replace" the drive and then do a data rebuild. But whenever I did this, another drive would have errors and get disabled. Finally, I took a suggestion on Reddit to check all the cables, and took it upon myself to replace my LSI SAS card with an identical one that I had sitting in an unused Unraid server from long ago. During this time I had multiple panic attacks that the server was going to fail and my data would be lost. But everything worked out.

Things were working well until this week, when my Parity 2 drive came up with 2500 read errors and then disabled itself. I'm not sure what caused it. I'm not good at reading logs, but I also noted that my system log in memory is at 100% and I see a bunch of errors. I'm super hesitant to go through the method described above, and I'm wondering if some savvy users here can help me understand what to do and how to fix this, or whether I need to do a hardware exchange. I also want to fix my issue with the logs; Fix Common Problems directed me here.

My thought was to change out the motherboard, thinking that it's a PCIe issue, but I want to make sure it's not something that I can fix here before trying to convince the spouse to let me buy a new motherboard (she thinks I spend too much money on this as is :P).

I'm attaching everything I think will be helpful, including images:

ST18000NM000J-2TV103_ZR54ZAC2-20230202-1653.txt
syslog.txt
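For anyone wanting to confirm the "log at 100%" reading from a terminal, here is a minimal Python sketch. It assumes the in-memory log lives on the filesystem mounted at /var/log (an assumption, not something stated in the post), and the helper name usage_percent is made up for illustration. The figure may differ slightly from what the web UI or df shows, since tools round and account for reserved space differently.

```python
import shutil
import sys


def usage_percent(path: str) -> float:
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    # Assumption: /var/log is where the in-memory log is mounted.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log"
    try:
        print(f"{path}: {usage_percent(path):.1f}% used")
    except OSError as exc:
        print(f"could not inspect {path}: {exc}")
```

A value near 100% only confirms that the log filesystem is full; it does not say what filled it.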
Vr2Io Posted February 7, 2023 (edited)

The log is full of "nchan" errors. Given those errors / crashes, the disk errors may not be related to a disk controller problem.

Are any VMs or Docker containers running on the system?

Please run a memory test first to make sure the system is healthy, and post the full diagnostics. Also try stopping all unnecessary Docker containers / VMs / applications to rule them out as the cause.
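To see what is actually filling the log, one option is to count lines per program tag and count how often a keyword such as "nchan" appears. The sketch below is only illustrative: it assumes the classic syslog layout ("Mon DD HH:MM:SS host tag[pid]: message"), which may not match every line in the attached file, and the function names summarize and count_keyword are hypothetical.

```python
import re
import sys
from collections import Counter

# Assumed layout: "Mon DD HH:MM:SS host tag[pid]: message" (pid is optional).
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]{8}\s+(?P<host>\S+)\s+"
    r"(?P<tag>[^\s:\[]+)(?:\[\d+\])?:\s*(?P<msg>.*)$"
)


def summarize(lines, top=5):
    """Return the `top` most frequent program tags as (tag, count) pairs."""
    tags = Counter()
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        # Lines that do not fit the assumed layout are grouped together.
        tags[match.group("tag") if match else "<unparsed>"] += 1
    return tags.most_common(top)


def count_keyword(lines, keyword):
    """Count lines containing `keyword`, ignoring case."""
    keyword = keyword.lower()
    return sum(1 for line in lines if keyword in line.lower())


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize_syslog.py syslog.txt
    with open(sys.argv[1], errors="replace") as handle:
        log_lines = handle.readlines()
    for tag, count in summarize(log_lines):
        print(f"{count:8d}  {tag}")
    print("lines mentioning 'nchan':", count_keyword(log_lines, "nchan"))
```

If one tag accounts for most of the lines, that is the first thing to quiet down before looking at the disk errors again.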
GRRRRRRR Posted February 7, 2023 (edited)

Yes, the log is full. Fix that as well.

This is how I fix this kind of problem. I double-check these:

OC (overclocking)
Cooling
Power and power cables
SATA cables
HBA card
Firmwares

After fussing with these, such issues disappear until the next time I touch something, such as adding a disk.

Booting into memtest is not a real memory test, because the power management is different from that of the loaded OS. He he he.