Changed SAS card, random disks disabled, log is full, I don't know what to do :( Any help appreciated :)



Hi all

 

Noob unraider here, and I could use some help with some issues I've been having over the past 6 weeks.

 

I have a 10 drive system with 8 drives and 2 parity drives. 8 of the drives are the same (Seagate Exos 18TB) and are connected via an LSI SAS 2308 PCI-E card. The other 2 data drives are WD Red 18TB drives connected via SATA to my motherboard (ASUS Z390M). I also have 2 SSDs connected via SATA and an NVMe drive (which I put in because I literally couldn't find a use for it; I don't use it and quite frankly I think I'm going to get rid of it because 

 

Last month I had an issue where random drives would get disabled with either read or write errors. I took the suggestion of a few fellow unraiders: stop the array -> "remove" the drive -> start in maintenance mode -> stop the array again -> "replace" the drive, and then do a data rebuild. But each time I did this, another drive would throw errors and get disabled. Finally, I took a suggestion from reddit to check all the cables, and took it upon myself to replace my LSI SAS card with an identical one that I had sitting in an unused Unraid server from long ago. During this time I had multiple panic attacks that the server was going to fail and my data would be lost. But everything worked out.

 

Things were working well until this week when my Parity 2 drive came up with 2500 read errors and then disabled itself. I'm not sure what caused it. I'm not good at reading logs but I also noted that my system log in memory is at 100% and I see a bunch of errors. 

 

I'm super hesitant to go through the method described above, and I'm wondering if some savvy users here can help me understand what to do and how to fix this, or whether I need to swap hardware. I also want to fix my issue with the logs. Fix Common Problems directed me here. My thought was to change out the motherboard, thinking that it's a PCIe issue, but I want to make sure it's not something I can fix here before trying to convince the spouse to let me buy a new motherboard (she thinks I spend too much money on this as is :P).

 

I'm attaching everything I think will be helpful including images

Fix common problems.jpg

Log is full.jpg

parity disabled.jpg

ST18000NM000J-2TV103_ZR54ZAC2-20230202-1653.txt

syslog.txt


The log is full of "nchan" errors. Given those errors and crashes, the disk errors may not be related to a disk controller problem. Are any VMs or Docker containers running on the system?

Please run a memory test first to confirm the system is healthy, and post the full diagnostics. Also try stopping all unnecessary Docker containers, VMs, and applications to rule them out.
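
If you're not comfortable reading the log by eye, something like the rough Python sketch below can show which messages are repeating the most in the attached syslog.txt, which is usually what fills the log. It assumes the usual "Mon DD HH:MM:SS hostname ..." prefix on each line, and the function names (`top_messages`, `top_messages_in_file`) are made up for this example, not part of any Unraid tool. It only counts lines; it doesn't diagnose anything.

```python
import re
from collections import Counter

# Assumption: each line starts with a classic syslog prefix such as
# "Feb  2 16:53:01 Tower ". Lines without that prefix are counted unchanged.
_PREFIX = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\S+\s+")
_DIGITS = re.compile(r"\d+")


def top_messages(lines, n=10):
    """Return the n most frequent messages as (message, count) pairs.

    Timestamps and hostnames are stripped and every run of digits is
    replaced with "N", so repeats that differ only by PID, sector number
    or time are grouped together.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        msg = _PREFIX.sub("", line)
        msg = _DIGITS.sub("N", msg)
        counts[msg] += 1
    return counts.most_common(n)


def top_messages_in_file(path, n=10):
    """Hypothetical helper: run top_messages over a log file on disk."""
    with open(path, errors="replace") as fh:
        return top_messages(fh, n)
```

For example, `for msg, count in top_messages_in_file("syslog.txt"): print(count, msg)` prints the ten most repeated messages. If one or two of them account for nearly all the lines, that's the thing to chase first.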

Edited by Vr2Io

Yes, the log is full. Fix that as well.

 

When I run into this, I double-check the following:

OC

Cooling

Power and power cables

SATA cables

HBA card firmware

 

After fussing with these, such issues disappear until the next time I touch something, such as adding a disk.

 

Booting into memtest is not a real memory test, because the power management is different from what the loaded OS uses. He he he.

Edited by GRRRRRRR
