vitovega (May 10, 2023): I was unable to access my Unraid server via the WebUI or SSH, so I needed to force a shutdown (I know :( I should've tried plugging in a keyboard/monitor). When it came back up, 2 of the drives were disabled (1 of my 2 parity drives and Disk1).

A year or so ago (it's a 3-year-old rig without any other issues I can think of), I got a UDMA CRC error count on a drive that resolved when I opened the box and pushed in all the connectors. Each of the disabled drives again shows a UDMA CRC error, but opening the box and pushing in the connectors didn't magically bring them back this time. On closer inspection, it looks like tension and heat may have stretched the blue sheathing of the SATA cables, exposing some metal underneath. I bought all new cables and they arrive tomorrow.

It seems like a good idea to replace the cables in any event, but I'd really like to understand the problem better to make sure I'm addressing the right one. Is there a way I can confirm this is the issue before doing it? Should I replace these drives while I'm rerunning the cables (I already have replacements)? I'd also like advice on a workflow for what to do inside Unraid after I've replaced the SATA cables et al. and, hopefully, the drives are... re-enabled? rediscovered? Sorry, I'm not sure of the nomenclature... obviously :)
JorgeB (May 10, 2023): Please post the diagnostics after array start.
vitovega (May 10, 2023), replying to JorgeB: Of course, my apologies. I also forgot to mention that there was no useful info in the log that I could find, simply 2 errors stating the disk was disabled. Attachment: ser7en-diagnostics-20230510-1126.zip
JorgeB (May 10, 2023, marked as solution): SMART looks fine for both disks. There's a single CRC error, and for one of the disks it's an old error. It's still a good idea to replace the cables to rule that out. After that, since the emulated disk is mounting, and assuming its contents look correct, you can rebuild on top and re-sync parity; both can be done at the same time.
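JorgeB's reading of the SMART data hinges on one attribute. A UDMA CRC error is counted when data is corrupted in transit between the drive and the controller (typically a cable or connector problem, not the platters), and the raw counter is cumulative, so an old error never goes away. The more useful question is whether the count keeps rising after recabling. A minimal sketch, assuming you save the text output of `smartctl -A` for each drive before and after the cable swap; the sample row is illustrative and not taken from the posted diagnostics, and the attribute name for ID 199 can vary slightly by vendor.

```python
from typing import Optional


def udma_crc_count(smart_attrs: str) -> Optional[int]:
    """Return the raw value of SMART attribute 199 (UDMA_CRC_Error_Count), or None if absent."""
    for line in smart_attrs.splitlines():
        fields = line.split()
        # smartctl -A rows: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0] == "199":
            raw = fields[9]
            return int(raw) if raw.isdigit() else None
    return None


def crc_still_rising(before: str, after: str) -> bool:
    """True if the CRC count grew between two snapshots, hinting that a bad link persists."""
    old, new = udma_crc_count(before), udma_crc_count(after)
    return old is not None and new is not None and new > old


# Illustrative smartctl -A row with hypothetical values.
SAMPLE = "199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 1"

if __name__ == "__main__":
    print(udma_crc_count(SAMPLE))  # 1
```

A flat count after new cables is consistent with the old cables having been the problem, but it is not proof on its own.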
vitovega (May 10, 2023), replying to JorgeB's answer: Thanks for that color! Is there a particular order in which I should rebuild Disk1 and re-sync parity? Or, since they can be done simultaneously, does it not matter (is there any benefit to doing them separately)? And finally, when I'm rerunning those SATA cables, do I need to keep track of which cable goes to which drive/port on the card? Or will Unraid sort that out without me replicating the same runs?
JorgeB (May 10, 2023): On whether to do them separately: I would do both at the same time. On keeping track of which cable goes to which port: No, Unraid tracks the disks by serial number.
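A toy model of what "tracks the disks by serial" means in practice: the array configuration remembers which serial number belongs in each slot, and each detected drive is matched to its slot by serial, whatever port it now sits on. This is an illustrative sketch, not Unraid's code; the slot names, serial numbers, and port labels are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical saved array configuration: slot -> drive serial number.
ASSIGNMENTS: Dict[str, str] = {
    "parity": "SN-AAA",
    "parity2": "SN-BBB",
    "disk1": "SN-CCC",
    "disk2": "SN-DDD",
}


def resolve_slots(detected: List[Tuple[str, str]], assignments: Dict[str, str]) -> Dict[str, str]:
    """Map each slot to the port its drive was found on, matching by serial; port order is irrelevant."""
    port_by_serial = {serial: port for port, serial in detected}
    return {slot: port_by_serial.get(serial, "missing") for slot, serial in assignments.items()}


if __name__ == "__main__":
    # Same drives, cables shuffled onto different controller ports.
    after_recable = [("port0", "SN-DDD"), ("port1", "SN-AAA"), ("port2", "SN-CCC"), ("port3", "SN-BBB")]
    print(resolve_slots(after_recable, ASSIGNMENTS))
```

Because the lookup key is the serial, shuffling cables changes only the port each slot resolves to, never which drive lands in which slot.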
vitovega (May 12, 2023, edited): Got the cables, but didn't have the time/energy to do it last night; definitely by the close of the weekend (agh, Mother's Day!). But something has been bothering me: if one of the CRC errors on those 2 drives is old, why did the drive get disabled?
vitovega (May 13, 2023): @JorgeB, could you please provide a link or how-to for rebuilding parity and re-adding Disk1?
JorgeB (May 15, 2023): https://wiki.unraid.net/Manual/Storage_Management#Rebuilding_a_drive_onto_itself
vitovega (May 15, 2023): Thanks! Unfortunately, my Unraid box has been making a constant, very loud beeping sound ever since I replaced the SAS cables. Something in the IPMI about CPU2 overheating...
JorgeB (May 16, 2023): Check whether there's a cable or something jamming one of the CPU fans.
vitovega (May 16, 2023), replying to JorgeB: Unfortunately, nothing like that. I checked the IPMI and CPU2 is overheating, so I'm going to try a different fan header, listen for whether the pump on the AIO cooler is running, and replace it if not. In order to recable the hard drives I had to remove the radiators from the fan wall and then the fan wall itself, and I think I knocked out the CPU fan header when I was reattaching everything...
JonathanM (May 16, 2023): If possible, replace the liquid cooling with standard heatsinks and fans. Liquid cooling is great for gaming machines that are in view constantly while in use, but very bad for unattended servers. The failure modes of liquid cooling can be catastrophic if not caught quickly.
vitovega (May 17, 2023, edited): Eep, I didn't know that! What sort of catastrophic failure? Leaks? Unfortunately, replacing it with anything but the same model would require me to remove the whole motherboard, and I'm not sure I'm ready to do that if it can simply be replaced... But thank you for the info! I might not even be able to replace it with the same model without removing the motherboard. If it comes to that, I'll just get a fan. Thanks once again for the advice.
JonathanM (May 17, 2023), quoting vitovega ("What sort of catastrophic failure?"): The relatively low thermal mass of the water block can allow rapid temperature spikes if the fluid stops moving or is gone. Processors do attempt to protect themselves from overheating, but the engineers assume a certain amount of mass will be available even without airflow, so the lack of mass can allow damaging heat within seconds.

Leaks can be very bad as well. Even if the fluid is clean, the boards have dust and particles that will mix with it and cause corrosion. A slow, undetected leak is the worst, as it seeps into cracks and crevices and sends voltage where it's not supposed to go. The worst case would be a slow leak above a bottom-mounted PSU: you could end up with mains voltage going to all your sensitive parts at once, blowing out the circuit boards on the drives. Granted, that sort of failure is very rare, probably because most water coolers are in gaming rigs for show, where leaks or failures are caught relatively quickly.
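A rough worked example of the thermal-mass point. If no heat is carried away, a block of mass m with specific heat c absorbing power P warms at dT/dt = P / (m·c). The figures below are assumptions chosen only for illustration (a 150 W CPU load, a 500 g aluminium heatsink versus a 50 g copper cold plate whose coolant has stopped); real coolers, the CPU's own mass, and any residual airflow all change the result, so treat it as an order-of-magnitude sketch.

```python
def heating_rate_k_per_s(power_w: float, mass_g: float, specific_heat_j_per_g_k: float) -> float:
    """Worst-case warming rate dT/dt = P / (m * c), assuming no heat leaves the block."""
    return power_w / (mass_g * specific_heat_j_per_g_k)


# Standard specific heat capacities near room temperature, in J/(g*K).
C_ALUMINIUM = 0.897
C_COPPER = 0.385

if __name__ == "__main__":
    # Hypothetical 150 W load on two very different thermal masses.
    heatsink = heating_rate_k_per_s(150, 500, C_ALUMINIUM)   # about 0.33 K/s
    cold_plate = heating_rate_k_per_s(150, 50, C_COPPER)     # about 7.8 K/s
    print(f"heatsink: {heatsink:.2f} K/s, stalled cold plate: {cold_plate:.2f} K/s")
```

At roughly 8 K/s, a hypothetical 40 K of thermal headroom is gone in about five seconds, which fits the "matter of seconds" described in the post.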
vitovega (May 17, 2023, edited): YIKES! Wow, thanks for that! I'm glad I shut down my rig so fast when it was beeping; hopefully there was no permanent damage done. We'll find out tonight! Do you have any suggestions for what signs to look for if there WAS permanent damage? Is it like memory issues, where it's super hard to diagnose?
JonathanM (May 17, 2023), quoting vitovega ("is it like memory issues where it's super hard to diagnose?"): It can be, since if there is minimal but real damage, it may only affect certain things under certain conditions.
vitovega (May 18, 2023, edited): Well, I re-ran all the cables, reinstalled the fan wall, triple-checked all the connections, and it instantly beeped constantly the second I turned it on (it had been off for 4 days, so overheating seemed very unlikely). I tried a bunch of different headers for the AIO, but eventually I just let it run while I was messing around in the IPMI (and because it seemed to be booting fine and didn't even feel warm). Lo and behold, before the Unraid GUI could be accessed the beeping stopped, and it's been running for over 2 hours without any further beeping. My server-related Discord suggested I update the BMC (and possibly the BIOS), but personally I feel I should rebuild the array first. But I'm TERRIFIED. Can I just confirm with you that these are my steps?

1. Select "no device" for both Parity and Disk1, then start the array.
2. Stop the array.
3. Reassign both disks to their previous slots.
4. Start the array again.
5. Finally, sleep easy?

I simply can't select "no device" and start the array unless someone smarter than me tells me that's the correct move.
JorgeB (May 18, 2023), replying to vitovega's list of steps: That's it, but check that the emulated disk1 is still mounting before rebuilding on top.
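Why the emulated disk matters: Unraid's first parity drive holds XOR (even) parity across the data disks, so while Disk1 is disabled its contents are reconstructed on the fly from parity plus every other data disk, and "rebuilding on top" writes that reconstruction back onto the same drive. If the emulated disk doesn't mount, the rebuild would reproduce that problem on the physical disk. A toy sketch of the XOR case only, using tiny byte strings as stand-in disks (hypothetical contents); a second parity drive uses a different calculation that is not shown here, and since one of the two parity drives in this thread was itself disabled, which parity did the emulating depends on which one that was.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("need at least one block, all of the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def emulate_missing(parity: bytes, surviving: List[bytes]) -> bytes:
    """Reconstruct a missing data disk as parity XOR every surviving data disk."""
    return xor_blocks([parity, *surviving])


if __name__ == "__main__":
    # Three tiny stand-in "data disks" with hypothetical contents.
    disks = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xff\x00\xaa"]
    parity = xor_blocks(disks)                      # what a parity sync writes
    rebuilt = emulate_missing(parity, disks[1:])    # disk1 is "disabled"
    print(rebuilt == disks[0])  # True
```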
vitovega (May 18, 2023): Oh fiddlesticks. I got the email this morning that you had posted, and it just showed "That's it..."; I didn't see the rest and went for it. How WOULD I have gone about checking that the emulated disk was still mounting? It seemed to have mounted, and the parity sync/data rebuild is currently in progress, 9.8% done. What should I be on the lookout for?
vitovega (May 18, 2023): FWIW, my running theory about the beeping is that the server was (forcefully) shut down during a legitimate alert about a hot CPU2 (AIO cooler or fan header not working), and it kept alerting on boot until it reached a certain point in the boot process where something could be reset. I don't know if this is crazy or stupid, but it's the best I can come up with, because it's now been running for 15 hours or so without a beep.
JorgeB (May 18, 2023), quoting vitovega ("What should i be on the look out for?"): It should be mounting, but on the Main page of the GUI, look at the stats for the disk: it will show either used and free space, or "Unmountable".
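A command-line counterpart to the check JorgeB describes: the used/free figures come from the disk's mounted filesystem. A minimal sketch, assuming the (emulated) disk is mounted at `/mnt/disk1`, the usual Unraid location for array disk 1, though the path should be treated as an assumption. It reports used/free space if the path is a mount point and "not mounted" otherwise, which is only a rough stand-in for the GUI's "Unmountable" status.

```python
import os
import shutil


def disk_status(mount_point: str) -> str:
    """Return used/free space for a mounted filesystem, or 'not mounted'."""
    if not os.path.ismount(mount_point):
        return "not mounted"
    usage = shutil.disk_usage(mount_point)
    return f"used {usage.used / 1e12:.2f} TB, free {usage.free / 1e12:.2f} TB"


if __name__ == "__main__":
    # Assumed Unraid mount point for array disk 1.
    print(disk_status("/mnt/disk1"))
```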
vitovega (May 18, 2023), replying to JorgeB: Thankfully, it shows a used and free space amount... thank you for the peace of mind.
vitovega (May 19, 2023, edited): I thought that during the rebuild I would be able to access my files, but I noticed that the network shares weren't connected on my PC and Plex was still telling me "files not found". So I rebooted my PC and Plex server (I did nothing with the Unraid box), and oddly I was then unable to access Unraid over the GUI or SSH (I tried on my Android phone without luck as well). I had looked at the GUI several times today, including right before I restarted my PC, and it was about 51% of the way through rebuilding after 13+ hours.

Do you have a suggestion for how to proceed from here? My current plan is to wait at least long enough for the array to be rebuilt (I figure tomorrow evening or Saturday), but what should I do if I'm still unable to access the GUI or SSH? Hard reset? That's what I did before I had the disabled disks the first time... I suppose I should plug a monitor into the server. FWIW, it appears to be running fine, with no error lights or beeping.
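A quick sanity check on the "tomorrow evening or Saturday" estimate, using the figures in the post (about 51% after 13+ hours). Linear extrapolation gives remaining ≈ elapsed × (1 − p) / p, where p is the fraction done. Rebuild speed usually varies over the run (drives slow toward their inner tracks, and the set of disks still being read can change), so treat this as a rough figure, not a promise.

```python
def remaining_hours(elapsed_h: float, fraction_done: float) -> float:
    """Linear ETA: assumes the rest of the rebuild runs at the average speed so far."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_h * (1 - fraction_done) / fraction_done


if __name__ == "__main__":
    left = remaining_hours(13, 0.51)
    print(f"about {left:.1f} h left, {13 + left:.1f} h total")  # about 12.5 h left, 25.5 h total
```

With the post's numbers, the rebuild would be expected to finish roughly a day after it started, if the pace held.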