November 20, 2014 It has been several years since I last visited the forum; my unRAID server has performed flawlessly and I was very content. But about a week ago I began to experience dropouts: for whatever reason the server would go offline, and one of the drives appeared to hang (the red warning light on my RAID chassis would light up). I rebooted the server and it came back without an issue (or so I thought). But the dropouts became quite regular, and now when I try to run a Parity Check (if I can get the array to come back online), the server crashes after about 10 minutes. I think it is getting worse. I get two symptoms: 1) upon reboot, the Parity Drive is missing; or 2) when trying to mount "Disk 11", the server shuts down after a few minutes and powers itself off, i.e., it goes dead. Since all this happens before the array is online, I am not able to access the MC or get a copy of the log to see what's going on. I have taken out "Disk 11" and mounted/read it on my desktop, which seems to work fine. So I am seeking your advice as to the best course of action: do I just assume the disk is bad and shrink the array, then copy its contents to another drive once the array comes back online? Or is there something much more serious that may cause even more damage? I don't want to screw things up even more. For your reference, I am still on unRAID 4.4.2 with about 10TB of data split between the drives mounted inside the server chassis and an external RAID chassis (via an external eSATA card and cables). Your suggestions and comments are much appreciated. TIA, Jaz
November 20, 2014 I'd be looking at hardware first, not the hard drives: cables, enclosure, etc. SMART reports can help find out if it's a bad drive. Is there no syslog on your flash drive?
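As an illustration of the SMART suggestion above, here is a minimal sketch that scans the attribute table printed by `smartctl -A` for a few counters that commonly point at a failing drive or a bad cable. The function names and the `WATCHED` list are hypothetical choices for this example, not part of unRAID or smartmontools, and a nonzero value is only a hint, not a verdict.

```python
# Hypothetical helper: pick out worrying counters from `smartctl -A` output.
# Assumes the usual ten-column attribute table; other layouts are skipped.

# Attributes chosen for illustration. A rising CRC error count often
# suggests cabling or connector trouble rather than the disk itself.
WATCHED = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
)


def parse_smart_attributes(report: str) -> dict:
    """Return {attribute_name: raw_value} for rows with an integer raw value."""
    attrs = {}
    for line in report.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(parts) >= 10 and parts[0].isdigit():
            try:
                attrs[parts[1]] = int(parts[9])
            except ValueError:
                continue  # raw value is not a plain integer; ignore the row
    return attrs


def suspicious(attrs: dict) -> dict:
    """Return the watched attributes whose raw value is above zero."""
    return {name: attrs[name] for name in WATCHED if attrs.get(name, 0) > 0}
```

One way to use it, assuming the report was saved first with something like `smartctl -A /dev/sdX > report.txt`, is to read that file and pass its text to `parse_smart_attributes`, then print whatever `suspicious` returns for each drive.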
November 20, 2014 Author "I'd be looking at hardware first, not the hard drives. Cables, enclosure, etc. Smart reports can help find out if it's a bad drive. Is there no syslog on your flash drive?" Thanks for the reply. I checked all the cables and memory, and re-seated the disk drives. There is no recent syslog file on the flash drive... I was going over the "bad drive" and noticed that there were some missing files, i.e., I copied some files to it this morning before the server went down, but now those files cannot be found on the drive. Would that suggest the drive is bad? But I am with you on the hardware: perhaps the motherboard, NIC, etc. was misbehaving, which caused the drive to hang... Is there a way I can force the system to generate a syslog even if it craps out during the boot process? Jaz
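One possible approach to the syslog question above is to copy the in-memory log to persistent storage at intervals, so that the last copy survives a sudden power-off. The sketch below is only an illustration under assumptions: that the live log is at `/var/log/syslog`, that the flash drive is mounted at `/boot`, and that a Python interpreter is available on the server, none of which is confirmed in this thread. The function name `snapshot_log` and the `logs` folder are hypothetical.

```python
import shutil
import time
from pathlib import Path


def snapshot_log(src, dest_dir, keep=5):
    """Copy the log at `src` into `dest_dir` under a timestamped name.

    Only the newest `keep` snapshots are retained (keep should be >= 1),
    which limits how much is stored on the flash drive.
    """
    src = Path(src)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = dest_dir / f"syslog-{stamp}.txt"
    shutil.copy2(src, dest)

    # Timestamped names sort chronologically, so the oldest come first.
    snapshots = sorted(dest_dir.glob("syslog-*.txt"))
    for old in snapshots[:-keep]:
        old.unlink()
    return dest


def run_periodically(src, dest_dir, interval_seconds=60):
    """Take a snapshot every `interval_seconds` until interrupted."""
    while True:
        snapshot_log(src, dest_dir)
        time.sleep(interval_seconds)
```

A call such as `run_periodically("/var/log/syslog", "/boot/logs")` would then leave the most recent copies on the flash drive (both paths are the assumptions named above). Frequent writes add wear to a flash drive, so an interval like this is best kept for troubleshooting only.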
November 21, 2014 Author I managed to boot up the server and bring the array online. I don't know if it will stay up; it's doing a Parity Check now. I did notice that the Parity Driver was writing to Disk 11 during startup, so that would explain why some files were missing from the drive. Anyway, here is the syslog for your perusal; perhaps you can help me spot something... === Oh well, the server quit just after I posted... I noticed that "Disk 8" had quite a few errors (from the Web interface), but that does not explain why the server just shut itself off like that, does it? Your tips and suggestions are appreciated. Thanks, Jaz (attachment: syslog.txt)
November 21, 2014 "Oh well, the server quit just after I posted... I noticed that "Disk 8" had quite a few errors (from the Web interface), but that does not explain why the server just shut itself off like that, does it? Your tips and suggestions are appreciated." If the server shuts off unexpectedly, that strongly indicates something is wrong at the hardware level. The most obvious cause would be a power loss, but I would have assumed that would be noticed? Another thing to check is whether there is an issue with the CPU fan. Most motherboards shut themselves down if the fan is detected as not working, to prevent the CPU from burning out. Other possibilities include a failing motherboard or power supply.
November 21, 2014 itimpi is heading in the right direction, though I'd take this a step further: "The most obvious cause would be a power loss - but I would have assumed that would be noticed?" ... thus I would immediately suspect the power supply and/or CPU cooling. Swap out the PSU now or, at a minimum, pull it, open it (CAREFUL!!!), blow out the dust, and clean off the fan blades. If it has no fans, then ignore that comment, of course, and just replace it. Do the same thing with the HSF on the CPU: pull it, clean it, reapply thermal compound, check fan operation, and reinstall.
November 21, 2014 Have you cleaned the dust and dirt out of the computer lately? Pay particular attention to the CPU heat sink (and any other heat sinks). Also check the air intakes if they have smallish openings, as these often become partially-to-mostly blocked. You probably want to do this outside, as you will probably 'find' more dirt than you can believe!
November 21, 2014 "Also check the air intakes if they have smallish openings as these often become partially-to-mostly blocked. You probably want to do this outside as you will probably 'find' more dirt than you can believe!" Computer towers make very effective air cleaners, ESPECIALLY if anyone in the household smokes. The tar and nicotine stick to everything inside the tower.
November 29, 2014 Author Thanks guys. I finally had a chance to work on the problem after coming back from a long work trip. I cleaned everything in the server and the external array: vacuumed up all the dust and re-seated the hard drives and cables. Et voilà! The server came up and everything worked as it should, so I guess the system was overdue for a general cleaning/maintenance. All is well now; thanks again for all your suggestions.