(SOLVED - SORT OF) Thought I was getting there.... and now this.


Posted (edited)

Hey everyone,

 

I had started getting everything in line and had copied most of three drives over to the server without a single issue. However, I realized folders would be split because DrivePool had multiple instances of everything across four drives.  I stopped that array, set it to BTRFS to "break" the array, and set it back to XFS. System formatted.

 

As previously discussed in the other thread, I'm using Unassigned Devices and copying from the old DrivePool/SnapRaid setup. SATA Port 6 is free because I'm not using the Cache Drive or Parity until everything is copied over.

 

So far so good, or so I thought.  The first share being copied was 1 TB going to Drive 3 of the Unraid Server (intentionally).  About 45% of the way through copying (or about an hour in), Windows Explorer on the W10 machine I'm using to handle moving files turned RED and lost connection to the server. I attempted to go to the Server GUI and it was non-responsive.  I waited a few minutes and tried again; the server had rebooted itself but did not mount the array.

 

This happened again about an hour after that.  And then while copying the next folder (about 600GB), it died 49% of the way through, and my other half called to say that my server had made a powering-down sound and then come back up.

 

And here I am. I haven't mounted the array again and will not be doing anything until I can figure out why it keeps doing this when it wasn't before.  I'm not sure what logs are needed or where to possibly find them.  Can somebody let me know what logs might be helpful and what I should do as a relative noob?

 

Thanks all!

Edited by AdrianF
Typo.
Posted

Logs do not survive a reboot unless you set up the syslog server under Settings.

 

However, the machine powering itself off has to be a hardware issue.  The two obvious candidates are the CPU fan not working properly, causing the CPU to overheat and turn the system off, or an insufficient/faulty power supply.

Posted
7 minutes ago, itimpi said:

Logs do not survive a reboot unless you set up the syslog server under Settings.

 

However, the machine powering itself off has to be a hardware issue.  The two obvious candidates are the CPU fan not working properly, causing the CPU to overheat and turn the system off, or an insufficient/faulty power supply.

I'm finding it really strange that this was working fine without issue a few days ago and now suddenly is doing this.  System is reporting an unclean shutdown and I don't want to keep hurting the drives if this keeps happening.

 

How do I enable the syslog in settings? I realize I can enable it, but what settings should I use, and where should I have it store the logs?

 

Fan is a Noctua and is running fine.  Power Supply seems to be okay as well -- 650W eVGA that's maybe using 200W on a good day at the moment; it's (5) drives plus an SSD and motherboard.  No external video cards or other cards running.  Power load on UPS is reporting 97W at the moment connected.

 

 

Posted

I'll just add that it packed in about 15 minutes after copying again.  Something here is funky.  I'm going to wait for further advice on how to set up syslog.

Posted (edited)

So, I tracked back to everything that happened between the working config and the crashing config -- all that happened was (3) drives were swapped out (SATA and power) plus a fourth drive reconnected.  I simply unplugged everything and reconnected the four drives; also tracked back to the mainboard and ensured all connections were secure.  I was reading that faulty/unsecure SATA cables can cause this kind of behaviour as well.

 

Been up for about 20 minutes so far copying and nothing further has happened.  Will update if it craps out again OR if it continues to play nice.

 

EDIT: Just died about 15 minutes after I posted this.  I'm gonna go take another look at everything later tonight or tomorrow but this is frustrating.

Edited by AdrianF
Posted

Ran Memtest86 for over 12 hours (5 passes complete) and zero errors. CPU was measuring no more than 50C the entire time. 
 

Ordered new SATA cables which should arrive today. 

Posted (edited)

Well, that wasn't good.   SATA cables changed out to no avail and the system just died in the middle of copying 1.2TB. File copying was taking place via Windows 10 share on the machine where the syslog server is running so I was able to watch for it.

 

System logging was running but tells me absolutely nothing of use because the event is NOT captured (it happened at 17:01):

2020-05-22 16:43:42    User.Notice    192.168.7.47    May 22 16:43:40 BlackBox root: Fix Common Problems Version 2020.05.05
2020-05-22 17:03:40    User.Notice    192.168.7.47    May 22 17:03:38 BlackBox webGUI: Successful login user root from 192.168.7.44
2020-05-22 17:03:40    Syslog.Info    192.168.7.47    May 22 17:03:38 BlackBox rsyslogd: action 'action-2-builtin:omfwd' resumed (module 'builtin:omfwd') [v8.1908.0 try https://www.rsyslog.com/e/2359 ]
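For anyone wanting to bracket the crash window from a remote syslog capture like this, here is a minimal sketch in Python. It assumes each line starts with the receiving server's `YYYY-MM-DD HH:MM:SS` timestamp, as in the three lines above. The function names and the 10-minute threshold are my own picks for illustration, not anything built into Unraid or the syslog server.

```python
from datetime import datetime, timedelta

# The three lines captured by the remote syslog server, copied from this post.
LOG_LINES = [
    "2020-05-22 16:43:42    User.Notice    192.168.7.47    May 22 16:43:40 BlackBox root: Fix Common Problems Version 2020.05.05",
    "2020-05-22 17:03:40    User.Notice    192.168.7.47    May 22 17:03:38 BlackBox webGUI: Successful login user root from 192.168.7.44",
    "2020-05-22 17:03:40    Syslog.Info    192.168.7.47    May 22 17:03:38 BlackBox rsyslogd: action 'action-2-builtin:omfwd' resumed (module 'builtin:omfwd') [v8.1908.0 try https://www.rsyslog.com/e/2359 ]",
]


def parse_received_time(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS' receive timestamp (first 19 characters)."""
    return datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")


def find_gaps(lines, threshold=timedelta(minutes=10)):
    """Return (start, end) pairs where consecutive messages are more than `threshold` apart.

    The threshold is an arbitrary assumption; tune it to how chatty the server normally is.
    """
    times = sorted(parse_received_time(line) for line in lines if line.strip())
    return [(a, b) for a, b in zip(times, times[1:]) if b - a > threshold]


gaps = find_gaps(LOG_LINES)
for start, end in gaps:
    print(f"No messages between {start} and {end} ({end - start})")
```

On these three lines it reports a single silent window from 16:43:42 to 17:03:40, which contains the 17:01 crash. That only confirms that nothing was logged before the machine went down; it does not say why.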

 

So, as I mentioned above, MemTest86 was running all night, didn't see the CPU temperature go past 50C.  I figure after 5 complete passes, it would've complained about something if it was RAM related.  

 

Time to try a different power supply maybe?  

_Or_ is it time to move away from the i5-6500 / Gigabyte Z170N-Wifi combo to something else?

Edited by AdrianF
Adding in the idea of moving to new motherboard/CPU
Posted
On 5/21/2020 at 4:27 PM, itimpi said:

Logs do not survive a reboot unless you set up the syslog server under Settings.

 

However, the machine powering itself off has to be a hardware issue.  The two obvious candidates are the CPU fan not working properly, causing the CPU to overheat and turn the system off, or an insufficient/faulty power supply.

As mentioned, I did set up a syslog server on another PC, but it didn't capture anything of use (nothing was logged around the time of the reboot).

 

Also, as mentioned, I did run MemTest86 for about 12 hours last night; 5 passes.  Didn't see the CPU push past 50C at all during that time.

 

Power Supply is an eVGA 550W from about four years ago -- I added it to my PC when I added an eVGA GeForce 1050 card, which has since been removed.  Not exactly the best quality product out there. It didn't run a lot, but it was powered on and off, which I know adds to its age.

That said, it's Z170-based with an i5-6500 on it, 5x 7200 RPM SATA drives and 1x 120GB Kingston SSD as a cache drive.  Just ordered a Corsair SF450 because I want something cool and quiet (I used to own this PSU at one point and loved it, but with the video card, it would've been too tight).  Should be here on Sunday.  Will remove ALL old modular cables from the previous PSU and install the new ones to see if that solves the issue and report back.

Posted

New PSU showed up. Cables were too short and there were not enough ports for the drives and case fans.  I had already taken the old PSU out and moved it out onto the floor, holding it by the sides.  When I went to pick it up, thinking I might use some of the sleeved cables, the middle SATA power connector fell right out.

 

Could've been shorting so everything is reconnected (haphazardly) while I try copying again. Been copying for 15 minutes and still going strong. Will update. 

 

Still contemplating junking the i5-6500, even if everything's good, alongside the Gigabyte board (junk, but I needed that form factor at the time). Thinking about jumping to an Asus Prime H370-A/CSM board since it's going to be more current, along with an i5 9400 because the price is good (cheaper at base price in CDN$ vs US$).

 

Will update later. 

Posted

Cancel that. Died again. Part of me says it's time for the new build to get away from the Gigabyte Board. Might still be the PSU but I'm doubtful.  

  • 2 weeks later...
Posted

Quick update: Since the system was pushing 5-6 years old, instead of spending more time trying to get it up and running, I decided it was time to do a 2020 Unraid Build.  Pretty much have everything sorted except potentially cooling and PSU, but I think I'm close on that.

Thanks everybody for the space.  Gonna mark this "SOLVED" for now...

  • AdrianF changed the title to (SOLVED - SORT OF) Thought I was getting there.... and now this.
