-
Server Unresponsive
Good morning! Here is the update: the server is 100% stable. It has not crashed at all since the CPU was swapped out. However, I did find another issue. The Nvidia Tesla P4 GPU gets very hot when transcoding 3 files in Tdarr. With the Tdarr container disabled, the server runs stable. I have ordered a fan for the GPU to help cool it. More to follow on the success of that! Again, huge thanks to JorgeB & JonathanM for all your help!
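For anyone watching a passively cooled card like this under transcoding load, a small filter over `nvidia-smi` query output can flag high temperatures. This is only a sketch: the `hot_gpus` name is made up for illustration, and the 85 C limit in the usage line is an assumed value, not a figure from Nvidia or from this thread.

```shell
# Hypothetical helper: print "GPU <index>: <temp> C" for every GPU whose
# temperature is at or above the given limit.
# Reads "index, temperature" lines on stdin, one GPU per line.
hot_gpus() {
    local limit="$1"
    awk -F', *' -v limit="$limit" \
        '$2 + 0 >= limit + 0 { printf "GPU %s: %s C\n", $1, $2 }'
}

# Usage (the 85 limit is an assumption, adjust to taste):
#   nvidia-smi --query-gpu=index,temperature.gpu --format=csv,noheader,nounits | hot_gpus 85
```

Keeping the parsing separate from the `nvidia-smi` call means the filter can be tried on pasted output from a machine that has no GPU.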
-
nVidia Tesla p4 8GB Card failure
For some reason the GPU card shows up when I boot, and then it falls off the system. I can't see the card details in the Nvidia driver section, and I can't see it in the GPU Statistics plugin. Am I doing something wrong, or is it a bad GPU? What alternative GPU would you recommend that won't break the bank?
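One way to narrow this down might be to check whether the card is still visible on the PCIe bus and whether the Nvidia kernel module logged anything when it disappeared. The sketch below assumes `lspci` and `dmesg` are available on the server; `nvrm_lines` is a hypothetical helper name, and it relies only on the "NVRM:" prefix that the Nvidia kernel module puts on its log lines.

```shell
# Hypothetical helper: keep only kernel log lines written by the NVIDIA
# kernel module (they carry an "NVRM:" prefix).
# Reads log text on stdin; prints nothing if there are no such lines.
nvrm_lines() {
    grep 'NVRM:' || true
}

# Usage:
#   lspci | grep -i nvidia     # is the card still on the bus?
#   dmesg | nvrm_lines         # what did the driver log?
```

If `lspci` lists the card right after boot but not once the problem appears, the driver messages from that window are worth attaching to the thread.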
-
Server Unresponsive
System is stable. It was 100% a bad processor. Thanks everyone for all your help!
-
Server Unresponsive
New processor arrived today. It’s now in and I’m up and running. I will report back within 72 hours if it’s stable. Fingers crossed!
-
Server Unresponsive
I also notice that I occasionally get a failure on the GPU plugin in Unraid. Do I need to disable the onboard video now that I am running the Tesla P4? Or do I have a bad GPU as well?
-
Server Unresponsive
I'm starting to believe you are correct. I initially ran this box with a Supermicro X9SCL mobo and CPU and it ran perfectly. It just couldn't handle transcoding, and the mobo didn't have a slot for a GPU card. I upgraded the mobo to the Supermicro X11SSH-F and the CPU to the Xeon E3-1285 v6. RAM went from 32 to 64. All drives, PSU, cooling, and case stayed the same. I started having failures. I changed the RAM and still had the same issue. I changed the mobo and had the same issue. I changed the PSU from a 600W 80+ White to an 850W 80+ Gold. I added liquid cooling. I added a Tesla P4. Nothing has eliminated the problem. The only thing left is the CPU. I am waiting for a Xeon E3-1270 v6 to arrive in a few days. I'll swap it out and see if that helps. If not, then I'm at a total loss as to what could be causing it! Could it be BIOS related? I have BMC connected, but don't have the password, so I will need to reset it via the jumper? Then I can review it on a remote PC. I have link aggregation connected from the mobo to my ASUS GT-AC5300 router. I literally have no idea what else to try! I have the syslog going to root on the flash drive, but nothing seems to stand out to you or others... Do you think it could just be a bad CPU?
-
Server Unresponsive
Ok, so I went away to the lake yesterday morning and left the server running. I got back an hour ago and it's all locked up again. It's not showing on the network either. I did a hard reset and it came up fine. Here are the logs. I can't figure this out! Someone please point me in the right direction! syslog syslog-previous
-
Server Unresponsive
No, I don't leave any active connections. It's a headless server and I only log in to run a process or to try to figure out an issue such as this. I'm going to enable the IPMI function and connect that LAN port for diagnostics later this weekend.
-
Server Unresponsive
So yesterday I woke up to an unresponsive server again. No network connectivity, nothing. So I decided to try 2 more things. I changed the PSU from a 600W to a new Corsair 850W 80+ Gold. I then added a Corsair water cooler for the CPU. Again this morning, I woke to a non-responsive server. But this time it was still showing on the router as connected. If it crashes over the weekend, I'll upload a new set of logs. goldraid-diagnostics-20240216-2022.zip syslog-2.txt
-
Server Unresponsive
I will after this learning experience! lol It’s running fine now. I’ll post in the morning and let you know if it crashed again. Thanks again for all your help!
-
Server Unresponsive
Ok, then how would I reset it to that?
-
Server Unresponsive
What would you suggest?
-
Server Unresponsive
I went to the bash command line and made sure all appdata / domains / system shares were moved to the cache using:
    rsync -av --remove-source-files /mnt/disk2/appdata/ /mnt/cache/appdata/
It moved all files. I then removed all empty folders left behind:
    find /mnt/disk2/appdata/ -type d -empty -delete
I corrected the appdata folder permissions:
    chmod -R 755 /mnt/cache/appdata/
    chown -R nobody:users /mnt/cache/appdata/
I made sure that the appdata / domains / system shares were all now cache only in my shares menu. I made sure that the data & iCloud-drive-sync shares were all now pointing to the array in my shares menu. I reloaded the Nvidia driver and installed the GPU Statistics plugin. I changed macvlan to ipvlan. I have checked the cache pool and it's still pretty full (I think). I have a 2TB SSD and a 256GB SSD. Do I need a larger cache pool? What am I missing? And before I forget, huge thanks to trurl & JorgeB for all your help. I really appreciate the time you are taking!
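Before buying a larger pool, it may be worth confirming that nothing was left behind on disk2 and seeing which folders actually take up the cache. The following is a rough sketch built on the paths from the post above; `leftover_files` and `largest_dirs` are hypothetical helper names, not Unraid tools.

```shell
# Hypothetical helper: count regular files still left under a source
# directory after the move. Prints 0 if the directory does not exist.
leftover_files() {
    local src="$1"
    if [ ! -d "$src" ]; then
        echo 0
        return 0
    fi
    find "$src" -type f | wc -l | tr -d ' '
}

# Hypothetical helper: show disk usage (in KiB) of a directory and its
# immediate subdirectories, largest first.
largest_dirs() {
    local dir="$1"
    du -k -d 1 "$dir" 2>/dev/null | sort -rn
}

# Usage, with the paths from the post:
#   leftover_files /mnt/disk2/appdata
#   largest_dirs /mnt/cache
#   df -h /mnt/cache
```

A count of 0 from `leftover_files` suggests the rsync step moved everything. The `largest_dirs` listing shows whether appdata, domains, or system is what fills the pool.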
-
Server Unresponsive
NETWORK ID     NAME      DRIVER    SCOPE
400cca84e0d2   br0       ipvlan    local
06eb05e8c95b   bridge    bridge    local
74b5ac950c5a   host      host      local
8c0d0933753b   none      null      local
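The listing above shows br0 on the ipvlan driver. To re-check after future Docker changes, a filter like the one below could pick out any network still using macvlan. It is a minimal sketch that assumes the default column layout of `docker network ls` (ID, name, driver, scope); `macvlan_networks` is a made-up helper name.

```shell
# Hypothetical helper: print the names of networks whose DRIVER column
# is macvlan. Reads `docker network ls` style output on stdin and skips
# the header line.
macvlan_networks() {
    awk 'NR > 1 && $3 == "macvlan" { print $2 }'
}

# Usage:
#   docker network ls | macvlan_networks
```

Empty output means no network in the listing uses macvlan, which is what the output posted here would give.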
-
Server Unresponsive
So, it crashed again. Here is the syslog info. syslog-previous.txt syslog.txt