November 29, 2017 Upgraded from 6.3.5 to 6.4.0-rc14. Uptime is 48 hours so far with no issues. Running a Ryzen 1700 build. I plan to undo all the "fixes" needed to get Ryzen to run once I determine there are no overall stability issues. Will report back on whether all the Ryzen issues end up being resolved.
November 29, 2017 Running on an Asus X370 Pro and a Ryzen 1700X, but the system crashed after the parity check completed. I'm not sure if this is a Ryzen problem or caused by one of my PCIe cards. This only happened once, yesterday; I'll see if anything happens later today and update my post. Installed in my system are: an Asus GT 710, an EVGA GTX 970 FTW+ with modified BIOS, a TurboSight TBS 6902 DVB-S/S2 and a Realtek Gigabit NIC. The motherboard is on the latest BIOS (3203). The PSU is a new EVGA 650 G2 80+ Gold, so this should probably rule out any issues caused by a faulty power supply. Edited November 29, 2017 by PSYCHOPATHiO
November 29, 2017 On 11/24/2017 at 12:10 PM, Squid said: Just wanted to figure out why only the one error appeared and not all of them, and didn't catch the reason why in the syslog. Don't worry about it; it was fun. That would be when all of the errors happened (i.e., the network wasn't fully running until after FCP did its thing at array start). I think what we need is a setting somewhere that waits for the network to be up and running properly before installing plugins or starting the array (with an appropriate timeout). On 11/24/2017 at 2:05 PM, dlandon said: I put together a quick plugin named 'a.network.wait' that will run before other plugins. It checks at 1-second intervals and waits for the network to come up before other plugins can run. On my test server the wait time was logged at 4 seconds. It seems like LT should wait until the network is alive before trying to install plugins. I believe the thinking up until now was that it was not important because installed plugins have already downloaded what they need and installed it on the flash drive. That no longer seems to be the case. Just to add a little bit: I made a thread a little over a year ago, when the large-scale DNS DDoS was happening, and I tracked similar findings of Plugins being incorrectly marked and deleted because they couldn't phone home.
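The wait-for-network idea discussed above (poll once per second, give up after a timeout) can be sketched roughly as follows. This is a minimal Python sketch, not dlandon's 'a.network.wait' plugin: using a DNS lookup of the placeholder host example.com as the "network is up" test, the 60-second default timeout and the function names are all assumptions for illustration.

```python
import socket
import time
from typing import Callable, Optional


def dns_resolves(host: str = "example.com") -> bool:
    """Hypothetical readiness check: True once `host` (a placeholder) resolves."""
    try:
        socket.getaddrinfo(host, 443)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def wait_for_network(
    check: Callable[[], bool] = dns_resolves,
    timeout: float = 60.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll `check` every `interval` seconds.

    Returns the approximate seconds waited (sleep time only) once `check`
    succeeds, or None if it still fails after `timeout` seconds, so the
    caller can decide whether to install plugins / start the array anyway.
    """
    waited = 0.0
    while True:
        if check():
            return waited
        if waited >= timeout:
            return None
        sleep(interval)
        waited += interval

# Example: waited = wait_for_network(timeout=30.0)
```

Passing `check` and `sleep` in as parameters keeps the loop testable without a real network, and the timeout keeps a dead link from blocking startup indefinitely.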
November 29, 2017 3 hours ago, PSYCHOPATHiO said: Running on an Asus X370 Pro and a Ryzen 1700X, but the system crashed after the parity check completed. I'm not sure if this is a Ryzen problem or caused by one of my PCIe cards. This only happened once, yesterday; I'll see if anything happens later today and update my post. ... The motherboard is on the latest BIOS (3203). The PSU is a new EVGA 650 G2 80+ Gold, so this should probably rule out any issues caused by a faulty power supply. You might want to install the 'Fix Common Problems' plugin and turn on troubleshooting mode. Also have a monitor connected to the server to see what is on the screen when the crash occurs. Be sure to photograph that screen and make sure you capture everything: in sharp focus, not blurred, and with no flash spots. (Someone will have to be able to read it.) I would not rule out that new PSU. I had one fail within the first thirty days earlier this year. (It would just shut down...)
November 29, 2017 I found the culprit: it's a memory compatibility issue. Even with the latest BIOS, if I overclock the memory, either the system won't start, or it starts and then crashes at the unRAID login. The memory at stock speed works but might crash at any time when the system is idle. I found compatible memory, but it's too expensive to order now; I might just order it, or wait until mid-2018 to see if prices change. EDIT: forgot to mention that this means unRAID is in the clear, I think. Edited November 29, 2017 by PSYCHOPATHiO
November 29, 2017 4 hours ago, PSYCHOPATHiO said: EDIT: forgot to mention that this means unRAID is in the clear, I think. Rule 1: do not overclock a server
November 30, 2017 3 hours ago, bonienl said: Rule 1: do not overclock a server. I'm not an idiot; it would reduce the lifespan of my CPU. In addition, the Ryzen has enough power for all that I need.
November 30, 2017 21 hours ago, Frank1940 said: You might want to install the 'Fix Common Problems' plugin and turn on troubleshooting mode. Also have a monitor connected to the server to see what is on the screen when the crash occurs. Be sure to photograph that screen and make sure you capture everything: in sharp focus, not blurred, and with no flash spots. (Someone will have to be able to read it.) I would not rule out that new PSU. I had one fail within the first thirty days earlier this year. (It would just shut down...) OK, I had another crash: I decided to reboot the system, and when the array was starting the system crashed again. Here is a picture of the kernel panic. System specs: Motherboard: Asus X370 Pro with BIOS 3203 (AGESA 1071); CPU: Ryzen 1700X at stock speeds with a Noctua NH-D15; Memory: Corsair 2x8GB @ 2133; PSU: EVGA 650 G2 80+ Gold; PCIe cards: (1) ASUS GT 710, (2) TBS DVB-S2, (3) Realtek GbE, (4) EVGA GTX 970 FTW+ with modified BIOS. Drives: Parity: 8TB Seagate IronWolf; Array: 2x 8TB Seagate IronWolf plus an old 4TB Seagate NAS; Cache: Samsung 840 Pro 256GB; unattached: 256GB M.2 SSD. Edited November 30, 2017 by PSYCHOPATHiO
November 30, 2017 2 minutes ago, PSYCHOPATHiO said: OK, I had another crash: I decided to reboot the system, and when the array was starting the system crashed again. Here is a picture of the kernel panic. If you haven't yet, see if disabling C-states in the BIOS helps.
November 30, 2017 3 minutes ago, johnnie.black said: If you haven't yet, see if disabling C-states in the BIOS helps. It was set to Auto, so I disabled it; will report back with any changes.
November 30, 2017 5 hours ago, PSYCHOPATHiO said: It was set to Auto, so I disabled it; will report back with any changes. I use an ASUS Prime X370-Pro with BIOS 3203 and currently have no problems with C-states set to enabled, disabled or auto. But a few days ago my system crashed many times (with a trace like the one in your screen capture); eventually I pinned it down to one faulty memory module. I took it out and have had no problems since. (A C-state issue should cause a complete hang rather than a trace.) I haven't followed the approved memory list: at first I used 2x4GB (SR, single rank) 2133, and now I use 4x8GB (SR) 2400, which can run at 2600 with a 1T command rate (CR) without adding voltage. 8GB modules are usually dual rank, but you're only using two, so I don't think that is the issue. I believe you will get crashes again; please test the memory more thoroughly. Edited November 30, 2017 by Benson
November 30, 2017 56 minutes ago, Benson said: I use an ASUS Prime X370-Pro with BIOS 3203 and currently have no problems with C-states set to enabled, disabled or auto. But a few days ago my system crashed many times (with a trace like the one in your screen capture); eventually I pinned it down to one faulty memory module. I took it out and have had no problems since. (A C-state issue should cause a complete hang rather than a trace.) I'm using 2 of the 4 memory modules that used to be installed in my other system; they all worked great there and never had any issues. After disabling C-states I was able to push the memory up to 3200 MHz without a problem. Now back online with multiple Linux VMs and a Windows VM, and doing a parity check as well. For the time being all is good. I will keep testing.
November 30, 2017 20 hours ago, bonienl said: Rule 1: do not overclock a server. FWIW, I do have one Ryzen 1700X system that ran unRAID 6.3.5 for weeks before I'd even heard about the Ryzen problems, and it never crashed once. That system was overclocked to 3.8 GHz, had C-states enabled, and didn't have rcu_nocbs set. It does have SVM enabled. I've since disabled C-states, set rcu_nocbs, and removed the OC just to be safe, but it ran under an extremely heavy virtualized load for almost a month without issue.
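For readers unfamiliar with rcu_nocbs: it is a Linux kernel boot parameter that takes a CPU list (for example 0-15) and, on kernels built with RCU_NOCB_CPU support, offloads RCU callback processing from those CPUs to kernel threads. On unRAID it is typically added to the append line of syslinux.cfg on the flash drive. The helper below is a hypothetical sketch that builds the argument so it covers every logical CPU (16 threads on a 1700X); covering all threads is the commonly shared Ryzen workaround, an assumption here rather than a value stated in this thread.

```python
import os
from typing import Optional


def rcu_nocbs_arg(n_cpus: Optional[int] = None) -> str:
    """Hypothetical helper: build 'rcu_nocbs=<cpu-list>' covering CPUs 0..n-1.

    Uses the kernel's "first-last" cpu-list range syntax and defaults to the
    number of logical CPUs Python sees on this machine.
    """
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1
    if n_cpus < 1:
        raise ValueError("need at least one CPU")
    if n_cpus == 1:
        return "rcu_nocbs=0"
    return f"rcu_nocbs=0-{n_cpus - 1}"
```

On a 16-thread Ryzen 1700X, `rcu_nocbs_arg(16)` yields `rcu_nocbs=0-15`.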
December 1, 2017 15 minutes ago, FredG89 said: How do I upgrade from 6.3 to this? Read earlier posts in this same thread...
December 1, 2017 Yesterday, after I thought the system was stable enough, I decided to start from scratch: I reinstalled everything and changed my file system from btrfs to XFS, but to no avail. I was transferring about 4TB of data from my other backup to this server and woke up to a kernel panic this morning. So disabling C-states didn't do much; I think it is all due to motherboard issues. I will try stripping the motherboard down to only what is needed and removing any extra cards, including the M.2 drive. If this doesn't work, I will try my other Ryzen 1700X on an MSI X370 Gaming Pro motherboard. Why I have two Ryzens is a long story for another time. Edited December 1, 2017 by PSYCHOPATHiO
December 1, 2017 UPDATE: I gave up and set the Ryzen system aside until AMD figures out what they're doing, went back to my trusty old i7-4930K, and downgraded to 6.3.5. Until the next AMD BIOS update, I will stick with Intel. I've also tried the Ryzen on Windows and it crashes there too, so I will keep the Ryzen as a test bench until it is stable enough for server use.
December 1, 2017 3 hours ago, PSYCHOPATHiO said: UPDATE: I gave up and set the Ryzen system aside until AMD figures out what they're doing, went back to my trusty old i7-4930K, and downgraded to 6.3.5. Until the next AMD BIOS update, I will stick with Intel. I've also tried the Ryzen on Windows and it crashes there too, so I will keep the Ryzen as a test bench until it is stable enough for server use. I don't see any information here that anyone can use to try to help you. You'll at least need to provide diagnostics if you don't want your posts to sound like a rant.
December 1, 2017 Author 4 hours ago, PSYCHOPATHiO said: UPDATE: I gave up and set the Ryzen system aside until AMD figures out what they're doing, went back to my trusty old i7-4930K, and downgraded to 6.3.5. Until the next AMD BIOS update, I will stick with Intel. I've also tried the Ryzen on Windows and it crashes there too, so I will keep the Ryzen as a test bench until it is stable enough for server use. Apparently the first run of Ryzen chips had some kind of issue that resulted in kernel segfaults (different from outright hangs), and you could RMA the chip and get a new one. Maybe you have one of those early Ryzens?
December 1, 2017 3 hours ago, limetech said: Apparently the first run of Ryzen chips had some kind of issue that resulted in kernel segfaults (different from outright hangs), and you could RMA the chip and get a new one. Maybe you have one of those early Ryzens? I have an MSI X370 Pro and another Ryzen 1700X CPU. Just finished assembly and will give it a try on a trial unRAID install and check the stability for a couple of days. EDIT: installed the system bare-minimum, without overclocking the CPU or RAM; it crashed once the parity check started. Edited December 1, 2017 by PSYCHOPATHiO
December 2, 2017 23 hours ago, BRiT said: Read earlier posts in this same thread... I was able to upgrade to 6.4, but now I can't access the shares from my desktop.
December 2, 2017 Author 7 minutes ago, FredG89 said: I was able to upgrade to 6.4, but now I can't access the shares from my desktop. Please post diags; can't do anything without the diags. Did I mention: need to see your diags.
December 2, 2017 11 minutes ago, limetech said: Please post diags; can't do anything without the diags. Did I mention: need to see your diags. Sorry about that. It's weird, because I can SSH to the server, so it seems port 445 is being blocked? I'm not sure what unRAID uses, but I would assume that's the port my computer is trying to connect to for SMB. homesrv01-diagnostics-20171201-1906.zip Edited December 2, 2017 by FredG89
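One quick way to test the port-445 theory from the desktop side is to attempt a plain TCP connection to the server's SMB port. This is a minimal sketch using Python's standard `socket` module; the host name `homesrv01` is only assumed from the diagnostics file name, and a successful connect shows that the port is reachable, not that SMB authentication or share settings are correct.

```python
import socket


def tcp_port_open(host: str, port: int = 445, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or name not resolved
        return False

# Example (host name assumed from the diagnostics file):
# print(tcp_port_open("homesrv01", 445))
```

If this returns False while SSH (port 22) works, something between the desktop and the server, or the SMB service itself, is not accepting connections on 445.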
December 2, 2017 I've been in and out of the server all day. Just now Chrome warned me that the SSL cert has expired. I thought this was automatic? "This server could not prove that it is c660dbb7f4f305799046a5e90477700ce58c8c08.unraid.net; its security certificate expired yesterday. This may be caused by a misconfiguration or an attacker intercepting your connection. Your computer's clock is currently set to Friday, December 1, 2017. Does that look right? If not, you should correct your system's clock and then refresh this page." It would appear there isn't enough padding on the expiration to handle the PST time zone. I am going to renew manually. New expiration: Mar 2 03:42:40 2018 GMT. Maybe this needs to auto-renew 24-48 hours prior to expiration? Edited December 2, 2017 by interwebtech
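The "renew 24-48 hours early" suggestion amounts to comparing the certificate's expiry with the current time in UTC and renewing once the remaining time drops below a safety window. A minimal sketch, assuming the expiry string is in the `Mar 2 03:42:40 2018 GMT` form quoted above: `ssl.cert_time_to_seconds` is a real standard-library helper for that format, while `needs_renewal` and its 48-hour default are hypothetical, not how unRAID actually schedules renewal.

```python
import ssl
from datetime import datetime, timedelta, timezone
from typing import Optional


def needs_renewal(
    not_after: str,
    now: Optional[datetime] = None,
    window: timedelta = timedelta(hours=48),
) -> bool:
    """Hypothetical check: True if the cert expires within `window` of `now`
    (or has already expired).

    Both times are timezone-aware UTC datetimes, so the server's local
    time zone (e.g. PST) cannot shift the result. `now` must be aware.
    """
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return expires - now <= window
```

Doing the comparison entirely in UTC is the point of the design: the window then gives the same margin regardless of where the server or browser clock is set.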