taserz Posted May 29, 2018
Hello everyone! I have been using unraid for 5 months or so and I finally upgraded from my old i3 desktop to an actual server! Hurray! But I am now having a bunch of issues and I need help. I swapped out a drive for my parity drive to rebuild parity. It starts off great, and then next thing you know it is going at a rate of less than 5 MB/s, sometimes less than 1 MB/s, and that is horrible with an 8 TB parity drive. If I try to run the drive without the parity it works okay but quickly becomes unusable. I attached my diagnostics file and would really appreciate the help.
Model: Custom
M/B: Supermicro - X9DRi-LN4+/X9DR3-LN4+
CPU: Intel® Xeon® CPU E5-2670 0 @ 2.60GHz
HVM: Enabled
IOMMU: Enabled
Cache: 512 kB, 2048 kB, 20480 kB
Memory: 64 GB Multi-bit ECC (max. installable capacity 1536 GB)
Network: bond0: fault-tolerance (active-backup), mtu 1500; eth0: 1000 Mb/s, full duplex, mtu 1500; eth1: not connected; eth2: not connected; eth3: not connected
Kernel: Linux 4.14.40-unRAID x86_64
OpenSSL: 1.0.2o
Uptime: 0 days, 04:46:35
Attachment: tower-diagnostics-NEW.zip
eschultz Posted May 29, 2018
I'm seeing tons of memory errors in your log file from 4 different sticks of RAM. Did you tweak the memory settings in the BIOS, or are they set to auto/XMP? You can reboot and run the 'Memtest86+' option from the boot menu to check.
taserz Posted May 29, 2018
I reset the motherboard to its default settings so that any tweaks the previous owner made would not apply, and it did not change anything. I will check for auto/XMP and do a Memtest86+ test. Not sure how long it will take with 64 GB of RAM but I will give it a whirl. Luckily I just bought it off theserverstore, so if the RAM is bad they would swap it out for me. I will run Memtest86+ right now and see if it works. Thank you sooo much eschultz for responding so fast. I love unraid, this is the first time I have had an issue, and this was an amazingly fast response time.
taserz Posted May 29, 2018
Also, this is the only setting I saw in the BIOS about RAM. I figured I would take a screen grab and show it, since I am still learning a lot about server hardware and all this fancy stuff. Currently still in school for comp sci/info systems.
eschultz Posted May 29, 2018
10 minutes ago, taserz said: Also this is the only setting I saw in the bios about ram figure I take a screen grab and show it since I am still learning a lot about server hardware and all this fancy stuff.
Those memory settings seem fine. Try running the memory test option from the boot menu. I'd run that test for several hours at least, but you might see errors pretty quickly if the memory is bad.
taserz Posted May 29, 2018
Sounds good, I started it. I am at 8% right now on the first pass. I really hope it's just a bad stick or two of RAM and not something tricky to track down that is causing the slowdown.
taserz Posted May 29, 2018
I ran it for about 7 hours today and it is still going. This is what memtest is showing thus far.
JorgeB Posted May 30, 2018
Check the board's SEL (system event log); there might be more info there. Single-bit memory errors would be corrected by ECC and go undetected by memtest.
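As a rough illustration of why a corrected single-bit error never reaches memtest, here is a toy Hamming(7,4) code in Python. This is only a sketch of the single-error-correction idea: server ECC memory typically uses wider codes over 64-bit words, and the function names `encode` and `decode` are made up for this example.

```python
def encode(data):
    """Encode 4 data bits into a 7-bit Hamming codeword (positions 1..7)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def decode(word):
    """Return (data_bits, error_position); error_position is 0 if no error was seen."""
    w = list(word)
    s1 = w[0] ^ w[2] ^ w[4] ^ w[6]
    s2 = w[1] ^ w[2] ^ w[5] ^ w[6]
    s4 = w[3] ^ w[4] ^ w[5] ^ w[6]
    pos = s1 + 2 * s2 + 4 * s4  # syndrome = 1-based index of the flipped bit
    if pos:
        w[pos - 1] ^= 1  # silently repair the single-bit error
    return [w[2], w[4], w[5], w[6]], pos


data = [1, 0, 1, 1]
stored = encode(data)
stored[4] ^= 1  # simulate one flipped bit in the "DIMM"
read_back, pos = decode(stored)
print(read_back == data, pos)  # the reader gets correct data; only pos records the event
```

The program reading the memory gets the right data back, so a tester that only compares what it wrote with what it read sees nothing wrong. The only trace is the nonzero syndrome, which is roughly the role the SEL entry plays on a real board.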
taserz Posted May 30, 2018
6 minutes ago, johnnie.black said: Check the board's SEL (system event log), there might be some more info there, single bit memory errors would be corrected by ECC and go undetected by memtest.
Should I let memtest finish first? It says it's at 80%.
taserz Posted May 30, 2018
Okay, so I just realized I can export the SEL from IPMI. Here is what it says:
Attachment: sel.txt
JorgeB Posted May 30, 2018
DIMM C1 needs to be replaced. This might be unrelated to your speed issues, but it needs to be replaced anyway.
taserz Posted May 30, 2018
1 minute ago, johnnie.black said: DIMM C1 need to be replaced, this might be unrelated to your speed issues, but it needs to be replaced anyway.
Do you think that could be why the machine was randomly hanging and being extremely slow at times? Maybe whenever that stick was in use? If I just pull that stick of RAM and my problem persists, do you have any thoughts on what I should look for to identify or resolve the problem?
JorgeB Posted May 30, 2018
Difficult to say for sure. You'll need to replace that DIMM and post again if the issues persist.
taserz Posted May 30, 2018
I just swapped the stick in RAM slot DIMMC1 with the one next to it and started unraid with dockers and everything on, to see whether the errors follow the possibly bad stick or stay with the RAM slot on the motherboard. I started the dockers to ensure the RAM sticks would be used. Looks like it is following the bad stick! I am going to pull it and make sure the server runs stable, and will report my findings.
taserz Posted May 30, 2018
What are the odds of having another bad stick of RAM?
Attachment: sel2.txt
JorgeB Posted May 30, 2018
Or 2:
439,System Event,2018/05/30 12:59:53 Wed,Memory,#0x00,Assertion: Memory| Event = Correctable ECC@DIMMG1(CPU2)
440,System Event,2018/05/30 12:59:54 Wed,Memory,#0x00,Assertion: Memory| Event = Correctable ECC@DIMMG2(CPU2)
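For anyone wading through a long exported SEL, a small Python sketch like the one below can tally correctable ECC events per DIMM. It assumes the export looks like the two lines quoted above (comma-separated, with `Correctable ECC@DIMMxx(CPUn)` at the end); other boards or firmware versions may format the log differently, and `count_correctable_ecc` is just a name chosen for this example.

```python
import re
from collections import Counter

# Matches e.g. "Correctable ECC@DIMMG1(CPU2)"; the format is assumed from the lines quoted above.
SEL_ECC = re.compile(r"\bCorrectable ECC@(DIMM\w+)\((CPU\d+)\)")


def count_correctable_ecc(lines):
    """Count correctable ECC events per (DIMM, CPU) in exported SEL lines."""
    counts = Counter()
    for line in lines:
        match = SEL_ECC.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


sample = [
    "439,System Event,2018/05/30 12:59:53 Wed,Memory,#0x00,"
    "Assertion: Memory| Event = Correctable ECC@DIMMG1(CPU2)",
    "440,System Event,2018/05/30 12:59:54 Wed,Memory,#0x00,"
    "Assertion: Memory| Event = Correctable ECC@DIMMG2(CPU2)",
]

for (dimm, cpu), n in count_correctable_ecc(sample).most_common():
    print(dimm, cpu, n)
```

To run it on a real export, pass the open file instead of `sample`, for example `count_correctable_ecc(open("sel2.txt"))`. A slot that keeps reporting errors no matter which stick is in it points at the board rather than the RAM.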
taserz Posted May 30, 2018
Ohhh truck... I just pulled those and rearranged the RAM for CPU 2 so that the blue slots are populated and it would boot. Third time's the charm... right?
taserz Posted May 30, 2018
Okay, this is just really weird... After pulling those other 2 I checked the SEL again, and take a look at this:
10,System Event,2018/05/30 15:29:05 Wed,Memory,#0x00,Assertion: Memory| Event = Correctable ECC@DIMMB1(CPU1)
Another one???? Does this make any sense, or could it honestly just be shit RAM?
Attachment: sel3.txt
JorgeB Posted May 30, 2018
It might be a board problem. Try with the minimum amount of RAM possible, and with different stick(s).
taserz Posted May 30, 2018
Here is the motherboard layout, to show where the bad RAM sticks have been on the board. Is it odd that 3 times it has been one of the blue slots?
https://www.supermicro.com/manuals/motherboard/C606_602/MNL-1258.pdf (page 12, or see the picture)
taserz Posted May 30, 2018
Also, on a side note... I just realized my 500 GB SSD is showing up in unraid as 1 TB. Is there a way to fix this? I am not sure what unraid will try to do once it gets to 500 GB.
JorgeB Posted May 30, 2018
6 minutes ago, taserz said: Also on a side note... I just realized my 500gb ssd is showing up in unraid at 1tb? Is there a way to fix this I am not sure what unraid will try to do once it gets to 500gb.
You have an unassigned SSD that is currently part of the cache pool. You can remove it with the following command (with the array started):
    btrfs dev del /dev/sdb1 /mnt/cache
Check that it's still sdb, as it was in the diags posted.
taserz Posted May 30, 2018
1 minute ago, johnnie.black said: btrfs dev del /dev/sdb1 /mnt/cache
Yeah, I was going to have it set up as RAID 0, but I did not want a headache if it fails, so I am going to wait until I get a third SSD so everything has parity. I ran that command and it didn't seem to change the number. I see sdb in my unassigned devices.
JorgeB Posted May 30, 2018
You need to wait for the data to be transferred, since sdb had most of the pool data. The device will then be removed from the pool, and a GUI refresh will show the new space usage.
taserz Posted May 30, 2018
Okay, so it is normal for the terminal to sit there for a long time after I run the command. Actually, it is saying:
ERROR: error removing device '/dev/sdb1': add/delete/balance/replace/resize operation in progress
This topic is now archived and is closed to further replies.