
Unraid crashes/freezes while copying Data



So, first of all: I'm new to this forum as well as to having an Unraid server, so please excuse any "nooby" mistakes or moments. 

To the issue: my newly built server keeps crashing while transferring relatively big chunks of data, ranging from 60 to 160GB. Sometimes it crashes in the middle of the transfer, sometimes 2MB short of finishing it. I know the syslog and diagnostics are useful here, and I'm working on getting them as I write this.

The issue is that the whole server locks up: it won't respond and isn't accessible through the WebGUI. Even a reset won't work, only a hard reset, which is not my favorite by any means. Could a single HDD be the reason the machine is locking up? I know there are a hundred different possible reasons, but could it be a bad cable or drive? How else can the system lock up completely?

I just need a little help thinking through all the possible causes, given that the system locked up 3 times in 15 hours.

Btw: I'm also having trouble sending the mentioned drive into sleep mode, I just noticed, so I somewhat hope that it's the drive. Otherwise, I'm a little stumped with all of this.

 

Greetings, Alex 

Link to comment

server-diagnostics-20210103-1106 

 

Sorry for the late reply, I had to get some sleep. I downloaded the diagnostics from the machine this morning, along with the built-in system log from this morning. I hope they are helpful; maybe someone can get something from them.

If I have to or should tweak some settings regarding the syslog file etc., please tell me.

Edit: If it helps, I can attach the SMART data readout from the disks, once I have it.

SystemLog 03.01.2021, Build in.txt

Edited by deltaexray
Link to comment

Looks like something is playing up with the SATA controller/drives/cables :/ is this all new hardware? (mobo and cables)

 

I had a similar issue, and it ended up being a few things compounding to cause it for me.

Jan  3 11:03:44 Server kernel: ata6: link is slow to respond, please be patient (ready=0)
Jan  3 11:03:44 Server kernel: ata4: link is slow to respond, please be patient (ready=0)
Jan  3 11:03:44 Server kernel: ata5: link is slow to respond, please be patient (ready=0)
Jan  3 11:03:44 Server kernel: ata4: COMRESET failed (errno=-16)
Jan  3 11:03:44 Server kernel: ata6: COMRESET failed (errno=-16)
Jan  3 11:03:44 Server kernel: ata5: COMRESET failed (errno=-16)
Jan  3 11:03:44 Server kernel: ata6: link is slow to respond, please be patient (ready=0)
Jan  3 11:03:44 Server kernel: ata5: link is slow to respond, please be patient (ready=0)
Jan  3 11:03:44 Server kernel: ata4: link is slow to respond, please be patient (ready=0)
Jan  3 11:03:44 Server kernel: ata6: COMRESET failed (errno=-16)
Jan  3 11:03:44 Server kernel: ata5: COMRESET failed (errno=-16)
Jan  3 11:03:44 Server kernel: ata4: COMRESET failed (errno=-16)
Jan  3 11:03:44 Server kernel: ata6: link is slow to respond, please be patient (ready=0)
Jan  3 11:03:44 Server kernel: ata4: link is slow to respond, please be patient (ready=0)
Jan  3 11:03:44 Server kernel: ata5: link is slow to respond, please be patient (ready=0)
Jan  3 11:03:44 Server kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jan  3 11:03:44 Server kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.SPT5._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
Jan  3 11:03:44 Server kernel: ACPI Error: Method parse/execution failed \_SB.PCI0.SAT0.SPT5._GTF, AE_NOT_FOUND (20180810/psparse-514)
Jan  3 11:03:44 Server kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.SPT5._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
Jan  3 11:03:44 Server kernel: ACPI Error: Method parse/execution failed \_SB.PCI0.SAT0.SPT5._GTF, AE_NOT_FOUND (20180810/psparse-514)
Jan  3 11:03:44 Server kernel: ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
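To see which ports are misbehaving in a longer syslog, the lines above can be tallied per ATA port. This is only a minimal sketch: the function name `summarize_ata_events` and the `SAMPLE` data are made up for illustration, and the regex assumes the exact line layout shown in the excerpt above.

```python
import re
from collections import Counter

# Matches kernel lines such as "... kernel: ata4: COMRESET failed (errno=-16)".
# Assumes the syslog layout shown in the excerpt above.
ATA_LINE = re.compile(r"kernel: (ata\d+): (.*)$")


def summarize_ata_events(lines):
    """Count COMRESET failures and slow-link warnings per ATA port."""
    resets = Counter()
    slow = Counter()
    for line in lines:
        match = ATA_LINE.search(line.strip())
        if not match:
            continue  # not an ataN message (e.g. the ACPI lines)
        port, message = match.groups()
        if message.startswith("COMRESET failed"):
            resets[port] += 1
        elif message.startswith("link is slow to respond"):
            slow[port] += 1
    return resets, slow


# A few lines copied from the excerpt above, used as demo input.
SAMPLE = """\
Jan  3 11:03:44 Server kernel: ata6: link is slow to respond, please be patient (ready=0)
Jan  3 11:03:44 Server kernel: ata4: COMRESET failed (errno=-16)
Jan  3 11:03:44 Server kernel: ata6: COMRESET failed (errno=-16)
Jan  3 11:03:44 Server kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
""".splitlines()

if __name__ == "__main__":
    resets, slow = summarize_ata_events(SAMPLE)
    for port in sorted(set(resets) | set(slow)):
        print(port, "COMRESET failures:", resets[port],
              "slow-link warnings:", slow[port])
```

If the failures cluster on one port, that points at that drive or cable; if they are spread over several ports at the same timestamp, as in the excerpt, the controller or power is the more likely suspect.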

Also, not sure if it matters, but you 'should' try to use your SATA 6G ports (it would probably be good to have at least the parity drive on one), as it looks like all your drives are running on the SATA 3G ports (on the P8H77-V the white ports should be the SATA 6G ones). Unless Unraid slows them all down to the slowest drive speed, maybe.
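The negotiated link speed can also be read out of the `SStatus` value in the "SATA link up" lines. A small sketch, assuming the kernel prints the register in hex and that it follows the usual SATA layout (bits 0-3 DET, bits 4-7 SPD, bits 8-11 IPM); the helper names are made up for illustration.

```python
# Meaning of the SPD (speed) field of the SATA SStatus register.
SPEEDS = {
    0: "no negotiated speed",
    1: "1.5 Gbps (Gen1)",
    2: "3.0 Gbps (Gen2)",
    3: "6.0 Gbps (Gen3)",
}


def decode_sstatus(hex_value):
    """Split an SStatus value (hex string, as printed in the log) into fields."""
    value = int(hex_value, 16)
    return {
        "det": value & 0xF,         # device detection / PHY state
        "spd": (value >> 4) & 0xF,  # negotiated interface speed
        "ipm": (value >> 8) & 0xF,  # interface power management state
    }


def link_speed(hex_value):
    """Return a readable link speed for an SStatus value."""
    return SPEEDS.get(decode_sstatus(hex_value)["spd"], "unknown")


if __name__ == "__main__":
    print(decode_sstatus("123"))
    print(link_speed("123"))
```

For `SStatus 123` this gives SPD = 2, which matches the "3.0 Gbps" the kernel printed. The speed field of `SControl 300` is 0, which as far as I know means the OS is not imposing a speed limit, so the 3.0 Gbps would come from the port or drive rather than from Unraid.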

Edited by brent3000
Link to comment

It's not new hardware; it has been running for over 7 years now without any known issues.

I'm thinking the same about the drive. The system just crashed again while trying to copy 200GB of movies, locking up everything in the process. I do think that the drive is dead: it doesn't even show up in the BIOS properly, only after a reboot, while the other two work just fine. Despite all being on SATA 3G, they all work.

I'm probably going to put the parity drive on SATA 6G just for the sake of doing it and leave the rest on SATA 3G, while replacing the other drive.

 

I'm going to keep you posted.

Link to comment

Handy thing to test: create a share for each drive and send data only to that drive, as a process of elimination. If you're using a share which has cache mode enabled, it should only ever be hitting the one drive before the mover kicks in.

 

If the drive isn't showing up in the BIOS, that's clearly a red flag. Have you done a preclear on any of the drives at all, or just a straight load?

 

I always do a full write+read pass on a drive first to make sure the whole drive is OK (if the drives are new or within warranty, it's an easy way to rule out faults).
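The per-drive elimination and the write+read check can be combined into a small script that writes a test file to one disk at a time and reads it back. This is a rough sketch, not a replacement for a preclear: the function name is made up, and the `/mnt/disk2` path in the usage note is an assumption about where Unraid mounts the individual array disks.

```python
import hashlib
import os
import tempfile


def write_and_verify(directory, size_mb=64, chunk_mb=4):
    """Write random data into `directory`, read it back, compare SHA-256.

    `size_mb` should be a multiple of `chunk_mb`. Returns True if the data
    read back matches what was written.
    """
    path = os.path.join(directory, "write_test.bin")
    chunk_size = chunk_mb * 1024 * 1024

    written = hashlib.sha256()
    with open(path, "wb") as handle:
        for _ in range(size_mb // chunk_mb):
            chunk = os.urandom(chunk_size)
            written.update(chunk)
            handle.write(chunk)
        handle.flush()
        os.fsync(handle.fileno())  # push the data out to the disk

    read_back = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            read_back.update(chunk)

    os.remove(path)
    return written.hexdigest() == read_back.hexdigest()


if __name__ == "__main__":
    # Demo against a temporary directory; point it at a disk mount instead
    # to exercise a single drive.
    with tempfile.TemporaryDirectory() as demo_dir:
        print("verified:", write_and_verify(demo_dir, size_mb=8))
```

On the server you would call something like `write_and_verify("/mnt/disk2", size_mb=4096)` for each disk in turn. Keep in mind that a small test file may be read back from RAM cache rather than from the platters, so use a size well above the installed memory if you want the read pass to actually touch the disk.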

 

The other thing could possibly be the SATA cable; make sure it's a SATA 6G cable when you connect it to the system. Also, is the BIOS running in RAID mode at all? (I don't think it would matter, but I always turn it off if it's not needed.)

 

Edit: just checked, you don't have a cache drive, so it's going directly to the parity drive and the drive it's writing data to, so two drives are always being hit.

Edited by brent3000
Link to comment

That's actually a really good idea, will try that now! 

 

I did, sort of, but only until I had to put the machine into use because of storage limitations. It was my mistake not to do a complete preclear there.

 

Tested 4 cables; the drive doesn't show up reliably with any of them. BIOS runs in AHCI, RAID doesn't even work on this mobo ;D

Link to comment

I don't know what country you're in, but check if the drive can be RMA'ed if it's a new one; otherwise, if it's not showing in the BIOS, it's possibly dead.

 

Going by the brand as well: Seagate did have a firmware issue some time ago. One of my drives dropped off the system, I did a firmware update on it, and it was back good as new. Maybe check whether there is any firmware for the drive, else it's RMA time hahah

Link to comment
4 minutes ago, deltaexray said:

has a sound to it, like you would crush metal on metal.

Yes, that's an RMA, that sound is kinda not normal...

 

I would do that first and, in the interim, probably remove it from the array so you can at least use the rest of the system, then just add it back in when it's all normal again hahah

 

But yes, SATA 6G for the parity, and maybe look into getting a new SATA card, because that's going to impact your parity checks, writes etc. long term (unless you're not going for speed). Otherwise a cache drive would also be a handy add-on for speed hahah

Link to comment

I do think so too. 

That's the plan: move the existing data from that drive to the other one, given that the other one is healthy, and then just add the new one back in.

 

Not going for speed; the system doesn't even have a cache drive, just 32GB RAM and a bunch of HDDs. It's only running Gigabit networking, so no need for cache. Works just fine if it works ;D

 

We will see. I'm going to find out if the drive is dead first.

Link to comment

Quick update: 

The system crashed both times while transferring 200GB of data; it doesn't matter if the share is only on Disk 2 or Disk 3. Disk 2 is the one which made noises and doesn't show up in the BIOS after a restart, but now it also crashes while only copying onto Disk 3? What the hell?

On the Windows PC it says: Networked timeout, cannot copy File to Network disk, which makes sense due to the fact that Unraid locks up. 

 

 

I'm just a little confused now

Edited by deltaexray
Link to comment

As you don't have a cache drive then yes, the parity drive will always be written to, as I mentioned above

11 hours ago, brent3000 said:

Edit: just checked, you don't have a cache drive, so it's going directly to the parity drive and the drive it's writing data to, so two drives are always being hit.

If your parity drive is the faulty one, there are two options: replace the drive first and then let it rebuild, or (if there is no data on it worth keeping) rebuild it and use another drive for testing.

 

But at the end of the day, a dead drive needs to go, or it's going to cause issues when being written to. If it's the parity drive, in your current setup it's getting hit every time you write anything to the array.
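Why the parity drive is touched on every write can be shown with a toy model. This is a sketch under the assumption of single XOR parity, with "disks" modelled as short byte strings; the function names are made up and this is not how Unraid is actually implemented internally.

```python
from functools import reduce


def parity_of(blocks):
    """XOR equally sized blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def write_block(disks, parity, disk_index, new_block):
    """Write `new_block` to one data disk and return (new_disks, new_parity).

    The parity is updated from the old parity, the old data and the new data,
    so every write to any data disk also means a write to the parity disk.
    """
    old_block = disks[disk_index]
    new_parity = bytes(p ^ o ^ n for p, o, n in zip(parity, old_block, new_block))
    new_disks = list(disks)
    new_disks[disk_index] = new_block
    return new_disks, new_parity


if __name__ == "__main__":
    disks = [b"\x01\x02", b"\x10\x20"]  # two toy data disks
    parity = parity_of(disks)
    disks, parity = write_block(disks, parity, 1, b"\xff\x00")
    # The incrementally updated parity matches a full recalculation.
    print(parity == parity_of(disks))
```

In this model a write to Disk 2 or Disk 3 always ends with a write to the parity disk as well, which would fit the lock-ups happening no matter which data disk the share lives on.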

  

9 hours ago, deltaexray said:

I am stupid

Don't worry, with a new system even the most tech-savvy people can let the simplest things slip, and Unraid is anything but simple. It's all a fun learning curve hahah

Edited by brent3000
Link to comment

So, based on how I'm progressing with this issue, I'm guessing this is going to be my last post on the topic, because I'm starting to go insane.

 

Turns out, on my main PC all of the drives show up and can be formatted in Windows, regardless of what I do. 

On the Intel system, or to be precise the P8H77-V, 2 drives show up, the third only when I do a restart. So the drives aren't the issue, at least at this point. I boot into Unraid, start the array, start copying files, and it locks up at random points and also at the same points with the same transfers; you can reproduce everything. At this point I'm completely out of ideas about what could work, except buying a completely new mobo, which for this CPU and socket is impossible to get "new" at this date and time.

I have the feeling that one drive drops out while Unraid is running, whether because of the drive itself or the SATA controller of the mobo I don't know, especially given that the drive that doesn't show up in the BIOS on its own is the parity drive of the array.

So, yeah. A day's worth of troubleshooting and here I am. Maybe someone else has an idea.

 

 

Edited by deltaexray
Link to comment

@brent3000

 

I thought so, and it makes sense. I have the feeling the parity drive may be causing all of this but, as mentioned before, I do not know if it's the mobo or the drive, given that the drive shows up in another system.

 

I may do a complete array rebuild tomorrow. But it's always the same drive that is involved in every crash, because it's the parity drive. I'm just not sure if it's the drive or the mobo: it shows up on a different board, but on the one in the server it doesn't show up on the first boot, only on the second. Yet it's the parity drive and therefore always involved in a read/write operation. So I'm kinda thinking that it's the drive, not the mobo. But idk, I'm just confused.

 

 

Link to comment

Things to look at:

 

Are the drives getting enough power? (the scratching is a worry and not normal but maybe if they are on the same power rail split it over two power rails?)

SATA port issues: try a SATA card via PCI or a RAID card to take the onboard SATA out of the equation.

 

If you think it's a mobo issue: with a faulty board it's very rare that it would power on at all. It's more likely the SATA controller if the HDD is not working but all else is fine.

 

The fact it makes a scratching sound in one PC vs another is kinda strange, unless a driver issue, but again it does turn on :/ 

 

Might be worth getting SeaTools and running a check on the drive at the BIOS level, to see if it shows up or reports any issues.

Link to comment

I am running a full extended SMART test overnight on all the drives, to see if that helps or turns up anything new.

 

Thought so myself. I can change it, but there are only 3 drives, each with 9 watts of power consumption, on a Corsair RM650M on one SATA power cable/rail (the cable with 4 connectors on it). There were no issues at all in the past, but I can try it out tomorrow.

 

If anyone has a go-to recommendation for an HBA/RAID card in IT mode, I'll get one. I thought about this before building the machine, to avoid the internal controller entirely, but did not want to spend at least another 100€ on it.

 

It definitely makes a sound, I just do not know if it's a bad one or not. In the other PC, it shows up with the same noise. I got a 6TB Ironwolf in my main rig that also makes noises, which is normal at 7200rpm, and works just fine. 

 

As said, I'm running the extended SMART test within Unraid overnight. I tried the Seagate tool earlier today; it does just the same as what Unraid does, so I skipped it.

 

 

The funny thing is, it all started with lock-ups during a data transfer, and now we are here talking about whether the drive is dead/damaged or the board is the issue. Try me, but if the drive is dead, would it not cause issues all the time?
And just to clarify, I've been a tech guy for over 10 years now, so I know quite a lot, but I'm still new to Unraid :D

Edited by deltaexray
Link to comment
29 minutes ago, deltaexray said:

Try me but if the drive is dead, would it not cause issues all the time? 

Depending on the issue, a drive can seem perfectly normal until it tries to write to or use a bad part of the disk. That's why something like a preclear is handy: it writes to every part of the drive to find bad blocks which might otherwise not come up until some time later.

29 minutes ago, deltaexray said:

And just to clarify, I've been a tech guy for over 10 years now, so I know quite a lot, but I'm still new to Unraid :D

same here :D I have learnt sooo much about unraid and even linux/docker in the last few months while i play with this system hahah

 

33 minutes ago, deltaexray said:

It definitely makes a sound, I just do not know if it's a bad one or not.

Well if you have three of them and only one is making the sound, good indication something is different :) 

 

If you switch the drive/SATA port with one of your other units and the sound goes with it, it's definitely not a cable issue.

 

A possible option before you fully rebuild: run all drives as unassigned devices and transfer data to each drive before you formally build the array, maybe? You have a lot of ports to test with (only 2 are SATA 6G). But if it's a controller issue, then the only option is a card to replace it. There is a hardware recommendation forum that might be handy to post in. I know a lot of people use a mix, but tbh I have only ever used the onboard ports, so I'm no help to you there.

Link to comment

1. I guess that's why it's called troubleshooting. I'm guessing I'm going to preclear that drive; if I can't use the server, I might as well.

 

2. I know that feeling very very well. All of my networking, server, Unraid and so on knowledge is from learning by doing and YouTube/forums. 

 

3. I might as well put it in my main rig and let it start with the system to figure it out. Swapped cables like 4 times now, they all work. PSU works. It can only be a drive or SATA controller issue.

 

I can do that, but I can't write to the parity disk because of format issues, so I can't test it. On the other hand, if the other two work without the parity drive and don't spit out errors while copying, it's clear what it is. I've tested all of the ports on the board, they all work. The parity drive is the only one that doesn't show up while booting, regardless of the port or cable, at least on the first boot; after a restart it's there.

I will look at it, cannot hurt to have one on hand in order to do more/better stuff with it. Usually the onboard ones are enough but in this case? Hell I do not know. 

Link to comment

So update: 

 

I let the extended SMART test run overnight on all drives, and it reported no errors on any of them. So the drives are ruled out for now.
Then I looked at the BIOS and the SATA configuration and noticed that 2 drives show up but the third, the parity drive, does not. So I swapped one of the drives that shows up with the one that doesn't, and? It showed up, and the one that usually shows up didn't. Did the same with the third, and the same thing happened. So I figured it's not a drive issue, it's a motherboard SATA controller issue. When I connect three 1TB drives, all of them show up. With three 8TB drives, only 2 show up. With three 8TB drives and an old 120GB SSD, only two of the 8TB drives show up. It's the board, nothing else makes sense. Everything else is checked.

 

So what did I do now? Bought a "Broadcom 9207-8i SAS2308 6G SATA SAS HBA PCIe x8 3.0 LSI RAID IT Dell 0VGXKD" on eBay. It should arrive on Wednesday, and the cables are brand new from a good retail seller.

We will see how that goes and if it works, if not I'm honestly completely out of ideas. If it does, I guess that's why it's called Troubleshooting. 

I just hope that the HBA won't interfere with the sleep mode setting or with the drive spin down / sleep mode and the spin up respectively because that would cause a lot of other issues. Well, I just hope this works sigh

Link to comment
2 hours ago, deltaexray said:

I guess that's why it's called Troubleshooting

Worst part about tech is the troubleshooting, so many things which could be wrong hahaha

 

Your board does have a funny split, but it's also super old (given the SATA 3G ports on it), and it looks to have a split SATA controller too.

 

GL on the new card keep us posted :) 

Link to comment
