18 QUINTILLION writes to one drive, now it won't mount?



So, I noticed that one of my drives had literally over 18 quintillion writes to it (that's 18 followed by six sets of three zeros; I had to look it up). No errors or anything until I tried to spin it up. Here is a screenshot of the write numbers:

 

[Screenshot: write counts for the drive]

 

 

When I tried to get a look at the SMART data, the indicator in the Main view was green (indicating that the drive had spun up), but when I clicked on Attributes, UnRaid kept telling me that it couldn't show the SMART data until the drive had spun up. Here is a screenshot of the error log right after:

[Screenshot: error log]

 

 

I was seriously confused, so I rebooted the server (using the WebGUI) and when it rebooted, the drive was disabled, and I saw these errors:

[Screenshot: log after the first reboot]

 

 

Just for giggles, I shut the server down and moved the SATA cable from my PCIe expansion card to the motherboard to see if that was the problem. No dice (the drive is still disabled):

[Screenshot: log after the second boot, with the drive on a motherboard port]

 

I had another drive suddenly get disabled previously, and since I wasn't sure what was going on, I replaced it just recently. I'm wondering if this is a similar issue? This also showed up when I checked the array after a parity check. Coincidence?

 

Some more info: the server is an old AMD A8-7650K, watercooled but not overclocked, with 8GB of RAM, 2 parity drives, 9 data drives, 1 cache drive, and an Ableconn 10-port SATA PCIe 2.0 2-lane adapter plugged into the only PCIe 3.0 slot on an ASRock FM2A78M-ITX+ motherboard. I know it's not ideal hardware for a server, but this server is only on the network as a backup file server: I have four machines that back up to it once a week over SMB. It spends most of its life spun down. Oh, and I'm running UnRaid Plus 6.6.7.

How do I narrow down what is going on here and start figuring out what the issue is? Can I try rebuilding onto the same drive, with it plugged into one of the SATA ports on the motherboard instead of the expansion card?

Any help that might get me in the right direction is much appreciated!

 

Thank you,

 

Dave

Edited by EternalFootman

That number is exactly 2 to the 64th power, so it is almost certainly some sort of overflow situation and not actually the number of writes.
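For scale, 2^64 is about 18.4 quintillion, the number of distinct values a 64-bit unsigned integer can hold, so a figure that size is what you get when a counter wraps around rather than when a disk actually does that much I/O. A minimal sketch, assuming (hypothetically; this is not how UnRaid documents its counters) that the displayed write count is the difference between two 64-bit unsigned counter samples, and the counter is reset between samples:

```python
# Sketch: how a difference of two 64-bit unsigned counter samples can wrap.
# Assumption (hypothetical): the displayed count is computed as
# (current_sample - baseline_sample) in unsigned 64-bit arithmetic.

UINT64_RANGE = 2 ** 64  # number of distinct values a 64-bit unsigned integer can hold


def uint64_delta(current: int, baseline: int) -> int:
    """Difference of two counter samples, wrapped like uint64 subtraction."""
    return (current - baseline) % UINT64_RANGE


# Normal case: the counter only went up.
print(uint64_delta(1_500, 1_200))   # 300

# Hypothetical reset: the new sample is smaller than the baseline,
# so the subtraction wraps to a value just below 2**64.
print(uint64_delta(0, 1))           # 18446744073709551615

print(f"2**64 = {UINT64_RANGE:,}")  # 2**64 = 18,446,744,073,709,551,616
```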

 

The drive will stay disabled until you rebuild it.

 

Go to Tools - Diagnostics and attach the complete diagnostics zip file to your next post.


I'm pretty sure that it is my SATA expansion card causing the problem. I originally had this setup in an old HP server, and it failed in a similar way. I thought it might have something to do with the older PCIe technology, because I was getting multiple UDMA CRC errors, but only on the drives attached to the PCIe expansion card. So I moved the whole setup into this AMD computer, and I haven't had any of those issues since, but now I'm still having problems, I guess...?


I've attached the .ZIP of the complete diagnostics. I hope that it is something simple/easily fixed and not the SATA expansion card.

Thank you so much for your help!

 

w-server-diagnostics-20190420-2153.zip


Not related, but your system, appdata, and domains shares should be set cache-prefer, and all their content should be moved to cache. You should deal with that after you get your array stable again.

 

SMART for the disabled disk looks fine. Are there any SMART warnings for any of your other array disks showing in the Dashboard?

 

Unfortunately, the diagnostics were taken after a reboot, so there is no record of what happened to cause the disk to be disabled.

 

You can rebuild to the same disk or rebuild to a spare. The advantage of rebuilding to a spare, even though the original is good, is you can use the original as a backup to recover data if for some reason there is a problem with the rebuild. The disadvantage, of course, is you have to have a spare disk.

 

If your other disks are all healthy and you have valid parity there shouldn't be any problem rebuilding.


The rebuild went successfully. I have set appdata, system, and domains to cache-prefer (why "Prefer" as opposed to "Only"? Just in case the cache drive fails?). What is the best way to move the files? Should I use the terminal or a plugin like unBALANCE? My first instinct is to use the terminal, but I just want to make sure: I'm fairly familiar with the terminal, but not as familiar with UnRaid.

I will also go into the BIOS and change the two SATA ports to AHCI. As far as not using a controller, that really isn't an option right now: I need all the drives and the space, so I need all 12 drives. I've read that a lot of people use SATA expansion controllers, so I'm a little confused by the conflicting information I'm seeing about their usefulness.

As far as other SMART warnings go, no, there are none. About 3 of my drives have UDMA CRC errors that were logged when I had the whole unit in another machine. But since I moved the server out of that machine and into this one, I have not gotten any more SMART errors, specifically those UDMA CRC errors, which I believe came from an incompatibility between that old HP server and the PCIe controller. It may even have been because that server's BIOS was set to IDE as well? I'm not sure.
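One way to confirm the CRC errors really have stopped is to note the raw value of SMART attribute 199 (UDMA_CRC_Error_Count) for each affected drive and compare it later; a value that keeps rising points at cabling or the controller rather than the disk itself. A minimal sketch that pulls that raw value out of the attribute table printed by smartmontools' `smartctl -A /dev/sdX`. It assumes the standard ATA attribute table layout (ID, name, flag, value, worst, threshold, type, updated, when_failed, raw value); the sample text and the function name `udma_crc_raw` are illustrative, not taken from this server.

```python
import re
from typing import Optional


def udma_crc_raw(smartctl_output: str) -> Optional[int]:
    """Return the raw value of SMART attribute 199, or None if it is absent."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with the numeric ID; 199 is UDMA_CRC_Error_Count.
        if len(fields) >= 10 and fields[0] == "199":
            # The raw value is the 10th column; keep only its leading digits.
            match = re.match(r"\d+", fields[9])
            return int(match.group()) if match else None
    return None


# Illustrative sample in the usual smartctl -A layout.
sample = """ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       12
"""
print(udma_crc_raw(sample))  # 12
```

Recording the number once now and again after a few weeks (or after the next parity check) is enough to tell whether the count is still climbing.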

I'll change the BIOS immediately on this machine and then we'll see if it happens again.  Thanks for all your help!

 

Dave


If you turn on Help in the GUI, you will see the difference between Use cache: Only and Use cache: Prefer and how these settings affect mover. Once you have the Use cache settings correct, you can use mover to put the files onto the correct drives.
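As a rough illustration of what that Help text describes, here is a simplified model of which way mover sends a file under each Use cache setting. It is a sketch, not UnRaid's mover code: the names `UseCache` and `mover_target` are hypothetical, and it ignores free-space checks and split levels. It does include one rule worth knowing: mover skips files that are currently open.

```python
from enum import Enum
from typing import Optional


class UseCache(Enum):
    NO = "no"
    YES = "yes"
    ONLY = "only"
    PREFER = "prefer"


def mover_target(setting: UseCache, on_cache: bool, file_in_use: bool) -> Optional[str]:
    """Where mover would send a file ("array" or "cache"), or None to leave it.

    Simplified model: ignores free-space checks and split-level rules.
    """
    if file_in_use:
        return None  # open files are skipped (e.g. docker.img while Docker runs)
    if setting is UseCache.YES and on_cache:
        return "array"  # Yes: new files land on cache, mover moves them to the array
    if setting is UseCache.PREFER and not on_cache:
        return "cache"  # Prefer: mover pulls files back from the array to cache
    # No and Only shares are never moved; files already where they belong stay put.
    return None


print(mover_target(UseCache.PREFER, on_cache=False, file_in_use=False))  # cache
print(mover_target(UseCache.ONLY, on_cache=False, file_in_use=False))    # None
```

The practical difference for this thread: a share set to Only that already has files on the array stays split, while Prefer tells mover to gather those files onto the cache.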

Edited by itimpi
2 hours ago, EternalFootman said:

as far as not using a controller, that really isn't an option right now - I need all the drives and the space, so I need all 12 drives. I've read that a lot of people use expansion SATA controllers, so I'm a little confused as to the differing data that I am seeing on their usefulness?

He's talking about your specific controller and the way it is implemented. Instead of using a separate SATA controller port for each SATA connector, your adapter uses what's called a port multiplier, which "time-shares" one controller port across multiple physical ports to reduce the cost of the board.
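Sharing shows up on the host side too: all ten ports on this card go through one PCIe 2.0 x2 link, and a PCIe 2.0 card negotiates 2.0 speeds even in a 3.0 slot. A back-of-the-envelope sketch, assuming 5 GT/s per lane with 8b/10b encoding (10 bits on the wire per data byte) and ignoring protocol overhead, of the most each drive can get when all of them stream at once, as in a parity check or rebuild. The result is an upper bound, and the port multiplier adds further sharing on the SATA side.

```python
def per_drive_mb_s(lanes: int, gt_per_s: float, active_drives: int) -> float:
    """Upper bound on host bandwidth per drive when all drives stream at once."""
    # 8b/10b encoding: each data byte takes 10 bits on the wire,
    # so usable MB/s per lane = (GT/s * 1000) / 10.
    lane_mb_s = gt_per_s * 1000 / 10
    return lanes * lane_mb_s / active_drives


# The Ableconn card from this thread: PCIe 2.0 (5 GT/s), 2 lanes, 10 drives busy.
print(per_drive_mb_s(lanes=2, gt_per_s=5.0, active_drives=10))  # 100.0
```

About 100 MB/s per drive at best is below what many modern hard drives can stream sequentially, before any port multiplier overhead is counted.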

You would need to get a different HBA card, preferably an LSI chipset based model. There are a bazillion threads on here about disk controllers and what works vs what doesn't.

 

Unfortunately, almost all of the low-cost, high-port-count SATA cards have issues when you try to use them in a modern Linux-based server. Avoid Marvell chipsets and any card with a SATA port multiplier.


Ahhh. Thank you for explaining that. Now I understand! I have one last question, I think. I set my system share to "Prefer" and invoked the mover (I didn't know the mover worked quite like that!). All the other shares moved like they were supposed to, but the system share is still in two places, and there is a HUGE disparity in size. Any thoughts/ideas as to what I should do?

 

[Screenshot: locations of the system share on disk 8 and the cache drive]

 

There is a docker.img and a different libvirt.img in Disk 8...  I do not have any Dockers (although I did previously, so I'm thinking this is an old file), and on the cache drive there is nothing in the Docker folder.

The docker.img file on Disk 8 has a date of April 20, so the mover didn't move the Docker image? But I also don't have any Docker apps, so that's kinda weird?

The date of the libvirt.img file on disk 8 is yesterday, but there is a libvirt.img on the cache drive that has today's date. 

Should I just delete both of the system files that are on disk 8?

 

Edited by EternalFootman

Mover can't move open files. Even if you don't have any dockers, if the Docker service is enabled, mover won't be able to move the docker image because it is in use by the Docker service. Disable the Docker service at Settings - Docker and run mover again.

 

Same applies to the VM Manager. It must be disabled for libvirt.img to be moved.
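After re-running mover, it is worth checking whether anything from the share is still left on an array disk before deleting files by hand. A minimal sketch, assuming UnRaid's usual layout where each array disk is mounted at /mnt/diskN and the cache at /mnt/cache; the function name `share_locations` and its `root` parameter are illustrative.

```python
from pathlib import Path
from typing import Dict, List


def share_locations(share: str, root: str = "/mnt") -> Dict[str, List[str]]:
    """Map each disk/cache mount that holds files of `share` to those files."""
    found: Dict[str, List[str]] = {}
    base = Path(root)
    # Array disks are mounted as disk1, disk2, ...; the cache pool as "cache".
    mounts = sorted(base.glob("disk[0-9]*")) + [base / "cache"]
    for mount in mounts:
        share_dir = mount / share
        if not share_dir.is_dir():
            continue
        # Paths are reported relative to the mount, e.g. "system/docker/docker.img".
        files = sorted(
            str(p.relative_to(mount)) for p in share_dir.rglob("*") if p.is_file()
        )
        if files:
            found[mount.name] = files
    return found


if __name__ == "__main__":
    for mount, files in share_locations("system").items():
        print(mount, files)
```

If the output still lists disk8 after Docker and the VM Manager are disabled and mover has run, that is the moment to look more closely before removing anything.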
