
All drives 100%, CRC errors - oh my!


TyantA


  • I knew my server was getting full. 
  • A few weeks back I hit 100% on a few disks. I made notes that disks 5 & 7 suddenly had CRC errors. 
  • Every parity check since then (it runs every week) has shown 0 errors. 
  • This past weekend I ordered several new drives (2x 8TB, 2x 4TB) 
  • Previously, my largest drive was 6TB. 

 

Today I go to check the status of my drives and on the Dashboard now drives 7 & 8 are the ones with exclamation points. Green thumbs up for the rest. Weird. 

  • 7 says: Error count 878
  • 8 says: Reported uncorrect: 1

 

I don't have this data backed up elsewhere. Part of getting the new drives was to free up drives for a 2nd backup build which is otherwise ready to come together. 

 

  1. I think I'm leaving the 6 > 8TB parity upgrade for last until I know my data is safe. 
  2. I don't think I have enough *good* drives lying around to build my backup server, back everything up, then proceed. However, if that's the best way forward I can try and make that happen. 
  3. Given the no-backup situation, I'm hesitant to run a full SMART diagnostic on disk 5, 7 or 8. Should I be? 
  4. Finally, is it possible that the CRC errors are happening because there is literally no room left on some of these drives? Disk 9 has 549KB free, for example. 

 

I'm running 6.5.3 currently. I have attached my diagnostics. 

unraid-diagnostics-20181129-2003.zip

Link to comment

No sooner did I post this than disk 5 went back to showing UDMA CRC error count: 16. I don't believe this number has changed. 

 

I just dug up the screenshot I took when I first was notified about the errors. 

 

Disk 5: 

  • 21.10.2018 22:07 - udma crc error count is 12
  • 21.10.2018 22:13 - udma crc error count is 16 ... where it stays until today. This disk currently has 19.5GB free; shows 100% full. I don't think anything has been written to it since this time.

Disk 7: 

  • 21.10.2018 03:42 - udma crc error count is 11
  • 21.10.2018 22:02 - udma crc error count is 178

  • 21.10.2018 03:07 - udma crc error count is 878 ... where it stays until today. This disk currently has 1.21GB free; shows 100% full. 

The date / time lines up with my scheduled parity check. 

 

I guess the question is, what should the order of operations be? Does it matter? 

 

My original plan was to: 

  1. Put my new Toshiba X300 8TB drive in place of the 6TB which is now about a year old.
  2. Replace the smallest media disks (6 & 9 which are 2TB) with the former 6TB parity & other new 8TB, respectively.
  3. Replace the 2TB disk 4 data drive with a 4TB Red and then one of the other 3TB data disks with the other. 

But that plan doesn't address disk 5 (which I'm hoping is "fine" if it's given a bit of breathing room), disk 7, which seems to be faring the worst, or disk 8. Is that "uncorrect 1" cause for concern? 

 

I guess I could always put that last new 4TB Red in place of disk 7 if need be. 

Link to comment

First off, parity is not a substitute for backups. You don't have to back up everything, but you must have another copy of anything important and irreplaceable on another system. Maybe you do, but if not, this should have been your practice long before even thinking you might like to have a backup server.

 

Your diagnostics say 6.5.3, not 6.3.5 as you stated.

 

CRC errors are typically caused by connection problems, not drive issues. You can acknowledge the warning by clicking on it and it will warn you again if it increases.
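
As a rough illustration of that acknowledge-then-warn-again behavior, here is a minimal Python sketch. It is not Unraid's actual code: the function name `crc_warnings` and its structure are assumptions made up for this example, and the readings are the disk 5 values posted earlier in the thread. It relies on the UDMA CRC error count being a cumulative counter, so only an increase over the acknowledged value is worth a new warning.

```python
def crc_warnings(readings, acknowledged=0):
    """Return (timestamp, count, increase) for each reading above the baseline.

    Hypothetical helper for illustration only; not part of Unraid.
    """
    warnings = []
    baseline = acknowledged
    for timestamp, count in readings:
        if count > baseline:
            warnings.append((timestamp, count, count - baseline))
            # Assumption: each warning is acknowledged before the next reading.
            baseline = count
    return warnings


# Disk 5 readings as reported in this thread.
disk5 = [
    ("21.10.2018 22:07", 12),
    ("21.10.2018 22:13", 16),
    ("today", 16),
]

print(crc_warnings(disk5))                   # warnings from a fresh baseline
print(crc_warnings(disk5, acknowledged=16))  # nothing new once 16 is acknowledged
```

With the count acknowledged at 16, the second call returns an empty list, which matches a disk whose count has not moved since the original errors.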

 

Your complete lack of any free space seems your most critical issue and I would quit writing to the server until this is alleviated.

 

I didn't look at SMART for all your disks, just those you mentioned, and they don't look like they are about to fail. Are you getting warnings for any of the others?

 

Your replacement plan looks good to me. Replacing the smallest disks is a good way to get the most added capacity.
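
To put a number on it, here is a small Python sketch that totals the data capacity the plan would add. The sizes are the nominal TB figures from the plan in the first posts; the helper name `added_data_capacity` is just something made up for this example, and the parity swap is excluded because parity adds no usable space.

```python
# (slot, old size in TB, new size in TB), nominal sizes taken from the plan.
swaps = [
    ("parity", 6, 8),           # new 8TB replaces the 6TB parity drive
    ("disk 6", 2, 6),           # former 6TB parity replaces a 2TB disk
    ("disk 9", 2, 8),           # other new 8TB replaces a 2TB disk
    ("disk 4", 2, 4),           # 4TB Red replaces a 2TB disk
    ("a 3TB data disk", 3, 4),  # other 4TB Red replaces a 3TB disk
]


def added_data_capacity(swaps):
    """Sum the size increase over data disks only (hypothetical helper)."""
    return sum(new - old for slot, old, new in swaps if slot != "parity")


print(added_data_capacity(swaps), "TB of added data capacity")
```

Under those assumptions the plan adds 13TB of data capacity, most of it from replacing the two 2TB media disks.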

Link to comment

Haha, how did I know I'd be in for a scolding :) I "know" parity is no substitute, but now I've finally opened my wallet and acted on that. Sure, I have my most important data strewn across several OneDrive accounts, multiple systems, external drives, etc., but my goal is to take a much more systematic and thorough approach, hopefully pre-2019.

 

Re: version, corrected. I was going by memory and have oft wondered about dyslexia! 

 

Re: CRC errors, that's the impression I've been getting, and hoping is true.

 

Re: free space, yep, that's why I picked up over 26TB of storage to throw at the problem. It's not all for the server, but it's all to help put the "systematic and thorough storage plan" in place. 

 

No warnings for others. I'll accept the warnings and proceed with caution. I'm in the process of setting up the old hardware that will be the basis for my backup server, open testbed-style, with the goal of running pre-clear(s) there. 

 

Is there any reason NOT to run more than one pre-clear at a time? I'd love to have all 4 new drives pre-clearing at once to save time. I think there's an old Athlon X2 6000+ in there at the moment (which I'll swap out for a lower wattage chip). 

 

Thanks for your reply!

Link to comment
2 minutes ago, TyantA said:

I'm in the process of setting up the old hardware that will be the basis for my backup server, open testbed-style, with the goal of running pre-clear(s) there. 

 

Is there any reason NOT to run more than one pre-clear at a time? I'd love to have all 4 new drives pre-clearing at once to save time. I think there's an old Athlon X2 6000+ in there at the moment (which I'll swap out for a lower wattage chip). 

I would say try it and see since this system doesn't have any of your data on it.

Link to comment

I'm kinda stumped. 

 

Got the drives hooked up, grabbed a copy of the latest Unraid, and used the installer to create a key with no problem. I got the BIOS to boot from USB, but I'm stuck with a kernel panic. 

 

"please append a correct "root=" boot option"

 

I feel like I've run into this before but I'm drawing a blank and search isn't proving too helpful at this hour. I came up with this thread which basically says "start over". I feel like there's a simple config setting I can change for compatibility with this Asus M3N78 Pro board.

 

Edit: the problem I was thinking about was not the same, although it too was on older hardware. (In that case I added append initrd=/bzroot noapic which got me up and running - this thread from another build).  

Link to comment

A word of caution... I believe that memory is one preclear requirement that some folks don't think about. I am not sure what the requirement is per HD, but it is not insignificant! It sounds like you are using an old system, so be aware that you could have an issue there. Also make sure you have adequate airflow over your HDs if you have the case side off. One or two drives should not be an issue, but when you start talking four or more, you need to think about heat problems and airflow. 

Link to comment
1 minute ago, Frank1940 said:

A word of caution... I believe that memory is one preclear requirement that some folks don't think about. I am not sure what the requirement is per HD, but it is not insignificant! It sounds like you are using an old system, so be aware that you could have an issue there. Also make sure you have adequate airflow over your HDs if you have the case side off. One or two drives should not be an issue, but when you start talking four or more, you need to think about heat problems and airflow. 

Good points. There's only going to be 1GB of RAM in this backup server, but I could easily make that 4-8GB for the duration. And yes, good reminder - I should bring down my fan to keep the drives happy... once I can get the system to boot! 

Link to comment
4 minutes ago, trurl said:

I would go with at least 4GB for Unraid v6 even if it is only going to be a NAS. As soon as you have a problem with less than that increasing RAM is going to be a first recommendation.

Interesting. Its only purpose will be to be fired up once a month or so to copy vital data from the main server. I had 4x 256MB sticks I was planning to use (because where else will they be used!?) but I have 4x 512MB sticks I could potentially stick in there instead. 

 

Of course, I could pick up some other old memory if 4GB is the sweet spot even for this minimal use case. 

 

So on my testbed, I temporarily swapped out the 1GB for 8GB of RAM and stuck in another USB key that I had also set up. The battery on the motherboard is dead, so I went through and changed only the necessary settings this time, like SATA set to AHCI etc. I chose GUI mode and now I'm staring at the Unraid logo asking for a username. 

 

I don't know if it was the RAM, the BIOS settings or the key... but I guess this is progress! Since the night is wearing on and my only goal is to get this preclear going, I'll take it.

Link to comment
4 hours ago, TyantA said:

"please append a correct "root=" boot option"

Try this; it worked for me for an identical error. On the flash drive, edit syslinux/syslinux.cfg and add root=sda to the boot option you're using, after initrd=/bzroot, e.g.:

 

label Unraid OS
   menu default
   kernel /bzimage
   append initrd=/bzroot root=sda
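
If you'd rather script that edit than make it by hand, here is a minimal Python sketch of the same change. Treat it as an illustration only: the helper name `add_root_option` and the second label in the sample text are hypothetical, and it assumes the file uses the simple label/append layout shown above. It works on the file's text, so you would still need to read and write syslinux/syslinux.cfg yourself (and keep a copy of the original).

```python
def add_root_option(cfg_text, label="Unraid OS", root="sda"):
    """Add root=<root> to the 'append' line of one label, if it is missing."""
    out = []
    in_label = False
    for line in cfg_text.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith("label "):
            # Track whether we are inside the boot entry we want to change.
            in_label = stripped[len("label "):].strip() == label
        elif in_label and stripped.startswith("append "):
            has_root = any(tok.startswith("root=") for tok in stripped.split())
            if not has_root:
                line = line.rstrip() + f" root={root}"
        out.append(line)
    return "\n".join(out) + ("\n" if cfg_text.endswith("\n") else "")


# Sample text; "Other entry" is a made-up label to show it is left alone.
example = """label Unraid OS
   menu default
   kernel /bzimage
   append initrd=/bzroot
label Other entry
   kernel /bzimage
   append initrd=/bzroot
"""

print(add_root_option(example))
```

Only the entry named by `label` is touched, and running it a second time changes nothing because the line already carries a root= option.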

 

Link to comment
6 hours ago, johnnie.black said:

Try this; it worked for me for an identical error. On the flash drive, edit syslinux/syslinux.cfg and add root=sda to the boot option you're using, after initrd=/bzroot, e.g.:

 


label Unraid OS
   menu default
   kernel /bzimage
   append initrd=/bzroot root=sda

 

Thanks johnnie! Once these drives are precleared, I'll pop the other USB key back in with that change and see if that helps. 

Link to comment

So that old system is currently pre-clearing 3 drives which are at the zeroing stage. (well, the 2 4TB are, the 8TB is close). 

 

I started one preclear and waited to see the read speeds. They were around ~180MB/s on a WD Red 4TB. After about 10 minutes I fired up the second. Immediately I noticed the CPU fan kick up a fair bit. I flipped over to the dashboard and noticed pretty much 100% CPU usage. To think, that used to be a flagship CPU... 

 

The two WD Reds stayed around the 175MB/s mark. The Toshiba started off around 225MB/s and this morning is still going around 210MB/s. 

 

The two WD Reds are currently zeroing at 180MB/s. So all in all, while the CPU is still pegged (I have a household fan going on the whole open setup) it - and the preclear plugin - are handling the task admirably. 
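
For a rough sense of how long each stage takes at those speeds, here is a back-of-the-envelope Python sketch. It assumes decimal units (1TB = 10^12 bytes, 1MB = 10^6 bytes) and a constant transfer rate, which is optimistic since drives slow down toward the end of the disk; `hours_per_pass` is just a name made up for this example.

```python
def hours_per_pass(capacity_tb, speed_mb_s):
    """Hours to read or write the whole drive once at a constant speed."""
    capacity_bytes = capacity_tb * 1e12
    seconds = capacity_bytes / (speed_mb_s * 1e6)
    return seconds / 3600


# Speeds are the figures observed in this post.
for name, tb, speed in [("WD Red 4TB", 4, 180), ("Toshiba X300 8TB", 8, 210)]:
    print(f"{name}: about {hours_per_pass(tb, speed):.1f} h per pass")
```

That works out to a little over 6 hours per pass for a 4TB Red and about 10.6 hours for the 8TB Toshiba, so the real figures will be somewhat higher than that.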

 

Oh, a note on memory usage: when the first preclear started, it was using 24% of the memory. This morning it's up to 29%. After seeing this, I'm leaning more towards using my 2GB setup when I finish the build for that little extra headroom - just in case I use this system for future pre-clears.

Link to comment

By the way, 2GB is now the minimum amount of RAM for any Unraid system. Be careful if you run anything beyond a pure base NAS with only that much RAM! The first sign of a problem usually occurs when you try to update the Unraid version using the GUI. (The entire operation occurs on the RAM disk, not a physical storage device like a flash drive or hard drive.) 

Link to comment
1 minute ago, Frank1940 said:

By the way, 2GB is now the minimum amount of RAM for any Unraid system. Be careful if you run anything beyond a pure base NAS with only that much RAM! The first sign of a problem usually occurs when you try to update the Unraid version using the GUI. (The entire operation occurs on the RAM disk, not a physical storage device like a flash drive or hard drive.) 

Didn't realize that. Used to the good ol' days of 4.x where 1GB was more than enough! 2GB it will be then, for this barebones setup. 

 

Curious, the SMART data doesn't seem to show. There are no drive temps, and before I started the pre-clear, I tried to run a short self-test and the button didn't seem to do anything. No data displayed. I guess that's not really going to be helpful in identifying whether a drive is having issues or not... unless it does in fact show in the logs. I guess I can check there.

Link to comment

So preclear is all done. Unfortunately, with no SMART details I can't be sure the drives haven't exhibited any signs of failure. I'm tempted to throw caution to the wind and move ahead with these drives anyway vs. starting again. (It's possible I forgot to enable SMART in the BIOS.) 

 

The next thought I had was my PERC H310 - I don't even know if the thing will recognize > 4TB drives? I haven't been able to dig up any affirmative answer. Worst case, I keep the smaller drives on it, but it would be nice to know for sure. 

Link to comment
18 minutes ago, TyantA said:

The next thought I had was my PERC H310 - I don't even know if the thing will recognize > 4TB drives?

It works with all drive capacities. A lot of people use them in Unraid, especially now that the SASLP-MV8 and SAS2LP-MV8 can't be used reliably.

Link to comment

Archived

This topic is now archived and is closed to further replies.
