
Unraid n00b having disk problems on first build


Hi Guys,

 

Total Unraid n00b here having some unexpected issues with a new server build.

 

 

The Back Story

 

I have built a new server following https://blog.briancmoses.com/2020/11/diy-nas-2020-edition.html?r=current-nas. I get that this is overkill for a NAS but I want a machine to manage my content and provide me a play-pen environment for docker and VM hosting.

 

I think I was probably too keen to move from the hardware build to software. I used MemTest86 to give the hardware a burn-in. This failed a couple of times, with the server shutting down, but the third attempt (and a couple of subsequent attempts) completed the 4-pass cycle without error.

 

At this point I moved on to setting up Unraid. I have 5 brand new 12TB IronWolf drives that I assigned as parity and data disks 1-4. I have an older 12TB SAS drive that is unassigned as a spare, and two 240GB NVMe drives paired as the cache. So far so good, or at least it was until the box fell over during the first parity build. I restarted it and 12+ hours later it completed successfully.

 

My content currently lives on an old Drobo iSCSI SAN and I am keen to get it onto Unraid and retire the Drobo. I set up shares and used a Windows PC to sync data to the Unraid share using Microsoft's SyncToy app. After several hours the new server fell over again, seemingly a hardware shutdown rather than anything software related.

 

I bought a copy of PassMark BurnInTest, created a bootable USB and let it loose testing CPU, RAM and disk, and sure enough stressing it caused the box to do what I assume is a thermal shutdown. On inspecting the BIOS defaults, I found the CPU and chassis fan settings were on "standard", which limited the fans to a maximum of 50% of full speed. Changing this to "performance" for all fans increased the noise, but meant the fans were running in the 50% to 100% range, depending on temperature. With these changes the burn-in tests completed successfully with CPU, memory and disk stressed to 90%. Cranking beyond 90% for the 10 hours or so that the test runs does cause a shutdown, but I can live with this as it's never going to hit that level of sustained stress. As I figured I'd solved the shutdown problem, I swapped the USB drive and fired up Unraid.

 

At this point it wants to re-calc parity and I figure it can do this whilst I continue to upload content to it. This all seems to be working fine, albeit slower than I'd like, but I think that's more to do with the source than the destination.

 

The issue.

 

So I am occasionally monitoring the progress of the data upload to my new Unraid server and I notice that the first data disk (the only one being written to, as I've not hit the first switching point on my high-water share) has got a red cross against it and is being emulated. There's a whole bunch of "Raw read error rate" and "Seek error rate" errors accumulated - big numbers.
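For anyone wondering why only disk 1 was being written to: a minimal sketch of how high-water allocation is usually described, assuming the mark starts at half the size of the largest data disk and is halved whenever no disk has more free space than the mark. The function name `pick_disk_high_water`, the GB units and the strict comparison are assumptions for illustration, not Unraid's actual code.

```python
def pick_disk_high_water(free_space, largest_disk_size):
    """Return the index of the data disk the next write would land on.

    free_space: free space per data disk, in disk order (any integer unit).
    largest_disk_size: size of the largest data disk, in the same unit.
    Returns None if no disk qualifies (simplified: ignores minimum-free settings).
    """
    mark = largest_disk_size // 2  # assumed starting high-water mark
    while mark > 0:
        # The lowest-numbered disk with free space above the mark wins.
        for index, free in enumerate(free_space):
            if free > mark:
                return index
        mark //= 2  # nothing qualified, so lower the mark and try again
    return None


# Four empty 12000 GB data disks: everything goes to the first disk.
print(pick_disk_high_water([12000, 12000, 12000, 12000], 12000))
# Once the first disk drops below the 6000 GB mark, writes move to the second.
print(pick_disk_high_water([5900, 12000, 12000, 12000], 12000))
```

Under this model a single disk takes every write until its free space falls to the mark, which would fit disk 1 being the only disk in use so far.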

 

I have read that this is possibly due to the connectivity path between the disk and the motherboard, rather than a disk fault, so today I have unplugged and re-plugged the HBA card, the power and data cables and the disks themselves. I restarted but still have the issue.

 

I think I probably have a couple of choices to get back to a happy place and restart the data migration.

 

  1. Replace the data drive in the array with my spare 12TB SAS drive.
  2. Remove the drive and shrink the array

 

I can do either as I don't need the full capacity up front. I think option 2 would be my preference, as option 1 introduces an older SAS drive where all the other drives are new SATA drives.

 

I attach the diags

 

Currently all my content is safe as it's been a copy, not a move, from the Drobo. I could if necessary blat the box and start again, and all it will have cost me is the time taken to re-copy the circa 6.5TB I've already migrated.

 

Any thoughts or guidance from the collective mind out there would be greatly appreciated. 

 

TIA guys ☺️

 

 

  

unraid01-diagnostics-20210303-1232.zip

Edited by JPanda
typos
1 minute ago, JorgeB said:

Diags are after rebooting, but the disk looks fine, likely a connection/power problem.

 

Those are normal for Seagate drives.
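As background on why the big numbers can be normal: a small sketch, assuming the commonly reported community interpretation that Seagate packs two counters into the 48-bit raw value of "Raw read error rate" and "Seek error rate", with the upper 16 bits being the error count and the lower 32 bits the number of operations. That layout is an assumption, not something confirmed in this thread, and `split_seagate_raw` is a hypothetical helper name.

```python
def split_seagate_raw(raw_value):
    """Split a 48-bit SMART raw value into (errors, operations).

    Assumes the upper 16 bits hold the error count and the lower
    32 bits hold the operation count (community interpretation).
    """
    errors = (raw_value >> 32) & 0xFFFF
    operations = raw_value & 0xFFFFFFFF
    return errors, operations


# A raw value of 200000000 looks alarming but decodes to zero errors.
print(split_seagate_raw(200000000))
# An illustrative value with 3 in the upper bits would mean 3 real errors.
print(split_seagate_raw((3 << 32) | 5))
```

If that reading holds, a large raw number on its own mostly reflects how much work the drive has done.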

Thanks for the info, but disk 1 still has a red cross, so I think I still need to drop it out of the array to get back to healthy, unless there's a better option I've missed.  I attach some diags from yesterday, before I shut the machine down to reseat components.

 

Thanks again.

unraid01-diagnostics-20210302-1631.zip


A disk gets disabled (red cross) if a write to it fails for any reason. The write failure can be a genuine disk issue or a transient issue such as cabling or a power glitch. The only way to clear the disabled state is to rebuild the disk (to either itself or a spare drive). The steps to follow to achieve this are covered here in the online documentation, accessible via the ‘Manual’ link at the bottom of the Unraid GUI.
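To narrow down what triggered the failed write, the diagnostics taken before the reboot can be searched for error lines. A rough sketch, assuming the zip contains a file with "syslog" in its name; the patterns in `SUSPECT` are illustrative guesses at what a cabling or power problem might log, not an official list, and `suspect_lines` is a hypothetical helper.

```python
import re
import zipfile

# Illustrative patterns only; adjust to whatever the log actually contains.
SUSPECT = re.compile(r"I/O error|link reset|write error", re.IGNORECASE)


def suspect_lines(zip_source):
    """Return (member_name, line) pairs from syslog-like files in a zip.

    zip_source may be a path or a file-like object.
    """
    hits = []
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if "syslog" not in name.lower():
                continue  # skip everything that is not a syslog file
            text = archive.read(name).decode("utf-8", errors="replace")
            for line in text.splitlines():
                if SUSPECT.search(line):
                    hits.append((name, line))
    return hits
```

For example, `suspect_lines("unraid01-diagnostics-20210302-1631.zip")` would list any matching lines from the pre-reboot diagnostics, which matters here because the syslog starts afresh after a reboot.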

 

In your case, since you have a spare drive, you can go either route. I would be tempted to use the spare drive, as this would be good practice for what to do if you have a future drive failure, while keeping the contents of the current ‘disabled’ drive intact just in case any problem arises building onto the older drive. If the rebuild finishes with no problem you can then put the disabled drive through some tests using either the preclear plugin/docker or the manufacturer's test software. That would give you a good indication of whether it was a genuine disk problem that caused the disk to be disabled, making it an RMA candidate, or a transient glitch, in which case you just keep the drive as the new spare.


That's great - thanks for the info.

 

I've pre-cleared the spare, just to try to get some confidence that it will be good. The pre-clear failed, claiming it had an issue with the MBR, and the disk also disappeared from the web UI, although it magically came back after the box was shut down and restarted to allow me to re-plug all the components. I was then able to format it without issue.

 

I'll follow your steer and use this drive to replace the disabled drive, and rebuild the array as per the documented process.

 

Thanks again 😊 

