Several disks got read errors after rebuilding one data disk (SOLVED)


Hi!

I have an Unraid setup consisting of 1 parity disk, 1 cache disk and 12 data disks.

One of the data disks (4 TB WD Red) had read errors, so I decided to replace it with a new disk (6 TB HGST) and rebuild.

Everything looked good for 10 hours, but then I started to get error messages saying that several disks had read errors.

When the rebuild was done I had 4 disks with read errors and when I got home from work one of them (8 TB WD Red) was disabled and content emulated.

 

I have used Unraid for several years, changed hardware several times and changed disks before without problems.

 

What do I do now? Any suggestions would be appreciated.

tower-diagnostics-20190415-1818.zip

Unraid disk errors_small.PNG


Problem with the HBA:

Apr 15 11:02:59 Tower kernel: mpt2sas_cm1: SAS host is non-operational !!!!

Make sure it's well seated and sufficiently cooled. Also update the firmware to the latest release: you're on p15, and the current one is p20.00.07.00. After that you'll need to force a rebuild of disk7, since the current one will be corrupt. You can do that with the invalid slot command, assuming the rest of the disks are OK.
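If you want to look for the same fault in your own logs, here is a minimal sketch. It assumes you have a syslog file to search (on a running server that is typically /var/log/syslog, which is an assumption about your setup; in a diagnostics zip the path will differ). The function name find_hba_faults is just an illustrative placeholder.

```shell
# Print every line in which an mpt2sas controller reports that it has
# gone non-operational, like the Apr 15 11:02:59 line quoted above.
# $1 is the path to a syslog file (hypothetical; use your own copy).
find_hba_faults() {
    grep -E 'mpt2sas_cm[0-9]+: SAS host is non-operational' "$1"
}
```

For example, `find_hba_faults /var/log/syslog` prints the matching lines, or nothing at all if the controller never reported the fault in that log.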


Thanks a lot.

I forgot to close the side of the chassis so it probably was overheated while doing the rebuild.

So the safe thing would be:

1. Shut down Unraid

2. Make sure it's cooled down

3. Make sure the controller and cables are well seated

4. Update the firmware

Reboot and ask the IT gods for mercy?

 


I didn't find a good guide to upgrade firmware (after a bit more looking I did, but still) and I don't feel confident enough to do it (I know I suck at this). So I decided to cool down the HBA and then start it again.

Attached are the new diagnostics after the reboot. To my limited knowledge SMART seems fine. No read errors or reallocated sectors.

Disk 2 is still disabled though. How would I go about bringing it online again?

And how should I force a rebuild of disk7? Would the guide I found here work?

 

 

tower-diagnostics-20190415-2130.zip

11 hours ago, ChristerB said:

To my limited knowledge SMART seems fine.

It does.

 

To force a rebuild of disk7 see below, but it might be a good idea to use a new spare disk: though the previous rebuild is a little (or a lot) corrupt, it did run for several hours before the errors, so some of the data on it will be fine in case something else goes wrong with this attempt.

 

-Tools -> New Config -> Retain current configuration: All -> Apply
-Assign any missing disk(s) including a new disk7 (or use the old one if you want)
-Important - After checking the assignments leave the browser on that page, the "Main" page.

-Open an SSH session/use the console and type (don't copy/paste directly from the forum, as sometimes it can insert extra characters):

mdcmd set invalidslot 7 29

-Back in the GUI, without refreshing the page, just start the array. Do not check the "parity is already valid" box. The GUI will still show that data on the parity disk(s) will be overwritten; this is normal, as it doesn't account for the invalid slot command, but they won't be overwritten as long as the procedure was done correctly. disk7 will start rebuilding. The disk should mount immediately, but if it's unmountable don't format it: wait for the rebuild to finish and then run a filesystem check.
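On the warning about copy/paste inserting extra characters: if you do paste the command anyway, one rough way to check the text before running it is sketched below. This is only an illustration; the helper name is_plain_ascii is made up for this example and is not part of Unraid or mdcmd, and the check itself never runs mdcmd.

```shell
# Succeeds (exit 0) when the string in $1 contains only printable ASCII
# (space through tilde). Anything else, such as a non-breaking space or
# a tab picked up from a web page, makes it fail (exit 1).
# is_plain_ascii is a hypothetical helper name.
is_plain_ascii() {
    if printf '%s' "$1" | LC_ALL=C grep -q '[^ -~]'; then
        return 1
    fi
    return 0
}
```

For example, put the pasted text in a variable first, `cmd='mdcmd set invalidslot 7 29'`, then run `is_plain_ascii "$cmd" && echo "looks clean"`. Typing the command by hand is still the safer route.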

 

11 hours ago, ChristerB said:

I didn't find a good guide to upgrade firmware

It's very simple: just boot with a DOS flash drive, or use a Windows PC, and download the firmware package. Then, to update:

 

using DOS

sas2flsh.exe -o -f firmware.bin

using Windows

sas2flash.exe -o -f firmware.bin
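Before and after flashing it can help to confirm that the installed firmware really is older than (or now matches) the package version. Below is a minimal sketch of such a comparison, assuming versions are written as dot-separated numbers with an optional leading "p", like the p15 and p20.00.07.00 mentioned earlier in the thread. The function name fw_is_older is a placeholder of mine; how you read the installed version from the card is up to you.

```shell
# Succeeds (exit 0) when firmware version $1 is older than version $2.
# Versions look like "p15" or "p20.00.07.00"; missing fields count as 0.
# fw_is_older and the fw_* variables are illustrative names only.
fw_is_older() {
    fw_a=${1#[pP]}
    fw_b=${2#[pP]}
    while [ -n "$fw_a" ] || [ -n "$fw_b" ]; do
        # Take the leading field of each version.
        fw_fa=${fw_a%%.*}
        fw_fb=${fw_b%%.*}
        # Drop that field; when no dot is left, nothing remains.
        case $fw_a in *.*) fw_a=${fw_a#*.} ;; *) fw_a= ;; esac
        case $fw_b in *.*) fw_b=${fw_b#*.} ;; *) fw_b= ;; esac
        fw_fa=${fw_fa:-0}
        fw_fb=${fw_fb:-0}
        [ "$fw_fa" -lt "$fw_fb" ] && return 0
        [ "$fw_fa" -gt "$fw_fb" ] && return 1
    done
    return 1   # equal versions are not "older"
}
```

For example, `fw_is_older p15 p20.00.07.00 && echo "update available"` prints the message, while comparing p20.00.07.00 against itself prints nothing.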

 


@johnnie.black You are truly a Hero. Just got home and my Unraid is back to normal.

Received the confirmation email 45 min ago. 

Event: Unraid Parity sync / Data rebuild
Subject: Notice [TOWER] - Parity sync / Data rebuild finished (0 errors)
Description: Duration: 21 hours, 40 seconds. Average speed: 105,8 MB/s
Importance: normal

 

Thank you for your help and support in this troubled time.

I am forever thankful to you.

  • ChristerB changed the title to Several disks got read errors after rebuilding one data disk (SOLVED)
