ChristerB Posted April 15, 2019
Hi! I have an Unraid setup consisting of 1 parity disk, 1 cache disk and 12 data disks. One of the data disks (4 TB WD Red) had read errors, so I decided to remove it, add a new disk (6 TB HGST) and rebuild. Everything looked good for 10 hours, but then I started to get error messages that several disks had read errors. When the rebuild was done I had 4 disks with read errors, and when I got home from work one of them (8 TB WD Red) was disabled and its content emulated. I have used Unraid for several years, changed hardware several times and replaced disks before without problems. What do I do now? Any suggestions would be appreciated.
Attached: tower-diagnostics-20190415-1818.zip
trurl Posted April 15, 2019
Are disks 1, 2, 3 and 11 on the same controller?
ChristerB Posted April 15, 2019
I can't really say; I need to shut it down to be able to see the disks. Of course, it could be one of the ports on one of the controllers. Is it safe to shut down the server?
JorgeB Posted April 15, 2019
Problem with the HBA:
Apr 15 11:02:59 Tower kernel: mpt2sas_cm1: SAS host is non-operational !!!!
Make sure it's well seated and sufficiently cooled. Also update the firmware to the latest version: you're on p15, and the current one is p20.00.07.00. Then you'll need to force a rebuild of disk7, since the current one will be corrupt. You can do that with the invalid slot command, assuming the rest of the disks are OK.
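The diagnosis above rests on a single kernel message in the syslog. As a rough illustration only, here is a minimal Python sketch of how one might search syslog lines for that message. The function name `find_hba_faults` and the second sample line are made up for this example, and the pattern covers only the exact wording quoted in the post, not other controller errors.

```python
import re

# Pattern built from the single kernel message quoted above; other mpt2sas
# failure messages are not covered (assumption: this wording is stable).
HBA_FAULT = re.compile(r"kernel: (mpt2sas_cm\d+): SAS host is non-operational")


def find_hba_faults(lines):
    """Return (timestamp, controller) pairs for every matching syslog line."""
    hits = []
    for line in lines:
        match = HBA_FAULT.search(line)
        if match:
            # Classic syslog lines start with "Mon DD HH:MM:SS" (15 characters).
            hits.append((line[:15], match.group(1)))
    return hits


sample = [
    "Apr 15 11:02:59 Tower kernel: mpt2sas_cm1: SAS host is non-operational !!!!",
    "Apr 15 11:03:00 Tower kernel: unrelated message",  # hypothetical line for contrast
]
print(find_hba_faults(sample))
```

To run it against a real log, one could pass an open file object instead of `sample`; where the syslog sits inside the diagnostics zip is not stated in the thread.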
ChristerB Posted April 15, 2019
Thanks a lot. I forgot to close the side of the chassis, so it probably overheated during the rebuild. So the safe thing would be:
1. Shut down Unraid
2. Make sure it's cooled down
3. Make sure the controller and cables are well seated
4. Update the firmware
Then reboot and ask the IT Gods for mercy?
JorgeB Posted April 15, 2019
After rebooting you will need to force a rebuild of disk7 (and bring disk2 back online), but post new diags first so we can check SMART for the dropped disks.
ChristerB Posted April 15, 2019 (edited April 16, 2019 by ChristerB: added SMART info)
I didn't find a good guide to upgrade firmware (after a bit more looking I did, but still) and I don't feel confident enough to do it (I know I suck at this). So I decided to let the HBA cool down and then start the server again. Attached are the new diagnostics after the reboot. To my limited knowledge SMART seems fine: no read errors or reallocated sectors. Disk 2 is still disabled though. How would I go about bringing it online again? And how should I force a rebuild of disk7? Would the guide I found here work?
Attached: tower-diagnostics-20190415-2130.zip
JorgeB Posted April 16, 2019
11 hours ago, ChristerB said: To my limited knowledge SMART seems fine.
It does. To force a rebuild of disk7, see the steps below. It might be a good idea to use a new spare disk: although the previous rebuild is a little (or a lot) corrupt, it did run for several hours before the errors, so some of its data will be fine in case something else goes wrong with this attempt.
- Tools -> New Config -> Retain current configuration: All -> Apply
- Assign any missing disk(s), including a new disk7 (or use the old one if you want).
- Important: after checking the assignments, leave the browser on that page, the "Main" page.
- Open an SSH session or use the console and type the following (don't copy/paste directly from the forum, as that can sometimes insert extra characters):
  mdcmd set invalidslot 7 29
- Back in the GUI, without refreshing the page, just start the array. Do not check the "parity is already valid" box. The GUI will still show that data on the parity disk(s) will be overwritten; this is normal, as it doesn't account for the invalid slot command, and they won't be overwritten as long as the procedure was done correctly. Disk7 will start rebuilding. The disk should mount immediately, but if it's unmountable don't format it; wait for the rebuild to finish and then run a filesystem check.
11 hours ago, ChristerB said: I didn't find a good guide to upgrade firmware
It's very simple: boot with a DOS flash drive, or use a Windows PC, download the firmware package, and then update.
Using DOS:
  sas2flsh.exe -o -f firmware.bin
Using Windows:
  sas2flash.exe -o -f firmware.bin
ChristerB Posted April 16, 2019 (edited April 16, 2019 by ChristerB)
@johnnie.black, you are my HERO. Thanks for your fast and understandable answers. I will look into this when I get home from work.
ChristerB Posted April 16, 2019
Stupid question: if all goes well, will everything on the raid be intact? I will go for a new disk7 as you suggest, johnnie.black. I just precleared an 8 TB disk intended for my raid.
JorgeB Posted April 16, 2019
14 minutes ago, ChristerB said: If all goes well, everything on the raid will be intact?
Yes, if all goes well, i.e., no errors during the rebuild, data on the array should be fine.
ChristerB Posted April 17, 2019
@johnnie.black You are truly a Hero. Just got home and my Unraid is back to normal. Received the confirmation email 45 min ago:
Event: Unraid Parity sync / Data rebuild
Subject: Notice [TOWER] - Parity sync / Data rebuild finished (0 errors)
Description: Duration: 21 hours, 40 seconds. Average speed: 105,8 MB/s
Importance: normal
Thank you for your help and support in this troubled time. I am forever thankful to you.
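As a quick sanity check on the figures in that email, duration multiplied by average speed should roughly equal the amount of data the rebuild covered. The short Python sketch below does that arithmetic. It assumes MB means 10^6 bytes and reads "105,8" as a decimal comma; the result is consistent with an 8 TB disk like the ones mentioned in the thread, though the email itself does not state the size.

```python
# Figures copied from the notification email above.
duration_s = 21 * 3600 + 40   # "21 hours, 40 seconds"
avg_speed_mb_s = 105.8        # "105,8 MB/s", read as a decimal comma

# Assumption: MB means 10**6 bytes, so 10**6 MB make one TB.
processed_tb = duration_s * avg_speed_mb_s / 1_000_000
print(f"{processed_tb:.2f} TB")
```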