chopeta Posted October 31, 2013
Hi guys,
I have 1 disk with a red ball + 1 disk missing in my UnRAID. It's the first time this has happened to me (I have recovered disks in the past when only 1 disk had a red ball), and I can't find similar cases in the forum, so I am posting a new topic here hoping someone will be able to help me out.
I am using 5.0 Final Pro and my setup includes 17 disks (1 parity, 15 data and 1 cache).
A month ago I started considering adding more drives and figured it was prudent to replace my old PSU (Thermaltake 500W, non-single-rail) with a better one, so I got a Corsair CX600.
Right before I replaced the PSU, disk15 started giving problems: UnRAID was showing read errors, but the drive was never marked as disabled (red ball). A SMART report was showing more than 600 "Reallocated_Sector_Ct" + Current_Pending_Sector and Offline_Uncorrectable, but the system was still working fine without any red balls (please don't ask me why I didn't replace this disk before).
Today, disk13 shows up with a red ball, but after running SMART everything seems to be OK with the disk (no serious errors besides 3 Command_Timeout). At this point, before taking further actions, I decided to replace the PSU (I thought disk13 could have been suffering because of my previous PSU).
Right now (with the new PSU in place), disk15 is not even showing up in the UnRAID interface (it says "disk missing") and I have no idea how to proceed. I have the impression that disk13 is just fine and that I could replace/rebuild disk15, but since I now have disk13 disabled and disk15 missing, I don't know what to do.
Any idea how I should proceed here? Is it possible to follow my theory of "trust my array" and rebuild disk15 from my other drives, including disk13, which seems to be OK? If not, what can I do?
Together with the new PSU I also got an AOC-SASLP-MV8 to replace my old PCI SATA card. I also have a spare Seagate 2TB drive.
Here's the link to my current syslog: https://drive.google.com/file/d/0B0mYGQLWndqzRnF6U1lNZEVxSVk/edit?usp=sharing
Here's one from early today (when disk15 was still showing up): https://drive.google.com/file/d/0B0mYGQLWndqzRUpEQmNUX0dON0U/edit?usp=sharing
From yesterday: https://drive.google.com/file/d/0B0mYGQLWndqzTHhscDVsTXNsbkU/edit?usp=sharing
And finally from 2 days ago (when there were no red balls or missing disks): https://drive.google.com/file/d/0B0mYGQLWndqzYUlwaUtvUVZZVUk/edit?usp=sharing
SMART report from disk13 also attached.
Fireball3 Posted October 31, 2013
Find out why disk 15 is missing! Check if disk 15 is running. Check the cabling.
chopeta Posted October 31, 2013 (Author)
Quote: Find out why disk 15 is missing! Check if disk 15 is running. Check the cabling.
I am just about to complete a preclear on a new disk that's connected to the same cables/ports disk15 was using...
Maybe my post was too long and confusing, so just to recap:
1) disk13 is disabled, but it was probably due to a cable or PSU problem. The SMART report (see attached excerpt from syslog) looks OK and I am 100% sure I wasn't writing to the disk when it failed (I only use disk shares instead of user shares when writing).
2) disk15 is completely unusable; the BIOS doesn't even recognize it anymore. Its SMART report looks horrible (see attached excerpt from syslog).
So disk13 is good but disabled and disk13 is damaged and missing (here's a screenshot of my Main page): https://docs.google.com/file/d/0B0mYGQLWndqzU1JleHg2MDZZTW8/edit?usp=drivesdk
What if I use the 'New Config' button and reassign all drives to their proper positions (especially parity), replacing the damaged disk15 with the new drive I am about to finish preclearing? Would that work? Would disk15 be reconstructed, would disk13 be just fine, and would I have no data loss?
Attachments: disk13-smart-report.txt, disk15-smart-report.txt
dgaschk Posted October 31, 2013
In order to reconstruct disk 15 you need to cause it to fail. This can be done either by using the old drive in the new config and then changing to the new drive, or by starting the array with the disk unassigned and then assigning it. After a new config is started with the "parity is correct" checkbox selected, I think a parity check may commence. You'll need to stop this check as quickly as possible and then cause disk 15 to fail. After the disk is rebuilt, do a parity check. There may be a few errors from disk 13. It was disabled because a write to it failed, most likely a file system metadata write. Run a second parity check and no errors should remain. Finally, run reiserfsck on the disks to check the file systems. See my sig.
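A toy model of the two parity checks suggested above, assuming unRAID's single parity behaves like a byte-wise XOR of the data disks (a simplification of the real md driver). The disk contents and the helper names `parity_of` and `parity_check` are made up for illustration, and the mismatch is simulated by changing one data byte without updating parity. A correcting check rewrites the mismatched parity byte, so the second pass finds nothing.

```python
from functools import reduce
from operator import xor


def parity_of(disks: list[bytes]) -> bytes:
    """Byte-wise XOR of all data disks (single-parity model)."""
    return bytes(reduce(xor, column) for column in zip(*disks))


def parity_check(disks: list[bytes], parity: bytearray, correct: bool) -> int:
    """Count byte positions where stored parity disagrees with the data disks.

    With correct=True, each mismatching parity byte is rewritten in place,
    like a correcting parity check updating the parity disk.
    """
    expected = parity_of(disks)
    errors = 0
    for i, value in enumerate(expected):
        if parity[i] != value:
            errors += 1
            if correct:
                parity[i] = value
    return errors


# Toy array: three 4-byte data disks (values are invented).
disk1 = b"\x01\x02\x03\x04"
disk13 = b"\x10\x20\x30\x40"
disk15 = b"\xaa\xbb\xcc\xdd"
parity = bytearray(parity_of([disk1, disk13, disk15]))

# Simulate one byte on disk13 that changed without a matching parity update.
disk13 = b"\x10\x20\x31\x40"
disks = [disk1, disk13, disk15]

first = parity_check(disks, parity, correct=True)
second = parity_check(disks, parity, correct=True)
print(first, second)  # 1 0
```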
Joe L. Posted October 31, 2013
If you set a new disk configuration AND allow ANY parity calc to start, you lose the ability to reconstruct ANY drive. Be very careful. You have two failed disks (as far as unRAID is concerned), and under 5.0 there is no well-documented way to recover by forcing one of the disks to be considered valid. Ask for guidance from Lime Technology before proceeding with ANY new config followed immediately by reconstruction of a failed drive. Do not be fooled into thinking you can easily recover using parity. I do not know how to accomplish that under 5.0. Do not just listen to forum users; I doubt any of us can properly guide you under 5.0 with multiple failed disks. Send an e-mail to Lime Technology for guidance.
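To see why a parity calc after New Config is so dangerous, here is a sketch under the same simplification (single parity as a byte-wise XOR, and a precleared replacement disk modeled as all zeros). The names `xor_disks`, `old_disk15` and so on are hypothetical. Once parity is recomputed from the disks currently assigned, the only remaining copy of the old disk15's contents is gone.

```python
from functools import reduce
from operator import xor


def xor_disks(disks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized disk images (single-parity model)."""
    return bytes(reduce(xor, column) for column in zip(*disks))


# Old array: parity still encodes the contents of the failed disk15.
disk1 = b"\x01\x02\x03\x04"
disk13 = b"\x10\x20\x30\x40"
old_disk15 = b"\xaa\xbb\xcc\xdd"
parity = xor_disks([disk1, disk13, old_disk15])

# Before any parity calc, parity plus the surviving disks gives disk15 back.
recovered = xor_disks([parity, disk1, disk13])
print(recovered == old_disk15)  # True

# New Config with a blank replacement in slot 15, and a parity sync is
# allowed to run: parity is recomputed from whatever is assigned now.
new_disk15 = bytes(4)
parity = xor_disks([disk1, disk13, new_disk15])

# The same reconstruction now only returns the blank disk.
print(xor_disks([parity, disk1, disk13]) == old_disk15)  # False
```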
chopeta Posted October 31, 2013 (Author)
Thanks Joe, really appreciate it. I've sent an email to Lime Technology and will be waiting for a response. Will keep the thread updated with any news.
chopeta Posted October 31, 2013 (Author)
Thanks dgaschk (just noticed your reply). I hope you are right and I can get this issue resolved without losing data, but I prefer to wait for LT guidance to be sure I won't break anything (more than it already is).
Fireball3 Posted November 4, 2013
Quote: So disk13 is good but disabled and disk15 is damaged and missing (here's a screenshot of my Main page)
You have a typo at "disk13 is damaged".
So, in principle you "only" need to mark disk 13 as good and do a rebuild on the former disk15. At the moment you have only 1 loss, which should be recoverable with parity. The only question is: how do you mark disk13 as good? Let LT guide you through this!
Edit: Just curious: wouldn't a backup of the config be useful in this case? Which file would be necessary? Where is the array setup stored? super.dat? I imagine you restore the file(s) and start the array. It should detect the failed disk15 and there you go?
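The "only 1 loss" point can be made concrete at a single byte position, again assuming XOR single parity, with invented values and hypothetical helper names. With both disk13 and disk15 treated as unknown, the one parity equation admits 256 possible byte pairs; once disk13 is trusted, disk15 has exactly one solution.

```python
def candidate_pairs(parity_byte: int, known_xor: int) -> list[tuple[int, int]]:
    """Every (disk13, disk15) byte pair consistent with one parity byte.

    known_xor is the XOR of the same byte position on all the other data disks.
    Single parity gives one equation: parity = known_xor ^ d13 ^ d15.
    """
    target = parity_byte ^ known_xor  # equals d13 ^ d15
    return [(d13, d13 ^ target) for d13 in range(256)]


def solve_disk15(parity_byte: int, known_xor: int, d13: int) -> int:
    """With disk13 trusted, disk15 is the only unknown and has one solution."""
    return parity_byte ^ known_xor ^ d13


# Invented byte values at one position in the array.
known_xor = 0x5A
d13, d15 = 0x30, 0xCC
parity_byte = known_xor ^ d13 ^ d15

print(len(candidate_pairs(parity_byte, known_xor)))    # 256
print(hex(solve_disk15(parity_byte, known_xor, d13)))  # 0xcc
```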
chopeta Posted November 5, 2013 (Author)
OK, so my problem was solved with direct help from Tom/LimeTech. I am updating this so other users know what happened and how it was solved; however, I can't vouch for this being a solution for others. It worked for me under direct supervision from LimeTech. Here are the steps:
0. Disable all add-ons (rename the plugins/extra folders and remove the extra lines from the go script).
1. Install a new disk that will replace the old disk15. Don't do anything to the config, just physically hook up the disk.
2. Go to the Utils page and click "New Config". Check the checkbox and click Apply.
3. Go to the Main page and assign all your disks again. Refer to your screenshots (https://docs.google.com/file/d/0B0mYGQLWndqzR29HVFFWMVFYWFU/edit?usp=drivesdk) and make sure you have it right. When you get to disk15, you should be assigning the new disk.
4. From a telnet session or the console, type this command: mdcmd set invalidslot 15
5. Important: after typing the above command, do NOT do anything else on the webGui except this: on the Main page, check the 'Maintenance mode' checkbox and then click 'Start' (i.e., do not click refresh or refresh the browser, just check the checkbox and click Start).
You should see all disks 'green' except disk15, which should be 'orange', and you should see a data rebuild taking place. Here's my Main page during the rebuild: https://docs.google.com/file/d/0B0mYGQLWndqzWXBBbXUwcnRMN0k/edit?usp=drivesdk
Once it was completed I had some filesystem corruption that was fixed by following dgaschk's suggestions. Big thanks!
Thanks to Tom for his quick response and all the help.
I am now going to install my new SASLP-MV8. Go UnRAID!!!!
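A rough model of what steps 2 to 5 achieve, not of how unRAID implements `mdcmd set invalidslot` internally: every assigned slot except the one marked invalid is trusted as-is (which is what makes disk13 count as good again), and the invalid slot is rebuilt from parity. It assumes XOR single parity, two-byte toy disks, and a hypothetical function `start_with_invalid_slot`. In this toy model the follow-up parity check comes out clean.

```python
from functools import reduce
from operator import xor


def xor_all(images) -> bytes:
    """Byte-wise XOR of equally sized byte strings."""
    return bytes(reduce(xor, column) for column in zip(*images))


def start_with_invalid_slot(slots: dict[int, bytes], parity: bytes,
                            invalid_slot: int) -> dict[int, bytes]:
    """Toy model of starting the array after marking exactly one slot invalid.

    The invalid slot is treated as unknown and rewritten from parity plus every
    other assigned slot; all other slots (including disk13) are trusted as-is.
    """
    if invalid_slot not in slots:
        raise KeyError(f"slot {invalid_slot} is not assigned")
    others = [data for slot, data in slots.items() if slot != invalid_slot]
    rebuilt = dict(slots)
    rebuilt[invalid_slot] = xor_all([parity, *others])
    return rebuilt


old = {1: b"\x01\x02", 13: b"\x10\x20", 15: b"\xaa\xbb"}
parity = xor_all(old.values())

# New Config: same positions, but slot 15 now holds the blank replacement.
new_config = {1: old[1], 13: old[13], 15: bytes(2)}
array = start_with_invalid_slot(new_config, parity, invalid_slot=15)

print(array[15] == old[15])               # True: disk15 rebuilt
print(xor_all(array.values()) == parity)  # True: parity check is clean
```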
This topic is now archived and is closed to further replies.