johnnya1306 Posted August 28, 2018 Not a good weekend for my server... Started it up this weekend to find two red X drives. Replaced them with two other drives I had available. Rebooted and started the array to rebuild, only to see two additional red Xes in different slots. This was after a long wait to mount the array and having to refresh the main page. The first two were old 2 TB drives, so I didn't think it was odd that they might have failed. The second pair were two new 8 TB drives for which I don't have replacements. Both times, one would show as "unmountable file system" but didn't give any result with the xfs check tool. Is my only option to rebuild? Do I set a new configuration and copy all the drives back into a new array? I would like to replace the old 8 x 2 TB drives with 2 new 8 TB drives. Should I consider a different infrastructure? Is it a failing power supply (it's older, too...)? Sorry for all the questions, but I don't want to make it worse. Sincerely appreciate any help; diagnostics zip attached. Thanks, John tower-diagnostics-20180824-2045.zip
Frank1940 Posted August 28, 2018 Did you have to open up the case to replace those two drives? If the answer is 'yes', then shut down the server and open up the case again. Check all of the SATA connectors at both ends and make sure that they are firmly seated. What often happens in these cases is that one or more connectors are disturbed and loosened.
Harro Posted August 28, 2018 When this happened to me with two red-balled drives, I replaced all cables with locking ends and also replaced my old power supply with a new one. Since installing these new parts I have had no problems. This may or may not help, but it did for me.
Frank1940 Posted August 28, 2018 Share Posted August 28, 2018 50 minutes ago, Harro said: , I replaced all cables with locking ends IF you do this, watch out for this problem. You can easily tell if the locking SATA connector won't lock. You should double check all SATA connections by gently pulling back on them. There should be a slight resistance to your pull. If it comes out without any resistance, you should place that cable with an entirely different one. https://support.wdc.com/knowledgebase/answer.aspx?ID=10477 Quote Link to comment
johnnya1306 (Author) Posted August 29, 2018 OK, thanks for the replies. No, I didn't need to open the case to remove the drives. I'm using Norco 5-in-3 hot-swap modules and all the cables are locking. I also make sure to shut down before moving any of the drives, since I learned the system doesn't like to hot-swap. The startup described above was just turning it back on from a previously working and active state. Since the failures were not common to one slot or controller, I would most suspect the power supply. I'll probably replace it, as I'm considering a full rebuild. I guess my main question is: with four "failed" drives, is my only real option to rebuild the array? I'm not familiar enough with any of the tools to "repair" the drives. Am I correct in presuming that all the data is still on the drives? If I were to get a couple of new drives and set up a new configuration, could I just set each as Unassigned, copy the data and re-deploy? Is there a better way? Do I have any other options? Is there any insight to glean from the diagnostics file? Sorry for all the questions and thanks for the help!
Frank1940 Posted August 29, 2018 Share Posted August 29, 2018 (edited) Post up that Diagnostics file. Be sure to tell us how many drives are in the system and which ones (i.e., Disk1, ... ) have a problem. Remember that the data drives in unRAID are format with a standard Linux file system and all of the files on those disks can be easily read if there are no file system corruption. If there is corruption, there are recovery tools for those file system. So don't panic at this point. Let's just work through this one step at a time. Edited August 29, 2018 by Frank1940 Quote Link to comment
johnnya1306 (Author) Posted August 29, 2018 The diagnostics file is attached to my first post. I have 19 drives: 2 x 8 TB parity, 3 x 8 TB data, 6 x 4 TB data and 8 x 2 TB data. I think the first two with red Xes were 8 and 9??? The second two were 1 and 3???
Frank1940 Posted August 29, 2018 Share Posted August 29, 2018 Sorry! But there is an issue with the Board setup and apparently, a lot of people (including me) can't download attachments at this juncture... Quote Link to comment
trurl Posted August 29, 2018 Share Posted August 29, 2018 Looks like attachments didn't come over to the new forum. I just downloaded a new attachment on another thread and it worked. 42 minutes ago, johnnya1306 said: The diagnostic file is attached to my first post. Post it again. Quote Link to comment
johnnya1306 (Author) Posted August 29, 2018 OK, thanks. tower-diagnostics-20180824-2045.zip
trurl Posted August 29, 2018 Share Posted August 29, 2018 2 hours ago, Frank1940 said: Be sure to tell us how many drives are in the system and which ones (i.e., Disk1, ... ) have a problem. You have so many drives, it would help if you could give us some more details. We could probably figure out most of it, but you could save us some time. Quote Link to comment
johnnya1306 (Author) Posted August 29, 2018 I think that the first two drives to red X were in slots 8 and 9; they were both old 2 TB veterans. After replacing them with a pair of less-old 2 TB Samsung drives I had lying around, the second two red Xes appeared. I believe they were in slots 1 and 3. Both were very new 8 TB drives. At that point I reassigned the "failed" 2 TB drives to their original slots and shut down. It's been down ever since...
trurl Posted August 29, 2018 Share Posted August 29, 2018 To save us some time digging through the diagnostics with so many disks, and save yourself the time typing in even more details. just post a screenshot of Main - Array Devices. Quote Link to comment
trurl Posted August 29, 2018 Share Posted August 29, 2018 Also, your older version of unRAID doesn't give us as many clues when digging through the diagnostics. Quote Link to comment
Frank1940 Posted August 29, 2018 Share Posted August 29, 2018 Could you provide us with the make and model of your HBA('s) and the make and model number of your PS along with its wattage rating. Quote Link to comment
johnnya1306 (Author) Posted August 30, 2018 I didn't realize I was running an old version. It seems like I upgraded recently, but that was only after a prompt on the main page. I have two LSI 9211-8i cards flashed to IT mode. The power supply is an older Corsair HX 750W. Here's a shot of the main page with the problem drives... Disks 8 and 9 were the two I tried to replace before I had an issue with the other two. Thanks for all the help and apologies for the difficulty.
trurl Posted August 30, 2018 Share Posted August 30, 2018 The diagnostics are several days old now. Is your server currently turned off? There are several more disks that don't appear in the screenshot but I assume those are indicating green. Disks 8,9 don't even appear in the diagnostics. Parity and disk1 SMART look OK. Are all these disks on the same controller? Quote Link to comment
Frank1940 Posted August 30, 2018 Share Posted August 30, 2018 One thing you should consider is that you have PS issues. Drive 8 and 9 being missing in the SMART reports in your posted Diagnostics file and and another pair of disks being involved in this screen shot are a most unusual situation. I can recall some past similar cases that were PS issues. I did a quick search on Google and could find any real specs on this PS beyond the fact that it could a 12V single rail PS--- with a 40A Max rating on any 12V output in the multi rail configuration. Quote Link to comment
johnnya1306 Posted August 30, 2018 Author Share Posted August 30, 2018 Yes, the server's been off all week. I only turned it on last night to take that screenshot. I don't think that they are on the same controller, both pairs are in separate 5X3 modules. 8 and 9 were removed from the configuration when I tried to replace them. One similar observation from both instances is that the array didn't mount the drives and start the array as quickly as it usually does. Agree that it seems to point to an issue with the PS which is probably older than the discs. I've accepted that it should be replaced. Is there currently a PS model and rating that is favored by the community? Aside from diagnosing the cause, I've also accepted that with four failed drives, I'll have to rebuild the array from a new configuration. I suspect the data is intact since I didn't even get to start the array either time and there shouldn't have been any writes. Do I reset the configuration and start copying the data? Is Krusader the recommended application? Thanks again. Quote Link to comment
trurl Posted August 30, 2018 Share Posted August 30, 2018 39 minutes ago, johnnya1306 said: Do I reset the configuration and start copying the data? Do you really want to copy the data to different disks, or maybe just try to set a New Configuration with the disks as is and rebuild parity? I didn't notice any SMART issues on the 2 red X but of course couldn't see the other 2. If there is no corruption everything should be there, and if there is corruption we can try to repair. Anyway, that might be a simpler route to getting your hardware stable than trying to bring new disks into the picture right now. Then after you get things squared you can maybe replace/upsize disks if that's what you want to do. Don't remember asking this before now. Do you have good backups of any important and irreplaceable data? If not then trying to copy that stuff to another system or external drive would be the first priority. Quote Link to comment
John_M Posted August 30, 2018 Share Posted August 30, 2018 20 hours ago, johnnya1306 said: I think that the first two drives to red x were in slots 8 and 9, they were both old 2 Tb veterans. After replacing them with a pair of less old 2 Tb Samsung drives I had laying around, the second two red Xes appeared. I believe that they were in slots 1 and 3. Both were very new 8 Tb drives. At that point I returned and reassigned the "failed" 2 Tb drives to their original slots and shut down. It's been down ever since... I'm confused by your last action in the above. Having replaced disks 8 and 9 (presumably successfully, as you don't say otherwise), why did you reassign the old drives to their original slots? How did you think that would help you fix your new problem with disks 1 and 3? Quote Link to comment
johnnya1306 Posted August 30, 2018 Author Share Posted August 30, 2018 John_M, The replacement was not successful. I got the second two red Xes when the system hung while mounting the disks. trurl, Yes, I would like to replace the old 2 Tb disks with new 8 Tb disks but getting the system stable is my first priority. With disks 8 and 9 assigned back to the array, I can set a new configuration, keep only the data disks and rebuild parity, right? Assuming that I can get the array started after replacing the power supply. Thanks all. Quote Link to comment
John_M Posted August 30, 2018 Share Posted August 30, 2018 4 minutes ago, johnnya1306 said: The replacement was not successful. I got the second two red Xes when the system hung while mounting the disks. OK. That wasn't apparent. Quote Link to comment
trurl Posted August 30, 2018 Share Posted August 30, 2018 11 minutes ago, johnnya1306 said: With disks 8 and 9 assigned back to the array, I can set a new configuration, keep only the data disks and rebuild parity, right? yes Quote Link to comment