curiouskid Posted June 22, 2017 (edited)

I had unRAID running on a monster PC with 16 cores, 32 threads, and 128 GB of RAM. While I appreciate everything this PC did for me, it was power hungry. To save money without losing data or protection, I decided to move unRAID to a low-power computer.

So I built a replacement PC and tested it by installing Windows 10 on a Mini PCIe SSD from USB bootable media. I confirmed that all the hardware worked, including booting from USB. After stress testing, once I was happy with the new PC's stability and performance, I decided to move unRAID to it.

I gracefully powered down the monster PC, pulled out the unRAID USB drive, and put it in the new low-power PC. I used the same USB port that had held the bootable USB drive for the Windows install. I was disappointed when unRAID didn't boot. I had verified that the unRAID USB stick was the top priority in the boot order and that all other devices were disabled in the boot order list. Thinking it might be a bad port, I tried all 4 available USB ports one by one, with no luck booting unRAID from any of them. The drive was detected each time.

Now I was more concerned, so I put the USB drive back into the old monster PC, and it worked.

One thing I did notice is that the unRAID USB drive shows up as

UEFI : Samsung USB <Serial number of UnRaid USB Drive>

Would you please advise on what I can do to ensure the new PC can boot into unRAID? I would really appreciate any assistance with this issue.

Edited June 22, 2017 by curiouskid

Frank1940 Posted June 22, 2017

I assume that you are trying to boot 6.4.0-rcX. If this is the case, look on your flash drive and rename the EFI folder to something like EFI.NotUsed

You might also like to read this thread:

You can also look in the boot order menu of your BIOS for an option for your USB drive that does NOT include UEFI in the description. You might want to read this post and the next few posts about this issue:

This issue is being addressed in the latest release of the 6.4.0-rcX series. See the first post here:

curiouskid Posted June 30, 2017 (Author)

When I opened a ticket with the motherboard manufacturer, they asked me to take a photo of each BIOS setting and send it to them. While preparing my response, I noticed that under Advanced Menu -> CSM Setting -> boot option filter was set to "UEFI only". I quickly changed it to "UEFI and Legacy", which allowed the boot. (See BIOS photo attached.)

In short, my drive was set up for legacy MBR boot and the motherboard was looking for UEFI bootable media, so it was not booting. After the change above, I could boot without any issues.

However, during this troubleshooting I had power cycled the system multiple times, often not so gracefully. As a result, one of the data disks in my array went bad. Luckily it was not the parity drive. So I followed https://wiki.lime-technology.com/Troubleshooting#Re-enable_the_drive, confirmed that the drive passes the SMART test, and started a rebuild. It's a 6 TB drive with about 850 GB of data on it, so the rebuild seems to take forever. It seems that once I kicked off the disk rebuild and started the array, the GUI stopped responding.

Frank1940, could you please advise whether there is a way to monitor the rebuild process from the command line? How do I know if the rebuild is going well? Is there a log file for the drive rebuild that I can look at?

Frank1940 Posted June 30, 2017 (edited)

I don't know why the GUI has stopped responding, but understand that when a disk is rebuilt, EVERY sector is rebuilt, not just the ones with data, so the total rebuild time will probably be about the same as the parity check time.

I don't know of a way to verify the progress of the rebuild process via the command line, but someone else may. You might run diagnostics, which will write the Diagnostics file to the logs directory of the flash drive. If you can get the file off the flash drive, you might attach it to a new post.

Edited June 30, 2017 by Frank1940. Fixed spelling of "way" in second paragraph.

Squid Posted June 30, 2017

18 minutes ago, Frank1940 said:
    I don't know of a why to verify the progress of the rebuilt process via the command line but someone else may.

    cat /var/local/emhttp/var.ini | grep "mdResync"

One of the resulting lines is mdResyncPos, which can be compared to mdResyncSize. Together they give you the current position versus how far it has to go.
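
The comparison Squid describes can be sketched in a few lines of Python. This is only an illustration, not an unRAID tool: the function name `resync_progress` is made up here, the `key="value"` line shape is an assumption about how var.ini is laid out, and the sample numbers are invented for the demo rather than taken from a real server.

```python
import re


def resync_progress(var_ini_text: str) -> float:
    """Return progress as a percentage from var.ini-style text.

    Assumes (not confirmed by the thread) that each setting appears on its
    own line as key="value", with the quotes optional.
    """
    values = {}
    for line in var_ini_text.splitlines():
        match = re.match(r'^(mdResync\w*)\s*=\s*"?([^"]*)"?\s*$', line.strip())
        if match:
            values[match.group(1)] = match.group(2)

    size = int(values["mdResyncSize"])
    pos = int(values["mdResyncPos"])
    if size == 0:
        # Nothing to compare against, so a percentage would be meaningless.
        raise ValueError("mdResyncSize is 0")
    return 100.0 * pos / size


# Invented sample text standing in for the grep output.
sample = 'mdResyncPos="2050"\nmdResyncSize="6000"\n'
print(f"{resync_progress(sample):.1f}% complete")
```

On a real server the text would come from reading /var/local/emhttp/var.ini instead of the hard-coded sample.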

trurl Posted June 30, 2017

5 hours ago, curiouskid said:
    As result one of my data disk in array went bad. Luckily it was not Parity drive.

Not sure why you would think this. ALL disks are required to rebuild a disk. So parity is not any more important than any of the others. In fact, you could argue that parity is less important, since it doesn't actually contain any of your data.

curiouskid Posted June 30, 2017 (Author)

Thank you, trurl, for correcting me. I had an oversimplified understanding (a misunderstanding, really) of the parity drive and its role in the reconstruction of data. Because of your post, I carefully studied https://wiki.lime-technology.com/Parity, where I noticed the following:

"At these times, all of the disks (including parity) are read to reconstruct the data to be written to the target disk. As the sum of the bits is always even, unRAID can reconstruct any ONE missing piece of data (the parity or a data disk), as long as the other pieces are correct."

Could you please confirm whether my extended understanding of that long sentence is correct?

1) Only one drive's failure can be tolerated for reconstruction of data.
2) All other drives in the array must be present and readable.
3) The precondition for recovering that one drive is that the other drives have no issues.

This means that, as an unRAID owner, I must take error notifications very seriously, because the fault tolerance window is very small. If I do not act quickly to replace a faulty drive, and a second drive runs into any issue in the meantime, there is no chance of recovery.
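
The "sum of the bits is always even" rule quoted from the wiki is the same thing as a bitwise XOR across the disks. A toy sketch of single parity, using three tiny byte strings as stand-ins for data disks, shows why one missing piece can be rebuilt only when every other piece is readable. The helper name `parity_of` and the byte values are made up for illustration; this is not unRAID's actual code.

```python
from functools import reduce
from operator import xor


def parity_of(blocks):
    """XOR the bytes at each position across all blocks (even parity)."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


# Three pretend data disks, three bytes each (illustrative values only).
data_disks = [b"\x0f\xa0\x33", b"\xf0\x0a\x55", b"\x3c\xc3\x99"]
parity = parity_of(data_disks)

# Simulate losing disk 1: XOR the parity with ALL surviving data disks.
survivors = [disk for i, disk in enumerate(data_disks) if i != 1]
rebuilt = parity_of(survivors + [parity])
print(rebuilt == data_disks[1])

# With a second disk missing, the same XOR only yields the two lost disks
# XORed together, which cannot be split back into the individual disks.
```

The same calculation rebuilds the parity disk itself if that is the one that failed: XOR all the data disks together.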

Frank1940 Posted June 30, 2017

RIGHT. You have a firm grasp of the situation. But not all apparent disk failures are true disk failures. For example, while replacing a bad disk, a connector to another disk may be displaced, which results in a condition that would most likely appear to the casual observer to be another defective disk. There are several other examples which other folks could put forward.

You are also correct that everyone should set up the 'Notifications Settings' and look at the resulting e-mail every day to see that things are OK. You really want to address problems as they come up. If you wait until you can't even reach some files before you check on the condition of your server, you have probably lost data.

JonathanM Posted June 30, 2017

11 minutes ago, curiouskid said:
    1) Only one drive's failure can be tolerated for reconstruction of data. 2) All other drives in array should be present and readable. 3) Pre-condition for recovery of that one drive is that other drives should not have any issues. which means as UnRaid owner I must pay attention to error notification very seriously because fault tolerance window is very small. If I do not act quickly to replace faulty drive quickly and in between if second drive run into any issues, there is no chance of recovery.

YES!!! Exactly! The only thing to add is that a 2nd parity drive extends this to 2 simultaneous failures, but with the same conditions.

The parity drive does not use some magical data compression that can reconstruct any data on demand.

trurl Posted June 30, 2017

58 minutes ago, curiouskid said:
    I had oversimplified understanding/misunderstanding of Parity drive

Frank1940 and jonathanm pretty much said everything I would have said. I am curious, though, about what your previous (mis)understanding was. Did you think you would be able to recover all of the data from a failed drive using only the parity drive?

If you think about it for a moment, there is no way the parity drive could contain the data for any drive. Since there is no way to know which drive might fail, it would have to contain all the data for every drive if it were possible to recover a disk using only the parity drive. Obviously parity doesn't have the capacity for this.

Now that you understand this, I think you are in a much better place than quite a lot of unRAID users. Knowing how parity works makes a lot of things about using unRAID much more understandable, and you are less likely to make mistakes.

curiouskid Posted July 1, 2017 (Author)

Frank1940

After about 20 minutes of silence with crossed fingers, I got the GUI back. It now reads:

Current Status:
Total size: 6 TB
Elapsed time: 17 hours, 57 minutes
Current position: 2.05 TB (34.2 %)
Estimated speed: 32.6 MB/sec
Estimated finish: 1 day, 9 hours, 35 minutes

I will write an update on the recovery after 1 day and 10 hours.

For notifications I am using the Boxcar Notification Agent. It gives free notifications on the phone app, delivered almost in real time from unRAID. The only problem is that I get too many of them, even for green status. After I get my drive back, I will do more research on how to receive only the important ones.

Lessons from one data disk failure:

1) Read notifications and act on them. The prerequisite is that I receive only a few.
2) Do not do anything in a panic. It can hurt my data.
3) When in doubt, ask the community.
4) Accept that data loss can occur for many reasons and in many ways, and that I can get protection from only a few of them.
5) Get an off-site backup, which I do not have at the moment.

Squid

Thank you for the response. mdResyncPos/mdResyncSize * 100 = % of completion. May I propose a command-line version of everything that the Web-UI does or shows? Maybe a shell script with a few command-line arguments that prints parts of the Web-UI in clear text on the console. That way, an unresponsive Web-UI would not provoke a novice user like me into pushing the reset button in a moment of panic. This time I didn't do it, because I knew a drive rebuild was in progress.

trurl

12 hours ago, trurl said:
    I am curious though about what your previous (mis)understanding was. Did you think you would be able to recover all of the data from a failed drive using only the parity drive?

Yes. I raise the white flag on that one. Forgive me for my ignorance.

12 hours ago, trurl said:
    Now that you understand this, I think you are in a much better place than quite a lot of unRAID users. Knowing how parity works makes a lot of things about using unRAID much more understandable, and you are less likely to make mistakes.

Yes. Thank you.

Once again, thank you for helping. I truly appreciate everyone's attention to detail in this community.
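
As a rough cross-check of the GUI's estimate in the status above, the remaining time can be worked out from the reported size, position, and speed. The sketch assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a constant speed for the rest of the rebuild, neither of which the thread confirms; the function name `remaining_time` is made up for illustration.

```python
def remaining_time(total_tb: float, position_tb: float, speed_mb_s: float):
    """Return (days, hours, minutes) left, assuming constant speed.

    Units are assumed decimal: 1 TB = 1e12 bytes, 1 MB = 1e6 bytes.
    """
    remaining_bytes = (total_tb - position_tb) * 1e12
    seconds = int(remaining_bytes / (speed_mb_s * 1e6))
    days, rest = divmod(seconds, 86400)
    hours, rest = divmod(rest, 3600)
    minutes = rest // 60
    return days, hours, minutes


# Figures reported by the GUI: 6 TB total, 2.05 TB done, 32.6 MB/sec.
days, hours, minutes = remaining_time(6.0, 2.05, 32.6)
print(f"{days} day(s), {hours} hours, {minutes} minutes")
```

With those figures the result lands within a few minutes of the GUI's "1 day, 9 hours, 35 minutes". Note that the total is the full 6 TB rather than the 850 GB of stored data, because, as Frank1940 pointed out, every sector is rebuilt.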

curiouskid Posted July 1, 2017 (Author)

jonathanm

Thank you for the additional point.

15 hours ago, jonathanm said:
    The only thing to add is that a 2nd parity drive extends this to 2 simultaneous failures, but with the same conditions.

The line above got me reading more about dual parity. I enjoyed the "Further discussion" #1 thread up to page 3; I could not follow what was discussed from there on, because I am an infrastructure guy whose programming skill is limited to small scripts, and I have no background in the advanced mathematics needed to understand the data recovery formulas. Anyway, as per the wiki:

Quote
    Dual parity
    For large arrays, ‘dual parity’ – or, the facility to have a second parity disc that is not simply a mirror of the first – would be useful. This would permit two simultaneous drive failures without losing data. unRAID does not have dual parity at present, but ‘P + Q redundancy’ is part of the future roadmap.(Dead Link) In a P + Q redundancy system (as in a RAID-6 system), there would be two redundancy disks: ‘P’, which is the ordinary XOR parity, and ‘Q’, which is a Reed-Solomon code. This would allow unRAID to recover from any 2 disk errors, with minimal impact on performance. Further discussion: [1], [2]

The part highlighted in red got me confused, because I do see a slot for a 2nd parity drive in the Web-UI of the current stable release, unRAID 6.3.5.

For a consumer with the simple use case described below, which is the better option?

A. A 2nd parity drive, assuming it is the Q drive with the Reed-Solomon code implemented correctly.
B. A hot spare which can be used to replace a parity or data disk quickly.
C. Cloud backup, in the hope that the provider gives me my data when I need it instead of pointing to the fine print in the user agreement about their limited liability. Customer reviews of even some of the top cloud backup providers say that a restore either didn't work or was so slow that a complete recovery would take months.

Use case (very limited at this point):

1) Only one user who writes.
2) A share named VMs, for ESX_VM_Backup.
3) A share named ISOs, for a repository of ISOs.
4) A share named TimeMachine, to back up a Mac.
5) Three shares named Data, Audio, and Video, to store three types of data.
6) A Docker image running Plex.

Frank1940 Posted July 1, 2017 (edited)

4 hours ago, curiouskid said:
    Part highlighted in RED got me confused because I do see slot for 2nd parity drive in Web-UI of current stable release of UnRaid ver 6.3.5.

The wikis are seldom completely up to date. Many of them are maintained by the user base and, in some cases, the originator has left the scene or is inactive for a variety of reasons. Dual parity is now a fact, and if you want to read about the protection level that it provides, you can do so here:

There is considerable discussion on the topic in this thread, and it can help you make an informed decision about dual parity.

Hot spares are another area where it is largely up to you. I know that there are a few people with them. I would suspect that there are a lot more with a precleared drive sitting on the shelf waiting for a future drive failure. The reason for this is that sometimes they will have two (or more) servers, and having one drive ready to go is a more economical solution than having one in every server! Plus, with a hot spare you are burning through the warranty period without even having the drive in use.

As the article above points out, data loss due to physical damage to the server itself is a real issue and must always be taken into account! Cloud backup is a possibility, but the realities of moving TBs of data are the real issue here. A better alternative is to back up to the cloud (or use some other offsite backup scheme) only that data which is completely irreplaceable! (Family photographs and personal financial records are examples...) Movies and TV shows are replaceable and maybe even expendable!

I personally use three 2.5" USB drives to make backups of my irreplaceable data. Two of them are always in a safety deposit box at a bank a few miles away. There is also a backup on one of my servers. And my servers are now read-only for all Windows computers, to provide protection from malware. So I have a minimum of three backups of most of my most valuable data.

Edited July 1, 2017 by Frank1940