ptcadoc, posted March 3, 2021:
I'm a medical professional, requesting help from you smart people!

Setup (Unraid 6.8.3): three Seagate 6 TB drives, one WD 4 TB drive, and a cache SSD. One of the 6 TB drives is the parity drive. The disabled drive is "Disk3", the most recent drive I added to the server.

All of a sudden, one 6 TB drive is "disabled" and in "emulation mode". I can still read the files on that drive, but I'm assuming "emulation mode" means they are being served through the parity drive. I don't know enough about any of this, so please bear with my ignorance. I looked around some of the posts, and here's what I did:

1. Rebooted a couple of times from the GUI. No change.
2. Disconnected and reconnected the SATA cable, then tried a different SATA cable. No difference.
3. Downloaded the SMART report for that drive (attached). From what I can glean, it says PASSED.
4. Going through some of the posts here, I found a suggestion to follow the steps outlined at https://wiki.unraid.net/UnRAID_6/Storage_Management#Checking_a_File_System
5. So I put the array in maintenance mode, clicked on the drive in question, then clicked the Check button to run the file system check. I copied the output into a text document (attached).

I have no idea what to do now, please help. Has the drive gone bad?

Thank you in advance.

Attachments: klingon-smart-20210303-2130.zip, filecheck.txt
JorgeB, posted March 3, 2021:
Please post the diagnostics: Tools -> Diagnostics (after the array is started).
ptcadoc, posted March 3, 2021 (replying to JorgeB):
Thank you for replying. Please see attached.

Attachment: klingon-diagnostics-20210303-2231.zip
trurl, posted March 3, 2021:
Disk3's SMART report is OK and, as you mentioned, the disk is being emulated successfully. Since you rebooted before getting us the diagnostics, we can't see any details about what may have disabled the disk, other than the fact that a write to it failed, which is the reason a disk gets disabled.

Check all connections, power and SATA, at both ends, including any splitters. Then you will have to rebuild the disk, since it is out-of-sync and the emulated contents need to be written back to it. The safest approach is to rebuild onto another disk and keep the original as it is in case of problems during the rebuild, but it should be OK to rebuild onto the same disk.

Do you have backups of everything important and irreplaceable?
ptcadoc, posted March 3, 2021:
Hi, thank you for the reply. Yes, I'm realizing I should have gotten the diagnostics before rebooting. I guess the instinct to "turn it off and on again" got the better of me. I have everything backed up.

My next (stupid) question: what are the exact steps to do the rebuild? Can you provide an "idiot's guide"?

Thank you in advance.
trurl, posted March 3, 2021:
The documentation is accessible from the Manual link in the very bottom right corner of your Unraid webUI. Here is the relevant section: https://wiki.unraid.net/UnRAID_6/Storage_Management#Rebuilding_a_drive_onto_itself
ptcadoc, posted March 3, 2021:
Thank you. I looked at the article you referred to. Quick question regarding the steps:

1. Stop the array
2. Unassign the disabled disk
3. Start the array so the missing disk is registered

Does "unassign disabled disk" mean changing Disk3 to "no device", as shown in the picture?

Thank you
trurl, posted March 3, 2021 (quoting ptcadoc: "Does 'unassign disabled disk' mean changing Disk3 to 'no device', as shown in the picture?"):
Yes.
ptcadoc, posted March 3, 2021:
Thank you. I followed the steps, and as soon as I unassigned the drive this error popped up. I continued, and now it is doing a rebuild. Is this error message suggesting a drive failure? Thank you immensely for the help.

One other question: while the rebuild is running, should I refrain from using Plex or writing anything to the server, or is it OK to do so?
trurl, posted March 3, 2021:
CRC errors are connection issues, not disk issues. That single CRC error was already there in your previous diagnostics. You can acknowledge that count by clicking on the SMART warning on the Dashboard page, and it will warn you again if it increases.

Accessing files while rebuilding is allowed, but it will slow down both the rebuild and the file access.
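The acknowledge-and-warn-again behaviour trurl describes boils down to remembering the last acknowledged CRC count and warning only when the live counter grows past it. A minimal sketch, assuming a hypothetical `CrcMonitor` class whose counts are supplied by the caller (on most drives this counter is SMART attribute 199, UDMA_CRC_Error_Count); it is not Unraid's actual implementation:

```python
class CrcMonitor:
    """Hypothetical monitor: warn only when the CRC count grows past the acknowledged value."""

    def __init__(self) -> None:
        self.acknowledged = 0  # count the user last dismissed on the Dashboard

    def check(self, current: int) -> bool:
        # True means "raise a warning": the counter increased since the last acknowledgement
        return current > self.acknowledged

    def acknowledge(self, current: int) -> None:
        # remember the current count so this value no longer triggers a warning
        self.acknowledged = current


monitor = CrcMonitor()
print(monitor.check(1))  # True: the single existing CRC error is not yet acknowledged
monitor.acknowledge(1)
print(monitor.check(1))  # False: same count, already acknowledged
print(monitor.check(2))  # True: a new CRC error has been recorded
```

Because the counter never resets, a one-off error is harmless once acknowledged; only a rising count points to an ongoing connection problem.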
ptcadoc, posted March 3, 2021:
OK, that's very helpful to know. Thank you so much for the help so far. It says the rebuild will take 11 hours; I will post how it goes and whether there are any hiccups.
SavellM, posted March 3, 2021:
Similar issue to mine. Are you using an LSI controller too?
trurl, posted March 3, 2021 (quoting SavellM: "Similar issue to mine."):
According to his description, rebooting was tried after the disk was already disabled. There is no evidence that rebooting caused the disk to be disabled. In fact, since it was a disk that had just been added, a bad connection is the likely cause.
SavellM, posted March 3, 2021:
Mine didn't happen during a reboot either. Mine got disabled during a spin-up. This could happen at any point. @trurl
ptcadoc, posted March 4, 2021 (replying to trurl's post about CRC errors and accessing files during the rebuild):
So after 15 hours of rebuilding, it's all good! Disk3 is back online and I got these messages.

I'd like to thank everyone for their suggestions, especially @trurl. I really appreciate the time you took to reply.

Moving forward, can I pick your brains on:

1. Why this might have happened? If I understand correctly, it wasn't a hard drive hardware issue.
2. More importantly, how can I prevent it from happening again?

Thank you
SavellM, posted March 4, 2021:
Try putting the drive to sleep, give it a bit, then try to wake the drive up and see how it goes.
trurl, posted March 4, 2021 (replying to ptcadoc's "pick your brains" questions):
Connection issues are much more common than drive issues. Make sure the SATA and power cable connectors fit squarely on the disk connections, with no tension on the cables that might cause them to move. Do not bundle cables to make things look nice; that can cause the tension I mentioned, or let one thing pull on another when it moves. All you need is for it to be neat enough for good airflow.

Other connections along the way, such as power splitters, can be involved. And sometimes cables are simply bad. Some controllers also have issues, but these are usually confined to specific models.
ptcadoc, posted March 4, 2021 (replying to trurl's advice on connections):
So let me see if I understood correctly: this still could have been a cable/connection issue? But when I disconnected the SATA cable and plugged it back in, the drive still remained "disabled". Doesn't that suggest the cable was NOT the issue?

Are there ways to prevent this from happening again? Thank you
trurl, posted March 5, 2021 (quoting ptcadoc: "When I disconnected the SATA cable and plugged it back in, the drive still remained 'disabled'. Doesn't that suggest the cable was NOT the issue?"):
When a disk becomes disabled, it is out-of-sync and must be rebuilt. It stays disabled even after the connection is fixed, so reseating the cable could not have re-enabled it.
trurl, posted March 5, 2021:
Here is how this works. When a write to a disk fails, Unraid disables it. But that failed write is still used to update parity.

After a disk is disabled, it is no longer used. Instead, it is emulated from the parity calculation. The data can be read from the parity calculation, and writes to the emulated disk happen by updating parity. But the physical disk isn't used, because it is out-of-sync with the parity calculation.

When you rebuild the disk, the emulated contents are written to the disk, and it becomes enabled again because it is in-sync after the rebuild. The failed write that disabled the disk, and any subsequent writes to the emulated disk, are recovered.
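The disable, emulate, and rebuild cycle trurl describes can be sketched as a toy model. This assumes single XOR parity and treats each disk as a short list of byte values; the names `disks`, `read`, `write`, and `rebuild` are hypothetical illustrations, not Unraid internals:

```python
from functools import reduce
from operator import xor

# Toy array: three data disks, each holding two byte values (real arrays work per sector).
disks = {"disk1": [0x11, 0x22], "disk2": [0x33, 0x44], "disk3": [0x55, 0x66]}


def compute_parity(disks):
    # each parity byte is the XOR of the corresponding byte on every data disk
    return [reduce(xor, column) for column in zip(*disks.values())]


parity = compute_parity(disks)
disabled = set()


def read(name, i):
    if name in disabled:
        # emulate: XOR parity with the same byte on every other data disk
        others = (d[i] for n, d in disks.items() if n != name)
        return reduce(xor, others, parity[i])
    return disks[name][i]


def write(name, i, value, physical_write_fails=False):
    old = read(name, i)
    parity[i] ^= old ^ value  # parity is updated even if the disk write fails
    if name in disabled:
        return  # emulated disk: the write lives only in parity
    if physical_write_fails:
        disabled.add(name)  # failed write: disk is disabled and now out-of-sync
    else:
        disks[name][i] = value


def rebuild(name):
    # write the emulated contents back to the physical disk, then re-enable it
    disks[name] = [read(name, i) for i in range(len(disks[name]))]
    disabled.discard(name)


write("disk3", 0, 0x77, physical_write_fails=True)  # this write disables disk3
write("disk3", 1, 0x99)                             # goes to the emulated disk
print(disks["disk3"] == [0x55, 0x66])  # True: physical disk is stale
rebuild("disk3")
print([hex(b) for b in disks["disk3"]])  # ['0x77', '0x99']: both writes recovered
print(parity == compute_parity(disks))   # True: array is in-sync again
```

The point of the model is that parity always holds the intended contents, so the stale physical disk can simply be overwritten during the rebuild.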
ptcadoc, posted March 5, 2021:
I see, thank you, that makes sense now.

For my own learning, how long could this "emulated mode" have continued? In other words, could I have kept writing to Disk3 for quite a while, and if so, at what point would things have finally broken down?
trurl, posted March 5, 2021:
It could continue indefinitely, but since the disk was disabled, you no longer had any redundancy.

Parity is a common concept in computers and communications, and it is basically the same idea wherever it is used. Parity is just an extra bit that allows a missing bit to be calculated from all the other bits. Parity by itself can recover nothing; all the other disks are required.

If there had been a problem with another disk, you would have had no way to recover it, since you already had a missing disk (though since each disk is independent in Unraid, nothing on the other disks would be lost). Depending on the exact situation, you might lose some data in that case.
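trurl's point that single parity can fill in exactly one missing bit, and only with every other bit present, can be shown at the bit level. A minimal sketch assuming even (XOR) parity; `parity_bit` and `recover_missing` are hypothetical helpers for illustration:

```python
from functools import reduce
from operator import xor


def parity_bit(bits):
    # even parity: the extra bit makes the total number of 1s even
    return reduce(xor, bits, 0)


def recover_missing(bits_with_gap, parity):
    # bits_with_gap uses None for unknown bits; single parity solves for exactly one
    missing = [i for i, b in enumerate(bits_with_gap) if b is None]
    if len(missing) != 1:
        raise ValueError(f"single parity can recover exactly one missing bit, got {len(missing)}")
    known = [b for b in bits_with_gap if b is not None]
    return reduce(xor, known, parity)


data = [1, 0, 1, 1]  # one bit from each of four data disks
p = parity_bit(data)  # 1 ^ 0 ^ 1 ^ 1 = 1
print(recover_missing([1, 0, None, 1], p))  # 1: the lost bit is rebuilt from the others
try:
    recover_missing([1, None, None, 1], p)  # two disks missing at once
except ValueError as err:
    print(err)  # single parity can recover exactly one missing bit, got 2
```

With a disk already disabled, any second failure falls into the "got 2" case, which is why running emulated for a long time removes your safety margin.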
ptcadoc, posted March 5, 2021 (replying to trurl's explanation of parity):
Got it. I'm impressed (from an uninformed person's perspective) with Unraid. Previously I had a FreeNAS server and had nothing but problems, including a failed rebuild after a drive failed and repeated glitches, so I finally switched around 6 months ago. With Unraid, a glitch happened and got rectified easily, although only after I sought help, since I didn't know how to go about it.

Thanks again
ChatNoir, posted March 5, 2021 (quoting ptcadoc's question about how long "emulated mode" could have continued):
Do you have some kind of notifications set up so that you can act quickly if Unraid detects an issue? You can set up various types of notifications, from emails to fancy push notifications on a phone, etc. It is better to tackle a problem early than to wait and find yourself in a bad situation with several undetected failures.