taflix Posted July 3, 2022
Diagnostics attached. I have a new array of 14 TB drives. One of them was disabled a few hours ago. What should I do? The short self-test reported:
Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline     Completed without error  00%        840              -
Attachment: taflix-unraid-diagnostics-20220702-2253.zip
Edited July 3, 2022 by taflix
itimpi Posted July 3, 2022
There is no obvious reason in the diagnostics, except that you suddenly started getting read errors on disk4. At first glance the SMART data for the drive looks OK. I suggest running the extended SMART test on the drive (which will take many hours). It is not unusual for a drive to pass the short test but fail the longer one.
JorgeB Posted July 3, 2022
It's not logged as a disk problem, but it is still a good idea to run an extended SMART test as itimpi mentioned. If that's OK, check/replace/swap cables/slot for that disk before rebuilding.
taflix Posted July 3, 2022
Thanks guys. I started a READ CHECK last night. Should I stop it and do the extended test?
trurl Posted July 3, 2022
18 minutes ago, taflix said: stop it and do the extended test?
Yes. You will have to disable spindown on the disk to get the extended test to complete.
taflix Posted July 4, 2022
23 hours ago, trurl said: yes, you will have to disable spindown on the disk to get extended test to complete
Okay, the READ CHECK was 25.7% done when I stopped it, and it showed no errors. I started the extended self-test; it probably won't be done until tomorrow. It has reached 50% in the 10 hours since I started it. Happy 4th of July! 🧨
Edited July 4, 2022 by taflix
taflix Posted July 4, 2022
On 7/3/2022 at 10:07 AM, trurl said: yes, you will have to disable spindown on the disk to get extended test to complete
The extended test shows: Completed without error. The results are attached. Whenever you get around to this, let me know what we should do next. Thanks!
Attachment: taflix-unraid-smart-20220704-0859.zip
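For anyone who wants to check a self-test log line without reading it by eye, here is a minimal sketch. It assumes the column layout shown in the first post (test number, description, status, remaining percentage, lifetime hours, LBA of first error). The function name `parse_selftest_line` and the "Extended offline" description are assumptions for illustration, not something taken from the attached diagnostics.

```python
import re

# Pattern for one row of a SMART self-test log, based on the row quoted in
# the first post. "Extended offline" is an assumed description for the long test.
LOG_LINE = re.compile(
    r"#\s*(?P<num>\d+)\s+"
    r"(?P<test>Short offline|Extended offline)\s+"
    r"(?P<status>.+?)\s+"
    r"(?P<remaining>\d+)%\s+"
    r"(?P<hours>\d+)\s+"
    r"(?P<lba>\S+)"
)


def parse_selftest_line(line):
    """Return the fields of one self-test log row, or None if it does not match."""
    match = LOG_LINE.search(line)
    if match is None:
        return None
    status = match.group("status")
    return {
        "num": int(match.group("num")),
        "test": match.group("test"),
        "status": status,
        "remaining_percent": int(match.group("remaining")),
        "lifetime_hours": int(match.group("hours")),
        "lba_of_first_error": match.group("lba"),
        "passed": status == "Completed without error",
    }


row = parse_selftest_line("# 1 Short offline Completed without error 00% 840 -")
```

A passing row only says the test found no errors; as noted above, a drive can pass the short test and still fail the extended one.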
trurl Posted July 5, 2022
That looks fine.
On 7/3/2022 at 5:37 AM, JorgeB said: if that's OK check/replace/swap cables/slot for that disk before rebuilding.
taflix Posted July 5, 2022
11 hours ago, trurl said: That looks fine.
Thank you. I've never rebuilt before. Is this the process? https://wiki.unraid.net/Replacing_a_Data_Drive
Since the drive is fine and I'm not replacing the drive, what would the process be?
Edited July 5, 2022 by taflix
JonathanM Posted July 5, 2022
13 minutes ago, taflix said: Since the drive is fine and I'm not replacing the drive, what would the process be?
Once you start the array with the disk unassigned, you can stop the array and assign the same disk back to the slot and let it rebuild. You just have to make Unraid see the drive as a replacement.
trurl Posted July 5, 2022 (Solution)
Your earlier diagnostics showed disabled/emulated disk4 mounted with more than 6TB of contents. You can examine the emulated contents if you want. The emulated contents are exactly what will be rebuilt.
Assuming that is still the case, you can rebuild the disk to itself. Or you can rebuild to a spare, if you have one, and keep the original as is in case there are problems with the rebuild. But it should be OK to rebuild on top.
Looks like you let Google find an old wiki link. The current documentation is linked at the top and bottom of the forum, and in the manual link at lower right in your Unraid webUI.
https://wiki.unraid.net/Manual/Storage_Management#Rebuilding_a_drive_onto_itself
taflix Posted July 5, 2022
2 hours ago, trurl said: Your earlier diagnostics showed disabled/emulated disk4 mounted with more than 6TB of contents. You can examine the emulated contents if you want. The emulated contents is exactly what will be rebuilt. Assuming that is still the case, you can rebuild the disk to itself. Or you can rebuild to a spare if you have one and keep the original as is in case there are problems with rebuild. But should be OK to rebuild on top. Looks like you let google find an old wiki link. The current documentation is linked at the top and bottom of the forum, and in the manual link at lower right in your Unraid webUI. https://wiki.unraid.net/Manual/Storage_Management#Rebuilding_a_drive_onto_itself
Thanks sir! It is showing this: [screenshot]
I'm not familiar with the concept of "emulated".
And those instructions are much better! The disk is now being reconstructed: [screenshot]
Edited July 5, 2022 by taflix
trurl Posted July 5, 2022
13 minutes ago, taflix said: I'm not familiar with the concept of "emulated"?
When a write to a disk in the array fails, the disk is disabled and emulated from the parity calculation. After it is disabled, the physical disk is not used again until rebuilt.
Reads of the disk are emulated by reading all other disks and getting its data from the parity calculation. Writes of the disk are emulated by updating parity as if the disk had been written.
That initial failed write is emulated, and any subsequent writes are emulated. Since the physical disk is no longer in sync with the array, it is "kicked out". That initial failed write and any subsequent writes can be recovered by rebuilding, since the data can be recovered by reading the emulated disk. And the emulated disk is exactly what will be rebuilt.
While a disk is being emulated (and even while it is rebuilding), you can still access the emulated data (and even write new data), provided the emulated disk is mountable. That is why we asked you to check the contents of the emulated disk, since that is what is being rebuilt whether it is mountable or not. Since your screenshot and earlier diagnostics showed the disk had contents (and so was mounted), everything should come out OK.
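The emulation described above can be sketched in a few lines. This is only a toy model, assuming a single parity disk computed as a byte-wise XOR and made-up two-byte "disks"; the class `ToyArray` and its method names are hypothetical and are not Unraid's actual implementation.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


class ToyArray:
    """Hypothetical single-parity array, for illustration only."""

    def __init__(self, disks):
        self.disks = dict(disks)  # physical contents, keyed by slot name
        self.parity = xor_blocks(list(self.disks.values()))
        self.disabled = None      # slot currently being emulated, if any

    def read(self, slot):
        if slot != self.disabled:
            return self.disks[slot]
        # Emulated read: combine every other disk with parity.
        others = [data for name, data in self.disks.items() if name != slot]
        return xor_blocks(others + [self.parity])

    def write(self, slot, data):
        old = self.read(slot)
        # Parity is updated as if the write had landed on the disk.
        self.parity = xor_blocks([self.parity, old, data])
        if slot != self.disabled:
            self.disks[slot] = data  # a disabled physical disk is left untouched

    def rebuild(self, slot):
        # The emulated contents are exactly what gets written to the disk.
        self.disks[slot] = self.read(slot)
        self.disabled = None


array = ToyArray({"disk1": b"\x10\x20", "disk2": b"\x01\x02", "disk4": b"\xaa\xbb"})
array.disabled = "disk4"              # disk4 gets kicked out of the array
before = array.read("disk4")          # emulated read of the old contents
array.write("disk4", b"\x00\x01")     # emulated write: only parity changes
stale = array.disks["disk4"]          # the physical disk still holds old data
array.rebuild("disk4")                # commit the emulated contents
```

After `rebuild`, the physical disk holds the data written while it was emulated, and parity matches all disks again.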
trurl Posted July 5, 2022
If a disk is unmountable, it either has not been formatted yet and so has no filesystem to mount, or it has a corrupted filesystem that can't be mounted and needs repair. This is true whether the disk is "active" or emulated as shown in your screenshot.
If an active disk is unmountable, the filesystem is repaired, and if an emulated disk is unmountable, you can repair the emulated filesystem before rebuilding (recommended) or after rebuild. Note that rebuild will not fix the corrupt filesystem; it will rebuild exactly what the emulated disk has, even if the emulated disk has a corrupt filesystem.
trurl Posted July 5, 2022
For future reference, check filesystem (and repair) is also in the wiki, but if it ever happens to you, please ask for advice before doing anything.
itimpi Posted July 5, 2022
3 hours ago, taflix said: Thank you. I've never rebuilt before. Is this the process? https://wiki.unraid.net/Replacing_a_Data_Drive Since the drive is fine and I'm not replacing the drive, what would the process be?
There is a lot of out-of-date information in the wiki (although it is largely correct). If possible, you should always use the official online documentation accessible via the 'Manual' link at the bottom of the GUI. The relevant part is this
quattro Posted July 30, 2022
On 7/5/2022 at 9:52 AM, trurl said: When a write fails to a disk in the array, the disk is disabled and emulated from the parity calculation. After it is disabled the physical disk is not used again until rebuilt. Reads of the disk are emulated by reading all other disks and getting its data from the parity calculation. Writes of the disk are emulated by updating parity as if the disk had been written. That initial failed write is emulated and any subsequent writes are emulated. Since the physical disk is no longer in sync with the array it is "kicked out". That initial failed write and any subsequent writes can be recovered by rebuilding since the data can be recovered by reading the emulated disk. And the emulated disk is exactly what will be rebuilt. While a disk is being emulated (and even while it is rebuilding), you can still access the emulated data (and even write new data), provided the emulated disk is mountable. That is why we asked you to check the contents of the emulated disk since that is what is being rebuilt whether it is mountable or not. Since your screenshot and earlier diagnostics showed the disk had contents (and so was mounted) everything should come out OK.
That's a beautiful explanation of what's going on. I too got the "being reconstructed" message, which is very confusing. Is reconstruction different from rebuilding?
I found instructions for adding back a drive that was disabled but appears to be good according to its SMART data (I suspect delayed spin up of a drive when a scheduled parity check started, or a cable/power issue). The instructions I found say:
1) stop the array
2) remove the disabled drive from the array
3) start the array
4) stop the array
5) assign the disabled disk to the empty slot
6) start the array in maintenance mode
7) click sync
Could I just do steps 1 through 6 and not trigger the rebuild with the sync button?
Why is the word reconstruct being used in this message instead of rebuild? A quick Google search for "unraid reconstruct" seems to only pull up info on reconstruct write. When it comes to repairs, rebuild always seems to be the preferred terminology. What exactly is happening when this message is triggered? Thanks!
Edited July 30, 2022 by quattro
JonathanM Posted July 31, 2022
16 hours ago, quattro said: Could I just do steps 1 through 6 and not trigger the rebuild with the sync button?
If you don't rebuild, the physical disk will not be part of the array, all reads and writes will still be happening on the emulated disk, and you will remain unprotected from another disk failure.
The physical disk only has the data on it from the instant before the write failed; all subsequent writes are only on the emulated copy. So, at this point, you have 2 different sets of data: 1 that is actively being read and written as an emulated drive in the array, and 1 that is on the physical disk that stopped being updated when the write failed.
Normally you want to use the current set of data and write that to a physical disk, either the original physical disk that was in the slot or a new disk, depending on the condition of the original drive.
In some circumstances, if parity was not in sync when the write failed, you can end up with the emulated data being corrupt. In that case you should rebuild to a new drive, attempt data recovery on both the original drive that was kicked out of the array and the emulated copy written to the new drive, and see which copy is most correct.
16 hours ago, quattro said: Why is the word reconstruct being used in this message instead of rebuild?
The parity emulated disk is being reconstructed on the fly by reading all the remaining drives, and you need to commit that emulated data to a physical disk, rebuilding it and getting the array back in sync so it is once again fault tolerant.
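To illustrate the "unprotected from another disk failure" point, here is a minimal sketch. It assumes a single parity disk computed as a byte-wise XOR and made-up one-byte "disks"; the helper `xor_bytes` and all values are hypothetical. With one disk missing there is one unknown and it can be recomputed. With two disks missing, parity only tells you the XOR of the two, which many different pairs of contents satisfy.

```python
def xor_bytes(*blocks):
    """XOR any number of equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, value in enumerate(block):
            out[i] ^= value
    return bytes(out)


# Hypothetical one-byte disks; real disks are far larger.
disk1, disk2, disk4 = b"\x10", b"\x01", b"\xaa"
parity = xor_bytes(disk1, disk2, disk4)

# One missing disk (disk4 emulated): a single unknown, so it is recoverable.
recovered_disk4 = xor_bytes(disk1, disk2, parity)

# Two missing disks (disk4 still emulated, then disk1 fails as well):
# the surviving data only yields the XOR of the two missing disks.
combined = xor_bytes(disk2, parity)

# Every one of these 256 (disk1, disk4) pairs is consistent with that XOR,
# so neither disk can be reconstructed from single parity alone.
candidates = [(bytes([a]), bytes([a ^ combined[0]])) for a in range(256)]
```

Rebuilding promptly turns the emulated disk back into a physical one, so the array can again tolerate a disk failure.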
trurl Posted August 1, 2022
On 7/30/2022 at 6:42 PM, quattro said: I suspect delayed spin up of a drive when a scheduled parity check started
You can forget about that suspicion.
On 7/30/2022 at 6:42 PM, quattro said: or a cable/power issue
Or just the connections. These are by far the most frequent cause.
On 7/30/2022 at 6:42 PM, quattro said: start the array in maintenance mode
Not usually necessary for rebuild. I only use Maintenance mode for check filesystem when needed. Starting in normal mode will allow you to use the server normally while rebuilding, though usage will affect performance of rebuild, and rebuild will affect performance of usage.
trurl Posted August 1, 2022
I assume rebuild has had plenty of time to complete. How did it go?