evans.family.mike Posted September 24, 2022: Good Morning, I recently discovered that I had a disk that is listed as Not Installed. I have experienced a couple of power outages in the last couple of months, not sure if they are related or not. I am also seeing a ton of smart errors and what appear to be critical read errors on another disk. I rebooted and placed the array into maintenance mode and performed S.M.A.R.T. test on the disk with all the errors and started a Read Check of all the drives, but ended up stopping it after a few hours as it was only at around 34% and not progressing beyond the disk with all the S.M.A.R.T. and read errors. I'm no longer seeing the docker apps I was running either. Before the reboot I was seeing an error that the Docker service was not running; after the reboot I don't see the error, and the apps are missing. I'm only configured with a single parity drive so I'm obviously concerned about data loss. I've got 2 new drives arriving tomorrow, but wanted to see if I could get some advice on the plan of attack to hopefully get restored to where I was before. Should I replace the 2 drives before upgrading the OS or should I get the array stabilized beforehand? Do I need to upgrade to a specific version from the RC or can I jump straight to 6.10.2? Also, is there any possibility of recovering the docker apps or, at a minimum, the configuration of said apps? Thank you for your time and advice, Mike. Attached: unraid-diagnostics-20220924-0913.zip
trurl Posted September 24, 2022: You don't seem to have a pool named cache, but some of your user shares are configured to use that pool. In particular, appdata, domains, system are configured to prefer a pool named cache, and since it doesn't exist, all those shares have been created on the array. Possibly this is what has led up to your docker problems. Anything you can tell us about that? More about your other problems after I have studied diagnostics.
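A mismatch like the one described above can be checked by comparing the pool each share asks for against the pools that actually exist. The following is only a minimal sketch, not an Unraid tool. It assumes the share settings are available as `key="value"` text and that the relevant keys are named `shareUseCache` and `shareCachePool`; those key names, the sample values, and the pool name `fastpool` are assumptions for illustration and should be checked against your own configuration.

```python
import re


def parse_share_cfg(text):
    """Parse key="value" lines into a dict (file format assumed for illustration)."""
    settings = {}
    for line in text.splitlines():
        match = re.match(r'^\s*(\w+)\s*=\s*"(.*)"\s*$', line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def shares_with_missing_pool(share_cfgs, existing_pools):
    """Return the names of shares that refer to a pool that does not exist."""
    missing = []
    for name, text in share_cfgs.items():
        cfg = parse_share_cfg(text)
        # Assumed key names; a share set to "no" does not use a pool at all.
        uses_pool = cfg.get("shareUseCache", "no") != "no"
        pool = cfg.get("shareCachePool", "")
        if uses_pool and pool and pool not in existing_pools:
            missing.append(name)
    return sorted(missing)


# Hypothetical example data, not taken from the diagnostics in this thread.
share_cfgs = {
    "appdata": 'shareUseCache="prefer"\nshareCachePool="cache"\n',
    "media": 'shareUseCache="no"\nshareCachePool=""\n',
}
print(shares_with_missing_pool(share_cfgs, existing_pools={"fastpool"}))
```

With the hypothetical data above, only `appdata` is flagged, because it prefers a pool named `cache` while the only existing pool has a different name.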
trurl Posted September 24, 2022: No SMART report for any disk that might have been assigned as disk1, might have been 8TB serial ending FYZV device sde but isn't in SMART reports now. 1 hour ago, evans.family.mike said: "performed S.M.A.R.T. test on the disk with all the errors" You didn't say which disk that was. Which disk was it? Do any of your other disks have SMART warnings on the Dashboard page? 1 hour ago, evans.family.mike said: "ton of smart errors and what appear to be critical read errors on another disk." Which disks? Maybe it could be figured out from diagnostics but why make us work so hard? 1 hour ago, evans.family.mike said: "not progressing beyond the disk with all the S.M.A.R.T. and read errors." ?? 1 hour ago, evans.family.mike said: "only configured with a single parity drive so I'm obviously concerned about data loss" Looks like only disk1 is disabled, but it is also unmountable, so we will have to repair its filesystem before rebuilding it.
evans.family.mike Posted September 24, 2022 (Author): 53 minutes ago, trurl said: "No SMART report for any disk that might have been assigned as disk1, might have been 8TB serial ending FYZV device sde but isn't in SMART reports now. You didn't say which disk that was. Which disk was it? Do any of your other disks have SMART warnings on the Dashboard page?" My apologies, I am referring to the disk ending in 5RSND. Yes, both Disk 2 ending in 5RSND and Disk 3 ending in 4MPD6. 53 minutes ago, trurl said: "Which disks? Maybe it could be figured out from diagnostics but why make us work so hard?" Disk 2 ending in 5RSND. Its SMART Overall Health is listed as failed. 53 minutes ago, trurl said: "??" I should have worded it as "progress was extremely slow due to all of the S.M.A.R.T. and read errors". 53 minutes ago, trurl said: "Looks like only disk1 is disabled, but it is also unmountable, so we will have to repair its filesystem before rebuilding it." Ok, would that be done from the Check Filesystem Status button while in Maintenance Mode?
evans.family.mike Posted September 24, 2022 (Author): 1 hour ago, trurl said: "You don't seem to have a pool named cache, but some of your user shares are configured to use that pool. In particular, appdata, domains, system are configured to prefer a pool named cache, and since it doesn't exist, all those shares have been created on the array. Possibly this is what has led up to your docker problems. Anything you can tell us about that? More about your other problems after I have studied diagnostics" See the attached screenshots of the pool devices I have installed. I guess I did not have them configured properly for use by any of the shares.
trurl Posted September 26, 2022: On 9/24/2022 at 12:45 PM, trurl said: "No SMART report for any disk that might have been assigned as disk1, might have been 8TB serial ending FYZV device sde but isn't in SMART reports now." On 9/24/2022 at 1:46 PM, evans.family.mike said: "My apologies, I am referring to the disk ending in 5RSND. Yes, both Disk 2 ending in 5RSND and Disk 3 ending in 4MPD6" Disk2 is failing as you mention. Disk3 looks OK for now, just 1 Reported Incorrect, and we are going to need it for getting other disks going again. Still doesn't solve the mystery of missing disk1. Do you still have the disk? Is it still attached? On 9/24/2022 at 1:46 PM, evans.family.mike said: "would that be done from the Check Filesystem Status button while in Maintenance Mode?" Correct. But... Since disk2 is really failing, that might be a reason disk1 emulation isn't working well. And disk2 failure would also make it difficult or impossible to rebuild disk1. I think we are going to have to accept disk1 back into the array if you still have it and it still works, so we can try to rebuild disk2 instead. That is a little more complicated and not really documented in the wiki, though there are quite a few threads where we have guided users through the process. @evans.family.mike Sorry for the delay in replying. Real life, etc. I thought someone else might have taken this up. Let me know if you are still waiting or if not, what you have done instead.
trurl Posted September 26, 2022: Nothing can move open files. To get those shares moved, you will have to disable Docker and VM Manager in Settings. But let's wait on trying to move anything until your array is stable again. In fact, go ahead and disable Docker and VM Manager in Settings until your array is stable. Since you have gotten into this state, a couple of things I always ask. Do you have Notifications set up to alert you immediately by email or other agent as soon as a problem is detected? Don't let one unnoticed problem become multiple problems and data loss. Do you have another copy of anything important and irreplaceable? Parity is not a substitute for backups.
evans.family.mike Posted September 26, 2022 (Author): 1 hour ago, trurl said: "Disk2 is failing as you mention. Disk3 looks OK for now, just 1 Reported Incorrect, and we are going to need it for getting other disks going again. Still doesn't solve the mystery of missing disk1. Do you still have the disk? Is it still attached? Correct. But... Since disk2 is really failing, that might be a reason disk1 emulation isn't working well. And disk2 failure would also make it difficult or impossible to rebuild disk1. I think we are going to have to accept disk1 back into the array if you still have it and it still works, so we can try to rebuild disk2 instead. That is a little more complicated and not really documented in the wiki, though there are quite a few threads where we have guided users through the process. @evans.family.mike Sorry for the delay in replying. Real life, etc. I thought someone else might have taken this up. Let me know if you are still waiting or if not, what you have done instead." No worries and no need to apologize. I'm just grateful for the assistance. Disk 1 is still in the case and should be connected. I have been hesitant to shut the server down to check for fear of Disk 2 permanently going out to lunch. However, if I need to do that I will. Docker service has been disabled. VM Manager wasn't running. I don't have email notifications on as I always keep the GUI open and visible on another screen. That said, I will get that configured. Backups? We don't need no stinking backups... (I mean, I would if it were financially feasible for me to backup 80TB of data)
trurl Posted September 26, 2022: 1 hour ago, evans.family.mike said: "I would if it were financially feasible for me to backup 80TB of data" You get to decide what is "important and irreplaceable" (quoting trurl, 3 hours ago). I back up only 2TB of my 20+ to external disks which I rotate offsite. Personal files, a large photo collection, and a large music collection all fit in that small space. My video media gets rsynced to my backup server occasionally, and I only have that backup server because I had the hardware left over after upgrading my main server. It isn't "important and irreplaceable".
trurl Posted September 26, 2022: In order to rebuild failed disk2 we need disk1 to be working. Don't reassign it to the array; we just need to see its SMART report. Was this disk1? On 9/24/2022 at 12:45 PM, trurl said: "8TB serial ending FYZV" Can you see the disk in BIOS?
trurl Posted September 26, 2022: On 9/24/2022 at 11:04 AM, evans.family.mike said: "2 new drives arriving tomorrow" Have they arrived? They need to be at least 8TB (size of disk1 or disk2) but no larger than 12TB (size of parity).
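The size rule in the post above can be written as a single comparison. This is a minimal sketch using nominal sizes in TB; the function name is made up for illustration, and it ignores the small capacity differences between drives that share the same nominal size.

```python
def replacement_size_ok(new_tb, replaced_tb, parity_tb):
    """True if the new disk is at least as large as the disk it replaces
    and no larger than the parity disk."""
    return replaced_tb <= new_tb <= parity_tb


# Sizes mentioned in this thread: 8TB data disks, 12TB parity, 12TB new disks.
print(replacement_size_ok(new_tb=12, replaced_tb=8, parity_tb=12))  # True
```

A 12TB replacement sits exactly at the upper bound set by the 12TB parity disk, so it passes; anything smaller than 8TB or larger than 12TB would not.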
trurl Posted September 26, 2022: If disk2 weren't already failing, and all you had was disabled / emulated-but-unmountable disk1, then we would repair the filesystem on emulated disk1, rebuild disk1 to a spare, and keep original disk1 as is in case of problems. But I'm concerned that disk2 might fail so badly that we would be in danger of losing disk2 data while we tried to recover disk1 data. If disk1 SMART is OK probably its data is mostly OK, so it might be OK to not try to rebuild it but accept it back into the array and instead rebuild disk2 to a new disk. I'm going to see if we can get storage expert @JorgeB to give us a 2nd opinion.
evans.family.mike Posted September 26, 2022 (Author): My apologies for the delayed response. I couldn't see the disk in the BIOS initially after a reboot. I powered off the server and performed a visual inspection to make sure cabling looked ok and then powered it back on. This time I was able to see it in the BIOS. It now shows up in UNRAID and is unassigned. The 2 new 12TB disks have arrived.
JorgeB Posted September 26, 2022: 1 hour ago, trurl said: "But I'm concerned that disk2 might fail so badly that we would be in danger of losing disk2 data while we tried to recover disk1 data. If disk1 SMART is OK probably its data is mostly OK, so it might be OK to not try to rebuild it but accept it back into the array and instead rebuild disk2 to a new disk." Agree, let's see a SMART report for disk1.
evans.family.mike Posted September 26, 2022 (Author): See attached unraid-smart-20220926-1442.zip
JorgeB Posted September 26, 2022: Other than having been run too hot in the past it looks OK. There are lots of UDMA CRC errors, so the SATA cable should be replaced, unless the errors are old.
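Whether the CRC errors are old can be judged by reading the raw value of SMART attribute 199 (UDMA_CRC_Error_Count) before and after replacing the cable: the raw counter is cumulative and does not reset, so only a further increase points to an ongoing problem. The sketch below assumes the attribute table is available as text in the usual `smartctl -A` column layout. The sample rows are made up for illustration and are not taken from the attachments in this thread.

```python
def udma_crc_error_count(smart_attributes_text):
    """Return the raw value of attribute 199 from smartctl -A style text, or None."""
    for line in smart_attributes_text.splitlines():
        fields = line.split()
        # Columns: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0] == "199":
            try:
                return int(fields[9])
            except ValueError:
                return None
    return None


def crc_errors_still_growing(count_before, count_after):
    """The counter never resets, so only an increase indicates new errors."""
    return count_after > count_before


# Made-up sample rows in the usual column layout.
sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       37
"""
before = udma_crc_error_count(sample)
print(before)
print(crc_errors_still_growing(before, before))
```

If the count read after the cable swap equals the earlier count, the existing errors are historical and the new cable is doing its job.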
evans.family.mike Posted September 26, 2022 (Author): ok, I will order a replacement cable for that disk. I'm already seeing alerts that Disk 1 is running hot. Hopefully that is an issue with this specific disk and not something I'll see with the replacement disk. Now to get the array back in functioning order so I can replace these disks. Is there a special procedure I need to perform due to the current condition of the array?
trurl Posted September 26, 2022: 15 minutes ago, evans.family.mike said: "order a replacement cable for that disk" Since that is a likely cause of disk1 getting disabled and of the disk not being seen, if you don't have any spare cables it might be best to wait until you get a new one before proceeding.
evans.family.mike Posted September 26, 2022 (Author): ok, sounds good. The new cables should arrive tomorrow.
evans.family.mike Posted September 27, 2022 (Author): ok, cables received and replaced for Disk 1. See attached SMART Diagnostic for both Disk 1 and Disk 2: unraid-smart-20220927-1459-Disk2.zip, unraid-smart-20220927-1458-Disk1.zip (Edited September 27, 2022 by evans.family.mike: added files)
trurl Posted September 27, 2022: Post new diagnostics also.
evans.family.mike Posted September 27, 2022 (Author): See attached unraid-diagnostics-20220927-1600.zip
trurl Posted September 27, 2022: On 9/26/2022 at 2:09 PM, evans.family.mike said: "2 new 12TB disks have arrived" Do you have enough extra ports to connect one (or both) new disks?
evans.family.mike Posted September 27, 2022 (Author): I should have a couple of additional SATA ports to connect the disks to. The question is whether or not I have additional power connectors. Checking.
evans.family.mike Posted September 27, 2022 (Author): ok, looks like I have everything I need to connect the 2 new drives.