markswam (Author) Posted May 6, 2020: So...yes, then? Just upload all data to a freshly formatted disk? I just want to make absolutely sure I understand what you're saying so I don't wind up doing something dumb.
JorgeB Posted May 6, 2020: Either back up any data still on that disk, format it, and restore the data; or move the data from that disk to other array disks if there's space, format it, and continue using it normally.
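Editor's note: since the disk gets formatted right after the backup, it may be worth confirming the copy is complete first. Below is a minimal sketch, assuming Python 3 is available on the machine holding both copies. The function names `file_digest` and `compare_trees` and the two directory arguments are hypothetical and not part of Unraid.

```python
import hashlib
import sys
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_trees(source, backup):
    """Compare every file under `source` against the same relative path
    under `backup`. Returns (missing, mismatched) lists of relative paths."""
    source, backup = Path(source), Path(backup)
    missing, mismatched = [], []
    for src in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src.relative_to(source)
        dst = backup / rel
        if not dst.is_file():
            missing.append(rel.as_posix())
        elif file_digest(src) != file_digest(dst):
            mismatched.append(rel.as_posix())
    return missing, mismatched


if __name__ == "__main__":
    # Usage: python verify_backup.py <source_dir> <backup_dir>
    # Both paths are supplied by the user; nothing is assumed about mount points.
    missing, mismatched = compare_trees(sys.argv[1], sys.argv[2])
    print(f"missing from backup: {len(missing)}")
    print(f"content differs:     {len(mismatched)}")
    sys.exit(1 if (missing or mismatched) else 0)
```

If both lists come back empty, every file on the source exists in the backup with identical content. The sketch does not check for extra files in the backup, and it cannot tell whether the source copy was already corrupted before it was read.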
markswam (Author) Posted May 6, 2020: Gotcha, understood.
markswam (Author) Posted May 10, 2020: Hey, me again. Sorry if I'm being irritating. I've just got one more question. I've managed to pull all of the data off of the corrupted disk and onto my local machine, so I just want to make sure I understand my next steps before continuing. Would the best order of operations be:
1. Mount the corrupted disk using UD and delete all the data off of it
2. Leave the corrupted disk unassigned
3. Create a New Config (obviously being careful to get the parity drive assignment right)
4. Allow the parity rebuild to complete
5. Stop the array
6. Assign the corrupted disk to its new slot
7. Reformat the corrupted disk
8. Start the array and proceed as normal
Or am I getting that wrong/have extraneous/missed steps?
itimpi Posted May 10, 2020: Much simpler (and faster) would be to follow the procedure for Reformatting the drive, although you may then want to run a parity check in case at some point you have managed to get parity out of sync with your current drives.
markswam (Author) Posted May 10, 2020: Right on, thank you. I'll try doing that in the morning.
markswam (Author) Posted July 19, 2020 (edited): Well, it's somehow managed to happen two more times, to the same drive. And I have no idea how, since both times literally nothing got written to that drive in between recoveries. Everything was being written to the next disk in the array (disk 6), and it just suddenly failed, and disks 1, 2, and 5 all showed hundreds of thousands of errors in the "Main" tab of the GUI (but neither 1 nor 2 show up as "unmountable: no file system"). Last week, every single drive passed an extended SMART test with absolutely no problems reported, so I don't think there's a physical issue with any of the drives. I've also swapped out all of the SATA cables with brand-new ones in the last month. At this point, I'm not even losing any data, I'm just getting annoyed. Would it be prudent to go through every drive in the array, move the data off, reformat, and move the data back onto all of them? I'm completely out of ideas otherwise, and just want to stop this monthly ritual of pulling all my shit off a "corrupt" drive and reformatting it. Also: This only ever seems to happen when I invoke the mover while I have a large number of small files (think a couple thousand video frames extracted as photos by ffmpeg) on the cache drive. I have no idea why that would cause an issue, but it seems to be the only constant.
itimpi Posted July 19, 2020: If you suddenly get errors on multiple drives it is likely that the issue is controller related. Posting your system’s diagnostic zip file (obtained via Tools -> Diagnostics) after this has happened might allow us to confirm that this is what appears to have happened.
markswam (Author) Posted July 19, 2020 (edited): I unfortunately don't have a diagnostic from the more recent incidents, but I did post a diagnostic the first time this failure occurred. Would that be helpful in any way, since it's likely the same failure mode?
itimpi Posted July 19, 2020: Not sure they will help unfortunately. You really need to get them again the next time this issue happens. You also want to get onto the latest version of Unraid (6.8.3). You mention your last attempt had problems but I suspect that was something specific to your system. Support of older versions is always going to be a bit hit-and-miss as we forget any oddities an old release might exhibit.
markswam (Author) Posted July 19, 2020: Oh, I probably should have mentioned that earlier. I upgraded to the 6.8.3 Nvidia build after the recovery back in May, so that's up to date at least. Part of me thinks this might be the fault of the PCI-E SATA card I've got in my system (I know my Cache is plugged into that. Not sure which other drive is plugged into it, but I know there's two) since my motherboard doesn't have enough SATA ports to support the 8 drives I've got in the system. Maybe I should ditch that card entirely and just buy a new motherboard with enough SATA ports...
markswam (Author) Posted July 21, 2020: Well, it's happened again. Here is the diagnostic zip as requested. Hopefully this can help finally figure out why this keeps happening. I literally didn't even get done moving data back onto the disk this time. (Attachment: tower-diagnostics-20200721-1018.zip)
trurl Posted July 21, 2020: Can you get diagnostics with array started? Lots of things diagnostics can't tell us with the array stopped.
markswam (Author) Posted July 21, 2020 (edited): Sigh. Dammit, I rebooted after I pulled the diagnostics so I could re-seat my SATA card. Guess it's time to wait for this failure to happen again...
trurl Posted July 21, 2020: You were getting errors on disks 1 and 2. Are these on the same controller?
JorgeB Posted July 21, 2020: You should disable the mover logging so it doesn't spam the syslog.
markswam (Author) Posted July 21, 2020: Well, I may have royally screwed up. I rebooted the machine again after checking to see what disks were in what slots, and now I can't access it via the web anymore. I can log in fine locally, but it only shows my cache and disks 1-4 mounted. I discovered that Disks 1-5 and my cache are all plugged into my motherboard, while my parity drive and Disk 6 are plugged into the external SATA card. So yes, 1, 2, and 5 should all be on the same controller.
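Editor's note: the disk-to-controller mapping can also be read from software, without tracing cables. On Linux, each entry in `/sys/block` is a symlink whose resolved path includes the PCI address of the controller the disk hangs off. The sketch below groups `sd*` devices by that address. It assumes Python 3 is installed (it may not be on a stock Unraid system), and the function names `controller_of` and `disks_by_controller` are hypothetical. It reports kernel device names such as `sda`, which still have to be matched to Unraid's disk slots by hand.

```python
import os
import re
from collections import defaultdict

# A PCI address looks like 0000:00:1f.2 (domain:bus:device.function).
PCI_ADDR = re.compile(r"[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]")


def controller_of(sysfs_path):
    """Return the last PCI address in a resolved sysfs path, i.e. the
    device closest to the disk, or None if the path contains none."""
    matches = PCI_ADDR.findall(sysfs_path)
    return matches[-1] if matches else None


def disks_by_controller(sys_block="/sys/block"):
    """Group sd* block devices by the PCI address of their controller."""
    groups = defaultdict(list)
    for name in sorted(os.listdir(sys_block)):
        if not name.startswith("sd"):
            continue  # skip loop devices, md devices, and so on
        real = os.path.realpath(os.path.join(sys_block, name))
        groups[controller_of(real) or "unknown"].append(name)
    return dict(groups)


if __name__ == "__main__":
    for ctrl, disks in disks_by_controller().items():
        print(f"{ctrl}: {', '.join(disks)}")
```

Disks that share a PCI address share a controller, so if the disks that throw errors together all land in one group, that supports the controller theory. USB devices, including the Unraid flash drive, will be listed under their USB host controller's address.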
markswam (Author) Posted July 21, 2020 (edited): Update: The web UI is accessible again, but now it's showing both Disk 5 and Disk 6 as unmountable with no filesystem. Hooray, I guess. Time to mount Disk 6 with Unassigned Devices, pull data down off of it, and then format both of those disks and start re-uploading stuff. Thankfully I had already pulled all of the data down off Disk 5...
markswam (Author) Posted July 21, 2020: Further update: I cannot use Unassigned Devices, because the web UI is only accessible in safe mode.
markswam (Author) Posted July 21, 2020: Update again: I managed to get to the web UI in a normal boot, so I should be able to pull down the data that I need. So at least that crisis has been averted. Please forgive me, I’m kind of a stressful person.