GregBobery, posted December 16, 2021:
I woke up the other day to a warning about an unmountable disk. I logged into the UI and Disk 1 was showing "Unmountable no filesystem". I have seen this before, so I ran a filesystem check with option -L, which brought the disk back online. I then ran a parity check, which found and corrected about 800 errors.
It is now a few days later and I have started to notice that some of my files are missing. I have daily revisions backed up off-site, so I am not too worried about the lost files. I'm just at a loss as to why my disk has done this (this is the second time), why I lost files in the process, and why parity did not rebuild and fix this.
Unraid 6.9.2. I have attached diagnostics: GregBobery-diagnostics-20211217-0639.zip

trurl, posted December 16, 2021:
GregBobery said: "ran a file system check with option -L which brought the disk back online, I then ran a parity check which found and corrected about 800 errors."
How exactly did you do the filesystem repair? If you did it from the webUI, then the correct device would have been used and parity maintained. If you did it from the command line and didn't repair the md device, then parity would have been invalidated.
You can check your lost+found share to see if there is anything there you can identify yourself. That is where the repair puts whatever it couldn't place.
GregBobery said: "why did the parity not rebuild and fix this"
You didn't mention rebuilding, so I assumed you didn't do that. Parity typically can't fix corruption, since it should be in sync with whatever is on the disk, including the corruption.

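The point about parity being in sync with the corruption can be shown with a minimal Python sketch. It assumes a simplified model of single parity as a byte-wise XOR across the data disks; it is not Unraid's actual code, and the 4-byte "disks" and the function names are made up for illustration.

```python
from functools import reduce

def xor_parity(disks):
    """Return the byte-wise XOR of equally sized data disks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*disks))

def parity_errors(disks, stored_parity):
    """Count byte positions where recomputed parity differs from stored parity."""
    return sum(1 for a, b in zip(xor_parity(disks), stored_parity) if a != b)

# Hypothetical tiny disks; real disks hold terabytes.
disk1 = bytes([0x10, 0x20, 0x30, 0x40])
disk2 = bytes([0x0F, 0x0E, 0x0D, 0x0C])
parity = xor_parity([disk1, disk2])

# Corruption written through the array is just another write:
# the bad byte lands on disk1 and parity is updated to match it.
corrupted_disk1 = bytes([0x10, 0xFF, 0x30, 0x40])
parity_after = xor_parity([corrupted_disk1, disk2])

# A parity check now finds nothing wrong, because parity agrees with the disks.
in_sync_errors = parity_errors([corrupted_disk1, disk2], parity_after)

# Rebuilding disk1 from parity and disk2 reproduces the corrupted contents.
rebuilt_disk1 = xor_parity([parity_after, disk2])
```

In this model a parity check only reports errors when parity and the data disks disagree, for example if a disk was modified without parity being updated. It has no way of knowing which files the filesystem considers valid.
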
GregBobery, posted December 16, 2021:
trurl said: "How exactly did you do the filesystem repair? If you did it from the webUI then the correct device would have been used and parity maintained. If you did it from the command line, and didn't repair the md device, then parity would have been invalidated. You can check your lost+found share to see if there is anything there you can figure out yourself. That is where repair put the stuff it couldn't figure out."
Yes, I ran it through the web UI with option -L. I checked the lost+found folder and it's just full of random folders with nothing in them.
trurl said: "You didn't mention rebuilding so I assumed you didn't do that. Parity typically can't fix corruption since it should be in sync with whatever is on the disk, including the corruption."
So essentially I have to do a restore from off-site to get my files back? Also, why is this happening? It's the second time.

trurl, posted December 16, 2021:
Looks like you must have rebooted, since I didn't see any parity check in the log. I can't tell anything about what happened before the reboot.

GregBobery, posted December 16, 2021:
trurl said: "Looks like you must have rebooted since I didn't see any parity check in the log. Can't tell anything about what happened before the reboot."
Well shit, I did do a clean reboot last night. Is there any way to get the logs from before that?

itimpi, posted December 16, 2021:
GregBobery said: "Is there any way to get the logs from before that?"
Unfortunately not. By default the logs are kept only in RAM, so they are lost on reboot. If you want persistent logs, you can set up the syslog server.

trurl, posted December 16, 2021:
If you had the syslog server set up, the syslog would have been saved wherever you configured it.

GregBobery, posted December 16, 2021:
I don't. Do you think it's worth setting up? Is there anything you can think of that would cause a disk to just drop like that?

trurl, posted December 16, 2021:
GregBobery said: "cause a disk to just drop like that"
Do you mean the disk had actually disconnected? That would usually cause it to become disabled and require a rebuild. I didn't notice any controller incompatibilities that might be involved.
If we had the syslog, we might have been able to see I/O errors, such as those caused by a bad connection. Some kinds of I/O errors might ultimately result in corruption.
I didn't notice problems with any SMART attributes on any disk, except 1 CRC error on one disk, which indicates some connection issue at some time in the past.

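For anyone who wants to script this kind of SMART check, here is a rough Python sketch. It assumes smartmontools 7.0 or later (for the `-j` JSON output), an ATA drive, and a machine with Python available, which a stock Unraid install may not have. The choice of watched attributes and the function names are illustrative, not something from the diagnostics in this thread.

```python
import json
import subprocess

# ATA attribute IDs commonly watched for disk or connection trouble.
WATCHED = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
    199: "UDMA_CRC_Error_Count",  # usually points at cabling or connection issues
}

def nonzero_watched(report):
    """Return {attribute name: raw value} for watched attributes above zero."""
    table = report.get("ata_smart_attributes", {}).get("table", [])
    found = {}
    for attr in table:
        raw = attr.get("raw", {}).get("value", 0)
        if attr.get("id") in WATCHED and raw > 0:
            found[WATCHED[attr["id"]]] = raw
    return found

def read_smart(device):
    """Run smartctl on a device such as /dev/sdX and parse its JSON output.

    smartctl's exit status is a bitmask and can be nonzero even when output
    is valid, so the return code is deliberately not checked here.
    """
    result = subprocess.run(
        ["smartctl", "-A", "-j", device], capture_output=True, text=True
    )
    return json.loads(result.stdout)

# Made-up sample shaped like smartctl's JSON, mirroring "1 CRC error on one disk".
sample_report = {
    "ata_smart_attributes": {
        "table": [
            {"id": 5, "name": "Reallocated_Sector_Ct", "raw": {"value": 0}},
            {"id": 199, "name": "UDMA_CRC_Error_Count", "raw": {"value": 1}},
        ]
    }
}
flagged = nonzero_watched(sample_report)
```

On a real system you would call `nonzero_watched(read_smart("/dev/sdX"))` as root, with the actual device path substituted. A CRC count that stays constant usually just records a past connection problem; one that keeps increasing suggests the cable or connection still needs attention.
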
GregBobery, posted December 16, 2021:
Sorry, I mean why would it just suddenly show as unmountable out of nowhere?

trurl, posted December 16, 2021:
GregBobery said: "why would it just suddenly show as unmountable out of nowhere"
trurl said: "Some kinds of I/O errors might ultimately result in corruption"

GregBobery, posted December 16, 2021:
What's the fix for this?

trurl, posted December 16, 2021:
This can happen on any system, of course. Have you never had to run a disk check on Windows, for example?
GregBobery said: "What's the fix for this?"
Depends on the cause. If things are working well, the Errors column on Main should always be zero for every disk. If not, you should investigate.
Do you have Notifications set up to alert you immediately by email or other agent as soon as a problem is detected?

GregBobery, posted December 16, 2021:
That's how I found out about the unmountable disk. But this is the second time this has happened. Last time I didn't lose any data that I could tell, though.

GregBobery, posted December 19, 2021 (edited December 20, 2021):
On 12/17/2021 at 7:57 AM, trurl said: "This can happen on any system of course. Have you never had to checkdisk on Windows, for example? Depends on the cause. If things are working well, the Errors column on Main should always be zero for every disk. If not you should investigate. Do you have Notifications setup to alert you immediately by email or other agent as soon as a problem is detected?"
Okay, so I have restored all my files and everything seems to be working fine. However, last night I got a couple of warning notifications regarding some SMART errors (see below). Am I to assume this drive is going to shit itself again? Should I look into something like unbalance to move everything to a different drive before this happens?
I ran an extended test; see the attached results: ST4000DM004-2CV104_ZTT088XX-20211220-1828.txt

JorgeB, posted December 20, 2021:
The SMART test failed; the disk should be replaced.

GregBobery, posted December 20, 2021:
JorgeB said: "SMART test failed, disk should be replaced."
I'm not at home right now to do that, but I have two other disks in the array. Should I use unbalance or the like to shift everything to one of the other disks?

JorgeB, posted December 20, 2021:
Not needed, and it would likely fail since the disk is bad. Replacing the disk will rebuild the data onto the new disk.

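As a rough illustration of why a replacement disk can be rebuilt without copying anything off the bad one, here is a small Python sketch. It assumes a simplified single-parity model (byte-wise XOR across the data disks), not Unraid's actual implementation, and the 3-byte "disks" and function name are made up.

```python
def xor_blocks(blocks):
    """XOR equally sized byte blocks together, position by position."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)

# Hypothetical array with three data disks and one parity disk.
disk1 = bytes([0x01, 0x02, 0x03])
disk2 = bytes([0x10, 0x20, 0x30])
disk3 = bytes([0xAA, 0xBB, 0xCC])
parity = xor_blocks([disk1, disk2, disk3])

# disk1 is pulled and replaced: its contents are recomputed from
# parity plus every remaining data disk, without reading the old disk1.
rebuilt_disk1 = xor_blocks([parity, disk2, disk3])
```

The rebuild reads every other disk in full, so in this model it only works while all of them stay readable. That is why running with a known-bad disk carries risk if a second drive fails.
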
GregBobery, posted December 20, 2021:
JorgeB said: "Not needed, and likely would fail since the disk is bad, replacing the disk will rebuild the data to the new disk."
Are there any measures I can put in place to help the bad drive limp along until I can get hands-on to replace it?

JorgeB, posted December 20, 2021:
That's risky, since if another drive fails you'll be in trouble. The only thing you can do is exclude it from the shares so no new data goes there, and also avoid reading from it.