December 14, 2016

I just put in a new SATA controller, so to test it, I put in a second parity drive and rebooted (the cache, first parity, and four data drives are on the motherboard, not on this controller). The parity check came back OK, so I ran it again without "write corrections to disk" selected. That went OK too. The second parity check finished yesterday. This appears to mean the SATA controller is OK, as is the second parity drive.

This morning, I could not access the server from a Windows computer. I turned on the local monitor, logged in as root, then ran a shutdown command. I got a message saying it was going to shut down, but nothing happened. I waited a while, then briefly pressed the power button. I got the same message on the screen, saying it was going to shut down. Again, nothing happened after a while. So I finally performed a hard reboot of the server.

Now I have sync errors and lost files that I had probably 8 hours of work into. They're simply gone. They may have been on the cache, and the system was trying to copy the cache? Anyway, this is what my system currently says:

Parity-Check in progress. Cancel will stop the Parity-Check.
Total size: 3 TB
Elapsed time: 9 hours, 18 minutes
Current position: 2.66 TB (88.5 %)
Estimated speed: 75.7 MB/sec
Estimated finish: 1 hour, 16 minutes
Sync errors detected: 5

I have no idea where the sync errors are and if I can recover from them. Here's the part of the syslog I think applies:

Dec 14 07:42:01 Data kernel: mdcmd (42): check nocorrect
Dec 14 07:42:01 Data kernel: md: recovery thread: check P Q ...
Dec 14 07:42:02 Data kernel: md: using 1536k window, over a total of 2930266532 blocks.
Dec 14 07:42:14 Data kernel: docker0: port 1(veth0ef4c4a) entered forwarding state
Dec 14 07:46:57 Data ntpd[1501]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
Dec 14 09:54:53 Data kernel: md: recovery thread: Q incorrect, sector=1565565768
Dec 14 09:54:53 Data kernel: md: recovery thread: Q incorrect, sector=1565565776
Dec 14 09:54:53 Data kernel: md: recovery thread: Q incorrect, sector=1565565784
Dec 14 09:54:53 Data kernel: md: recovery thread: Q incorrect, sector=1565565792
Dec 14 09:54:53 Data kernel: md: recovery thread: Q incorrect, sector=1565565800
Dec 14 13:31:09 Data kernel: mdcmd (43): spindown 1

Which drive is this? The cache drive? Can I recover anything? Thank you.
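For anyone who wants to pull the affected sectors out of a syslog like the one in this post, here is a minimal Python sketch. It assumes the "incorrect, sector=" line format shown in the excerpt, guesses that other parity labels (such as P) would follow the same pattern, and assumes 512-byte sectors when converting a sector number to a byte offset. The function and constant names are made up for illustration. Because the check line reads "check P Q", the Q in these messages most likely refers to the second parity drive rather than the cache, but treat that reading as an assumption too.

```python
import re

# Syslog excerpt copied from the post above (unrelated lines left out).
SYSLOG = """\
Dec 14 07:42:01 Data kernel: mdcmd (42): check nocorrect
Dec 14 07:42:01 Data kernel: md: recovery thread: check P Q ...
Dec 14 09:54:53 Data kernel: md: recovery thread: Q incorrect, sector=1565565768
Dec 14 09:54:53 Data kernel: md: recovery thread: Q incorrect, sector=1565565776
Dec 14 09:54:53 Data kernel: md: recovery thread: Q incorrect, sector=1565565784
Dec 14 09:54:53 Data kernel: md: recovery thread: Q incorrect, sector=1565565792
Dec 14 09:54:53 Data kernel: md: recovery thread: Q incorrect, sector=1565565800
Dec 14 13:31:09 Data kernel: mdcmd (43): spindown 1
"""

# Only the "Q incorrect" form appears in the post; accepting other
# combinations of P and Q is an assumption.
ERROR_LINE = re.compile(
    r"md: recovery thread: (?P<parity>[PQ]+) incorrect, sector=(?P<sector>\d+)"
)

SECTOR_BYTES = 512  # assumption: the log counts 512-byte sectors


def find_sync_errors(text):
    """Return a list of (parity_label, sector) tuples found in the log text."""
    return [
        (m.group("parity"), int(m.group("sector")))
        for m in ERROR_LINE.finditer(text)
    ]


def summarize(errors):
    """Print the error count, sector range, spacing, and approximate offset."""
    if not errors:
        print("No sync errors found in this log.")
        return
    sectors = sorted(sector for _, sector in errors)
    labels = sorted({parity for parity, _ in errors})
    steps = sorted({b - a for a, b in zip(sectors, sectors[1:])})
    print(f"Sync errors logged: {len(errors)} (parity label(s): {', '.join(labels)})")
    print(f"Sector range: {sectors[0]} to {sectors[-1]}")
    print(f"Spacing between consecutive sectors: {steps}")
    offset_gb = sectors[0] * SECTOR_BYTES / 1e9
    print(f"Approximate offset of first error: {offset_gb:.1f} GB")


if __name__ == "__main__":
    summarize(find_sync_errors(SYSLOG))
```

The sketch only reports which parity label was flagged and where on the disk the mismatches sit. It does not map sectors to files, so it cannot say whether any particular file was affected.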
December 14, 2016 Community Expert

A few sync errors are perfectly normal after an unclean shutdown, and there will be many more if you were copying data to the server when it shut down. Files disappearing is not normal, but you could have some filesystem corruption, so the first thing to do is check the filesystem on all data disks plus the cache.
December 14, 2016 Author

Will do. From what I see here, I have to put the system in maintenance mode, which means I have to wait until the parity check is done (less than an hour now):

https://lime-technology.com/wiki/index.php/Check_Disk_Filesystems

I'll check following those instructions and report back.
December 15, 2016 Author

Crap. The files are gone. About 8 hours of work, simply disappeared. The only thing I can think of is that they were on the cache and the cache was being copied when I turned off the system. However, nothing was happening, even after I tried to shut it down, so I had to turn it off. Should I not use the cache? Would that possibly have prevented this?

This is the first time I've lost data on my unraid system in the 8+ years I've had it. I don't trust it anymore. I'm now backing up nightly (it used to be once per week). I may turn off the cache for safety reasons.

By the way, I was able to run multiple disk checks at the same time, using the web interface and opening up different windows, one for each drive. Unfortunately, none of the drives had errors.
December 15, 2016 Author

OK, I took the cache out. I'm working from home right now, and I cannot afford to lose any more data. What happens to the Plex media server, which wants to use the cache?
December 15, 2016 Community Expert

I think it's unlikely that the cache is the reason for your "lost" files, but I don't know enough details about what you were doing to suggest a better explanation. And I don't know enough details about how your particular implementation of Plex is using the cache.
December 15, 2016

When you had a keyboard and monitor attached and were trying to shut it down, that would have been a good time to type "diagnostics" in order to give people something to work with. Without that, it's just guesswork.

If you have some application that needs access to the cache but you've removed it, then obviously the application isn't going to work properly, so it would be best to stop it.

Did you do a file system check on all disks? It isn't clear whether you did or not.

I would put everything back as it was and try to start the array. Then grab diagnostics and attach them to your next post. They might show something that helps.
December 15, 2016 Author

Thanks, John. I'll try the diagnostics next time. The problem I find with unraid is that I set it up and use it, then don't do anything with it for years. I had to search, for instance, for how to log in to the system using the console. Every time I want to telnet in, it takes me 45 minutes to figure out how to do that. Through searching, I was able to figure out how to shut down the system from the console, and that took a while as there are different ways of doing it for different versions of unraid. See this thread, for instance:

http://lime-technology.com/forum/index.php?topic=2781.0

The powerdown script is apparently installed somewhere in versions 5 or 6, or at least I was able to type "powerdown" on the console and get a message that the system was powering down (though nothing else happened and the system did not power down). You'll note that thread is 8 years old. Unraid has changed versions multiple times since then. I have a hard time deciphering what is or isn't applicable to my system (6.2.4).

For someone like me, who sets up unraid at a certain point, then never looks at it again, how am I supposed to know that I can type in "diagnostics" and get info? I've only had to hard reset this system twice that I can remember in 8+ years of owning it. I've never lost data until now. I am currently working from home and had just worked over 6.5 hours the day before on the data (two files) that were lost. I was "going" to work and had to access that data. Unraid was unresponsive.

Basically, I searched for "unraid how to shutdown from console" and got the idea to try "powerdown" on the console. I did not see there was a "diagnostics" anywhere, and honestly I thought unraid would keep a portion of the log from the previous session (why on earth does it start with a new log?). I have a light controller called ISY994, and I just deleted several YEARS worth of logs on it. I had just assumed that unraid would keep its logs the same way. But it apparently does not.

I did run a file system check on all disks. No errors.

So, what I did was put all my important data on its own share and remove cache access for that share. I enabled the cache again for the less important shares. I will constantly back up my data, possibly several times per day if I can get my Windows machines to do so (why Windows 7 Home won't let me set up a task without a password, I'm still working on that one).

I sincerely do appreciate your help, but you have to understand that we're not all Linux experts, and many of us are like me: we use unraid as a tool (a NAS) and need access to good information that's clear. I did the best searching I could under the conditions at hand and thought I was doing the correct thing. I had no idea I would lose data. I had no idea commands such as "diagnostics" existed. I was barely able to find "powerdown" and had no idea whether the script was installed or not. And that thread above is the first one that comes up for my search, and it provided a way to shut down, so that's what I tried.
December 15, 2016 Community Expert

Since you are a "set it and forget it" user, it is absolutely critical that you have Notifications set up. Do you?
December 15, 2016

Also, if you still want to use a cache drive but would like a little more redundancy, you can create a cache pool where your cache drives are mirrored. You need multiple drives set as cache, and basically the data is duplicated.

Now, I'm honestly not sure what would have happened in your case. If the lost data *was* caused by cache drive write issues, would a cache pool have helped? Not sure... perhaps someone more knowledgeable could answer?

Personally, I just write directly to the array for my important stuff (work, pictures, etc.). The cache drive is used for VMs, movies, etc.

More info on cache pools can be found here. HTH!