May 23, 2010
I had a single parity disk (1TB) and a single data disk (1TB). I filled the data disk almost all the way up (99% full). I recently added a new Samsung 1TB data disk, formatted it, and restarted the array. It seemed to be fine (though I'm still having flaky booting; see my other posts on that). I started adding more media files, which spilled over onto the new disk.

At some point, I had a problem talking to the server while playing back a file, so I restarted it. When I finally got it to come up, it didn't recognize the new disk. I panicked a bit and tried restarting a few times, and at one point when it came up, the files had magically reappeared. I'm not sure how. But something was still off and I rebooted again (I can't recall why). Then I lost the disk again, and no amount of rebooting helped.

I decided to back-rev the OS from 4.5 to 4.4.2, because some friends of mine claimed that 4.5.x was quirky. No joy. Then I went the other way and updated to 4.5.4. Still no joy. The OS "cleared" the new disk, which took a very long time (an hour or two). Now it appears unformatted. This is where things stand right now. Here's a screenshot below.

Is my data still preserved in the parity drive somehow? If so, how do I use that to properly initialize the "new" drive and regain my data? The data I lost can be replaced, but it took a long time to get my DVDs ripped... probably 10-15 of them.
May 24, 2010
If the drive was cleared, zeros were written to its entire contents, and parity was kept in sync, so parity now reflects the cleared drive too. The drive is waiting to be formatted. If it was cleared, your files on it are now erased, gone, replaced by all zero bytes. There is no "recovery" at this point. You asked the unRAID server to clear the newly added drive, and it did. The only way to add a drive with existing data is to set a new initial configuration.
In 4.5.4, that is done by issuing the initconfig command on the command line after assigning a drive with existing data but before pressing "Start". On releases prior to 4.5.4, you would instead press the button labeled "restore" between assigning a drive with data on it and pressing "Start".

Your friends were partially correct, but going back and forth between releases the way you did, with missing drives, is what got you into trouble. The only way a disk would be considered "new" is if you pressed "restore" at some point, un-assigned it and then re-assigned it after pressing "restore", or completely replaced your old config folder, deleting your old disk.cfg and super.dat files.

If disk2 was cleared, and by your description of how long it took I'm pretty certain it was, you can check the box under format, format it, and re-rip your DVDs. Now that you are on 4.5.4, stay on it. It works well and performs much better than the earlier versions. Ripping 10 or 15 DVDs is painful, but it would have been worse if you had filled a 2TB disk; then it might have been 200.

If disks appear and disappear, you likely had loose cables, and you may still have loose cables. Post a syslog if you have questions. Disks do not just get "lost", and the method used to get them back is critical; otherwise they will be cleared, exactly as you requested. (You pressed "Start" when the screen specifically said: "Start" will record the new disk information and bring the expanded array on-line. All new disks which have not been factory-erased will be cleared first; and, the array will be available after the clear completes. This process takes time, but the array remains protected at all times. Caution: any data on the new disk(s) will be erased! If you want to preserve the data on the new disk(s), reset the array configuration and rebuild parity instead.)
Joe L.
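Why parity can't bring the files back can be shown with a minimal sketch, assuming simple XOR single parity across the data disks (the usual way a single parity disk works). The four-byte "disks" are made-up contents and `xor_parity` is a hypothetical helper, not part of unRAID.

```python
from functools import reduce


def xor_parity(disks):
    """Byte-wise XOR across equal-length disks (hypothetical single-parity helper)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*disks))


disk1 = bytes([0x10, 0x20, 0x30, 0x40])  # the full data disk (made-up contents)
disk2 = bytes([0xAA, 0xBB, 0xCC, 0xDD])  # the "new" disk with the ripped DVDs
parity = xor_parity([disk1, disk2])

# Clearing writes zeros over disk2; parity is kept in sync with the cleared disk.
cleared = bytes(len(disk2))
parity = xor_parity([disk1, cleared])

# Reconstructing disk2 from parity and the remaining disk now gives zeros back.
rebuilt = xor_parity([parity, disk1])
print(rebuilt == cleared)  # True
print(rebuilt == disk2)    # False
```

Since XOR with zero changes nothing, an all-zero disk contributes nothing to parity, so once the clear has finished there is no trace of the old files left for parity to reconstruct.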
May 24, 2010 (Author)
Thanks for the info. I should clarify that the disk was never lost; it just went to unassigned or whatever, and was no longer in the list. So I stopped the array and assigned it. I think that would rule out the cable connection. So why would a disk all of a sudden fall out of the configuration? See my other posts, particularly about empty directories and booting issues. Maybe there's something wrong with the flash drive? Maybe a mobo/BIOS issue? Are the hidden Mac files causing grief? Or the weird file permissions? It's upsetting that the box won't reliably reboot.

I've really been finding unRAID frustrating so far. I'm a software engineer, and it just seems to me that more could be done to make this more bullet-proof. Why should a function called "Start" ever zero out a drive? Shouldn't that be a separate, explicit function? Also, when I brought that disk back into the configured set of drives, couldn't unRAID have figured out that the data on it was correct (i.e., that together with the other drive it matched parity exactly)? This server will house very important data (or else, what would be the point of unRAIDing it?). It seems like it should be a lot harder to accidentally wipe out a drive via the UI.

I don't mean to trash the product; it really fits my needs and I'm sure a lot of hard work has gone into it. I just wish it had more safeguards and error recovery built in. I've attached my syslog; I'd be curious if you could see anything in there that might help me understand what happened.
Attachment: syslog.txt
May 24, 2010 (Author)
I'm going to try putting some data back on the second drive again, and then do a handful of reboots. If this thing is going to fail again, I want to know about it now. If this happens again (if I reboot and the second disk no longer shows up in the configured list), what should I do besides posting a syslog here? What sequence of steps should I perform to get this disk, with data on it, back into the array without losing that data?
May 24, 2010
First, post a syslog.

Eventually a disk in your array will fail for real. If one is failing intermittently now, you really need to find and fix the cause, and the only way to know how it is failing is to look in the syslog after it has failed but before you reboot.

(Let's say a cable was loose, a "write" to the drive failed, and so it was taken off-line. You've powered down, fixed the cable, and powered back up.)

The unRAID software expects a failed drive to be replaced with a different drive. If the drive had actually failed, all you would need to do is:
1. Stop the array.
2. Power down.
3. Replace the failed disk with a new one at least as large as the old one, but not larger than the parity disk.
4. Power up. The array will notice the change in drives and will not start automatically. It will wait for you to press "Start" to begin reconstructing the old contents onto the new replacement drive by reading parity and all the other data disks.

If you just had a loose cable, unRAID will not see a change in drive model/serial number, so it will not put the old drive back online. You must force it to forget the old drive's model/serial number first. To do that:
1. Stop the array.
2. Go to the "Devices" page and un-assign the failed drive.
3. Go back to the main page and "Start" the array with the failed drive un-assigned. This makes it forget the model/serial number of the failed drive.
4. Stop the array once more.
5. Go back to the "Devices" page and re-assign the drive that had the loose cable to its original slot in the array. It will now be treated as a different model/serial (since we made unRAID forget it existed when we started the array with it un-assigned).
6. Go back to the main page and "Start" the array. unRAID will then begin reconstructing the contents of the failed drive onto its "replacement".

You will be protected by parity once more when the reconstruction is complete.

You will see mention in these forums of a "trust-my-parity" procedure, where parity is assumed to be correct. In many situations I feel this is the wrong solution. Let me explain. Say you were writing the last chapter of your soon-to-be-published novel, and when you pressed "Save File" in your editor the "write" to your second drive failed (bad power cable connection). The data would not be written to the failing disk, and the disk would be taken off-line by unRAID. The file would, however, be correctly written to parity, and parity, in combination with the other data drives, would be used to simulate the existence of the last chapter of your book.

Now, if you tell unRAID to trust that parity is good and that disk2 is good, and force the error to be ignored, the NEXT time you perform a parity check the actual contents of disk2 (without your last chapter) will be used to correct the parity calculations, erasing the fact you ever wrote it. Only use the "trust" procedure if you were not writing anything important to the disk at the time it was taken off-line; otherwise, that important data (file, movie, music, etc.) will be lost.
Joe L.
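The novel example can be made concrete with a small simulation, again assuming plain XOR parity. `xor_parity` and `emulate_missing` are hypothetical helpers and the 12-byte "disks" are made up; the sketch contrasts rebuilding the off-line disk from parity with trusting its stale contents and letting a parity check overwrite parity.

```python
from functools import reduce


def xor_parity(disks):
    """Byte-wise XOR across equal-length disks (hypothetical helper)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*disks))


def emulate_missing(data_disks, parity, missing):
    """Contents an off-line disk would appear to have: parity XOR every other data disk."""
    others = [d for i, d in enumerate(data_disks) if i != missing]
    return xor_parity([parity] + others)


disk1 = b"chapter 1-9 "
disk2 = b"old notes..."  # disk2 before the failed write
parity = xor_parity([disk1, disk2])

# Saving the last chapter to disk2: parity gets the read-modify-write update
# (old parity XOR old data XOR new data), but the write to disk2 itself fails
# (loose power cable), so disk2 goes off-line still holding its old bytes.
last_chapter = b"LAST CHAPTER"
parity = xor_parity([parity, disk2, last_chapter])

# Option A: rebuild disk2 onto its "replacement" from parity + the other disks.
rebuilt = emulate_missing([disk1, disk2], parity, missing=1)
print(rebuilt)  # b'LAST CHAPTER'

# Option B: "trust" disk2 as-is; the next parity check recomputes parity from it.
parity_after_check = xor_parity([disk1, disk2])
print(emulate_missing([disk1, disk2], parity_after_check, missing=1))  # b'old notes...'
```

In option B the chapter ends up in neither disk2 nor parity, which is the failure mode described above.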
May 24, 2010 (Author)
Thanks, Joe, for all the advice. I've loaded some new data on the second disk and I'm going to try to reboot a handful of times. I just stopped the array... it's been "stopping" for a while now. I've attached a syslog zip, if you want to take a look.
Attachment: syslog2.txt.zip
May 24, 2010 (Author)
This doesn't look good...
May 24 08:03:27 Tower last message repeated 2 times
May 24 08:03:27 Tower kernel: FAT: FAT read failed (blocknr 674)
May 24 08:03:27 Tower kernel: sd 0:0:0:0: rejecting I/O to offline device
May 24 08:04:27 Tower unmenu[1327]: df: `/boot': Input/output error
May 24 08:04:27 Tower kernel: sd 0:0:0:0: rejecting I/O to offline device
May 24 08:04:27 Tower kernel: sd 0:0:0:0: rejecting I/O to offline device
May 24, 2010
The array is unable to stop because some process is still accessing the user shares. It could be a process with a file open (reading or writing to it), a process you started while cd'd to a folder under /mnt/user, or your own login shell, if you've cd'd to a folder under /mnt/user and left it there. You can type
/usr/bin/fuser -mv /mnt/disk* /mnt/user/*
to see the processes keeping your file system busy. Once those processes are stopped, the array will stop.
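For the curious, here is roughly what `fuser -mv` is looking for, as a minimal Linux-only Python sketch: it walks the `/proc/<pid>/cwd` and `/proc/<pid>/fd/*` symlinks and reports processes whose working directory or open files sit under a given path. `busy_pids` is a hypothetical helper; it matches path prefixes rather than whole mounted filesystems the way `fuser -m` does, so on the server itself the `fuser` command above is still the tool to use.

```python
import os


def busy_pids(path):
    """Map pid -> paths for processes whose cwd or open files lie under `path`.

    Hypothetical helper, Linux only. It reads the /proc/<pid>/cwd and
    /proc/<pid>/fd/* symlinks; without root it only sees your own processes.
    Unlike `fuser -m`, it matches path prefixes, not whole mounted filesystems.
    """
    root = os.path.realpath(path)
    hits = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        proc_dir = os.path.join("/proc", entry)
        links = [os.path.join(proc_dir, "cwd")]
        try:
            fd_dir = os.path.join(proc_dir, "fd")
            links += [os.path.join(fd_dir, fd) for fd in os.listdir(fd_dir)]
        except OSError:
            pass  # permission denied, or the process exited meanwhile
        for link in links:
            try:
                target = os.readlink(link)
            except OSError:
                continue
            if target == root or target.startswith(root + os.sep):
                hits.setdefault(int(entry), []).append(target)
    return hits


if __name__ == "__main__":
    # e.g. a telnet login shell left cd'd into a user share shows up here
    for pid, paths in sorted(busy_pids("/mnt/user").items()):
        print(pid, sorted(set(paths)))
```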
May 24, 2010 (Author)
Wow... nice call. :-) I was telnet'd in. I just assumed it would force me out. That did it: as soon as I killed the telnet session, it showed as stopped.

Now I'm back to my booting problems, it seems (link below). I had to reboot the box 3-4 times before it finally came all the way up. And since it's sitting in a closet with no monitor attached, I can't tell what's going on other than by the beeping (or lack thereof). (It would REALLY be nice if the OS could incorporate more beep sequences during startup, so that you could get an idea of the progress without having to attach a monitor.) http://lime-technology.com/forum/index.php?topic=6323

Now that it's back up, it seems to be running fine. I don't see any errors in the syslog (attached). Does the server save previous syslogs anywhere, or can you only ever see the current one? (That would be another nice enhancement: save the last N syslogs, configurable.)
Attachment: syslog3.txt.zip
May 24, 2010
unRAID actually does emit some of its own system beeps. I haven't seen any official documentation about them (this is a relatively new feature, circa 4.5.3 I think), but here is what I've noticed on my server:
- A single beep when unRAID first starts to load
- A quick double beep when unRAID is ready (and your shares/web management page are accessible from another computer)
- A slow double beep when unRAID is shutting down
- A single beep when the network cable is unplugged (or the network goes down, etc.)
- A single beep when the network cable is plugged back in (or the network comes back up, etc.)
These are all from memory, so they may not be 100% correct. The quick double beep when the array is ready is the most useful, in my opinion. I also run my server headless, so the more audible info the better.