July 28, 2010 From time to time my toddler daughter gets into the room the server is in and knocks the eSATA external boxes that 9 of my drives are in. This can cause one or more of my drives to lose the connection and appear as failed. I had a failed drive recently, and since it had been working fine, I assumed it was just a bad connection. I got it reassigned and did the "trust my parity" procedure on it. All data appeared to be intact, but now no writes or deletions can be carried out on that drive. I have been able to copy anything FROM it, though. The console is showing:
REISERFS error (device md14): vs-4080 _reiserfs_free_block: block 224913xxx: bit already cleared
I can't tell exactly how many lines there are, but those with xxx = 958 down to 947 are currently on screen. I have enough space on another disk in the array to copy all the contents of this 1 TB drive over. This would be a backup in case rebuilding the drive fails for some reason. I could then replace the disk, or, if anyone suspects it could be fine, maybe I'd do a preclear and see what happens. If all is well I could put it back in the array... or just RMA it. Any advice greatly appreciated.
July 28, 2010 You need to:
1. Get a SMART report on that drive:
smartctl -d ata -a /dev/sdX
where sdX is the device corresponding to disk14.
2. Check the file system following the procedure in the wiki.
The file system is being made read-only to prevent you from doing damage to it while it is in a corrupted state. Once repaired, odds are it will be fine for many years to come. The procedure is here: http://lime-technology.com/wiki/index.php?title=Check_Disk_Filesystems
Joe L.
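The two steps above can be sketched as a small dry-run helper that only prints the commands rather than running them. This is an illustration under assumptions: the function name is hypothetical, `/dev/sdX` stays a placeholder exactly as in the post, and running `reiserfsck --check` against `/dev/md14` is my reading of the console message (device md14), not something spelled out in this thread. The wiki page remains the authority on the preconditions for the check.

```python
import shlex


def diagnostic_commands(disk_number: int, sd_device: str) -> list[list[str]]:
    """Build (but do not run) the SMART and read-only file-system check commands.

    disk_number: the unRAID disk slot, e.g. 14 for disk14 (assumed to map to /dev/md14).
    sd_device:   the raw device for that slot, e.g. "/dev/sdX" (placeholder).
    """
    return [
        # SMART report, as given in the reply above.
        ["smartctl", "-d", "ata", "-a", sd_device],
        # Read-only check of the ReiserFS file system on the md device.
        ["reiserfsck", "--check", f"/dev/md{disk_number}"],
    ]


for argv in diagnostic_commands(14, "/dev/sdX"):
    print(shlex.join(argv))
```

Printing the commands first makes it easy to confirm the device names before anything touches the disk.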
July 28, 2010 Author Thank you, Joe. I will get these procedures started and post back the results. You (especially) and a few others on these forums deserve a medal.
July 28, 2010 The SMART report looks fine. Odds are that repairing the file system will be all that is required to get you back to where you can write to the drive once more.
July 28, 2010 Author Thanks. REISERFSCK result (last 2 lines):
Bad nodes were found, Semantic pass skipped
11 found corruptions can be fixed only when running with --rebuild-tree
So what now? There's a lot of red lettering in the wiki when it comes to running with --rebuild-tree.
July 28, 2010 The wiki says not to run --rebuild-tree unless instructed to by a prior run of reiserfsck. You have been instructed. Joe L.
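The rule above (only escalate to a rebuild when a previous check run asks for it) can be sketched as a tiny parser over the check output. This is a hedged illustration: `recommended_action` is a hypothetical helper, the rebuild wording is taken from the output quoted in this thread, and the `--fix-fixable` wording is an assumption about how reiserfsck phrases its milder recommendation.

```python
import re

# Wording quoted in this thread; the pattern also tolerates "--rebuild tree".
_REBUILD = re.compile(r"only when running with --rebuild[- ]tree")
# Assumed wording for the milder case; not quoted anywhere in this thread.
_FIXABLE = re.compile(r"when running with --fix-fixable")


def recommended_action(check_output: str) -> str:
    """Return the repair option a `reiserfsck --check` run asks for, or "none"."""
    if _REBUILD.search(check_output):
        return "--rebuild-tree"
    if _FIXABLE.search(check_output):
        return "--fix-fixable"
    return "none"


sample = (
    "Bad nodes were found, Semantic pass skipped\n"
    "11 found corruptions can be fixed only when running with --rebuild-tree\n"
)
print(recommended_action(sample))
```

The point of the sketch is the ordering: the destructive option is chosen only when the check output names it explicitly.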
July 28, 2010 Author Now I've been instructed twice! I was just wondering whether it would be prudent to copy as much data off the disk as I can first, especially if --rebuild-tree can "leave the file system in worse shape than it originally was!" I started to run it, and then the warning that came up scared me a little. I've decided to see if I can back up as many as possible of the movies and TV shows on the faulty disk to a disk that has space, before rebuilding the tree.
July 28, 2010 If you wish... it certainly cannot hurt to copy the data off first. What --rebuild-tree will do is create a lost+found directory to hold the files, parts of files, and directories that it cannot identify. Those same files will probably not be copyable beforehand, since they cannot be reached.
July 28, 2010 Author Yeah, I figured that might be the case. I may as well try to rebuild the tree now, I guess. So far the only files that haven't been copyable are tag data generated by a program, and they are only a few KB each.
July 28, 2010 Just an FYI. You said: "I assumed it was just a bad connection. I got it reassigned it and did the "trust my parity" on it. All data appeared to be intact, but now no writes or deletions can be carried out on that drive. I have been able to copy anything FROM it, though."
The data disk was taken out of service because it could not be written. There is a 100% chance that there is a file or directory, or something, that is NOT correct. The parity disk, however, was updated. Let's say you wrote to the disk for hours before discovering it was "red". If you had elected to re-construct the data onto the drive that had not been properly written to, it would have had all the files you wrote during those hours. Instead, you elected to "trust" that the data disk was correct and parity wrong, meaning all the files written in that time would be lost. (Remember, we are certain the data disk is not right, since at least one "write" to it failed. Possibly many "writes" failed.)
Next time don't be so quick to use the "trust" procedure. It is more likely than not to get you into a situation like the one you are in now. Instead, elect to re-construct the data onto the drive which had gone off-line. It probably would have been correct.
1. Stop the array.
2. Un-assign the disk that became disconnected.
3. Power down.
4. Fix the bad connection.
5. Power up.
6. Start the array with the disk un-assigned. (This will cause unRAID to forget its model/serial number.)
7. Stop the array once more.
8. Re-assign the disk. (unRAID will treat it as a new replacement, since it forgot the original model/serial number.)
9. Start the array one last time. (unRAID will re-construct the contents of the drive back onto itself, based on parity and the remaining other drives.)
When the re-construction is complete you'll have parity protection once more AND all the data that could not be written to the drive when your daughter broke the connection to it.
Joe L.
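The difference between the two choices can be seen in a toy model. This is a minimal sketch that assumes single parity computed as a byte-wise XOR across the data disks and uses made-up four-byte "disks"; it illustrates the reasoning in the post above and does not reproduce unRAID's actual code. The names `xor_parity`, `disk1`, `disk2_old`, and `disk2_new` are all hypothetical.

```python
from functools import reduce


def xor_parity(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


disk1 = b"AAAA"      # a healthy data disk
disk2_old = b"BBBB"  # what is physically on the disk that dropped out
disk2_new = b"CCCC"  # the write that never reached the disk

# The failed write was still folded into parity, so parity reflects disk2_new.
parity = xor_parity([disk1, disk2_new])

# Option 1: re-construct the dropped disk from parity and the other disks.
rebuilt = xor_parity([disk1, parity])

# Option 2: "trust" the disk, so parity is recomputed from what is physically there.
trusted_parity = xor_parity([disk1, disk2_old])
recoverable_after_trust = xor_parity([disk1, trusted_parity])

print(rebuilt == disk2_new)                  # the rebuild recovers the missed write
print(recoverable_after_trust == disk2_old)  # trusting keeps only the stale contents
```

In this model the missed write survives only in parity, so overwriting parity to match the stale disk discards the one copy that was correct.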
July 28, 2010 Author Thanks, Joe. I am normally more cautious about these things, and would probably have reconstructed the data. However, in this case I know I have copies of all the data that I tried to write to the array since the earliest time the write error could have occurred. I will, however, be more careful in future. Lesson learned.