November 18, 2009

Sometime last night the server became unresponsive. There was no way to access it via the browser or SSH, no data was going through, and it would not answer a ping either. I power cycled it and it came up with the array stopped. I had a printout of the drive assignments, and the position was right for each drive, parity and cache included. I restarted the array, and after a very intense minute all drives went from "unformatted" to showing their actual size and usage. The data seems fine, and a parity sync is in progress. The crash occurred in the middle of a copy.

I am attaching this morning's syslog in the hope that someone can glean some insight into how this happened. I upgraded to 4.5b10 from 4.5b8 about 3 days ago. To upgrade I just replaced bzimage and bzroot, as per Joe L's thread. I ran a parity check just 2 days ago with 0 errors.

Luca

edit: grammar
November 18, 2009

Luca, 4.5b10 was pulled due to some bugs. I strongly recommend you upgrade to 4.5b11. Also, it won't hurt to run a memory test on your server for a while.
November 18, 2009 (Author)

Ah, thanks. I found the announcements forum; I will have to start reading that as well.

Luca
November 18, 2009

There is not much to 'glean' beyond what you already know. There was a crash, and there was disk write activity happening, as you have said, but there is no evidence as to what went wrong. Everything looks fine now; parity is rebuilding because super.dat was corrupted.

The Reiser file system is a journaling file system, so even the last activity was completed as far as it could be, by replaying unfinished transactions. The file systems should be fine, but file transfers that were interrupted may have resulted in corrupted or incomplete files, and these should be checked. The drives with the most transactions replayed are probably those that had the last activity, where the last files were saved, so they provide a starting point for locating suspect files.

* Disk 13: 17 transactions
* Disk 11: 175 transactions
* Disk 3: 224 transactions
* Disk 6: 285 transactions
* Cache Disk: 555 transactions

Do not try to correlate the number of transactions with the number of files or the size of files. There are a lot of background transactions involved in saving any file and in maintaining the file system b-trees.

The crash could be (and probably is) because of the beta10 bug, but I cannot say for certain. If you have no further problems, then blame the beta.
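One possible way to narrow down the suspect files on those disks is to list everything modified after a known point in time, such as shortly before the interrupted copy began. The following is only a sketch under stated assumptions: `DISK_DIR` is a hypothetical stand-in for a data disk mount such as `/mnt/disk6`, the file names and timestamps are made up for illustration, and the actual crash time would have to be taken from the syslog.

```shell
#!/bin/sh
# Hypothetical demo: DISK_DIR stands in for a data disk such as /mnt/disk6,
# and all file names and timestamps below are invented for illustration.
DISK_DIR=$(mktemp -d)
REF_FILE=$(mktemp)

# A file last written well before the copy that was interrupted.
printf 'old data\n' > "$DISK_DIR/old_file.bin"
touch -t 200911010000 "$DISK_DIR/old_file.bin"

# Reference timestamp: assumed to be shortly before the copy started.
touch -t 200911172000 "$REF_FILE"

# A file written after the reference time (its modification time is "now").
printf 'partial copy\n' > "$DISK_DIR/recent_file.bin"

# List regular files modified more recently than the reference file.
SUSPECTS=$(find "$DISK_DIR" -type f -newer "$REF_FILE")
echo "$SUSPECTS"

# Clean up the demo files.
rm -rf "$DISK_DIR" "$REF_FILE"
```

On a real server the same `find ... -newer` test would be pointed at each disk in turn, starting with the ones that replayed the most transactions. The files it lists are only candidates; each still has to be compared against its source.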
November 19, 2009 (Author)

Thanks Rob. I wonder if I could use something like an MD5 checksum utility to verify the integrity of the files, though I have no idea how to do that in Unix. I have lots of spare CPU power on the server, and part of it could be used to generate the checksums. Stupid idea?
November 19, 2009

In Linux (and unRAID), to calculate an MD5 checksum of a file you would type

    md5sum filename

To do it on all files under the user shares, type

    find /mnt/user/ -type f -exec md5sum {} \; | tee /boot/md5_file_checksums.txt

That command will compute an MD5 checksum for every one of your files under the user shares and also put the output in a file on your flash drive.

If you have not enabled user shares, use

    find /mnt/disk* -type f -exec md5sum {} \; | tee /boot/md5_file_checksums.txt

If you have lots of files it will take many, many hours to run. You will be able to follow its progress on the screen.

Joe L.
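Once a checksum list exists, it can be checked again later with `md5sum -c`, which re-reads every listed file and reports any whose checksum no longer matches. The sketch below runs against a throwaway directory rather than the real shares: `DATA_DIR` and `SUM_FILE` are hypothetical stand-ins for `/mnt/user` and `/boot/md5_file_checksums.txt`, and the `--quiet` option assumes the GNU coreutils version of `md5sum`, which may not match what a given unRAID release ships.

```shell
#!/bin/sh
# Hypothetical demo: DATA_DIR stands in for /mnt/user and SUM_FILE for
# /boot/md5_file_checksums.txt; the files below are invented for illustration.
DATA_DIR=$(mktemp -d)
SUM_FILE=$(mktemp)

printf 'hello\n' > "$DATA_DIR/a.txt"
printf 'world\n' > "$DATA_DIR/b.txt"

# Record one checksum line per file, as in the find command above.
find "$DATA_DIR" -type f -exec md5sum {} \; > "$SUM_FILE"

# Verify the list. --quiet (GNU md5sum) suppresses the per-file "OK" lines.
if md5sum -c --quiet "$SUM_FILE"; then
    FIRST_CHECK=ok
else
    FIRST_CHECK=failed
fi

# Simulate a file damaged after the list was made, then verify again.
printf 'corrupted\n' > "$DATA_DIR/b.txt"
if md5sum -c --quiet "$SUM_FILE" 2>/dev/null; then
    SECOND_CHECK=ok
else
    SECOND_CHECK=failed
fi

echo "first check: $FIRST_CHECK, second check: $SECOND_CHECK"

# Clean up the demo files.
rm -rf "$DATA_DIR" "$SUM_FILE"
```

Note that a list generated after the crash only records the files as they are now. It can detect later changes, but it cannot show whether a file was already damaged by the interrupted copy; for that, the file has to be compared against its original source.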
November 19, 2009 (Author)

Thanks Joe, that is just what I was looking for. I started md5sum; it is using about 20% of the CPU. Not too bad.

Luca