Help with server freeze

November 18, 2009

Sometime last night the server became unresponsive. There was no way to access it via the browser or SSH, no data was going through, and it would not answer a ping either.

I power-cycled it and it came up with the array stopped. I had a printout of the drive assignments, and the position was right for each drive, parity and cache included.

I restarted the array, and after a very intense minute all drives went from "unformatted" to showing their actual size and usage. The data seems fine, and a parity sync is in progress. The crash occurred in the middle of a copy. I am attaching this morning's syslog in the hope that someone may glean some insight into how this happened.

I upgraded to 4.5b10 from 4.5b8 about 3 days ago. To upgrade I just replaced bzimage and bzroot, as per Joe L's thread. I ran a parity check just 2 days ago with 0 errors.

Luca

edit: grammar

November 18, 2009

Luca, 4.5b10 was pulled due to some bugs.

I strongly recommend you upgrade to 4.5b11.

Also, it won't hurt to run a memory test on your server for a while.

November 18, 2009

Author

Ah, thanks. I found the announcements forum; I will have to start reading that as well.

Luca

November 18, 2009

Not much to 'glean' beyond what you already know. There was a crash while disk write activity was happening, as you have said, but there is no evidence as to what went wrong. Everything looks fine now; parity is rebuilding because the super.dat was corrupted. The Reiser file system is a journaling file system, so even the last activity was completed as far as it could be, by replaying unfinished transactions. The file systems should be fine, but file transfers that were interrupted may have left corrupted or incomplete files, and those should be checked. The drives with the most transactions replayed are probably the ones that had the last activity, where the last files were saved, so that provides a starting point for locating suspect files.

* Disk 13 : 17 transactions

* Disk 11 : 175 transactions

* Disk 3 : 224 transactions

* Disk 6 : 285 transactions

* Cache Disk : 555 transactions

Do not try to correlate the number of transactions with the number of files or the size of files. There are a lot of background transactions involved in saving any file, and maintaining the file system b-trees.

The crash could be (and probably is) because of the beta10 bug, but I cannot say. If you have no further problems, then blame the beta.
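
As a rough sketch of how that starting point could be used: the helper below lists files modified within the last few days on one disk, so that the files written around the time of the crash can be inspected by hand. The function name `recent_files` is hypothetical, the `/mnt/disk6` path simply follows the `/mnt/disk*` convention used elsewhere in this thread, `/mnt/cache` is an assumed mount point for the cache drive, and `-printf` assumes GNU find.

```shell
#!/bin/bash
# Hypothetical helper, not part of unRAID: list files under a directory
# that were modified within the last N days (default 2), oldest first.
recent_files() {
    local root="$1"
    local days="${2:-2}"
    # -mtime -N matches files modified less than N*24 hours ago.
    # -printf (GNU find) prints "YYYY-MM-DD HH:MM  path" so that sort
    # orders the output by modification time.
    find "$root" -type f -mtime "-$days" \
        -printf '%TY-%Tm-%Td %TH:%TM  %p\n' | sort
}

# Example usage (paths are assumptions, adjust to your own array):
#   recent_files /mnt/disk6 2
#   recent_files /mnt/cache 2
```

Files at the end of the list are the most recently written, and so are the most likely candidates for an interrupted transfer.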

November 19, 2009

Author

Thanks Rob. I wonder if I could use something like an MD5 checksum utility to verify the integrity of the files, though I have no idea how to do that in Unix. I have lots of spare CPU power on the server, and part of it could be used to generate the checksums. Stupid idea?

November 19, 2009

In Linux (and unRAID), to calculate an md5 checksum on a file you would type

md5sum filename

To do it on all files under the user shares, type

find /mnt/user/ -type f -exec md5sum {} \; | tee /boot/md5_file_checksums.txt

That command will compute an md5 checksum for every one of your files under the user shares, print each result on the screen, and also save the output to a file on your flash drive.

If you have not enabled user-shares, use

find /mnt/disk* -type f -exec md5sum {} \; | tee /boot/md5_file_checksums.txt

If you have lots of files, it will take many, many hours to run. You will be able to follow its progress on the screen.

Joe L.
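
A checksum list is only useful for checking integrity if it is compared against the files again later. The sketch below shows one way that could be done with `md5sum -c`, which re-reads every file named in a list and reports whether it still matches. The function names `make_checksums` and `list_failed` are hypothetical, and `-exec md5sum {} +` is a swapped-in variant of the `\;` form above that passes many files to each `md5sum` call; it assumes GNU find and coreutils.

```shell
#!/bin/bash
# Hypothetical helpers, not part of unRAID.

# Write an md5 checksum list for every file under a directory.
# Keep the list outside the directory being scanned (for example on
# /boot), so that the list does not checksum itself.
make_checksums() {
    local root="$1"
    local list="$2"
    find "$root" -type f -exec md5sum {} + > "$list"
}

# Re-check every file named in a checksum list and print only the
# entries that no longer match or can no longer be read.
list_failed() {
    local list="$1"
    # md5sum -c prints "path: OK" or "path: FAILED" per file.
    # LC_ALL=C keeps those messages in English so the filter works.
    LC_ALL=C md5sum -c "$list" 2>/dev/null | grep -v ': OK$' || true
}

# Example usage (paths follow the ones used earlier in this thread):
#   make_checksums /mnt/user /boot/md5_file_checksums.txt
#   list_failed /boot/md5_file_checksums.txt
```

Note that a mismatch only shows that a file changed after the list was made. A list created after the crash cannot detect damage that had already happened, so it is mainly useful as a baseline for future checks.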

November 19, 2009

Author

Thanks Joe, that is just what I was looking for. I started md5sum, and it is using about 20% of the CPU. Not too bad.

Luca
