September 10, 2007
I was going to upgrade this box but got some errors while doing a parity check. Since I didn't want the parity "check" to correct data that was actually OK, I stopped it. Here is the syslog:

Sep 12 13:09:41 Tower2 kernel: mdcmd (4): check
Sep 12 13:09:41 Tower2 kernel: md: recovery thread got woken up ...
Sep 12 13:09:41 Tower2 kernel: md: writing superblock to /boot/config/super.dat
Sep 12 13:09:41 Tower2 emhttp[852]: shcmd (9): mount -t reiserfs -o noatime,nodiratime /dev/md1 /mnt/disk1
Sep 12 13:09:41 Tower2 emhttp[853]: shcmd (9): mount -t reiserfs -o noatime,nodiratime /dev/md2 /mnt/disk2
Sep 12 13:09:41 Tower2 emhttp[854]: shcmd (9): mount -t reiserfs -o noatime,nodiratime /dev/md3 /mnt/disk3
Sep 12 13:09:41 Tower2 emhttp[855]: shcmd (9): mount -t reiserfs -o noatime,nodiratime /dev/md6 /mnt/disk6
Sep 12 13:09:41 Tower2 emhttp[856]: shcmd (9): mount -t reiserfs -o noatime,nodiratime /dev/md7 /mnt/disk7
Sep 12 13:09:41 Tower2 emhttp[862]: shcmd (9): mount -t reiserfs -o noatime,nodiratime /dev/md8 /mnt/disk8
Sep 12 13:09:41 Tower2 kernel: reiserfs: found format "3.6" with standard journal
Sep 12 13:09:41 Tower2 last message repeated 4 times
Sep 12 13:09:41 Tower2 kernel: md: recovery thread has nothing to resync
Sep 12 13:09:44 Tower2 kernel: reiserfs: found format "3.6" with standard journal
Sep 12 13:10:06 Tower2 kernel: reiserfs: checking transaction log (device md(9,2)) ...
Sep 12 13:10:06 Tower2 kernel: for (md(9,2))
Sep 12 13:10:06 Tower2 kernel: reiserfs: checking transaction log (device md(9,1)) ...
Sep 12 13:10:06 Tower2 kernel: for (md(9,1))
Sep 12 13:10:06 Tower2 kernel: reiserfs: replayed 2 transactions in 0 seconds
Sep 12 13:10:06 Tower2 kernel: md(9,2):Using r5 hash to sort names
Sep 12 13:10:06 Tower2 emhttp[853]: remount: /dev/md2
Sep 12 13:10:06 Tower2 kernel: md(9,2):can't shrink filesystem on-line
Sep 12 13:10:06 Tower2 kernel: md(9,1):Using r5 hash to sort names
Sep 12 13:10:06 Tower2 emhttp[852]: remount: /dev/md1
Sep 12 13:10:06 Tower2 kernel: md(9,1):can't shrink filesystem on-line
Sep 12 13:10:07 Tower2 kernel: reiserfs: checking transaction log (device md(9,3)) ...
Sep 12 13:10:07 Tower2 kernel: for (md(9,3))
Sep 12 13:10:07 Tower2 kernel: md(9,3):Using r5 hash to sort names
Sep 12 13:10:07 Tower2 emhttp[854]: remount: /dev/md3
Sep 12 13:10:07 Tower2 kernel: md(9,3):can't shrink filesystem on-line
Sep 12 13:10:08 Tower2 kernel: reiserfs: checking transaction log (device md(9,8)) ...
Sep 12 13:10:08 Tower2 kernel: for (md(9,8))
Sep 12 13:10:08 Tower2 kernel: md(9,8):Using r5 hash to sort names
Sep 12 13:10:08 Tower2 emhttp[862]: remount: /dev/md8
Sep 12 13:10:08 Tower2 kernel: md(9,8):can't shrink filesystem on-line
Sep 12 13:10:08 Tower2 kernel: reiserfs: checking transaction log (device md(9,6)) ...
Sep 12 13:10:08 Tower2 kernel: for (md(9,6))
Sep 12 13:10:08 Tower2 kernel: md(9,6):Using r5 hash to sort names
Sep 12 13:10:08 Tower2 emhttp[855]: remount: /dev/md6
Sep 12 13:10:08 Tower2 kernel: md(9,6):can't shrink filesystem on-line
Sep 12 13:10:09 Tower2 kernel: reiserfs: checking transaction log (device md(9,7)) ...
Sep 12 13:10:09 Tower2 kernel: for (md(9,7))
Sep 12 13:10:09 Tower2 kernel: md(9,7):Using r5 hash to sort names
Sep 12 13:10:09 Tower2 emhttp[856]: remount: /dev/md7
Sep 12 13:10:09 Tower2 kernel: md(9,7):can't shrink filesystem on-line
Sep 12 13:10:09 Tower2 emhttp[842]: shcmd (9): killall -w smbd nmbd
Sep 12 13:10:09 Tower2 nmbd[838]: [2007/09/12 13:10:09, 0] nmbd/nmbd.c:terminate(56)
Sep 12 13:10:09 Tower2 nmbd[838]: Got SIGTERM: going down...
Sep 12 13:10:11 Tower2 emhttp[842]: Scanning user shares...
Sep 12 13:10:11 Tower2 emhttp[842]: shcmd (10): rm -r /mnt/user/* 2>/dev/null
Sep 12 13:10:11 Tower2 emhttp[842]: oldpath=/mnt/disk6/Torrents Sorted/Lesbian/Alex Venice & Amber Rayne/dxva_sig.txt already exists
Sep 12 13:10:11 Tower2 emhttp[842]: user share: DVD
Sep 12 13:10:11 Tower2 emhttp[842]: user share: Pictures
Sep 12 13:10:11 Tower2 emhttp[842]: user share: APAC - Videos
Sep 12 13:10:11 Tower2 emhttp[842]: user share: TV
Sep 12 13:10:11 Tower2 emhttp[842]: user share: Usenet
Sep 12 13:10:11 Tower2 emhttp[842]: user share: Torrents Sorted
Sep 12 13:10:11 Tower2 emhttp[842]: user share: Games
Sep 12 13:10:11 Tower2 emhttp[842]: user share: Image
Sep 12 13:10:11 Tower2 emhttp[842]: user share: Music Video
Sep 12 13:10:11 Tower2 emhttp[842]: user share: Video Sort
Sep 12 13:10:11 Tower2 emhttp[842]: shcmd (11): /usr/sbin/nmbd -D
Sep 12 13:10:11 Tower2 emhttp[842]: shcmd (12): /usr/sbin/smbd -D
Sep 12 13:10:29 Tower2 emhttp[783]: driver cmd: check
Sep 12 13:10:29 Tower2 kernel: mdcmd (10): check
Sep 12 13:10:29 Tower2 kernel: md: recovery thread got woken up ...
Sep 12 13:10:29 Tower2 kernel: md: recovery thread checking parity...
Sep 12 13:10:29 Tower2 kernel: md: writing superblock to /boot/config/super.dat
Sep 12 13:10:29 Tower2 kernel: md: using 256k window, over a total of 312571192 blocks.
Sep 12 13:10:29 Tower2 kernel: md0: parity incorrect: 128
Sep 12 13:10:35 Tower2 kernel: ata2: translated ATA stat/err 0x51/40 to SCSI SK/ASC/ASCQ 0x3/11/04
Sep 12 13:10:35 Tower2 kernel: ata2: status=0x51 { DriveReady SeekComplete Error }
Sep 12 13:10:35 Tower2 kernel: ata2: error=0x40 { UncorrectableError }
Sep 12 13:10:38 Tower2 kernel: ata2: translated ATA stat/err 0x51/40 to SCSI SK/ASC/ASCQ 0x3/11/04
Sep 12 13:10:38 Tower2 kernel: ata2: status=0x51 { DriveReady SeekComplete Error }
Sep 12 13:10:38 Tower2 kernel: ata2: error=0x40 { UncorrectableError }
Sep 12 13:10:42 Tower2 kernel: ata2: translated ATA stat/err 0x51/40 to SCSI SK/ASC/ASCQ 0x3/11/04
Sep 12 13:10:42 Tower2 kernel: ata2: status=0x51 { DriveReady SeekComplete Error }
Sep 12 13:10:42 Tower2 kernel: ata2: error=0x40 { UncorrectableError }
Sep 12 13:10:46 Tower2 kernel: ata2: translated ATA stat/err 0x51/40 to SCSI SK/ASC/ASCQ 0x3/11/04
Sep 12 13:10:46 Tower2 kernel: ata2: status=0x51 { DriveReady SeekComplete Error }
Sep 12 13:10:46 Tower2 kernel: ata2: error=0x40 { UncorrectableError }
Sep 12 13:10:50 Tower2 kernel: ata2: translated ATA stat/err 0x51/40 to SCSI SK/ASC/ASCQ 0x3/11/04
Sep 12 13:10:50 Tower2 kernel: ata2: status=0x51 { DriveReady SeekComplete Error }
Sep 12 13:10:50 Tower2 kernel: ata2: error=0x40 { UncorrectableError }
Sep 12 13:10:50 Tower2 kernel: scsi2: ERROR on channel 0, id 0, lun 0, CDB: 0x28 00 00 00 aa 3f 00 00 c8 00
Sep 12 13:10:50 Tower2 kernel: Current sd08:21: sns = 70 3
Sep 12 13:10:50 Tower2 kernel: ASC=11 ASCQ= 4
Sep 12 13:10:50 Tower2 kernel: Raw sense data:0x70 0x00 0x03 0x00 0x00 0x00 0x00 0x0a 0x00 0x00 0x00 0x00 0x11 0x04 0x00 0x00 0x00 0x00
Sep 12 13:10:50 Tower2 kernel: I/O error: dev 08:21, sector 43520
Sep 12 13:10:50 Tower2 kernel: md2: read error!
Sep 12 13:10:50 Tower2 kernel: end_read_request 43520/2, count: 1, uptodate 0.
Sep 12 13:10:50 Tower2 kernel: md2: read error!
Sep 12 13:10:50 Tower2 kernel: end_read_request 43528/2, count: 1, uptodate 0.
Sep 12 13:10:50 Tower2 kernel: md2: read error!
Sep 12 13:10:50 Tower2 kernel: end_read_request 43536/2, count: 1, uptodate 0.
Sep 12 13:10:50 Tower2 kernel: md2: read error!
Sep 12 13:10:50 Tower2 kernel: end_read_request 43544/2, count: 1, uptodate 0.
Sep 12 13:10:50 Tower2 kernel: md2: read error!
Sep 12 13:10:50 Tower2 kernel: end_read_request 43552/2, count: 1, uptodate 0.
Sep 12 13:10:50 Tower2 kernel: md2: read error!
Sep 12 13:10:50 Tower2 kernel: end_read_request 43560/2, count: 1, uptodate 0.
Sep 12 13:10:50 Tower2 kernel: md2: read error!
Sep 12 13:10:50 Tower2 kernel: end_read_request 43568/2, count: 1, uptodate 0.
Sep 12 13:10:50 Tower2 kernel: md2: read error!
Sep 12 13:10:50 Tower2 kernel: end_read_request 43576/2, count: 1, uptodate 0.
Sep 12 13:10:50 Tower2 kernel: md2: read error!
Sep 12 13:10:50 Tower2 kernel: end_read_request 43584/2, count: 1, uptodate 0.
Sep 12 13:10:50 Tower2 kernel: md2: read error!
Sep 12 13:10:50 Tower2 kernel: end_read_request 43592/2, count: 1, uptodate 0.
Sep 12 13:10:50 Tower2 kernel: md2: read error!
Sep 12 13:10:50 Tower2 kernel: end_read_request 43600/2, count: 1, uptodate 0.
Sep 12 13:10:50 Tower2 kernel: md2: read error!
Sep 12 13:10:50 Tower2 kernel: end_read_request 43608/2, count: 1, uptodate 0.
Sep 12 13:10:50 Tower2 kernel: md2: read error!
Sep 12 13:10:50 Tower2 kernel: end_read_request 43616/2, count: 1, uptodate 0.
Sep 12 13:10:50 Tower2 kernel: md2: read error!
Sep 12 13:10:50 Tower2 kernel: end_read_request 43624/2, count: 1, uptodate 0.
Sep 12 13:10:50 Tower2 kernel: md2: read error!
Sep 12 13:10:50 Tower2 kernel: end_read_request 43632/2, count: 1, uptodate 0.
Sep 12 13:10:50 Tower2 kernel: md2: read error!
Sep 12 13:10:50 Tower2 kernel: end_read_request 43640/2, count: 1, uptodate 0.
Sep 12 13:10:50 Tower2 kernel: md2: read error!
Sep 12 13:10:50 Tower2 kernel: end_read_request 43648/2, count: 1, uptodate 0.
Sep 12 13:10:50 Tower2 kernel: md2: read error!
Sep 12 13:10:50 Tower2 kernel: end_read_request 43656/2, count: 1, uptodate 0.
Sep 12 13:10:50 Tower2 kernel: md2: read error!
Sep 12 13:10:50 Tower2 kernel: end_read_request 43664/2, count: 1, uptodate 0.
Sep 12 13:10:50 Tower2 kernel: md2: read error!
Sep 12 13:10:50 Tower2 kernel: end_read_request 43672/2, count: 1, uptodate 0.
Sep 12 13:10:50 Tower2 kernel: md2: read error!
Sep 12 13:10:50 Tower2 kernel: end_read_request 43680/2, count: 1, uptodate 0.
Sep 12 13:10:50 Tower2 kernel: md2: read error!
Sep 12 13:10:50 Tower2 kernel: end_read_request 43688/2, count: 1, uptodate 0.
Sep 12 13:10:50 Tower2 kernel: md2: read error!
Sep 12 13:10:50 Tower2 kernel: end_read_request 43696/2, count: 1, uptodate 0.
Sep 12 13:10:50 Tower2 kernel: md2: read error!
Sep 12 13:10:50 Tower2 kernel: end_read_request 43704/2, count: 1, uptodate 0.
Sep 12 13:10:50 Tower2 kernel: md2: read error!
Sep 12 13:10:50 Tower2 kernel: end_read_request 43712/2, count: 1, uptodate 0.
Sep 12 13:10:53 Tower2 kernel: md0: parity incorrect: 44320
Sep 12 13:10:53 Tower2 kernel: md0: parity incorrect: 44328
Sep 12 13:10:53 Tower2 kernel: md0: parity incorrect: 44336
Sep 12 13:10:53 Tower2 kernel: md0: parity incorrect: 44344
Sep 12 13:10:53 Tower2 kernel: md0: parity incorrect: 44352
Sep 12 13:10:53 Tower2 kernel: md0: parity incorrect: 44360
Sep 12 13:10:53 Tower2 kernel: md0: parity incorrect: 44368
Sep 12 13:10:53 Tower2 kernel: md0: parity incorrect: 44376
Sep 12 13:10:53 Tower2 kernel: md0: parity incorrect: 44384
Sep 12 13:10:53 Tower2 kernel: md0: parity incorrect: 44392
Sep 12 13:10:53 Tower2 kernel: md0: parity incorrect: 44400
Sep 12 13:10:53 Tower2 kernel: md0: parity incorrect: 44408
Sep 12 13:10:53 Tower2 kernel: md0: parity incorrect: 44416
Sep 12 13:10:53 Tower2 kernel: md0: parity incorrect: 44424
Sep 12 13:10:53 Tower2 kernel: md0: parity incorrect: 44432
Sep 12 13:10:53 Tower2 kernel: md0: parity incorrect: 44440
Sep 12 13:10:53 Tower2 kernel: md0: parity incorrect: 44448
Sep 12 13:10:53 Tower2 kernel: md0: parity incorrect: 44456
Sep 12 13:10:53 Tower2 kernel: md0: parity incorrect: 44464
Sep 12 13:10:53 Tower2 kernel: md0: parity incorrect: 44472
Sep 12 13:10:53 Tower2 kernel: md0: parity incorrect: 44480
Sep 12 13:10:53 Tower2 kernel: md0: parity incorrect: 44488
Sep 12 13:10:53 Tower2 kernel: md0: parity incorrect: 44496
Sep 12 13:10:53 Tower2 kernel: md0: parity incorrect: 44504
Sep 12 13:10:53 Tower2 kernel: md0: parity incorrect: 44512
Sep 12 13:10:53 Tower2 kernel: md0: parity incorrect: 44520
Sep 12 13:10:53 Tower2 kernel: md0: parity incorrect: 44528
Sep 12 13:10:53 Tower2 kernel: md0: parity incorrect: 44536
Sep 12 13:10:53 Tower2 kernel: md0: parity incorrect: 44544
Sep 12 13:10:53 Tower2 kernel: md0: parity incorrect: 44552
Sep 12 13:10:53 Tower2 kernel: md0: parity incorrect: 44560
Sep 12 13:10:53 Tower2 kernel: md0: parity incorrect: 44568
Sep 12 13:10:53 Tower2 kernel: md0: parity incorrect: 44576
Sep 12 13:10:53 Tower2 kernel: md0: parity incorrect: 44584
Sep 12 13:10:53 Tower2 kernel: md0: parity incorrect: 44592
Sep 12 13:10:53 Tower2 kernel: md0: parity incorrect: 44600
Sep 12 13:10:53 Tower2 kernel: md0: parity incorrect: 65680
Sep 12 13:10:55 Tower2 emhttp[783]: shcmd (13): killall -w smbd nmbd
Sep 12 13:10:55 Tower2 nmbd[903]: [2007/09/12 13:10:55, 0] nmbd/nmbd.c:terminate(56)
Sep 12 13:10:55 Tower2 nmbd[903]: Got SIGTERM: going down...
Sep 12 13:10:56 Tower2 emhttp[783]: shcmd (14): sync
Sep 12 13:10:56 Tower2 emhttp[930]: shcmd (15): umount /mnt/disk1
Sep 12 13:10:56 Tower2 emhttp[931]: shcmd (15): umount /mnt/disk2
Sep 12 13:10:56 Tower2 emhttp[932]: shcmd (15): umount /mnt/disk3
Sep 12 13:10:56 Tower2 emhttp[933]: shcmd (15): umount /mnt/disk6
Sep 12 13:10:56 Tower2 emhttp[934]: shcmd (15): umount /mnt/disk7
Sep 12 13:10:56 Tower2 emhttp[940]: shcmd (15): umount /mnt/disk8
Sep 12 13:10:56 Tower2 emhttp[783]: driver cmd: stop
Sep 12 13:10:56 Tower2 kernel: mdcmd (13): stop
Sep 12 13:10:56 Tower2 kernel: md: md_do_sync() got signal ... exiting
Sep 12 13:10:56 Tower2 kernel: md: sync done. time=27sec rate=2616K/sec
Sep 12 13:10:56 Tower2 kernel: md: writing superblock to /boot/config/super.dat
Sep 12 13:10:56 Tower2 kernel: md: recovery thread sync completion status: -4
Sep 12 13:10:56 Tower2 kernel: md1: stopping
Sep 12 13:10:56 Tower2 kernel: md2: stopping
Sep 12 13:10:56 Tower2 kernel: md3: stopping
Sep 12 13:10:56 Tower2 kernel: md6: stopping
Sep 12 13:10:56 Tower2 kernel: md7: stopping
Sep 12 13:10:56 Tower2 kernel: md8: stopping
Sep 12 13:10:56 Tower2 kernel: md: writing superblock to /boot/config/super.dat
Sep 12 13:10:56 Tower2 kernel: md: stopped.
Sep 12 13:10:56 Tower2 kernel: md: reading superblock from /boot/config/super.dat
Sep 12 13:10:56 Tower2 kernel: md: superblock events: 19
Sep 12 13:10:56 Tower2 kernel: md: import sdb ST3320620AS 3QF04NZR offset: 63 size: 312571192
Sep 12 13:10:56 Tower2 kernel: md: import sdd Maxtor 6L300S0 L624D3TG offset: 63 size: 293057320
Sep 12 13:10:56 Tower2 kernel: md: import sdc ST3300831AS 3NF0KG58 offset: 63 size: 293036152
Sep 12 13:10:56 Tower2 kernel: md: import sde ST3320620AS 3QF04P2Z offset: 63 size: 312571192
Sep 12 13:10:56 Tower2 kernel: md: no device
Sep 12 13:10:56 Tower2 kernel: md: no device
Sep 12 13:10:56 Tower2 kernel: md: import hda ST3300631A 4NF0ZPTE offset: 63 size: 293036152
Sep 12 13:10:56 Tower2 kernel: md: import hdb ST3300831A 3NF03QEZ offset: 63 size: 293036152
Sep 12 13:10:56 Tower2 kernel: md: import hdc ST3300831A 5NF1AHVC offset: 63 size: 293036152
Sep 12 13:11:04 Tower2 emhttp[783]: Scanning user shares...
Sep 12 13:11:04 Tower2 emhttp[783]: shcmd (15): rm -r /mnt/user/* 2>/dev/null
Sep 12 13:11:04 Tower2 emhttp[783]: shcmd (16): /usr/sbin/nmbd -D
Sep 12 13:11:05 Tower2 emhttp[783]: shcmd (17): /usr/sbin/smbd -D
Sep 12 13:11:05 Tower2 kernel: md: reading superblock from /boot/config/super.dat
Sep 12 13:11:05 Tower2 kernel: md: superblock events: 19
Sep 12 13:11:05 Tower2 kernel: md: import sdb ST3320620AS 3QF04NZR offset: 63 size: 312571192
Sep 12 13:11:05 Tower2 kernel: md: import sdd Maxtor 6L300S0 L624D3TG offset: 63 size: 293057320
Sep 12 13:11:05 Tower2 kernel: md: import sdc ST3300831AS 3NF0KG58 offset: 63 size: 293036152
Sep 12 13:11:05 Tower2 kernel: md: import sde ST3320620AS 3QF04P2Z offset: 63 size: 312571192
Sep 12 13:11:05 Tower2 kernel: md: no device
Sep 12 13:11:05 Tower2 kernel: md: no device
Sep 12 13:11:05 Tower2 kernel: md: import hda ST3300631A 4NF0ZPTE offset: 63 size: 293036152
Sep 12 13:11:05 Tower2 kernel: md: import hdb ST3300831A 3NF03QEZ offset: 63 size: 293036152
Sep 12 13:11:05 Tower2 kernel: md: import hdc ST3300831A 5NF1AHVC offset: 63 size: 293036152
Sep 12 13:11:42 Tower2 kernel: md: reading superblock from /boot/config/super.dat
Sep 12 13:11:42 Tower2 kernel: md: superblock events: 19
Sep 12 13:11:42 Tower2 kernel: md: import sdb ST3320620AS 3QF04NZR offset: 63 size: 312571192
Sep 12 13:11:42 Tower2 kernel: md: import sdd Maxtor 6L300S0 L624D3TG offset: 63 size: 293057320
Sep 12 13:11:42 Tower2 kernel: md: import sdc ST3300831AS 3NF0KG58 offset: 63 size: 293036152
Sep 12 13:11:42 Tower2 kernel: md: import sde ST3320620AS 3QF04P2Z offset: 63 size: 312571192
Sep 12 13:11:42 Tower2 kernel: md: no device
Sep 12 13:11:42 Tower2 kernel: md: no device
Sep 12 13:11:42 Tower2 kernel: md: import hda ST3300631A 4NF0ZPTE offset: 63 size: 293036152
Sep 12 13:11:42 Tower2 kernel: md: import hdb ST3300831A 3NF03QEZ offset: 63 size: 293036152
Sep 12 13:11:42 Tower2 kernel: md: import hdc ST3300831A 5NF1AHVC offset: 63 size: 293036152
Sep 12 13:12:42 Tower2 kernel: md: reading superblock from /boot/config/super.dat
Sep 12 13:12:42 Tower2 kernel: md: superblock events: 19
Sep 12 13:12:42 Tower2 kernel: md: import sdb ST3320620AS 3QF04NZR offset: 63 size: 312571192
Sep 12 13:12:42 Tower2 kernel: md: import sdd Maxtor 6L300S0 L624D3TG offset: 63 size: 293057320
Sep 12 13:12:42 Tower2 kernel: md: import sdc ST3300831AS 3NF0KG58 offset: 63 size: 293036152
Sep 12 13:12:42 Tower2 kernel: md: import sde ST3320620AS 3QF04P2Z offset: 63 size: 312571192
Sep 12 13:12:42 Tower2 kernel: md: no device
Sep 12 13:12:42 Tower2 kernel: md: no device
Sep 12 13:12:42 Tower2 kernel: md: import hda ST3300631A 4NF0ZPTE offset: 63 size: 293036152
Sep 12 13:12:42 Tower2 kernel: md: import hdb ST3300831A 3NF03QEZ offset: 63 size: 293036152
Sep 12 13:12:42 Tower2 kernel: md: import hdc ST3300831A 5NF1AHVC offset: 63 size: 293036152
root@Tower2:~#

Tom, can you assist?
September 11, 2007
You have a disk with a media error, probably in only one sector. Low-level drivers often consolidate many small sequential requests into one large one. If that large request fails, the driver "fails" every command that was consolidated into it, which results in many error messages.

Run the parity check again and see whether the error repeats. When a parity check gets a read error, it reconstructs the data and attempts to rewrite it. If the rewrite was successful, another parity check should get past this point without error. If the error persists, consider replacing the drive.
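As a rough illustration of that consolidation, the failed command block quoted in the syslog can be decoded by hand. The sketch below assumes the standard SCSI READ(10) layout (opcode 0x28, bytes 2-5 the starting block, bytes 7-8 the block count). It also assumes that the md sector numbers are relative to the partition start at offset 63, as the import lines suggest, and it takes the 8-sector spacing from the end_read_request lines. The function and constant names are made up for this example and are not part of unRAID.

```python
# READ(10) command block quoted in the syslog (10 bytes, hex).
CDB_HEX = "28 00 00 00 aa 3f 00 00 c8 00"
PARTITION_OFFSET = 63     # assumption: taken from "offset: 63" in the md import lines
MD_REQUEST_SECTORS = 8    # assumption: spacing between end_read_request lines


def decode_read10(cdb_hex):
    """Return (starting LBA, sector count) of a SCSI READ(10) command block."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) command block")
    lba = int.from_bytes(cdb[2:6], "big")    # bytes 2-5: logical block address
    count = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length
    return lba, count


lba, count = decode_read10(CDB_HEX)
first = lba - PARTITION_OFFSET
# Every small md request that was merged into the one large disk read.
failed = list(range(first, first + count, MD_REQUEST_SECTORS))

print(f"disk LBA {lba}, {count} sectors in one request")
print(f"{len(failed)} md requests failed together: {failed[0]} .. {failed[-1]}")
```

Under those assumptions, the single 200-sector request lines up with the 25 consecutive md2 read errors from sector 43520 to 43712 in the log, which fits the idea that one bad sector produced all of them.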
December 19, 2007
Hi there! I'm bumping this thread because I got the same kind of error as hypyke:

[ 4319.127071] md0: parity incorrect: 52448192

First of all, did you solve the issue? Is there any way to find out which of the drives has the problem? Is md0 actually the parity drive?
December 19, 2007
Although the example you picked does mention a parity error, it is probably quite different from yours. The real error above is the read error from Disk 2, especially since it is in a very low sector, probably part of the Reiser file system structures. That caused cascading errors afterward, both additional read errors and parity errors, which are probably false.

All you have indicated is a detection of parity incorrect for one very high sector. As you guessed, md0 is the parity drive, but that is what a 'parity incorrect' error will always indicate, since parity is calculated across all of the drives, and there is no immediate way to determine which drive has had a bit changed incorrectly. There are various possible causes (power fluctuation?), so we would need to see more of the syslog to determine the actual problem, if that is possible.

Tom's advice above still applies: when a parity check discovers a parity error, it corrects it, so running the parity check again should show no more errors. You could consider hypyke's read error a hard error, whereas his and your parity errors are soft errors, easily correctable. A read error usually results in the drive being marked disabled in unRAID.
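To make the difference between the two kinds of error concrete, here is a minimal sketch that assumes plain XOR parity across the data disks. The 4-byte "sectors" and the helper name xor_blocks are invented for the example; this is not unRAID's actual driver code.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical 4-byte "sectors" from three data disks.
data = [
    bytes([0x12, 0x34, 0x56, 0x78]),
    bytes([0xAB, 0xCD, 0xEF, 0x01]),
    bytes([0x0F, 0xF0, 0x55, 0xAA]),
]
parity = xor_blocks(data)

# Soft error: one bit silently flips on the second disk.
corrupted = list(data)
corrupted[1] = bytes([0xAB, 0xCD, 0xEE, 0x01])
syndrome = xor_blocks(corrupted + [parity])
parity_ok = syndrome == bytes(4)
# The non-zero syndrome shows which bit position disagrees,
# but every disk contributes to that position, so the culprit is unknown.

# Hard error: the second disk reports that it cannot read the sector.
# Because the failing disk is known, its data can be rebuilt from the rest.
rebuilt = xor_blocks([data[0], data[2], parity])

print("parity ok:", parity_ok, "syndrome:", syndrome.hex())
print("rebuilt matches original:", rebuilt == data[1])
```

In this model a parity mismatch can only be "corrected" by rewriting parity to match the data as read, while a reported read error identifies the disk and allows its sector to be reconstructed.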
December 20, 2007
OK, I think I see what you mean. Thanks for pointing this out. I'll rerun a check later; for the moment I'm still moving my data from my old disks to the new unRAID array, and I don't think it's a good idea to run a parity check in that situation.

By the way, could you please tell me how to export the syslog to a usable file? The only way I know is to run more and then copy/paste... Is that the only way?
December 21, 2007
Copy your syslog to the USB key and make sure you are exporting the USB key; then you can read it from Windows. For example:

cp /var/log/syslog /boot/syslog.txt

Hope that's right, I'm at work right now.

Bill
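Once the syslog has been copied off the server, a short script can count the error lines before the file is posted. This is only a sketch: the patterns are modelled on the log lines quoted earlier in this thread, the sample text is a handful of those lines, and the file name in the last comment is whatever name was used for the copy.

```python
import re
from collections import Counter

# Sample lines copied from the syslog earlier in this thread.
SAMPLE = """\
Sep 12 13:10:29 Tower2 kernel: md0: parity incorrect: 128
Sep 12 13:10:50 Tower2 kernel: ata2: error=0x40 { UncorrectableError }
Sep 12 13:10:50 Tower2 kernel: I/O error: dev 08:21, sector 43520
Sep 12 13:10:50 Tower2 kernel: md2: read error!
Sep 12 13:10:53 Tower2 kernel: md0: parity incorrect: 44320
Sep 12 13:10:56 Tower2 kernel: md: stopped.
"""

# Patterns based on the messages seen above; other kernels may word them differently.
PATTERNS = {
    "parity incorrect": re.compile(r"md\d+: parity incorrect: \d+"),
    "md read error": re.compile(r"md\d+: read error!"),
    "I/O error": re.compile(r"I/O error: dev \S+, sector \d+"),
    "ATA error": re.compile(r"ata\d+: error=0x[0-9a-f]+"),
}


def summarise(lines):
    """Count how many lines match each known error pattern."""
    counts = Counter()
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
    return counts


summary = summarise(SAMPLE.splitlines())
for label, n in summary.items():
    print(f"{label}: {n}")

# On a copied log: summarise(open("syslog.txt", errors="replace"))
```

A count like this does not replace reading the log, but it quickly shows whether a check produced a few parity corrections or a mix of parity and device errors.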
February 9, 2008
Hi there! Sorry for the late reply, but this is getting more and more concerning. My last parity check (about 4 hours for 1.5TB) reported over 6000 errors, and there are some device errors as well. I ran memtest (12 passes) as well as diagnostic utilities for the hard drives without finding any fault.

Another problem is that after running a parity check the array gets very slow, and if I don't reboot it at once I need to force a physical shutdown.

Here's my syslog. I hope you'll be able to help me solve this issue... In the meantime I'll run another check...
February 12, 2008
Second check reported only 2 errors... I guess it's all fixed now... A bit concerning though...
February 15, 2008
I'm afraid I'm the bearer of bad news concerning your sdb - WDC_WD3200KS-00_WD-WCAPD2312215, connected to an SiI3132 port. I'm curious whether you are seeing a temperature for that drive. Your syslog contained a rather rare sequence of 6 exception emask errors with the phrase "device error via D2H FIS", followed by "get_temperature: ioctl (smart): Input/output error".

The only place I've found that sequence was in an old syslog of mine, from when I tried to add an electrically damaged Samsung HD400LJ to my unRAID array. It was last summer and I don't recall the exact cause, but it was either a nearby lightning strike and spike, or a SATA connector that had slipped half off, damaging the firmware and SMART system of the drive and the SiI3112 chipset it was connected to. Because the drive was fast, quiet, cool, 400GB, and STILL worked great and reliably when attached elsewhere (although with 5 or 6 clicks every time it powers on, no SMART, and occasionally invisible on reboot!), I decided to try it in unRAID. When I connected it to an SiI3132 port, it would appear to be successful initially, and then I would get the exact sequence of errors you have, plus parity errors later, and finally it would take the array down. If I moved it to a different SATA port, I also received fatal errors, though different ones. The "device error via D2H FIS" phrase seems to be specific to the sata_sil24 driver, used by the SiI3132 chipset.

I would recommend moving the data off sdb and replacing it, hopefully by RMA'ing it. It may very well be responsible for all of the errors you are seeing. The 2 errors you saw on the last check are 2 too many.

I still use my drive in a more error-tolerant WinXP system, storing almost 400GB of the most expendable of my TV recordings.
In the BIOS, I disabled SMART for the drive and turned off the F1 pause on errors, and it works fine, fast and reliable, so long as you ignore the clicks when it first powers on, and the occasional disappearance.
February 16, 2008
Hi Rob! I was afraid of that; thank you for pointing it out anyway. Your guess was correct: I can't read the SMART info on the drive, but I don't have the clicks at boot (I guess that's a good sign...).

I'll try to negotiate with WD for an RMA, but first things first: I need to move the data from disk3 to disk4 and disk6 (disk6 is not used at all for the moment). To do that I thought about changing the share settings, since a few of my shares have disk3 in their included disks. For example, a share would go from:

Included disk(s): disk3 disk4
Excluded disk(s): disk1 disk2 disk5

to:

Included disk(s): disk4 disk6
Excluded disk(s): disk1 disk2 disk3 disk5

There's enough free space on disk4 and disk6 to hold the disk3 data. Do you think this would work, or will I lose the data on disk3 as soon as it's moved to the excluded disks?

Thank you so much for your help on this.
February 16, 2008
I'm probably not the best person to ask, because I have never used User Shares. From what I have read, your idea looks correct. I don't think there is any chance of losing data from a physical disk, but files on an excluded disk will probably not be visible any longer in the corresponding User Share. I think I would first move the data to where I want it, then reconfigure User Shares, then stop and restart unRAID to recognize the changes. Someone else with more experience here may well correct me though.
February 16, 2008
Hello! I followed your advice, since it's best to be on the safe side: I copied the data first, then modified the user shares, and had to stop and restart the box to apply the changes. I verified the copies before deleting the content of disk3.

disk3 is now empty, and I have another question: I'd like to remove it from the array without breaking parity. What would be the best way to do that, in your opinion? I'm afraid I can't just physically remove the drive and leave it that way; the array would be unprotected, and we don't want that.
February 16, 2008
Unfortunately unRAID, like most RAID systems, is all about protecting AGAINST the loss of any drive. There is no way to remove a drive without redoing the parity drive. Currently, what is usually done is to either replace the drive and let unRAID rebuild the new drive from the old, or to un-assign the drive and 'Restore' (if I remember right) and rebuild the parity drive from the remaining drives. Either way leaves you temporarily vulnerable. You could just leave the drive in, assigned but unused, until a replacement drive is available, but I think you probably need it to ship back before you'll receive the new drive.

There is a current thread in the Feature Request forum (http://lime-technology.com/forum/index.php?topic=1455.0) about adding a button to remove a drive, but I don't believe they are aware of all of the ramifications of this.

If the drive to remove exists and is still valid, then the removal process would involve marking it as disabled, reading every bit on that drive and 'removing' it from the parity info, and rewriting almost every sector on the parity drive. The only parity sectors that would not have to be written would be those that were entirely zeroes on the drive being removed, probably some at the end of the drive. This might be 5% to 20% faster than just rebuilding the parity drive.

In the other case, where the drive is already missing or disabled, 'removing' it from the existing parity info would involve reconstructing it first before removing its bits from the parity info. This would take a little longer than just rebuilding parity, as every drive including parity would have to be read first, then the parity drive written. Rebuilding parity involves reading only the data drives, then writing to the parity drive.

The idea behind requesting a button or other official method is to try to preserve parity across these operations, just in case another drive failure or other disk error occurs during the process.
But it looks to me that if any other error were to occur during this drive removal, then the parity changes would be incomplete, and therefore the parity drive would be invalid anyway, and it becomes wasted effort and time. Requires further thought, I may not have analyzed it fully or correctly. There may still be some benefit to the first scenario (drive still good), as it may be a little quicker, could possibly run with the array still up, and would avoid reading all of the other disks, thereby avoiding finding disk errors on those other disks. That makes it a little safer. I'm not sure if the benefit gained will raise it very high on the Requested Features list, compared to other requests. In the other scenario (drive missing or disabled), the safest action would be to quickly rebuild parity from the remaining drives. A second complete drive failure could not be handled anyway, no matter what you did, and a new disk error would only be a delay, while you apply appropriate remedial actions to the affected drive (reiserfsck, fix bad sectors, and/or copy off all good data, etc), then rebuild parity. (sorry, I ventured away from just answering your question, should probably have posted my thoughts in the other thread, will post a link there back to here)
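The first scenario above (removing a drive that is still readable) can be sketched in a few lines, again assuming plain XOR parity. The disk contents and names below are invented for illustration, and nothing here reflects an actual unRAID feature.

```python
def xor(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


# Hypothetical disks, each made of three tiny 4-byte "sectors".
# disk_c is the drive being removed; its last sector is all zeroes.
disk_a = [b"\x11\x22\x33\x44", b"\x01\x02\x03\x04", b"\xaa\xbb\xcc\xdd"]
disk_b = [b"\x55\x66\x77\x88", b"\x10\x20\x30\x40", b"\x0f\x0e\x0d\x0c"]
disk_c = [b"\x99\x00\xff\x12", b"\x00\x00\x00\x01", bytes(4)]

old_parity = [xor(xor(a, b), c) for a, b, c in zip(disk_a, disk_b, disk_c)]

# "Remove" disk_c by XORing its contents back out of the parity.
new_parity = [xor(p, c) for p, c in zip(old_parity, disk_c)]

# The result matches a full parity rebuild from the remaining disks.
rebuilt_parity = [xor(a, b) for a, b in zip(disk_a, disk_b)]

# Sectors that were all zeroes on disk_c leave parity untouched,
# so they would not need to be rewritten.
unchanged = [i for i, (old, new) in enumerate(zip(old_parity, new_parity)) if old == new]

print("matches full rebuild:", new_parity == rebuilt_parity)
print("parity sectors left unchanged:", unchanged)
```

The sketch only needs the removed drive and the parity drive, which is the source of the possible saving: the other data disks are never read, and zero-filled regions of the removed drive need no parity writes.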
February 17, 2008
OK, I see, that makes sense. I'm just going to leave this drive in, empty, for the moment, just in case there's another drive failure. I guess it's better than leaving the array unprotected... I plan to buy another drive in a month, and as my box is not powered on 24/7 I guess I can take the chance...

Thanks a lot for all your advice :-) I'll let you know what WD's answer is. I'll have to negotiate hard, as my drive was bought in Canada and in the meantime I moved back to Belgium, so it's "Out of warranty region"... Shit happens ...