October 12, 2010
Ok folks, here's a question for the community. I have unRAID Pro running with three WD20EARS drives (jumpered on 7-8 as recommended). When I transferred all my files from an existing, external WD20EADS drive (not another EARS) I popped the EADS drive out of its cage and into the unRAID server. I waited the 10 hours or so for the clearing process, but awoke to find the web interface unresponsive. I had to manually reboot the unRAID server from the command line. When it rebooted, of course, it saw the EADS drive as "new" and began to "clear" it again. I let it run through and again, near the end, it locked up.

After searching the forums, I found out how to capture my syslog and I'll post it here to let a guru peruse it, but I'm puzzled why this drive (which was full and functioning perfectly in its cage) would throw errors to the unRAID server. Is it a jumper setting on these EADS drives? The only references to the jumpers that I could find were for the EARS units, not the EADS units.

Anyway, below is a truncated example of the flood of errors in the syslog relating to this drive:

-------------------------------------------SYSLOG----------------------------------------------------------------------
Oct 11 19:05:18 Angband emhttp: ... clearing 83% complete
Oct 11 19:11:17 Angband kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Oct 11 19:11:17 Angband kernel: ata4.00: failed command: WRITE DMA EXT
Oct 11 19:11:17 Angband kernel: ata4.00: cmd 35/00:00:3f:6c:5a/00:01:c3:00:00/e0 tag 0 dma 131072 out
Oct 11 19:11:17 Angband kernel: res 40/00:00:00:00:00/00:00:00:00:00/10 Emask 0x4 (timeout)
Oct 11 19:11:17 Angband kernel: ata4.00: status: { DRDY }
Oct 11 19:11:17 Angband kernel: ata4: hard resetting link
Oct 11 19:11:22 Angband kernel: ata4: link is slow to respond, please be patient (ready=-19)
Oct 11 19:11:27 Angband kernel: ata4: COMRESET failed (errno=-16)
Oct 11 19:11:27 Angband kernel: ata4: hard resetting link
Oct 11 19:11:32 Angband kernel: ata4: link is slow to respond, please be patient (ready=-19)
Oct 11 19:11:37 Angband kernel: ata4: COMRESET failed (errno=-16)
Oct 11 19:11:37 Angband kernel: ata4: hard resetting link
Oct 11 19:11:42 Angband kernel: ata4: link is slow to respond, please be patient (ready=-19)
Oct 11 19:12:12 Angband kernel: ata4: COMRESET failed (errno=-16)
Oct 11 19:12:12 Angband kernel: ata4: hard resetting link
Oct 11 19:12:17 Angband kernel: ata4: COMRESET failed (errno=-16)
Oct 11 19:12:17 Angband kernel: ata4: reset failed, giving up
Oct 11 19:12:17 Angband kernel: ata4.00: disabled
Oct 11 19:12:17 Angband kernel: ata4.00: device reported invalid CHS sector 0
Oct 11 19:12:17 Angband kernel: ata4: EH complete
Oct 11 19:12:17 Angband kernel: sd 4:0:0:0: [sdb] Unhandled error code
Oct 11 19:12:17 Angband kernel: sd 4:0:0:0: [sdb] Result: hostbyte=0x04 driverbyte=0x00
Oct 11 19:12:17 Angband kernel: sd 4:0:0:0: [sdb] CDB: cdb[0]=0x2a: 2a 00 c3 5a 6c 3f 00 01 00 00
Oct 11 19:12:17 Angband kernel: end_request: I/O error, dev sdb, sector 3277483071
Oct 11 19:12:17 Angband kernel: Buffer I/O error on device sdb1, logical block 3277483008
Oct 11 19:12:17 Angband kernel: lost page write due to I/O error on sdb1
Oct 11 19:12:17 Angband kernel: Buffer I/O error on device sdb1, logical block 3277483009
Oct 11 19:12:17 Angband kernel: lost page write due to I/O error on sdb1
Oct 11 19:12:17 Angband kernel: Buffer I/O error on device sdb1, logical block 3277483010
Oct 11 19:12:17 Angband kernel: lost page write due to I/O error on sdb1
Oct 11 19:12:17 Angband kernel: Buffer I/O error on device sdb1, logical block 3277483011
Oct 11 19:12:17 Angband kernel: lost page write due to I/O error on sdb1
Oct 11 19:12:17 Angband kernel: Buffer I/O error on device sdb1, logical block 3277483012
Oct 11 19:12:17 Angband kernel: lost page write due to I/O error on sdb1
Oct 11 19:12:17 Angband kernel: Buffer I/O error on device sdb1, logical block 3277483013
Oct 11 19:12:17 Angband kernel: lost page write due to I/O error on sdb1
Oct 11 19:12:17 Angband kernel: Buffer I/O error on device sdb1, logical block 3277483014
Oct 11 19:12:17 Angband kernel: lost page write due to I/O error on sdb1
Oct 11 19:12:17 Angband kernel: Buffer I/O error on device sdb1, logical block 3277483015
Oct 11 19:12:17 Angband kernel: lost page write due to I/O error on sdb1
Oct 11 19:12:17 Angband kernel: Buffer I/O error on device sdb1, logical block 3277483016
Oct 11 19:12:17 Angband kernel: lost page write due to I/O error on sdb1
Oct 11 19:12:17 Angband kernel: Buffer I/O error on device sdb1, logical block 3277483017
Oct 11 19:12:17 Angband kernel: lost page write due to I/O error on sdb1
Oct 11 19:12:17 Angband kernel: sd 4:0:0:0: [sdb] Unhandled error code
Oct 11 19:12:17 Angband kernel: sd 4:0:0:0: [sdb] Result: hostbyte=0x04 driverbyte=0x00
Oct 11 19:12:17 Angband kernel: sd 4:0:0:0: [sdb] CDB: cdb[0]=0x2a: 2a 00 c3 5a 6d 3f 00 01 00 00
Oct 11 19:12:17 Angband kernel: end_request: I/O error, dev sdb, sector 3277483327
Oct 11 19:12:17 Angband kernel: sd 4:0:0:0: [sdb] Unhandled error code
Oct 11 19:12:17 Angband kernel: sd 4:0:0:0: [sdb] Result: hostbyte=0x04 driverbyte=0x00
Oct 11 19:12:17 Angband kernel: sd 4:0:0:0: [sdb] CDB: cdb[0]=0x2a: 2a 00 c3 5a 6e 3f 00 01 00 00
Oct 11 19:12:17 Angband kernel: end_request: I/O error, dev sdb, sector 3277483583
Oct 11 19:12:17 Angband kernel: sd 4:0:0:0: [sdb] Unhandled error code
Oct 11 19:12:17 Angband kernel: sd 4:0:0:0: [sdb] Result: hostbyte=0x04 driverbyte=0x00
Oct 11 19:12:17 Angband kernel: sd 4:0:0:0: [sdb] CDB: cdb[0]=0x2a: 2a 00 c3 5a 6f 3f 00 01 00 00
Oct 11 19:12:17 Angband kernel: end_request: I/O error, dev sdb, sector 3277483839
Oct 11 19:12:17 Angband kernel: sd 4:0:0:0: [sdb] Unhandled error code
Oct 11 19:12:17 Angband kernel: sd 4:0:0:0: [sdb] Result: hostbyte=0x04 driverbyte=0x00
Oct 11 19:12:17 Angband kernel: sd 4:0:0:0: [sdb] CDB: cdb[0]=0x2a: 2a 00 c3 5a 70 3f 00 01 00 00
Oct 11 19:12:17 Angband kernel: end_request: I/O error, dev sdb, sector 3277484095
Oct 11 19:12:17 Angband kernel: sd 4:0:0:0: [sdb] Unhandled error code
Oct 11 19:12:17 Angband kernel: sd 4:0:0:0: [sdb] Result: hostbyte=0x04 driverbyte=0x00
Oct 11 19:12:17 Angband kernel: sd 4:0:0:0: [sdb] CDB: cdb[0]=0x2a: 2a 00 c3 5a 71 3f 00 01 00 00
Oct 11 19:12:17 Angband kernel: end_request: I/O error, dev sdb, sector 3277484351
Oct 11 19:12:17 Angband kernel: sd 4:0:0:0: [sdb] Unhandled error code
Oct 11 19:12:17 Angband kernel: sd 4:0:0:0: [sdb] Result: hostbyte=0x04 driverbyte=0x00
Oct 11 19:12:17 Angband kernel: sd 4:0:0:0: [sdb] CDB: cdb[0]=0x2a: 2a 00 c3 5a 72 3f 00 01 00 00
Oct 11 19:12:17 Angband kernel: end_request: I/O error, dev sdb, sector 3277484607
------------------------------------------END SYSLOG--------------------------------------------------------------
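For anyone reading the log above, the `CDB:` bytes can be decoded by hand to confirm they line up with the reported sectors. This is only a minimal sketch: it assumes the `2a ...` bytes are a standard SCSI WRITE(10) command block (opcode 0x2a, 4-byte big-endian LBA in bytes 2-5, 2-byte transfer length in bytes 7-8) and 512-byte logical sectors. The function name `decode_write10` is made up for illustration.

```python
def decode_write10(cdb_hex: str, sector_size: int = 512) -> dict:
    """Decode a SCSI WRITE(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a 10-byte WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")      # starting logical block
    blocks = int.from_bytes(cdb[7:9], "big")   # number of blocks to write
    return {"lba": lba, "blocks": blocks, "bytes": blocks * sector_size}


# First two failed writes from the syslog above.
first = decode_write10("2a 00 c3 5a 6c 3f 00 01 00 00")
second = decode_write10("2a 00 c3 5a 6d 3f 00 01 00 00")
print(first)
print(second)
```

The first command decodes to LBA 3277483071 with 256 blocks (131072 bytes), which matches both the `end_request ... sector 3277483071` line and the `dma 131072 out` in the failed ATA command. Each later CDB is 256 sectors further along, so these look like consecutive queued writes all failing after the link was given up on, rather than scattered bad sectors.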
October 12, 2010
Some EADS drives do require jumpers as well. The best way to determine this is to look for the words "Advanced Format" on the label. If it says this, it needs the jumper. If not, it doesn't. So I guess the first step would be confirming whether your EADS drive needs the jumper or not.
October 12, 2010 Author
JazzySmooth: Thanks, I'll give that a look-see when I get home. God I feel like a total noob, and thanks for not treating me like one!
October 12, 2010
It looks like you may also have a loose or bad cable. Double check the connections. You could also try swapping out the cable if you have a spare.
October 13, 2010 Author
Ok, well the saga continues! Thanks for the excellent and prompt suggestions. I've taken a further peek at things and here's what I've found out.

The WD20EADS drive in question doesn't indicate anywhere on it that it's an "Advanced Format" drive, so that jumper on 7-8 doesn't seem like it's going to do anything...

I've removed the drive and brought it with me to run the WD utility to see if it generates any errors. I'll go from there I guess... Any other ideas? I've already tried different cables/power plugs etc. I'm stumped!

In the interim, if I decide that maybe the PC that I'm using has the issues, can I switch everything over to a new PC without losing the raid? Obviously all my stuff is on it now, and I'd rather not re-do 4TB of information. I'm assuming things will change, as the new mobo will have different "ports" for the SATA cables and unRAID will not find the drives appropriately.
October 13, 2010
unRAID is very hardware independent. You can swap out the RAM, CPU, etc. and unRAID won't even notice (well, it will notice, but it won't care). The only issue you may have with swapping out the motherboard is, as you've correctly guessed, that unRAID will see the disks in different drive slots. unRAID is even good at dealing with this: oftentimes it will just figure it out on its own.

As a precaution, though, take a screenshot of your devices page BEFORE you move the drives over to the new hardware. Be sure to save the screenshot somewhere besides the unRAID server, since the server will be down when you need to consult it. Once you boot from the new hardware (remember to change the boot priority to the flash drive), just go to the devices page and assign the drives to the correct drive slots. You should then be able to start the array.

The only drive that is truly critical is the parity drive. If you place a data drive in the parity drive slot and start the array, you are very likely to lose the data on that drive. Assigning your parity drive to a data slot won't hurt it, since unRAID will see it as unformatted (just don't format it and you'll be fine). So if you are ever unsure as to which drive is your parity drive, all you have to do is assign all drives as data drives. Whichever drive shows up as unformatted on the main page is your parity drive. If more than one drive shows up as unformatted, then you'll need to seek help.

If you mix up your data drives (such as assigning the drive that was formerly disk2 to the disk5 slot), it won't hurt anything; your data will still be safe. You would have to redo all your share settings, though (included/excluded disks, etc.).
October 13, 2010 Author
Reviewing things, I see that I can just "pick and choose" which drive belongs in which slot, so I've print-screened my setup and will set them back the same way when I get the new server put together. The fact that the 80GB drive (independent of the array) keeps showing DMA errors and the like is prolly an indication that the machine is not suitable. I'm going to rummage through some mobos and stuff that I have and see if I can get something that's more stable.

*sniff* I just want this to work, in theory it's such a great idea! Thanks again for the help!
October 14, 2010 Author
Ok, so last night I swapped motherboards/CPUs and everything seems to be up and running! Re-configuring the array was easy. I can't believe I was so nervous about it, but when all your stuff is in one place... you know...

So I'm into hour 15 of the pre-clear script on the WD20EADS drive, hopefully this works! I'll keep everyone posted, just in case you're on the edge of your seats.
October 14, 2010 Author
Ok, so I tried pre-clearing the WD20EADS drive, and this is what shows up at the end of the process in my syslog. Could this be the 7-8 jumper needing to be set, or is the drive fubar? Any thoughts?

Oct 14 15:07:43 Angband kernel: end_request: I/O error, dev sdd, sector 0
Oct 14 15:07:43 Angband kernel: Buffer I/O error on device sdd, logical block 0
Oct 14 15:07:43 Angband kernel: Buffer I/O error on device sdd, logical block 1
Oct 14 15:07:43 Angband kernel: Buffer I/O error on device sdd, logical block 2
Oct 14 15:07:43 Angband kernel: Buffer I/O error on device sdd, logical block 3
Oct 14 15:07:43 Angband kernel: sd 8:0:0:0: [sdd] Unhandled error code
Oct 14 15:07:43 Angband kernel: sd 8:0:0:0: [sdd] Result: hostbyte=0x04 driverbyte=0x00
Oct 14 15:07:43 Angband kernel: sd 8:0:0:0: [sdd] CDB: cdb[0]=0x28: 28 00 00 00 00 00 00 00 08 00
Oct 14 15:07:43 Angband kernel: end_request: I/O error, dev sdd, sector 0
Oct 14 15:07:43 Angband kernel: Buffer I/O error on device sdd, logical block 0
Oct 14 15:07:48 Angband kernel: sd 8:0:0:0: [sdd] READ CAPACITY(16) failed
Oct 14 15:07:48 Angband kernel: sd 8:0:0:0: [sdd] Result: hostbyte=0x04 driverbyte=0x00
Oct 14 15:07:48 Angband kernel: sd 8:0:0:0: [sdd] Sense not available.
Oct 14 15:07:48 Angband kernel: sd 8:0:0:0: [sdd] READ CAPACITY failed
Oct 14 15:07:48 Angband kernel: sd 8:0:0:0: [sdd] Result: hostbyte=0x04 driverbyte=0x00
Oct 14 15:07:48 Angband kernel: sd 8:0:0:0: [sdd] Sense not available.
Oct 14 15:07:48 Angband kernel: sd 8:0:0:0: [sdd] Asking for cache data failed
Oct 14 15:07:48 Angband kernel: sd 8:0:0:0: [sdd] Assuming drive cache: write through
Oct 14 15:07:48 Angband kernel: sdd: detected capacity change from 2000398934016 to 0
Oct 14 15:07:58 Angband kernel: udev: starting version 141
Oct 14 15:14:34 Angband emhttp: shcmd (76): /etc/rc.d/rc.samba stop | logger
Oct 14 15:14:34 Angband emhttp: shcmd (77): /etc/rc.d/rc.nfsd stop | logger
Oct 14 15:14:35 Angband emhttp: Spinning up all drives...
Oct 14 15:14:35 Angband emhttp: shcmd (78): sync
Oct 14 15:14:35 Angband kernel: mdcmd (4166): spinup 0
Oct 14 15:14:35 Angband kernel: mdcmd (4167): spinup 1
Oct 14 15:14:35 Angband kernel: mdcmd (4168): spinup 2
Oct 14 15:14:36 Angband emhttp: shcmd (79): umount /mnt/user >/dev/null 2>&1
Oct 14 15:14:36 Angband emhttp: shcmd (80): rmdir /mnt/user >/dev/null 2>&1
Oct 14 15:14:36 Angband emhttp: shcmd (81): umount /mnt/disk1 >/dev/null 2>&1
Oct 14 15:14:36 Angband emhttp: shcmd (82): rmdir /mnt/disk1 >/dev/null 2>&1
Oct 14 15:14:36 Angband emhttp: shcmd (83): umount /mnt/disk2 >/dev/null 2>&1
Oct 14 15:14:37 Angband emhttp: shcmd (84): rmdir /mnt/disk2 >/dev/null 2>&1
Oct 14 15:14:37 Angband kernel: mdcmd (4171): stop
Oct 14 15:14:37 Angband kernel: md1: stopping
Oct 14 15:14:37 Angband kernel: md2: stopping
Oct 14 15:14:37 Angband emhttp: shcmd (85): rm /etc/samba/smb-shares.conf >/dev/null 2>&1
Oct 14 15:14:37 Angband emhttp: shcmd (86): cp /etc/exports- /etc/exports
Oct 14 15:14:37 Angband emhttp: shcmd (87): /etc/rc.d/rc.samba start | logger
Oct 14 15:14:37 Angband logger: Starting Samba: /usr/sbin/nmbd -D
Oct 14 15:14:37 Angband logger: /usr/sbin/smbd -D
Oct 14 15:14:37 Angband emhttp: shcmd (88): /etc/rc.d/rc.nfsd start | logger
Oct 14 15:14:51 Angband emhttp: ckmbr: read: Success
Oct 14 15:14:51 Angband emhttp: get_fstype: open /dev/sdd1: No such file or directory
Oct 14 15:14:51 Angband emhttp: shcmd (89): modprobe -rw md-mod 2>&1 | logger
Oct 14 15:14:51 Angband emhttp: shcmd (90): modprobe md-mod super=/boot/config/super.dat slots=8,0,8,32,8,16,8,48,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 2>&1 | logger
Oct 14 15:14:51 Angband kernel: md: unRAID driver removed
Oct 14 15:14:51 Angband kernel: xor: automatically using best checksumming function: pIII_sse
Oct 14 15:14:51 Angband kernel: pIII_sse : 6145.200 MB/sec
Oct 14 15:14:51 Angband kernel: xor: using function: pIII_sse (6145.200 MB/sec)
Oct 14 15:14:51 Angband kernel: md: unRAID driver 0.95.4 installed
Oct 14 15:14:51 Angband kernel: md: import disk0: [8,0] (sda) WDC WD20EARS-00M WD-WMAZA0848746 offset: 63 size: 1953514552
Oct 14 15:14:51 Angband kernel: md: import disk1: [8,32] (sdc) WDC WD20EARS-00M WD-WMAZA0836750 offset: 63 size: 1953514552
Oct 14 15:14:51 Angband kernel: md: import disk2: [8,16] (sdb) WDC WD20EARS-00M WD-WMAZA0797388 offset: 63 size: 1953514552
Oct 14 15:14:51 Angband kernel: md: import disk3: [8,48] (sdd) WD-WMAZA0797388 WD-WMAZA0797388 offset: 63 size: 2147483616
Oct 14 15:14:51 Angband kernel: md: disk3 new disk
Oct 14 15:14:51 Angband kernel: mdcmd (2): set md_num_stripes 1280
Oct 14 15:14:51 Angband kernel: mdcmd (3): set md_write_limit 768
Oct 14 15:14:51 Angband kernel: mdcmd (4): set md_sync_window 288
Oct 14 15:14:51 Angband kernel: mdcmd (5): set spinup_group 0 0
Oct 14 15:14:51 Angband kernel: mdcmd (6): set spinup_group 1 0
Oct 14 15:14:51 Angband kernel: mdcmd (7): set spinup_group 2 0
Oct 14 15:14:51 Angband kernel: mdcmd (8): set spinup_group 3 0
Oct 14 15:14:51 Angband emhttp: disk_temperature: ioctl (smart_enable): Input/output error
Oct 14 15:15:10 Angband last message repeated 2 times
Oct 14 15:15:13 Angband emhttp: shcmd (91): modprobe -rw md-mod 2>&1 | logger
Oct 14 15:15:13 Angband emhttp: shcmd (92): modprobe md-mod super=/boot/config/super.dat slots=8,0,8,32,8,16,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 2>&1 | logger
Oct 14 15:15:13 Angband kernel: md: unRAID driver removed
Oct 14 15:15:13 Angband kernel: xor: automatically using best checksumming function: pIII_sse
Oct 14 15:15:13 Angband kernel: pIII_sse : 6145.200 MB/sec
Oct 14 15:15:13 Angband kernel: xor: using function: pIII_sse (6145.200 MB/sec)
Oct 14 15:15:13 Angband kernel: md: unRAID driver 0.95.4 installed
Oct 14 15:15:13 Angband kernel: md: import disk0: [8,0] (sda) WDC WD20EARS-00M WD-WMAZA0848746 offset: 63 size: 1953514552
Oct 14 15:15:13 Angband kernel: md: import disk1: [8,32] (sdc) WDC WD20EARS-00M WD-WMAZA0836750 offset: 63 size: 1953514552
Oct 14 15:15:13 Angband kernel: md: import disk2: [8,16] (sdb) WDC WD20EARS-00M WD-WMAZA0797388 offset: 63 size: 1953514552
Oct 14 15:15:13 Angband kernel: mdcmd (2): set md_num_stripes 1280
Oct 14 15:15:13 Angband kernel: mdcmd (3): set md_write_limit 768
Oct 14 15:15:13 Angband kernel: mdcmd (4): set md_sync_window 288
Oct 14 15:15:13 Angband kernel: mdcmd (5): set spinup_group 0 0
Oct 14 15:15:13 Angband kernel: mdcmd (6): set spinup_group 1 0
Oct 14 15:15:13 Angband kernel: mdcmd (7): set spinup_group 2 0
Oct 14 15:15:18 Angband emhttp: shcmd (93): /usr/local/sbin/set_ncq sda 1 >/dev/null
Oct 14 15:15:18 Angband emhttp: shcmd (94): /usr/local/sbin/set_ncq sdc 1 >/dev/null
Oct 14 15:15:18 Angband emhttp: shcmd (95): /usr/local/sbin/set_ncq sdb 1 >/dev/null
Oct 14 15:15:18 Angband kernel: mdcmd (11): start STOPPED
Oct 14 15:15:18 Angband kernel: unraid: allocating 18360K for 1280 stripes (3 disks)
Oct 14 15:15:18 Angband kernel: md1: running, size: 1953514552 blocks
Oct 14 15:15:18 Angband kernel: md2: running, size: 1953514552 blocks
Oct 14 15:15:19 Angband emhttp: shcmd (96): udevadm settle
Oct 14 15:15:19 Angband emhttp: shcmd (97): mkdir /mnt/disk2
Oct 14 15:15:19 Angband emhttp: shcmd (97): mkdir /mnt/disk1
Oct 14 15:15:19 Angband emhttp: shcmd (98): set -o pipefail ; mount -t reiserfs -o noacl,nouser_xattr,noatime,nodiratime /dev/md2 /mnt/disk2 2>&1 | logger
Oct 14 15:15:19 Angband emhttp: shcmd (99): set -o pipefail ; mount -t reiserfs -o noacl,nouser_xattr,noatime,nodiratime /dev/md1 /mnt/disk1 2>&1 | logger
Oct 14 15:15:19 Angband kernel: mdcmd (13): check
Oct 14 15:15:19 Angband kernel: md: recovery thread woken up ...
Oct 14 15:15:19 Angband kernel: md: recovery thread has nothing to resync
Oct 14 15:15:19 Angband kernel: REISERFS (device md1): found reiserfs format "3.6" with standard journal
Oct 14 15:15:19 Angband kernel: REISERFS (device md1): using ordered data mode
Oct 14 15:15:19 Angband kernel: REISERFS (device md2): found reiserfs format "3.6" with standard journal
Oct 14 15:15:19 Angband kernel: REISERFS (device md2): using ordered data mode
Oct 14 15:15:19 Angband kernel: REISERFS (device md1): journal params: device md1, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
Oct 14 15:15:19 Angband kernel: REISERFS (device md1): checking transaction log (md1)
Oct 14 15:15:19 Angband kernel: REISERFS (device md2): journal params: device md2, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
Oct 14 15:15:19 Angband kernel: REISERFS (device md2): checking transaction log (md2)
Oct 14 15:15:19 Angband kernel: REISERFS (device md1): Using r5 hash to sort names
Oct 14 15:15:19 Angband kernel: REISERFS (device md2): Using r5 hash to sort names
Oct 14 15:15:19 Angband emhttp: shcmd (101): rm /etc/samba/smb-shares.conf >/dev/null 2>&1
Oct 14 15:15:19 Angband emhttp: shcmd (102): cp /etc/exports- /etc/exports
Oct 14 15:15:19 Angband emhttp: shcmd (103): mkdir /mnt/user
Oct 14 15:15:19 Angband emhttp: shcmd (104): /usr/local/sbin/shfs /mnt/user -o noatime,big_writes,allow_other,default_permissions
Oct 14 15:15:20 Angband emhttp: shcmd (105): killall -HUP smbd
Oct 14 15:15:20 Angband emhttp: shcmd (106): /etc/rc.d/rc.nfsd restart | logger
Oct 14 15:15:34 Angband in.telnetd[32682]: connect from 204.101.124.189 (204.101.124.189)
Oct 14 15:15:44 Angband login[32683]: ROOT LOGIN on `pts/0' from `204.101.124.189'
Oct 14 15:17:11 Angband kernel: sd 8:0:0:0: [sdd] READ CAPACITY(16) failed
Oct 14 15:17:11 Angband kernel: sd 8:0:0:0: [sdd] Result: hostbyte=0x04 driverbyte=0x00
Oct 14 15:17:11 Angband kernel: sd 8:0:0:0: [sdd] Sense not available.
Oct 14 15:17:11 Angband kernel: sd 8:0:0:0: [sdd] READ CAPACITY failed
Oct 14 15:17:11 Angband kernel: sd 8:0:0:0: [sdd] Result: hostbyte=0x04 driverbyte=0x00
Oct 14 15:17:11 Angband kernel: sd 8:0:0:0: [sdd] Sense not available.
Oct 14 15:17:11 Angband kernel: sd 8:0:0:0: [sdd] Asking for cache data failed
Oct 14 15:17:11 Angband kernel: sd 8:0:0:0: [sdd] Assuming drive cache: write through
October 15, 2010
There's definitely something screwy with that drive. The syslog you posted above shows:

Oct 14 15:14:51 Angband kernel: md: import disk0: [8,0] (sda) WDC WD20EARS-00M WD-WMAZA0848746 offset: 63 size: 1953514552
Oct 14 15:14:51 Angband kernel: md: import disk1: [8,32] (sdc) WDC WD20EARS-00M WD-WMAZA0836750 offset: 63 size: 1953514552
Oct 14 15:14:51 Angband kernel: md: import disk2: [8,16] (sdb) WDC WD20EARS-00M WD-WMAZA0797388 offset: 63 size: 1953514552
Oct 14 15:14:51 Angband kernel: md: import disk3: [8,48] (sdd) WD-WMAZA0797388 WD-WMAZA0797388 offset: 63 size: 2147483616

According to the above, you have 4 disks in your server: sda, sdb, sdc, and sdd. sda-sdc are all 2 TB WD EARS, and they all look normal. sdd is reporting as larger than 2 TB (obviously wrong), and it is using the serial number from sdb as both its model number and serial number.

This is a new one for me. My guess is that the drive is fubarred, but let's see what the experts have to say about it. Since the drive is empty and you aren't putting any data at risk, you could always try some experimentation. For example, install a jumper and see what happens. Also, run some WD diagnostic software on it and see what it says. A SMART report could be helpful as well.
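The comparison made by eye in the post above can also be done mechanically. The following is just a rough sketch, not anything unRAID itself provides: it assumes the `md: import` lines always have the shape shown, that the last token before `offset:` is the serial number, and that a disk is suspicious if its model string equals its serial, its serial was already seen on another device, or its size differs from the most common size. The helper names `parse_imports` and `find_anomalies` are made up for illustration.

```python
import re
from collections import Counter

# The four import lines quoted in the post above (timestamps trimmed).
LOG = """\
md: import disk0: [8,0] (sda) WDC WD20EARS-00M WD-WMAZA0848746 offset: 63 size: 1953514552
md: import disk1: [8,32] (sdc) WDC WD20EARS-00M WD-WMAZA0836750 offset: 63 size: 1953514552
md: import disk2: [8,16] (sdb) WDC WD20EARS-00M WD-WMAZA0797388 offset: 63 size: 1953514552
md: import disk3: [8,48] (sdd) WD-WMAZA0797388 WD-WMAZA0797388 offset: 63 size: 2147483616
"""

IMPORT_RE = re.compile(
    r"md: import disk(\d+): \[\d+,\d+\] \((\w+)\) (.+?) offset: \d+ size: (\d+)"
)


def parse_imports(text):
    """Return one dict per 'md: import' line found in text."""
    disks = []
    for slot, dev, ident, size in IMPORT_RE.findall(text):
        # Assumption: the last token is the serial, the rest is the model.
        *model, serial = ident.split()
        disks.append({"slot": int(slot), "dev": dev,
                      "model": " ".join(model), "serial": serial,
                      "size": int(size)})
    return disks


def find_anomalies(disks):
    """Map device name -> list of reasons it looks suspicious."""
    usual_size, _ = Counter(d["size"] for d in disks).most_common(1)[0]
    seen_serials = {}
    report = {}
    for d in disks:
        reasons = []
        if d["model"] == d["serial"]:
            reasons.append("model string equals serial number")
        if d["serial"] in seen_serials:
            reasons.append(f"serial already reported by {seen_serials[d['serial']]}")
        else:
            seen_serials[d["serial"]] = d["dev"]
        if d["size"] != usual_size:
            reasons.append(f"size {d['size']} differs from the usual {usual_size}")
        if reasons:
            report[d["dev"]] = reasons
    return report


report = find_anomalies(parse_imports(LOG))
for dev, reasons in report.items():
    print(dev, reasons)
```

Only sdd is flagged, for all three reasons. One more observation: the odd size 2147483616 is exactly 2**31 - 32, which hints at a clamped or bogus value rather than a real capacity, though that is only a guess.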
October 15, 2010
I'd like to see the full syslog, so we can look at how the drive is initially identified on the disk controller. Please attach the full syslog from an initial clean boot to your next post. (You can zip it if it is too large otherwise, or use a pastebin service.)

It could just as easily be a disk controller bug, since there is no way I can think of that a second drive could use the serial number of a different drive as its model number. Is the disk controller perhaps set to a RAID mode? Buggy firmware on a disk controller could do it, though. Another possibility is a flaw in the udev implementation when creating the /dev/disk/by-id entries.

Please also post the output of

ls -l /dev/disk/by-id

Since the drive assignments are by PCI device, and it appears as if two devices are involved, I'm curious how they could share a serial number. You might want to send an e-mail to lime-tech... and another to the disk controller manufacturer...

Joe L.
October 15, 2010 Author
Ok, so I ran the WD utility and it zero-filled the drive; SMART passes. I'm doing another "pre_clear" in unRAID, this time with the jumper on 7-8. I'm attaching the syslog (zipped with WinRAR) here as requested. Also, below is the result of the ls -l /dev/disk/by-id command:

total 0
lrwxrwxrwx 1 root root 9 Oct 14 21:07 ata-WDC_WD20EADS-11R6B1_WD-WCAVY1226097 -> ../../sdd
lrwxrwxrwx 1 root root 9 Oct 14 21:07 ata-WDC_WD20EARS-00MVWB0_WD-WMAZA0797388 -> ../../sdb
lrwxrwxrwx 1 root root 10 Oct 14 21:07 ata-WDC_WD20EARS-00MVWB0_WD-WMAZA0797388-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 9 Oct 14 21:08 ata-WDC_WD20EARS-00MVWB0_WD-WMAZA0836750 -> ../../sdc
lrwxrwxrwx 1 root root 10 Oct 14 21:08 ata-WDC_WD20EARS-00MVWB0_WD-WMAZA0836750-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 9 Oct 14 21:07 ata-WDC_WD20EARS-00MVWB0_WD-WMAZA0848746 -> ../../sda
lrwxrwxrwx 1 root root 10 Oct 14 21:07 ata-WDC_WD20EARS-00MVWB0_WD-WMAZA0848746-part1 -> ../../sda1
lrwxrwxrwx 1 root root 9 Oct 14 21:08 ata-WDC_WD800JB-00CRA1_WD-WCA8E5497388 -> ../../hda
lrwxrwxrwx 1 root root 10 Oct 14 21:08 ata-WDC_WD800JB-00CRA1_WD-WCA8E5497388-part1 -> ../../hda1
lrwxrwxrwx 1 root root 9 Oct 14 21:07 scsi-SATA_WDC_WD20EADS-11_WD-WCAVY1226097 -> ../../sdd
lrwxrwxrwx 1 root root 9 Oct 14 21:07 scsi-SATA_WDC_WD20EARS-00_WD-WMAZA0797388 -> ../../sdb
lrwxrwxrwx 1 root root 10 Oct 14 21:07 scsi-SATA_WDC_WD20EARS-00_WD-WMAZA0797388-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 9 Oct 14 21:08 scsi-SATA_WDC_WD20EARS-00_WD-WMAZA0836750 -> ../../sdc
lrwxrwxrwx 1 root root 10 Oct 14 21:08 scsi-SATA_WDC_WD20EARS-00_WD-WMAZA0836750-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 9 Oct 14 21:07 scsi-SATA_WDC_WD20EARS-00_WD-WMAZA0848746 -> ../../sda
lrwxrwxrwx 1 root root 10 Oct 14 21:07 scsi-SATA_WDC_WD20EARS-00_WD-WMAZA0848746-part1 -> ../../sda1
lrwxrwxrwx 1 root root 9 Oct 14 21:08 usb-FLASH_Drive_UT_USB20_100822c30e1799-0:0 -> ../../sde
lrwxrwxrwx 1 root root 10 Oct 14 21:08 usb-FLASH_Drive_UT_USB20_100822c30e1799-0:0-part1 -> ../../sde1

Thanks again for the fellowship on this, it's great to have a place to post puzzles without being treated like an idiot.

syslog.zip
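To read a listing like the one above per device rather than per symlink, the lines can be grouped by their link target. This is only an illustrative sketch that works on pasted `ls -l` text; `by_device` is a made-up helper name, and only two lines of the listing are reproduced here to keep it short.

```python
# Two entries copied from the ls -l /dev/disk/by-id output above.
LISTING = """\
lrwxrwxrwx 1 root root 9 Oct 14 21:07 ata-WDC_WD20EADS-11R6B1_WD-WCAVY1226097 -> ../../sdd
lrwxrwxrwx 1 root root 9 Oct 14 21:07 ata-WDC_WD20EARS-00MVWB0_WD-WMAZA0797388 -> ../../sdb
"""


def by_device(listing):
    """Group by-id symlink names under the device node they point to."""
    mapping = {}
    for line in listing.splitlines():
        if " -> " not in line:
            continue  # skip "total 0" and anything that is not a symlink
        left, target = line.rsplit(" -> ", 1)
        name = left.split()[-1]              # symlink name is the last field
        dev = target.rsplit("/", 1)[-1]      # "../../sdd" -> "sdd"
        mapping.setdefault(dev, []).append(name)
    return mapping


mapping = by_device(LISTING)
for dev in sorted(mapping):
    print(dev, mapping[dev])
```

Run against the full listing, sdd carries only WD20EADS names with serial WD-WCAVY1226097, and sdb alone carries WD-WMAZA0797388, so at this point udev sees two distinct drives. On a live Linux system the same grouping could presumably be built with `os.listdir` and `os.path.realpath` instead of parsing text.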
October 15, 2010
The /dev/disk/by-id listing shows a distinct model/serial number for /dev/sdd:

lrwxrwxrwx 1 root root 9 Oct 14 21:07 ata-WDC_WD20EADS-11R6B1_WD-WCAVY1226097 -> ../../sdd

Now, if you can, post the contents of your /boot/config/disk.cfg file. You can get it by typing

cat /boot/config/disk.cfg

Joe L.
October 15, 2010
It shows up correctly when initially scanned by the disk controller:

Oct 14 21:08:08 Angband kernel: ata8: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Oct 14 21:08:08 Angband kernel: ata8.00: ATA-8: WDC WD20EADS-11R6B1, 01.00A01, max UDMA/133
Oct 14 21:08:08 Angband kernel: ata8.00: 3907029168 sectors, multi 16: LBA48 NCQ (depth 0/32)
Oct 14 21:08:08 Angband kernel: ata8.00: configured for UDMA/100
Oct 14 21:08:08 Angband kernel: scsi 8:0:0:0: Direct-Access ATA WDC WD20EADS-11R 01.0 PQ: 0 ANSI: 5
Oct 14 21:08:08 Angband kernel: sd 8:0:0:0: [sdd] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
Oct 14 21:08:08 Angband kernel: sd 8:0:0:0: [sdd] 4096-byte physical blocks
Oct 14 21:08:08 Angband kernel: sd 8:0:0:0: [sdd] Write Protect is off
Oct 14 21:08:08 Angband kernel: sd 8:0:0:0: [sdd] Mode Sense: 00 3a 00 00
Oct 14 21:08:08 Angband kernel: sd 8:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

And again it is correct when listed by emhttp in the device inventory:

Oct 14 21:08:09 Angband emhttp: Device inventory:
Oct 14 21:08:09 Angband emhttp: pci-0000:00:06.0-ide-0:0 ide0 (hda) WDC_WD800JB-00CRA1_WD-WCA8E5497388
Oct 14 21:08:09 Angband emhttp: pci-0000:05:08.0-scsi-0:0:0:0 host5 (sda) WDC_WD20EARS-00MVWB0_WD-WMAZA0848746
Oct 14 21:08:09 Angband emhttp: pci-0000:05:08.0-scsi-1:0:0:0 host6 (sdb) WDC_WD20EARS-00MVWB0_WD-WMAZA0797388
Oct 14 21:08:09 Angband emhttp: pci-0000:05:08.0-scsi-2:0:0:0 host7 (sdc) WDC_WD20EARS-00MVWB0_WD-WMAZA0836750
Oct 14 21:08:09 Angband emhttp: pci-0000:05:08.0-scsi-3:0:0:0 host8 (sdd) WDC_WD20EADS-11R6B1_WD-WCAVY1226097

It looked good when you started a pre-clear on it, but you did not complete the pre-clear (no post-clear entries in the log). You apparently dropped the telnet connection.

Then, when you assigned it to the array it also looked good:

Oct 15 08:37:06 Angband kernel: md: import disk0: [8,0] (sda) WDC WD20EARS-00M WD-WMAZA0848746 offset: 63 size: 1953514552
Oct 15 08:37:06 Angband kernel: md: import disk1: [8,32] (sdc) WDC WD20EARS-00M WD-WMAZA0836750 offset: 63 size: 1953514552
Oct 15 08:37:06 Angband kernel: md: import disk2: [8,16] (sdb) WDC WD20EARS-00M WD-WMAZA0797388 offset: 63 size: 1953514552
Oct 15 08:37:06 Angband kernel: md: import disk3: [8,48] (sdd) WDC WD20EADS-11R WD-WCAVY1226097 offset: 63 size: 1953514552

Apparently, when you rebooted after posting the prior segment of the syslog, it correctly identified itself. Were you hot-plugging the drive? (unRAID is NOT a hot-plug OS and that could have gotten it confused.)
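The numbers in these log excerpts can be cross-checked against each other with a little arithmetic. The sketch below takes the sector count, sector size, and partition offset straight from the log lines. The one real assumption is that the md driver's `size:` field is the partition length in 1 KiB blocks, rounded down, which is inferred from the numbers rather than confirmed by any documentation quoted in this thread.

```python
SECTORS = 3907029168      # from "ata8.00: 3907029168 sectors"
SECTOR_BYTES = 512        # from "512-byte logical blocks"
PARTITION_OFFSET = 63     # from "offset: 63" in the md import lines

total_bytes = SECTORS * SECTOR_BYTES
size_tb = total_bytes / 10**12    # decimal terabytes
size_tib = total_bytes / 2**40    # binary tebibytes

# Assumption: md "size" = partition length in 1 KiB blocks, rounded down.
md_size_kib = (SECTORS - PARTITION_OFFSET) * SECTOR_BYTES // 1024

print(total_bytes)
print(f"{size_tb:.3f} TB, {size_tib:.3f} TiB")
print(md_size_kib)
```

The byte total comes out to 2000398934016, the same figure that appeared in the earlier `detected capacity change from 2000398934016 to 0` message, and the computed md size is 1953514552, matching what all four disks report once sdd identifies itself correctly. Under that reading, the earlier 2147483616 for sdd was not a plausible size for this drive.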
October 15, 2010 Author

It shows up correctly when initially scanned by the disk controller:

Oct 14 21:08:08 Angband kernel: ata8: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Oct 14 21:08:08 Angband kernel: ata8.00: ATA-8: WDC WD20EADS-11R6B1, 01.00A01, max UDMA/133
Oct 14 21:08:08 Angband kernel: ata8.00: 3907029168 sectors, multi 16: LBA48 NCQ (depth 0/32)
Oct 14 21:08:08 Angband kernel: ata8.00: configured for UDMA/100
Oct 14 21:08:08 Angband kernel: scsi 8:0:0:0: Direct-Access ATA WDC WD20EADS-11R 01.0 PQ: 0 ANSI: 5
Oct 14 21:08:08 Angband kernel: sd 8:0:0:0: [sdd] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
Oct 14 21:08:08 Angband kernel: sd 8:0:0:0: [sdd] 4096-byte physical blocks
Oct 14 21:08:08 Angband kernel: sd 8:0:0:0: [sdd] Write Protect is off
Oct 14 21:08:08 Angband kernel: sd 8:0:0:0: [sdd] Mode Sense: 00 3a 00 00
Oct 14 21:08:08 Angband kernel: sd 8:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

I agree, it's always looked fine when it's scanned on boot.

And again is correct when listed by emhttp in the device inventory

Oct 14 21:08:09 Angband emhttp: Device inventory:
Oct 14 21:08:09 Angband emhttp: pci-0000:00:06.0-ide-0:0 ide0 (hda) WDC_WD800JB-00CRA1_WD-WCA8E5497388
Oct 14 21:08:09 Angband emhttp: pci-0000:05:08.0-scsi-0:0:0:0 host5 (sda) WDC_WD20EARS-00MVWB0_WD-WMAZA0848746
Oct 14 21:08:09 Angband emhttp: pci-0000:05:08.0-scsi-1:0:0:0 host6 (sdb) WDC_WD20EARS-00MVWB0_WD-WMAZA0797388
Oct 14 21:08:09 Angband emhttp: pci-0000:05:08.0-scsi-2:0:0:0 host7 (sdc) WDC_WD20EARS-00MVWB0_WD-WMAZA0836750
Oct 14 21:08:09 Angband emhttp: pci-0000:05:08.0-scsi-3:0:0:0 host8 (sdd) WDC_WD20EADS-11R6B1_WD-WCAVY1226097

Yup, I saw that too...

It looked good when you started a pre-clear on it, but you did not complete the pre-clear (no post-clear entries in the log) You apparently dropped the telnet connection.

I had been running the pre-clear on it yesterday, but I did lose the telnet session (stupid Windows updates). I have it doing another pre-clear as we type.

Then, when you assigned it to the array it also looked good:

Oct 15 08:37:06 Angband kernel: md: import disk0: [8,0] (sda) WDC WD20EARS-00M WD-WMAZA0848746 offset: 63 size: 1953514552
Oct 15 08:37:06 Angband kernel: md: import disk1: [8,32] (sdc) WDC WD20EARS-00M WD-WMAZA0836750 offset: 63 size: 1953514552
Oct 15 08:37:06 Angband kernel: md: import disk2: [8,16] (sdb) WDC WD20EARS-00M WD-WMAZA0797388 offset: 63 size: 1953514552
Oct 15 08:37:06 Angband kernel: md: import disk3: [8,48] (sdd) WDC WD20EADS-11R WD-WCAVY1226097 offset: 63 size: 1953514552

Apparently, when you rebooted since posting the prior segment of the syslog it correctly identified itself. Were you hot-plugging the drive? (unRAID is NOT a hot-plug OS and that could have gotten it confused)

I put it into the array, but I took it out again, fearing that since the pre-clear had not completed, clearing it would lock out the array all day (wife is ill at home and wants media). The preclear_drive.sh script is about 20% done now. I definitely do not hot-plug these drives.
October 15, 2010 Author

The saga continues... another unsuccessful pre-clear attempt; the syslog is attached to this post. I should mention this was a drive taken out of a "mybook" external cage. I don't know if that matters... Le sigh...

syslog.txt
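For anyone wading through a syslog like this by hand, a quick tally of the ATA error lines per port can make the pattern easier to see. Below is a minimal Python sketch of that idea. The sample lines are taken from the excerpt in the first post of this thread (not from the attached file), and the pattern labels and the `tally` function are just illustrative names, not part of any unRAID tool.

```python
import re
from collections import Counter

# Sample lines copied from the syslog excerpts earlier in this thread.
SYSLOG = """\
Oct 11 19:11:17 Angband kernel: ata4.00: failed command: WRITE DMA EXT
Oct 11 19:11:27 Angband kernel: ata4: COMRESET failed (errno=-16)
Oct 11 19:11:37 Angband kernel: ata4: COMRESET failed (errno=-16)
Oct 11 19:12:17 Angband kernel: ata4: reset failed, giving up
Oct 11 19:12:17 Angband kernel: end_request: I/O error, dev sdb, sector 3277483071
Oct 14 21:08:08 Angband kernel: ata8: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
"""

# Each pattern captures the port name (e.g. "ata4"), ignoring any ".00" suffix.
PATTERNS = {
    "failed command": re.compile(r"kernel: (ata\d+)(?:\.\d+)?: failed command"),
    "COMRESET failed": re.compile(r"kernel: (ata\d+)(?:\.\d+)?: COMRESET failed"),
    "reset gave up": re.compile(r"kernel: (ata\d+)(?:\.\d+)?: reset failed, giving up"),
}


def tally(log):
    """Count matching error lines, keyed by (port, label)."""
    counts = Counter()
    for line in log.splitlines():
        for label, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                counts[(match.group(1), label)] += 1
    return counts


counts = tally(SYSLOG)
for (port, label), n in sorted(counts.items()):
    print(port, label, n)
```

To run it against a real capture, replace `SYSLOG` with the contents of the saved file. Errors that stay on one port across reboots would hint at the cable or controller port, while errors that follow the drive to a different port would point at the drive itself.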
October 18, 2010 Author

Ok gang, so I put in a WD20EAVS 1TB drive this weekend and it pre-cleared and then added itself fine. I'm RMAing the 2TB drive on the grounds that it won't perform a zero-fill correctly. Wish me luck.