greybeard

Everything posted by greybeard

  1. Found this over on SlickDeals: "Expires 6/13/10 or after first 2,000 uses." Limit: 1 per person. eWiz (SuperBiiz) has the Samsung HD154UI for $84.99 before coupon code. Use coupon code "ILOVEDAD". Actually, the coupon is good for $10 off a purchase over $75.
  2. In my case the errors all showed up when using various functions of unMenu or running the preclear script. I am guessing that in a pure unRAID system there would have been no errors, but I have not tested that theory. I am not suggesting there is anything wrong with either unMenu or the preclear script; they are great tools that do things unRAID doesn't. Ultimately I suspect something at a lower level, like the MV8 driver or something in the SAS support module.
  3. For one drive I issued the command "smartctl -s on /dev/sdb" from a telnet session. Be sure to change /dev/sdb to your device. For another drive it was turned on automatically by connecting the drive to a motherboard port and running a SMART status display. I am not sure whether it was just connecting the drive to the mobo port or running the SMART status that enabled SMART reporting, since I did both. I understand there is a command in the preclear script to enable SMART reporting, but for some reason it failed to enable it on these drives in my system. I ran preclear on all three drives but still had to explicitly enable SMART reporting on each of them after the preclear was done.
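A small sketch of how the check-then-enable step described above could be scripted. It only looks at text in the form smartctl 5.38 printed on this system (the "SMART support is:" lines) and builds the same "smartctl -s on" command without running it. The function names smart_support_state and enable_command are made up for illustration; none of this is part of unRAID, unMenu, or the preclear script.

```python
import re

# Sample text copied from the smartctl 5.38 output quoted in this thread.
SAMPLE = """\
SMART support is: Available - device has SMART capability.
SMART support is: Disabled
"""

def smart_support_state(info_text):
    """Return 'enabled', 'disabled', 'unavailable', or 'unknown'."""
    values = re.findall(r"^SMART support is:\s*(.+)$", info_text, re.MULTILINE)
    values = [v.strip().lower() for v in values]
    if any(v.startswith("unavailable") for v in values):
        return "unavailable"
    if any(v.startswith("disabled") for v in values):
        return "disabled"
    if any(v.startswith("enabled") for v in values):
        return "enabled"
    return "unknown"

def enable_command(device):
    """Build (but do not run) the command from the post; pass your own device."""
    return ["smartctl", "-s", "on", device]

state = smart_support_state(SAMPLE)
if state == "disabled":
    # Only print the command; running it is left to the reader.
    print(" ".join(enable_command("/dev/sdb")))
```

To actually run the command you would hand the list to something like subprocess.run, as root, with your own device ID substituted.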
  4. Just 2 AOC-SASLP-MV8. BIOS init is fine. On my system all error messages and problems with the 2TB sammys stopped after enabling SMART reporting on them. They are now active and working perfectly. One of them is my parity drive.
  5. FWIW, I used a 4" steel C-clamp on a 590 to fold those tabs down as flat as a pancake. Started them in the right direction with pliers and then tightened the clamp on them. Had to do them one at a time so it took a little while, but I am very happy with the result. Due to differences in case design I could not get the same process to work on an Antec 1200.
  6. Unfortunately the syslog I saved after letting the preclear run through the full script had so many error messages in it that it had wrapped. Here is what came out at the end of the preclear.

     May 15 23:53:51 Cyclops kernel: ata1: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
     May 15 23:53:51 Cyclops kernel: ata1: status=0x51 { DriveReady SeekComplete Error }
     May 15 23:53:51 Cyclops kernel: ata1: error=0x04 { DriveStatusError }
     May 15 23:53:51 Cyclops preclear_disk-finish[12377]: smartctl version 5.38 [i486-slackware-linux-gnu] Copyright © 2002-8 Bruce Allen
     May 15 23:53:51 Cyclops preclear_disk-finish[12377]: Home page is http://smartmontools.sourceforge.net/
     May 15 23:53:51 Cyclops preclear_disk-finish[12377]:
     May 15 23:53:51 Cyclops preclear_disk-finish[12377]: Smartctl: Device Read Identity Failed (not an ATA/ATAPI device)
     May 15 23:53:51 Cyclops preclear_disk-finish[12377]:
     May 15 23:53:51 Cyclops preclear_disk-finish[12377]: A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
     May 15 23:53:51 Cyclops kernel: ata1: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
     May 15 23:53:51 Cyclops kernel: ata1: status=0x51 { DriveReady SeekComplete Error }
     May 15 23:53:51 Cyclops kernel: ata1: error=0x04 { DriveStatusError }
     May 15 23:53:51 Cyclops preclear_disk-diff[12390]: ============================================================================
     May 15 23:53:51 Cyclops preclear_disk-diff[12390]: ==
     May 15 23:53:51 Cyclops preclear_disk-diff[12390]: == Disk /dev/sdb has been successfully precleared
     May 15 23:53:51 Cyclops preclear_disk-diff[12390]: ==
     May 15 23:53:51 Cyclops preclear_disk-diff[12390]: == Ran 1 preclear-disk cycle
     May 15 23:53:51 Cyclops preclear_disk-diff[12390]: ==
     May 15 23:53:51 Cyclops preclear_disk-diff[12390]: == Using :Read block size = 8225280 Bytes
     May 15 23:53:51 Cyclops preclear_disk-diff[12390]: == Last Cycle's Pre Read Time : 7:28:13 (74 MB/s)
     May 15 23:53:51 Cyclops preclear_disk-diff[12390]: == Last Cycle's Zeroing time : 6:41:36 (83 MB/s)
     May 15 23:53:51 Cyclops preclear_disk-diff[12390]: == Last Cycle's Post Read Time : 15:01:32 (36 MB/s)
     May 15 23:53:51 Cyclops preclear_disk-diff[12390]: == Last Cycle's Total Time : 29:12:25
     May 15 23:53:51 Cyclops preclear_disk-diff[12390]: ==
     May 15 23:53:51 Cyclops preclear_disk-diff[12390]: == Total Elapsed Time 29:12:25
     May 15 23:53:51 Cyclops preclear_disk-diff[12390]: ==
     May 15 23:53:51 Cyclops preclear_disk-diff[12390]: ==
     May 15 23:53:51 Cyclops preclear_disk-diff[12390]: ============================================================================
     May 15 23:53:51 Cyclops preclear_disk-diff[12390]: S.M.A.R.T. error count differences detected after pre-clear
     May 15 23:53:51 Cyclops preclear_disk-diff[12390]: note, some 'raw' values may change, but not be an indication of a problem
     May 15 23:53:51 Cyclops preclear_disk-diff[12390]: 4,11c4
     May 15 23:53:51 Cyclops preclear_disk-diff[12390]: < === START OF INFORMATION SECTION ===
     May 15 23:53:51 Cyclops preclear_disk-diff[12390]: < Device Model: SAMSUNG HD203WI
     May 15 23:53:51 Cyclops preclear_disk-diff[12390]: < Serial Number: S1UYJ1KZ402327
     May 15 23:53:51 Cyclops preclear_disk-diff[12390]: < Firmware Version: 1AN10002
     May 15 23:53:51 Cyclops preclear_disk-diff[12390]: < User Capacity: 2,000,398,934,016 bytes
     May 15 23:53:51 Cyclops preclear_disk-diff[12390]: < Device is: In smartctl database [for details use: -P show]
     May 15 23:53:51 Cyclops preclear_disk-diff[12390]: < ATA Version is: 8
     May 15 23:53:51 Cyclops preclear_disk-diff[12390]: < ATA Standard is: Not recognized. Minor revision code: 0x28
     May 15 23:53:51 Cyclops preclear_disk-diff[12390]: ---
     May 15 23:53:51 Cyclops preclear_disk-diff[12390]: > Smartctl: Device Read Identity Failed (not an ATA/ATAPI device)
     May 15 23:53:51 Cyclops preclear_disk-diff[12390]: 13,18c6
     May 15 23:53:51 Cyclops preclear_disk-diff[12390]: < ==> WARNING: May need -F samsung or -F samsung2 enabled; see manual for details.
     May 15 23:53:51 Cyclops preclear_disk-diff[12390]: <
     May 15 23:53:51 Cyclops preclear_disk-diff[12390]: < SMART support is: Available - device has SMART capability.
     May 15 23:53:51 Cyclops preclear_disk-diff[12390]: < SMART support is: Disabled
     May 15 23:53:51 Cyclops preclear_disk-diff[12390]: <
     May 15 23:53:51 Cyclops preclear_disk-diff[12390]: < SMART Disabled. Use option -s with argument 'on' to enable it.
     May 15 23:53:51 Cyclops preclear_disk-diff[12390]: ---
     May 15 23:53:51 Cyclops preclear_disk-diff[12390]: > A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
     May 15 23:53:51 Cyclops preclear_disk-diff[12390]: ============================================================================
     May 15 23:53:51 Cyclops preclear_disk-diff[12390]:
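As a sanity check on the preclear summary in the log above, here is a rough sketch that recomputes the MB/s figures from the reported phase times. It assumes each phase touches the full "User Capacity" of 2,000,398,934,016 bytes exactly once and that MB means 10^6 bytes; both are my assumptions, not something the preclear script documents in this log. The helper names are made up.

```python
CAPACITY_BYTES = 2_000_398_934_016  # "User Capacity" from the SMART info in the log

def to_seconds(hms):
    """Convert an 'H:MM:SS' string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def mb_per_second(hms, size_bytes=CAPACITY_BYTES):
    """Average rate in MB/s (MB = 10^6 bytes, an assumption)."""
    return size_bytes / to_seconds(hms) / 1_000_000

phases = {"pre-read": "7:28:13", "zeroing": "6:41:36", "post-read": "15:01:32"}
for name, hms in phases.items():
    print(f"{name}: {mb_per_second(hms):.2f} MB/s")

# Time in the reported total that the three phases do not account for.
overhead = to_seconds("29:12:25") - sum(to_seconds(t) for t in phases.values())
print("seconds outside the three phases:", overhead)
```

Under those assumptions the computed rates, rounded down, agree with the 74, 83, and 36 MB/s in the log, and the post-read ran at roughly half the speed of the pre-read.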
  7. I recently picked up three Samsung HD203WI 2TB drives. Getting them working correctly with an AOC-SASLP-MV8 controller was not without issues. Ultimately they are working fine. Maybe this will help someone else understand any oddities they might run into.

     All drives were first run through a standalone, multi-pattern, multi-pass wipe and verify operation, so when they were first added to the unRAID box they had already been through a couple of days of operation and were completely erased to zeros. I first connected a drive to the MV8 and used unMenu to check it out. Here is what I got. (During all this the array was stopped.)

     First time I clicked "Disk Management":

     May 17 18:29:56 Cyclops kernel: ata1: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
     May 17 18:29:56 Cyclops kernel: ata1: status=0x51 { DriveReady SeekComplete Error }
     May 17 18:29:56 Cyclops kernel: ata1: error=0x04 { DriveStatusError }
     May 17 18:29:56 Cyclops kernel: ata1: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
     May 17 18:29:56 Cyclops kernel: ata1: status=0x51 { DriveReady SeekComplete Error }
     May 17 18:29:56 Cyclops kernel: ata1: error=0x04 { DriveStatusError }
     May 17 18:29:56 Cyclops kernel: ------------[ cut here ]------------
     May 17 18:29:56 Cyclops kernel: WARNING: at drivers/ata/libata-core.c:5186 ata_qc_issue+0x10b/0x308()
     May 17 18:29:56 Cyclops kernel: Hardware name: To Be Filled By O.E.M.
     May 17 18:29:56 Cyclops kernel: Modules linked in: md_mod xor atiixp ahci r8169 mvsas libsas scst scsi_transport_sas
     May 17 18:29:56 Cyclops kernel: Pid: 2424, comm: smartctl Tainted: G W 2.6.32.9-unRAID #1
     May 17 18:29:56 Cyclops kernel: Call Trace:
     May 17 18:29:56 Cyclops kernel: [] warn_slowpath_common+0x60/0x77
     May 17 18:29:56 Cyclops kernel: [] warn_slowpath_null+0xd/0x10
     May 17 18:29:56 Cyclops kernel: [] ata_qc_issue+0x10b/0x308
     May 17 18:29:56 Cyclops kernel: [] ata_scsi_translate+0xd1/0xff
     May 17 18:29:56 Cyclops kernel: [] ? scsi_done+0x0/0xd
     May 17 18:29:56 Cyclops kernel: [] ? scsi_done+0x0/0xd
     May 17 18:29:56 Cyclops kernel: [] ata_sas_queuecmd+0x120/0x1d7
     May 17 18:29:56 Cyclops kernel: [] ? ata_scsi_pass_thru+0x0/0x21d
     May 17 18:29:56 Cyclops kernel: [] sas_queuecommand+0x65/0x20d [libsas]
     May 17 18:29:56 Cyclops kernel: [] ? scsi_done+0x0/0xd
     May 17 18:29:56 Cyclops kernel: [] scsi_dispatch_cmd+0x147/0x181
     May 17 18:29:56 Cyclops kernel: [] scsi_request_fn+0x351/0x376
     May 17 18:29:56 Cyclops kernel: [] __generic_unplug_device+0x26/0x29
     May 17 18:29:56 Cyclops kernel: [] blk_execute_rq_nowait+0x56/0x73
     May 17 18:29:56 Cyclops kernel: [] blk_execute_rq+0x75/0x91
     May 17 18:29:56 Cyclops kernel: [] ? blk_end_sync_rq+0x0/0x28
     May 17 18:29:56 Cyclops kernel: [] ? get_request+0x204/0x28d
     May 17 18:29:56 Cyclops kernel: [] ? get_request_wait+0x2b/0xd9
     May 17 18:29:56 Cyclops kernel: [] ? __blk_put_request+0x8c/0x92
     May 17 18:29:56 Cyclops kernel: [] scsi_execute+0xbf/0x113
     May 17 18:29:56 Cyclops kernel: [] ata_task_ioctl+0xb4/0x165
     May 17 18:29:56 Cyclops kernel: [] ata_sas_scsi_ioctl+0x1d5/0x1e8
     May 17 18:29:56 Cyclops kernel: [] ? sched_clock_local+0x11/0x135
     May 17 18:29:56 Cyclops kernel: [] sas_ioctl+0x36/0x43 [libsas]
     May 17 18:29:56 Cyclops kernel: [] scsi_ioctl+0x299/0x2ad
     May 17 18:29:56 Cyclops kernel: [] ? sas_ioctl+0x0/0x43 [libsas]
     May 17 18:29:56 Cyclops kernel: [] sd_ioctl+0x80/0x8c
     May 17 18:29:56 Cyclops kernel: [] __blkdev_driver_ioctl+0x50/0x62
     May 17 18:29:56 Cyclops kernel: [] blkdev_ioctl+0x8b0/0x8dc
     May 17 18:29:56 Cyclops kernel: [] ? kobject_get+0x12/0x17
     May 17 18:29:56 Cyclops kernel: [] ? get_disk+0x4a/0x61
     May 17 18:29:56 Cyclops kernel: [] ? kmap_atomic+0x14/0x16
     May 17 18:29:56 Cyclops kernel: [] ? radix_tree_lookup_slot+0xd/0xf
     May 17 18:29:56 Cyclops kernel: [] ? filemap_fault+0xb8/0x305
     May 17 18:29:56 Cyclops kernel: [] ? unlock_page+0x18/0x1b
     May 17 18:29:56 Cyclops kernel: [] ? __do_fault+0x3a7/0x3da
     May 17 18:29:56 Cyclops kernel: [] ? handle_mm_fault+0x42d/0x8f1
     May 17 18:29:56 Cyclops kernel: [] block_ioctl+0x2a/0x32
     May 17 18:29:56 Cyclops kernel: [] ? block_ioctl+0x0/0x32
     May 17 18:29:56 Cyclops kernel: [] vfs_ioctl+0x22/0x67
     May 17 18:29:56 Cyclops kernel: [] do_vfs_ioctl+0x478/0x4ac
     May 17 18:29:56 Cyclops kernel: [] ? expand_downwards+0x109/0x136
     May 17 18:29:56 Cyclops kernel: [] sys_ioctl+0x2c/0x45
     May 17 18:29:56 Cyclops kernel: [] syscall_call+0x7/0xb
     May 17 18:29:56 Cyclops kernel: ---[ end trace 170bf42208302810 ]---
     May 17 18:29:56 Cyclops ata_id[2445]: HDIO_GET_IDENTITY failed for '/dev/block/8:32'

     Every time I clicked "Disk Management" after the first time:

     May 17 18:31:39 Cyclops kernel: ata1: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
     May 17 18:31:39 Cyclops kernel: ata1: status=0x51 { DriveReady SeekComplete Error }
     May 17 18:31:39 Cyclops kernel: ata1: error=0x04 { DriveStatusError }
     May 17 18:31:39 Cyclops kernel: ata1: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
     May 17 18:31:39 Cyclops kernel: ata1: status=0x51 { DriveReady SeekComplete Error }
     May 17 18:31:39 Cyclops kernel: ata1: error=0x04 { DriveStatusError }
     May 17 18:31:39 Cyclops ata_id[2643]: HDIO_GET_IDENTITY failed for '/dev/block/8:32'

     Every time I clicked "Smart Status" for the drive:

     SMART status Info for /dev/sdb

     smartctl version 5.38 [i486-slackware-linux-gnu] Copyright © 2002-8 Bruce Allen
     Home page is http://smartmontools.sourceforge.net/

     === START OF INFORMATION SECTION ===
     Device Model: SAMSUNG HD203WI
     Serial Number: S1UYJ1KZ402329
     Firmware Version: 1AN10002
     User Capacity: 2,000,398,934,016 bytes
     Device is: In smartctl database [for details use: -P show]
     ATA Version is: 8
     ATA Standard is: Not recognized. Minor revision code: 0x28
     Local Time is: Mon May 17 18:33:16 2010 GMT+5

     ==> WARNING: May need -F samsung or -F samsung2 enabled; see manual for details.

     SMART support is: Available - device has SMART capability.
     SMART support is: Disabled

     SMART Disabled. Use option -s with argument 'on' to enable it.

     When running preclear on the drive:

     May 17 18:40:19 Cyclops kernel: ata1: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
     May 17 18:40:19 Cyclops kernel: ata1: status=0x51 { DriveReady SeekComplete Error }
     May 17 18:40:19 Cyclops kernel: ata1: error=0x04 { DriveStatusError }
     May 17 18:40:19 Cyclops kernel: ata1: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
     May 17 18:40:19 Cyclops kernel: ata1: status=0x51 { DriveReady SeekComplete Error }
     May 17 18:40:19 Cyclops kernel: ata1: error=0x04 { DriveStatusError }
     May 17 18:40:19 Cyclops kernel: ata1: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
     May 17 18:40:19 Cyclops kernel: ata1: status=0x51 { DriveReady SeekComplete Error }
     May 17 18:40:19 Cyclops kernel: ata1: error=0x04 { DriveStatusError }
     May 17 18:40:19 Cyclops kernel: ata1: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
     May 17 18:40:19 Cyclops kernel: ata1: status=0x51 { DriveReady SeekComplete Error }
     May 17 18:40:19 Cyclops kernel: ata1: error=0x04 { DriveStatusError }
     May 17 18:40:19 Cyclops kernel: ata1: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
     May 17 18:40:19 Cyclops kernel: ata1: status=0x51 { DriveReady SeekComplete Error }
     May 17 18:40:19 Cyclops kernel: ata1: error=0x04 { DriveStatusError }
     May 17 18:40:19 Cyclops kernel: ata1: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
     May 17 18:40:19 Cyclops kernel: ata1: status=0x51 { DriveReady SeekComplete Error }
     May 17 18:40:19 Cyclops kernel: ata1: error=0x04 { DriveStatusError }

     The above set of messages was displayed every time the preclear Telnet status screen was updated. This went on through the entire pre-read and post-read. There were no messages while writing zeros.
     Even though one drive would exhibit all these issues, a second HD203WI that had already been "fixed" was connected to another port of the MV8 and was working perfectly at exactly the same time. I had these same problems with every new 2TB drive until it was "fixed". From then on all was good.

     So what did it take to "fix" the drive? (Very bad word; it has more to do with configuration than fixing.) It seems all that was really necessary was to follow the advice of the SMART status display and enable SMART using the command:

     smartctl -s on /dev/sdb

     (If you issue this command, be sure to use your own device ID.)

     The really interesting thing is that something seems to do this automatically in certain situations. Before I figured this out I had moved one of the drives from an MV8 port to a mobo port, clicked the Smart Status button, and to my surprise a full SMART status was displayed. I moved the drive back to an MV8 port and all the problems were gone. On another drive I issued the smartctl command to enable SMART reporting without ever moving the drive, and that fixed the problems too. So there must be something that comes into play when running a Smart Status on a drive connected to a mobo port that stays on the sideline while the drive is connected to the MV8.
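For anyone trying to read the "status=0x51" lines, here is a minimal sketch that splits an ATA status byte into its individual bits. The three names that appear in the log (DriveReady, SeekComplete, Error) are taken from the kernel messages quoted in this thread; the other names are my own labels for the remaining standard status bits and may not match the kernel's wording.

```python
# Bit masks of the ATA status register, highest bit first.
STATUS_BITS = [
    (0x80, "Busy"),            # label is an assumption
    (0x40, "DriveReady"),      # as printed in the log
    (0x20, "DeviceFault"),     # label is an assumption
    (0x10, "SeekComplete"),    # as printed in the log
    (0x08, "DataRequest"),     # label is an assumption
    (0x04, "CorrectedError"),  # label is an assumption
    (0x02, "Index"),           # label is an assumption
    (0x01, "Error"),           # as printed in the log
]

def decode_status(value):
    """Return the names of all bits set in an ATA status byte."""
    return [name for mask, name in STATUS_BITS if value & mask]

for status in (0x51, 0x41):
    print(f"status=0x{status:02x} {{ {' '.join(decode_status(status))} }}")
```

The error=0x04 value is bit 2 of the separate error register, which the ATA specification calls ABRT (command aborted). That would be consistent with the drive simply refusing SMART commands while SMART is disabled, but that reading is a guess on top of the logs, not something they prove.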
  8. I decided to take a chance and test the simulated read error with my primary array. Here are the results.

     Before:

     Disk    Model / Serial No.              Temperature  Size           Free           Reads  Writes  Errors
     parity  SAMSUNG_HD203WI_S1UYJ1KZ402326  22°C         1,953,514,552  -              40     17      0
     disk1   SAMSUNG_HD154UI_S1XWJDWZ203486  18°C         1,465,138,552  42,088         58     5       0
     disk2   SAMSUNG_HD154UI_S1XWJDWZ203489  17°C         1,465,138,552  72,340         56     5       0
     disk3   SAMSUNG_HD154UI_S1XWJ1KZ104867  19°C         1,465,138,552  101,888,144    58     5       0
     disk4   SAMSUNG_HD154UI_S1XWJ1KS912511  18°C         1,465,138,552  1,465,060,992  50     5       0

     Telnet session:

     Cyclops login: root
     Linux 2.6.32.9-unRAID.
     root@Cyclops:~# /root/mdcmd set rderror 4
     cmdOper=set
     cmdResult=ok
     root@Cyclops:~#

     Syslog:

     May 16 10:14:52 Cyclops in.telnetd[2203]: connect from 192.168.0.21 (192.168.0.21)
     May 16 10:14:55 Cyclops login[2204]: ROOT LOGIN on `pts/0' from `192.168.0.21'
     May 16 10:15:43 Cyclops kernel: mdcmd (59): set rderror 4
     May 16 10:15:43 Cyclops kernel:
     May 16 10:17:16 Cyclops kernel: md: disk4 read error
     May 16 10:17:16 Cyclops kernel: handle_stripe read error: 262144/4, count: 1

     After:

     Disk    Model / Serial No.              Temperature  Size           Free           Reads  Writes  Errors
     parity  SAMSUNG_HD203WI_S1UYJ1KZ402326  22°C         1,953,514,552  -              47     28      0
     disk1   SAMSUNG_HD154UI_S1XWJDWZ203486  18°C         1,465,138,552  42,088         59     5       0
     disk2   SAMSUNG_HD154UI_S1XWJDWZ203489  18°C         1,465,138,552  72,340         57     5       0
     disk3   SAMSUNG_HD154UI_S1XWJ1KZ104867  19°C         1,465,138,552  101,888,144    59     5       0
     disk4   SAMSUNG_HD154UI_S1XWJ1KS912511  18°C         1,465,138,552  1,465,060,992  56     17      1

     Thoughts and conclusions:

     1. unRAID handling of the simulated read error is perfect.
     2. There is a serious problem with something relating specifically to the AOC-SASLP-MV8 controller. If a Samsung HD154UI reports just one sector read error, the result is a barrage of hundreds of nasty error messages that eventually lead to the drive being disabled. It is entirely possible that the barrage of error messages is the result of continued attempts to access the drive that fail, not because of a hardware problem but because of software driver problems, up until the drive is finally disabled. The unRAID write to the drive apparently does not get through, because the drive is left with a pending re-allocate.
     3. Another way to state the problem is that when one sector read error is reported by the drive, the driver loses the ability to communicate correctly with the drive.
     4. Even though I am using Samsung drives, my guess is the symptoms would be the same for any drive connected to an AOC-SASLP-MV8.

     Since there is no way to force the hard drive to report a sector read error, it seems we need to wait for read errors to happen naturally to collect more evidence that supports or refutes my conclusions. Does anyone have drives connected to an AOC-SASLP-MV8 that have had read errors reported in the management console or syslog?
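To make the before/after comparison easier to see, here is a quick sketch that subtracts the Reads, Writes, and Errors counters from the two status displays above. The numbers are copied from this post; the dictionary layout and the counter_deltas name are just for illustration.

```python
# (reads, writes, errors) per disk, copied from the two status displays.
before = {
    "parity": (40, 17, 0),
    "disk1": (58, 5, 0),
    "disk2": (56, 5, 0),
    "disk3": (58, 5, 0),
    "disk4": (50, 5, 0),
}
after = {
    "parity": (47, 28, 0),
    "disk1": (59, 5, 0),
    "disk2": (57, 5, 0),
    "disk3": (59, 5, 0),
    "disk4": (56, 17, 1),
}

def counter_deltas(before, after):
    """Per-disk change in (reads, writes, errors)."""
    return {
        disk: tuple(a - b for a, b in zip(after[disk], before[disk]))
        for disk in before
    }

for disk, (reads, writes, errors) in counter_deltas(before, after).items():
    print(f"{disk}: reads +{reads}, writes +{writes}, errors +{errors}")
```

Only disk4 and parity picked up writes, and only disk4 picked up an error. That pattern fits unRAID writing the reconstructed data back to the disk that reported the error, although the counters alone cannot show which sectors were written.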
  9. Thanks for the tip. I will figure out some way to safely give that a try. A literal interpretation suggests that this will exercise the error logic in the OS and unRAID layers, not in the driver itself. If that is the case and the driver itself is actually causing the problem, the simulated error will probably be recovered just fine. I'll provide an update when I can give it a try, but it will be several days. I am thinking I will boot the free 3-drive version of unRAID and create a test array. Also, the fact that "197 Current_Pending_Sector = 2" showed up while the drive was in the array and only accessed via unRAID is not a good sign. I would expect pending re-allocates to always be cleared by properly functioning error recovery.
  10. Agreed, to the extent the driver is reporting it correctly to the OS. The SMART doc I read is pretty clear that "187 Reported_Uncorrect = 4" indicates the drive reported 4 unrecoverable read errors.

      Yes I did. Immediately after having this problem SMART showed "197 Current_Pending_Sector = 2". This only changed back to zero after I took the drive out of the array and ran a preclear on it. I understand that just because a sector is marked for pending re-allocation, that does not necessarily mean it will be reallocated. The drive decides what to do on the next write to that sector.

      Also note that on both 5/7 and 5/8 the problem started at about the same place on the drive, and everything was just fine prior to these errors.

      May 7 12:54:37 Cyclops kernel: end_request: I/O error, dev sdc, sector 1926635807
      May 8 06:55:12 Cyclops kernel: end_request: I/O error, dev sdc, sector 1926635783

      I don't believe that is a coincidence. I suppose the bad behavior could be a controller firmware problem, but that would be very discomforting. I do know that by replacing the drive with one that has never reported a sector read error I have gotten around the problem. I am concerned about what will happen the next time any one of my drives connected to the MV8 reports an unreadable sector. Maybe I will just have to wait and see, and hope it does not happen for a long time. I fear that one read error will result in the drive being disabled (red, invalid). This might be the price I pay for using a controller that has only been supported for a month or two.
  11. I am concerned that there might be an issue when a drive connected to an AOC-SASLP-MV8 controller reports a read error. In my case the drive is a 1.5TB Samsung. My first hint of a problem came on the third pass of a preclear, when something bad happened that caused it to fail completely. I did not capture any documentation that time but just went on and ran another preclear, which finished without error.

      - reboot
      - Added the drive to the array
      - Filled it with data
      - Started running a utility to verify that all files copied correctly
      - During the file verify I got the following and the drive was disabled

      May 7 12:54:37 Cyclops kernel: sas: command 0xf6786240, task 0xc39d7400, timed out: BLK_EH_NOT_HANDLED
      May 7 12:54:37 Cyclops kernel: sas: Enter sas_scsi_recover_host
      May 7 12:54:37 Cyclops kernel: sas: trying to find task 0xc39d7400
      May 7 12:54:37 Cyclops kernel: sas: sas_scsi_find_task: aborting task 0xc39d7400
      May 7 12:54:37 Cyclops kernel: /usr/src/sas/trunk/mvsas_tgt/mv_sas.c 1701:mvs_abort_task:rc= 5
      May 7 12:54:37 Cyclops kernel: sas: sas_scsi_find_task: querying task 0xc39d7400
      May 7 12:54:37 Cyclops kernel: /usr/src/sas/trunk/mvsas_tgt/mv_sas.c 1645:mvs_query_task:rc= 5
      May 7 12:54:37 Cyclops kernel: sas: sas_scsi_find_task: task 0xc39d7400 failed to abort
      May 7 12:54:37 Cyclops kernel: sas: task 0xc39d7400 is not at LU: I_T recover
      May 7 12:54:37 Cyclops kernel: sas: I_T nexus reset for dev 0100000000000000
      May 7 12:54:37 Cyclops kernel: sas: I_T 0100000000000000 recovered
      May 7 12:54:37 Cyclops kernel: sas: --- Exit sas_scsi_recover_host
      May 7 12:54:37 Cyclops kernel: ata2: translated ATA stat/err 0x41/04 to SCSI SK/ASC/ASCQ 0xb/00/00
      May 7 12:54:37 Cyclops kernel: ata2: status=0x41 { DriveReady Error }
      May 7 12:54:37 Cyclops kernel: ata2: error=0x04 { DriveStatusError }
      May 7 12:54:37 Cyclops kernel: ata2: translated ATA stat/err 0x41/04 to SCSI SK/ASC/ASCQ 0xb/00/00
      May 7 12:54:37 Cyclops kernel: ata2: status=0x41 { DriveReady Error }
      May 7 12:54:37 Cyclops kernel: ata2: error=0x04 { DriveStatusError }
      May 7 12:54:37 Cyclops kernel: ata2: translated ATA stat/err 0x41/04 to SCSI SK/ASC/ASCQ 0xb/00/00
      May 7 12:54:37 Cyclops kernel: ata2: status=0x41 { DriveReady Error }
      May 7 12:54:37 Cyclops kernel: ata2: error=0x04 { DriveStatusError }
      May 7 12:54:37 Cyclops kernel: ata2: translated ATA stat/err 0x41/04 to SCSI SK/ASC/ASCQ 0xb/00/00
      May 7 12:54:37 Cyclops kernel: ata2: status=0x41 { DriveReady Error }
      May 7 12:54:37 Cyclops kernel: ata2: error=0x04 { DriveStatusError }
      May 7 12:54:37 Cyclops kernel: ata2: translated ATA stat/err 0x41/04 to SCSI SK/ASC/ASCQ 0xb/00/00
      May 7 12:54:37 Cyclops kernel: ata2: status=0x41 { DriveReady Error }
      May 7 12:54:37 Cyclops kernel: ata2: error=0x04 { DriveStatusError }
      May 7 12:54:37 Cyclops kernel: sd 2:0:1:0: [sdc] Result: hostbyte=0x00 driverbyte=0x08
      May 7 12:54:37 Cyclops kernel: sd 2:0:1:0: [sdc] Sense Key : 0xb [current] [descriptor]
      May 7 12:54:37 Cyclops kernel: Descriptor sense data with sense descriptors (in hex):
      May 7 12:54:37 Cyclops kernel: 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
      May 7 12:54:37 Cyclops kernel: 00 00 00 c0
      May 7 12:54:37 Cyclops kernel: sd 2:0:1:0: [sdc] ASC=0x0 ASCQ=0x0
      May 7 12:54:37 Cyclops kernel: sd 2:0:1:0: [sdc] CDB: cdb[0]=0x28: 28 00 72 d6 21 1f 00 02 00 00
      May 7 12:54:37 Cyclops kernel: end_request: I/O error, dev sdc, sector 1926635807
      May 7 12:54:37 Cyclops kernel: md: disk3 read error
      May 7 12:54:37 Cyclops kernel: handle_stripe read error: 1926635744/3, count: 1

      - The previous line repeated several times. Over 150 read errors were reported in the stats screen.
      - There were a few more instances of the sense data and disk3 read error messages
      - Over the next few hours there were thousands of these messages logged

      May 7 17:30:47 Cyclops kernel: ata2: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
      May 7 17:30:47 Cyclops kernel: ata2: status=0x51 { DriveReady SeekComplete Error }
      May 7 17:30:47 Cyclops kernel: ata2: error=0x04 { DriveStatusError }

      - Sprinkled in here and there were these

      May 7 17:18:28 Cyclops kernel: mdcmd (15046): spindown 3
      May 7 17:18:28 Cyclops kernel: md: disk3: ATA_OP_STANDBYNOW1 ioctl error: -5

      - All of the above is from the 5/7 log
      - rebooted
      - enabled the drive and used the "trust my array" procedure to restart
      - got a handful of parity updates right at the beginning of the parity check (as expected)
      - about 3 hours into the parity check pretty much the same thing happened, except this time over 500 read errors were reported in the status display
      - See 5/8 log
      - rebooted
      - removed problem drive from array
      - reset array and rebuilt parity from remaining drives (I still had not deleted the source files)
      - two of the drives in the array were still empty so I moved one of them to the physical port the problem drive was plugged into
      - packed the new drive full of data, ran file and parity verifies without any problems

      Now I am nearly certain that this is not a cabling or power problem. Looking at the SMART data for the problem drive I currently see:

      187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 4

      Right after having the above problems, 197 Current_Pending_Sector showed a count of 2. I then moved the drive to a mobo port and ran another preclear on it. Now I see:

      5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
      197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0

      So it does not look like the read errors resulted in a sector reallocation.

      Conclusions:

      - No cabling or power problems
      - The problem drive reported 4 errors while all this was going on
      - When an error was reported something (the driver?) went bonkers
      - unRAID read error recovery did not work as expected

      To correct the read errors I had to remove the drive from the array and use preclear. My expectation was that when unRAID encountered a read error during normal processing or a parity check, it would use the other drives in the array to reconstruct the missing data and write it back to the block that could not be read, thus allowing the drive to do its possible sector relocation thing. unRAID might have been going through the motions, but maybe the write was not getting to the drive.

      On the bright side, unRAID did a perfect job of simulating the inaccessible data. The file verifies I was running completed without error, illustrating perfect data recovery. I am sure I could have let unRAID just rebuild the data on a replacement drive but chose not to do that.

      There may be flaws in my logic and conclusions, but there is no doubt that something bad happened and it involved the only drive I have for which the 187 Reported_Uncorrect value is not 0. I have not yet added this drive back to the array but will in the next week or so. Of course I don't know what would have happened if the hardware configuration were different. I suspect something related to the MV8 but don't have enough data to prove anything. All I have is a smoking gun, specifically 187 Reported_Uncorrect = 4.

      syslog-2010-05-07.zip syslog-2010-05-08.zip
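The CDB line in the 5/7 log can be decoded by hand to confirm it lines up with the reported failing sector. The sketch below assumes the standard 10-byte READ(10) layout (opcode 0x28, a 4-byte big-endian LBA in bytes 2 to 5, a 2-byte big-endian block count in bytes 7 and 8); the function name decode_read10 is made up for illustration.

```python
def decode_read10(cdb_hex):
    """Return (lba, block_count) from a READ(10) CDB given as hex text."""
    cdb = bytes.fromhex(cdb_hex)  # spaces between the bytes are accepted
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # starting sector of the read
    blocks = int.from_bytes(cdb[7:9], "big")  # number of sectors requested
    return lba, blocks

# CDB copied from the 5/7 syslog excerpt in this post.
lba, blocks = decode_read10("28 00 72 d6 21 1f 00 02 00 00")
print("start sector:", lba)
print("sectors requested:", blocks)

# Compare with the stripe number from the "handle_stripe read error" line.
print("offset from md stripe number:", lba - 1926635744)
```

The start sector comes out as 1926635807, the same number as in the "end_request: I/O error" line, for a 512-sector read. The difference from the handle_stripe number is exactly 63, which would be consistent with a partition starting at sector 63, though that is an inference and not something the log states.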
  12. I just discovered that installing a second MV8 created a problem: when I soft power off the system, it restarts all by itself after a second or two. To stop it from automatically powering on I had to disable "PCI Device Power On" in the BIOS. Now, of course, WOL does not work at all. This problem persists regardless of which slots I put the two cards in. I'll be contacting ASRock about this one.
  13. Second MV8 card arrived. Installed it and everything is working fine. Currently have one drive on each MV8 and one plugged into the mobo. Running a parity sync. There are some ugly error messages in the log when something tries to load a second copy of the MV8 driver, but they don't seem to be causing any real problem. I would not be at all surprised if adding a third MV8 would work, but I have no idea what to do with 30 SATA ports in one system; my 22 are just fine. Going to do a few more preclear cycles and then it will be ready to serve its real purpose. As I wrote before, suspend/resume does not work with the MV8s, but I don't know what to blame for that. syslog-2010-04-01b.zip
  14. In the past I would "break in" new drives by running multiple erase/verify passes using a standalone utility. I would make 5 passes, varying the bit pattern from x'00' to x'55' to x'aa' to x'ff' and finally another x'00'. Just thought I would suggest there might be some benefit to this instead of always using x'00'. I understand that the final pass of a multiple-pass preclear would always have to use x'00' to leave the drive in proper condition for addition to the array, but that doesn't mean different patterns couldn't be used in earlier passes. I guess you would also only want to write the signature indicating the disk is clean after an x'00' pass, and make sure the signature is destroyed at the beginning of a non-x'00' pass.

      One other observation. I am wondering if the pre-read could be skipped on passes 2-n, based on the assumption that a verify read would have just finished from the previous pass. I wouldn't think doing two read passes in a row helps to achieve the desired objective, and it sure would save a lot of time.

      Just a couple of thoughts to take or leave as you see fit. Either way, thanks for a great tool.

      PS: I am sure those bit patterns are not ideal because of the way data is encoded before being written to the drive. It would take a smarter person than me to figure out what patterns would really be best, but I have got to think some variation would be better than always using x'00'.
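Purely to illustrate the suggestion above, here is a rough sketch of what a multi-pattern pass plan could look like. This is not how the preclear script works; plan_passes and the field names are hypothetical, and the sketch only lays out the bookkeeping (which pass pre-reads, which pass may write the "clean" signature) without touching any disk.

```python
# Pattern sequence from the post: x'00', x'55', x'aa', x'ff', x'00'.
PATTERNS = [0x00, 0x55, 0xAA, 0xFF, 0x00]

def plan_passes(patterns):
    """Build a per-pass plan following the rules suggested in the post."""
    if not patterns or patterns[-1] != 0x00:
        raise ValueError("final pass must write zeros for the array")
    plan = []
    for index, pattern in enumerate(patterns):
        plan.append({
            "pass": index + 1,
            "pattern_bits": f"{pattern:08b}",
            # Pre-read only on the first pass; later passes rely on the
            # verify read that ended the previous pass.
            "pre_read": index == 0,
            # A non-zero pass must first destroy any existing signature.
            "clear_signature_first": pattern != 0x00,
            # The clean signature is only valid after a zero pass.
            "write_signature": pattern == 0x00,
        })
    return plan

for step in plan_passes(PATTERNS):
    print(step)
```

The x'55' and x'aa' patterns are bitwise complements of each other (01010101 and 10101010), so between them every bit position gets written as both 0 and 1 before the final zero pass.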
  15. Also keep in mind that this board does not have built in video so you would need an old PCI video card for a console session and maybe even to be able to boot the system.
  16. Do you think this might be why my rig won't resume from S3 sleep when I have drives connected to the MV8 controller? I had posted details over in the PCIe MV8 thread of the hardware forum but have not read any posts from anyone saying that their system does or does not sleep/resume properly with drives connected to an MV8 card. I would hate to replace a mobo just to try to get sleep/resume to work and then find out it was a driver problem. http://lime-technology.com/forum/index.php?topic=3382.msg28924#msg28924 (end of page 8). Except for the sleep/resume problem, everything is working perfectly so long as I don't plug a Monoprice 2-port card into any slot, not even the x1 slot. I am expecting a second MV8 later today, so I will be able to test having two of them plugged into this board. Assuming it works I will have 22 SATA ports, which is more than enough.
  17. Turns out this was a DA newb thing. It is perfectly normal. After running the preclear the drives are really empty, no reiserfs file system. This little detail is perfectly clear in big red letters in the instructions, you have to format the drives after you start the array.
  18. I have run into a problem with resuming from S3 sleep when a drive is attached to the MV8. If all drives are attached to mobo ports then sleep/resume works just fine, but when one or more drives are attached to the MV8 the system will resume; however, the shares and web console do not start working again. The telnet session from which I initiated the sleep comes back, as does the system console. The log shows a lot of timeouts, and the activity LEDs on the drives connected to the MV8 flash at about a 1Hz rate. I have attached a syslog from both a good suspend/resume and one that shows it failing. Is anyone successfully doing suspend/resume with drives connected to an MV8 on their system? One more thing: WOL from a power-off state works just fine with drives connected to the MV8. syslog-2010-03-28.zip
  19. Thanks! Must be a driver thing. I have 3 identical Samsung 1.5TB drives. Put them all on mobo ports, no error messages. If I move one drive to the MV8, no error messages. (Tried several ports.) When two drives are connected to the MV8, I get one error message each time. If I connect all three drives to the MV8 I get two error messages every time. Not yet in a position to try connecting more than three drives to the MV8. I can only speculate that adding more will result in more error messages. I'll do as you suggest and ignore the messages, but thought there might be some value in knowing the additional details.
  20. I get one of these lines in the syslog 100% of the time when I click on Main, myMain or refresh either page. The number inside the [ ] just keeps getting larger and larger.

      Mar 21 13:41:12 Cyclops ata_id[2108]: HDIO_GET_IDENTITY failed for '/dev/block/8:32'
      ...
      Mar 21 14:08:53 Cyclops ata_id[9936]: HDIO_GET_IDENTITY failed for '/dev/block/8:32'
      Mar 21 14:08:56 Cyclops ata_id[10023]: HDIO_GET_IDENTITY failed for '/dev/block/8:32'
      Mar 21 14:08:57 Cyclops ata_id[10110]: HDIO_GET_IDENTITY failed for '/dev/block/8:32'

      Running unRAID 4.5.3 on ASRock M3A785GXH with one Supermicro MV8 card.

      unmenu.awk: Version 1.2 Joe L.... with modifications as suggested by bjp999 and many others
      Plug-in-modules
      07-unmenu-mymain.awk: 1.45 - contributed by bjp999
      08-unmenu-array_mgmt.awk: 1.0 - Joe L.
      09-unmenu-disk_mgmt.awk: 1.2 - print reiserfsck commands for rebuild-tree (if needed) to screen. Joe L.
      10-unmenu-links.awk: 1.2 Fixed increment of link counter
      16-unmenu-syslog.awk: Version: .9 - modified by Joe L. to escape special charaters
      17-unmenu-syslog.awk: Version: .8 - modified by Joe L. to use a pattern file.
      18-unmenu-lsof.awk: Version: .1 - Joe L.
      20-unmenu-usage.awk: 1.0
      25-unmenu-dupe_files.awk: 1.2 deal with appostraphe in file name
      29-unmenu-sysinfo.awk: 1.3 - modified to better deal with utilities not yet installed Joe L.
      30-unmenu-file_browser.awk: 1.1.4 fixed handling of directory names with embedded spaces. Joe L.
      50-unmenu-user_scripts.cgi: .3 - Fixed to allow single quotes in button labels
      600-unmenu-file_edit.awk: .2 Updated with ideas borrowed from go-script-manager plug-in to keep backup versions of files.
      99-unmenu-utility.awk: 1.0 - contributed by bjp999
      990-unmenu-wget.awk: 1.5 Improved handling of download when 404 "not-found" error returned on download URL
      999-unmenu-unraid_main.awk: 1.1 Joe L. -- Added parse of "ps" to check for non-standard emhttp port.
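For anyone who wants to tally these messages instead of eyeballing the syslog, here is a small sketch that counts the HDIO_GET_IDENTITY failures per device path. The function names are made up for this example. The second helper turns a path like /dev/block/8:32 into a drive name, assuming the usual Linux numbering where major 8 is the first block of sd devices and each disk gets 16 minor numbers; it returns None for anything else.

```python
import re
from collections import Counter

# Matches lines such as:
# Mar 21 13:41:12 Cyclops ata_id[2108]: HDIO_GET_IDENTITY failed for '/dev/block/8:32'
FAIL_RE = re.compile(r"ata_id\[\d+\]: HDIO_GET_IDENTITY failed for '([^']+)'")


def count_identity_failures(lines):
    """Count HDIO_GET_IDENTITY failures per device path (illustrative helper)."""
    counts = Counter()
    for line in lines:
        match = FAIL_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def sd_name_from_block_path(path):
    """Map '/dev/block/MAJOR:MINOR' to an sd name, for major 8 only.

    Assumes 16 minors per disk (sda = 0, sdb = 16, sdc = 32, ...).
    """
    major, minor = (int(n) for n in path.rsplit("/", 1)[1].split(":"))
    if major != 8:
        return None
    disk, partition = divmod(minor, 16)
    name = "sd" + chr(ord("a") + disk)
    return name + (str(partition) if partition else "")


if __name__ == "__main__":
    with open("/var/log/syslog") as log:  # path is an assumption
        for path, n in count_identity_failures(log).items():
            print(path, sd_name_from_block_path(path), n)
```

Under that numbering assumption, 8:32 is the whole-disk device for the third sd drive.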
  21. Centurion 590 (Have Antec Twelve Hundred on order)
      CORSAIR CMPSU-850TX
      ASRock M3A785GXH/128M BIOS 1.70
      AMD Athlon II X2 245 Regor 2.9GHz
      G.SKILL Ripjaws 4GB F3-12800CL9D-4GBRL
      Monoprice 2 port SATA PCIe cards (wound up taking them out)
      Supermicro AOC-SASLP-MV8
      3 x Supermicro CSE-M35T1
      3 x Samsung 1.5TB EcoGreen
      Planning to expand.

      In the end it is up and running; syslog is attached. Note that the parity check did pretty well: "sync done. time=17806sec rate=82283K/sec". During this time two drives were connected to the MV8 and one to a mobo port. However, I did have problems along the way, which I will explain in the hope that it might help someone else some day. I suspect they are all related to mobo BIOS compatibility problems. One last detail: this board has three 16x and one 1x PCIe slots. I configured the 16x slots to be 8x, 8x and 4x.

      1. My original intent was to start with 2 port SATA cards and later get some Adaptec 4 port cards. The 2 port cards caused me nothing but trouble. With the main board SATA ports set to AHCI and one of the 2 port cards installed, the system would not boot, nor would it let me into the BIOS settings. To get back into the BIOS settings I had to remove the 2 port card.
      2. Changing the SATA ports to EIDE worked, sort of. In this config unRAID could only see 4 of the 6 SATA ports. This is one thing I cannot blame on the BIOS.
      3. Changing the SATA ports to RAID worked but might have caused another problem, read on.
      4. I managed to get 12 ports recognized by installing 2 port cards in all three 16x slots. However, if I installed just one card in the 1x slot, the system would skip right over the card BIOS for it. It even ignored a card installed in one of the 16x slots. Since I have no other 1x PCIe cards I could not try anything else in the 1x slot.

      While I was messing around with this, Lime Tech came to the rescue by releasing 4.5.3, so I quickly ordered an MV8 card, and while I was waiting for it I ran a preclear on all 3 drives connected to mobo ports configured as RAID.

      5. Got the MV8, installed it, updated the USB drive to 4.5.3 and all was good, but that only gave me 14 ports and I had 15 trays. So I installed one of the 2 port cards into one of the 16x slots. Now when trying to boot, unRAID would get through about 2 1/2 lines of dots loading bzroot and then give me an "out of memory" error. So out came the Monoprice card and life was good again.
      6. I changed the main board SATA ports from RAID back to AHCI and spread out the drives. Finally ready to start the array. But one more problem: I got this error on both drives even though the preclear finished just fine. Could this have been because the SATA ports were configured as RAID when the preclear ran? The other difference was that the preclear ran with version 4.5.1, but I did not try to start the array until after updating to 4.5.3.

      Mar 17 20:42:49 Tower emhttp: disk2 mount error: 32
      Mar 17 20:42:49 Tower emhttp: shcmd (20): rmdir /mnt/disk2
      Mar 17 20:42:49 Tower kernel: REISERFS warning (device md2): sh-2021 reiserfs_fill_super: can not find reiserfs on md2

      Anyway, after reading a bit about this error, I decided to just let the initial parity build finish and then format the drives. Now life was really good. To tidy up I ran another parity check, which can be seen in the attached log.

      This is where I am now. Got a bigger case on the way, will order a second MV8 card in April (spent too much money in the last few weeks) and will eventually add a 4th drive bay. I'll post an update when I can test with 2 MV8 cards installed.

      In the end I would not recommend this mobo. I got it because it had three 16x slots, got high marks and was only $94 back when I thought I would need 3 Adaptec cards to be able to get to 20 ports. Now two MV8s and 4 mobo ports will get you there. Guess I should have procrastinated a little longer before starting this project. Probably would have been a lot smoother process.

      Later,
      Jerry

      syslog-2010-03-18.txt
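As a sanity check on the parity line quoted above, the time and rate can be multiplied out to see how much data was covered. This is only a sketch: the function name is made up, and it assumes the "K" in the log means 1024 bytes, which the log itself does not say.

```python
import re

# Matches the tail of: sync done. time=17806sec rate=82283K/sec
SYNC_RE = re.compile(r"time=(\d+)sec rate=(\d+)K/sec")


def parity_sync_summary(line, bytes_per_k=1024):
    """Return duration and total data covered by a parity sync log line.

    bytes_per_k=1024 is an assumption about what "K" means in the log.
    """
    match = SYNC_RE.search(line)
    if match is None:
        raise ValueError("not a 'sync done' line")
    seconds = int(match.group(1))
    rate_k = int(match.group(2))
    total_bytes = seconds * rate_k * bytes_per_k
    return {
        "hours": seconds / 3600,
        "total_bytes": total_bytes,
        "terabytes": total_bytes / 1e12,  # decimal TB, as drive makers count
    }


if __name__ == "__main__":
    summary = parity_sync_summary("sync done. time=17806sec rate=82283K/sec")
    print(f"{summary['hours']:.2f} hours, {summary['terabytes']:.2f} TB")
```

Under that assumption the product comes out to roughly 1.5 TB in a little under 5 hours, which lines up with a full pass over a 1.5TB drive.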
  22. Well, the board is out of stock and the listed price jumped from $90 to $220, so I won't be testing this any time soon, if at all. He who hesitates loses. I'll be watching for an alternative while collecting some of the other parts I need. Thanks for the pointers.
  23. What are the chances this would work as the foundation of a 20 drive server along with 4 Adaptec 1430SA cards? Price seems to be right. If there is nothing about it that is known to be incompatible, I am tempted to get one just to test it out in a 3 drive setup. http://www.newegg.com/Product/Product.aspx?Item=N82E16813186149 Is there a better, under $500, way to be able to connect 20 SATA drives? Thanks for any thoughts on the subject. Jerry