March 9, 2012
It also appears that you have two LSI 1068E-based controllers running different firmware revisions:

Mar 8 13:54:57 Angband kernel: ioc0: LSISAS1068E B3: Capabilities={Initiator}
Mar 8 13:54:57 Angband kernel: mptsas 0000:01:00.0: setting latency timer to 64
Mar 8 13:54:57 Angband kernel: scsi7 : ioc0: LSISAS1068E B3, FwRev=01160100h, Ports=1, MaxQ=366, IRQ=24

and

Mar 8 13:54:57 Angband kernel: ioc1: LSISAS1068E B3: Capabilities={Initiator}
Mar 8 13:54:57 Angband kernel: mptsas 0000:05:00.0: setting latency timer to 64
Mar 8 13:54:57 Angband kernel: scsi8 : ioc1: LSISAS1068E B3, FwRev=011a0000h, Ports=1, MaxQ=478, IRQ=16

I believe that, up to a certain revision, the IT firmware for these cards came in two flavors:
flavor one: support for SATA1 and SATA2 drives
flavor two: support for SATA2 and SATA3 drives
I also believe the latest release unified the firmware (no more "flavors"). You may be hitting one of these issues, so you should flash both cards to the latest firmware and BIOS.
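A quick way to compare what each card reports, sketched as a POSIX shell function: it pulls the ioc name and FwRev value out of kernel log lines shaped like the ones quoted above. The function name is hypothetical, and the log location (/var/log/syslog) is an assumption; point it at wherever your syslog lives.

```shell
#!/bin/sh
# Hypothetical helper: print "iocN FWREV" for every mptsas controller line
# of the form "... ioc0: LSISAS1068E B3, FwRev=01160100h, Ports=1, ...".
# Lines such as "ioc0: LSISAS1068E B3: Capabilities=..." do not match and are skipped.
fw_revs() {
  sed -n 's/.*\(ioc[0-9][0-9]*\): LSISAS1068E [A-Z0-9]*, FwRev=\([0-9a-fA-F]*\)h.*/\1 \2/p'
}

# Usage (syslog path is an assumption):
#   fw_revs < /var/log/syslog
```

If the two lines printed show different revisions, as in the log above, the cards are not running the same firmware.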
March 9, 2012 (Author)
I am using two LSI controllers. I am not sure how to flash them, but I will Google it.
March 9, 2012 (Author)
Not sure which one to download: http://www.lsi.com/support/products/Pages/LSISAS1068E.aspx
Anyone have a pointer for me?
March 9, 2012
Go here: http://www.lsi.com/products/storagecomponents/Pages/LSISAS3081E-R.aspx
Click the fourth tab, "Support and Download".
Expand "Firmware" and download the first file, the one with the long name SAS3081ER Package P21.....
Flash one card at a time (remove the other so only a single controller is installed during flashing).
Your chips are the B3 revision.
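Since the flash should be done with only one card installed, a sanity check before flashing could look like the sketch below. It assumes lspci describes each card with a string containing "SAS1068E" (check your own lspci output first); count_1068e and check_single_controller are hypothetical helpers, not part of any LSI tool.

```shell
#!/bin/sh
# Count lines mentioning SAS1068E in lspci output read from stdin.
# Assumption: lspci names the chip "SAS1068E" in its description.
count_1068e() {
  awk '/SAS1068E/ { n++ } END { print n + 0 }'
}

# Hypothetical pre-flash guard: succeed only when exactly one card is present.
check_single_controller() {
  n=$(lspci | count_1068e)
  if [ "$n" -ne 1 ]; then
    echo "Found $n SAS1068E controllers; remove all but one before flashing." >&2
    return 1
  fi
  echo "Exactly one SAS1068E controller found."
}
```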
March 10, 2012 (Author)
Will this work even though I have a 1068E?
March 12, 2012 (Author)
*BUMP* No answers from Tom. I need to figure out what to do here before I run out of space... any takers?
March 12, 2012
I think the data on that drive is gone. I suspect the drive showed as unformatted because it was unmounted, and then you screwed it up by moving the partition. You might be able to get the data back if you still have the original disk: fix the partition, then run reiserfsck on it.
Just to note, a rebuild simply reconstructs the disk's contents onto a new disk. If the original disk's file system is corrupted, the rebuilt disk will be too.
You will have to try the earlier beta, and if that doesn't work, I'd suspect the power supply. First, determine whether the "failing" drives share a single SAS cable, power splitter, or the like.
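One way to start on the cable question is to map each disk to the controller it hangs off. The sketch below parses `ls -l /dev/disk/by-path`, assuming udev creates links named like pci-0000:01:00.0-sas-...; it only shows the controller (PCI address), not which power splitter a drive is on, so that part still needs a look inside the case. disks_by_controller is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: read "ls -l /dev/disk/by-path" output on stdin and
# print "PCI_ADDRESS DEVICE" for every whole-disk sdX link, sorted so that
# disks on the same controller end up next to each other.
# Assumes link names start with "pci-<address>-", e.g.
#   pci-0000:01:00.0-sas-phy0-lun-0 -> ../../sdb
disks_by_controller() {
  awk '$NF ~ /^\.\.\/\.\.\/sd[a-z]+$/ {
         split($(NF-2), part, "-")          # part[2] is the PCI address
         dev = $NF
         sub(/^\.\.\/\.\.\//, "", dev)      # strip the leading ../../
         print part[2], dev
       }' | sort
}

# Usage: ls -l /dev/disk/by-path | disks_by_controller
```

Partition links (../../sdb1 and so on) are skipped, so each disk appears once.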
March 12, 2012 (Author)
I still have the original disk, but it showed as unformatted once I moved the partition on it. I think the best option may be to format the new disk, then put the old disk in a different system and try to recover what is/was on it. Sound good?
March 13, 2012
You might as well format the new one so it can be used. Install the old disk but don't assign it, so you can move the partition back and run reiserfsck on it again.
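Before pointing reiserfsck at the old disk, it may be worth confirming the partition really starts at sector 63 again. A sketch, assuming the classic `fdisk -lu` layout shown elsewhere in this thread; part_start is a hypothetical helper. `reiserfsck --check` reports problems without repairing them; the repair modes (--fix-fixable, --rebuild-tree) write to the disk and are deliberately left out here.

```shell
#!/bin/sh
# Hypothetical helper: print the start sector of one partition from
# "fdisk -lu" output on stdin. Handles the optional "*" Boot column.
part_start() {
  dev=$1
  awk -v d="$dev" '$1 == d { s = ($2 == "*") ? $3 : $2; print s }'
}

# Usage (as root; device names are the ones from this thread):
#   fdisk -lu /dev/sdm | part_start /dev/sdm1     # expect 63
#   reiserfsck --check /dev/sdm1                   # reports, does not repair
```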
March 13, 2012 (Author)
OK, I'll try that! Sorry to be such a newb pain in the arse...
March 15, 2012 (Author)
OK, so I've put in the old "bad" drive and run the dd command. Here's the output:

root@Angband:~# dd if=/dev/sdm count=195 | od -c -A d | sed 30q
195+0 records in
195+0 records out
99840 bytes (100 kB) copied, 0.00328202 s, 30.4 MB/s
0000000 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
*
0000448 \0 \0 203 \0 \0 \0 ? \0 \0 \0 q 210 340 350 \0 \0
0000464 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
*
0000496 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 U 252
0000512 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
*
0065536 020 021 034 035 P U 243 \n 323 375 \f \0 022 \0 \0 \0
0065552 \0 \0 \0 \0 \0 \0 \0 \0 004 \0 \0 \0 \0 \0 \0
0065568 204 003 \0 \0 036 \0 \0 \0 \0 \0 \0 \0 \0 020 314 003
0065584 314 003 001 \0 R e I s E r 2 F s \0 \0 \0
0065600 003 \0 \0 \0 005 \0 9 : 002 \0 \0 \0 \0 \0 \0 \0
0065616 \0 \0 \0 \0 ( N 330 ^ 032 332 O 223 214 = 345 L
0065632 022 262 022 E \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
0065648 \0 \0 \0 \0 001 \0 036 \0 036 031 P O \0 N 355 \0
0065664 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
*
0065728 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 001 \0 \0 \0
0065744 0 \0 \0 \0 1 \0 \0 \0 W \0 \0 \0 X \0 \0 \0
0065760 204 \0 \0 \0 205 \0 \0 \0 206 \0 \0 \0 207 \0 \0 \0
0065776 245 \0 \0 \0 246 \0 \0 \0 257 \0 \0 \0 261 \0 \0 \0
0065792 273 \0 \0 \0 275 \0 \0 \0 276 \0 \0 \0 300 \0 \0 \0
0065808 301 \0 \0 \0 303 \0 \0 \0 327 \0 \0 \0 330 \0 \0 \0
0065824 353 \0 \0 \0 355 \0 \0 \0 357 \0 \0 \0 360 \0 \0 \0
0065840 362 \0 \0 \0 363 \0 \0 \0 365 \0 \0 \0 367 \0 \0 \0
0065856 370 \0 \0 \0 371 \0 \0 \0 377 \0 \0 \0 \0 001 \0 \0
0065872 001 001 \0 \0 002 001 \0 \0 003 001 \0 \0 005 001 \0 \0
0065888 \a 001 \0 \0 \b 001 \0 \0 \f 001 \0 \0 \r 001 \0 \0
0065904 016 001 \0 \0 017 001 \0 \0 023 001 \0 \0 024 001 \0 \0

I've also run unraid_partition_disk.sh, and here's its output:

root@Angband:/boot# unraid_partition_disk.sh /dev/sdm
########################################################################
terminate called after throwing an instance of 'int'

Disk /dev/sdm: 2000.4 GB, 2000398934016 bytes
1 heads, 63 sectors/track, 62016336 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdm1              63  3907029167  1953514552+  83  Linux
Partition 1 does not end on cylinder boundary.
########################################################################
============================================================================
==
== DISK /dev/sdm IS partitioned for unRAID properly
== expected start = 63, actual start = 63
== expected size = 3907029105, actual size = 3907029105
==
============================================================================
root@Angband:/boot#

So, is it just a matter of running reiserfsck --check /dev/sdm1 now and seeing whether it can reconstruct the data? I can then use MC or something similar to copy the data back to the array, then format the drive and re-add it.

Here is the smartctl report for that drive:

root@Angband:/boot# smartctl -a /dev/sdm
smartctl 5.40 2010-10-16 r3189 [i486-slackware-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD20EARX-00PASB0
Serial Number:    WD-WCAZAD207818
Firmware Version: 51.0AB51
User Capacity:    2,000,398,934,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Thu Mar 15 09:53:08 2012 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x84) Offline data collection activity was suspended by an interrupting command from host. Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: (37560) seconds.
Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
Short self-test routine recommended polling time: ( 2) minutes.
Extended self-test routine recommended polling time: ( 255) minutes.
Conveyance self-test routine recommended polling time: ( 5) minutes.
SCT capabilities: (0x3035) SCT Status supported. SCT Feature Control supported. SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   179   173   021    Pre-fail  Always       -       6025
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       70
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       2140
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       36
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       29
193 Load_Cycle_Count        0x0032   196   196   000    Old_age   Always       -       13323
194 Temperature_Celsius     0x0022   122   119   000    Old_age   Always       -       28
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      1989         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

root@Angband:/boot#
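The numbers in the partition check above line up with simple arithmetic: a partition that starts at sector 63 and runs to the last sector of a 3907029168-sector disk is 3907029168 - 63 = 3907029105 sectors long and ends on sector 3907029167, and fdisk's Blocks column is that size in 1 KiB units (two 512-byte sectors each), with a trailing "+" when one sector is left over. A small sketch with hypothetical helper names, assuming a single partition that fills the disk:

```shell
#!/bin/sh
# Hypothetical helpers reproducing the partition arithmetic, assuming one
# partition that starts at sector 63 and ends on the last sector of the disk.
expected_size() {
  total=$1          # total sectors on the disk, as reported by fdisk
  start=${2:-63}    # start sector, 63 for the partitions shown here
  echo $(( total - start ))
}

part_end() {
  start=$1
  size=$2
  echo $(( start + size - 1 ))   # last sector is inclusive
}

# fdisk's Blocks column is in 1 KiB units (two 512-byte sectors);
# a trailing "+" marks an odd sector left over.
blocks_1k() {
  size=$1
  if [ $(( size % 2 )) -eq 1 ]; then
    echo "$(( size / 2 ))+"
  else
    echo "$(( size / 2 ))"
  fi
}

# Usage:
#   expected_size 3907029168      # 3907029105
#   part_end 63 3907029105        # 3907029167
#   blocks_1k 3907029105          # 1953514552+
```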
March 15, 2012
Perhaps... but consider this.
On my disk, with the partition starting on sector 63, I found the "ReiserFS" string here:
0097840 220 \0 002 \0 R e I s E r 2 F s \0 \0 \0
If your file system had started on sector 64, it would have been located 512 bytes higher (at address 98352).
Your string is located at address 65584. That is 63 sectors before where mine is located. In other words, you apparently created a file system on the raw device instead of on the partition. You might have an additional "ReiserFs" string further up on your disk, but who knows.
Instead of running the "dd" command as specified, you might try
dd if=/dev/sdm count=195 | od -c -A d | sed 3000q | tee /boot/dd-output.txt
Then look at what is at addresses 0097840 and 0098352.
I doubt that reiserfsck --check will be able to do anything, since it will probably fail to find the superblock it expects at the correct point in the first partition.
So far, it appears as if the first partition was created starting in sector 0, and not as unRAID would have done it at all. Of course, that may be left over from a prior use of the drive, if you never pre-cleared/cleared it.
Joe L.
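Joe's arithmetic can be reproduced with a few lines of shell. The sketch assumes the ReiserFS 3.x on-disk layout: the superblock sits 64 KiB (65536 bytes) into the file system and the magic string (ReIsEr2Fs on 3.6) sits 52 bytes into the superblock. That puts the magic at byte 97844 for a file system starting at sector 63 (inside od line 0097840) and at byte 65588 for one starting at sector 0 (inside the 0065584 line of the dump above). Helper names are hypothetical; `grep -aob` is used instead of od so the byte offsets are printed directly.

```shell
#!/bin/sh
# Assumed ReiserFS 3.x layout constants.
SB_OFFSET=65536     # superblock starts 64 KiB into the file system
MAGIC_IN_SB=52      # magic string offset inside the superblock

# Byte offset at which the magic should appear for a given start sector.
magic_offset() {
  echo $(( $1 * 512 + SB_OFFSET + MAGIC_IN_SB ))
}

# Start sector implied by an observed magic offset.
start_sector() {
  echo $(( ($1 - SB_OFFSET - MAGIC_IN_SB) / 512 ))
}

# Print the byte offset of every ReiserFS magic string found on stdin.
# NUL bytes are turned into newlines first (same byte count, so the
# offsets reported by grep -b are unchanged).
find_magic() {
  LC_ALL=C tr '\000' '\n' | LC_ALL=C grep -aob 'ReIsEr[23]*Fs' | cut -d: -f1
}

# Usage (as root; only reads the disk):
#   dd if=/dev/sdm bs=512 count=256 2>/dev/null | find_magic
#   start_sector 65588    # 0, i.e. the file system begins at sector 0
```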
March 15, 2012 (Author)
Hey Joe,
This drive has been pre-cleared and used only for unRAID, nothing else. I probably fooked something up; that's *very* likely. I'm running reiserfsck on it now; I'll see what it finds and maybe get lucky. I'll post the dd output after it finishes this pass.
Thanks,
Km.
March 19, 2012 (Author)
OK, well, it looks like the saga is at an end. I moved the partition back to sector 63, ran reiserfsck, and it "found" all my files. The problem was that when I tried to view them, the corruption was too bad. At least I got a list of what I need to re-capture, so that was the silver lining, I suppose.
The question now is: based on the SMART report, does anyone see any reason why I should not re-introduce that drive to my array? I saw no pending sectors on it...
Thanks!
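Checking for pending sectors by eye works, but the handful of attributes people usually look at before reusing a drive can also be pulled out mechanically. A sketch, assuming `smartctl -A` output in the layout posted above; the attribute selection (5, 196, 197, 198, 199) is an assumption, not an official pass/fail rule, and smart_flags is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: read "smartctl -A" output on stdin and print the
# attributes most often checked before reusing a drive (5, 196, 197, 198, 199;
# the selection is an assumption), flagging any non-zero raw value.
smart_flags() {
  awk '$1 ~ /^(5|196|197|198|199)$/ {
         status = ($NF == 0) ? "ok" : "CHECK"   # RAW_VALUE is the last column
         print $2, $NF, status
       }'
}

# Usage (as root): smartctl -A /dev/sdm | smart_flags
```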
March 19, 2012 (Author)
Also, what's the best way to add it now: pre-clear it again, or just format it?
March 20, 2012 (Author)
Quote: "Are all of the other dots green?"
Yes, the array has been functioning perfectly for the past few days...
March 20, 2012 (Author)
Quote: "Post a SMART report for the faulty drive."

smartctl 5.40 2010-10-16 r3189 [i486-slackware-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD20EARX-00PASB0
Serial Number:    WD-WCAZAD207818
Firmware Version: 51.0AB51
User Capacity:    2,000,398,934,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Tue Mar 20 13:19:50 2012 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x82) Offline data collection activity was completed without error. Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: (37560) seconds.
Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
Short self-test routine recommended polling time: ( 2) minutes.
Extended self-test routine recommended polling time: ( 255) minutes.
Conveyance self-test routine recommended polling time: ( 5) minutes.
SCT capabilities: (0x3035) SCT Status supported. SCT Feature Control supported. SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   179   173   021    Pre-fail  Always       -       6025
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       70
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   097   097   000    Old_age   Always       -       2264
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       36
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       29
193 Load_Cycle_Count        0x0032   196   196   000    Old_age   Always       -       13372
194 Temperature_Celsius     0x0022   124   119   000    Old_age   Always       -       26
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      1989         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

root@Angband:~#
March 20, 2012 (Author)
Quote: "Is the disk showing a red dot? Have you read the previous posts?"
No, this disk is not in the array... no dots at all.
Archived
This topic is now archived and is closed to further replies.