September 14, 2011

Hi,

I finished my unRAID server a few days ago and I'm migrating data from the old server. This is my hardware setup:

Motherboard: ASUS P8H67-V
CPU: Intel G620
RAM: 2x2GB
SATA controller: IBM BR10i (LSI SAS 1068E) in the PCIe x16 slot, flashed with IT firmware as described in http://lime-technology.com/forum/index.php?topic=12767.0
unRAID 5.0-beta12

Here's a myMain screenshot:

Disk2 and disk4 have some reallocated sectors, but I guess that's OK as long as the count doesn't increase. Parity, disk1, disk2 and disk3 are connected to the motherboard SATA connectors, and disk4 hangs off port 0 on the BR10i card.

For some reason, myMain won't display SMART info for disk4. In a telnet window, I get this:

root@Tower:~# smartctl -a -d ata /dev/sdf
smartctl 5.40 2010-10-16 r3189 [i486-slackware-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

Smartctl: Device Read Identity Failed (not an ATA/ATAPI device)

A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
terminate called after throwing an instance of 'int'
Aborted
root@Tower:~# smartctl -A /dev/sdf
smartctl 5.40 2010-10-16 r3189 [i486-slackware-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   119   099   006    Pre-fail  Always       -       231116623
  3 Spin_Up_Time            0x0003   095   095   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       48
  5 Reallocated_Sector_Ct   0x0033   098   098   036    Pre-fail  Always       -       84
  7 Seek_Error_Rate         0x000f   076   060   030    Pre-fail  Always       -       45427125
  9 Power_On_Hours          0x0032   081   081   000    Old_age   Always       -       16992
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       43
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   078   078   000    Old_age   Always       -       22
190 Airflow_Temperature_Cel 0x0022   072   051   045    Old_age   Always       -       28 (Min/Max 19/28)
194 Temperature_Celsius     0x0022   028   049   000    Old_age   Always       -       28 (0 12 0 0)
195 Hardware_ECC_Recovered  0x001a   046   015   000    Old_age   Always       -       231116623
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       156031866913371
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       1989695539
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       2899967634
root@Tower:~#

So some SMART info can be shown for disk4, just not the way myMain does it. How can I fix this?
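A hedged sketch of reading SMART data from a disk behind the BR10i. With the LSI 1068E in IT mode the disk is reached through the SCSI layer, which is presumably why `-d ata` fails while plain `smartctl -A /dev/sdf` (smartctl's own device-type detection) works. The helper name `smart_attrs` is hypothetical; it first tries smartctl's `sat` (SCSI-to-ATA Translation) device type and falls back to auto-detection. It assumes smartctl's documented exit status, which is a bit mask, and only falls back when one of the low "could not talk to the disk" bits is set.

```shell
#!/bin/sh
# smart_attrs: hypothetical helper (not part of unRAID or myMain).
# Prints the SMART attribute table of a SATA disk that may sit behind a
# SAS HBA such as the LSI 1068E in IT mode.
smart_attrs() {
    dev="$1"
    status=0
    # '-d sat' makes smartctl wrap ATA commands in SCSI-to-ATA Translation,
    # the usual way to reach a SATA disk through a SAS controller.
    smartctl -d sat -A "$dev" || status=$?
    # smartctl's exit status is a bit mask. Bits 0-2 (values 1, 2, 4) mean
    # the command line was rejected, the device could not be opened, or a
    # SMART command failed. Higher bits only describe the disk's health, so
    # in that case the '-d sat' output is already valid.
    if [ $((status & 7)) -eq 0 ]; then
        return "$status"
    fi
    # Otherwise let smartctl pick the device type itself, which is what
    # worked for 'smartctl -A /dev/sdf' above.
    smartctl -A "$dev"
}
```

Usage, with the device name from the post: `smart_attrs /dev/sdf`. Check which sdX disk4 actually is before running it, since device letters can change between boots.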
The normal device status page (on port 80) shows the temperature for all disks, but mymenu doesn't for disk4.

EDIT: Fixed this with the info in this thread: http://lime-technology.com/forum/index.php?topic=9337.msg141761#msg141761

Now, the syslog (attached) shows some errors around 21:00. What are these, and should I be worried? Are there any other issues I should look out for?

EDIT: Here are the errors:

Sep 13 20:59:28 Tower kernel: ata1.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen (Errors)
Sep 13 20:59:28 Tower kernel: ata1.00: irq_stat 0x08000000, interface fatal error (Errors)
Sep 13 20:59:28 Tower kernel: ata1: SError: { UnrecovData Handshk } (Errors)
Sep 13 20:59:28 Tower kernel: ata1.00: failed command: WRITE DMA EXT (Minor Issues)
Sep 13 20:59:28 Tower kernel: ata1.00: cmd 35/00:00:e0:53:d3/00:04:03:00:00/e0 tag 0 dma 524288 out (Drive related)
Sep 13 20:59:28 Tower kernel: res 50/00:00:5f:53:d3/00:00:03:00:00/e0 Emask 0x10 (ATA bus error) (Errors)
Sep 13 20:59:28 Tower kernel: ata1.00: status: { DRDY } (Drive related)
Sep 13 20:59:28 Tower kernel: ata1: hard resetting link (Minor Issues)
Sep 13 20:59:28 Tower kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300) (Drive related)
Sep 13 20:59:28 Tower kernel: ata1.00: configured for UDMA/133 (Drive related)
Sep 13 20:59:28 Tower kernel: ata1: EH complete (Drive related)
Sep 13 20:59:58 Tower kernel: ata1.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen (Errors)
Sep 13 20:59:58 Tower kernel: ata1.00: irq_stat 0x08000000, interface fatal error (Errors)
Sep 13 20:59:58 Tower kernel: ata1: SError: { UnrecovData Handshk } (Errors)
Sep 13 20:59:58 Tower kernel: ata1.00: failed command: WRITE DMA EXT (Minor Issues)
Sep 13 20:59:58 Tower kernel: ata1.00: cmd 35/00:60:88:05:e5/00:02:03:00:00/e0 tag 0 dma 311296 out (Drive related)
Sep 13 20:59:58 Tower kernel: res 50/00:00:87:05:e5/00:00:03:00:00/e0 Emask 0x10 (ATA bus error) (Errors)
Sep 13 20:59:58 Tower kernel: ata1.00: status: { DRDY } (Drive related)
Sep 13 20:59:58 Tower kernel: ata1: hard resetting link (Minor Issues)
Sep 13 20:59:58 Tower kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300) (Drive related)
Sep 13 20:59:58 Tower kernel: ata1.00: configured for UDMA/133 (Drive related)
Sep 13 20:59:58 Tower kernel: ata1: EH complete (Drive related)
Sep 13 21:00:02 Tower kernel: ata1.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen (Errors)
Sep 13 21:00:02 Tower kernel: ata1.00: irq_stat 0x08000000, interface fatal error (Errors)
Sep 13 21:00:02 Tower kernel: ata1: SError: { UnrecovData Handshk } (Errors)
Sep 13 21:00:02 Tower kernel: ata1.00: failed command: WRITE DMA EXT (Minor Issues)
Sep 13 21:00:02 Tower kernel: ata1.00: cmd 35/00:80:88:c4:e7/00:02:03:00:00/e0 tag 0 dma 327680 out (Drive related)
Sep 13 21:00:02 Tower kernel: res 50/00:00:07:c4:e7/00:00:03:00:00/e3 Emask 0x10 (ATA bus error) (Errors)
Sep 13 21:00:02 Tower kernel: ata1.00: status: { DRDY } (Drive related)
Sep 13 21:00:02 Tower kernel: ata1: hard resetting link (Minor Issues)
Sep 13 21:00:02 Tower kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300) (Drive related)
Sep 13 21:00:02 Tower kernel: ata1.00: configured for UDMA/133 (Drive related)
Sep 13 21:00:02 Tower kernel: ata1: EH complete (Drive related)
Sep 13 21:00:42 Tower kernel: ata1: limiting SATA link speed to 3.0 Gbps (Drive related)
Sep 13 21:00:42 Tower kernel: ata1.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen (Errors)
Sep 13 21:00:42 Tower kernel: ata1.00: irq_stat 0x08000000, interface fatal error (Errors)
Sep 13 21:00:42 Tower kernel: ata1: SError: { UnrecovData Handshk } (Errors)
Sep 13 21:00:42 Tower kernel: ata1.00: failed command: WRITE DMA EXT (Minor Issues)
Sep 13 21:00:42 Tower kernel: ata1.00: cmd 35/00:00:50:4b:fd/00:04:03:00:00/e0 tag 0 dma 524288 out (Drive related)
Sep 13 21:00:42 Tower kernel: res 50/00:00:4f:4b:fd/00:00:03:00:00/e0 Emask 0x10 (ATA bus error) (Errors)
Sep 13 21:00:42 Tower kernel: ata1.00: status: { DRDY } (Drive related)

The server was rebooted at 19:17 and disk4 was added. For some reason I couldn't add it to the array, so I left it for a while and tried again at 22:45, this time with success. Are the 21:00 errors related to this? Nothing was done between 19:17 and 22:45. I also have a few errors at 09:40 (a segfault).

Another thing is that I have to start the array manually every time I reboot. Parity is fine and I use the powerdown script. How can I fix this?

Thanks for any help.

Regards,
Ketil

Attachment: syslog-2011-09-14.txt
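To see whether the 21:00 errors keep coming back, one option is to count the link-error messages for the port in the syslog. This is a minimal sketch: the function name `link_errors` is hypothetical, the default path `/var/log/syslog` is an assumption about where the log lives, and the three message patterns are copied from the kernel lines quoted above (the unMENU tags such as "(Errors)" are not needed for matching).

```shell
#!/bin/sh
# link_errors: hypothetical helper. Counts SATA link-error messages for one
# libata port in a syslog file.
#   $1 = port name as printed by the kernel, e.g. ata1
#   $2 = log file (assumed default: /var/log/syslog)
# The grep matches the three message types seen in the excerpt above, for
# the port itself (ata1:) or a device on it (ata1.00:). The sed reduces each
# matching line to its message type, and sort | uniq -c counts each type.
link_errors() {
    port="$1"
    log="${2:-/var/log/syslog}"
    grep -E "kernel: ${port}(\.[0-9]+)?: (irq_stat .*interface fatal error|hard resetting link|limiting SATA link speed)" "$log" |
        sed -E 's/.*(interface fatal error|hard resetting link|limiting SATA link speed).*/\1/' |
        sort | uniq -c
}
```

Run against the excerpt above, `link_errors ata1` would count four "interface fatal error" messages, three link resets and one speed drop. If the counts keep growing after the cables have been reseated or replaced, the problem has not gone away.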
September 14, 2011

In Setting->Disk Setting set "Enable auto start" to Yes.

The parity drive is ata1.00 and the problem is explained here: http://lime-technology.com/wiki/index.php?title=The_Analysis_of_Drive_Issues#Drive_interface_issue_.234

Keep an eye on the reallocated sector counts and the error counts on the main page. Keep a pre-cleared replacement on hand. I use my cache drive as a ready spare.
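Keeping an eye on the reallocated sector count can be done from the command line by recording the raw value of attribute 5 and comparing it later. A minimal sketch: `smart_raw` and `realloc_grew` are hypothetical helpers, and both assume the `smartctl -A` column layout shown earlier in the thread, where RAW_VALUE is the 10th whitespace-separated field.

```shell
#!/bin/sh
# smart_raw: hypothetical helper. Reads 'smartctl -A' output on stdin and
# prints the RAW_VALUE (10th column) of the attribute named in $1.
smart_raw() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# realloc_grew: hypothetical helper. Reads 'smartctl -A' output on stdin and
# succeeds (exit 0) only if Reallocated_Sector_Ct is now larger than the
# previously recorded value passed in $1.
realloc_grew() {
    previous="$1"
    current=$(smart_raw Reallocated_Sector_Ct)
    [ -n "$current" ] && [ "$current" -gt "$previous" ]
}
```

With the output posted for disk4, `smartctl -A /dev/sdf | smart_raw Reallocated_Sector_Ct` would print 84, and `smartctl -A /dev/sdf | realloc_grew 84 && echo "disk4 reallocated more sectors"` stays silent until the count rises. Where to store the previous value is left open.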
September 14, 2011 (Author)

> In Setting->Disk Setting set "Enable auto start" to Yes.

It's already set to Yes, so it's not that.

> The parity drive is ata1.00 and the problem is explained here: http://lime-technology.com/wiki/index.php?title=The_Analysis_of_Drive_Issues#Drive_interface_issue_.234

Seems like a loose connection somewhere, then. I'll have to look into that.
September 14, 2011

Your motherboard uses an Atheros-based LAN chipset. These did not play nicely with older unRAID versions; I'm not sure whether anything has been done to fix that in 5.0. Disable any hardware features you are not using (serial and parallel ports, audio, FireWire, even the VIA-based IDE controller). That frees up resources and may keep them from interfering with each other under the new kernel.
September 19, 2011 (Author)

Thanks for the answers so far. The onboard LAN controller seems to be working OK with 5.0-beta12, but I have an Intel PRO/1000 PCI card that I'll try instead, with the onboard LAN disabled, at least until 5.0 final is ready.

I've been looking around to see if the PCIe x16 slot that is connected directly to the CPU can be used for non-graphics cards. I currently have the BR10i card in that slot, and it seems to be OK. The other x16 slot only runs in x4 mode, so I can't use that. 14 SATA ports (8+6) should be enough for a while, I guess.

I have disabled everything in the BIOS that's not needed. I still have to start the array manually after a clean power-down. Slightly annoying.
September 19, 2011

> I've been looking around to see if the PCI-E 16x slot that is connected directly to the CPU can be used for non-graphic cards. I currently have the BR10i card in that slot, and it seems to be ok. The other 16x slot will only run in 4x mode so I can't use that. 14 sata ports (8+6) should be ok for while I guess.

The BR10i will work in a PCIe slot that is only x4 electrically.
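A rough bandwidth estimate shows what running the BR10i in an x4 slot costs. This back-of-the-envelope sketch assumes the LSI SAS1068E links at PCIe 1.x speed (2.5 GT/s per lane with 8b/10b encoding, about 250 MB/s per lane in each direction) and ignores packet overhead, so real figures will be somewhat lower. `per_drive_mbps` is a hypothetical helper, not a tool from the thread.

```shell
#!/bin/sh
# per_drive_mbps: hypothetical helper for a back-of-the-envelope estimate.
# Assumes PCIe 1.x signalling (2.5 GT/s with 8b/10b encoding), i.e. about
# 250 MB/s per lane per direction, shared evenly by all busy drives.
#   $1 = number of PCIe lanes the card gets
#   $2 = number of drives streaming at the same time
per_drive_mbps() {
    lanes="$1"
    drives="$2"
    echo $(( lanes * 250 / drives ))
}
```

`per_drive_mbps 8 8` gives 250 and `per_drive_mbps 4 8` gives 125, so in an x4 slot all eight ports share roughly 1 GB/s. That only matters when every drive on the card is streaming at once, as during a parity check, and even then it is roughly what a single hard drive of this era can sustain sequentially.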
September 21, 2011

If this were my array, I would replace disk2. It has a ton of hours, a high reallocated sector count, and it is only 500GB. It is time for it to retire. Disk4 is a judgment call, but I would watch it like a hawk.
September 21, 2011

> If this were my array, I would replace disk2. It has a ton of hours, a high reallocated sector count, and it is only 500GB. It is time for it to retire. Disk4 is a judgment call, but I would watch it like a hawk.

I have two 500GB drives that are both over 30,000 hours. I have another 750GB drive that is over 20,000. Of all my drives, those are the three originals I started with, and none of them has EVER given me a problem (knock on wood). I have a couple of drives standing by, but if it ain't broke, don't fix it.