
New server build, sys log errors


ketiljo


Hi

 

I finished my unraid server some days ago and I'm migrating data from the old server. This is my hardware setup:

Motherboard: ASUS P8H67-V

CPU: Intel G620

RAM: 2x2GB

SATA controller: IBM BR10i (LSI SAS 1068E) in the PCIe x16 slot, flashed with IT firmware as described in http://lime-technology.com/forum/index.php?topic=12767.0

 

unRaid 5.0-beta 12

 

Here's a mymain screenshot:

Mymain.jpg

 

Disk2 and disk4 have some reallocated sectors, but I guess that's OK as long as the count doesn't increase.

 

Parity, disk1, disk2 and disk3 are connected to the motherboard SATA connectors, and disk4 hangs off port 0 on the BR10i card. For some reason, mymain won't display SMART info for disk4.

 

In a telnet window, I get this:

 

root@Tower:~# smartctl -a -d ata /dev/sdf
smartctl 5.40 2010-10-16 r3189 [i486-slackware-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

Smartctl: Device Read Identity Failed (not an ATA/ATAPI device)

A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
terminate called after throwing an instance of 'int'
Aborted
root@Tower:~# smartctl -A /dev/sdf
smartctl 5.40 2010-10-16 r3189 [i486-slackware-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
 1 Raw_Read_Error_Rate     0x000f   119   099   006    Pre-fail  Always       -       231116623
 3 Spin_Up_Time            0x0003   095   095   000    Pre-fail  Always       -       0
 4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       48
 5 Reallocated_Sector_Ct   0x0033   098   098   036    Pre-fail  Always       -       84
 7 Seek_Error_Rate         0x000f   076   060   030    Pre-fail  Always       -       45427125
 9 Power_On_Hours          0x0032   081   081   000    Old_age   Always       -       16992
10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       43
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   078   078   000    Old_age   Always       -       22
190 Airflow_Temperature_Cel 0x0022   072   051   045    Old_age   Always       -       28 (Min/Max 19/28)
194 Temperature_Celsius     0x0022   028   049   000    Old_age   Always       -       28 (0 12 0 0)
195 Hardware_ECC_Recovered  0x001a   046   015   000    Old_age   Always       -       231116623
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       156031866913371
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       1989695539
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       2899967634

root@Tower:~#

So some SMART info can be shown for disk4, but not the way mymain asks for it. How can I fix this? The normal device status page (on port 80) shows the temperature for all disks, but mymain won't for disk4.
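For anyone who wants to keep an eye on these numbers outside mymain, here is a minimal Python sketch that parses the attribute table printed by `smartctl -A`. It assumes the column layout of the smartctl 5.40 output shown above; the function names `parse_smart_attributes` and `reallocated_increase` are made up for illustration and are not part of smartmontools or unRAID.

```python
import re

# Three rows copied from the `smartctl -A /dev/sdf` output above.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   098   098   036    Pre-fail  Always       -       84
194 Temperature_Celsius     0x0022   028   049   000    Old_age   Always       -       28 (0 12 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""

# ID, name, flag, VALUE, WORST, THRESH, type, updated, when_failed, raw.
# Assumption: raw values start with an integer, as in the output above.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def parse_smart_attributes(text):
    """Return {attribute_name: {...}} for every attribute row found in text."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            attr_id, name, value, worst, thresh, raw = m.groups()
            attrs[name] = {
                "id": int(attr_id),
                "value": int(value),
                "worst": int(worst),
                "thresh": int(thresh),
                "raw": int(raw),
            }
    return attrs


def reallocated_increase(previous, current):
    """Difference in the raw reallocated sector count between two readings."""
    key = "Reallocated_Sector_Ct"
    return current[key]["raw"] - previous[key]["raw"]


attrs = parse_smart_attributes(SAMPLE)
print(attrs["Reallocated_Sector_Ct"]["raw"])  # 84
print(attrs["Temperature_Celsius"]["raw"])    # 28
```

Saving the parsed result from one day and comparing it with a later reading via `reallocated_increase` gives a simple check on whether the reallocated count is growing.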

 

EDIT: Fixed this with the info in this thread: http://lime-technology.com/forum/index.php?topic=9337.msg141761#msg141761

 

Now, the syslog (attached) shows some errors around 21:00. What are these and do I have to be worried? Any other issues I have to look out for?

 

EDIT: Here are the errors:

Sep 13 20:59:28 Tower kernel: ata1.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen (Errors)
Sep 13 20:59:28 Tower kernel: ata1.00: irq_stat 0x08000000, interface fatal error (Errors)
Sep 13 20:59:28 Tower kernel: ata1: SError: { UnrecovData Handshk } (Errors)
Sep 13 20:59:28 Tower kernel: ata1.00: failed command: WRITE DMA EXT (Minor Issues)
Sep 13 20:59:28 Tower kernel: ata1.00: cmd 35/00:00:e0:53:d3/00:04:03:00:00/e0 tag 0 dma 524288 out (Drive related)
Sep 13 20:59:28 Tower kernel:          res 50/00:00:5f:53:d3/00:00:03:00:00/e0 Emask 0x10 (ATA bus error) (Errors)
Sep 13 20:59:28 Tower kernel: ata1.00: status: { DRDY } (Drive related)
Sep 13 20:59:28 Tower kernel: ata1: hard resetting link (Minor Issues)
Sep 13 20:59:28 Tower kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300) (Drive related)
Sep 13 20:59:28 Tower kernel: ata1.00: configured for UDMA/133 (Drive related)
Sep 13 20:59:28 Tower kernel: ata1: EH complete (Drive related)
Sep 13 20:59:58 Tower kernel: ata1.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen (Errors)
Sep 13 20:59:58 Tower kernel: ata1.00: irq_stat 0x08000000, interface fatal error (Errors)
Sep 13 20:59:58 Tower kernel: ata1: SError: { UnrecovData Handshk } (Errors)
Sep 13 20:59:58 Tower kernel: ata1.00: failed command: WRITE DMA EXT (Minor Issues)
Sep 13 20:59:58 Tower kernel: ata1.00: cmd 35/00:60:88:05:e5/00:02:03:00:00/e0 tag 0 dma 311296 out (Drive related)
Sep 13 20:59:58 Tower kernel:          res 50/00:00:87:05:e5/00:00:03:00:00/e0 Emask 0x10 (ATA bus error) (Errors)
Sep 13 20:59:58 Tower kernel: ata1.00: status: { DRDY } (Drive related)
Sep 13 20:59:58 Tower kernel: ata1: hard resetting link (Minor Issues)
Sep 13 20:59:58 Tower kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300) (Drive related)
Sep 13 20:59:58 Tower kernel: ata1.00: configured for UDMA/133 (Drive related)
Sep 13 20:59:58 Tower kernel: ata1: EH complete (Drive related)
Sep 13 21:00:02 Tower kernel: ata1.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen (Errors)
Sep 13 21:00:02 Tower kernel: ata1.00: irq_stat 0x08000000, interface fatal error (Errors)
Sep 13 21:00:02 Tower kernel: ata1: SError: { UnrecovData Handshk } (Errors)
Sep 13 21:00:02 Tower kernel: ata1.00: failed command: WRITE DMA EXT (Minor Issues)
Sep 13 21:00:02 Tower kernel: ata1.00: cmd 35/00:80:88:c4:e7/00:02:03:00:00/e0 tag 0 dma 327680 out (Drive related)
Sep 13 21:00:02 Tower kernel:          res 50/00:00:07:c4:e7/00:00:03:00:00/e3 Emask 0x10 (ATA bus error) (Errors)
Sep 13 21:00:02 Tower kernel: ata1.00: status: { DRDY } (Drive related)
Sep 13 21:00:02 Tower kernel: ata1: hard resetting link (Minor Issues)
Sep 13 21:00:02 Tower kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300) (Drive related)
Sep 13 21:00:02 Tower kernel: ata1.00: configured for UDMA/133 (Drive related)
Sep 13 21:00:02 Tower kernel: ata1: EH complete (Drive related)
Sep 13 21:00:42 Tower kernel: ata1: limiting SATA link speed to 3.0 Gbps (Drive related)
Sep 13 21:00:42 Tower kernel: ata1.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen (Errors)
Sep 13 21:00:42 Tower kernel: ata1.00: irq_stat 0x08000000, interface fatal error (Errors)
Sep 13 21:00:42 Tower kernel: ata1: SError: { UnrecovData Handshk } (Errors)
Sep 13 21:00:42 Tower kernel: ata1.00: failed command: WRITE DMA EXT (Minor Issues)
Sep 13 21:00:42 Tower kernel: ata1.00: cmd 35/00:00:50:4b:fd/00:04:03:00:00/e0 tag 0 dma 524288 out (Drive related)
Sep 13 21:00:42 Tower kernel:          res 50/00:00:4f:4b:fd/00:00:03:00:00/e0 Emask 0x10 (ATA bus error) (Errors)
Sep 13 21:00:42 Tower kernel: ata1.00: status: { DRDY } (Drive related)

 

The server was rebooted at 19:17 and disk4 was added. For some reason I couldn't add it to the array, so I left it for a while and tried again at 22:45, this time with success. Are the 21:00 errors related to this? Nothing was done between 19:17 and 22:45. I also have a few errors at 09:40 (segfault).
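One way to check the timing question is to pull the timestamp of every `exception` line out of the syslog and see which ones fall between the reboot and the successful add. The sketch below is only an illustration: it assumes all entries are from the same day (so plain time-of-day comparison is enough), uses the 19:17 and 22:45 times mentioned above as the window, and the function names are hypothetical. It shows coincidence in time, not cause.

```python
import re

# A few lines copied from the log excerpt above.
SAMPLE = [
    "Sep 13 20:59:28 Tower kernel: ata1.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen (Errors)",
    "Sep 13 20:59:28 Tower kernel: ata1: hard resetting link (Minor Issues)",
    "Sep 13 20:59:58 Tower kernel: ata1.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen (Errors)",
    "Sep 13 21:00:42 Tower kernel: ata1: limiting SATA link speed to 3.0 Gbps (Drive related)",
]

# Captures the time of day and the ata port of each libata exception line.
EXCEPTION = re.compile(
    r"^\w{3}\s+\d+\s+(\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+(ata\d+)(?:\.\d+)?: exception "
)


def exception_times(lines):
    """Return [(time_of_day, port), ...] for every libata exception line."""
    hits = []
    for line in lines:
        m = EXCEPTION.match(line)
        if m:
            hits.append((m.group(1), m.group(2)))
    return hits


def in_window(time_of_day, start="19:17:00", end="22:45:00"):
    """True if an HH:MM:SS string lies inside the window (same day assumed)."""
    return start <= time_of_day <= end


for time_of_day, port in exception_times(SAMPLE):
    print(time_of_day, port, in_window(time_of_day))
```

Running it over the full attached syslog (one line per list element) would also show whether any port other than ata1 reported exceptions.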

 

Another thing is that I have to manually start the array every time I reboot. Parity is fine and I use the powerdown script. How can I fix this?

 

Thanks for all help.

 

regards,

 

Ketil

syslog-2011-09-14.txt


Your motherboard uses an Atheros-based LAN chipset. These did not play nicely with older unRAID versions, and I'm not sure whether anything has been done to fix that in 5.

 

Disable any hardware features you are not using (serial and parallel ports, audio, FireWire, even the VIA-based IDE controller). That will free up resources and keep them from interfering with each other under the new kernel.


Thanks for the answers so far.

 

The on-board LAN controller seems to be working OK with 5.0-beta12. I have an Intel PRO/1000 PCI card that I'll try instead, with the on-board LAN disabled, at least until 5.0 final is ready.

 

I've been looking around to see if the PCIe x16 slot that is connected directly to the CPU can be used for non-graphics cards. I currently have the BR10i card in that slot, and it seems to be OK. The other x16 slot will only run in x4 mode, so I can't use that. 14 SATA ports (8+6) should be OK for a while, I guess.

 

Have disabled everything in BIOS that's not needed. I still have to manually start the array after a clean power down. Slightly annoying.

 

 



 

The BR10i will work in an "electrical" PCIe x4 connector.


If this were my array I would replace disk2. It has a ton of hours and a high reallocated sector count, and it is only 500GB. It is time for it to retire. Disk4 is a judgment call, but I would watch it like a hawk.

I have two 500GB drives that are both over 30000 hours, and a 750GB drive that is over 20000. Those are the three original drives I started with, and none of them has EVER given me a problem (knocks on wood).

 

I have a couple drives standing by, but if it ain't broke don't fix it.


