unRAID Server Release 5.0-beta13 Available

capler · October 29, 2011

I found a problem with my network (I just need to figure out how to correct it).

System information shows that I have 100Mb/s full duplex; it should be 1000Mb/s full duplex.

I was going to attach a syslog, but I am having trouble finding it. (The only one I can find has just 6 lines, and I know that can't be it.)

Not sure if this helps you, but I originally had the same issue after a reboot. However, after I completely shut down my server, switched it off at the mains and restarted it, the NIC set itself to gigabit. I don't know why, but it worked for me.

Auggie · October 29, 2011

Not sure if there was any effort to work this issue from my last report, but NFS network connectivity is still intermittent during any "real" access (e.g. continuous activity that lasts 30 seconds or more in duration).

I use YAMJ to index my unRAID media server, and it usually can NOT complete successfully as the NFS network connection will momentarily be broken between unRAID and OS X.

OS X works fine with other NFS servers on my network, so this is specific to unRAID (now b13).

As before, the gist of the error message is:

Oct 29 14:13:49 UnRAID rpc.statd[1194]: No canonical hostname found for 10.0.1.200
Oct 29 14:13:49 UnRAID rpc.statd[1194]: STAT_FAIL to UnRAID for SM_MON of 10.0.1.200
Oct 29 14:13:49 UnRAID kernel: lockd: cannot monitor The Matrix

This eventually leads to OS X displaying a dialog saying that the server has been disconnected and asking whether the mounted share from that server should be unmounted. If I don't do anything, the connection reestablishes itself after a minute or so and OS X removes the dialog.

Another issue that hasn't been resolved is copying a file to unRAID via NFS from OS X's Finder: OS X reports an error, but unRAID still creates an empty (0 k) file at the destination. I can, however, delete and copy files through other file management programs (such as YAMJ).

I can modify and delete existing files via OS X Finder, as well as create, modify and delete folders with no problems; it just seems that copying files via OS X's Finder is broken.

NFS has been many orders of magnitude faster than SMB for me (I'm interested in how fast SMB2 will be), which is a necessity: because of my large video collection, YAMJ creates tens of thousands of files, almost all at the same level in its "jukebox" folder. Navigating this folder over SMB can be excruciatingly slow while it populates. Hence my hope for a fix to this NFS issue in unRAID.

syslog-2011-10-29.txt.zip
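
For anyone chasing the same rpc.statd message: "No canonical hostname found for 10.0.1.200" generally means the reverse lookup of the client's IP address failed on the server. The sketch below is only an illustration of how that lookup could be checked from any machine that shares the server's DNS/hosts configuration. The helper name `canonical_hostname` is made up for this example, and it assumes Python is available, which a stock unRAID install may not have.

```python
import socket

def canonical_hostname(ip, resolver=socket.gethostbyaddr):
    """Return the hostname the resolver reports for ip, or None if the lookup fails.

    The resolver argument is a hypothetical hook so the check can be exercised
    without touching the network; by default it is the real reverse lookup.
    """
    try:
        name, _aliases, _addresses = resolver(ip)
        return name
    except OSError:  # socket.herror and socket.gaierror are subclasses of OSError
        return None

def describe(ip, resolver=socket.gethostbyaddr):
    """Build a one-line summary of the reverse lookup result for ip."""
    name = canonical_hostname(ip, resolver)
    if name is None:
        return f"{ip}: no reverse mapping found"
    return f"{ip}: resolves to {name}"
```

Calling `describe("10.0.1.200")` on the affected network would show whether the client address maps back to a name. If it does not, adding the client to the server's hosts file or to local DNS is the commonly suggested thing to try, though nothing in this thread confirms that it cures the disconnects.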

speeding_ant · October 29, 2011

For general use, all seems to be working fine on my server. Full gigabit, etc etc. No apparent change in speed.

Johnm · October 29, 2011

Just an update from my camp: after beating up my server, no errors at all after 18 hours.

That included a full drive rebuild, pushing about 500MB of data to it, and non-stop movie streaming from it.

I am going to expand the array tonight and add another drive or two.

Alex.vision · October 30, 2011

Has anyone experienced the BLK_EH_NOT_HANDLED error with the MV8SAS-LP cards? I have been running 5b12a for a couple of weeks with two MV8SAS-LP cards and ten WD EARS drives. As I understand it, I run the risk of experiencing a BLK_EH_NOT_HANDLED error at any time.

BRiT · October 30, 2011

Just an update from my camp: after beating up my server, no errors at all after 18 hours.

That included a full drive rebuild, pushing about 500MB of data to it, and non-stop movie streaming from it.

I am going to expand the array tonight and add another drive or two.

Have you tried letting the drives (some or all) spin down and then using the array so that it has to spin some of them up?

Johnm · October 30, 2011

Just an update from my camp: after beating up my server, no errors at all after 18 hours.

That included a full drive rebuild, pushing about 500MB of data to it, and non-stop movie streaming from it.

I am going to expand the array tonight and add another drive or two.

Have you tried letting the drives (some or all) spin down and then using the array so that it has to spin some of them up?

That was exactly what blew up!

Right after my post, I saw my parity drive was spun down.

I manually invoked the mover and my server blew up. Same error, but on the parity drive...

Redballed parity....

Parity is on an M1015 with the P10 BIOS, I believe.

The last crash was also a spun-down drive, I believe.

So the question now is: did we fix the SASLP drivers and kill the LSI?

So much for expanding my array. Now I have to recompute parity, just as I am about to leave on a project for a week... I might have to roll back to b12 for now.

Attached are my syslog and the SMART report for the redballed drive.

SMART:

=== START OF INFORMATION SECTION ===
Device Model:     Hitachi HDS5C3030ALA630
Serial Number:    MJ1321YNG0XRKA
Firmware Version: MEAOA580
User Capacity:    3,000,592,982,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Sat Oct 29 20:31:40 2011 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84)	Offline data collection activity
				was suspended by an interrupting command from host.
				Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
				without error or no self-test has ever 
				been run.
Total time to complete Offline 
data collection: 		 (38469) seconds.
Offline data collection
capabilities: 			 (0x5b) SMART execute Offline immediate.
				Auto Offline data collection on/off support.
				Suspend Offline collection upon new
				command.
				Offline surface scan supported.
				Self-test supported.
				No Conveyance Self-test supported.
				Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
				power-saving mode.
				Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
				General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 ( 255) minutes.
SCT capabilities: 	       (0x003d)	SCT Status supported.
				SCT Error Recovery Control supported.
				SCT Feature Control supported.
				SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
 1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
 2 Throughput_Performance  0x0005   131   131   054    Pre-fail  Offline      -       122
 3 Spin_Up_Time            0x0007   127   127   024    Pre-fail  Always       -       548 (Average 548)
 4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       336
 5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
 7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
 8 Seek_Time_Performance   0x0005   135   135   020    Pre-fail  Offline      -       31
 9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       2554
10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       78
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       337
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       337
194 Temperature_Celsius     0x0002   206   206   000    Old_age   Always       -       29 (Min/Max 21/39)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

General Purpose Logging (GPL) feature set supported
General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
GP/S  Log at address 0x00 has    1 sectors [Log Directory]
SMART Log at address 0x01 has    1 sectors [summary SMART error log]
GP    Log at address 0x03 has    1 sectors [Ext. Comprehensive SMART error log]
GP    Log at address 0x04 has    7 sectors [Device Statistics]
SMART Log at address 0x06 has    1 sectors [SMART self-test log]
GP    Log at address 0x07 has    1 sectors [Extended self-test log]
GP    Log at address 0x08 has    1 sectors [Reserved]
SMART Log at address 0x09 has    1 sectors [selective self-test log]
GP    Log at address 0x10 has    1 sectors [NCQ Command Error]
GP    Log at address 0x11 has    1 sectors [SATA Phy Event Counters]
GP    Log at address 0x20 has    1 sectors [streaming performance log]
GP    Log at address 0x21 has    1 sectors [Write stream error log]
GP    Log at address 0x22 has    1 sectors [Read stream error log]
GP/S  Log at address 0x80 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x81 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x82 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x83 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x84 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x85 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x86 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x87 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x88 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x89 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x8a has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x8b has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x8c has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x8d has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x8e has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x8f has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x90 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x91 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x92 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x93 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x94 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x95 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x96 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x97 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x98 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x99 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x9a has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x9b has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x9c has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x9d has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x9e has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x9f has   16 sectors [Host vendor specific log]
GP/S  Log at address 0xe0 has    1 sectors [SCT Command/Status]
GP/S  Log at address 0xe1 has    1 sectors [SCT Data Transfer]

SMART Extended Comprehensive Error Log Version: 1 (1 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
   1        0        0  Not_testing
   2        0        0  Not_testing
   3        0        0  Not_testing
   4        0        0  Not_testing
   5        0        0  Not_testing
Selective self-test flags (0x0):
 After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       256 (0x0100)
SCT Support Level:                   1
Device State:                        SMART Off-line Data Collection executing in background (4)
Current Temperature:                    28 Celsius
Power Cycle Min/Max Temperature:     28/37 Celsius
Lifetime    Min/Max Temperature:     21/39 Celsius
Under/Over Temperature Limit Count:   0/0
SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        1 minute
Min/Max recommended Temperature:      0/60 Celsius
Min/Max Temperature Limit:           -40/70 Celsius
Temperature History Size (Index):    128 (2)

Index    Estimated Time   Temperature Celsius
  3    2011-10-29 18:24    28  *********
...    ..( 22 skipped).    ..  *********
 26    2011-10-29 18:47    28  *********
 27    2011-10-29 18:48    29  **********
...    ..( 46 skipped).    ..  **********
 74    2011-10-29 19:35    29  **********
 75    2011-10-29 19:36    28  *********
...    ..( 25 skipped).    ..  *********
101    2011-10-29 20:02    28  *********
102    2011-10-29 20:03    29  **********
...    ..( 26 skipped).    ..  **********
  1    2011-10-29 20:30    29  **********
  2    2011-10-29 20:31    28  *********

SCT Error Recovery Control:
          Read: Disabled
         Write: Disabled

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0009  2            3  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2            3  Device-to-host register FISes sent due to a COMRESET
0x000b  2            0  CRC errors within host-to-device FIS
0x000d  2            0  Non-CRC errors within host-to-device FIS
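
As a quick way to read the attribute table in the SMART report above: an attribute counts as failing when its normalized VALUE has dropped to or below THRESH. The sketch below is a minimal illustration of that comparison, fed with three rows copied from the report. The function names are made up for this example, and the parsing assumes the column layout shown in this particular smartctl output.

```python
SAMPLE = """\
 1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
 3 Spin_Up_Time            0x0007   127   127   024    Pre-fail  Always       -       548 (Average 548)
 5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
"""

def parse_smart_attributes(text):
    """Parse smartctl attribute rows into dictionaries (layout assumed as in SAMPLE)."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(parts) < 10 or not parts[0].isdigit():
            continue
        rows.append({
            "id": int(parts[0]),
            "name": parts[1],
            "value": int(parts[3]),
            "worst": int(parts[4]),
            "thresh": int(parts[5]),
            "raw": " ".join(parts[9:]),
        })
    return rows

def failing_attributes(rows):
    """Names of attributes whose normalized value is at or below a non-zero threshold."""
    return [r["name"] for r in rows if r["thresh"] > 0 and r["value"] <= r["thresh"]]

rows = parse_smart_attributes(SAMPLE)
print(failing_attributes(rows))  # prints [] for the sample rows
```

None of the sampled attributes is anywhere near its threshold, which fits the PASSED self-assessment in the report and suggests the redball is not explained by the drive's own SMART data.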

Syslog

EDIT: And yes, this unRAID is running inside ESX.

If I have to, to help with testing, I can bounce the drives and flash to a spare Norco with a matching motherboard in just seconds.

EDIT2:

You can see it was all good, with tons of mover logs in there (no porn, lol).

Then at Oct 29 18:54:06 I logged in from a telnet session and invoked the mover manually, and all hell broke loose.

Oct 29 18:54:06 Goliath login[28735]: ROOT LOGIN on '/dev/pts/0' from '192.168.1.230'
Oct 29 18:54:09 Goliath kernel: sd 3:0:2:0: [sdd] Device not ready
Oct 29 18:54:09 Goliath kernel: sd 3:0:2:0: [sdd] Result: hostbyte=0x00 driverbyte=0x08
Oct 29 18:54:09 Goliath kernel: sd 3:0:2:0: [sdd] Sense Key : 0x2 [current]
Oct 29 18:54:09 Goliath kernel: sd 3:0:2:0: [sdd] ASC=0x4 ASCQ=0x2
Oct 29 18:54:09 Goliath kernel: sd 3:0:2:0: [sdd] CDB: cdb[0]=0x28: 28 00 00 00 a2 b0 00 00 08 00
Oct 29 18:54:09 Goliath kernel: end_request: I/O error, dev sdd, sector 41648
Oct 29 18:54:09 Goliath kernel: sd 3:0:2:0: [sdd] Device not ready
Oct 29 18:54:09 Goliath kernel: sd 3:0:2:0: [sdd] Result: hostbyte=0x00 driverbyte=0x08
Oct 29 18:54:09 Goliath kernel: sd 3:0:2:0: [sdd] Sense Key : 0x2 [current]
Oct 29 18:54:09 Goliath kernel: sd 3:0:2:0: [sdd] ASC=0x4 ASCQ=0x2
Oct 29 18:54:09 Goliath kernel: sd 3:0:2:0: [sdd] CDB: cdb[0]=0x28: 28 00 00 00 a2 b8 00 00 08 00

goes on until she redballs..

syslog-2011-10-29.zip
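
The kernel lines in the excerpt above can be decoded by hand. In the standard SCSI tables, opcode 0x28 is READ(10), sense key 0x2 is NOT READY, and ASC/ASCQ 0x04/0x02 is "logical unit not ready, initializing command required", which is what a spun-down drive would be expected to report until it is started again. The sketch below is only an illustration (the function name is made up): it pulls the starting block and length out of the logged CDB so they can be compared with the "I/O error ... sector" line.

```python
def decode_read10(cdb_hex):
    """Decode a 10-byte READ(10) CDB given as space-separated hex bytes.

    Returns (starting logical block address, number of blocks requested).
    """
    cdb = bytes.fromhex(cdb_hex)  # fromhex ignores the spaces between bytes
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: big-endian LBA
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length
    return lba, blocks

# Small lookup limited to the codes that appear in the log excerpt.
SENSE_KEYS = {0x2: "NOT READY"}
ASC_ASCQ = {(0x04, 0x02): "LOGICAL UNIT NOT READY, INITIALIZING COMMAND REQUIRED"}

print(decode_read10("28 00 00 00 a2 b0 00 00 08 00"))  # (41648, 8)
print(SENSE_KEYS[0x2], "/", ASC_ASCQ[(0x04, 0x02)])
```

The decoded block address 41648 matches the sector in the `end_request` line, and the second CDB in the log decodes to 41656, the next 8-block read. That reading is consistent with the theory that reads are being issued to a drive that is still spun down, although the log by itself does not prove where the fault lies.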

gfjardim · October 30, 2011

I got all kinds of creepy errors with my array; moving back to b12a solved those problems. I ran reiserfsck on all disks before the update, so this might be some 3.1.0 kernel weirdness.

Attached are the first 5k lines of my syslog; the original file has more than 2 million lines!

PS: Johnm, are you using the open-vm-tools package in your VM?

syslog-.zip

MikeL · October 30, 2011

Here are the last few lines of my syslog.

Oct 29 21:50:55 mediaserver last message repeated 344 times
Oct 29 21:52:01 mediaserver last message repeated 344 times
Oct 29 21:53:06 mediaserver last message repeated 345 times
Oct 29 21:54:12 mediaserver last message repeated 344 times
Oct 29 21:55:17 mediaserver last message repeated 344 times
Oct 29 21:56:23 mediaserver last message repeated 344 times
Oct 29 21:57:28 mediaserver last message repeated 344 times
Oct 29 21:58:33 mediaserver last message repeated 345 times
Oct 29 21:59:36 mediaserver last message repeated 206 times
Oct 29 22:00:38 mediaserver last message repeated 241 times

Does anyone have any idea what this is telling me?

I guess the lines before these get deleted, because this is all there is. When I checked earlier, the times were different, but the rest of the info was pretty much the same.
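
On what those lines mean: syslogd collapses runs of identical consecutive messages into "last message repeated N times", so the real message is whatever line came just before the first of these. The sketch below is only an illustration (the function name and the year default are assumptions): it estimates how often that hidden message is firing from the timestamps and counts.

```python
import re
from datetime import datetime

PATTERN = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ last message repeated (\d+) times"
)

def repeat_rates(lines, year=2011):
    """Messages per second between consecutive 'last message repeated' lines.

    Syslog timestamps carry no year, so one is assumed via the year argument.
    """
    entries = []
    for line in lines:
        match = PATTERN.match(line.strip())
        if match:
            stamp = datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
            entries.append((stamp, int(match.group(2))))
    rates = []
    for (prev, _), (cur, count) in zip(entries, entries[1:]):
        seconds = (cur - prev).total_seconds()
        if seconds > 0:
            rates.append(count / seconds)
    return rates

sample = [
    "Oct 29 21:50:55 mediaserver last message repeated 344 times",
    "Oct 29 21:52:01 mediaserver last message repeated 344 times",
    "Oct 29 21:53:06 mediaserver last message repeated 345 times",
]
for rate in repeat_rates(sample):
    print(f"{rate:.1f} messages per second")
```

For the lines posted above this works out to roughly five identical messages every second, so something is logging the same error continuously. The full syslog is needed to see which message it is.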

Johnm · October 30, 2011

PS: Johnm, are you using the open-vm-tools package in your VM?

Yes, but I was still running the one for 12a. The plg for 13 was only just released; the next reboot will have the new version.

purko · October 30, 2011

The `/root/mkmbr` that's included in 5.0-beta13 doesn't work with 3TB drives, right?

Will it be updated for 3TB? Or maybe a similar tool that does the same job for 3TB drives?

Right, I forgot about the utility. It was created to help deal with "4K-aligned" hard drives during that code transition period. Why do you still want to use it?

The tool comes in handy when I preclear new disks. Of course, I can do it without mkmbr, but it's convenient to use it. It's a neat tool all by itself, and it could be even cooler if it could do the 3TB disks.

Not a big deal though, I was just wondering if you're planning to update it.

---

(Yes, I am aware of the preclear scripts posted elsewhere.)

PeterB · October 30, 2011

Not sure if there was any effort to work this issue from my last report, but NFS network connectivity is still intermittent during any "real" access (e.g. continuous activity that lasts 30 seconds or more in duration).

I believe that is the same problem I was experiencing with 12/12a. I guess that I will stick with 11 for now!

MikeL · October 30, 2011

OK, I went into PuTTYtel to check the true speed of my server's network.

This is what it shows. I believe that everything is fine, but either b13 is not using the full 1Gb/s speed of my network, or Simple Features is reporting incorrect settings. Either way, transfer speeds and parity checks are extremely slow in b13 compared to b12a before it went south.

Am I reading this information wrong, or is it a problem with beta 13?

The syslog is attached!

root@mediaserver:~# ethtool eth0
Settings for eth0:
    Supported ports: [ TP ]
    Supported link modes:   10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
                            1000baseT/Full
    Supports auto-negotiation: Yes
    Advertised link modes:  10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
                            1000baseT/Full
    Advertised pause frame use: No
    Advertised auto-negotiation: Yes
    Speed: 100Mb/s
    Duplex: Full
    Port: Twisted Pair
    PHYAD: 1
    Transceiver: internal
    Auto-negotiation: on
    MDI-X: off
    Supports Wake-on: pumbg
    Wake-on: g
    Current message level: 0x00000001 (1)
    Link detected: yes

syslog_10-3--2011.zip

bonienl · October 30, 2011

Settings for eth0:
    Supported ports: [ TP ]
    Supported link modes:   10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
                            1000baseT/Full
    Supports auto-negotiation: Yes
    Advertised link modes:  10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
                            1000baseT/Full
    Advertised pause frame use: No
    Advertised auto-negotiation: Yes
    Speed: 100Mb/s
    Duplex: Full
    Port: Twisted Pair
    PHYAD: 1
    Transceiver: internal
    Auto-negotiation: on
    MDI-X: off
    Supports Wake-on: pumbg
    Wake-on: g
    Current message level: 0x00000001 (1)
    Link detected: yes

Check your cabling; usually the 1Gb speed is not obtained because of bad cabling and/or connectors.

You may want to re-insert the RJ45 plug into your system; this forces the speed to be renegotiated. Otherwise try another UTP cable (use only Cat 5e or Cat 6 grade).

P.S. SF is not adjusting any of your link settings (it only reports them).
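
To make the point about the ethtool output concrete: the NIC lists 1000baseT/Full, yet the negotiated speed is 100Mb/s, which is the usual sign of a cable or connector problem. The sketch below is only an illustration (the function name is made up): it pulls both numbers out of saved ethtool text. The sample is trimmed from the output quoted in this thread.

```python
import re

SAMPLE = """\
Settings for eth0:
    Supported link modes:   10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
                            1000baseT/Full
    Advertised link modes:  10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
                            1000baseT/Full
    Speed: 100Mb/s
    Duplex: Full
    Link detected: yes
"""

def link_report(ethtool_text):
    """Return (negotiated speed, highest link mode listed), both in Mb/s.

    Either value is None if it cannot be found in the text. The highest mode
    is taken across the supported and advertised lists together.
    """
    speed = re.search(r"Speed:\s*(\d+)Mb/s", ethtool_text)
    modes = [int(m) for m in re.findall(r"(\d+)baseT/(?:Half|Full)", ethtool_text)]
    negotiated = int(speed.group(1)) if speed else None
    best = max(modes) if modes else None
    return negotiated, best

negotiated, best = link_report(SAMPLE)
print(negotiated, best)  # 100 1000
if negotiated is not None and best is not None and negotiated < best:
    print("Link negotiated below the NIC's best mode; check cable, connectors and switch port")
```

The same text could be captured on the server with `ethtool eth0 > link.txt` and read back from the file.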

nia · October 30, 2011

Did a very first boot with a couple of fresh 2TB disks (one was actually precleared) and one leftover 400 GB pulled at an earlier upgrade from my production system with some files left on it.

This is on the Pleiades build (see sig) with a M1015 direct passthru (like from the Atlas build thread) reflashed to phase 11 using Zerons batch http://lime-technology.com/forum/index.php?topic=12767.msg149129#msg149129

No addons installed.

It seems to boot all right, but then runs into a problem, and I don't know whether it is related to the old 400GB drive, the unformatted drive, or something completely different.

Maybe it is related to when I tried pressing the spindown button? It worked on the drives, but maybe it crashed something else.

Just wanted to put the syslog in here if that would be useful for someone to see.

I'll do a proper configuration of an array later. If there is anything I can try for someone, I'm open. As long as instructions are pretty fool-proof ::)

If there actually are problems with spindown related to the M1015, can it be reflashed to something else (e.g. 9240-8i) and be expected to work with this new kernel?

I will let the M1015 do a preclear on the 2TB drives, and we'll see where it goes from there.

Syslog-firstboot.zip

BRiT · October 30, 2011

If there actually are problems with spindown related to the M1015, can it be reflashed to something else (e.g. 9240-8i) and be expected to work with this new kernel?

From your syslog, it seems to be exactly the issue others are seeing with any sort of LSI adapter after the drive(s) are spun down and then need to be read from. This seems entirely related to kernel 3.1.0. If you put together your own unRAID with the exact same components from 5.0b13 but use kernel 3.0.x, you shouldn't get these spindown/spinup-related false drive failures.

Oct 30 16:03:22 Tower emhttp: Spinning down all drives...
Oct 30 16:03:22 Tower kernel: mdcmd (15): spindown 0
Oct 30 16:03:23 Tower kernel: mdcmd (16): spindown 1
Oct 30 16:03:24 Tower kernel: mdcmd (17): spindown 2
Oct 30 16:03:26 Tower kernel: ror
Oct 30 16:03:26 Tower kernel: handle_stripe read error: 28828432/2, count: 1
Oct 30 16:03:26 Tower kernel: md: disk1 read error
Oct 30 16:03:26 Tower kernel: handle_stripe read error: 28828440/1, count: 1
Oct 30 16:03:26 Tower kernel: md: disk2 read error
Oct 30 16:03:26 Tower kernel: handle_stripe read error: 28828440/2, count: 1
Oct 30 16:03:26 Tower kernel: md: disk1 read error
Oct 30 16:03:26 Tower kernel: handle_stripe read error: 28828448/1, count: 1
Oct 30 16:03:26 Tower kernel: md: disk2 read error
Oct 30 16:03:26 Tower kernel: handle_stripe read error: 28828448/2, count: 1
Oct 30 16:03:26 Tower kernel: md: disk1 read error
Oct 30 16:03:26 Tower kernel: handle_stripe read error: 28828456/1, count: 1
Oct 30 16:03:26 Tower kernel: md: disk2 read error
Oct 30 16:03:26 Tower kernel: handle_stripe read error: 28828456/2, count: 1
Oct 30 16:03:26 Tower kernel: md: disk1 read error
Oct 30 16:03:26 Tower kernel: handle_stripe read error: 28828464/1, count: 1
Oct 30 16:03:26 Tower kernel: md: disk2 read error
Oct 30 16:03:26 Tower kernel: handle_stripe read error: 28828464/2, count: 1
Oct 30 16:03:26 Tower kernel: md: disk1 read error
Oct 30 16:03:26 Tower kernel: handle_stripe read error: 28828472/1, count: 1
Oct 30 16:03:26 Tower kernel: md: disk2 read error
Oct 30 16:03:26 Tower kernel: handle_stripe read error: 28828472/2, count: 1
Oct 30 16:03:26 Tower kernel: md: disk1 read error
Oct 30 16:03:26 Tower kernel: handle_stripe read error: 28828480/1, count: 1
Oct 30 16:03:26 Tower kernel: md: disk2 read error
Oct 30 16:03:26 Tower kernel: handle_stripe read error: 28828480/2, count: 1
Oct 30 16:03:26 Tower kernel: md: disk1 read error
Oct 30 16:03:26 Tower kernel: handle_stripe read error: 28828488/1, count: 1
Oct 30 16:03:26 Tower kernel: md: disk2 read error
Oct 30 16:03:26 Tower kernel: handle_stripe read error: 28828488/2, count: 1
Oct 30 16:03:26 Tower kernel: md: disk1 read error
Oct 30 16:03:26 Tower kernel: handle_stripe read error: 28828496/1, count: 1
Oct 30 16:03:26 Tower kernel: md: disk2 read error
Oct 30 16:03:26 Tower kernel: handle_stripe read error: 28828496/2, count: 1
Oct 30 16:03:26 Tower kernel: md: disk1 read error
Oct 30 16:03:26 Tower kernel: handle_stripe read error: 28828504/1, count: 1
Oct 30 16:03:26 Tower kernel: md: disk2 read error
Oct 30 16:03:26 Tower kernel: handle_stripe read error: 28828504/2, count: 1
Oct 30 16:03:26 Tower kernel: md: disk1 read error
Oct 30 16:03:26 Tower kernel: handle_stripe read error: 28828512/1, count: 1
Oct 30 16:03:26 Tower kernel: md: disk2 read error
Oct 30 16:03:26 Tower kernel: handle_stripe read error: 28828512/2, count: 1
Oct 30 16:03:26 Tower kernel: md: disk1 read error
Oct 30 16:03:26 Tower kernel: handle_stripe read error: 28828520/1, count: 1
Oct 30 16:03:26 Tower kernel: md: disk2 read error
Oct 30 16:03:26 Tower kernel: handle_stripe read error: 28828520/2, count: 1
Oct 30 16:03:26 Tower kernel: md: disk1 read error
Oct 30 16:03:26 Tower kernel: handle_stripe read error: 28828528/1, count: 1
Oct 30 16:03:26 Tower kernel: md: disk2 read error
Oct 30 16:03:26 Tower kernel: handle_stripe read error: 28828528/2, count: 1
Oct 30 16:03:26 Tower kernel: md: disk1 read error
Oct 30 16:03:26 Tower kernel: handle_stripe read error: 28828536/1, count: 1
Oct 30 16:03:26 Tower kernel: md: disk2 read error
Oct 30 16:03:26 Tower kernel: handle_stripe read error: 28828536/2, count: 1
Oct 30 16:03:26 Tower kernel: md: disk1 read error
Oct 30 16:03:26 Tower kernel: handle_stripe read error: 28828544/1, count: 1
Oct 30 16:03:26 Tower kernel: md: disk2 read error
Oct 30 16:03:26 Tower kernel: handle_stripe read error: 28828544/2, count: 1
Oct 30 16:03:26 Tower kernel: md: disk1 read error
Oct 30 16:03:26 Tower kernel: handle_stripe read error: 28828552/1, count: 1
Oct 30 16:03:26 Tower kernel: md: disk2 read error
Oct 30 16:03:26 Tower kernel: handle_stripe read error: 28828552/2, count: 1
Oct 30 16:03:26 Tower kernel: md: disk1 read error
Oct 30 16:03:26 Tower kernel: handle_stripe read error: 28828560/1, count: 1
Oct 30 16:03:26 Tower kernel: md: disk2 read error
Oct 30 16:03:26 Tower kernel: handle_stripe read error: 28828560/2, count: 1
Oct 30 16:03:26 Tower kernel: md: disk1 read error
Oct 30 16:03:26 Tower kernel: handle_stripe read error: 28828568/1, count: 1

nia · October 30, 2011

Thanks, BRiT. I suspected that. It's really too bad to have an issue outside limetech's control :-\

I think I will try beta12a first rather than recompiling with a different kernel. The steps to build a new unRAID are not entirely clear to me (to say the least!), unless there is a step-by-step guide for this somewhere?

Johnm · October 30, 2011

OK, more data..

First off, it says "Parity is Valid:. Last parity check 15278 days ago with no sync errors."

Try 6 hours ago. The count has been off the whole time. I assume a reboot will fix that, as it has in the past.
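
A note on the odd "15278 days ago": that figure is almost exactly the number of days between the Unix epoch and the date of this post, which suggests (this is an inference, not something confirmed in the thread) that the stored last-check timestamp was zero or unset rather than merely wrong. A minimal sketch of the arithmetic, with made-up function names:

```python
from datetime import date, timedelta

def days_since_epoch(today):
    """Whole days between 1970-01-01 and the given date."""
    return (today - date(1970, 1, 1)).days

def date_for_days_ago(today, days_ago):
    """Calendar date that lies days_ago days before today."""
    return today - timedelta(days=days_ago)

post_date = date(2011, 10, 30)
print(days_since_epoch(post_date))          # 15277
print(date_for_days_ago(post_date, 15278))  # 1969-12-31
```

The one-day difference would be explained by the timezone: midnight UTC on 1970-01-01 still falls on 1969-12-31 in US Central time, which is the zone shown in the SMART report earlier in the thread. That the server used that zone for this calculation is an assumption.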

Spin-ups and spin-downs are not working correctly on the M1015.

I did not redball today; instead it looks like I went "read only" (reiserfs corruption).

I had to hard power off the server. It would not release disk 3; for some reason it spun down when I hit stop array, locking the array.

Action-packed syslog attached.

EDIT:

I rolled back to 12 since I have to leave in a few for a project out of town.

12 booted up, found the array and redballed 2 drives.

Off it goes.. I'll deal with it later.

syslog-2011-10-30.zip

JackBauer · October 30, 2011

I really wish someone (much more unraid knowledgeable than myself) would log a "bug track" list in this thread so we could all keep track of the issues that arise.

Johnm · October 30, 2011

I really wish someone (much more unraid knowledgeable than myself) would log a "bug track" list in this thread so we could all keep track of the issues that arise.

So far, most of the issues are related to LSI cards.

PeterB · October 30, 2011

So far, most of the issues are related to LSI cards.

... and NFS?

MikeL · October 30, 2011

I figured out my networking problem!

I had moved my server from the basement to the coffee table because I was getting tired of running up and down the stairs to make changes while I was trying to get things ironed out. Apparently the Cat 6 cable I was using upstairs was bad. As soon as I moved the server back down and restarted it on the Cat 6 cable down there, everything was back to 1000Mb/s.

peter_sm · October 30, 2011

I must say that Samba is working perfectly. Before, my TViX (Slim S1) couldn't handle movies with a bit rate greater than 40Mbit, but now it's perfect. Thanks for the Samba update, Tom!

Beta 13 is running flawlessly on my server.

tyrindor · October 30, 2011

I must say that Samba is working perfectly. Before, my TViX (Slim S1) couldn't handle movies with a bit rate greater than 40Mbit, but now it's perfect. Thanks for the Samba update, Tom!

Beta 13 is running flawlessly on my server.

You make me worry; it seems like I'm the only one experiencing slower speeds (25MB/s parity vs. 65-80MB/s parity on beta 12a).

I really hope beta14 doesn't result in the same thing for me...

gfjardim · October 30, 2011

From your syslog, it seems to be exactly the issue others are seeing with any sort of LSI adapter after the drive(s) are spun down and then need to be read from. This seems entirely related to kernel 3.1.0. If you put together your own unRAID with the exact same components from 5.0b13 but use kernel 3.0.x, you shouldn't get these spindown/spinup-related false drive failures.

I totally support your statement; the problem seems to be "mptsas" related.
