
Beta10 Parity Check Issues?



I've been running problem free on 5.0 beta 6a for 2 or 3 months and decided to upgrade to the latest beta (beta10). Upgrade went fine. Everything looked as it should.

 

August 1st rolled around and my automated parity check started as usual.

 

The server ran through most of the parity check, sending me the usual hourly update emails, but then stopped sending emails... Shares were still working and I could PuTTY into unRaid, but the webGUI no longer came up. I'm not sure whether the parity check completed, as I could not access the GUI.

 

I tried to manually unmount everything and reboot the server using instructions from Joe in the forum. Drives all unmounted, but system would refuse to reboot. Eventually I had to reset via power button...

 

Ran parity check a second time, with the same results. This time I checked the free memory and saw that there was only 50MB left out of 4 gigs. I killed all the processes I could, and that made the unMenu GUI appear, though the unRaid GUI still would not show (I would just get the UnMenu frame and UnMenu tools pages), and still did not want to gracefully reboot.

 

Currently running Parity check for a third time after another hard reboot. This time I killed everything (SabNZB, SickBeard, Transmission, Cache_Dirs) before running it and it seems to be doing better (8 more hours to go and 2.5GB remaining memory).
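One thing that may be worth ruling out when reading those memory numbers: on Linux the "free" figure excludes buffers and page cache, which the kernel can normally hand back, so a low free value does not prove by itself that the box ran out of memory. Below is a rough sketch of how the figures could be separated. The function names are made up for illustration, and the sample values are invented placeholders (only the roughly 50 MB free figure echoes the post above); on a live system the text would come from /proc/meminfo.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field name: value in kB}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only well-formed "Name:   12345 kB" lines.
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def free_and_cache_mb(info: dict) -> tuple:
    """Return (strictly free MB, free + buffers + page cache MB)."""
    free_kb = info.get("MemFree", 0)
    cache_kb = info.get("Buffers", 0) + info.get("Cached", 0)
    return free_kb // 1024, (free_kb + cache_kb) // 1024


# Illustrative placeholder values, NOT taken from the poster's server.
SAMPLE = """\
MemTotal:        4194304 kB
MemFree:           51200 kB
Buffers:          102400 kB
Cached:          2048000 kB
"""

free_mb, free_plus_cache_mb = free_and_cache_mb(parse_meminfo(SAMPLE))
print(f"free: {free_mb} MB, free + buffers/cache: {free_plus_cache_mb} MB")
```

If the second number stays large while the first is small, cached data rather than a leaking add-on is the more likely explanation, though that alone would not explain the GUI hang.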

 

2 other observations:

 

1. Parity check seems slower than usual: it is running at about 30-35 MB/s, when I'm fairly sure it ran significantly faster on 5.0-beta6a

 

2. Drives are running hotter during parity. I've never had a temp warning before in the new case (Norco 4224), and both initial parity checks had a couple of drives hitting 45-47 degrees... (This may be completely unrelated to the above issues and just be a middle-of-summer thing)

 

I bring this up because I've been seeing a few posts lately with server hanging/webGui not loading issues and thought it might be of some value if it turns out to be a bug in the new beta...

 

Any other ideas what could be the issue here? I've been running parity checks with all those addons enabled for months without issue. The only variable that has changed is that I upgraded to beta10...

 

I'm currently not home so can't access my server to post logs, but thought I'd post to get some input from the experts as to what might be occurring...

 

HW

Green Parity Drive

10 Green Data Drives

1 7200 RPM Data Drive

1 7200 RPM Cache Drive

750w Corsair PS

4GB Ram

Athlon II x4 600e CPU

Norco 4224

 

 

Thanks!

 

DB.

Link to comment

I initially thought my parity check speeds were slower, but after letting it run for a couple of minutes and then refreshing the UI, the speeds were back in the normal range. It took 26152 seconds to run a correcting parity check on an array with (6) 2TB drives, which works out to around 75-76 MB/sec.
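For anyone who wants to check that arithmetic, here is a minimal sketch. It assumes the check has to cover one full 2 TB drive (nominal 2 x 10^12 bytes, since every drive is read in parallel) and that MB means 10^6 bytes; the function name is only for illustration.

```python
def average_speed_mb_s(drive_bytes: int, seconds: float) -> float:
    """Average throughput in MB/s, with 1 MB taken as 10**6 bytes."""
    if seconds <= 0:
        raise ValueError("duration must be positive")
    return drive_bytes / seconds / 1_000_000


# Assumption: nominal 2 TB drive, 26152 s as quoted in the post above.
NOMINAL_2TB_BYTES = 2 * 10**12
speed = average_speed_mb_s(NOMINAL_2TB_BYTES, 26152)
print(f"{speed:.1f} MB/s")
```

With those assumptions the result lands in the mid-70s MB/s, in line with the figure quoted above. Counting in MiB (2^20 bytes) instead would give a somewhat lower number.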

Link to comment

So I got home from work today and checked free RAM. Still looks good! free -m reports 2671 MB free!!

 

Unfortunately, the webGui is once again unresponsive.... Shares work fine. Putty connects normally. UnMenu frame loads, but unRaid gui refuses to load... :(.

 

SysLog attached!

 

Thanks in advance for any help provided!!!

 

I'm confident that downgrading back to 6a will fix this issue, but I was kinda hoping to add a couple of 3TB drives to the array!

 

Cheers,

 

DB

syslog1of2.txt

Link to comment

There seems to be a problem with disk7 or disk8. Post SMART reports for both.

 

Here you go! Thanks dgaschk!

 

Disk7:

smartctl -a -d ata /dev/sdh

smartctl version 5.38 [i486-slackware-linux-gnu] Copyright © 2002-8 Bruce Allen

Home page is http://smartmontools.sourceforge.net/

 

=== START OF INFORMATION SECTION ===

Device Model:     WDC WD20EADS-32S2B0

Serial Number:    WD-WCAVY2103709

Firmware Version: 01.00A01

User Capacity:    2,000,398,934,016 bytes

Device is:        Not in smartctl database [for details use: -P showall]

ATA Version is:   8

ATA Standard is:  Exact ATA specification draft version not indicated

Local Time is:    Fri Aug  5 01:38:28 2011 EDT

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

 

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

 

General SMART Values:

Offline data collection status:  (0x85) Offline data collection activity

was aborted by an interrupting command from host.

Auto Offline Data Collection: Enabled.

Self-test execution status:      (   0) The previous self-test routine completed

without error or no self-test has ever

been run.

Total time to complete Offline

data collection: (40680) seconds.

Offline data collection

capabilities: (0x7b) SMART execute Offline immediate.

Auto Offline data collection on/off support.

Suspend Offline collection upon new

command.

Offline surface scan supported.

Self-test supported.

Conveyance Self-test supported.

Selective Self-test supported.

SMART capabilities:            (0x0003) Saves SMART data before entering

power-saving mode.

Supports SMART auto save timer.

Error logging capability:        (0x01) Error logging supported.

General Purpose Logging supported.

Short self-test routine

recommended polling time: (   2) minutes.

Extended self-test routine

recommended polling time: ( 255) minutes.

Conveyance self-test routine

recommended polling time: (   5) minutes.

SCT capabilities:       (0x303f) SCT Status supported.

SCT Feature Control supported.

SCT Data Table supported.

 

SMART Attributes Data Structure revision number: 16

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE

 1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0

 3 Spin_Up_Time            0x0027   160   150   021    Pre-fail  Always       -       8983

 4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1415

 5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0

 7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0

 9 Power_On_Hours          0x0032   085   085   000    Old_age   Always       -       11676

10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0

11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0

12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       124

192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       26

193 Load_Cycle_Count        0x0032   124   124   000    Old_age   Always       -       229747

194 Temperature_Celsius     0x0022   112   105   000    Old_age   Always       -       40

196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0

197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0

198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0

199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0

200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

 

SMART Error Log Version: 1

No Errors Logged

 

SMART Self-test log structure revision number 1

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error

# 1  Short offline       Completed without error       00%     11676         -

 

SMART Selective self-test log data structure revision number 1

SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS

   1        0        0  Not_testing

   2        0        0  Not_testing

   3        0        0  Not_testing

   4        0        0  Not_testing

   5        0        0  Not_testing

Selective self-test flags (0x0):

 After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.

 

 

 

 

 

 

Disk8:

smartctl -a -d ata /dev/sdc

smartctl version 5.38 [i486-slackware-linux-gnu] Copyright © 2002-8 Bruce Allen

Home page is http://smartmontools.sourceforge.net/

 

=== START OF INFORMATION SECTION ===

Device Model:     WDC WD20EADS-32S2B0

Serial Number:    WD-WCAVY2140153

Firmware Version: 01.00A01

User Capacity:    2,000,398,934,016 bytes

Device is:        Not in smartctl database [for details use: -P showall]

ATA Version is:   8

ATA Standard is:  Exact ATA specification draft version not indicated

Local Time is:    Fri Aug  5 01:41:01 2011 EDT

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

 

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

 

General SMART Values:

Offline data collection status:  (0x84) Offline data collection activity

was suspended by an interrupting command from host.

Auto Offline Data Collection: Enabled.

Self-test execution status:      (   0) The previous self-test routine completed

without error or no self-test has ever

been run.

Total time to complete Offline

data collection: (39300) seconds.

Offline data collection

capabilities: (0x7b) SMART execute Offline immediate.

Auto Offline data collection on/off support.

Suspend Offline collection upon new

command.

Offline surface scan supported.

Self-test supported.

Conveyance Self-test supported.

Selective Self-test supported.

SMART capabilities:            (0x0003) Saves SMART data before entering

power-saving mode.

Supports SMART auto save timer.

Error logging capability:        (0x01) Error logging supported.

General Purpose Logging supported.

Short self-test routine

recommended polling time: (   2) minutes.

Extended self-test routine

recommended polling time: ( 255) minutes.

Conveyance self-test routine

recommended polling time: (   5) minutes.

SCT capabilities:       (0x303f) SCT Status supported.

SCT Feature Control supported.

SCT Data Table supported.

 

SMART Attributes Data Structure revision number: 16

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE

 1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       2

 3 Spin_Up_Time            0x0027   187   147   021    Pre-fail  Always       -       7641

 4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1498

 5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0

 7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0

 9 Power_On_Hours          0x0032   084   084   000    Old_age   Always       -       11786

10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0

11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0

12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       125

192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       30

193 Load_Cycle_Count        0x0032   132   132   000    Old_age   Always       -       204407

194 Temperature_Celsius     0x0022   113   106   000    Old_age   Always       -       39

196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0

197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0

198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0

199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       1

200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

 

SMART Error Log Version: 1

No Errors Logged

 

SMART Self-test log structure revision number 1

No self-tests have been logged.  [To run self-tests, use: smartctl -t]

 

 

SMART Selective self-test log data structure revision number 1

SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS

   1        0        0  Not_testing

   2        0        0  Not_testing

   3        0        0  Not_testing

   4        0        0  Not_testing

   5        0        0  Not_testing

Selective self-test flags (0x0):

 After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.

 

 

 

 

Link to comment

It could be a power cable or SATA cable issue. Reseat the connectors to these disks and replace any power splitters.

 

That's what I thought as well, so when you mentioned a few posts ago that those drives were misbehaving, I reseated all the SATA and power cables on the drives... If that was the problem, that might explain the clean SMART data (I took it after I reseated everything)...
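For a quick scan of smartctl reports like the two above, a rough sketch along these lines could pick out the raw values people usually look at first. The list of watched attributes is my own assumption (UDMA_CRC_Error_Count in particular is commonly associated with cabling or connector trouble), the function name is made up, and the sample rows are copied from the Disk8 report in this thread.

```python
# Assumption: attributes commonly checked first for media or cabling trouble.
WATCHED = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
)


def nonzero_watched(report: str, watched=WATCHED) -> dict:
    """Return {attribute name: raw value} for watched attributes whose raw value is not 0."""
    found = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows have 10 columns:
        # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in watched:
            raw = fields[9]
            if raw.isdigit() and int(raw) != 0:
                found[fields[1]] = int(raw)
    return found


# Rows copied from the Disk8 (/dev/sdc) report posted above.
DISK8_ROWS = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       1
"""

print(nonzero_watched(DISK8_ROWS))  # {'UDMA_CRC_Error_Count': 1}
```

A single CRC error is a lifetime counter and may well predate the reseating, so it hints at a past cabling hiccup rather than proving a current fault.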

 

Also, just to see what would happen, I reverted to beta 6a and re-ran the parity check. Things appear to be back to normal. Speeds are good and it is finishing much faster than before, emails continue arriving every hour (before, they stopped coming halfway through) and the interface is coming up (so far).

 

If this finishes without issue, I'll switch back to .10 and see if it parity checks no problem as well. If it does, logic dictates that it must have been an unseated cable. If it gives me the same issues as before, it leads me to believe there is something not right in the new beta...

 

Thanks for all the help!

 

:)

 

[EDIT] There's another new post in the support forum about beta 10 and parity check issues in installations that were working before.... [/EDIT]

Link to comment

So after a few more days of testing I've got some more data:

 

1. I switched back to beta 6a, and the parity check runs at regular speed (between 60 and 90 MB/s) and completes without issue

 

2. After the successful parity check on 6a, I tried 10 again, just in case it was an unseated cable. No luck. Tried 3 more times and:

a) parity checks all ran significantly slower (25-35 MB/s)

b) GUI hung every time (one time the parity check almost finished before it happened), and a hard reset was eventually required. Also, after the parity check should have finished, the server was consuming significantly more power than at idle (though less than when a parity check is running)...

 

3. Switched to beta 9. The first thing I noticed is that parity check speeds are back to normal. I can't report yet whether it finishes properly, but my gut tells me it will finish without issue.
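To put those speeds in perspective, here is a back-of-the-envelope sketch of how long a full pass would take at each rate. It assumes a nominal 2 TB parity drive (the parity drive size is not actually stated in this thread; 2 TB matches the data drives in the SMART reports), a constant average speed, and 1 MB = 10^6 bytes. The function name is only for illustration.

```python
def parity_check_hours(drive_bytes: int, speed_mb_s: float) -> float:
    """Hours needed to read drive_bytes at a constant speed_mb_s (1 MB = 10**6 bytes)."""
    if speed_mb_s <= 0:
        raise ValueError("speed must be positive")
    return drive_bytes / (speed_mb_s * 1_000_000) / 3600


# Assumption: nominal 2 TB parity drive.
PARITY_BYTES = 2 * 10**12

# Rough midpoints of the ranges reported above: 25-35 MB/s and 60-90 MB/s.
for label, speed in (("beta10", 30), ("beta 6a", 75)):
    print(f"{label}: about {parity_check_hours(PARITY_BYTES, speed):.1f} h at {speed} MB/s")
```

Under those assumptions the slower rate stretches a check from roughly a working day's worth of hours to most of a full day, which leaves a lot more time for something to hang.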

 

I've attached the syslogs from the last time beta10 was running, as well as the current beta9

syslogs.zip

Link to comment

So just to conclude:

 

Hurrah! beta9 finished the parity check without any issues. Everything is running as it should! :)

 

Next step: Bringing my new 3TB drives into the array! :)

 

If I had to guess what the issue is with beta 10, I would say it must have something to do with the new kernel being employed. Tom mentions that there are some definite parity speed issues with the new kernel, so the odd results I am getting (including parity checks running 2 to 3 times slower than normal) could well be linked to this....

 

The fact that I am seeing an unusual number of posts with people having various issues with beta10 and parity checks leads me to believe that this is definitely not a unique case with my own setup, and might be something that needs to be examined more closely...

 

 

Link to comment

Archived

This topic is now archived and is closed to further replies.
