November 4, 2010 The same thing happened a few days ago. Out of nowhere I can't access the server: it disappeared from my Finder window, and I can't ping it or telnet to it (see my other post here). I had to reset the server again using the reset button on the chassis, and now I am facing another parity check, the second in 4 days. And just like last time, the parity sync shows an estimated time of 2600 minutes for 3.5 TB of data. Any idea what the problem is now? I don't want to have to reset the server and run a parity check every other day. Log after reset attached. Thanks in advance. Syslog.txt
November 4, 2010 Parity syncs are done on the bits on the disk, regardless of whether they represent files, file system structures, or empty space. A parity check always checks the entire disk. The disk could be empty, full, or even just added to the array and unformatted; they are all treated the same. Your syslog shows no errors, so it is difficult to know what is happening. You can start by performing the most basic of tests. First, perform a memory test (from the boot menu). Let it run overnight. There should be no errors. Make sure the memory voltage, clock speed, and timing are set correctly for your specific make and model of memory modules. Make sure the fans in the case are working. Many CPUs will shut themselves down if they overheat. Other than that, leave a tail -f /var/log/syslog running in either a telnet window or on the system console; if the server starts filling the syslog with errors, you can see them.
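Because a parity check reads the whole disk rather than only the stored files, the 2600-minute estimate can be turned into an implied read rate. The sketch below assumes the check spans the full capacity of the parity drive, using the 2,000,398,934,016 bytes that the SMART report later in this thread gives for it; the function name is made up for illustration.

```python
def implied_rate_mb_per_s(capacity_bytes, estimated_minutes):
    """Average read rate (decimal MB/s) needed to cover the whole disk in the estimated time."""
    seconds = estimated_minutes * 60
    return capacity_bytes / seconds / 1_000_000

# Assumption: the check covers the full parity drive, not just the 3.5 TB of stored data.
PARITY_CAPACITY_BYTES = 2_000_398_934_016   # "User Capacity" from the SMART report in this thread
ESTIMATED_MINUTES = 2600                    # estimate quoted in the first post

rate = implied_rate_mb_per_s(PARITY_CAPACITY_BYTES, ESTIMATED_MINUTES)
print(f"{rate:.1f} MB/s")
```

The result depends only on disk size and elapsed time, which is why an empty array and a full one would show the same estimate.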
November 5, 2010 Author Thanks, Joe. I'll try all this after the parity sync is done and I'll report back.
November 5, 2010 Author The sync ended with no errors reported, but the syslog still shows some errors that I don't understand. Can anyone take a look, please? Syslog attached. Thanks. Syslog.txt
November 5, 2010 Disk0 (your parity disk) is reporting that it has "media errors" (unreadable sectors). When this happens, unRAID reconstructs the unreadable sector from the other disks and writes it back to the disk that reported the error. If the SMART firmware on the disk is working correctly, it will reallocate the unreadable sector. If you get a SMART report on the parity drive, you'll probably see a number of reallocated sectors. If this number continues to increase every time you perform a parity check, it is time to RMA the drive. To get a SMART report on that drive, type: smartctl -d ata -a /dev/sdb Look for the lines describing reallocated sectors or sectors pending reallocation. Joe L.
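The reconstruction described above can be sketched with plain XOR parity. This is a toy illustration, assuming single-parity XOR across equally sized sectors; the sector contents and the helper name are invented, and it is not unRAID's actual code.

```python
def xor_blocks(blocks):
    """XOR equally sized blocks together, byte by byte."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        if len(block) != len(out):
            raise ValueError("blocks must be the same length")
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

# Hypothetical 4-byte "sectors" at the same offset on three data disks.
data_sectors = [b"\x12\x34\x56\x78", b"\xab\xcd\xef\x01", b"\x00\xff\x00\xff"]
good_parity = xor_blocks(data_sectors)

# Case from this thread: the parity sector is unreadable, so it is
# recomputed from the data disks and written back.
rebuilt_parity = xor_blocks(data_sectors)

# Other case: a data sector is unreadable, so it is rebuilt from the
# parity sector plus the surviving data sectors.
rebuilt_data = xor_blocks([good_parity, data_sectors[0], data_sectors[2]])
print(rebuilt_data == data_sectors[1])
```

Writing the rebuilt sector back is what gives the drive's firmware the chance to either reuse the original location or reallocate it.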
November 5, 2010 Author I got this after I ran the command:
=== START OF INFORMATION SECTION ===
Device Model:     WDC WD20EARS-00MVWB0
Serial Number:    WD-WCAZA0185651
Firmware Version: 50.0AB50
User Capacity:    2,000,398,934,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Fri Nov 5 09:55:36 2010 CLST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x84) Offline data collection activity was suspended by an interrupting command from host. Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: (36600) seconds.
Offline data collection capabilities: (0x7b)
    SMART execute Offline immediate.
    Auto Offline data collection on/off support.
    Suspend Offline collection upon new command.
    Offline surface scan supported.
    Self-test supported.
    Conveyance Self-test supported.
    Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
Short self-test routine recommended polling time: ( 2) minutes.
Extended self-test routine recommended polling time: ( 255) minutes.
Conveyance self-test routine recommended polling time: ( 5) minutes.
SCT capabilities: (0x3035) SCT Status supported. SCT Feature Control supported. SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   200   162   021    Pre-fail  Always       -       4958
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       109
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       471
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       37
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       22
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       2359
194 Temperature_Celsius     0x0022   126   118   000    Old_age   Always       -       24
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       6
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       3
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       8
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
No self-tests have been logged.
[To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
root@Tower:~#
Those 6 current pending sectors have been there since the last forced reboot a few days ago (linked in the first post). So now I should run a parity check and see if the pending/reallocated sectors increase?
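The handful of attributes worth watching can be picked out of a saved report with a short script. This is a minimal sketch, assuming the report text keeps smartctl's usual one-attribute-per-line layout; the regular expression, the function name, and the list of watched attributes are illustrative choices, and the sample rows are copied from the report above.

```python
import re

# One attribute row: ID, name, flag, value, worst, thresh, type, updated, when_failed, raw.
ATTR_LINE = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+(\S+)\s+(\S+)\s+(\S+)\s+(\d+)"
)

# Attributes related to unreadable or remapped sectors (an illustrative selection).
WATCHED = (
    "Reallocated_Sector_Ct",
    "Reallocated_Event_Count",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
)

def parse_attributes(report):
    """Map attribute name to its raw value for every attribute row in the report."""
    attrs = {}
    for line in report.splitlines():
        match = ATTR_LINE.match(line)
        if match:
            attrs[match.group(2)] = int(match.group(9))
    return attrs

sample = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0022   126   118   000    Old_age   Always       -       24
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       6
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       3
"""

counts = {name: raw for name, raw in parse_attributes(sample).items() if name in WATCHED}
print(counts)
```

Saving the output of the smartctl command to a file after each parity check gives this kind of script something to compare over time.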
November 5, 2010 So now I should run a parity check and see if the pending/reallocated sectors increase? Yes.
November 6, 2010 Author All the same: 0 sync errors. No reallocated sectors or additional pending sectors. In fact, I think the pending sectors went down from 8 or 6 to 5. Weird?
November 7, 2010 Not weird. Pending sectors are sectors awaiting replacement, so they were replaced; they would probably show up in the reallocated event count. I think. I'm still learning the ins and outs of this stuff myself.
November 7, 2010 Sometimes sectors can be rewritten to their original locations and not reallocated. You'll then see a reallocation event, but no reallocation.
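The advice in this thread comes down to comparing the sector counts from one parity check to the next. A small sketch of that comparison follows; the "before" values come from the SMART report above, the "after" values are taken from the poster's description (pending down to 5, nothing reallocated), and the function name is made up.

```python
def changed_counts(before, after):
    """Return {attribute: delta} for attributes whose raw value changed between two reports."""
    return {
        name: after[name] - before[name]
        for name in before
        if name in after and after[name] != before[name]
    }

# Raw values before the parity check (from the SMART report in this thread).
before = {"Reallocated_Sector_Ct": 0, "Current_Pending_Sector": 6}
# Raw values afterwards, as described by the poster (an approximation, not a saved report).
after = {"Reallocated_Sector_Ct": 0, "Current_Pending_Sector": 5}

deltas = changed_counts(before, after)
print(deltas)
```

A shrinking pending count with no growth in reallocated sectors matches the rewritten-in-place explanation, while a reallocated count that keeps climbing is the RMA signal mentioned earlier.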
This topic is now archived and is closed to further replies.