Experiencing read errors but all tests normal

June 24, 2013

I am still experiencing a lot of read errors on one of my drives (drive 3), but all tests and the parity check come back normal/successful.

Should I just replace the drive on general principle and try to submit it for warranty replacement? At this point it may still be easy, since I have access to it and can just copy the files to another drive.

syslog-2013-06-24.txt

June 24, 2013

Your "read" errors are "media errors":

Jun 24 08:07:36 Tower kernel: ata4.00: irq_stat 0x40000001

Jun 24 08:07:36 Tower kernel: ata4.00: failed command: READ DMA EXT

Jun 24 08:07:36 Tower kernel: ata4.00: cmd 25/00:50:17:07:5c/00:02:e8:00:00/e0 tag 0 dma 303104 in

Jun 24 08:07:36 Tower kernel: res 51/40:5f:f8:07:5c/00:01:e8:00:00/e0 Emask 0x9 (media error)

Jun 24 08:07:36 Tower kernel: ata4.00: status: { DRDY ERR }

Jun 24 08:07:36 Tower kernel: ata4.00: error: { UNC }

These are errors where the checksum at the end of a sector on a disk being read does not match the contents of the sector. In other words, the disk considers the sector unreadable and uncorrectable. It tries multiple times before deciding it cannot read the sector and have it match the checksum (UNC = uncorrectable).
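If you want to see how often this is happening without reading the whole syslog by eye, a rough Python sketch like the following can count the relevant lines. The function name and the two regular expressions are my own, written against the lines quoted above; other kernels may word these messages differently, so treat it as a starting point only.

```python
import re
from collections import Counter

# Patterns modelled on the syslog lines quoted in this thread (an assumption,
# not an official format definition).
FAILED_CMD = re.compile(r"(ata\d+\.\d+): failed command: (.+)$")
MEDIA_ERROR = re.compile(r"Emask 0x[0-9a-f]+ \(media error\)")

def summarize(lines):
    """Count failed commands per (port, command) and media-error result lines."""
    failed = Counter()
    media_errors = 0
    for line in lines:
        match = FAILED_CMD.search(line)
        if match:
            failed[(match.group(1), match.group(2).strip())] += 1
        elif MEDIA_ERROR.search(line):
            media_errors += 1
    return failed, media_errors

# Lines copied from the syslog excerpt above.
sample = [
    "Jun 24 08:07:36 Tower kernel: ata4.00: irq_stat 0x40000001",
    "Jun 24 08:07:36 Tower kernel: ata4.00: failed command: READ DMA EXT",
    "Jun 24 08:07:36 Tower kernel: res 51/40:5f:f8:07:5c/00:01:e8:00:00/e0 Emask 0x9 (media error)",
    "Jun 24 08:07:36 Tower kernel: ata4.00: error: { UNC }",
]
failed, media_errors = summarize(sample)
print(failed, media_errors)
```

To run it against a real log, pass the lines of the saved syslog file instead of `sample`.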

When "read" errors occur, unRAID re-constructs the correct contents of the unreadable sector by reading parity in combination with all the other data disks in your server. At the same time, it re-writes the same (previously unreadable) sector so that the SMART firmware on the disk may re-allocate it if needed (assign a spare sector from its pool of spare sectors).
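To illustrate the idea, here is a small Python sketch of the reconstruction, assuming a single parity disk computed as the byte-wise XOR of all data disks. The disk contents are made-up 4-byte "sectors" and `xor_blocks` is a name I picked for the example; it is not unRAID's actual code.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data disks, one (shortened) sector each.
disk1 = bytes([0x10, 0x20, 0x30, 0x40])
disk2 = bytes([0x0F, 0x0E, 0x0D, 0x0C])
disk3 = bytes([0xAA, 0xBB, 0xCC, 0xDD])

# Parity is the XOR of all data disks.
parity = xor_blocks([disk1, disk2, disk3])

# If the sector on disk3 is unreadable, XOR-ing parity with the
# remaining data disks yields the missing contents again.
rebuilt = xor_blocks([parity, disk1, disk2])
print(rebuilt == disk3)
```

This is also why every other disk has to be readable at that position: one missing block can be recovered, two cannot.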

Odds are high your disk has sectors that have been reallocated, and may have sectors pending re-allocation. The only way to know its health is to get a SMART report of the disk.

To do this, on the command line type:

smartctl -a /dev/sde

and post the output in this thread.

We are looking at the numbers in the "RAW" column for re-allocated sectors and sectors pending re-allocation.

Joe L.

June 24, 2013

Author

Is this what I needed?

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   164   148   021    Pre-fail  Always       -       8775
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       770
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   067   067   000    Old_age   Always       -       24120
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       224
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       137
193 Load_Cycle_Count        0x0032   180   180   000    Old_age   Always       -       61169
194 Temperature_Celsius     0x0022   117   109   000    Old_age   Always       -       35
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       13
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   177   000    Old_age   Offline      -       0

SMART Error Log Version: 1

No Errors Logged

SMART Self-test log structure revision number 1

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     23887         -
# 2  Short offline       Completed without error       00%     23887         -
# 3  Short offline       Completed: read failure       70%     23886         3905729434

SMART Selective self-test log data structure revision number 1

 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing

Selective self-test flags (0x0):

After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.

It is also hitting the IRQ16 shutdown bug again, which I have not had an issue with on this board/CPU before, and nothing has changed, so I am at a loss with that one.

Whether it is or not, I ordered a new drive to replace it or just add to the array...lol...can always use more space.

June 24, 2013

Well, replacement is certainly an option, but according to these lines in the report there are 13 sectors pending re-allocation:

5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0

197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 13
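If you want to check those two numbers from a script rather than by eye, here is a minimal Python sketch. It assumes the ten-column attribute layout seen in this report; `parse_attributes` and `needs_attention` are names I made up for the example, and the two sample rows are the ones quoted above.

```python
def parse_attributes(report):
    """Map SMART attribute ID -> (name, raw value) from smartctl-style text."""
    attrs = {}
    for line in report.splitlines():
        fields = line.split(None, 9)  # the RAW_VALUE column may contain spaces
        if len(fields) == 10 and fields[0].isdigit() and fields[2].startswith("0x"):
            attrs[int(fields[0])] = (fields[1], fields[9].strip())
    return attrs

def needs_attention(attrs, ids=(5, 197)):
    """Return the watched attributes whose raw count is above zero."""
    flagged = {}
    for attr_id in ids:
        if attr_id in attrs:
            name, raw = attrs[attr_id]
            count = int(raw.split()[0])
            if count > 0:
                flagged[name] = count
    return flagged

report = """\
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 13
"""
attrs = parse_attributes(report)
print(needs_attention(attrs))
```

Only attributes 5 and 197 are watched here because those are the two this thread is concerned with; other IDs can be passed in through `ids`.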

The way most users of unRAID would handle this is to have those sectors re-written, and re-allocated. This can be done by first making a copy of any critical data on that disk (just in case) and then:

1. Stop the array.

2. Make a copy of the "config" directory on the flash drive while the array is stopped. Save it someplace safe. (We should not need it, but just in case, we can easily revert to this configuration with it.)

3. Un-assign the disk with the read errors.

4. Start the array with the disk un-assigned. (This will allow unRAID to forget its model/serial number so it can be used as its own replacement.)

5. Stop the array once more.

6. Re-assign the disk. It will then be written as its own replacement (upon which it will be re-constructed and all the sectors pending re-allocation should be re-allocated). Basically, everything on the disk will be re-written in place. When it gets to the 13 sectors pending re-allocation, the disk will first try to re-write the existing sector and checksum. If that works, the sector will not be re-allocated, since it will then be readable and its associated checksum will match. If not successful, it will be re-allocated from the pool of spare sectors.

Note that the re-construction process will take about as long as the initial parity sync, and during that interval you'll not be protected by parity if another disk should fail.
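Step 2 is the only part of the list that is a plain file copy, so it can be scripted. The following Python sketch is one way to do it; `backup_config` is a hypothetical helper, and both paths in the usage note are assumptions (the flash drive is commonly mounted at /boot on unRAID, and the destination is whatever safe location you choose).

```python
import shutil
import time
from pathlib import Path

def backup_config(config_dir, backup_root):
    """Copy the config directory into a new timestamped folder and return its path."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"config-{stamp}"
    # copytree refuses to overwrite an existing target, so an older
    # backup cannot be clobbered by accident.
    shutil.copytree(Path(config_dir), target)
    return target
```

A call might look like `backup_config("/boot/config", "/mnt/disk1/backups")`, with both paths adjusted to your own setup.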

June 24, 2013

Author

Since I have the new disk coming in and would have to use it as the backup anyway, can I instead just preclear it, install it as a new disk, copy all the data to it, and then reuse the current disk as a new disk?

Questions:

Will I have to go through the preclear process again on the old disk?

Will that accomplish the same thing, plus give me the added space?

Wouldn't this be okay as well? Since I use a second (free) unRAID install as the OS to preclear, the only downtime should be the parity sync, right?

Or am I better off doing the re-write?

June 24, 2013

Yes, you will have to preclear the old disk if you remove it from the configuration and want to add it to a new slot.

Yes, preclearing will force the drive to read and write all involved sectors, thus allowing reallocation to work as needed.

Since you will be preclearing the new disk for testing purposes anyway, it makes perfect sense to add it to the array and copy the files from the drive having issues to the new drive. Theoretically, you will still be protected from a drive failure during the entire procedure up to that point, and would only lose protection when you remove the drive and recalculate parity. You will still be unprotected for the same length of time, but you would have two copies of the data in question during the at-risk period.

June 24, 2013

Author

Thanks loads for the help, you answered everything perfectly...now one more thing.

I seem to have lost all of my permissions (I decided to run the SMB script to see if that was the problem).

But it seems it made all of my shares read-only. What can I do to fix this?

Thanks, Ice

June 24, 2013

Attach a new syslog.

June 24, 2013

Author

Here is the new log:

I had to remove about 3k lines of the following error notice in order to get it uploaded:

Jun 24 14:06:45 Tower kernel: REISERFS error (device md3): vs-4080 _reiserfs_free_block: block 229612613: bit already cleared

syslog-2013-06-24.txt

June 24, 2013

That indicates the file system has probably been set to read-only to prevent further corruption.
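One way to confirm the read-only state is to look at the mount options the kernel reports. Here is a small Python sketch that checks text in the /proc/mounts format; the function names are mine, and the sample line is a hypothetical illustration of what such an entry could look like, not output taken from this server.

```python
def mount_options(mounts_text, device):
    """Return the option list for a device from /proc/mounts-style text, or None."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0] == device:
            return fields[3].split(",")
    return None

def is_read_only(mounts_text, device):
    """True if the device is listed and mounted with the 'ro' option."""
    options = mount_options(mounts_text, device)
    return options is not None and "ro" in options

# Hypothetical sample entry for illustration only.
sample = "/dev/md3 /mnt/disk3 reiserfs ro,noatime,nodiratime 0 0\n"
print(is_read_only(sample, "/dev/md3"))
```

On the server itself the text would come from reading /proc/mounts, for example `is_read_only(open("/proc/mounts").read(), "/dev/md3")`.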

You need to un-mount disk3 and then run

reiserfsck --check /dev/md3

to have it tell you what command needs to be run next to fix the corruption.

Details are in the wiki under "check file systems"

June 24, 2013

Author

Lol...I guess I'm just daft.

I couldn't get it to even find md3 once it was unmounted, but it ran okay while mounted; it just skipped the journal replay (probably what we need, right?).

Also, I am not finding anything that states 'check file systems' in the wiki link in your sig.

June 24, 2013

http://lime-technology.com/wiki/index.php/Check_Disk_Filesystems

June 24, 2013

Author

Thanks Joe L.

Okay I did what it said and this is what it put out (seems like it is still working at this point?)

Replaying journal: Done.
Reiserfs journal '/dev/md3' in blocks [18..8211]: 0 transactions replayed

Should I use the --fix-fixable switch now? Or where do I go from here?

June 24, 2013

Normally, it would tell you to run --fix-fixable if it was needed. If it is not yet done, let it finish.

(It probably would not hurt anything to run --fix-fixable, but it will tell you once the current check is complete.

Just don't go running anything further unless it tells you to.)

June 24, 2013

Author

Any idea how long this will take to finish? It's 1.5TB of data, but it hasn't done anything for almost 3 hours. I haven't a clue (snicker), so I am asking.

I am assuming that it is done, frozen, or screwed up. I don't have the time to babysit it and must use the server later tonight, so it will have to wait.

Okay, so I am back to copying the data to a 4TB external (but 1MB/s is just brutal). Then I guess I will try the first suggestion mentioned here, as reiserfsck does not seem to be doing anything at all. From what I was reading, shouldn't it take less than an hour to complete the test? It has taken 12 hours and not budged. So I will probably go the other way and hope that fixes the issue.

June 25, 2013

Are you running reiserfsck on the disk with 13 pending sectors? If so, the pending sectors must be cleared before reiserfsck will work.

June 25, 2013

Author

I believe so. Since nothing has worked so far, I am now asking if I am just better off saving the data (in process) and doing what was suggested first, the procedure of un-assigning the disk and re-assigning it so that it is rebuilt in place.

And would that fix this? I am almost to the point of removing (saving) all 8TB of data and restarting this thing from scratch...it would have been done by now...lol.

June 25, 2013

The procedure outlined should correct the pending sectors. Once the disk surface is corrected then you can start fixing its contents, i.e., the file system.

June 26, 2013

Author

Thank you, will try once again, once I have the data backed up

June 29, 2013

Author

Okay, it is doing the rebuild...but at 3.22MB/s. Is this right? 2TB in 10049 minutes???

It also seems to be staying as a read-only file system on 'disk3'.

Sorry, just while I was typing it went down to 2.58...I think at this point I will just back up the data on all drives and reset the entire array, maybe even change to a different server, as this has been problematic at best, almost from minute one.

I do know I am not waiting 170 hours (7 days) to use this when I can have the data reloaded onto another system in 1 day.

Guess I am stuck with this.....

Who knew that the WD20EURS is somehow smaller than the WD20EVDS when precleared.....oh well, I guess I'll just have to throw the money at it, build a new larger one, and try something different.

June 29, 2013

Okay, it is doing the rebuild...but at 3.22MB/s. Is this right? 2TB in 10049 minutes???

Attach a new syslog.

It also seems to be staying as a read-only file system on 'disk3'

Correct. The physical disk surface must be correct before the file system can be corrected. See my last post.

Sorry, just while I was typing it went down to 2.58...I think at this point I will just back up the data on all drives and reset the entire array, maybe even change to a different server, as this has been problematic at best, almost from minute one.

I do know I am not waiting 170 hours (7 days) to use this when I can have the data reloaded onto another system in 1 day.

Guess I am stuck with this.....

Cannot provide any insight without a new syslog. Attach a new syslog. Rebuilding from scratch will take exactly as long if the hardware problems are not corrected first.

Who knew that the WD20EURS is somehow smaller than the WD20EVDS when precleared.....oh well, I guess I'll just have to throw the money at it, build a new larger one, and try something different.

Those drives are the same size. All modern drives have standardized sizes. Any system will have problems due to the hardware errors that you're experiencing.

June 30, 2013

Author

Array Status: STARTED; 6 disks in array.
Rebuilding disk3
Total Size: 1,953,514,552 KB
Current: 526,064,632 (26.9%)
Speed: 3,691 KB/sec
Finish: 6426 minutes
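For reference, the finish estimate is just the remaining size divided by the current speed. The figures below are the ones from this status readout; the function name is my own and the calculation is only a sanity check, since the reported speed keeps changing.

```python
def rebuild_eta_minutes(total_kb, current_kb, speed_kb_per_sec):
    """Minutes left at a constant speed: remaining KB / (KB per second) / 60."""
    remaining_kb = total_kb - current_kb
    return remaining_kb / speed_kb_per_sec / 60

total_kb = 1_953_514_552   # Total Size
current_kb = 526_064_632   # Current position
speed = 3_691              # KB/sec

percent_done = 100 * current_kb / total_kb
eta_minutes = rebuild_eta_minutes(total_kb, current_kb, speed)

print(f"{percent_done:.1f}% done, about {eta_minutes:.0f} minutes "
      f"({eta_minutes / 60:.0f} hours) left")
```

This lands within a few tens of minutes of the 6426 minutes shown, which is roughly four and a half days at this speed.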

Syslog attached

Who knew that the WD20EURS is somehow smaller than the WD20EVDS when precleared.....oh well, I guess I'll just have to throw the money at it, build a new larger one, and try something different.

Those drives are the same size. All modern drives have standardized sizes. Any system will have problems due to the hardware errors that you're experiencing.

I also wasn't aware of any hardware errors, as I assumed they were just corrupt sectors, not damaged ones. If that is indeed the case, then I am going to have to go through this crap again when I replace the drive, which I will be doing instantly if there is damaged hardware.

syslog-2013-06-30.txt

July 1, 2013

Jun 29 14:49:49 Tower kernel: ata6.00: HPA detected: current 3907027055, native 3907029168

It looks like the MB is corrupting the disk. See here: http://lime-technology.com/forum/index.php?topic=10866.0
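To put a number on it, the two values in that log line give the size of the hidden area directly. This is a minimal Python sketch; the regular expression is written against the line quoted above, and the 512-byte sector size is an assumption on my part.

```python
import re

# Pattern modelled on the kernel message quoted above.
HPA = re.compile(r"HPA detected: current (\d+), native (\d+)")

def hpa_hidden_sectors(log_line):
    """Return native minus current sector count, or None if the line does not match."""
    match = HPA.search(log_line)
    if match is None:
        return None
    current, native = int(match.group(1)), int(match.group(2))
    return native - current

line = ("Jun 29 14:49:49 Tower kernel: ata6.00: HPA detected: "
        "current 3907027055, native 3907029168")
hidden = hpa_hidden_sectors(line)
hidden_bytes = hidden * 512  # assuming 512-byte logical sectors

print(hidden, "sectors hidden,", hidden_bytes, "bytes")
```

That works out to 2113 sectors, a little over 1 MB, which is enough to make the disk look slightly smaller than an otherwise identical one.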

Did the rebuild complete?

July 1, 2013

Author

I wonder why this hasn't happened before, since the board is newer (DoM 2011, BIOS 2012) and has been running this particular unRAID install for over a year now.

Or are you seeing the drive that I precleared on the older Gigabyte MB (I used another system for this particular preclear) and then tried to install?

Who knew that the WD20EURS is somehow smaller than the WD20EVDS when precleared

That would explain why that drive was reported as smaller by unRAID. I didn't use that drive, so there shouldn't be an issue; hopefully that is the error you are reading.

All of the currently used drives are listing 'LBA48 user addressable sectors: 3907029168' and even the parity drive 'LBA48 user addressable sectors: 5860533168'

Otherwise, wouldn't this have happened long ago?

And no, the rebuild is only at 31%.

July 1, 2013

Attach a new syslog.
