Preclear.sh results - Questions about your results? Post them here.

December 27, 2014

Hello! I seem to be having a problem clearing new disks. I have added 2 new disks and ran the preclear script on both. When I query the disks with the -l option to see if they are already cleared, it says they are. When I stop the array and try to add the disks, it wants to clear them again.

So I bit the bullet one day and ran the clear from the webgui. Took over 24 hours and then it said it still was not cleared. It stopped the array after the clear was done and then asked for it to be cleared again.

I can add the disk as a cache drive, but not as a normal drive. I am running 5.0.6 with preclear script 1.15.

Any suggestions would be greatly appreciated! unRAID has a great community, and I have never had any issue besides this.

Can you attach a zip file of all the preclear reports? I'm afraid I won't have time until evening, but someone else may.

Hi Rob! Thanks for getting back to me. Attached are the pre_clear reports. Some are older. The drive I was trying to clear is /dev/sde

Thanks again!

preclear_reports.zip

December 27, 2014

No problems at all in any of the preclear reports. The drive is perfect, and correctly precleared. I'm out of ideas. Do you happen to have the syslog for the session where it was rejected? Perhaps a clue will be there. If there's no syslog from then, would it be possible to try adding it again, grab the syslog afterwards, and attach it here?

December 29, 2014

I will try that, Rob, though it won't be until the new year as I'm away for the holidays.

I'll post back the results when I have them.

Thanks!

December 30, 2014

G'Day All!

I'm a total noob, so I'm taking baby steps and learning/reading as I go!

So now that I've finally got my system up and running, I've installed the unRAID 6 beta via a vmdk on ESXi and kicked off the preclear on 5 drives simultaneously.

I used the command preclear_disk.sh -r 65536 -w 65536 -b 2000 -A /dev/sdX, where X = a-e, for each of my 5 drives (3x3TB and 2x4TB WD Reds). What I didn't do is set additional passes, so it is running only 1 pass. Now that I am 3 hours in with another 4 or so hours remaining, should I run another 2 passes after it has finished, or does that defeat the purpose of this soak-in test?

These are not new drives, so I am not sure what results to expect. The plan is to use the 3x3TB for storage, one 4TB as parity, and the other 4TB as a parity/backup, then grow the array later as I buy more HDDs (I've got room for up to 13).

I will post the results of the single pass once complete, so fingers crossed!

Cheers

Kosti

December 30, 2014

Just a thought - those times sound remarkably fast for drives of the size you mention. Do you realize that there are 3 phases per pass, and that the first one is only about 25-30% of the elapsed time for all three phases?

As to how many passes: as these are previously used drives and you do not suspect any problems, one pass should be fine, since you are not running the initial 'burn-in' test to detect early-life failures. However, look carefully at the final results to check that no errors are indicated and that there are no pending sectors (or a large number of reallocated sectors).

December 30, 2014

It's going to take around 46-50 hours for a single pass on a 4TB drive. 5 hours of 9 is not the entire preclear, just the first stage. A pass is a pre-read, then a write of zeros, then a post-read (which takes twice as long as the pre-read, just FYI). Some people do multiple passes, and some have reported drives that pass the first and fail the second or third, but I think it's a little overkill, especially with the large drive sizes we have and how long it takes. Running 3 passes on a 6TB drive would take over a week (1 is ~61 hours IIRC) with constant random seeking, and that's a lot of (perhaps unnecessary) wear. Someone here made an analogy once, something along the lines of it being akin to driving a car cross country just to see if it will make the 20-minute commute to work. For something smaller than 4TB I would do multiple passes; above that I do one. Up to you really.
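
As a rough back-of-the-envelope check on those figures, here is a minimal Python sketch. It assumes one pass is a pre-read, a zero-write, and a post-read that takes twice as long as the pre-read (as described above), and that the drive averages a hypothetical 100 MB/s over its whole surface. The function name and the throughput figure are illustrative assumptions, not anything taken from preclear_disk.sh.

```python
def estimate_pass_hours(size_bytes, avg_mb_per_s=100.0):
    """Rough time for one preclear pass, in hours.

    Assumes the pre-read and the zero-write each take one full sweep of
    the disk, and the post-read takes two sweeps (twice the pre-read).
    """
    sweep_seconds = size_bytes / (avg_mb_per_s * 1_000_000)
    sweeps = 1 + 1 + 2  # pre-read + write + post-read
    return sweeps * sweep_seconds / 3600


if __name__ == "__main__":
    for tb in (3, 4, 6):
        hours = estimate_pass_hours(tb * 10**12)
        print(f"{tb} TB: ~{hours:.0f} hours per pass")
```

Under these assumptions a 4TB drive comes out at roughly 44 hours per pass, which is in the same ballpark as the 46-50 hours quoted above; real drives slow down toward the inner tracks, so actual times will vary.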

Also, it's not a bad idea to do a long SMART test after the array is up and running, just to have a good baseline recorded. I also try to do long SMART tests once a month, just before the scheduled parity check.

Here is a script Joe provided to generate smart results automatically with dates in the filename.

December 30, 2014

Yep, thought it was a little fast too! Still going at 13+ hours. Looking at the drives, the first 3TB, which was the first one I kicked off, seems to be the slowest at about 1-2% behind. I noticed its speed moving between 112 and 140 MB/s, whereas the others are a little higher at 120-140 MB/s.

WOW, I never realised it would be that long. I checked power usage, which is around 300 watts. Today's temp will be around 25 deg C, but tomorrow will be in the 30s, so it will get pretty warm in my house with no aircon.

Well, I may just go and mow the lawn since it's a waiting game!

Thanks for the feedback and I will look at generating the SMART results and get them up for inspection

Cheers

Kosti

December 31, 2014

I don't have results to ask about yet, but what I do have is on cycle 2 of 3:

= Post-Read in progress: 69% complete.
(  696,254,464,000  of  1,000,204,886,016  bytes read ) 1.2 MB/s
Disk Temperature: 32C, Elapsed Time:  39:53:56

Note 40 hours to run less than 2 complete passes on a 1 TB drive!!! Note the speed of 1.2 MB/s!!! I noticed that on pass 1, as well, and it was down into the KB/s range.
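
To put that speed in perspective, here is a small Python sketch of the arithmetic, using the byte counts from the status output above. It assumes the speed stays constant and that "MB" means 10^6 bytes; the function name is just for illustration.

```python
def remaining_hours(total_bytes, done_bytes, mb_per_s):
    """Hours left in the current phase if the speed stays constant."""
    remaining = total_bytes - done_bytes
    return remaining / (mb_per_s * 1_000_000) / 3600


if __name__ == "__main__":
    total = 1_000_204_886_016  # bytes on the disk, from the status output
    done = 696_254_464_000     # bytes read so far
    print(f"{remaining_hours(total, done, 1.2):.1f} hours left at 1.2 MB/s")
    print(f"{remaining_hours(total, done, 100):.1f} hours left at 100 MB/s")
```

At 1.2 MB/s the remaining 31% of this post-read alone would need about 70 more hours, versus under an hour at a healthy 100 MB/s.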

Now, this is an older drive that spent a couple of years in a WinXP machine, then was migrated to my unRAID box as it was built, then started giving me lots and lots of errors. I've got all the data off of it and replaced the drive in the array a couple of days ago, then decided to re-run preclear to see if there were just some bad areas that could be mapped out or if it was really dead. Looks like it's really, really dead...

I looked to see if there were any intermediate logs (after each cycle?) generated by preclear, but it doesn't look like it.

Any thoughts?

EDIT: I'm getting a fair number of issues being reported in the syslog (attached)

EDIT 2: It actually seems to have hung. It's been about 2 hours and nothing has changed in my telnet session - byte count read and elapsed time haven't changed one bit.

EDIT 3: Nope, it's moving. It's at 73% complete now, and has sped up to a blazing 6.1 MB/s

syslog-2014.12.31.1116.txt

December 31, 2014

Attached is my syslog after preclearing again.

Your help is appreciated!

syslog.txt.zip

December 31, 2014

Happy New Year All, wow, 2015 already!! Well, it is in Sydney, AUS.

The 3 x 3TB drives have completed and I have posted the before and after results. I see nothing that stands out to suggest these are not good for data.

Temps are a little high, so I will look at reversing the fans on the caddies so that they blow air from back to front instead of sucking air in from front to back, and see if that reduces them.

Please let me know how these 3 drives look. I should have the 4TB results in a few more hours.

Now I am not sure I have attached the correct syslog

EDIT - found the correct syslog via unmenu

Cheers

Kosti

SmartLogs.zip

syslog-2015-01-01.zip

January 1, 2015

Hey All

The 4TB drives finally finished

Please see attached files and let me know if I'm good to go

Side note - being a noob, I have been reading the original preclear sticky to understand the commands I needed to extract the files to post, and found one listing in there that didn't work for me until I changed it. A very simple typo.

This thread http://lime-technology.com/forum/index.php?topic=2817.0 states:

They are also in /var/log/smart_start_sdX and /var/log/smart_finish_sdX so you can see them with your browser at:
//tower/log/smart_start_sdX and //tower/smart_finish_sdX

I assume the log directory was missing from the path for the finish file - no big deal, but it took me a few minutes to figure out why I couldn't get the files, LOL

Again, thanks for the support team, here are the logs for the 2 x 4TB HDD attached.

Now on a side note, I was reading up on cache drives. I found a spare WD Black 500GB drive; should I use this, or should I use a spare 600GB SSD? I will need to run preclear on the 500GB since it is an older drive, so I will power down the VM and insert it into the caddy. I am not sure if the caddies are hot-swap capable, so I don't want to risk it.

Also, do I connect this 500GB HDD to the onboard SATA, or put it in the caddy connected to my HBA card, to get the best performance for the cache drive?

Cheers

Kosti

EDIT -

OK, I'm curious about the following statement:

Once a disk has been successfully pre-cleared, you can "quickly" add it to your array by following these steps:

Why do I need to add it quickly? I ask because I am not ready to add it yet; I want to power down to install the 500GB WD Black drive and do another preclear.

Side note - I went back to the main screen in unmenu and I'm seeing double, and I'm not even drunk yet.

Why does this occur? See picture.

All of my drives are duplicated??

Thanks again

Kosti

smart4TBHDD.zip

syslog_mod.txt

January 1, 2015

If you have pre-cleared a disk, then when you add it to a parity-protected array you only have to take the array down for a minute or so while you stop the array, add the disk, and restart the array. Since a pre-cleared disk has been zeroised, unRAID does not need to take any action to keep parity valid. If you have not pre-cleared the disk, the array will be offline while unRAID zeroises the disk itself (needed to keep parity valid), which can take many hours (with the actual time depending on disk size).
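
To illustrate why an all-zero disk leaves parity valid, here is a minimal Python sketch. It models single parity as a byte-wise XOR across the data disks, which is a simplification for illustration and not unRAID's actual code; the disk contents and names are made up.

```python
from functools import reduce


def xor_parity(blocks):
    """Byte-wise XOR of equally sized blocks (simple single-parity model)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical contents of two existing data disks (4 bytes each).
DISK1 = bytes([0x12, 0x34, 0x56, 0x78])
DISK2 = bytes([0xAB, 0xCD, 0xEF, 0x01])
PARITY = xor_parity([DISK1, DISK2])

ZEROED = bytes(4)                        # a precleared, all-zero disk
LEFTOVER = bytes([0xFF, 0x00, 0x00, 0x00])  # a disk with old data on it

if __name__ == "__main__":
    # Adding the zeroed disk does not change the parity, so nothing to rebuild.
    print(xor_parity([DISK1, DISK2, ZEROED]) == PARITY)
    # Adding a disk with leftover data would make the existing parity stale.
    print(xor_parity([DISK1, DISK2, LEFTOVER]) == PARITY)
```

XOR with zero is a no-op, which is why a zeroed disk can join the array without the parity disk being touched.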

January 1, 2015

Thanks itimpi. I wasn't sure (in fact I'm still not sure) if it's OK to power down the server, add in another drive, preclear that, and then set up the array. I assume unRAID will see the zeros and do its thing a lot quicker, so it's safe to turn it off, move the drives as I please, and build the array once I'm happy with the setup of disks, right?

Also, forgive me for asking this in here, but on adding this additional drive to be used as cache: is it better to plug it into the MB SATA port or to put it on the same HBA controller for speed?

Would an SSD be more suitable for a cache drive, or will the constant writing kill it before its normal lifetime?

Many Thanks

Kosti

January 1, 2015

Happy New Year everyone. My monthly parity check ran this morning popping up 3191 read errors. Here is the short SMART:

Attached to port: sdf
ID#	ATTRIBUTE NAME	FLAG	VALUE	WORST	THRESH	TYPE	UPDATED	FAILED	RAW VALUE
1	Raw Read Error Rate	0x002f	200	200	051	Pre-fail	Always	Never	11112
3	Spin Up Time	0x0027	180	178	021	Pre-fail	Always	Never	5991
4	Start Stop Count	0x0032	099	099	000	Old age	Always	Never	1393
5	Reallocated Sector Ct	0x0033	200	200	140	Pre-fail	Always	Never	0
7	Seek Error Rate	0x002e	100	253	000	Old age	Always	Never	0
9	Power On Hours	0x0032	079	079	000	Old age	Always	Never	15505
10	Spin Retry Count	0x0032	100	100	000	Old age	Always	Never	0
11	Calibration Retry Count	0x0032	100	253	000	Old age	Always	Never	0
12	Power Cycle Count	0x0032	100	100	000	Old age	Always	Never	15
192	Power-Off Retract Count	0x0032	200	200	000	Old age	Always	Never	7
193	Load Cycle Count	0x0032	182	182	000	Old age	Always	Never	55338
194	Temperature Celsius	0x0022	128	108	000	Old age	Always	Never	22
196	Reallocated Event Count	0x0032	200	200	000	Old age	Always	Never	0
197	Current Pending Sector	0x0032	200	200	000	Old age	Always	Never	45
198	Offline Uncorrectable	0x0030	200	200	000	Old age	Offline	Never	64
199	UDMA CRC Error Count	0x0032	200	193	000	Old age	Always	Never	16
200	Multi Zone Error Rate	0x0008	192	192	000	Old age	Offline	Never	3505

Syslog looks like:

Jan  1 10:33:54 Tower kernel: ata5.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jan  1 10:33:54 Tower kernel: ata5.01: BMDMA stat 0x65
Jan  1 10:33:54 Tower kernel: ata5.01: failed command: READ DMA EXT
Jan  1 10:33:54 Tower kernel: ata5.01: cmd 25/00:00:f0:86:25/00:04:3c:01:00/f0 tag 0 dma 524288 in
Jan  1 10:33:54 Tower kernel:          res 51/40:cf:18:87:25/40:03:3c:01:00/f0 Emask 0x9 (media error)
Jan  1 10:33:54 Tower kernel: ata5.01: status: { DRDY ERR }
Jan  1 10:33:54 Tower kernel: ata5.01: error: { UNC }
Jan  1 10:33:54 Tower kernel: ata5.00: configured for UDMA/133
Jan  1 10:33:54 Tower kernel: ata5.01: configured for UDMA/33
Jan  1 10:33:54 Tower kernel: sd 5:0:1:0: [sdf] Unhandled sense code
Jan  1 10:33:54 Tower kernel: sd 5:0:1:0: [sdf]  
Jan  1 10:33:54 Tower kernel: Result: hostbyte=0x00 driverbyte=0x08
Jan  1 10:33:54 Tower kernel: sd 5:0:1:0: [sdf]  
Jan  1 10:33:54 Tower kernel: Sense Key : 0x3 [current] [descriptor]
Jan  1 10:33:54 Tower kernel: Descriptor sense data with sense descriptors (in hex):
Jan  1 10:33:54 Tower kernel:         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 01 
Jan  1 10:33:54 Tower kernel:         3c 25 87 18 
Jan  1 10:33:54 Tower kernel: sd 5:0:1:0: [sdf]  
Jan  1 10:33:54 Tower kernel: ASC=0x11 ASCQ=0x4
Jan  1 10:33:54 Tower kernel: sd 5:0:1:0: [sdf] CDB: 
Jan  1 10:33:54 Tower kernel: cdb[0]=0x88: 88 00 00 00 00 01 3c 25 86 f0 00 00 04 00 00 00
Jan  1 10:33:54 Tower kernel: end_request: I/O error, dev sdf, sector 5304059672
Jan  1 10:33:54 Tower kernel: ata5: EH complete
Jan  1 10:33:54 Tower kernel: md: disk1 read error, sector=5304059608
Jan  1 10:33:54 Tower kernel: md: disk1 read error, sector=5304059616
Jan  1 10:33:54 Tower kernel: md: disk1 read error, sector=5304059624
Jan  1 10:33:54 Tower kernel: md: disk1 read error, sector=5304059632
Jan  1 10:33:54 Tower kernel: md: disk1 read error, sector=5304059640
Jan  1 10:33:54 Tower kernel: md: disk1 read error, sector=5304059648
Jan  1 10:33:54 Tower kernel: md: disk1 read error, sector=5304059656
... repeated over and over
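
For anyone wondering how the hex dump relates to the sector number, here is a small Python sketch that decodes the descriptor-format sense data from the log above. It assumes the first descriptor is an information descriptor carrying the failing LBA as an 8-byte big-endian value; the function name is mine and the check is deliberately narrow.

```python
def lba_from_descriptor_sense(hex_bytes):
    """Extract the failing LBA from descriptor-format sense data.

    `hex_bytes` is the hex dump from the syslog as one string. This sketch
    only handles the layout seen above: an 8-byte header followed by an
    information descriptor (type 0x00, additional length 0x0a).
    """
    data = bytes.fromhex(hex_bytes)
    if data[0] & 0x7F != 0x72:
        raise ValueError("not current descriptor-format sense data")
    if data[8] != 0x00 or data[9] != 0x0A:
        raise ValueError("first descriptor is not an information descriptor")
    # Bytes 12..19 hold the 64-bit big-endian information field.
    return int.from_bytes(data[12:20], "big")


SENSE = "72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 01 3c 25 87 18"

if __name__ == "__main__":
    print(lba_from_descriptor_sense(SENSE))
```

The decoded value is 5304059672, the same sector reported in the "end_request: I/O error, dev sdf" line, so the drive itself is pointing at that location as unreadable (sense key 0x3, a medium error).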

The Current Pending and Offline Uncorrectable counts are concerning to me, but the CRC error count is high also. Could this all be due to a cabling issue (I haven't changed cables in months, and almost all of these readings are newly high)? Turns out the warranty expires in <1 month, so I'm inclined to RMA it ASAP.

Thanks for any help.

BTW, the parity check log found 0 errors. Does that make any sense?

Anyone ever thought of a plugin that parses SMART data to give more newbie users thoughts on data safety? Waiting for a SMART failure seems like a long time. Something had charting of the numbers (unMenu?) which could be very useful as well.
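
As a rough idea of what such a parser could look like, here is a minimal Python sketch that reads tab-separated attribute rows like the table above and flags a few attributes when their raw value is non-zero. The watch list and the wording of the hints are my own assumptions, not an official rule set, and the sample rows are copied from the report above.

```python
# Hypothetical watch list: attribute ID -> why a non-zero raw value matters.
WATCHED = {
    5: "reallocated sectors",
    197: "sectors pending reallocation",
    198: "offline uncorrectable sectors",
    199: "interface CRC errors (often cabling)",
}

SAMPLE = "\n".join([
    "ID#\tATTRIBUTE NAME\tFLAG\tVALUE\tWORST\tTHRESH\tTYPE\tUPDATED\tFAILED\tRAW VALUE",
    "5\tReallocated Sector Ct\t0x0033\t200\t200\t140\tPre-fail\tAlways\tNever\t0",
    "197\tCurrent Pending Sector\t0x0032\t200\t200\t000\tOld age\tAlways\tNever\t45",
    "198\tOffline Uncorrectable\t0x0030\t200\t200\t000\tOld age\tOffline\tNever\t64",
    "199\tUDMA CRC Error Count\t0x0032\t200\t193\t000\tOld age\tAlways\tNever\t16",
])


def parse_smart_rows(text):
    """Parse tab-separated SMART rows into {id: (name, raw_value)}."""
    rows = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 10 or not fields[0].strip().isdigit():
            continue  # skip the header and anything that is not an attribute row
        # Keep only the leading number of the raw value column.
        rows[int(fields[0])] = (fields[1].strip(), int(fields[9].split()[0]))
    return rows


def warnings(rows):
    """One message per watched attribute whose raw value is non-zero."""
    return [
        f"{attr_id} {rows[attr_id][0]}: raw {rows[attr_id][1]} ({why})"
        for attr_id, why in WATCHED.items()
        if attr_id in rows and rows[attr_id][1] > 0
    ]


if __name__ == "__main__":
    for message in warnings(parse_smart_rows(SAMPLE)):
        print(message)
```

On the sample rows this flags attributes 197, 198 and 199 and stays quiet about 5, whose raw value is zero.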

January 2, 2015

Kosti-

When you preclear the drive, you can add it to the array "quickly" because all you do is stop the array (without powering down), assign the freshly precleared drive to the array, then restart the array. It will take unRAID just a couple of minutes to do its housekeeping, and you're back in business.

If you don't preclear the drive, then you stop the array, assign the new drive to the array, restart the array, then wait hours and hours for unRAID to write zeros to every byte on the disk. During this time, your array is inaccessible, so you can't read from or write to it at all during that time.

The "pre" clear process allows you to write the zeros to the new disk(s) while the array is still online, instead of doing it while it's offline.

NOTE: both of these scenarios presume that you've powered down the machine and physically installed the drive.

January 2, 2015

It's not looking too good for the old drive - lots of pending re-allocations, and the numbers were growing after every cycle. Time to retire the old gal?

preclear_rpt_JP2921HQ03M4KA_2015-01-01.txt

January 2, 2015

Thanks for taking the time to provide further information.

As I am starting a new build, there is no array as yet.

Also, I am not sure why all my drives are showing up twice, as in the picture above??

Cheers

Kosti

January 2, 2015

Just to update: I re-ran the parity check. Another ~800 read errors but no sync errors. The new short SMART shows:

1	Raw Read Error Rate	0x002f	200	200	051	Pre-fail	Always	Never	11982
3	Spin Up Time	0x0027	180	178	021	Pre-fail	Always	Never	6000
4	Start Stop Count	0x0032	099	099	000	Old age	Always	Never	1394
5	Reallocated Sector Ct	0x0033	200	200	140	Pre-fail	Always	Never	0
7	Seek Error Rate	0x002e	100	253	000	Old age	Always	Never	0
9	Power On Hours	0x0032	079	079	000	Old age	Always	Never	15525
10	Spin Retry Count	0x0032	100	100	000	Old age	Always	Never	0
11	Calibration Retry Count	0x0032	100	253	000	Old age	Always	Never	0
12	Power Cycle Count	0x0032	100	100	000	Old age	Always	Never	15
192	Power-Off Retract Count	0x0032	200	200	000	Old age	Always	Never	7
193	Load Cycle Count	0x0032	182	182	000	Old age	Always	Never	55343
194	Temperature Celsius	0x0022	129	108	000	Old age	Always	Never	21
196	Reallocated Event Count	0x0032	200	200	000	Old age	Always	Never	0
197	Current Pending Sector	0x0032	200	200	000	Old age	Always	Never	37
198	Offline Uncorrectable	0x0030	200	200	000	Old age	Offline	Never	64
199	UDMA CRC Error Count	0x0032	200	193	000	Old age	Always	Never	16
200	Multi Zone Error Rate	0x0008	192	192	000	Old age	Offline	Never	3505

So Pending went down but Uncorrectable and CRC stayed constant. My thought is it is flaky but probably not enough for them to replace. Suggestions? This is my most used disk so would you replace it anyway?
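
Comparing two reports by eye is error-prone, so here is a small Python sketch that diffs the raw values of two SMART snapshots. The numbers are copied from the two reports in this thread; the function and variable names are only for illustration.

```python
def smart_deltas(before, after):
    """Return {attribute_id: (old_raw, new_raw)} for raw values that changed."""
    return {
        attr_id: (before[attr_id], after[attr_id])
        for attr_id in sorted(before.keys() & after.keys())
        if before[attr_id] != after[attr_id]
    }


# Raw values from the first and second short SMART reports.
FIRST = {1: 11112, 3: 5991, 4: 1393, 5: 0, 9: 15505, 193: 55338,
         194: 22, 197: 45, 198: 64, 199: 16, 200: 3505}
SECOND = {1: 11982, 3: 6000, 4: 1394, 5: 0, 9: 15525, 193: 55343,
          194: 21, 197: 37, 198: 64, 199: 16, 200: 3505}

if __name__ == "__main__":
    for attr_id, (old, new) in smart_deltas(FIRST, SECOND).items():
        print(f"attribute {attr_id}: {old} -> {new} ({new - old:+d})")
```

This shows pending sectors (197) dropping from 45 to 37 while reallocated (5), offline uncorrectable (198) and CRC errors (199) are unchanged, and the raw read error count (1) still climbing.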

Thanks

January 2, 2015

I recently had a very similar occurrence with a parity drive (ST4000D) throwing off a huge number of read errors, but no sync errors, during a parity check. I replaced the drive and ran multiple preclear cycles on it. Although pending sectors decreased with each preclear cycle, they never went to zero, and the reallocated sector count continued to increase. After the preclear cycles I ran a full badblocks cycle, expecting to find a lot of problems, but interestingly it found no errors at all. Like you, I'm not sure if the drive was really bad, but since it was still under warranty, I decided to RMA it.

January 2, 2015

Thanks Raz. Did you have any issues with the RMA? I'll probably check it out with the WD diag tool but my guess is that it will check out fine. Since I don't feel good about the drive, I'll probably attempt an RMA (only have ~2 weeks left under warranty). Might as well try to get something back that I feel a bit more comfortable with. Even with a slight drop in Pending sectors, the uncorrectable sectors are still concerning to me.

January 2, 2015

Finally completed the preclear of the spare 500GB WD Blue (not Black, as I thought) drive I found in the drawer. The plan is to use this as my cache, since I am not sure using the SSD as a cache is a good idea, as it would most likely shorten its life.

Anyway, here are the logs for the 500GB drive. Is it safe to use?

I guess the duplication of the HDD display in unmenu that I asked about is not seen by anyone else, so I should just ignore it, right?

Cheers

Kosti

syslog_mod_2015-01-03.txt

smart_start_sdg.txt

smart_finish_sdg.txt

January 3, 2015

So, I just finished preclearing 2 drives. In both cases, the result was "Disk /dev/sdk has NOT been successfully precleared". I don't fully understand why. I tried adding one of the precleared drives to my array and there was no issue. I've attached the preclear logs.

I'm using preclear_disk.sh v1.15 on unRAID v6b12.

If I'm able to add the drives to my array with no problem, should I be okay?

PL1311LAG14UWA.zip

WD-WMC5D0D0MDMK.zip

January 3, 2015

So, I just finished preclearing 2 drives. In both cases, the result was "Disk /dev/sdk has NOT been successfully precleared". I don't fully understand why. I tried adding one of the precleared drives to my array and there was no issue. I've attached the preclear logs.

I'm using preclear_disk.sh v1.15 on unRAID v6b12.

If I'm able to add the drives to my array with no problem, should I be okay?

Well... maybe... but maybe not.

The preclear report said:

== Disk /dev/sdi has NOT been successfully precleared

== Postread detected un-expected non-zero bytes on disk==

Basically, it wrote all zeros to the disk, but when it went to read them back to verify the write was successful, there were some bytes that were not zero.

That is a very bad thing, since you cannot rely on the disk to store your data accurately. Nor could you accurately rebuild any other disk if one were to fail.

I would try a parity check at this point. It might have an error or two as it expected all zeros...

Joe L.
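
To illustrate the idea of a post-read check described above, here is a rough Python sketch that scans a stream and reports the offset of the first byte that is not zero. It is only an illustration of the concept. It is not how preclear_disk.sh performs its post-read, and the function name and chunk size are my own assumptions.

```python
import io


def first_nonzero_offset(stream, chunk_size=1024 * 1024):
    """Return the offset of the first non-zero byte, or None if all bytes are zero."""
    offset = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            # End of stream reached without seeing a non-zero byte.
            return None
        # Strip leading zero bytes; anything left starts at the first non-zero byte.
        remainder = chunk.lstrip(b"\x00")
        if remainder:
            return offset + len(chunk) - len(remainder)
        offset += len(chunk)


if __name__ == "__main__":
    clean = io.BytesIO(bytes(4096))                      # all zeros
    dirty = io.BytesIO(bytes(3000) + b"\x01" + bytes(100))  # one bad byte
    print(first_nonzero_offset(clean))
    print(first_nonzero_offset(dirty))
```

The demo uses in-memory buffers so it is safe to run anywhere. Pointing something like this at a real device would need a file opened in binary read mode and appropriate permissions.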

January 3, 2015

Finally completed the preclear of the spare 500GB WD Blue (not Black, as I thought) drive I found in the drawer. The plan is to use this as my cache, since I am not sure using the SSD as a cache is a good idea, as it would most likely shorten its life.

Anyway, here are the logs for the 500GB drive. Is it safe to use?

I guess the duplication of the HDD display in unmenu that I asked about is not seen by anyone else, so I should just ignore it, right?

http://i161.photobucket.com/albums/t214/Kostiz/unmenumain_zps1ba49511.png

Cheers

Kosti

Yeah, just ignore it; it's normal. It's just the physical drive vs. the mounted volume or something.

January 3, 2015

Cheers Traxxus. I have been advised of the same, and to stick to one thread for the same question; I just got paranoid... I am currently running preclear on another 1TB HDD, which held a backup of a backup, so it's redundant for now. I will end up using this as my cache drive instead of the 500GB, as I feel that one will be too slow for caching.

Thanks again!

EDIT

My Samsung 1TB has finished the preclear. Please let me know if this is safe to use as a cache drive, as I noticed a few of these alarms in the syslog prior to the preclear:

Jan  4 04:16:45 Matrix kernel: end_request: critical medium error, dev sdg, sector 7100176
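
If you want to count how many of these alarms a syslog contains, a small Python sketch like the one below may help. The regular expression is written only against the single line quoted above, so it is an assumption that other kernel versions word the message the same way, and the function name is made up for this example.

```python
import re

# Pattern based on the one syslog line quoted above.
MEDIUM_ERROR = re.compile(
    r"critical medium error, dev (?P<dev>\w+), sector (?P<sector>\d+)"
)


def medium_errors(lines):
    """Return (device, sector) pairs for every matching syslog line."""
    hits = []
    for line in lines:
        match = MEDIUM_ERROR.search(line)
        if match:
            hits.append((match.group("dev"), int(match.group("sector"))))
    return hits


SAMPLE = [
    "Jan  4 04:16:45 Matrix kernel: end_request: critical medium error, dev sdg, sector 7100176",
]

if __name__ == "__main__":
    for dev, sector in medium_errors(SAMPLE):
        print(f"{dev}: medium error at sector {sector}")
```

To use it on a saved log, pass an open text file instead of `SAMPLE`, since iterating over a file yields its lines.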

One last thing: I also notice the speed drops considerably. Is this due to heat?

Samsung 1TB
Preclear Successful

... Total time 11:25:12

... Pre-Read time 3:14:40 (85 MB/s)

... Zeroing time 2:28:36 (112 MB/s)

... Post-Read time 5:40:48 (48 MB/s)

The 500GB was worse
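
As a sanity check on the figures above, the reported speeds can be reproduced from the elapsed times. This is a small Python sketch that assumes the disk holds a nominal 1,000,000,000,000 bytes and that MB means 10^6 bytes; both are my assumptions, not something stated in the preclear report.

```python
# Assumption: nominal 1 TB capacity; the real drive is slightly larger.
DISK_BYTES = 1_000_000_000_000


def seconds(hms):
    """Convert an 'H:MM:SS' string into a number of seconds."""
    hours, minutes, secs = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + secs


def mb_per_s(size_bytes, hms):
    """Average throughput in MB/s (1 MB = 10**6 bytes) over the elapsed time."""
    return size_bytes / seconds(hms) / 1e6


PHASES = {
    "Pre-Read": "3:14:40",
    "Zeroing": "2:28:36",
    "Post-Read": "5:40:48",
}

if __name__ == "__main__":
    for phase, elapsed in PHASES.items():
        print(f"{phase}: {mb_per_s(DISK_BYTES, elapsed):.1f} MB/s")
```

Under those assumptions the three phases work out to roughly 85, 112 and 48 MB/s, which matches the report, so the slow post-read figure is simply the result of that phase taking about 5 hours 41 minutes for the same amount of data.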

Cheers

Kosti

syslog-2015-01-04.txt

smart_start_1TB_sdg.txt

smart_finish_1TB_sdg.txt
