taros14 Posted December 27, 2014 taros14 wrote: Hello! I seem to be having a problem clearing new disks. I added 2 new disks and ran the preclear script on both. When I query the disks with the -l option to see if they are already cleared, it says they are cleared. When I stop the array and try to add the disks, it wants to clear them again. So I bit the bullet one day and ran the clear from the web GUI. It took over 24 hours and then said the disk still was not cleared: it stopped the array after the clear was done and then asked for the disk to be cleared again. I can add the disk as a cache drive, but not as a normal drive. I am running 5.0.6 with preclear script 1.15. Any suggestions would be greatly appreciated! unRAID has a great community, and I have never had any issue besides this. RobJ wrote: Can you attach a zip file of all the preclear reports? I'm afraid I won't have time until evening, but someone else may. Hi Rob! Thanks for getting back to me. Attached are the preclear reports. Some are older. The drive I was trying to clear is /dev/sde. Thanks again! Attached: preclear_reports.zip
RobJ Posted December 27, 2014 No problems at all in any of the preclear reports. The drive is perfect and correctly precleared. I'm out of ideas. Do you happen to have the syslog for the session where it rejected the drive? Perhaps a clue will be there. If there is no syslog from then, would it be possible to try adding the drive again, grab the syslog afterwards, and attach it here?
taros14 Posted December 29, 2014 I will try that, Rob, though it won't be until the new year as I'm away for the holidays. I'll post back the results when I have them. Thanks!
Kosti Posted December 30, 2014 G'Day All! I'm a total noob, so I'm taking baby steps and learning/reading as I go. Now that I've finally got my system up and running, I've installed the unRAID 6 beta via a vmdk on ESXi and kicked off the preclear on 5 drives simultaneously. I used the command preclear_disk.sh -r 65536 -w 65536 -b 2000 -A /dev/sdX, where X = a-e, for each of my 5 drives (3 x 3TB and 2 x 4TB WD Reds). What I didn't do is set additional passes, so it's only 1 pass. Now that I am 3 hours in with another 4 or so hours remaining, should I run another 2 passes after it has finished, or does that defeat the purpose of this soak-in test? These are not new drives, so I am not sure what results to expect. The plan is to use the 3 x 3TB as storage, one 4TB as parity and the other 4TB as a parity/backup, then later grow the array as I buy more HDDs (I've got room for up to 13). I will post the results of the single pass once complete, so fingers crossed! Cheers Kosti
itimpi Posted December 30, 2014 Just a thought: those times sound remarkably fast for drives of the size you mention. Do you realize that there are 3 phases per pass, and that the first one is only about 25-30% of the elapsed time for all three phases? As to how many passes: as these are previously used drives and you do not suspect any problems, one pass should be fine, since you are not running the initial 'burn-in' test to detect early-life failures. However, look carefully at the final results to check that no errors are indicated and that there are no pending sectors (or a large number of reallocated sectors).
Traxxus Posted December 30, 2014 It's going to take around 46-50 hours for a single pass on a 4TB drive. 5 hours of 9 is not the entire preclear, just the first stage. The stages are pre-read, then writing zeros, then post-read (which takes twice as long as the pre-read, just FYI). Some people do multiple passes, and some have reported drives that pass the first and fail the second or third, but I think it's a little overkill, especially with the large drive sizes we have and how long it takes. Running 3 passes on a 6TB drive would take over a week (1 pass is ~61 hours IIRC), and with constant random seeking that's a lot of (perhaps unnecessary) wear. Someone here made an analogy once, something along the lines of: it's akin to driving a car cross country just to see if it will make the 20 minute commute to work. For something smaller than 4 TB I would do multiple passes; above that I do one. Up to you really. Also, it's not a bad idea to do a long SMART test after the array is up and running, just to have a good baseline recorded. I also try to do long SMART tests once a month, just before the scheduled parity check. Here is a script Joe provided to generate SMART results automatically with dates in the filename.
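The timing figures above can be sanity-checked with a little arithmetic. The following is a minimal Python sketch, not anything taken from preclear_disk.sh itself: it assumes one pre-read and one zero-write sweep at the same average speed, a post-read that costs twice a pre-read (as described above), and an illustrative average of 100 MB/s, which is a guess and will vary by drive. The function name and parameters are hypothetical.

```python
def estimate_preclear_hours(size_bytes, avg_mb_per_s, post_read_factor=2.0, passes=1):
    """Rough estimate of preclear run time in hours.

    Assumption: the pre-read and the zero-write each sweep the whole disk once
    at avg_mb_per_s, and the post-read takes post_read_factor times as long
    as the pre-read.
    """
    if avg_mb_per_s <= 0:
        raise ValueError("avg_mb_per_s must be positive")
    # Time for one full sweep of the disk, in hours (1 MB = 1,000,000 bytes here).
    single_sweep_hours = size_bytes / (avg_mb_per_s * 1_000_000) / 3600
    # pre-read + write + post-read
    per_pass_hours = single_sweep_hours * (2 + post_read_factor)
    return per_pass_hours * passes


if __name__ == "__main__":
    for terabytes in (3, 4, 6):
        hours = estimate_preclear_hours(terabytes * 10**12, avg_mb_per_s=100)
        print(f"{terabytes} TB at an assumed 100 MB/s: roughly {hours:.0f} hours per pass")
```

Real drives slow down toward the inner tracks, so treat the result as a ballpark rather than a prediction.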
Kosti Posted December 30, 2014 (In reply to itimpi:) Yep, thought it was a little fast too! It's still going at 13+ hours. Looking at the drives, the first 3TB seems to be the slowest; it was the first one I kicked off and it's about 1-2% slower. I noticed its speed moving between 112 and 140 MB/s, whereas the others are a little faster at 120-140 MB/s. (In reply to Traxxus:) WOW, I never realised it would take that long. I checked power usage, which is around 300 watts. Today's temperature will be around 25 deg C, but tomorrow will be in the 30s, so it will get pretty warm in my house with no aircon. Well, I may just go and mow the lawn since it's a waiting game! Thanks for the feedback. I will look at generating the SMART results and get them up for inspection. Cheers Kosti
FreeMan Posted December 31, 2014 I don't have results to ask about yet, but here is what I have on cycle 2 of 3: Post-Read in progress: 69% complete (696,254,464,000 of 1,000,204,886,016 bytes read), 1.2 MB/s, Disk Temperature: 32C, Elapsed Time: 39:53:56. Note: 40 hours to run less than 2 complete passes on a 1 TB drive! Note the speed of 1.2 MB/s! I noticed that on pass 1 as well, when it was down into the KB/s range. Now, this is an older drive that spent a couple of years in a WinXP machine, was then migrated to my unRAID box when it was built, and then started giving me lots and lots of errors. I've got all the data off it and replaced the drive in the array a couple of days ago, then decided to re-run preclear to see if there were just some bad areas that could be mapped out or if it was really dead. Looks like it's really, really dead... I looked to see if preclear generates any intermediate logs (after each cycle?), but it doesn't look like it. Any thoughts? EDIT: I'm getting a fair number of issues reported in the syslog (attached). EDIT 2: It actually seems to have hung. It's been about 2 hours and nothing has changed in my telnet session; the byte count read and the elapsed time haven't changed one bit. EDIT 3: Nope, it's moving. It's at 73% complete now, and has sped up to a blazing 6.1 MB/s. Attached: syslog-2014.12.31.1116.txt
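To put the progress line above in perspective, here is a small Python sketch that turns the byte counts and current speed into a percentage and a remaining-time estimate. It assumes the displayed "MB/s" means 1,000,000 bytes per second and that the speed stays constant, neither of which the thread confirms; the function name is hypothetical.

```python
def progress_report(bytes_done, bytes_total, mb_per_s):
    """Return (percent_complete, hours_remaining) for a read phase.

    Assumption: mb_per_s is in units of 1,000,000 bytes per second and
    holds steady for the rest of the phase.
    """
    if mb_per_s <= 0:
        raise ValueError("mb_per_s must be positive")
    percent = 100 * bytes_done / bytes_total
    seconds_left = (bytes_total - bytes_done) / (mb_per_s * 1_000_000)
    return percent, seconds_left / 3600


if __name__ == "__main__":
    # Figures quoted in the post above.
    done, total = 696_254_464_000, 1_000_204_886_016
    for speed in (1.2, 6.1):
        percent, hours = progress_report(done, total, speed)
        print(f"{percent:.1f}% done, about {hours:.1f} hours left at {speed} MB/s")
```

A healthy drive of this size would normally read far faster than this, which is consistent with the poster's suspicion that the drive is failing.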
taros14 Posted December 31, 2014 Attached is my syslog after preclearing again. Your help is appreciated! Attached: syslog.txt.zip
Kosti Posted December 31, 2014 Happy New Year All, wow, 2015 already! Well, it is in Sydney, AUS. The 3 x 3TB drives have completed and I have posted the before and after results. I see nothing that stands out to suggest these are not good for data. Temps are a little high, so I will look at reversing the fans on the caddies to blow air from back to front instead of sucking air in from front to back, and see if that reduces them. Please let me know how these 3 drives look. I should have the 4TB results in a few more hours. Now I am not sure I have attached the correct syslog. EDIT: found the correct syslog via unMenu. Cheers Kosti Attached: SmartLogs.zip, syslog-2015-01-01.zip
Kosti Posted January 1, 2015 Hey All, the 4TB drives finally finished. Please see the attached files and let me know if I'm good to go.

Side note: being a noob, I have been reading the original preclear sticky to understand the commands I needed to extract the files to post, and found one listing in there that didn't work for me until I changed it; a very simple typo. This thread, http://lime-technology.com/forum/index.php?topic=2817.0, states: "They are also in /var/log/smart_start_sdX and /var/log/smart_finish_sdX so you can see them with your browser at: //tower/log/smart_start_sdX and //tower/smart_finish_sdX". I assume the log location is missing from the path for the finish file. No big deal, but it took me a few minutes to figure out why I couldn't get the files, LOL. Again, thanks to the support team; the logs for the 2 x 4TB HDDs are attached.

On another note, I was reading about cache drives. I found a spare WD Black 500GB drive; should I use this, or should I use a spare 600GB SSD? I will need to run preclear on the 500GB since it is an older drive, so I will power down the VM and insert it into the caddy, as I am not sure whether the caddies are hot-swap capable and I don't want to risk it. Also, do I connect this 500GB HDD to the onboard SATA, or put it in the caddy connected to my HBA card, to get the best performance for the cache drive? Cheers Kosti

EDIT: OK, I'm curious about the following statement: "Once a disk has been successfully pre-cleared, you can "quickly" add it to your array by following these steps:". Why do I need to add it quickly? I ask because I am not ready to add it; I want to power down to install the 500GB WD Black drive and do another preclear.

Side note: I went back to the main screen in unMenu and I'm seeing double, and I'm not even drunk yet. Why does this occur? See picture: all of my drives are duplicated. Thanks again Kosti Attached: smart4TBHDD.zip, syslog_mod.txt
itimpi Posted January 1, 2015 Kosti wrote: Once a disk has been successfully pre-cleared, you can "quickly" add it to your array by following these steps: Why do I need to add it quickly? If you have pre-cleared a disk, then when you add it to a parity-protected array you only have to take the array down for a minute or so while you stop the array, add the disk, and restart the array. Since a pre-cleared disk has already been zeroed, unRAID does not need to take any action to keep parity valid. If you have not pre-cleared the disk, the array will be offline while unRAID zeroes the disk itself (needed to keep parity valid), which can take many hours, with the actual time depending on disk size.
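The reason an all-zero disk leaves parity valid can be shown in a few lines. This is a toy Python sketch that assumes single parity computed as a byte-wise XOR across the data disks; the tiny byte strings stand in for whole disks and the function name is hypothetical.

```python
from functools import reduce
from operator import xor


def xor_parity(disks):
    """Byte-wise XOR across equally sized 'disks' (bytes objects)."""
    # zip(*disks) yields one tuple of byte values per position across all disks.
    return bytes(reduce(xor, column) for column in zip(*disks))


if __name__ == "__main__":
    disk1 = bytes([0x12, 0xFF, 0x00, 0xA5])
    disk2 = bytes([0x34, 0x0F, 0x80, 0x5A])
    parity_before = xor_parity([disk1, disk2])

    zeroed_disk = bytes(4)  # a freshly cleared disk: all zeros
    parity_after = xor_parity([disk1, disk2, zeroed_disk])

    # XOR with zero changes nothing, so the existing parity is still correct.
    print("parity unchanged:", parity_before == parity_after)
```

A disk holding any non-zero data would change the XOR result, which is why an uncleared disk has to be zeroed (or parity rebuilt) before it can join the array.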
Kosti Posted January 1, 2015 Thanks itimpi. I wasn't sure (in fact I'm still not sure) whether it's OK to power down the server, add another drive, preclear that, and then set up the array. I assume unRAID will read the zeros and do its thing a lot quicker, so it's safe to turn it off, move the drives as I please, and build the array once I'm happy with the disk setup, right? Also, forgive me for asking this in here, but for the additional drive to be used as cache, is it better to plug it into a motherboard SATA port or put it on the same HBA controller for performance? Would an SSD be more suitable as a cache drive, or will the constant writing kill it before its normal lifetime? Many Thanks Kosti
Loch Posted January 1, 2015 Happy New Year everyone. My monthly parity check ran this morning, popping up 3191 read errors. Here is the short SMART report:

Attached to port: sdf
ID#  ATTRIBUTE NAME           FLAG    VALUE WORST THRESH TYPE      UPDATED  FAILED  RAW VALUE
  1  Raw Read Error Rate      0x002f  200   200   051    Pre-fail  Always   Never   11112
  3  Spin Up Time             0x0027  180   178   021    Pre-fail  Always   Never   5991
  4  Start Stop Count         0x0032  099   099   000    Old age   Always   Never   1393
  5  Reallocated Sector Ct    0x0033  200   200   140    Pre-fail  Always   Never   0
  7  Seek Error Rate          0x002e  100   253   000    Old age   Always   Never   0
  9  Power On Hours           0x0032  079   079   000    Old age   Always   Never   15505
 10  Spin Retry Count         0x0032  100   100   000    Old age   Always   Never   0
 11  Calibration Retry Count  0x0032  100   253   000    Old age   Always   Never   0
 12  Power Cycle Count        0x0032  100   100   000    Old age   Always   Never   15
192  Power-Off Retract Count  0x0032  200   200   000    Old age   Always   Never   7
193  Load Cycle Count         0x0032  182   182   000    Old age   Always   Never   55338
194  Temperature Celsius      0x0022  128   108   000    Old age   Always   Never   22
196  Reallocated Event Count  0x0032  200   200   000    Old age   Always   Never   0
197  Current Pending Sector   0x0032  200   200   000    Old age   Always   Never   45
198  Offline Uncorrectable    0x0030  200   200   000    Old age   Offline  Never   64
199  UDMA CRC Error Count     0x0032  200   193   000    Old age   Always   Never   16
200  Multi Zone Error Rate    0x0008  192   192   000    Old age   Offline  Never   3505

Syslog looks like:

Jan 1 10:33:54 Tower kernel: ata5.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jan 1 10:33:54 Tower kernel: ata5.01: BMDMA stat 0x65
Jan 1 10:33:54 Tower kernel: ata5.01: failed command: READ DMA EXT
Jan 1 10:33:54 Tower kernel: ata5.01: cmd 25/00:00:f0:86:25/00:04:3c:01:00/f0 tag 0 dma 524288 in
Jan 1 10:33:54 Tower kernel: res 51/40:cf:18:87:25/40:03:3c:01:00/f0 Emask 0x9 (media error)
Jan 1 10:33:54 Tower kernel: ata5.01: status: { DRDY ERR }
Jan 1 10:33:54 Tower kernel: ata5.01: error: { UNC }
Jan 1 10:33:54 Tower kernel: ata5.00: configured for UDMA/133
Jan 1 10:33:54 Tower kernel: ata5.01: configured for UDMA/33
Jan 1 10:33:54 Tower kernel: sd 5:0:1:0: [sdf] Unhandled sense code
Jan 1 10:33:54 Tower kernel: sd 5:0:1:0: [sdf]
Jan 1 10:33:54 Tower kernel: Result: hostbyte=0x00 driverbyte=0x08
Jan 1 10:33:54 Tower kernel: sd 5:0:1:0: [sdf]
Jan 1 10:33:54 Tower kernel: Sense Key : 0x3 [current] [descriptor]
Jan 1 10:33:54 Tower kernel: Descriptor sense data with sense descriptors (in hex):
Jan 1 10:33:54 Tower kernel: 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 01
Jan 1 10:33:54 Tower kernel: 3c 25 87 18
Jan 1 10:33:54 Tower kernel: sd 5:0:1:0: [sdf]
Jan 1 10:33:54 Tower kernel: ASC=0x11 ASCQ=0x4
Jan 1 10:33:54 Tower kernel: sd 5:0:1:0: [sdf] CDB:
Jan 1 10:33:54 Tower kernel: cdb[0]=0x88: 88 00 00 00 00 01 3c 25 86 f0 00 00 04 00 00 00
Jan 1 10:33:54 Tower kernel: end_request: I/O error, dev sdf, sector 5304059672
Jan 1 10:33:54 Tower kernel: ata5: EH complete
Jan 1 10:33:54 Tower kernel: md: disk1 read error, sector=5304059608
Jan 1 10:33:54 Tower kernel: md: disk1 read error, sector=5304059616
Jan 1 10:33:54 Tower kernel: md: disk1 read error, sector=5304059624
Jan 1 10:33:54 Tower kernel: md: disk1 read error, sector=5304059632
Jan 1 10:33:54 Tower kernel: md: disk1 read error, sector=5304059640
Jan 1 10:33:54 Tower kernel: md: disk1 read error, sector=5304059648
Jan 1 10:33:54 Tower kernel: md: disk1 read error, sector=5304059656
... repeated over and over

The Current Pending and Offline Uncorrectable counts are concerning to me, but the CRC Error count is high also. Could this all be due to a cabling issue (I haven't changed cables in months, and almost all of these readings are newly high)? It turns out the warranty expires in less than a month, so I'm inclined to RMA it ASAP. Thanks for any help. BTW, the parity check log found 0 errors. Does that make any sense? Has anyone ever thought of a plugin that parses SMART data to give newer users some guidance on data safety? Waiting for a SMART failure seems like a long time. Something had charting of the numbers (unMenu?), which could be very useful as well.
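As a rough illustration of the kind of SMART parsing asked about above, here is a minimal Python sketch. It assumes attribute rows in the layout pasted in the post (ID, name, flag, value, worst, threshold, type, updated, failed, raw value) and simply flags a non-zero raw value on attributes 5, 197 and 198. The watch list, names and regular expression are illustrative assumptions, not an official health rule and not an existing unRAID plugin.

```python
import re

# One attribute row, e.g.:
# "197 Current Pending Sector 0x0032 200 200 000 Old age Always Never 45"
ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>.+?)\s+(?P<flag>0x[0-9a-fA-F]+)\s+"
    r"(?P<value>\d+)\s+(?P<worst>\d+)\s+(?P<thresh>\d+)\s+"
    r"(?P<type>Pre-fail|Old[ _]age)\s+(?P<updated>\S+)\s+(?P<failed>\S+)\s+"
    r"(?P<raw>\d+)\s*$"
)

# Hypothetical watch list: sector-health attributes where any non-zero raw
# count deserves a closer look.
WATCHED_IDS = {5, 197, 198}


def flag_attributes(report_text):
    """Return [(id, name, raw)] for watched attributes with raw value > 0."""
    flagged = []
    for line in report_text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # header lines and anything else that is not a row
        attr_id, raw = int(match["id"]), int(match["raw"])
        if attr_id in WATCHED_IDS and raw > 0:
            flagged.append((attr_id, match["name"], raw))
    return flagged


if __name__ == "__main__":
    sample = """\
5 Reallocated Sector Ct 0x0033 200 200 140 Pre-fail Always Never 0
197 Current Pending Sector 0x0032 200 200 000 Old age Always Never 45
198 Offline Uncorrectable 0x0030 200 200 000 Old age Offline Never 64
199 UDMA CRC Error Count 0x0032 200 193 000 Old age Always Never 16
"""
    for attr_id, name, raw in flag_attributes(sample):
        print(f"check attribute {attr_id} ({name}): raw value {raw}")
```

The CRC count (199) is left off the watch list here because it usually points at the link or cable rather than the platters; that split is a judgment call.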
FreeMan Posted January 2, 2015 Kosti, when you preclear the drive, you can add it to the array "quickly" because all you do is stop the array (without powering down), assign the freshly precleared drive to the array, then restart the array. It will take unRAID just a couple of minutes to do its housekeeping, and you're back in business. If you don't preclear the drive, then you stop the array, assign the new drive to the array, restart the array, then wait hours and hours for unRAID to write zeros to every byte on the disk. During this time your array is inaccessible, so you can't read from or write to it at all. The "pre" clear process allows you to write the zeros to the new disk(s) while the array is still online, instead of doing it while it's offline. NOTE: both of these scenarios presume that you've powered down the machine and physically installed the drive.
FreeMan Posted January 2, 2015 Following up on the older drive I was re-running preclear on: it's not looking too good. Lots of pending re-allocations, and the numbers were growing after every cycle. Time to retire the old gal? Attached: preclear_rpt_JP2921HQ03M4KA_2015-01-01.txt
Kosti Posted January 2, 2015 Thanks for taking the time to provide further information. As I am starting a new build, there is no array as yet. Also, I am not sure why all my drives are showing up twice, as in the picture above? Cheers Kosti
Loch Posted January 2, 2015 Just to update: I re-ran the parity check. Roughly another 800 read errors, but no sync errors. The new short SMART report shows:

ID#  ATTRIBUTE NAME           FLAG    VALUE WORST THRESH TYPE      UPDATED  FAILED  RAW VALUE
  1  Raw Read Error Rate      0x002f  200   200   051    Pre-fail  Always   Never   11982
  3  Spin Up Time             0x0027  180   178   021    Pre-fail  Always   Never   6000
  4  Start Stop Count         0x0032  099   099   000    Old age   Always   Never   1394
  5  Reallocated Sector Ct    0x0033  200   200   140    Pre-fail  Always   Never   0
  7  Seek Error Rate          0x002e  100   253   000    Old age   Always   Never   0
  9  Power On Hours           0x0032  079   079   000    Old age   Always   Never   15525
 10  Spin Retry Count         0x0032  100   100   000    Old age   Always   Never   0
 11  Calibration Retry Count  0x0032  100   253   000    Old age   Always   Never   0
 12  Power Cycle Count        0x0032  100   100   000    Old age   Always   Never   15
192  Power-Off Retract Count  0x0032  200   200   000    Old age   Always   Never   7
193  Load Cycle Count         0x0032  182   182   000    Old age   Always   Never   55343
194  Temperature Celsius      0x0022  129   108   000    Old age   Always   Never   21
196  Reallocated Event Count  0x0032  200   200   000    Old age   Always   Never   0
197  Current Pending Sector   0x0032  200   200   000    Old age   Always   Never   37
198  Offline Uncorrectable    0x0030  200   200   000    Old age   Offline  Never   64
199  UDMA CRC Error Count     0x0032  200   193   000    Old age   Always   Never   16
200  Multi Zone Error Rate    0x0008  192   192   000    Old age   Offline  Never   3505

So Pending went down (45 to 37), but Uncorrectable and CRC stayed constant. My thought is that the drive is flaky, but probably not enough for them to replace it. Suggestions? This is my most-used disk, so would you replace it anyway? Thanks
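Comparing two SMART snapshots by hand is error-prone, so here is a short Python sketch that reports which raw values moved between the two reports in this thread. The dictionaries hold a subset of the raw values quoted in the posts; the function name is hypothetical and the interpretation of each change is left to the reader.

```python
def smart_deltas(before, after):
    """Return {attribute_id: change} for raw values that differ between snapshots."""
    return {
        attr_id: after[attr_id] - before[attr_id]
        for attr_id in sorted(before)
        if attr_id in after and after[attr_id] != before[attr_id]
    }


if __name__ == "__main__":
    # Raw values taken from the two short SMART reports posted for sdf.
    first_report = {1: 11112, 5: 0, 9: 15505, 197: 45, 198: 64, 199: 16}
    second_report = {1: 11982, 5: 0, 9: 15525, 197: 37, 198: 64, 199: 16}

    names = {1: "Raw Read Error Rate", 9: "Power On Hours", 197: "Current Pending Sector"}
    for attr_id, change in smart_deltas(first_report, second_report).items():
        print(f"{attr_id:>3} {names.get(attr_id, 'attribute')}: {change:+d}")
```

Keeping dated snapshots and diffing them like this makes a trend visible well before the drive reports an outright SMART failure.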
razmajazz Posted January 2, 2015 I recently had a very similar occurrence with a parity drive (ST4000D) throwing off a huge number of read errors, but no sync errors, during a parity check. I replaced the drive and ran multiple preclear cycles on it. Although pending sectors decreased with each preclear cycle, they never went to zero, and the reallocated sector count continued to increase. After the preclear cycles I ran a full badblocks cycle, expecting to find a lot of problems, but interestingly it found no errors at all. Like you, I'm not sure if the drive was really bad, but since it was still under warranty, I decided to RMA it.
Loch Posted January 2, 2015

> Just to update. Re-ran parity check. Another ~800 read errors but no sync errors. New short SMART shows:
>
> ID   Attribute                Flag    Value  Worst  Thresh  Type      Updated  Failed  Raw
>   1  Raw Read Error Rate      0x002f  200    200    051     Pre-fail  Always   Never   11982
>   3  Spin Up Time             0x0027  180    178    021     Pre-fail  Always   Never   6000
>   4  Start Stop Count         0x0032  099    099    000     Old age   Always   Never   1394
>   5  Reallocated Sector Ct    0x0033  200    200    140     Pre-fail  Always   Never   0
>   7  Seek Error Rate          0x002e  100    253    000     Old age   Always   Never   0
>   9  Power On Hours           0x0032  079    079    000     Old age   Always   Never   15525
>  10  Spin Retry Count         0x0032  100    100    000     Old age   Always   Never   0
>  11  Calibration Retry Count  0x0032  100    253    000     Old age   Always   Never   0
>  12  Power Cycle Count        0x0032  100    100    000     Old age   Always   Never   15
> 192  Power-Off Retract Count  0x0032  200    200    000     Old age   Always   Never   7
> 193  Load Cycle Count         0x0032  182    182    000     Old age   Always   Never   55343
> 194  Temperature Celsius      0x0022  129    108    000     Old age   Always   Never   21
> 196  Reallocated Event Count  0x0032  200    200    000     Old age   Always   Never   0
> 197  Current Pending Sector   0x0032  200    200    000     Old age   Always   Never   37
> 198  Offline Uncorrectable    0x0030  200    200    000     Old age   Offline  Never   64
> 199  UDMA CRC Error Count     0x0032  200    193    000     Old age   Always   Never   16
> 200  Multi Zone Error Rate    0x0008  192    192    000     Old age   Offline  Never   3505
>
> So Pending went down but Uncorrectable and CRC stayed constant. My thought is it is flaky but probably not enough for them to replace. Suggestions? This is my most used disk so would you replace it anyway? Thanks

> I recently had a very similar occurrence with parity drive (ST4000D) throwing off a huge number of read errors, but no sync errors during parity check. I replaced the drive and ran multiple preclear cycles on it. Although pending sectors decreased with each preclear cycle, they never went to zero. Reallocated sector count continued to increase. After the preclear cycles I ran a full badblock cycle, expecting to find a lot of problems, but interestingly, it found no errors at all.
>
> Like you, I'm not sure if the drive was really bad, but since it was still under warranty, I decided to RMA it.

Thanks Raz. Did you have any issues with the RMA? I'll probably check it out with the WD diag tool but my guess is that it will check out fine. Since I don't feel good about the drive, I'll probably attempt an RMA (only have ~2 weeks left under warranty). Might as well try to get something back that I feel a bit more comfortable with. Even with a slight drop in Pending sectors, the uncorrectable sectors are still concerning to me.
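For anyone who wants to script the comparison discussed above, here is a minimal Python sketch that pulls the raw values out of SMART rows in the format quoted in this post and reports the watched attributes that are non-zero. The choice of IDs 5, 197 and 198 as the ones to watch is an assumption based on common practice, not an official list, and the names `parse_row`, `nonzero_watched`, `WATCHED` and `ROWS` are hypothetical helpers made up for this example.

```python
# Attribute IDs commonly watched for media problems.
# Assumption: this short list is illustrative, not an official threshold table.
WATCHED = {5, 197, 198}


def parse_row(line):
    """Split one SMART row into (id, name, raw value).

    The attribute name contains spaces, so it is taken as everything
    between the ID and the hex flag column (the token starting with "0x").
    """
    tokens = line.split()
    attr_id = int(tokens[0])
    raw_value = int(tokens[-1])
    flag_index = next(i for i, tok in enumerate(tokens) if tok.startswith("0x"))
    name = " ".join(tokens[1:flag_index])
    return attr_id, name, raw_value


def nonzero_watched(lines):
    """Return {id: (name, raw)} for watched attributes with a non-zero raw value."""
    found = {}
    for line in lines:
        attr_id, name, raw_value = parse_row(line)
        if attr_id in WATCHED and raw_value > 0:
            found[attr_id] = (name, raw_value)
    return found


# Rows copied from the SMART output quoted in the post.
ROWS = [
    "5 Reallocated Sector Ct 0x0033 200 200 140 Pre-fail Always Never 0",
    "197 Current Pending Sector 0x0032 200 200 000 Old age Always Never 37",
    "198 Offline Uncorrectable 0x0030 200 200 000 Old age Offline Never 64",
    "199 UDMA CRC Error Count 0x0032 200 193 000 Old age Always Never 16",
]

if __name__ == "__main__":
    for attr_id, (name, raw_value) in sorted(nonzero_watched(ROWS).items()):
        print(attr_id, name, raw_value)
```

Running it against reports taken before and after a parity check or preclear cycle would show whether the pending and uncorrectable counts are moving, which is the comparison being made by hand in the post.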
Kosti Posted January 2, 2015

Finally completed the preclear on the spare 500GB WD Blue (not Black, as I thought) drive I found in the drawer. The plan is to use this as my cache, since I am not sure using the SSD as a cache is a good idea, as it would most likely shorten its life.

Anyway, here are the logs for the 500G. Is it safe to use?

I guess the duplicated HDD display in unmenu that I asked about is not seen by anyone else, so I should just ignore it, right?

Cheers
Kosti

syslog_mod_2015-01-03.txt smart_start_sdg.txt smart_finish_sdg.txt
JarDo Posted January 3, 2015

So, I just finished preclearing 2 drives. In both cases, the result was something like "Disk /dev/sdk has NOT been successfully precleared". I don't fully understand why. I tried adding one of the precleared drives to my array and there was no issue.

I've attached the preclear logs. I'm using preclear_disk.sh v1.15 on unRAID v6b12. If I'm able to add the drives to my array with no problem, should I be okay?

PL1311LAG14UWA.zip WD-WMC5D0D0MDMK.zip
Joe L. Posted January 3, 2015

> So, I just finished preclearing 2 drives. In both cases, the result was something like "Disk /dev/sdk has NOT been successfully precleared". I don't fully understand why. I tried adding one of the precleared drives to my array and there was no issue.
>
> I've attached the preclear logs. I'm using preclear_disk.sh v1.15 on unRAID v6b12. If I'm able to add the drives to my array with no problem, should I be okay?

Well... maybe... but maybe not. The preclear report said:

== Disk /dev/sdi has NOT been successfully precleared
== Postread detected un-expected non-zero bytes on disk==

Basically, it wrote all zeros to the disk, but when it went to read them back to verify the write was successful, there were some bytes that were not zero. That is a very bad thing, since you cannot rely on the disk to store your data accurately. Nor can you accurately rebuild any other disk if one were to fail.

I would try a parity check at this point. It might report an error or two, as it expected all zeros...

Joe L.
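The read-back check described above can be illustrated with a small Python sketch. This is not the code preclear_disk.sh uses; it only shows the idea of reading data back in chunks and reporting where the first non-zero byte is. The function name `first_nonzero_offset` and the chunk size are assumptions made for the example, and it runs against an in-memory buffer rather than a real device.

```python
import io


def first_nonzero_offset(stream, chunk_size=1024 * 1024):
    """Return the byte offset of the first non-zero byte, or None if all zero.

    The stream is read sequentially in chunks so a large input never has
    to fit in memory at once.
    """
    offset = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return None  # reached the end without finding a non-zero byte
        remainder = chunk.lstrip(b"\x00")  # drop the leading zero bytes
        if remainder:
            # The first non-zero byte sits right after the stripped zeros.
            return offset + (len(chunk) - len(remainder))
        offset += len(chunk)


if __name__ == "__main__":
    # Simulated "disk": 4096 zero bytes with one corrupted byte at offset 3000.
    data = bytearray(4096)
    data[3000] = 0x01
    print(first_nonzero_offset(io.BytesIO(bytes(data)), chunk_size=512))
```

A disk that had truly stored the zeros written to it would produce `None` here; any offset at all means the data read back differs from the data written, which is the failure the report is describing.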
Traxxus Posted January 3, 2015

> Finally completed the preclear on the spare 500GB WD Blue (not Black, as I thought) drive I found in the drawer. The plan is to use this as my cache, since I am not sure using the SSD as a cache is a good idea, as it would most likely shorten its life.
>
> Anyway, here are the logs for the 500G. Is it safe to use?
>
> I guess the duplicated HDD display in unmenu that I asked about is not seen by anyone else, so I should just ignore it, right?
>
> http://i161.photobucket.com/albums/t214/Kostiz/unmenumain_zps1ba49511.png
>
> Cheers
> Kosti

Yeah, just ignore it. It's normal; it's just the physical drive vs. the mounted volume or something.
Kosti Posted January 3, 2015

Cheers Traxxus. I have been advised of the same, and to stick to one thread for the same question; I just got paranoid... I am currently running preclear on another 1TB HDD, which held a backup of a backup, so it's redundant for now. I will end up using this as my cache drive instead of the 500G, as I feel that one will be too slow for caching. Thanks again!

EDIT: My Samsung 1TB has finished the preclear. Please let me know if this is safe to use as a cache drive, as I noticed a few of these alarms in the syslog prior to the preclear:

Jan 4 04:16:45 Matrix kernel: end_request: critical medium error, dev sdg, sector 7100176

One last thing: I also noticed the speed drops considerably. Is this due to heat?

Samsung 1TB Preclear Successful
... Total time 11:25:12
... Pre-Read time 3:14:40 (85 MB/s)
... Zeroing time 2:28:36 (112 MB/s)
... Post-Read time 5:40:48 (48 MB/s)

The 500GB was worse.

Cheers
Kosti

syslog-2015-01-04.txt smart_start_1TB_sdg.txt smart_finish_1TB_sdg.txt
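As a sanity check on the speeds in that summary, the MB/s figures look like plain averages over each phase: disk size divided by the phase duration. Here is a short Python sketch of that arithmetic, assuming a nominal 1 TB disk of 10^12 bytes and 1 MB = 10^6 bytes (both are assumptions; the exact byte count of the Samsung drive is not given in the post). The names `hms_to_seconds`, `average_mb_per_s`, `DISK_BYTES` and `PHASES` are hypothetical.

```python
# Assumption: nominal 1 TB capacity; the real drive will differ slightly.
DISK_BYTES = 1_000_000_000_000

# Phase durations as reported in the preclear summary (h:mm:ss).
PHASES = {
    "Pre-Read": "3:14:40",
    "Zeroing": "2:28:36",
    "Post-Read": "5:40:48",
}


def hms_to_seconds(text):
    """Convert an 'h:mm:ss' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def average_mb_per_s(size_bytes, duration):
    """Average throughput in MB/s (1 MB = 10**6 bytes) over the whole phase."""
    return size_bytes / hms_to_seconds(duration) / 1_000_000


if __name__ == "__main__":
    for phase, duration in PHASES.items():
        print(f"{phase}: {average_mb_per_s(DISK_BYTES, duration):.1f} MB/s")
```

Under those assumptions the three phases come out at roughly 85, 112 and 48 MB/s once the decimals are dropped, in line with the reported figures. That only shows the numbers are whole-phase averages; it does not say why the post-read average is lower than the pre-read.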