Two drive failures coming soon?

Bungy · March 3, 2015

I have two drives in my array that may need some attention. I just upgraded to unRAID 6 beta16b and turned on email notifications for SMART report changes. Last night I received a report about two different drives. Neither has failed or ever red-balled, but they may be showing signs of future failure, and I wanted to get a second opinion.

Disk 3:

My concerns are with Current Pending Sector and Offline Uncorrectable.

Disk 3 attached to port: sdl
ID#	ATTRIBUTE NAME	FLAG	VALUE	WORST	THRESH	TYPE	UPDATED	FAILED	RAW VALUE
1	Raw Read Error Rate	0x002f	200	200	051	Pre-fail	Always	Never	0
3	Spin Up Time	0x0027	164	159	021	Pre-fail	Always	Never	6783
4	Start Stop Count	0x0032	096	096	000	Old age	Always	Never	4318
5	Reallocated Sector Ct	0x0033	200	200	140	Pre-fail	Always	Never	0
7	Seek Error Rate	0x002e	200	200	000	Old age	Always	Never	0
9	Power On Hours	0x0032	048	048	000	Old age	Always	Never	38032
10	Spin Retry Count	0x0032	100	100	000	Old age	Always	Never	0
11	Calibration Retry Count	0x0032	100	100	000	Old age	Always	Never	0
12	Power Cycle Count	0x0032	100	100	000	Old age	Always	Never	235
192	Power-Off Retract Count	0x0032	200	200	000	Old age	Always	Never	135
193	Load Cycle Count	0x0032	074	074	000	Old age	Always	Never	379826
194	Temperature Celsius	0x0022	120	098	000	Old age	Always	Never	30
196	Reallocated Event Count	0x0032	200	200	000	Old age	Always	Never	0
197	Current Pending Sector	0x0032	200	200	000	Old age	Always	Never	20
198	Offline Uncorrectable	0x0030	200	200	000	Old age	Offline	Never	3
199	UDMA CRC Error Count	0x0032	200	200	000	Old age	Always	Never	0
200	Multi Zone Error Rate	0x0008	200	200	000	Old age	Offline	Never	25

Disk 10:

My concerns are with Reported Uncorrect, Command Timeout, and Current Pending Sector.

It seems the command timeout may be due to a bad cable, but I haven't had a chance to get to the cable to check this.

Disk 10 attached to port: sdh
ID#	ATTRIBUTE NAME	FLAG	VALUE	WORST	THRESH	TYPE	UPDATED	FAILED	RAW VALUE
1	Raw Read Error Rate	0x002f	200	199	051	Pre-fail	Always	Never	0
3	Spin Up Time	0x0027	170	168	021	Pre-fail	Always	Never	6466
4	Start Stop Count	0x0032	097	097	000	Old age	Always	Never	3756
5	Reallocated Sector Ct	0x0033	200	200	140	Pre-fail	Always	Never	0
7	Seek Error Rate	0x002f	100	253	051	Pre-fail	Always	Never	0
9	Power On Hours	0x0032	060	060	000	Old age	Always	Never	29646
10	Spin Retry Count	0x0033	100	100	051	Pre-fail	Always	Never	0
11	Calibration Retry Count	0x0032	100	100	000	Old age	Always	Never	0
12	Power Cycle Count	0x0032	100	100	000	Old age	Always	Never	142
184	End-to-End Error	0x0033	100	100	097	Pre-fail	Always	Never	0
187	Reported Uncorrect	0x0032	100	099	000	Old age	Always	Never	1
188	Command Timeout	0x0032	100	099	000	Old age	Always	Never	4295032834
190	Airflow Temperature Cel	0x0022	067	049	040	Old age	Always	Never	33 (Min/Max 24/37)
192	Power-Off Retract Count	0x0032	200	200	000	Old age	Always	Never	69
193	Load Cycle Count	0x0032	199	199	000	Old age	Always	Never	3686
196	Reallocated Event Count	0x0032	200	200	000	Old age	Always	Never	0
197	Current Pending Sector	0x0032	200	200	000	Old age	Always	Never	1
198	Offline Uncorrectable	0x0030	200	200	000	Old age	Offline	Never	0
199	UDMA CRC Error Count	0x0032	200	200	000	Old age	Always	Never	0
200	Multi Zone Error Rate	0x0008	200	200	000	Old age	Offline	Never	0
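One way to sanity-check the Command Timeout raw value above: some vendors are reported to pack several 16-bit counters into a single raw value for attribute 188. Whether this particular drive does so is an assumption, and the helper name below is made up for illustration. Under that assumption, the raw value splits as follows:

```python
def split_16bit_words(raw):
    """Split a 48-bit SMART raw value into three 16-bit words (high, middle, low)."""
    return ((raw >> 32) & 0xFFFF, (raw >> 16) & 0xFFFF, raw & 0xFFFF)

# Raw value reported for attribute 188 on disk 10.
words = split_16bit_words(4295032834)
print(words)  # (1, 1, 2)
```

If the packing assumption holds, the figure represents a handful of events rather than billions of timeouts, which would fit the bad-cable theory better than a dying drive.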

I'm currently pulling all of the data off both drives onto a backup drive. My question is: how big a problem are these attributes? I'm guessing I can preclear disk 3 to clear the pending sectors, but the offline uncorrectable count is a concern. I doubt I can add it back to the array and have peace of mind, given the potential for the offline uncorrectable count to increase. I also don't think I can trust disk 10 with the large number of command timeouts. If any of you have any thoughts, I would love a second opinion.
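For anyone who wants to pick the worrying rows out of a report like the ones above automatically, here is a minimal sketch. It assumes the tab-separated layout shown in this post; the set of watched attribute IDs and the function name are choices made for this example, not an official list or an unRAID feature.

```python
# Attribute IDs chosen for this sketch only (reallocated, reported uncorrectable,
# command timeout, pending, offline uncorrectable).
WATCHED_IDS = {5, 187, 188, 197, 198}

def nonzero_watched(report):
    """Return (id, name, raw) for watched attributes whose raw value is not 0."""
    flagged = []
    for line in report.strip().splitlines():
        fields = line.split("\t")
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # skip the header row and anything malformed
        attr_id = int(fields[0])
        raw = int(fields[9].split()[0])  # drop trailing notes such as "(Min/Max 24/37)"
        if attr_id in WATCHED_IDS and raw != 0:
            flagged.append((attr_id, fields[1], raw))
    return flagged

# A few rows copied from the disk 3 report.
disk3_excerpt = "\n".join([
    "ID#\tATTRIBUTE NAME\tFLAG\tVALUE\tWORST\tTHRESH\tTYPE\tUPDATED\tFAILED\tRAW VALUE",
    "5\tReallocated Sector Ct\t0x0033\t200\t200\t140\tPre-fail\tAlways\tNever\t0",
    "197\tCurrent Pending Sector\t0x0032\t200\t200\t000\tOld age\tAlways\tNever\t20",
    "198\tOffline Uncorrectable\t0x0030\t200\t200\t000\tOld age\tOffline\tNever\t3",
])
print(nonzero_watched(disk3_excerpt))
```

A nonzero raw value here is only a prompt to look closer; it does not by itself say the drive is failing.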

WeeboTech · March 3, 2015

I would check the cable on the drive with timeouts.

I would verify all data, if possible, on the drives with pending sectors.

Possibly move the data, then clear the drive to clear out the pending sectors.

Or retire them; the power-on hours are pretty high.

You can read about the SMART attributes here.

http://en.wikipedia.org/wiki/S.M.A.R.T.

Frank1940 · March 3, 2015

If it were my server, I would be looking at getting a couple of new drives. One is over three years old and the other is over four. I, personally, am always proactive about getting any questionable drives out of my array and testing them later to see if they can be salvaged...
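The age estimate can be checked against the Power On Hours raw values in the reports. This is a rough sketch that assumes the raw value counts whole hours and treats a year as 365 days; it measures powered-on time, which may be less than the calendar age of the drives.

```python
HOURS_PER_YEAR = 24 * 365  # 8760; leap days ignored

# Raw values of attribute 9 from the two reports.
power_on_hours = {"Disk 3": 38032, "Disk 10": 29646}

years = {name: hours / HOURS_PER_YEAR for name, hours in power_on_hours.items()}
for name, value in years.items():
    print(f"{name}: {value:.1f} years powered on")
```

That works out to roughly 4.3 years for disk 3 and 3.4 years for disk 10.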

Remember, if you have even one bad drive that cannot be read, you have a fifty-fifty chance of data loss if you try to replace one of them, and close to a 100% chance if you have a problem with another drive. It is the potential failure of, or problem with, a second drive that worries me when I have a drive that is problematic. (I am a true believer in Murphy's Law.)

Bungy · March 3, 2015

Thanks for the responses. I'm glad to hear I'm not in a very desperate situation. I happen to have a 4TB drive that just came in the mail yesterday. My plan is to upgrade disk 3 (2TB) to 4TB and then copy my data off disk 10 (2TB) to the new drive. Is there a way to remove disk 10 from the array once it's been copied to the 4TB drive? Could I write zeros to the partition so that the parity is updated, remove the drive, do a "trust my array" with the disk missing, and then of course do a parity sync at the end just to be sure?

JonathanM · March 3, 2015

Yes, if you dd zeroes to the md? device for the disk you wish to remove, you can do exactly what you want. It works quite smoothly, and would probably be a normal procedure if it weren't for the high risk of accidentally nuking the wrong drive if you aren't careful, or if you don't know what you are doing and are blindly following a guide.

All this assumes your parity drive is currently 4TB or larger.

Bungy · March 3, 2015

Thanks for the help. When the copying is done, I'll run this command to clear out the disk before removing it from the array:

dd if=/dev/zero of=/dev/mdX bs=2048k
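Since the risk raised earlier in the thread is zeroing the wrong device, one option is to build the command through a small guard instead of typing it by hand. The sketch below only constructs and prints the command; it never runs dd. The function name is hypothetical, and using /dev/md10 for disk 10 assumes that /dev/mdN corresponds to array disk N, which should be confirmed on the actual system first.

```python
import re

# Accept only array devices of the form /dev/md<number>; raw devices such as
# /dev/sdh and the unfilled placeholder /dev/mdX are rejected.
MD_DEVICE = re.compile(r"/dev/md[0-9]+")

def build_zero_command(device):
    """Return the dd argument list for zeroing an md device, or raise ValueError."""
    if not MD_DEVICE.fullmatch(device):
        raise ValueError(f"refusing to target {device!r}: expected /dev/md<number>")
    return ["dd", "if=/dev/zero", f"of={device}", "bs=2048k"]

# Assumption: disk 10 is exposed as /dev/md10.
cmd = build_zero_command("/dev/md10")
print(" ".join(cmd))  # dd if=/dev/zero of=/dev/md10 bs=2048k
```

The pattern check cannot tell whether the number is the right one, so it is still worth confirming that the disk is empty and is the one intended before running the printed command.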

I've been meaning to clean out the old WD Greens. I'm just glad they lasted 3-4 years.

trurl · March 3, 2015

If you zero the disk this way, you will still have valid parity if you remove the disk, but you will have to set a new config after you remove it and check the trust parity box.
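Why parity stays valid can be shown with a toy model. It assumes single parity computed as a bytewise XOR across the data disks; the three-byte "disks" and the function name are invented for the illustration.

```python
from functools import reduce

def xor_parity(disks):
    """Bytewise XOR across equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*disks))

disk_a = bytes([0x12, 0x34, 0x56])
disk_b = bytes([0xAB, 0xCD, 0xEF])
zeroed = bytes(3)  # the disk being retired, after it has been overwritten with zeros

parity_with_zeroed = xor_parity([disk_a, disk_b, zeroed])
parity_without = xor_parity([disk_a, disk_b])
print(parity_with_zeroed == parity_without)  # True
```

XOR with zero changes nothing, so an all-zero disk contributes nothing to parity and the same parity remains correct once that disk is gone. This only holds if the zeros were written through the md device so that parity was updated along the way.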
