Disk errors, what to do?

April 6, 2015

For the first time since I started using unRAID, I got a notification that one of my disks has read errors. Nothing has been changed inside the box, so I do not expect cabling issues.

disk3 (sdh) is showing 32 read errors.

I ran a short SMART self-test:

Disk 3 attached to port: sdh
Num	Test Description	Status	Remaining	LifeTime(hours)	LBA of first error
1	Short offline	Completed without error	00%	12646	None

The SMART values look good:

Disk 3 attached to port: sdh
ID#	ATTRIBUTE NAME	FLAG	VALUE	WORST	THRESH	TYPE	UPDATED	FAILED	RAW VALUE
1	Raw Read Error Rate	0x002f	200	200	051	Pre-fail	Always	Never	0
3	Spin Up Time	0x0027	182	180	021	Pre-fail	Always	Never	7858
4	Start Stop Count	0x0032	099	099	000	Old age	Always	Never	1919
5	Reallocated Sector Ct	0x0033	200	200	140	Pre-fail	Always	Never	0
7	Seek Error Rate	0x002e	100	253	000	Old age	Always	Never	0
9	Power On Hours	0x0032	083	083	000	Old age	Always	Never	12646
10	Spin Retry Count	0x0032	100	100	000	Old age	Always	Never	0
11	Calibration Retry Count	0x0032	100	253	000	Old age	Always	Never	0
12	Power Cycle Count	0x0032	100	100	000	Old age	Always	Never	22
192	Power-Off Retract Count	0x0032	200	200	000	Old age	Always	Never	11
193	Load Cycle Count	0x0032	179	179	000	Old age	Always	Never	64401
194	Temperature Celsius	0x0022	122	110	000	Old age	Always	Never	30
196	Reallocated Event Count	0x0032	200	200	000	Old age	Always	Never	0
197	Current Pending Sector	0x0032	200	200	000	Old age	Always	Never	0
198	Offline Uncorrectable	0x0030	100	253	000	Old age	Offline	Never	0
199	UDMA CRC Error Count	0x0032	200	200	000	Old age	Always	Never	0
200	Multi Zone Error Rate	0x0008	100	253	000	Old age	Offline	Never	0
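Of these attributes, the ones most often watched for failing media are 5 (Reallocated Sector Ct), 196 (Reallocated Event Count), 197 (Current Pending Sector) and 198 (Offline Uncorrectable); here all four have a raw value of 0. A normalized VALUE at or below a nonzero THRESH would also mark an attribute as failed. Below is a minimal Python sketch of that check, assuming the table is pasted as tab-separated text in the column order shown above; `check_smart` and the watch list are hypothetical choices, not part of smartctl or unRAID.

```python
# Attributes that most directly track failing media; a nonzero raw value deserves attention.
WATCH = {5, 196, 197, 198}


def check_smart(table: str) -> list[str]:
    """Return warnings for a tab-separated SMART attribute table (hypothetical helper).

    Expects the column order shown above:
    ID#, ATTRIBUTE NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, FAILED, RAW VALUE.
    """
    warnings = []
    for line in table.splitlines():
        cols = line.split("\t")
        # Skip the header row, blank lines and anything that is not a full attribute row.
        if len(cols) < 10 or not cols[0].strip().isdigit():
            continue
        attr_id, name = int(cols[0]), cols[1]
        value, thresh = int(cols[3]), int(cols[5])
        raw = cols[9].split()[0]  # raw values may carry extra text after the number
        # A normalized value at or below a nonzero threshold marks the attribute as failed.
        if thresh > 0 and value <= thresh:
            warnings.append(f"{attr_id} {name}: value {value} <= threshold {thresh}")
        if attr_id in WATCH and raw != "0":
            warnings.append(f"{attr_id} {name}: raw value {raw}")
    return warnings
```

Pasting the table above (with its tabs preserved) should return an empty list: every watched raw value is 0, and every VALUE is above its nonzero THRESH.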

The relevant part of the syslog:

Apr  6 02:06:57 Tower kernel: md: disk3 read error, sector=5034928192
Apr  6 02:06:57 Tower kernel: md: disk3 read error, sector=5034928200
Apr  6 02:06:57 Tower kernel: md: disk3 read error, sector=5034928208
Apr  6 02:06:57 Tower kernel: md: disk3 read error, sector=5034928216
Apr  6 02:06:57 Tower kernel: sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1
Apr  6 02:06:57 Tower kernel: md: disk3 read error, sector=5034928224
Apr  6 02:06:57 Tower kernel: md: disk3 read error, sector=5034928232
Apr  6 02:06:57 Tower kernel: md: disk3 read error, sector=5034928240
Apr  6 02:06:57 Tower kernel: md: disk3 read error, sector=5034928248
Apr  6 02:06:57 Tower kernel: md: disk3 read error, sector=5034928256
Apr  6 02:06:57 Tower kernel: md: disk3 read error, sector=5034928264
Apr  6 02:06:57 Tower kernel: md: disk3 read error, sector=5034928272
Apr  6 02:06:57 Tower kernel: md: disk3 read error, sector=5034928280
Apr  6 02:06:57 Tower kernel: md: disk3 read error, sector=5034928288
Apr  6 02:06:57 Tower kernel: md: disk3 read error, sector=5034928296
Apr  6 02:06:57 Tower kernel: md: disk3 read error, sector=5034928304
Apr  6 02:06:57 Tower kernel: md: disk3 read error, sector=5034928312
Apr  6 02:06:57 Tower kernel: md: disk3 read error, sector=5034928320
Apr  6 02:06:57 Tower kernel: md: disk3 read error, sector=5034928328
Apr  6 02:06:57 Tower kernel: md: disk3 read error, sector=5034928336
Apr  6 02:06:57 Tower kernel: md: disk3 read error, sector=5034928344
Apr  6 02:06:57 Tower kernel: md: disk3 read error, sector=5034928352
Apr  6 02:06:57 Tower kernel: md: disk3 read error, sector=5034928360
Apr  6 02:06:57 Tower kernel: md: disk3 read error, sector=5034928368
Apr  6 02:06:57 Tower kernel: md: disk3 read error, sector=5034928376
Apr  6 02:06:57 Tower kernel: md: disk3 read error, sector=5034928384
Apr  6 02:06:57 Tower kernel: md: disk3 read error, sector=5034928392
Apr  6 02:06:57 Tower kernel: md: disk3 read error, sector=5034928400
Apr  6 02:06:57 Tower kernel: md: disk3 read error, sector=5034928408
Apr  6 02:06:57 Tower kernel: md: disk3 read error, sector=5034928416
Apr  6 02:06:57 Tower kernel: md: disk3 read error, sector=5034928424
Apr  6 02:06:57 Tower kernel: md: disk3 read error, sector=5034928432
Apr  6 02:06:57 Tower kernel: md: disk3 read error, sector=5034928440
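The 32 md lines above are not scattered: each entry is 8 sectors (4 KiB) after the previous one, so together they cover one contiguous 256-sector run starting at sector 5034928192. A small Python sketch for summarizing such lines from a syslog, assuming the `md: diskN read error, sector=N` format shown here; `summarize_md_errors` is a hypothetical helper.

```python
import re

# Matches md driver lines such as "md: disk3 read error, sector=5034928192".
MD_ERR = re.compile(r"md: (disk\d+) read error, sector=(\d+)")


def summarize_md_errors(log: str) -> dict[str, tuple[int, int, int, bool]]:
    """Per disk: (error count, first sector, last sector, contiguous in 8-sector steps)."""
    sectors: dict[str, list[int]] = {}
    for m in MD_ERR.finditer(log):
        sectors.setdefault(m.group(1), []).append(int(m.group(2)))

    summary = {}
    for disk, found in sectors.items():
        found.sort()
        # In the excerpt above, consecutive entries are 8 sectors (4 KiB) apart.
        contiguous = all(b - a == 8 for a, b in zip(found, found[1:]))
        summary[disk] = (len(found), found[0], found[-1], contiguous)
    return summary
```

Run against the excerpt above, this should report 32 errors on disk3 from sector 5034928192 to 5034928440, contiguous; unrelated lines such as the `sas:` message are ignored.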

What should I do?

April 6, 2015

Attach a syslog (zip it if needed).

April 7, 2015

Author

Enclosed. Learning is a good thing: if you can tell me what you get out of the syslog besides the parts I already included, I would like to know!

syslog.zip

April 9, 2015

The syslog doesn't reveal the cause in this case.

sas: Enter sas_scsi_recover_host busy: 1 failed: 1
Apr  6 02:06:57 Tower kernel: sas: ata13: end_device-8:0: cmd error handler
Apr  6 02:06:57 Tower kernel: sas: ata13: end_device-8:0: dev error handler
Apr  6 02:06:57 Tower kernel: ata13.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Apr  6 02:06:57 Tower kernel: ata13.00: failed command: READ DMA EXT
Apr  6 02:06:57 Tower kernel: ata13.00: cmd 25/00:00:80:e8:1a/00:01:2c:01:00/e0 tag 14 dma 131072 in
Apr  6 02:06:57 Tower kernel:         res 51/40:00:80:e8:1a/00:01:2c:01:00/e0 Emask 0x9 (media error)
Apr  6 02:06:57 Tower kernel: ata13.00: status: { DRDY ERR }
Apr  6 02:06:57 Tower kernel: ata13.00: error: { UNC }

All references to ata13.00 before this look normal. The UNC (uncorrectable) error above is a media error, which would typically leave a pending sector. unRAID should then reconstruct the missing data from the other disks and write it back to the faulty disk, giving the drive a chance to rewrite or remap the sector. This appears to have succeeded (the SMART report shows no pending sectors), although it would be nice if unRAID were more verbose in these cases.
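Part of what the syslog adds is the failing request itself. libata prints the `cmd` field as command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device, and for a 48-bit command such as READ DMA EXT (0x25) the LBA is built from the six LBA bytes. A Python sketch of that decoding follows; `decode_taskfile`, `lba_to_md_sector` and `PARTITION_START = 64` are assumptions for illustration (64 is the usual start sector of a 4K-aligned unRAID partition, not something this log states).

```python
# Assumption: the data partition starts at sector 64 (4K-aligned layout).
PARTITION_START = 64


def decode_taskfile(field: str) -> dict[str, int]:
    """Decode a libata cmd/res field such as '25/00:00:80:e8:1a/00:01:2c:01:00/e0'.

    Layout: command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
    """
    command, low, high, device = field.split("/")
    feature, nsect, lbal, lbam, lbah = (int(b, 16) for b in low.split(":"))
    _hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = (int(b, 16) for b in high.split(":"))
    # 48-bit LBA: the low registers hold bits 0-23, the "hob" registers bits 24-47.
    lba = (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24) | (lbah << 16) | (lbam << 8) | lbal
    # 16-bit sector count; for 48-bit commands a count of 0 means 65536.
    count = (hob_nsect << 8) | nsect
    return {
        "command": int(command, 16),  # in a res field this byte is the status register
        "feature": feature,           # in a res field this byte is the error register
        "lba": lba,
        "sectors": count or 65536,
        "device": int(device, 16),
    }


def lba_to_md_sector(lba: int, partition_start: int = PARTITION_START) -> int:
    """Convert an absolute disk LBA to a partition-relative sector (hypothetical helper)."""
    return lba - partition_start
```

Decoding the `cmd` field gives LBA 5034928256 and 256 sectors (256 × 512 = 131072 bytes, matching `dma 131072`). Subtracting the assumed 64-sector offset gives 5034928192, the first sector in the md error lines, and 256 sectors is exactly 32 of md's 8-sector entries, which lines up with the 32 read errors reported for disk3. In the `res` field, the first byte is the status register (0x51) and the second is the error register (0x40, the UNC bit).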

Run a non-correcting parity check.

April 9, 2015

Author

Thanks!!
