Parity disk with read errors


So I woke up to this warning:

[10588352.295242] print_req_error: critical medium error, dev sdc, sector 11241728864

Googling that error turns up just about every possibility: bad cable, bad drive, bad controller, and "everything is OK, just some spindown error." Ideally I'd like the last one to be true, but I can't figure out how to tell from the diagnostics. Can someone more knowledgeable take a look? The drive in question is the parity drive. It's technically still under WD warranty until December, so if the drive is bad I can create an RMA for it and live without parity for a week while WD sends me a new drive (or I can use it as an excuse to pick up a 12TB drive).
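
In case it helps anyone reading along, here is a rough Python sketch for pulling the device and sector out of a line like the one above and turning the sector into a byte offset. It assumes the kernel reports the sector in 512-byte units; the function names are made up for illustration and are not part of any Unraid tool.

```python
import re

# Assumption: the kernel block layer reports request positions in 512-byte units.
KERNEL_SECTOR_BYTES = 512

ERROR_RE = re.compile(r"dev (\w+), sector (\d+)")


def parse_medium_error(line):
    """Return (device, sector) from a print_req_error line, or None if it does not match."""
    match = ERROR_RE.search(line)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def sector_to_byte_offset(sector):
    """Convert a 512-byte sector number to a byte offset from the start of the disk."""
    return sector * KERNEL_SECTOR_BYTES


line = "[10588352.295242] print_req_error: critical medium error, dev sdc, sector 11241728864"
device, sector = parse_medium_error(line)
offset = sector_to_byte_offset(sector)
print(f"{device}: sector {sector} is at byte offset {offset} ({offset / 2**40:.2f} TiB)")
```

This only tells you where on the disk the failed read happened. It says nothing about whether the cause is the drive, the cable, or the controller.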

 

arthur-diagnostics-20200731-0723.zip

Jul  5 00:00:01 Arthur kernel: md: recovery thread: check P ...
Jul  5 00:00:01 Arthur Plugin Auto Update: Checking for available plugin updates
Jul  5 00:00:02 Arthur kernel: md: recovery thread: P corrected, sector=0
Jul  5 00:00:02 Arthur kernel: md: recovery thread: P corrected, sector=8
Jul  5 00:00:02 Arthur kernel: md: recovery thread: P corrected, sector=16
Jul  5 00:00:02 Arthur kernel: md: recovery thread: P corrected, sector=24
Jul  5 00:00:02 Arthur kernel: md: recovery thread: P corrected, sector=32
Jul  5 00:00:02 Arthur kernel: md: recovery thread: P corrected, sector=40
Jul  5 00:00:02 Arthur kernel: md: recovery thread: P corrected, sector=128
Jul  5 00:00:02 Arthur kernel: md: recovery thread: P corrected, sector=136
Jul  5 00:00:02 Arthur kernel: md: recovery thread: P corrected, sector=144
Jul  5 00:00:02 Arthur kernel: md: recovery thread: P corrected, sector=152
Jul  5 00:00:02 Arthur kernel: md: recovery thread: P corrected, sector=160
Jul  5 00:00:02 Arthur kernel: md: recovery thread: P corrected, sector=168
Jul  5 00:00:02 Arthur kernel: md: recovery thread: P corrected, sector=176
Jul  5 00:00:02 Arthur kernel: md: recovery thread: P corrected, sector=184
Jul  5 00:00:03 Arthur Plugin Auto Update: unassigned.devices.plg version 2020.07.03a does not meet age requirements to update
Jul  5 00:00:03 Arthur Plugin Auto Update: Community Applications Plugin Auto Update finished
Jul  5 01:00:16 Arthur crond[1699]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Jul  5 02:15:21 Arthur kernel: md: recovery thread: P corrected, sector=2147483640
Jul  5 02:15:21 Arthur kernel: md: recovery thread: P corrected, sector=2147483648
Jul  5 02:15:21 Arthur kernel: md: recovery thread: P corrected, sector=2147483656
Jul  5 02:15:21 Arthur kernel: md: recovery thread: P corrected, sector=2147483664
Jul  5 02:15:21 Arthur kernel: md: recovery thread: P corrected, sector=2147483672
Jul  5 02:15:21 Arthur kernel: md: recovery thread: P corrected, sector=2147483680
Jul  5 04:03:01 Arthur root: /etc/libvirt: 924.2 MiB (969101312 bytes) trimmed on /dev/loop3
Jul  5 04:03:01 Arthur root: /var/lib/docker: 12.9 GiB (13809070080 bytes) trimmed on /dev/loop2
Jul  5 04:03:01 Arthur root: /mnt/cache: 1.2 TiB (1309510852608 bytes) trimmed on /dev/sdd1
Jul  5 04:40:01 Arthur apcupsd[9285]: apcupsd exiting, signal 15
Jul  5 04:40:01 Arthur apcupsd[9285]: apcupsd shutdown succeeded
Jul  5 04:40:04 Arthur apcupsd[15213]: apcupsd 3.14.14 (31 May 2016) slackware startup succeeded
Jul  5 04:40:04 Arthur apcupsd[15213]: NIS server startup succeeded
Jul  5 04:56:01 Arthur kernel: md: recovery thread: P corrected, sector=4294967280
Jul  5 04:56:01 Arthur kernel: md: recovery thread: P corrected, sector=4294967288
Jul  5 04:56:01 Arthur kernel: md: recovery thread: P corrected, sector=4294967296
Jul  5 04:56:01 Arthur kernel: md: recovery thread: P corrected, sector=4294967304
Jul  5 04:56:01 Arthur kernel: md: recovery thread: P corrected, sector=4294967312
Jul  5 04:56:01 Arthur kernel: md: recovery thread: P corrected, sector=4294967320
Jul  5 08:03:28 Arthur kernel: md: recovery thread: P corrected, sector=6442450920
Jul  5 08:03:28 Arthur kernel: md: recovery thread: P corrected, sector=6442450928
Jul  5 08:03:28 Arthur kernel: md: recovery thread: P corrected, sector=6442450936
Jul  5 08:03:28 Arthur kernel: md: recovery thread: P corrected, sector=6442450944
Jul  5 08:03:28 Arthur kernel: md: recovery thread: P corrected, sector=6442450952
Jul  5 08:03:28 Arthur kernel: md: recovery thread: P corrected, sector=6442450960
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934560
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934568
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934576
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934584
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934592
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934600
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934608
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934616
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934624
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934632
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934640
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934648
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934656
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934664
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934672
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934680
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934688
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934696
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934704
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934712
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934720
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934728
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934736
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934744
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934752
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934760
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934768
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934776
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934784
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934792
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934800
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934808
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934816
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934824
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934832
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934840
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934848
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934856
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934864
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934872
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934880
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934888
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934896
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934904
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934912
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934920
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934928
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934936
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934944
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934952
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934960
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934968
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934976
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934984
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589934992
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589935000
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589935008
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589935016
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589935024
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589935032
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589935040
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589935048
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589935056
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589935064
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589935072
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589935080
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589935088
Jul  5 10:01:24 Arthur kernel: md: recovery thread: P corrected, sector=8589935096
Jul  5 10:01:24 Arthur kernel: md: recovery thread: stopped logging
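
The corrected sectors in the log above come in bursts near 0, 2147483648, 4294967296, 6442450944 and 8589934592, which are all multiples of 2^31. A minimal Python sketch for grouping the corrections that way follows. It assumes the md driver counts 512-byte sectors, so that 2^31 sectors is 1 TiB; the helper names are hypothetical and the sample is just a few lines copied from the log.

```python
import re
from collections import Counter

SECTOR_RE = re.compile(r"recovery thread: P corrected, sector=(\d+)")

# Assumption: sectors are 512 bytes, so 1 TiB is 2**31 sectors.
SECTORS_PER_TIB = 2**40 // 512


def corrected_sectors(syslog_text):
    """Collect every sector number from 'P corrected' lines."""
    return [int(m.group(1)) for m in SECTOR_RE.finditer(syslog_text)]


def count_by_nearest_tib(sectors):
    """Count corrections per nearest 1 TiB boundary (0, 1, 2, ...)."""
    counts = Counter()
    for sector in sectors:
        nearest = (sector + SECTORS_PER_TIB // 2) // SECTORS_PER_TIB
        counts[nearest] += 1
    return dict(counts)


sample = """\
Jul  5 00:00:02 Arthur kernel: md: recovery thread: P corrected, sector=0
Jul  5 02:15:21 Arthur kernel: md: recovery thread: P corrected, sector=2147483640
Jul  5 02:15:21 Arthur kernel: md: recovery thread: P corrected, sector=2147483648
Jul  5 04:56:01 Arthur kernel: md: recovery thread: P corrected, sector=4294967296
"""
print(count_by_nearest_tib(corrected_sectors(sample)))
```

Feeding it the full syslog from the diagnostics would show whether every correction sits next to one of those boundaries. Note that the log stops recording corrections after the "stopped logging" line, so the counts are a lower bound.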

 


Hmm, the extended test will take approximately 1149 minutes (about 19 hours). Instead, I created an RMA and will ship the drive with 2-day air, and hopefully I'll get a replacement by the end of next week.

Thank you so much for all your help. Would you mind sharing which logs you looked at to tell that the issue was most likely the drive itself?


In both the syslog and the SMART report, the error is consistent with a disk problem. It's not definite, and these errors are sometimes intermittent, but SMART doesn't look very good: healthy WD drives should show 0 for Raw read error rate.
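
For anyone who wants to check that attribute themselves, here is a small Python sketch that reads the raw value of attribute 1 from the attribute table in a smartctl report. It assumes the usual ten-column table layout, and the sample rows are made up for illustration; they are not taken from the diagnostics in this thread.

```python
def raw_value(attribute_table, attr_id):
    """Return the raw value of a SMART attribute as an int, or None if absent or non-numeric."""
    for line in attribute_table.splitlines():
        fields = line.split()
        # Attribute rows start with the numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0] == str(attr_id):
            try:
                return int(fields[9])
            except ValueError:
                return None
    return None


# Illustrative sample only (made-up values, not from the posted diagnostics).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
"""

value = raw_value(SAMPLE, 1)
print("Raw_Read_Error_Rate raw value:", value)
```

A nonzero result on a WD drive is the thing to look at more closely, per the rule of thumb above.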

 

2 minutes ago, tomsliwowski said:

I should also probably ask, what's the best way to remove the parity drive and not have data loss while I'm awaiting a replacement?

Just unassign it and start the array. Of course, if a disk fails in the meantime you won't be protected.

8 minutes ago, johnnie.black said:

In both the syslog and the SMART report, the error is consistent with a disk problem. It's not definite, and these errors are sometimes intermittent, but SMART doesn't look very good: healthy WD drives should show 0 for Raw read error rate.

Just unassign it and start the array. Of course, if a disk fails in the meantime you won't be protected.

Thanks for the explanation.

 

Yeah, I'm shipping the drive out this afternoon but thinking of picking up a 10 or 12 TB Easystore so I can have parity in the interim (and an additional 10TB when the RMA is complete).

Do you know if there is a bug in the Unraid GUI when displaying the raw read error rate? I ask because the SMART report in the diagnostics has a crazy high number, but in the GUI it's 0. Is this some weird 16-bit overflow that makes it read as 0 once 65536 is passed?
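
A quick way to sanity-check the 16-bit idea, sketched in Python: if a value were truncated to 16 bits, only its low 16 bits would survive, so it would read as 0 only when it is an exact multiple of 65536, not for everything above that. The helper name is made up, and nothing here is taken from the Unraid GUI code.

```python
def as_uint16(value):
    """Keep only the low 16 bits, as an unsigned 16-bit field would."""
    return value & 0xFFFF


for raw in (65535, 65536, 65537, 131072):
    print(raw, "->", as_uint16(raw))
```

So plugging the raw number from the SMART report into this would show whether truncation alone could explain the GUI displaying 0.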

10 minutes ago, tomsliwowski said:

Do you know if there is a bug in the Unraid GUI when displaying the raw read error rate? I ask because the SMART report in the diagnostics has a crazy high number, but in the GUI it's 0. Is this some weird 16-bit overflow that makes it read as 0 once 65536 is passed?

Maybe, but that number on the SMART report can also be wrong, possibly a firmware issue.
