[solved] parity drive failed shortly after data drive rebuild


twg


I recently had a data drive quit on me (at least Unraid said so), so I replaced it and it went through a data rebuild. During the rebuild the server froze, so I rebooted it. The data drive rebuild then completed, and when it finished I saw the following message:

 

Event: Unraid Parity sync / Data rebuild
Subject: Notice [TOWER] - Parity sync / Data rebuild finished (11640829 errors)
Description: Duration: 1 day, 6 minutes, 23 seconds. Average speed: 92.2 MB/s
Importance: warning
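A rough sanity check on the numbers in that notification: multiplying the average speed by the duration gives the approximate amount of data the rebuild walked through, and the first failing sector in the sdo log quoted in this post can be converted to a byte offset the same way. This sketch assumes "MB/s" means 10^6 bytes per second, that the average speed covers the whole run, and that sdo uses 512-byte logical sectors (the kernel log later reports 512-byte blocks for sdp, not sdo).

```python
# Back-of-the-envelope check on the rebuild notification.
# Assumptions: "MB/s" = 10**6 bytes/s, the average speed covers the whole run,
# and sdo uses 512-byte logical sectors.
duration_s = 1 * 86400 + 6 * 60 + 23   # "1 day, 6 minutes, 23 seconds"
avg_speed_bps = 92.2e6                 # "Average speed: 92.2 MB/s"
bytes_processed = duration_s * avg_speed_bps

first_bad_sector = 15534924960         # first sdo I/O error quoted in this post
bad_offset_bytes = first_bad_sector * 512

print(f"duration: {duration_s} s")
print(f"approx. data processed: {bytes_processed / 1e12:.2f} TB")
print(f"offset of first logged sdo error: {bad_offset_bytes / 1e12:.2f} TB")
```

If those assumptions hold, the rebuild covered roughly 8 TB and the quoted sdo errors sit near the 7.95 TB mark, i.e. toward the end of the pass rather than at the start.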

 

What does it mean when it lists all those errors? Are those errors in the drive rebuild, i.e. is there bad data?

 

When I check the drive log, I get a whole bunch of these:

 

Oct 18 20:45:35 Tower kernel: sd 13:0:5:0: [sdo] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
Oct 18 20:45:35 Tower kernel: sd 13:0:5:0: [sdo] tag#0 CDB: opcode=0x88 88 00 00 00 00 03 9d f4 24 a0 00 00 02 00 00 00
Oct 18 20:45:35 Tower kernel: print_req_error: I/O error, dev sdo, sector 15534924960
Oct 18 20:45:35 Tower kernel: sd 13:0:5:0: [sdo] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
Oct 18 20:45:35 Tower kernel: sd 13:0:5:0: [sdo] tag#1 CDB: opcode=0x88 88 00 00 00 00 03 9d f4 26 a0 00 00 02 00 00 00
Oct 18 20:45:35 Tower kernel: print_req_error: I/O error, dev sdo, sector 15534925472
Oct 18 20:45:35 Tower kernel: sd 13:0:5:0: [sdo] tag#2 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
Oct 18 20:45:35 Tower kernel: sd 13:0:5:0: [sdo] tag#2 CDB: opcode=0x88 88 00 00 00 00 03 9d f4 28 a0 00 00 02 00 00 00
Oct 18 20:45:35 Tower kernel: print_req_error: I/O error, dev sdo, sector 15534925984
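To get a quick overview of a long log like this, a small script can count the I/O errors per device and record which SCSI address (host:channel:target:lun) each device name was seen on. This is a minimal sketch; the regular expressions only match the message format shown in these excerpts (other kernel versions may word the I/O error line differently), and `summarize` is a hypothetical helper, not an Unraid tool. Grouping by address can help show whether errors follow a particular controller port or cable rather than a particular disk; for example, both excerpts in this thread report 13:0:5:0, once as sdo and once as sdp.

```python
import re
from collections import Counter

# Matches e.g. "... kernel: print_req_error: I/O error, dev sdo, sector 15534924960"
IO_ERROR_RE = re.compile(r"print_req_error: I/O error, dev (\w+), sector (\d+)")
# Matches the "sd H:C:T:L: [sdX]" prefix on the SCSI midlayer lines.
SD_ADDR_RE = re.compile(r"sd (\d+:\d+:\d+:\d+): \[(\w+)\]")


def summarize(log_text):
    """Count I/O errors per device, track the lowest failing sector,
    and map each device name to the SCSI address(es) it appeared on."""
    errors = Counter()
    lowest = {}
    addresses = {}
    for line in log_text.splitlines():
        m = IO_ERROR_RE.search(line)
        if m:
            dev, sector = m.group(1), int(m.group(2))
            errors[dev] += 1
            lowest[dev] = min(sector, lowest.get(dev, sector))
            continue
        m = SD_ADDR_RE.search(line)
        if m:
            addresses.setdefault(m.group(2), set()).add(m.group(1))
    return errors, lowest, addresses


sample = """\
Oct 18 20:45:35 Tower kernel: sd 13:0:5:0: [sdo] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
Oct 18 20:45:35 Tower kernel: print_req_error: I/O error, dev sdo, sector 15534924960
Oct 18 20:45:35 Tower kernel: print_req_error: I/O error, dev sdo, sector 15534925472
Oct 18 20:45:35 Tower kernel: print_req_error: I/O error, dev sdo, sector 15534925984
"""
errors, lowest, addresses = summarize(sample)
print(errors, lowest, addresses)
```

To run it on the full drive log, read the file into a string and pass it to `summarize` instead of the sample.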

 

I've attached the full drive log.

 

So I was getting some really weird issues: multiple drives would drop out on me, and different drives every time I rebooted...

 

I opened my server and it seemed like some power cables were loose, so I plugged those back in... Still, multiple drives kept failing on me, and it seemed to be coming from one drive controller, the AOC-SASLP-MV8... Luckily I had a spare AOC-SAS2LP-MV8 controller, so I plugged that in... Now I see almost all of my drives... except my parity drive is not listed...

 

I hear a drive struggling to seek properly, and sure enough it's my parity drive... it seems my parity drive has died... 

 

Now I'm not sure what to do... Did my original data drive rebuild properly, considering the errors I got? I still have the failed data drive...

 

Suggestions? I have a spare drive I can use to replace the parity drive, but I'm hesitant to do anything at this point that may be permanent and damage my data...

 

help!!


I put in a new drive to replace the failed parity drive, and the parity rebuild finished. I decided to buy another drive and run 2 parity drives to cover myself... and within 2 hours of adding the 2nd parity drive, another one of my disks redballed. The relevant part of the log shows similar errors:

 

Oct 20 14:21:48 Tower emhttpd: shcmd (217): echo 128 > /sys/block/sdp/queue/nr_requests
Oct 20 18:09:01 Tower kernel: sd 13:0:5:0: [sdp] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
Oct 20 18:09:01 Tower kernel: sd 13:0:5:0: [sdp] tag#0 CDB: opcode=0x88 88 00 00 00 00 00 82 30 e8 10 00 00 02 f8 00 00
Oct 20 18:09:01 Tower kernel: print_req_error: I/O error, dev sdp, sector 2184243216
Oct 20 18:09:01 Tower kernel: sd 13:0:5:0: [sdp] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
Oct 20 18:09:01 Tower kernel: sd 13:0:5:0: [sdp] tag#1 CDB: opcode=0x88 88 00 00 00 00 00 82 30 e6 d0 00 00 01 40 00 00
Oct 20 18:09:01 Tower kernel: print_req_error: I/O error, dev sdp, sector 2184242896
Oct 20 18:09:01 Tower kernel: sd 13:0:5:0: [sdp] tag#2 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
Oct 20 18:09:01 Tower kernel: sd 13:0:5:0: [sdp] tag#2 CDB: opcode=0x88 88 00 00 00 00 00 82 30 e2 d0 00 00 04 00 00 00
Oct 20 18:09:01 Tower kernel: print_req_error: I/O error, dev sdp, sector 2184241872
Oct 20 18:09:01 Tower kernel: sd 13:0:5:0: [sdp] tag#3 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
Oct 20 18:09:01 Tower kernel: sd 13:0:5:0: [sdp] tag#3 CDB: opcode=0x88 88 00 00 00 00 00 82 30 e1 90 00 00 01 40 00 00
Oct 20 18:09:01 Tower kernel: print_req_error: I/O error, dev sdp, sector 2184241552
Oct 20 18:09:01 Tower kernel: sd 13:0:5:0: [sdp] tag#4 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
Oct 20 18:09:01 Tower kernel: sd 13:0:5:0: [sdp] tag#4 CDB: opcode=0x88 88 00 00 00 00 00 82 30 dd 90 00 00 04 00 00 00
Oct 20 18:09:01 Tower kernel: print_req_error: I/O error, dev sdp, sector 2184240528
Oct 20 18:09:01 Tower kernel: sd 13:0:5:0: [sdp] tag#5 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
Oct 20 18:09:01 Tower kernel: sd 13:0:5:0: [sdp] tag#5 CDB: opcode=0x88 88 00 00 00 00 00 82 30 dc 50 00 00 01 40 00 00
Oct 20 18:09:01 Tower kernel: print_req_error: I/O error, dev sdp, sector 2184240208
Oct 20 18:09:01 Tower kernel: sd 13:0:5:0: [sdp] tag#6 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
Oct 20 18:09:01 Tower kernel: sd 13:0:5:0: [sdp] tag#6 CDB: opcode=0x88 88 00 00 00 00 00 82 30 d8 50 00 00 04 00 00 00
Oct 20 18:09:01 Tower kernel: print_req_error: I/O error, dev sdp, sector 2184239184
Oct 20 18:09:01 Tower kernel: sd 13:0:5:0: [sdp] tag#7 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
Oct 20 18:09:01 Tower kernel: sd 13:0:5:0: [sdp] tag#7 CDB: opcode=0x88 88 00 00 00 00 00 82 30 d7 08 00 00 01 48 00 00
Oct 20 18:09:01 Tower kernel: print_req_error: I/O error, dev sdp, sector 2184238856
Oct 20 18:09:01 Tower kernel: sd 13:0:5:0: [sdp] tag#8 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
Oct 20 18:09:01 Tower kernel: sd 13:0:5:0: [sdp] tag#8 CDB: opcode=0x88 88 00 00 00 00 00 82 30 d3 08 00 00 04 00 00 00
Oct 20 18:09:01 Tower kernel: print_req_error: I/O error, dev sdp, sector 2184237832
Oct 20 18:09:01 Tower kernel: sd 13:0:5:0: [sdp] tag#9 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
Oct 20 18:09:01 Tower kernel: sd 13:0:5:0: [sdp] tag#9 CDB: opcode=0x88 88 00 00 00 00 00 82 30 cf 08 00 00 04 00 00 00
Oct 20 18:09:01 Tower kernel: print_req_error: I/O error, dev sdp, sector 2184236808
Oct 20 18:09:01 Tower kernel: sd 13:0:5:0: [sdp] Read Capacity(16) failed: Result: hostbyte=0x04 driverbyte=0x00
Oct 20 18:09:01 Tower kernel: sd 13:0:5:0: [sdp] Sense not available.
Oct 20 18:09:01 Tower kernel: sd 13:0:5:0: [sdp] Read Capacity(10) failed: Result: hostbyte=0x04 driverbyte=0x00
Oct 20 18:09:01 Tower kernel: sd 13:0:5:0: [sdp] Sense not available.
Oct 20 18:09:01 Tower kernel: sd 13:0:5:0: [sdp] 0 512-byte logical blocks: (0 B/0 B)
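The `CDB:` lines can be decoded by hand: opcode 0x88 is the SCSI READ(16) command, where bytes 2-9 hold the starting logical block address and bytes 10-13 the number of blocks, and the decoded address matches the `sector` on the `print_req_error` line that follows. The `hostbyte` value comes from the Linux SCSI midlayer's host status codes, where 0x04 is DID_BAD_TARGET. A minimal sketch of both decodings is below; `decode_read16` is a hypothetical helper and `HOST_BYTES` lists only the first few kernel codes. As a hedged reading, DID_BAD_TARGET with `driverbyte=0x00`, "Sense not available" and a failed Read Capacity (the disk suddenly reports 0 blocks) looks more like the disk dropping off the bus than the disk reporting a bad sector.

```python
# Host status ("hostbyte") codes from the Linux kernel SCSI headers.
# Only the first few values are listed here.
HOST_BYTES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x06: "DID_PARITY",
    0x07: "DID_ERROR",
    0x08: "DID_RESET",
}


def decode_read16(cdb_hex):
    """Split a READ(16) CDB (opcode 0x88) into (LBA, transfer length in blocks)."""
    cdb = bytes.fromhex(cdb_hex)  # fromhex ignores the spaces between bytes
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a READ(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")      # bytes 2-9: logical block address
    blocks = int.from_bytes(cdb[10:14], "big")  # bytes 10-13: transfer length
    return lba, blocks


# tag#0 on sdp: the LBA matches "sector 2184243216" on the next log line.
print(decode_read16("88 00 00 00 00 00 82 30 e8 10 00 00 02 f8 00 00"))
print(HOST_BYTES[0x04])
```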

 

I'm beginning to think that 3 of my drives failing within 1-2 days is too much of a coincidence... there must be something else going on...

 

I've attached the output of my diagnostics:

tower-diagnostics-20181020-1829.zip
