
[SOLVED] One Drive Write/Read Errors, One Drive Read Errors, One Parity Drive



Woke up this morning and found a red X next to Disk 11.

The system log shows read and write errors on Disk 11, and enough read errors on Disk 7 to hit the 128MB log file limit.

I pulled the syslog, but I didn't capture SMART reports before shutting down.

After shutting down, I reseated the drives.

I restarted and ran short SMART self-tests on both drives. Both passed.

I have not restarted the array.

I have a hot spare precleared and ready to replace one of these drives.

 

I'm not sure what to do next. Disk 11 might be bad or it might have just been a loose connection. Disk 7 may be no better, though it didn't have any write errors. If I rebuild now it might be fine, but I'd like to know what options I have before that.

I'm grateful for any help or advice you folks can provide.

 

Full system log attached. Here's a partial system log from before the shutdown (... denotes lines omitted):

Jan 23 13:09:44 OMNI kernel: microcode: microcode updated early to revision 0x21, date = 2019-02-13
...
Feb  8 04:54:49 OMNI CA Backup/Restore: Backup / Restore Completed
Feb  8 19:45:10 OMNI login[17876]: ROOT LOGIN  on '/dev/pts/0'
Feb  9 03:16:40 OMNI kernel: sd 9:0:2:0: device_block, handle(0x001e)
Feb  9 03:16:41 OMNI kernel: mpt2sas_cm0: log_info(0x31110d00): originator(PL), code(0x11), sub_code(0x0d00)
...
Feb  9 03:16:41 OMNI kernel: mpt2sas_cm0: log_info(0x31110d00): originator(PL), code(0x11), sub_code(0x0d00)
Feb  9 03:16:42 OMNI kernel: sd 9:0:2:0: device_unblock and setting to running, handle(0x001e)
Feb  9 03:16:42 OMNI kernel: sd 9:0:2:0: [sdd] Synchronizing SCSI cache
Feb  9 03:16:42 OMNI kernel: print_req_error: I/O error, dev sdd, sector 704672
Feb  9 03:16:42 OMNI kernel: md: disk11 read error, sector=704608
...
Feb  9 03:16:42 OMNI kernel: print_req_error: I/O error, dev sdd, sector 707296
Feb  9 03:16:42 OMNI kernel: md: disk11 read error, sector=681936
...
Feb  9 03:16:42 OMNI kernel: md: disk11 read error, sector=779416
Feb  9 03:16:42 OMNI kernel: scsi 9:0:2:0: rejecting I/O to dead device
Feb  9 03:16:42 OMNI kernel: md: disk11 read error, sector=745568
Feb  9 03:16:42 OMNI kernel: mpt2sas_cm0: removing handle(0x001e), sas_addr(0x4433221101000000)
Feb  9 03:16:42 OMNI kernel: mpt2sas_cm0: enclosure logical id(0x500605b001521880), slot(2) 
Feb  9 03:16:42 OMNI kernel: md: disk11 write error, sector=5407464512
...
Feb  9 03:16:42 OMNI kernel: md: disk11 write error, sector=5407464504
Feb  9 03:16:42 OMNI rc.diskinfo[7600]: SIGHUP received, forcing refresh of disks info.
Feb  9 03:16:42 OMNI kernel: md: disk11 write error, sector=565600
...
Feb  9 03:16:42 OMNI kernel: scsi 9:0:2:0: rejecting I/O to dead device
Feb  9 03:16:42 OMNI kernel: md: disk11 read error, sector=745568
Feb  9 03:16:42 OMNI kernel: mpt2sas_cm0: removing handle(0x001e), sas_addr(0x4433221101000000)
Feb  9 03:16:42 OMNI kernel: mpt2sas_cm0: enclosure logical id(0x500605b001521880), slot(2) 
Feb  9 03:16:42 OMNI kernel: md: disk11 write error, sector=779416
Feb  9 03:16:53 OMNI kernel: scsi 9:0:25:0: Direct-Access     ATA      WDC WD100EFAX-68 0A83 PQ: 0 ANSI: 6
Feb  9 03:16:53 OMNI kernel: scsi 9:0:25:0: SATA: handle(0x001e), sas_addr(0x4433221101000000), phy(1), device_name(0x0000000000000000)
Feb  9 03:16:53 OMNI kernel: scsi 9:0:25:0: enclosure logical id (0x500605b001521880), slot(2) 
Feb  9 03:16:53 OMNI kernel: scsi 9:0:25:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Feb  9 03:16:53 OMNI kernel: sd 9:0:25:0: Power-on or device reset occurred
Feb  9 03:16:53 OMNI kernel: sd 9:0:25:0: Attached scsi generic sg3 type 0
Feb  9 03:16:53 OMNI kernel: sd 9:0:25:0: [sdaa] 19532873728 512-byte logical blocks: (10.0 TB/9.10 TiB)
Feb  9 03:16:53 OMNI kernel: sd 9:0:25:0: [sdaa] 4096-byte physical blocks
Feb  9 03:16:54 OMNI kernel: sd 9:0:25:0: [sdaa] Write Protect is off
Feb  9 03:16:54 OMNI kernel: sd 9:0:25:0: [sdaa] Mode Sense: 7f 00 10 08
Feb  9 03:16:54 OMNI kernel: sd 9:0:25:0: [sdaa] Write cache: enabled, read cache: enabled, supports DPO and FUA
Feb  9 03:16:54 OMNI kernel: sdaa: sdaa1
Feb  9 03:16:54 OMNI kernel: sd 9:0:25:0: [sdaa] Attached SCSI disk
Feb  9 03:16:54 OMNI rc.diskinfo[7600]: SIGHUP received, forcing refresh of disks info.
Feb  9 03:16:54 OMNI kernel: BTRFS warning (device md11): duplicate device fsid:devid for aa876094-6b30-4aac-b367-aaf5a6d6fde8:1 old:/dev/md11 new:/dev/sdaa1
Feb  9 03:16:54 OMNI unassigned.devices: Disk with serial 'WDC_WD100EFAX-68LHPN0_XXXXXXXX', mountpoint 'WDC_WD100EFAX-68LHPN0_XXXXXXXX' is not set to auto mount.
Feb  9 03:21:45 OMNI kernel: mpt2sas_cm0: log_info(0x31110d00): originator(PL), code(0x11), sub_code(0x0d00)
Feb  9 03:21:45 OMNI kernel: sd 9:0:1:0: [sdc] tag#2871 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Feb  9 03:21:45 OMNI kernel: sd 9:0:1:0: [sdc] tag#2871 Sense Key : 0x2 [current] 
Feb  9 03:21:45 OMNI kernel: sd 9:0:1:0: [sdc] tag#2871 ASC=0x4 ASCQ=0x0 
Feb  9 03:21:45 OMNI kernel: sd 9:0:1:0: [sdc] tag#2871 CDB: opcode=0x88 88 00 00 00 00 00 bd b5 9b c0 00 00 00 20 00 00
Feb  9 03:21:45 OMNI kernel: print_req_error: 120 callbacks suppressed
Feb  9 03:21:45 OMNI kernel: print_req_error: I/O error, dev sdc, sector 3182795712
Feb  9 03:21:45 OMNI kernel: md: disk7 read error, sector=3182795648
...
Feb  9 03:21:45 OMNI kernel: md: disk7 read error, sector=3182795672
Feb  9 03:21:45 OMNI kernel: BTRFS error (device md11): bdev /dev/md11 errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
Feb  9 03:21:45 OMNI kernel: sd 9:0:1:0: [sdc] tag#2873 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Feb  9 03:21:45 OMNI kernel: sd 9:0:1:0: [sdc] tag#2873 Sense Key : 0x2 [current] 
Feb  9 03:21:45 OMNI kernel: sd 9:0:1:0: [sdc] tag#2873 ASC=0x4 ASCQ=0x0 
Feb  9 03:21:45 OMNI kernel: sd 9:0:1:0: [sdc] tag#2873 CDB: opcode=0x88 88 00 00 00 00 03 3c db 10 00 00 00 00 20 00 00
Feb  9 03:21:45 OMNI kernel: print_req_error: I/O error, dev sdc, sector 13905891328
Feb  9 03:21:45 OMNI kernel: md: disk7 read error, sector=13905891264
...
Feb  9 03:21:45 OMNI kernel: md: disk7 read error, sector=13905891288
Feb  9 03:21:45 OMNI kernel: BTRFS error (device md11): bdev /dev/md11 errs: wr 0, rd 2, flush 0, corrupt 0, gen 0
Feb  9 03:21:45 OMNI kernel: BTRFS error (device md11): error loading props for ino 76593 (root 5): -5
Feb  9 03:21:45 OMNI kernel: BTRFS error (device md11): bdev /dev/md11 errs: wr 0, rd 3, flush 0, corrupt 0, gen 0
Feb  9 03:21:47 OMNI kernel: sd 9:0:1:0: [sdc] tag#2817 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Feb  9 03:21:47 OMNI kernel: sd 9:0:1:0: [sdc] tag#2817 Sense Key : 0x2 [current] 
Feb  9 03:21:47 OMNI kernel: sd 9:0:1:0: [sdc] tag#2817 ASC=0x4 ASCQ=0x0 
Feb  9 03:21:47 OMNI kernel: sd 9:0:1:0: [sdc] tag#2817 CDB: opcode=0x88 88 00 00 00 00 03 94 8f 18 a0 00 00 00 20 00 00
Feb  9 03:21:47 OMNI kernel: print_req_error: I/O error, dev sdc, sector 15377307808
Feb  9 03:21:47 OMNI kernel: md: disk7 read error, sector=15377307744
...
Feb  9 03:21:47 OMNI kernel: md: disk7 read error, sector=15377307768
Feb  9 03:21:47 OMNI kernel: BTRFS error (device md11): bdev /dev/md11 errs: wr 0, rd 4, flush 0, corrupt 0, gen 0
Feb  9 03:21:48 OMNI kernel: sd 9:0:1:0: [sdc] tag#2823 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Feb  9 03:21:48 OMNI kernel: sd 9:0:1:0: [sdc] tag#2823 Sense Key : 0x2 [current] 
Feb  9 03:21:48 OMNI kernel: sd 9:0:1:0: [sdc] tag#2823 ASC=0x4 ASCQ=0x0 
Feb  9 03:21:48 OMNI kernel: sd 9:0:1:0: [sdc] tag#2823 CDB: opcode=0x88 88 00 00 00 00 02 55 da b9 e0 00 00 00 20 00 00
Feb  9 03:21:48 OMNI kernel: print_req_error: I/O error, dev sdc, sector 10030332384
Feb  9 03:21:48 OMNI kernel: md: disk7 read error, sector=10030332320
...
Feb  9 03:21:48 OMNI kernel: md: disk7 read error, sector=10030332344
Feb  9 03:21:48 OMNI kernel: BTRFS error (device md11): bdev /dev/md11 errs: wr 0, rd 5, flush 0, corrupt 0, gen 0
Feb  9 03:21:51 OMNI kernel: sd 9:0:1:0: [sdc] tag#2846 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Feb  9 03:21:51 OMNI kernel: sd 9:0:1:0: [sdc] tag#2846 Sense Key : 0x2 [current] 
Feb  9 03:21:51 OMNI kernel: sd 9:0:1:0: [sdc] tag#2846 ASC=0x4 ASCQ=0x0 
Feb  9 03:21:51 OMNI kernel: sd 9:0:1:0: [sdc] tag#2846 CDB: opcode=0x88 88 00 00 00 00 00 3f c5 d2 40 00 00 00 20 00 00
Feb  9 03:21:51 OMNI kernel: print_req_error: I/O error, dev sdc, sector 1069929024
Feb  9 03:21:51 OMNI kernel: md: disk7 read error, sector=1069928960
...
Feb  9 03:21:51 OMNI kernel: md: disk7 read error, sector=1069928984
Feb  9 03:21:51 OMNI kernel: BTRFS error (device md11): bdev /dev/md11 errs: wr 0, rd 6, flush 0, corrupt 0, gen 0
...
Feb  9 03:21:51 OMNI kernel: BTRFS error (device md11): bdev /dev/md11 errs: wr 0, rd 15, flush 0, corrupt 0, gen 0
Feb  9 03:21:54 OMNI kernel: sd 9:0:1:0: Power-on or device reset occurred
Feb  9 03:21:55 OMNI rc.diskinfo[7600]: SIGHUP received, forcing refresh of disks info.
Feb  9 03:22:19 OMNI kernel: sd 9:0:1:0: device_block, handle(0x001d)
Feb  9 03:22:20 OMNI kernel: mpt2sas_cm0: log_info(0x31110d00): originator(PL), code(0x11), sub_code(0x0d00)
...
Feb  9 03:22:20 OMNI kernel: mpt2sas_cm0: log_info(0x31110d00): originator(PL), code(0x11), sub_code(0x0d00)
Feb  9 03:22:21 OMNI kernel: sd 9:0:1:0: device_unblock and setting to running, handle(0x001d)
Feb  9 03:22:21 OMNI kernel: sd 9:0:1:0: [sdc] Synchronizing SCSI cache
Feb  9 03:22:21 OMNI kernel: print_req_error: I/O error, dev sdc, sector 8956304744
Feb  9 03:22:21 OMNI kernel: md: disk7 read error, sector=8956304680
...
Feb  9 03:22:21 OMNI kernel: md: disk7 read error, sector=8956304704
Feb  9 03:22:21 OMNI kernel: print_req_error: I/O error, dev sdc, sector 3896548872
...
Feb  9 03:22:21 OMNI kernel: md: disk7 read error, sector=3896548808
Feb  9 03:22:21 OMNI kernel: md: disk7 read error, sector=3896548832
Feb  9 03:22:21 OMNI kernel: print_req_error: I/O error, dev sdc, sector 5544903688
Feb  9 03:22:21 OMNI kernel: md: disk7 read error, sector=5544903624
...
Feb  9 03:22:30 OMNI kernel: md: disk7 read error, sector=12006235056
Feb  9 03:22:30 OMNI kernel: sd 9:0:26:0: [sdab] Attached SCSI disk
Feb  9 03:22:30 OMNI kernel: md: disk7 read error, sector=11945181264
...
Feb  9 03:41:50 OMNI kernel: BTRFS error (device md11): bdev /dev/md11 errs: wr 0, rd 3202, flush 0, corrupt 0, gen 0
Feb  9 03:41:50 OMNI kernel: md: disk7 read error, sector=15848823584
...(disk7 read errors continue until EOF)
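
For anyone reading along, the `CDB:` lines in the excerpt can be decoded by hand. Opcode 0x88 is the SCSI READ(16) command, which carries the logical block address in bytes 2-9 and the transfer length in bytes 10-13. The sketch below is only an illustration (the function name `decode_read16` is made up here) and decodes the first CDB logged for sdc:

```python
def decode_read16(cdb_hex):
    """Decode a SCSI READ(16) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a 16-byte READ(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")      # bytes 2-9: logical block address
    blocks = int.from_bytes(cdb[10:14], "big")  # bytes 10-13: transfer length in blocks
    return lba, blocks


# First CDB logged for sdc at 03:21:45 in the excerpt above.
lba, blocks = decode_read16("88 00 00 00 00 00 bd b5 9b c0 00 00 00 20 00 00")
print(lba, blocks)  # 3182795712 32

# The kernel reports the device sector; md reports a sector 64 lower.
print(lba - 3182795648)  # 64
```

The decoded address matches the `print_req_error` line that follows it in the log (sector 3182795712). The 64-sector gap to the `md: disk7 read error` line looks like the partition start offset, though that is an inference from the numbers rather than something the log states.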

 

SystemLogNoSerials.zip
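
With a log this large, it can help to tally the `md:` error lines per disk rather than scroll through them. This is a minimal sketch that assumes the line format shown in the excerpt above; the helper name `tally_md_errors` and the sample list are illustrative, not part of Unraid:

```python
import re
from collections import Counter

# Matches md driver lines such as:
#   "... kernel: md: disk11 read error, sector=704608"
MD_ERROR = re.compile(r"md: disk(\d+) (read|write) error, sector=(\d+)")


def tally_md_errors(lines):
    """Count md read/write errors per array disk number."""
    counts = Counter()
    for line in lines:
        match = MD_ERROR.search(line)
        if match:
            disk, kind, _sector = match.groups()
            counts[(int(disk), kind)] += 1
    return counts


# A few lines copied from the excerpt; a real run would iterate over the syslog file.
sample = [
    "Feb  9 03:16:42 OMNI kernel: md: disk11 read error, sector=704608",
    "Feb  9 03:16:42 OMNI kernel: md: disk11 write error, sector=5407464512",
    "Feb  9 03:21:45 OMNI kernel: md: disk7 read error, sector=3182795648",
    "Feb  9 03:21:45 OMNI kernel: md: disk7 read error, sector=3182795672",
    "Feb  9 03:16:42 OMNI kernel: scsi 9:0:2:0: rejecting I/O to dead device",
]

for (disk, kind), n in sorted(tally_md_errors(sample).items()):
    print(f"disk{disk} {kind} errors: {n}")
```

Because the function accepts any iterable of lines, an open file object for the extracted syslog can be passed in directly.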


So I have two paths: Trust the disk's data (new config/re-sync parity) or trust the drive's condition (rebuild on top).

Regardless of which option I go with, if another drive goes bad during either operation, I will lose the contents of both drives.

 

If I go with rebuilding on top, would it be better to preclear the disk first? Is there any other way to validate the drive's condition?

39 minutes ago, elecgnosis said:

So I have two paths: Trust the disk's data (new config/re-sync parity) or trust the drive's condition (rebuild on top).

Third option, rebuild on a totally different drive and keep the dropped drive intact.

41 minutes ago, elecgnosis said:

Regardless of which option I go with, if another drive goes bad during either operation, I will lose the contents of both drives.

True, but if you rebuild onto a new drive, you at least keep some hope of recovering data from the excluded drive.

42 minutes ago, elecgnosis said:

If I go with rebuilding on top, would it be better to preclear the disk first?

No. A long SMART test would be a good indicator of the drive's condition. There's no need to erase everything currently on the drive; even if it's partially corrupt, it might still be somewhat salvageable.

  • elecgnosis changed the title to [SOLVED] One Drive Write/Read Errors, One Drive Read Errors, One Parity Drive

I'm confused now. I chose to pull the drive that had the write error and replace it with my hot spare.

When I started the array, the new drive came up as Unmountable while the drives were mounting, though the rebuild is still running.
I haven't done anything with the original drive that had the read error.

 

I have a bad feeling that the rebuild will result in an empty drive. Can you help me find out what's going on? Do I still have an opportunity to save that data?

UnRaidScreenshot.png

  • elecgnosis changed the title to One Drive Write/Read Errors, One Drive Read Errors, One Parity Drive

I was able to mount the drive with the write error as an unassigned device. I am still able to access its files.

I'm looking at this similar topic, and I think I understand the problem better: While there may not have been any mechanical failure or damage in the drive, its BTRFS filesystem was somehow corrupted?

So, even after rebuilding, I will need to repair the file system on the new drive by using the Scrub command?

 


I had physically removed the original disk 11, but I didn't reassign it to the disk 11 slot. The hot spare was still in its slot when I did the new config.

I reassigned it to slot 11, did another new config, and started up the array. Parity is resyncing. Everything seems to be fine now.

 

Thanks, everyone.
