blk_update_request: i/o error


Duniac


Hi All,

 

I have been using Unraid for a couple of months now, and have 1 parity drive with 5 drives in the storage array (I would like to configure a second parity disk soon).

 

This week disk 3 became disabled with errors in the logs (blk_update_request: i/o error). I researched the error and tried connecting the disk to another channel on the HBA controller, with no change. After changing the channel, I removed the disk from the array and ran pre-clear; the error appeared during the pre-read. I then ran pre-clear twice more and both runs completed successfully, so I hoped the problem was fixed. I added the disk back to the array and the parity rebuild started, but it then failed with the same error.

 

I now have two notifications appearing:

Unraid array errors: 30-01-2020 16:55
Warning [UNRAID-MEDIA] - array has errors
Array has 3 disks with read errors

---------------------------
Unraid array errors: 30-01-2020 17:01
Warning [UNRAID-MEDIA] - array has errors
Array has 4 disks with read errors

 

Can someone please point me in the right direction?  Is it possible the HBA Controller is failing or is there a problem with Unraid?

 

Attached diagnostics.

 

Thanks in advance.

unraid-media-diagnostics-20200130-1053.zip

Edited by Duniac
Link to comment

Log is completely spammed with these errors from the HBA:

 

Jan 30 04:40:06 Unraid-Media kernel: mpt3sas 0000:04:00.0: AER: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Jan 30 04:40:06 Unraid-Media kernel: mpt3sas 0000:04:00.0: AER:   device [1000:0087] error status/mask=00000001/00002000
Jan 30 04:40:06 Unraid-Media kernel: mpt3sas 0000:04:00.0: AER:    [ 0] RxErr                 
Jan 30 04:40:17 Unraid-Media kernel: pcieport 0000:00:03.0: AER: Corrected error received: 0000:00:00.0

 

While technically this is not a problem, if nothing else it spams the log. Look for a BIOS update or try a different slot for the HBA to see if these errors stop. It's also a good idea to update the LSI to the latest firmware, since it's on a very old one; current is p20.00.07.00.
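
To check whether the AER messages actually stop after a BIOS update or a slot change, you could count them per device in the syslog before and after. This is only a minimal Python sketch, assuming the log lines look like the excerpt above; the function name `count_aer_errors` and the `SAMPLE` text are illustrative and not part of Unraid.

```python
import re
from collections import Counter

# Matches the first line of each kernel AER report, for example:
#   "... kernel: mpt3sas 0000:04:00.0: AER: PCIe Bus Error: severity=Corrected, ..."
# Assumption: the syslog uses the same layout as the excerpt in this thread.
AER_LINE = re.compile(
    r"kernel: (?P<driver>\S+) "
    r"(?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"AER: PCIe Bus Error: severity=(?P<severity>\w+)"
)


def count_aer_errors(log_text):
    """Count AER bus error reports per (driver, PCI address, severity)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = AER_LINE.search(line)
        if match:
            key = (match["driver"], match["bdf"], match["severity"])
            counts[key] += 1
    return counts


# Sample taken from the log excerpt above (hypothetical variable name).
SAMPLE = """\
Jan 30 04:40:06 Unraid-Media kernel: mpt3sas 0000:04:00.0: AER: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Jan 30 04:40:06 Unraid-Media kernel: mpt3sas 0000:04:00.0: AER:   device [1000:0087] error status/mask=00000001/00002000
Jan 30 04:40:06 Unraid-Media kernel: mpt3sas 0000:04:00.0: AER:    [ 0] RxErr
Jan 30 04:40:17 Unraid-Media kernel: pcieport 0000:00:03.0: AER: Corrected error received: 0000:00:00.0
"""

if __name__ == "__main__":
    for (driver, bdf, severity), n in count_aer_errors(SAMPLE).items():
        print(f"{driver} {bdf} severity={severity}: {n} report(s)")
```

Running the same count on a syslog captured after the change gives a rough before/after comparison; only the header line of each report is counted, so the detail lines that follow it do not inflate the total.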

Link to comment

Thanks for the suggestions, will look at updating the firmware on the HBA controller and HDD.

Hopefully this will correct the problem.

In the meantime I have suspended all operations and turned off all containers, I don't want any operation occurring while I have two bad disks.

Link to comment

Update:

Followed the instructions at https://wiki.unraid.net/UnRAID_Manual_-_FAQ

The drive was identified correctly, added to the array and the parity rebuild began.

However, the error re-appeared, see new diagnostics.

Ran SMART extended self-test, however the following message appeared - Interrupted (host reset).

Also noticed that all of my docker containers have gone missing.

It also seems that this drive is not being emulated at all, as I am missing a lot of content.

I will be purchasing a new drive in about one week, but in the meantime I'll be running with no protection and it seems that Unraid is totally shitting itself now!

 

Any assistance anyone can provide will be much appreciated.

unraid-media-diagnostics-20200202-0741.zip

Link to comment

Multiple issues:

 

Feb  2 17:12:23 Unraid-Media kernel: XFS (md3): Metadata CRC error detected at xfs_dir3_block_read_verify+0x7c/0xc5 [xfs], xfs_dir3_block block 0x100000038
Feb  2 17:12:23 Unraid-Media kernel: XFS (md3): Unmount and run xfs_repair

Disk 3 had a corrupt file system.

 

Feb  2 17:59:08 Unraid-Media kernel: md: disk1 read error, sector=952405472
Feb  2 17:59:08 Unraid-Media kernel: md: disk1 read error, sector=952405480
Feb  2 17:59:08 Unraid-Media kernel: md: disk1 read error, sector=952405488
Feb  2 17:59:08 Unraid-Media kernel: md: recovery thread: multiple disk errors, sector=952405168
Feb  2 17:59:08 Unraid-Media kernel: md: disk1 read error, sector=952405496
Feb  2 17:59:08 Unraid-Media kernel: md: disk1 read error, sector=952405504

 

There were read errors on disk1 during the disk3 rebuild, so the rebuild will be corrupt. If the rebuild didn't finish, you can try again after fixing the disk1 issues and fixing the file system on disk3.
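
To see which disks reported read errors and over what sector range, the `md:` lines can be summarised from the syslog. Again, this is only a rough Python sketch that assumes the message format shown in the excerpt above; `summarize_read_errors` and `SAMPLE` are hypothetical names, not an Unraid tool.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   "... kernel: md: disk1 read error, sector=952405472"
# Assumption: the md driver logs read errors in exactly this form.
MD_READ_ERROR = re.compile(
    r"md: disk(?P<disk>\d+) read error, sector=(?P<sector>\d+)"
)


def summarize_read_errors(log_text):
    """Return {disk_number: (error_count, first_sector, last_sector)}."""
    sectors = defaultdict(list)
    for line in log_text.splitlines():
        match = MD_READ_ERROR.search(line)
        if match:
            sectors[int(match["disk"])].append(int(match["sector"]))
    return {
        disk: (len(found), min(found), max(found))
        for disk, found in sectors.items()
    }


# Sample taken from the log excerpt above (hypothetical variable name).
SAMPLE = """\
Feb  2 17:59:08 Unraid-Media kernel: md: disk1 read error, sector=952405472
Feb  2 17:59:08 Unraid-Media kernel: md: disk1 read error, sector=952405480
Feb  2 17:59:08 Unraid-Media kernel: md: disk1 read error, sector=952405488
Feb  2 17:59:08 Unraid-Media kernel: md: recovery thread: multiple disk errors, sector=952405168
Feb  2 17:59:08 Unraid-Media kernel: md: disk1 read error, sector=952405496
Feb  2 17:59:08 Unraid-Media kernel: md: disk1 read error, sector=952405504
"""

if __name__ == "__main__":
    for disk, (count, first, last) in sorted(summarize_read_errors(SAMPLE).items()):
        print(f"disk{disk}: {count} read errors, sectors {first}..{last}")
```

If more than one disk number shows up in the summary while a rebuild is running, the rebuilt data for those sectors cannot be trusted, which is the situation described above.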

Link to comment
12 hours ago, johnnie.black said:

There were read errors on disk1 during disk3 rebuild, so rebuild will be corrupt, if the rebuild didn't finish you can try again after fixing the disk1 issues, and fixing the file system on disk3.

How should I approach fixing the read errors that are occurring?

Link to comment
12 hours ago, johnnie.black said:

Disk looks healthy, but there are known issues with those models and LSI. Seagate released a firmware update for that, see if that helps.

https://apps1.seagate.com/downloads/certificate.html?key=1237891795995

It does seem odd that I have been operating without problems for about six months.

 

Should I look at replacing the Controller or will the firmware update be sufficient?

 

Will also be performing the following:

  • Purchasing another drive to ensure dual parity
  • Running diagnostics on the disk causing the problems
  • Updating firmware on all disks
  • Extracting the controller card and updating the firmware
  • Reinstalling controller card and disk
  • Will preclear new disk and the disk causing the problems.
  • Insert new disk into the array in the position of the disk causing problems.
  • Ensure parity has sync'd correctly.
  • Then add the disk which caused the problems as parity.
Link to comment
9 hours ago, Duniac said:

will the firmware update be sufficient?

Should be, it was released specifically to fix that problem (possibly others also). It's also worth checking all connections.

 

First thing you want to do is to rebuild disk3 (assuming it never completed), since old rebuild was going to be mostly corrupt.

Link to comment
On 2/4/2020 at 6:54 PM, johnnie.black said:

Should be, it was release specifically to fix that problem (possible others also), also worth checking all connections.

 

First thing you want to do is to rebuild disk3 (assuming it never completed), since old rebuild was going to be mostly corrupt.

Looks like things are going from bad to worse.

I've now lost disk 1 (Unmountable: No file system).

Backing up data seems to be a lost cause at the moment as nothing is copying, everything is showing read errors.

I do have backups of critical data and I hope not to lose everything, but it seems like that is what is happening now.

Still don't understand why these problems have suddenly appeared, but will try to move forward...

Link to comment

Hi All,

 

Couple of updates:
1.  I have purchased a new drive to try to recover data.

2.  I completed a firmware update on the disk which originally caused the problems.

 

Note that when I applied the firmware update to the disk, Seagate advised backing up the disk first, as the update may result in lost data.

 

itimpi:

I have run a full scan of the disk which was experiencing the problems (took about 15 hours), and no errors were found.


tee-tee jorge:

The errors I experienced were prior to extracting the disk to update the firmware.

 

I haven't had time in the last week to investigate further and have had the machine turned off.

What is the best approach for trying to recover data?  I have lost a lot and desperately need to recover some data which was added to the array and wasn't backed up.

I have tried accessing the data from within UnRaid by navigating to the disk, but nothing appears.  Are there any tools which can be used to try to extract the data?

Link to comment
