Upgraded to 6.6 and drive failures


blurb2m

2 hours ago, Maticks said:

I ran into the same issues on 6.6.0. It started complaining about open files and inotify watcher issues.

I even ran into some weird BTRFS issues on my cache drive, of all things.

LSI Controller and onboard motherboard controller.

 

4 hours on 6.5.3 and very stable again... I think I'll give 6.6.0 a miss till the bugs are ironed out.

 

Sorry to hear that, but glad you are stable again. I'm really hoping that it is just the HBA card. The new one comes Tuesday, and I was still running into issues after rolling back to 6.5.3.

4 hours ago, johnnie.black said:

That's on the slow side, but first you need to worry about fixing the problem, then you can worry about speed.

Happy to report, 7.5 hours later, that it completed with zero errors! Disk4 is fully operational and it's the old 3TB disk.

The speed picked up a bit and went over 100MB/s but averaged around 80 - 85MB/s. Not sure why History shows over 280MB/s... (at that rate I think it would have finished in about 3 hours).
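As a rough sanity check on those figures, here is a minimal Python sketch of the arithmetic. It assumes "3TB" means 3 * 10^12 bytes and that the whole disk was rebuilt at a steady rate; the function names are made up for illustration and are not part of Unraid.

```python
def rebuild_hours(disk_bytes: float, mb_per_s: float) -> float:
    """Hours needed to process disk_bytes at a sustained rate in MB/s (1 MB = 10^6 bytes)."""
    return disk_bytes / (mb_per_s * 1e6) / 3600


def average_mb_per_s(disk_bytes: float, hours: float) -> float:
    """Average rate in MB/s implied by processing disk_bytes in the given number of hours."""
    return disk_bytes / (hours * 3600) / 1e6


DISK_BYTES = 3e12  # assumption: a "3TB" disk holds 3 * 10^12 bytes

# Rates mentioned in the post: the observed average, the peak, and the History figure.
for rate in (85, 100, 280):
    print(f"{rate} MB/s -> {rebuild_hours(DISK_BYTES, rate):.1f} h")

# Rate implied by the reported 7.5 hour run time.
print(f"7.5 h -> {average_mb_per_s(DISK_BYTES, 7.5):.0f} MB/s")
```

Under those assumptions, 280MB/s does come out to roughly 3 hours, while a 7.5 hour run implies an average a little above 100MB/s.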

15 minutes ago, johnnie.black said:
Phase 1 - find and verify superblock...
        - block cache size set to 1512192 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 2356505 tail block 2356499
ALERT: The filesystem has valuable metadata changes in a log which is being
destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
sb_fdblocks 489295541, counted 490033385
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 1
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Maximum metadata LSN (2:2356527) is ahead of log (1:2).
Format log to cycle 5.

        XFS_REPAIR Summary    Wed Sep 26 11:52:28 2018

Phase		Start		End		Duration
Phase 1:	09/26 11:48:25	09/26 11:48:25
Phase 2:	09/26 11:48:25	09/26 11:49:42	1 minute, 17 seconds
Phase 3:	09/26 11:49:42	09/26 11:49:46	4 seconds
Phase 4:	09/26 11:49:46	09/26 11:49:46
Phase 5:	09/26 11:49:46	09/26 11:49:46
Phase 6:	09/26 11:49:46	09/26 11:49:50	4 seconds
Phase 7:	09/26 11:49:50	09/26 11:49:50

Total run time: 1 minute, 25 seconds
done
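
For anyone reading the output above: the `sb_fdblocks 489295541, counted 490033385` line means the free block count stored in the superblock did not match what xfs_repair counted. A small Python sketch of how big that gap is, assuming the default XFS block size of 4096 bytes (the output does not state the block size, so treat the byte figure as an estimate):

```python
import re

# Line copied from the xfs_repair output above.
LINE = "sb_fdblocks 489295541, counted 490033385"

BLOCK_SIZE = 4096  # assumption: default XFS block size, not shown in the log


def fdblocks_delta(line: str):
    """Return (counted - recorded) free blocks, or None if the line does not match."""
    match = re.search(r"sb_fdblocks (\d+), counted (\d+)", line)
    if match is None:
        return None
    recorded, counted = (int(value) for value in match.groups())
    return counted - recorded


delta = fdblocks_delta(LINE)
print(f"free block difference: {delta} blocks")
print(f"approx. size: {delta * BLOCK_SIZE / 2**30:.2f} GiB")
```

With a 4096-byte block size that is about 2.8 GiB of free space that the superblock was not accounting for before the repair.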
3 hours ago, br0kenraid said:

Just out of curiosity, what firmware version were you on with your 9211-8i? I have 2 of them on P20 (20.00.04.00). Also, how long until you started seeing these errors?

I think I was on P20 20.00.07(?)

It would kill the disk and disable it 0.5 - 2.9% into a disk rebuild. A couple of hours later, the other 3 disks on that SAS port would get bazillions of read/write errors. They just stopped communicating through the card.

 

Since moving to the 9207-8i, it has been flawless. Works out of the box with no flashing and uses PCIe 3.0 instead of 2.0 (not that it matters bandwidth-wise, but it seems more compatible with my X399 Taichi).


I also had similar errors when upgrading to the latest Unraid release. One of my parity drives started stalling, so I ran a preclear on the drive on a separate machine and it produced pending sector errors. I replaced the drive, only to get large numbers of CRC errors on multiple drives.

 

After many hours of hair pulling, I figured it out! My LSI 9211-8i had the dreaded P20.00.00 firmware and, for some reason, the latest Unraid build didn't get along with it. Upgrading to P20.00.07 fixed everything. No more CRC errors & faster speed in general.
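
If you have several cards to check, a version comparison like the one described above can be sketched in Python. This is only an illustration: it treats 20.00.07.00 as the minimum known-good version because that is what fixed things in this thread, and it assumes version strings are written the way they appear here (with or without a leading "P"). How you read the version off the card is not covered.

```python
def parse_fw(version: str) -> tuple:
    """Turn '20.00.04.00' or 'P20.00.07' into a 4-part tuple of ints, padding with zeros."""
    parts = [int(part) for part in version.strip().lstrip("Pp").split(".")]
    parts += [0] * (4 - len(parts))
    return tuple(parts)


# Assumption taken from this thread: 20.00.07.00 is the version that resolved the CRC errors.
MIN_GOOD = parse_fw("20.00.07.00")


def is_older_than_known_good(version: str) -> bool:
    """True if the given firmware version sorts before the known-good one."""
    return parse_fw(version) < MIN_GOOD


for fw in ("P20.00.00", "20.00.04.00", "P20.00.07"):
    status = "older, consider updating" if is_older_than_known_good(fw) else "ok"
    print(f"{fw}: {status}")
```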

  • 4 weeks later...

@johnnie.black so an update for everyone.

It turned out to be the BIOS on the HBA causing some kind of weird conflict. I erased it from the HBA and the problem went away.

You only need the BIOS on the HBA if you're booting from a drive attached to it, and since Unraid boots from a USB stick, that is never going to happen.

Running 6.6.3 without any issues.

 

I ended up doing my upgrades anyway.

 

