
Moved to new hardware, data drive became disabled during parity sync



Hey all,

 

I have had Unraid running for years on my trusty HP Gen3 MicroServer, and I have now moved the drives to a newer G7 system.

 

I had to do a new configuration because the SAS card fooled Unraid into thinking I had different serial numbers. After placing the drives in their correct slots and starting the array with "parity is valid" checked, it was happy and I was able to see all the files.

 

All seemed well until I worked out that two of the four SAS channels in use were throwing CRC errors...
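
(In case it's useful context: this is roughly how I spotted the counters climbing - just a sketch, and the device name is only an example, not my actual disk.)

# UDMA CRC errors show up as SMART attribute 199; a rising count usually
# points at the cable/backplane path rather than the disk itself
smartctl -A /dev/sdf | grep -i crc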

 

Originally it was only one, so I relocated that drive to a known-good bay and it was happy. Then, in what was most likely a stupid thing to do, I let Unraid start a parity sync after starting the array, thinking it would be a good way to confirm I wouldn't get any more CRC errors...

 

It did not go well: another drive threw so many errors during the sync that Unraid has now disabled it, and the parity sync has paused. On closer inspection, this drive may actually be faulty beyond my original CRC issue.

 

So now I feel I am in a precarious position: my parity is in an unknown state, and 30% of my files are not visible (i.e. Unraid is not emulating the disabled drive). That is my biggest worry - that the missing drive is not being emulated...

 

What should my next step be? I am happy to pull the potentially faulty drive and manually recover the files from it later, but I am unsure how to go about this while keeping the rest of the array up and running.

 

I have a new 10 TB drive I was going to use to replace the parity drive once things had settled down... looks like things are just not going to plan...

 

 

unraid disabled drive.png

stuff2-diagnostics-20231107-1410.zip


I really appreciate your time with this. Running the check via the GUI gave this message...

 

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed. Mount the filesystem to replay the log, and unmount it before re-running xfs_repair. If you are unable to mount the filesystem, then use the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount of the filesystem before doing this.
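
(For context, I believe the GUI check is the read-only xfs_repair run; the command-line equivalent would look roughly like the sketch below. /dev/md3 is only an example - the actual device depends on the disk slot and Unraid version.)

# read-only check in maintenance mode, no changes written
xfs_repair -n /dev/md3

# only if the log cannot be replayed by mounting: destroy the log and repair
xfs_repair -L /dev/md3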


OK, making progress, -L completed:

 

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
clearing needsrepair flag and regenerating metadata
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Maximum metadata LSN (67:3647140) is ahead of log (1:2).
Format log to cycle 70.
done

 

 

I have not brought the array out of maintenance mode yet.
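
(Once I do start it normally, my plan is to look for anything the repair moved to lost+found - something like this, with the disk number just an example.)

# check whether xfs_repair orphaned any files on the repaired disk
ls -la /mnt/disk3/lost+found/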

 

 


Thanks again for your help - you guided me through to a working solution. The parity sync completed a few hours ago and, aside from a handful of errors on one disk, all is well.

 

The disk that had a few errors is well overdue to be replaced anyway, as it has been online for over 7 years (I only realised when checking its SMART info!). Now that things have settled, I can work on replacing the parity drive with a newer, larger drive, and use the "old" parity drive to replace the very old one...
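
(For anyone curious, the power-on time is SMART attribute 9 - a quick way to check it, with the device name again only an example:)

smartctl -A /dev/sdd | grep -i power_on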

 

I really appreciate your patience and help.

 

Cheers!
