rebuilding a disk, then no more filesystem


neuk34


Hello,

 

Hope you're well.

I'm experiencing a problem: my disk9 failed. So I went to the store, bought a new one, and launched a disk rebuild.

I noticed that disk8 was accumulating a huge number of errors (around 480 million) while disk9 was rebuilding.

The disk9 rebuild was a success, but my disk8 lost its filesystem.

I don't know if my parity is valid.

[screenshots of the array status attached: IMG_2019.PNG, IMG_2020.PNG]

What should I do?

Format the disk to create a new filesystem, then launch a rebuild to recover the data?

Or should I try to recover my filesystem using xfs_repair -v /dev/md8?
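
For reference, a non-destructive first step would be xfs_repair's no-modify mode (this assumes the array is started in maintenance mode so that the /dev/md8 device exists):

xfs_repair -n /dev/md8    # -n reports problems without writing anything to the disk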

 

Thanks for your help.

[screenshot attached: Capture.JPG]


Unfortunately the diagnostics are from after rebooting, but disk9 looks healthy; it likely dropped offline during the previous rebuild, and since it's on a SAS2LP that wouldn't be surprising. You can rebuild disk8 again: it might not be perfect, but it would be better than it is now.

 

Also, I see sync errors during various parity checks. Was that never a concern for you?


Thanks! I launched a format and disk rebuild on disk8: all the data has been lost. Moreover, disk8 has been disabled.

I installed a new disk8, hoping to recover my data from parity.

Fingers crossed!

 

At the moment, data on the other disks cannot be displayed: all the directories have disappeared, and I don't know why.


Format is NEVER part of the rebuild process. Format is a write operation. It writes a new empty filesystem to the disk. Unraid treats that write operation exactly as it does any other write operation, by updating parity. So, after the format, parity agrees the disk has an empty filesystem. If you rebuild after formatting a disk, the result is a formatted disk.
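
To make that concrete, here is a toy single-parity illustration in bash arithmetic, with single bytes standing in for whole disks (the values are invented for the example):

# parity holds the XOR of the corresponding data on every data disk
d8=0x5A; d9=0x3C
parity=$(( d8 ^ d9 ))               # parity before the format
d8_new=0x00                         # the format overwrites disk8's contents...
parity=$(( parity ^ d8 ^ d8_new ))  # ...and Unraid updates parity for that write
echo $(( parity ^ d9 ))             # rebuilding disk8 from parity prints 0: the empty, formatted contents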

3 hours ago, trurl said:

The data on the other disks should not be affected. What do you mean, the directories have all disappeared? Do you mean you can't see your shares on the network?

There's filesystem corruption on disk9. I didn't mention it before so the OP could deal with disk8 first, but since he "solved" that problem by formatting disk8, now run reiserfsck on disk9:

https://wiki.unraid.net/Check_Disk_Filesystems#Checking_and_fixing_drives_in_the_webGui
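
From the command line, the equivalent read-only check would look like this (with the array started in maintenance mode, so the repair goes through the parity-protected md device):

reiserfsck --check /dev/md9    # read-only: reports corruption without changing anything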

 

3 hours ago, trurl said:

Format is NEVER part of the rebuild process. Format is a write operation. It writes a new empty filesystem to the disk. Unraid treats that write operation exactly as it does any other write operation, by updating parity. So, after the format, parity agrees the disk has an empty filesystem. If you rebuild after formatting a disk, the result is a formatted disk.

It's a misunderstanding. When Johnnie wrote "you can rebuild disk8 again", I thought it was OK to rebuild the filesystem through a format. I guess I lost 2 TB of data. Great.

 

3 hours ago, trurl said:

The data on the other disks should not be affected. What do you mean, the directories have all disappeared? Do you mean you can't see your shares on the network?

 

I hope you have another copy of anything important and irreplaceable.

 

Post another diagnostic.

I can access my shares, but they are empty.

 

42 minutes ago, johnnie.black said:

There's filesystem corruption on disk9. I didn't mention it before so the OP could deal with disk8 first, but since he "solved" that problem by formatting disk8, now run reiserfsck on disk9:

https://wiki.unraid.net/Check_Disk_Filesystems#Checking_and_fixing_drives_in_the_webGui

 

OK, I'm still waiting on the data rebuild for disk8, which will lead to nothing, because it seems I lost my data when I clicked "format".

I will run reiserfsck ASAP.

By the way, this is the third time I've lost a disk, and I've never been able to rebuild any of them. Could it come from my Supermicro hardware? Would it be better with a Dell H330 RAID card?

 

Thanks.

nas-diagnostics-20190115-0915.zip

1 hour ago, neuk34 said:

By the way, this is the third time I've lost a disk, and I've never been able to rebuild any of them. Could it come from my Supermicro hardware? Would it be better with a Dell H330 RAID card?

SASLP/SAS2LP haven't been recommended for some time; they can drop disks without reason. I recommend replacing it with an LSI HBA: any LSI with a SAS2008/2308/3008 chipset in IT mode, e.g., 9201-8i, 9211-8i, 9207-8i, 9300-8i, etc., and clones like the Dell H200/H310 and IBM M1015; these latter ones need to be crossflashed.
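
Once an LSI card is installed, its firmware mode can be verified with LSI's sas2flash utility (sas3flash for SAS3008 cards); a quick check might look like:

sas2flash -list    # detailed adapter info; the Firmware Product ID line should show (IT)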

2 hours ago, johnnie.black said:

SASLP/SAS2LP haven't been recommended for some time; they can drop disks without reason. I recommend replacing it with an LSI HBA: any LSI with a SAS2008/2308/3008 chipset in IT mode, e.g., 9201-8i, 9211-8i, 9207-8i, 9300-8i, etc., and clones like the Dell H200/H310 and IBM M1015; these latter ones need to be crossflashed.

The Adaptec HBA 1000-8i should also work.


Hello,

 

On 1/15/2019 at 8:30 AM, johnnie.black said:

There's filesystem corruption on disk9. I didn't mention it before so the OP could deal with disk8 first, but since he "solved" that problem by formatting disk8, now run reiserfsck on disk9:

https://wiki.unraid.net/Check_Disk_Filesystems#Checking_and_fixing_drives_in_the_webGui

 

Done! Conclusion: I've lost 2 TB of data on my 4 TB disk9, plus 2 TB on disk8 (my mistake for having formatted the filesystem).

Is there a way to recover my disk9 data from parity?

nas-diagnostics-20190117-0937.zip

42 minutes ago, neuk34 said:

I've lost 2 TB of data on my 4 TB disk9

That's unusual; reiserfsck is usually good at repairing the filesystem with little or no data loss.

 

42 minutes ago, neuk34 said:

Is there a way to recover my disk9 data from parity?

No, parity can't help with filesystem corruption. You can try --rebuild-tree with --scan-whole-partition, but before doing that, back up the current data on disk9 to another disk, and don't do any more writes to disk9. After the backup, start the array in maintenance mode and use:

 

reiserfsck --rebuild-tree --scan-whole-partition /dev/md9
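
For the backup step mentioned above (which must happen before the rebuild-tree), a plain rsync copy to any other array disk with enough free space would do; disk10 below is just a placeholder:

rsync -av /mnt/disk9/ /mnt/disk10/disk9_backup/    # archive mode: recursive, preserves permissions and timestamps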

 

 


Thanks for your help. I've been able to get my data back from disk8, and I've been able to re-download the data I lost from disk9.

I'm going to buy a new HBA card as you advised, to avoid any more disk drops.

Do you think this one is OK?

 

https://www.ebay.com/itm/LSI-SAS-SATA-IT-Mode-9201-8I-6Gbps-8-Ports-RAID-Controller-Card-IT-9211-8I/142729911794?epid=15025014623&hash=item213b5d9df2:g:kC4AAOSw7p1asoVd:rk:1:pf:0

 

Thanks


Hello, I am experiencing a new disaster after spending a week getting my data back.

 

@johnnie.black, I did exactly what you told me: I backed up my disk9, spreading the data across all the other healthy disks. At that point, disk9 contained no data.

Then I launched: reiserfsck --rebuild-tree --scan-whole-partition /dev/md9

The filesystem was gone, and I had to format the disk to reset it (no great loss, since all the data was already safe and the disk was empty).

All lights green!

 

Then I decided to check the filesystem of every disk to be sure everything was OK.

No corruption found on any disk. Great! I was thinking: I've got all my data back, all the filesystems are OK, let's check parity.

 

I then launched a parity check: it found 977 errors and dropped disk8, a 4 TB XFS disk that had been fine during the filesystem checks.

disk8 is now offline, its data is not reachable anymore, and the contents are not being emulated.

I tried to mount it with Unassigned Devices, but it won't mount.
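
For what it's worth, a manual read-only mount attempt from the console would look something like this (sdX1 stands in for the disk's real partition; norecovery skips XFS log replay, which sometimes lets a disk with a dirty log mount read-only):

mkdir -p /mnt/test
mount -t xfs -o ro,norecovery /dev/sdX1 /mnt/test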

You can guess that I'm going crazy. :/

 

What is going on? What should I do?

 

Thanks for your help

 

 

nas-diagnostics-20190126-1305.zip


Disk8 dropped offline, most likely a controller problem:

On 1/15/2019 at 9:21 AM, johnnie.black said:

SASLP/SAS2LP haven't been recommended for some time; they can drop disks without reason

But you'll need to reboot and post new diags so there's a SMART report.

 

The emulated disk needs a filesystem check:

https://wiki.unraid.net/Check_Disk_Filesystems#Checking_and_fixing_drives_in_the_webGui

or

https://wiki.unraid.net/Check_Disk_Filesystems#Drives_formatted_with_XFS
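
In command-line terms, the usual XFS sequence would be something like this, with the array started in maintenance mode (-L zeroes the metadata log and can lose the most recent changes, so it's a last resort, only if the plain repair refuses to run because of a dirty log):

xfs_repair -n /dev/md8    # read-only check: report problems only
xfs_repair /dev/md8       # actual repair
xfs_repair -L /dev/md8    # last resort: zero the log first if the repair won't run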

 


Thanks Johnnie.

I followed your advice and rebooted my server.

It's OK, disk8 can be mounted through Unassigned Devices, and my data is still there.

 

I launched a check:

 


Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
ignored because the -n option was used.  Expect spurious inconsistencies
which may be resolved by first mounting the filesystem to replay the log.
        - scan filesystem freespace and inode maps...
Metadata CRC error detected at xfs_allocbt block 0x1529d5b10/0x1000
btree block 5/99653134 is suspect, error -74
bad magic # 0x5c1bbbcd in btbno block 5/99653134
Metadata CRC error detected at xfs_allocbt block 0x12343ae18/0x1000
btree block 5/352367 is suspect, error -74
bad magic # 0xd358f85d in btcnt block 5/352367
agf_freeblks 5493034, counted 5492507 in ag 5
agf_btreeblks 32, counted 30 in ag 5
Metadata CRC error detected at xfs_allocbt block 0x952983d0/0x1000
btree block 2/68626418 is suspect, error -74
bad magic # 0xa839f4a6 in btcnt block 2/68626418
Metadata CRC error detected at xfs_allocbt block 0x9bc14780/0x1000
btree block 2/82452584 is suspect, error -74
bad magic # 0x84931db4 in btcnt block 2/82452584
Metadata CRC error detected at xfs_allocbt block 0x9c0d1ca8/0x1000
btree block 2/83073805 is suspect, error -74
bad magic # 0x62b94362 in btcnt block 2/83073805
Metadata CRC error detected at xfs_allocbt block 0x9c1f21e8/0x1000
btree block 2/83221429 is suspect, error -74
bad magic # 0x9391f868 in btcnt block 2/83221429
Metadata CRC error detected at xfs_allocbt block 0x19788eee8/0x1000
btree block 7/1 is suspect, error -74
bad magic # 0x6a0fa8a2 in btcnt block 7/1
Metadata CRC error detected at xfs_allocbt block 0x7a551eb0/0x1000
btree block 2/12360526 is suspect, error -74
bad magic # 0x818081 in btcnt block 2/12360526
agf_freeblks 9836794, counted 9792588 in ag 7
agf_btreeblks 133, counted 132 in ag 7
agf_freeblks 34159166, counted 33590046 in ag 2
agf_btreeblks 135, counted 130 in ag 2
sb_fdblocks 93204249, counted 92590388
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
Metadata CRC error detected at xfs_bmbt block 0xc7a7f6e8/0x1000
btree block 3/52425233 is suspect, error -74
bad magic # 0x1428857f in inode 3221225574 (data fork) bmbt block 455078417
bad data fork in inode 3221225574
would have cleared inode 3221225574
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 1
        - agno = 2
        - agno = 3
entry "00001.m2ts" at block 0 offset 96 in directory inode 3221225573 references free inode 3221225574
	would clear inode number in entry at offset 96...
        - agno = 0
bad magic # 0x1428857f in inode 3221225574 (data fork) bmbt block 455078417
bad data fork in inode 3221225574
would have cleared inode 3221225574
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
entry "00001.m2ts" in directory inode 3221225573 points to free inode 3221225574, would junk entry
bad hash table for directory inode 3221225573 (no data entry): would rebuild
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
Maximum metadata LSN (1860872985:535073526) is ahead of log (4:37307).
Would format log to cycle 1860872988.
No modify flag set, skipping filesystem flush and exiting.

I launched a new check. Could you please confirm that everything is OK?

 

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 0
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

Thanks.

