trurl Posted April 3, 2021 Please review this 2-year-old thread you posted to; it isn't very long.
trurl Posted April 3, 2021 Also https://wiki.unraid.net/UnRAID_6/Storage_Management#Drive_shows_as_unmountable
trurl Posted April 3, 2021 And this from the 2nd post in this thread: On 3/7/2019 at 1:16 PM, trurl said: If the data was very important, you might be able to recover some of it with some third party software, such as UFS Explorer. Obviously you should not write anything to the disk if you want to have any chance.
leeknight1981 Posted April 13, 2021 (edited)
On 4/3/2021 at 9:13 PM, John_M said: You were protected against data loss, but not against operator error. The big scary warning is really trying to tell you that this is probably not the option you want to choose. I'm sorry for your loss but arguing about it won't bring it back. Some people have reported success in recovering deleted files with a utility called UFS Explorer.
Hiya, so today I had the same issue with that disk. I've attached the SMART report, the logs from before I shut it down, the logs from after I shut it down, and the repair status report it spat out. Restarted the array and boom, Disk 7 is back. Any ideas as to what's going on? I've also upgraded to 6.9.2. Thanks again, Lee
Disk 7 xfs_repair status-.rtf r720xd-smart-20210413-1330.zip r720xd-diagnostics-20210413-1315.zip r720xd-diagnostics-20210413-1304.zip
Edited April 13, 2021 by leeknight1981
John_M Posted April 13, 2021 You ran the file system repair with the default -n (no-modify) flag so it found a huge amount of corruption but didn't attempt to repair it. You can try again without the "-n" and see if it has any success.
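For anyone following along, the check/repair pair being described looks roughly like this. This is a sketch only: it assumes Disk 7 maps to /dev/md7 (which matches the "XFS (md7)" log lines later in this thread) and that the array is started in Maintenance mode so nothing has the disk mounted; recent Unraid versions run the same tool from the disk's Check Filesystem Status page.

```shell
# Sketch only. Run against the md device, not the raw sdX device,
# so parity stays in sync with the repair writes.
# Assumption: Disk 7 is /dev/md7; array started in Maintenance mode.
xfs_repair -n /dev/md7   # -n = no-modify: reports corruption, changes nothing
xfs_repair /dev/md7      # real repair: writes fixes, may populate lost+found
# If it refuses because of a dirty log and suggests -L, be aware that
# -L zeroes the log and can lose the most recent metadata updates:
# xfs_repair -L /dev/md7
```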
leeknight1981 Posted April 13, 2021
15 minutes ago, John_M said: You ran the file system repair with the default -n (no-modify) flag so it found a huge amount of corruption but didn't attempt to repair it. You can try again without the "-n" and see if it has any success.
OK, done that:
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 4
        - agno = 7
        - agno = 6
        - agno = 3
        - agno = 1
        - agno = 5
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done
Does that help? I can't see anything in the SMART report for the disk to say swap it out, or that it's on its way out. I don't understand all the logs, tbh, but it would be helpful to understand why Disk 7 goes unmountable. TIA, Lee
John_M Posted April 13, 2021 The file system check completed. Is it now mountable when you start the array in normal mode? Is this the disk you formatted? If so it will be empty, as you'd expect. You might find some old remnants of files in the lost+found directory but they probably won't be complete. The SMART report looks OK so the disk itself is probably fine. The most common causes of problems are cables and connectors, and it probably dropped off-line, but we'll never know as you don't have diagnostics from then. It isn't clear to me what you're trying to achieve at this stage.
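A small aside on the lost+found point: repairs name recovered entries by inode number, so anything in there has to be identified by content rather than by filename. A sketch, with a temp directory standing in for /mnt/disk7 so it can run anywhere:

```shell
# Temp dir stands in for /mnt/disk7; on the server, point these
# commands at the real mount instead. xfs_repair names recovered
# entries by inode number (e.g. 10737419848), not original filename.
DISK=$(mktemp -d)
mkdir -p "$DISK/lost+found"
printf 'not really a video' > "$DISK/lost+found/10737419848"
# List what was recovered; on a real disk, running `file` on each
# entry helps guess the original type (video, text, archive, ...).
find "$DISK/lost+found" -type f
```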
leeknight1981 Posted April 13, 2021
11 minutes ago, John_M said: The file system check completed. Is it now mountable when you start the array in normal mode? Is this the disk you formatted? ... It isn't clear to me what you're trying to achieve at this stage.
Yes, this is the disk I formatted. I tried the suggested software, but all the video files came back with names like FML99877667, FHG9877676, 00021547, so I would never know what they were. So I took the loss and got a list of the missing data. Stuck the drive back in and the data rebuild completed without error. I've since re-downloaded some of the lost data; still lots to go. But today the drive did the same thing: unmountable disk! The disks are in a Dell R720XD, secured in drive trays, and the server is in a cupboard at home 8,000 miles away and never touched. So I'm wondering why that disk has gone unmountable for a second time when the server is not touched. I did as you said and ran the file system repair with the default -n, restarted the array, and it was OK again; the disk mounted fine. Then I did what you advised, tried again without the "-n", posted the result, and restarted the array. So I guess I'm looking to see why it's done it again. I've checked the SMART report and downloaded and posted logs etc. No reason for, say, the backplane to randomly start playing up, I guess. I think I saw somewhere a recent issue with Unraid 6.9.1 and Seagate disks, but I may be wrong. I guess I'm just looking to identify why I'm having such issues when the server is not touched, as we have 3 Unraid servers and don't ever have any issues.
This R720XD is my Emby server and has a lot of media on it that's taken me many, many years to rip and encode from my discs, plus the downloads. If it was, say, the disk that was at fault, I'd get the wife to swap it out. I'm no Unraid expert so I look here for help. Being deployed in Afghanistan, the internet here's not the best, and I have to use Remote Desktop to an iMac to get onto the R720XD for diagnostics, logs etc. So it can be frustrating, and the thought of losing that amount of data and years of work is devastating. When home I will set up a backup machine elsewhere to replicate anything I add or remove. Cheers, Lee
John_M Posted April 13, 2021 Did either of the diagnostics you posted today cover the period when the disk became unmountable? You've removed them and I don't keep other people's diagnostics. If so, that would be the place to look for clues. Unmountable means file system corruption, which can be caused by hardware problems or power failures. The disk looks OK from its SMART assessment, and it isn't one of the particular IronWolf models that others have reported firmware problems with. An untouched server can have connector problems because, over time, they are subject to heating and cooling cycles which disturb them. So occasionally disconnecting everything and rebuilding it can be a good thing.
leeknight1981 Posted April 13, 2021
These diagnostics were taken directly on reboot, as soon as it was up and running. I am unsure if the drive went unmountable before or after I downloaded the diagnostics on the fresh reboot. I shall keep an eye on it and am doing lots of reading meanwhile, so I don't, say, format a disk in real time XD. Thank you for your patience and support, and if it does go again I'll get as many logs, diagnostics, reports etc. as I can and pop back in. Regards, Lee
r720xd-diagnostics-20210413-1315.zip
John_M Posted April 13, 2021 Disk 7 is already corrupt in that one. Have you still got the earlier one?
leeknight1981 Posted April 13, 2021
Just now, John_M said: Disk 7 is already corrupt in that one. Have you still got the earlier one?
Yup, this is prior to shutdown: r720xd-diagnostics-20210413-1304.zip
John_M Posted April 13, 2021 That's the one. Disk 7 is good at the beginning and mounts cleanly:
Apr 12 20:58:02 R720XD kernel: XFS (md7): Mounting V5 Filesystem
Apr 12 20:58:03 R720XD kernel: XFS (md7): Ending clean mount
Apr 12 20:58:03 R720XD kernel: xfs filesystem being mounted at /mnt/disk7 supports timestamps until 2038 (0x7fffffff)
Apr 12 20:58:03 R720XD emhttpd: shcmd (57): xfs_growfs /mnt/disk7
Apr 12 20:58:03 R720XD root: meta-data=/dev/md7 isize=512 agcount=8, agsize=268435455 blks
Apr 12 20:58:03 R720XD root: = sectsz=512 attr=2, projid32bit=1
Apr 12 20:58:03 R720XD root: = crc=1 finobt=1, sparse=1, rmapbt=0
Apr 12 20:58:03 R720XD root: = reflink=1
Apr 12 20:58:03 R720XD root: data = bsize=4096 blocks=1953506633, imaxpct=5
Apr 12 20:58:03 R720XD root: = sunit=0 swidth=0 blks
Apr 12 20:58:03 R720XD root: naming =version 2 bsize=4096 ascii-ci=0, ftype=1
Apr 12 20:58:03 R720XD root: log =internal log bsize=4096 blocks=521728, version=2
Apr 12 20:58:03 R720XD root: = sectsz=512 sunit=0 blks, lazy-count=1
Apr 12 20:58:03 R720XD root: realtime =none extsz=4096 blocks=0, rtextents=0
but it's corrupt here:
Apr 12 23:33:46 R720XD kernel: XFS (md7): Metadata CRC error detected at xfs_attr3_leaf_read_verify+0x7d/0xc6 [xfs], xfs_attr3_leaf block 0x91b225d0
Apr 12 23:33:46 R720XD kernel: XFS (md7): Unmount and run xfs_repair
So it must have happened during the intervening two-and-a-half hours. I'll see if I can see anything.
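The kind of log triage John_M is doing here can be reproduced with a simple grep over the syslog inside the diagnostics zip. A self-contained sketch, using a few of the lines quoted above as sample data; on a live server you would grep /var/log/syslog (or logs/syslog.txt from the extracted diagnostics) instead:

```shell
# Build a small sample from log lines quoted in this thread, then
# filter it the same way you would filter the real syslog.
cat > /tmp/sample-syslog <<'EOF'
Apr 12 20:58:02 R720XD kernel: XFS (md7): Mounting V5 Filesystem
Apr 12 20:58:03 R720XD kernel: XFS (md7): Ending clean mount
Apr 12 22:56:21 R720XD move: error: move, 391: No such file or directory (2)
Apr 12 23:33:46 R720XD kernel: XFS (md7): Metadata CRC error detected at xfs_attr3_leaf_read_verify+0x7d/0xc6 [xfs]
Apr 12 23:33:46 R720XD kernel: XFS (md7): Unmount and run xfs_repair
EOF
grep 'XFS (md7)' /tmp/sample-syslog      # all md7 filesystem events
grep -c 'CRC error' /tmp/sample-syslog   # count of corruption hits
```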
leeknight1981 Posted April 13, 2021
23 minutes ago, John_M said: That's the one. Disk 7 is good at the beginning and mounts cleanly ... So it must have happened during the intervening two-and-a-half hours. I'll see if I can see anything.
Will pay in Gold
John_M Posted April 13, 2021 During that time window the Unassigned Devices plugin tries to automount some network shares but fails, and some Docker containers start up. Then there's the use of the CA Backup/Restore Appdata plugin in restore mode, which stops the containers and starts them up again once it has finished. Then a couple of warnings from the Fix Common Problems plugin:
Apr 12 21:08:01 R720XD root: Fix Common Problems: Warning: Share Deluge set to not use the cache, but files / folders exist on the cache drive
Apr 12 21:08:11 R720XD root: Fix Common Problems: Warning: Complex bonding mode on eth0 ** Ignored
Then there are a lot of Mover errors like this one (logging is enabled):
Apr 12 22:56:21 R720XD move: error: move, 391: No such file or directory (2): lstat: /mnt/cache/Deluge/Incomp/The Simpsons/25ª Temporada/The Simpsons s25e13 - The Man Who Grew Too Much.mkv
which are no doubt due to the issue mentioned in the first FCP warning. In other words, there are files on the cache that belong to the Deluge share but they won't be moved because that share is set to cache:no. Which takes us to 23:00, when the Mover runs again but with logging turned off. Then not much until corruption is detected on Disk 7. There's nothing that I can see in the log to explain it, such as a controller glitch or read error, so it must have been caused by something that doesn't get logged. I'd be reseating cables, but that might be difficult if you're remote from the hardware.
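The stranded-files situation behind that first FCP warning can be checked by hand: list whatever sits under the share's folder on the cache. A sketch, using a temp directory in place of /mnt/cache so it runs anywhere; on the real server you would set CACHE=/mnt/cache and skip the setup lines:

```shell
# Temp dir stands in for /mnt/cache (assumption: real path is
# /mnt/cache on the server; the mkdir/touch lines are demo setup only).
CACHE=$(mktemp -d)
mkdir -p "$CACHE/Deluge/Incomp"
touch "$CACHE/Deluge/Incomp/stranded.mkv"
# A share set to cache:no is never touched by the Mover, so anything
# under its folder on the cache stays there until moved by hand:
find "$CACHE/Deluge" -type f
```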
leeknight1981 Posted April 13, 2021
2 minutes ago, John_M said: During that time window the Unassigned Devices plugin tries to automount some network shares but fails ... I'd be reseating cables but that might be difficult if you're remote from the hardware.
I had to restore appdata for Emby. I'm using Deluge to re-get the lost data.
With that, Deluge is writing to the array, disks 1 & 7 I think, and then both parity disks. So I am going to try to figure out the best setting for Deluge, i.e. move the folders to the cache pool only, not the array, so it downloads to the cache, and then when I use FileBot to rename I can move the files from the cache to the media folder, which is along the lines of /mnt/R720XD/Array Share/Plex/Tv Shows (Sourced). Not sure if that's the correct way or not; I was concerned about all the extra reads/writes on the array and/or parities as I have almost 8TB of stuff to re-download. So do I leave Deluge set to /mnt/user/deluge with cache: Yes, or do I move it to, say, /mnt/cache/deluge and, once downloaded, move it to the Tv Shows (Sourced) folder using Krusader? Cheers, Lee
John_M Posted April 13, 2021 I don't use downloaders so I can't recommend a good workflow but it makes sense to keep the partial files off the array until they're complete. I'm sure there's plenty of advice to be found in the support threads for the particular containers you use.
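One common shape for that "download to cache, move when finished" workflow, sketched with temp directories standing in for the cache and array paths; the real paths (/mnt/cache/deluge and the "Tv Shows (Sourced)" share) are taken from Lee's description and are assumptions, not verified against his server:

```shell
# Temp dirs stand in for /mnt/cache/deluge (downloads) and the array
# share, e.g. "/mnt/user/Plex/Tv Shows (Sourced)". Both are assumptions.
SRC=$(mktemp -d); DST=$(mktemp -d)
touch "$SRC/Show.s01e01.mkv" "$SRC/partial.mkv.part"
# Move only finished files (*.mkv here); in-progress .part files stay
# on the cache so the array never sees incomplete downloads.
for f in "$SRC"/*.mkv; do
  mv "$f" "$DST"/
done
ls "$DST"
```

Tools like FileBot or Krusader do the same move with renaming on top; the key design point, as John_M says, is that partial files never land on the parity-protected array.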
leeknight1981 Posted April 14, 2021
On 4/13/2021 at 8:11 PM, John_M said: I don't use downloaders so I can't recommend a good workflow but it makes sense to keep the partial files off the array until they're complete. I'm sure there's plenty of advice to be found in the support threads for the particular containers you use.
Just as an update: Disk 7 has now got the red X and says emulated. Logs attached: 22:28 before I shut down and 22:42 on reboot. Shall I order a 12TB to replace it with, or will the corruption move onto a new drive? Regards, Lee
r720xd-diagnostics-20210414-2242.zip r720xd-diagnostics-20210414-2227.zip
John_M Posted April 14, 2021 Thanks for the diags before and after rebooting. The controller lost communication with the disk and when you rebooted the disk didn't come back on-line. Assuming there isn't something strange going on with cables (a loss of power to the disk, for example) it looks as though it has actually failed. It's a shame that SMART didn't give more of a warning. If that is indeed the case you at least know where you stand and can replace the disk and move on. The replacement would rebuild in the same state as the current emulated disk. If that's still mountable and has your files on it then that's how it will rebuild. If it was my server I'd certainly replace the disk with a new one and keep the old one to investigate at my leisure.
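On the "keep the old one to investigate" point, the usual smartctl checks for a suspect disk look like this. A sketch only; /dev/sdX is a placeholder for whatever device the pulled disk shows up as (for example in a USB dock), and on Unraid the same information is available from the disk's SMART page in the GUI:

```shell
# Sketch only: replace /dev/sdX with the real device node.
smartctl -a /dev/sdX            # full SMART attributes and error log
smartctl -t long /dev/sdX       # start an extended self-test (takes hours)
smartctl -l selftest /dev/sdX   # check the self-test result afterwards
```

A disk that drops off the controller without logging SMART errors, as happened here, can still fail the extended self-test once it is readable again.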
leeknight1981 Posted April 15, 2021
9 hours ago, John_M said: Thanks for the diags before and after rebooting. The controller lost communication with the disk and when you rebooted the disk didn't come back on-line ... If it was my server I'd certainly replace the disk with a new one and keep the old one to investigate at my leisure.
Cheers John! Many thanks. I have ordered a 12TB WD to replace the 8TB, as I am slowly replacing the Seagates with WDs. That 8TB is the newest of the Seagates; I have 3 x 4TB Seagates left in there that are about 5 years old, so they were going to be swapped out anyway, in oldest-first order. We'll see how I get on after replacing the disk.
leeknight1981 Posted April 18, 2021 So the iffy Disk 7 was an 8TB; it has been replaced with a 12TB WD. Parity/data rebuild took 24-ish hours with 0 errors! But when I click the folder it says "No listing: Too many files", and in MC it cannot read the directory contents. Some of my shares keep disappearing but reappear after a restart: isos and system, off-hand. Is there any way to get help with all this, cos it's doing my nut in and I'm going to smash the lot. My Deluge now says error where it was downloading fine. There is some really weird shit going on and I can't get to the bottom of it from here. Cheers, Lee
John_M Posted April 18, 2021 It looks like a corrupt file system. Have you repaired it since replacing the disk?
leeknight1981 Posted April 18, 2021 (edited)
23 minutes ago, John_M said: It looks like a corrupt file system. Have you repaired it since replacing the disk?
No, I haven't; I didn't realise I needed to. Do I start the array in Maintenance mode and do it? I have no idea anymore; it's just messing with my brain too much. I just need a working, stable server. It's been fine for the past 8 years; now it's one thing after another.
Edited April 18, 2021 by leeknight1981
John_M Posted April 18, 2021
On 4/15/2021 at 12:19 AM, John_M said: The replacement would rebuild in the same state as the current emulated disk.
We've been through the file system repairing procedure before so just do the same again.
leeknight1981 Posted April 18, 2021 (edited)
14 minutes ago, John_M said: We've been through the file system repairing procedure before so just do the same again.
XD
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
Metadata CRC error detected at 0x43cfad, xfs_bnobt block 0x80002038/0x1000
btree block 1/1032 is suspect, error -74
bad magic # 0x3d1cede3 in btbno block 1/1032
Metadata CRC error detected at 0x43cfad, xfs_bnobt block 0x800026b8/0x1000
btree block 1/1240 is suspect, error -74
bad magic # 0x241a9c92 in btbno block 1/1240
Metadata CRC error detected at 0x43cfad, xfs_cntbt block 0x28c0bf890/0x1000
btree block 5/25263895 is suspect, error -74
bad magic # 0x241a9c92 in btcnt block 5/25263895
agf_freeblks 219408396, counted 219359283 in ag 5
agf_btreeblks 67, counted 66 in ag 5
agi unlinked bucket 8 is 1608 in ag 5 (inode=10737419848)
agf_btreeblks 87, counted 85 in ag 1
sb_ifree 3509, counted 3576
sb_fdblocks 2402598327, counted 2427541432
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
Metadata CRC error detected at 0x44d87d, xfs_bmbt block 0x80002030/0x1000
        - agno = 1
Metadata CRC error detected at 0x44d87d, xfs_bmbt block 0x80002030/0x1000
btree block 1/1031 is suspect, error -74
bad magic # 0x241a9c92 in inode 2147492304 (data fork) bmbt block 268436487
bad data fork in inode 2147492304
cleared inode 2147492304
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 4
        - agno = 8
        - agno = 2
        - agno = 3
        - agno = 6
        - agno = 7
        - agno = 10
        - agno = 9
        - agno = 5
entry "2147492304" at block 0 offset 752 in directory inode 296 references free inode 2147492304
clearing inode number in entry at offset 752...
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
bad hash table for directory inode 296 (no data entry): rebuilding
rebuilding directory inode 296
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected inode 10737419848, moving to lost+found
Phase 7 - verify and correct link counts...
Metadata corruption detected at 0x44d778, xfs_bmbt block 0x80002030/0x1000
libxfs_bwrite: write verifier failed on xfs_bmbt bno 0x80002030/0x1000
Maximum metadata LSN (1984442616:371148) is ahead of log (1:2).
Format log to cycle 1984442619.
xfs_repair: Releasing dirty buffer to free list!
xfs_repair: Refusing to write a corrupt buffer to the data device!
xfs_repair: Lost a write to the data device!
fatal error -- File system metadata writeout failed, err=117. Re-run xfs_repair.
Re-run with no flags:
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 3
        - agno = 10
        - agno = 4
        - agno = 2
        - agno = 5
        - agno = 7
        - agno = 6
        - agno = 9
        - agno = 8
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done
Edited April 18, 2021 by leeknight1981