Disk possibly corrupted but status still shows as normal?



Hello, I noticed the other day that a number of my media files were inaccessible within Plex. I investigated and found that those files no longer show up at all in the media folder. Digging further, disk6 appears in Midnight Commander in red as "?disk6", and when I try to enter it I get the message "Cannot read directory contents". All of the other disks appear to be functioning normally.

 

Does this mean disk6 is corrupted? If so, what could have caused it, and what can be done to prevent it in the future? Am I able to remove the drive and rebuild from parity onto a new (or even the same) disk to recover my files?

 

If the disk is corrupted, why does it still show as normal/green within Unraid?

 

Thanks for the help, any and all assistance is appreciated! I've attached the diagnostics I just pulled.

 

Edit:

 

I reviewed the logs and keep seeing these lines repeated, among other errors:

 

emhttpd: error: get_fs_sizes, 6081: Input/output error (5): statfs: /mnt/user/downloads

sys_disk_free: VFS disk_free failed. Error was : Input/output error

smbd[25122]:   chdir_current_service: vfs_ChDir(/mnt/user/Hyperspin) failed: Input/output error.
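A quick way to confirm which disk those errors point at (a hypothetical check, assuming the standard Unraid /mnt/diskN mount points) is to try reading the disk share directly from the console:

ls /mnt/disk6     # a corrupted filesystem typically fails here with an Input/output error
df -h /mnt/disk6  # statfs failures like the ones above will also surface here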

 

I'm thinking I need to run xfs_repair on disk6, but I want to confirm there's nothing I should do first. If I do need to run xfs_repair, this is the command I should be using, correct?

 

xfs_repair -v /dev/md6
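For reference, a read-only dry run is commonly suggested before any repair (a sketch, assuming the array is started in maintenance mode and disk6 is formatted XFS):

xfs_repair -n -v /dev/md6   # -n = no-modify, only reports problems
xfs_repair -v /dev/md6      # actual repair, run only after reviewing the -n output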

 

tower-diagnostics-20230425-1300.zip

26 minutes ago, JorgeB said:

Check filesystem on disk6.

OK, thanks. I just tried to run xfs_repair within the GUI with the -v option and got this error after a bit:

 

Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 33370936 bytes) in /usr/local/emhttp/plugins/dynamix/include/DefaultPageLayout.php(693) : eval()'d code on line 895

23 minutes ago, JorgeB said:

Try on the CLI, array must still be started in maintenance mode:

 

xfs_repair -v /dev/md6

 

 

Thanks! OK, that ran and completed without any errors. The disk now shows 1.2+ TB free, whereas before I'd guess it had less than 200 GB free. There's also a bunch of files in a new lost+found folder.
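For anyone inspecting the recovered data, a rough sketch (assuming the disk is mounted at the usual /mnt/disk6 path): xfs_repair names the entries in lost+found by inode number, so the file utility helps identify what they are before renaming or moving them.

ls -lh /mnt/disk6/lost+found        # recovered files/directories, named by inode number
file /mnt/disk6/lost+found/*        # guess each entry's type (video, archive, text, ...)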

 

Should I continue using this disk? Is there anything I can do to recover the lost files assuming I don't have a backup of this disk? Anything I should do moving forward to prevent this? 

1 minute ago, JorgeB said:

Disk looks OK, filesystem corruption can happen independently of disk health; check the lost+found folder or restore from a backup.

OK, thanks. Unfortunately there's no backup for the media files. If I pulled the disk and did a rebuild from parity, what would happen?

21 hours ago, JorgeB said:

Parity cannot help with filesystem corruption; assuming it's in sync, it would rebuild the same thing.

Gotcha, that's what I thought. How can I prevent this in the future? This is the second time I've lost a significant amount of data to filesystem corruption. I recently swapped to all new SATA cables, but I suppose the damage could have already occurred.

 

I use a Supermicro motherboard and a solid Seasonic PSU with server-grade RAM. Should I run memtest?

12 hours ago, JorgeB said:

If you're using ECC RAM it's unlikely the problem is there. Any unclean shutdowns? It could also be something in the storage subsystem.


 

OK, I am using ECC, so that's good to know. No unclean shutdowns in at least 3-4 months. I thought that as long as parity checks came back OK, everything was fine?
 

What would I look for in the storage subsystem? Thanks

13 hours ago, itimpi said:

That just means that parity agrees at the bit level with what is on the drives, not that there is no corruption at the file system level. I would recommend running a file system check (and repair if needed) on all drives.


Ok thanks for explaining. So what command do you recommend using first? Just -n?

 

Run this on all drives in maintenance mode, even parity? Thanks again.

45 minutes ago, highgear said:


Ok thanks for explaining. So what command do you recommend using first? Just -n?

 

Run this on all drives in maintenance mode, even parity? Thanks again.

I would definitely start with the -n option to see if there are any obvious problems. You can always repeat later without the -n option. The tests should not take long if the drives are in XFS format.

 

Parity has no file system so you cannot run a check on it.
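As a follow-up sketch (assumptions: six data disks, all XFS, array started in maintenance mode, and the /dev/mdX device naming used earlier in this thread, which may differ on newer Unraid releases), a read-only pass over every data disk could look like:

# Read-only (-n) check of each data disk; parity is skipped since it has no filesystem.
for i in $(seq 1 6); do
    echo "=== disk$i ==="
    xfs_repair -n /dev/md$i
done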
