Array Stuck trying to start at Mounting Disks

anotherusr · September 22, 2016

Array won't start, it sits at "mounting disks..."

I've attached what the log is showing.

JorgeB · September 22, 2016

You need to start in maintenance mode and run xfs_repair on disk1.

If the array is set to autostart, disable it by editing disk.cfg on your flash drive (in the config folder):

change startArray="yes" to "no"

Then:

https://lime-technology.com/wiki/index.php/Check_Disk_Filesystems#Drives_formatted_with_XFS
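
For anyone who would rather script the disk.cfg change than edit it by hand, here is a minimal Python sketch. It assumes the setting sits on its own line exactly as quoted above. The function name `disable_autostart` and the `otherSetting` key in the sample are hypothetical, and the `/boot/config/disk.cfg` path mentioned afterwards is an assumption about where the flash drive is mounted, not something confirmed in this thread.

```python
import re


def disable_autostart(cfg_text: str) -> str:
    """Return the config text with startArray="yes" changed to "no".

    Only a line that begins with startArray="yes" is changed;
    every other setting is returned untouched.
    """
    return re.sub(
        r'^startArray="yes"',
        'startArray="no"',
        cfg_text,
        flags=re.MULTILINE,
    )


if __name__ == "__main__":
    # otherSetting is a made-up key, used only to show it is left alone.
    sample = 'startArray="yes"\notherSetting="yes"\n'
    print(disable_autostart(sample))
```

To apply it, you would read the file (assumed to be /boot/config/disk.cfg), pass its contents through `disable_autostart`, and write the result back, ideally after keeping a copy of the original.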

anotherusr · September 22, 2016

Thank you for the quick reply. I'll try this when I get home from work and report back with the results. Much appreciated!

anotherusr · September 23, 2016

This is what came up after I ran the repair command:

Phase 1 - find and verify superblock...
        - block cache size set to 1435920 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 1612749 tail block 1460635
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed. Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair. If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.

JorgeB · September 23, 2016

You need to use -L
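
The decision here follows directly from the xfs_repair message quoted above, and it can be sketched as a small check. This is only an illustration: the function name `next_step` and the wording of the returned hints are hypothetical, and the matched phrase is copied from the output in this thread, so other xfsprogs versions may word it differently.

```python
def next_step(repair_output: str) -> str:
    """Suggest what to do next based on xfs_repair's output text."""
    # Phrase taken verbatim from the error quoted in this thread.
    if "valuable metadata changes in a log" in repair_output:
        return (
            "Try mounting the filesystem to replay the log; "
            "if it cannot be mounted, re-run xfs_repair with -L."
        )
    return "No log replay error found in this output."


if __name__ == "__main__":
    sample = (
        "ERROR: The filesystem has valuable metadata changes in a log "
        "which needs to\nbe replayed."
    )
    print(next_step(sample))
```

In this thread the array could not mount disk1 at all, which is why -L was the next step despite the warning that it can discard recent metadata changes.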

anotherusr · September 23, 2016

PERFECT I'm back up and running! Thank you Johnnie.Black!

If I may.. What told you that's what I needed to do? I try to learn from these things so I can recognize them in the future myself.

JorgeB · September 23, 2016

unRAID crashed right after trying to mount disk1. You can see it in the log, where disk1 appears as md1.

anotherusr · September 23, 2016

Good Deal, Thanks again!

anotherusr · September 25, 2016

Sep 25 11:41:26 Tower shfs/user: err: shfs_create: create_path: /mnt/cache/appdata/couchpotato/data/db_backup appdata/couchpotato/data/db_backup/1474818086.tar.gz /mnt/disk1/appdata/couchpotato/data/db_backup (30) Read-only file system
Sep 25 11:41:27 Tower shfs/user: err: shfs_read: read: (5) Input/output error
Sep 25 11:41:27 Tower kernel: BTRFS error (device sdf1): bdev /dev/sdf1 errs: wr 3635, rd 158575, flush 0, corrupt 0, gen 0

My server started slowing down again the same way it did before I reset and ran into the problem that had me create this post. Is this the same thing starting to happen again?

Squid · September 25, 2016

That is referencing the cache drive. The previous problem was with disk1.

You should post your diagnostics for additional help.
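
The BTRFS line in that log carries per-device error counters (write, read, flush, corruption, generation), and the device name is what points at the cache drive rather than disk1. As a rough sketch, the counters can be pulled out with a regular expression. The pattern is written against the single line quoted in this thread, so treat the exact format as an assumption; `parse_btrfs_errs` is a hypothetical helper name.

```python
import re

# Pattern modelled on the one BTRFS error line quoted in this thread.
BTRFS_ERRS = re.compile(
    r"BTRFS error \(device (?P<device>\w+)\): bdev (?P<bdev>\S+) errs: "
    r"wr (?P<wr>\d+), rd (?P<rd>\d+), flush (?P<flush>\d+), "
    r"corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def parse_btrfs_errs(line: str):
    """Return the device names and error counters, or None if no match."""
    match = BTRFS_ERRS.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Device names stay strings; the five counters become integers.
    return {
        key: value if key in ("device", "bdev") else int(value)
        for key, value in fields.items()
    }


if __name__ == "__main__":
    line = (
        "Sep 25 11:41:27 Tower kernel: BTRFS error (device sdf1): "
        "bdev /dev/sdf1 errs: wr 3635, rd 158575, flush 0, corrupt 0, gen 0"
    )
    print(parse_btrfs_errs(line))
```

Non-zero write and read counters with zero corruption counters, as in this line, fit a device that is failing I/O requests rather than one returning bad data.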

anotherusr · September 26, 2016

Attached is part one of my diagnostics (the file was 324 KB and this site only allows 320 KB, so the other two files from the archive are in the next post). Thanks, Squid.

tower-diagnostics-20160925-2149.zip

anotherusr · September 26, 2016

Part 2 of the diagnostics (smart & system)

tower-diagnostics-20160925-2149_-_pt_2.zip

Squid · September 26, 2016

#1: you really should set your appdata share to cache-only. Right now, mover is moving it to the array every day. (This isn't related to your problem, but it may/will cause issues with your apps down the road.)

Unless I'm just overtired (possible), it looks like there's actually a syslog missing somewhere, as the logs seem to jump from Sept 24 right to Sept 25, but no matter...

On the 23rd, when the array started up (and the dockers, etc. were all loading), you had a bunch of this from your cache drive:

Sep 23 12:56:35 Tower kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Sep 23 12:56:35 Tower kernel: ata5.00: cmd 61/40:18:10:57:53/05:00:06:00:00/40 tag 3 ncq 688128 out
Sep 23 12:56:35 Tower kernel:         res 40/00:ac:e0:a5:53/00:00:06:00:00/40 Emask 0x10 (ATA bus error)

And I'm going to go with bad / loose power or sata cable to the ssd as the cause. (https://lime-technology.com/wiki/index.php/The_Analysis_of_Drive_Issues#Drive_interface_issue_.234)

System runs on for a while, then I think a syslog is missing from the diagnostics as it jumps from

Sep 24 17:00:31 Tower root: .d..t...... appdata/plexmediaserver/Plex Media Server/Cache/Transcode/Sessions/
Sep 24 17:00:31 Tower root: >f+++++++++ appdata/plexmediaserver/Plex Media Server/Cache/Transcode/Se

straight to this in the next syslog file:

Sep 25 04:40:01 Tower root: Community Applications Auto Update Running
Sep 25 04:40:13 Tower shfs/user: err: shfs_open: open: /mnt/cache/appdata/sonarr/nzbdrone.db-wal (30) Read-only file system
Sep 25 04:40:13 Tower shfs/user: err: shfs_open: open: /mnt/cache/appdata/sonarr/nzbdrone.db-shm (30) Read-only file system
Sep 25 04:40:35 Tower shfs/user: err: shfs_write: write: (30) Read-only file system
Sep 25 04:40:35 Tower shfs/user: err: shfs_write: write: (30) Read-only file system

What I'm going to surmise is that the bad connection reared its ugly head again, the cache drive dropped, and then it remounted itself, but this time in read-only mode to prevent any further corruption.
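
The kind of log reading described above can be roughly automated by counting lines that contain the tell-tale messages. The sketch below uses only substrings that appear in the excerpts quoted in this thread. The category labels and the `summarize` function are hypothetical, and a real syslog may well contain other relevant messages that this does not catch.

```python
from collections import Counter

# Substrings taken from the log excerpts in this thread; labels are made up.
PATTERNS = {
    "ata_bus_error": "ATA bus error",
    "read_only_fs": "Read-only file system",
    "io_error": "Input/output error",
}


def summarize(lines):
    """Count how many log lines contain each known error message."""
    counts = Counter()
    for line in lines:
        for label, needle in PATTERNS.items():
            if needle in line:
                counts[label] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Sep 23 12:56:35 Tower kernel:         res 40/00:ac:e0:a5:53/00:00:06:00:00/40 Emask 0x10 (ATA bus error)",
        "Sep 25 04:40:35 Tower shfs/user: err: shfs_write: write: (30) Read-only file system",
        "Sep 25 11:41:27 Tower shfs/user: err: shfs_read: read: (5) Input/output error",
    ]
    for label, count in summarize(sample).items():
        print(label, count)
```

Bus errors followed later by read-only errors on the same device would match the sequence surmised above: an interface problem first, then the filesystem going read-only.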

Another strange thing is that the ssd isn't showing up in the smart reports at all.

I'm nowhere near as good as johnny.black at helping with btrfs corruption (and my personal feeling is to use XFS instead of btrfs unless you're running a cache pool). But it wouldn't be a bad idea to power everything down, reseat the cables, splitters, everything, and then see what happens.

JorgeB · September 26, 2016

I don't really trust the BTRFS repair tools. I would back up the cache, format it and start over.

Also, unless you plan to add a 2nd SSD to create a pool I would go with XFS instead of BTRFS.

anotherusr · September 26, 2016

Squid, thank you for the help. I powered down and tried to reseat the SATA cables. I booted back up and saw no cache device present. I said to myself, OK, let's troubleshoot this for a minute. I swapped SATA cables as well as SATA power connectors; motherboard SATA ports 1-4 seemed to work, but not 5 or 6. I couldn't figure out what to do, so I tried power cycling one last time just to make sure I was still having the issue before posting ... I fired it up and I can see the cache drive again. I feel that whatever this is might come back sooner rather than later; at first I figured, OK, one bad cable is possible, but not 4.

I did set the appdata share to cache-only as you suggested, Squid.

Johnnie.black, I included an attachment that shows I was already on XFS. Is there another place where you turn XFS on? I don't plan to add another cache drive, btw.

Where do I back it up to, and do I need to back up everything? I'm curious why I should start over. Is it possible corruption somewhere in the files? If that's the case, wouldn't I still have it in the backup?

Back to the issue: I haven't had any other errors so far, only this warning.

Sep 25 23:44:34 Tower root: Updating templates...
Sep 25 23:44:34 Tower root: Warning: file_get_contents(): Filename cannot be empty in /usr/local/emhttp/plugins/dynamix.docker.manager/include/DockerClient.php on line 145
Sep 25 23:44:35 Tower root:
Sep 25 23:44:35 Tower root: Warning: file_get_contents(): Filename cannot be empty in /usr/local/emhttp/plugins/dynamix.docker.manager/include/DockerClient.php on line 145
Sep 25 23:44:35 Tower root:
Sep 25 23:44:35 Tower root: Warning: file_get_contents(): Filename cannot be empty in /usr/local/emhttp/plugins/dynamix.docker.manager/include/DockerClient.php on line 145
Sep 25 23:44:37 Tower kernel: docker0: port 2(veth59a2802) entered forwarding state

JorgeB · September 26, 2016

That setting is for array disks. You have to click on the cache disk to change its filesystem, though if it's working correctly now you can leave it as is.

You had filesystem corruption, not file corruption, but sometimes a badly corrupted filesystem can also have corrupt files.
