So, I think I screwed myself tonight; looking for help (6.11.5+)


mattw
Solved by trurl

It all started when I wanted to remove my cache drive; the system had been unstable after a RAM upgrade and the addition of my first cache drive. So, I shut down and removed the RAM. Booted up and everything was good. I did Tools > New Config, cleared the config and marked the cache drive as not installed. Rebooted to an offline array that wanted me to reassign devices. This is where the problem begins... my parity drive and one of my 2TB data drives differ by one digit in the serial number, and I reversed them and rebooted. The system restarted and started a parity check, which I canceled because I thought it was odd. I looked at the disk assignments and realized I had really screwed up. I reassigned the drives and rebooted with "parity is already valid" checked to try to avert more damage. Realizing what I had done, I unassigned the parity disk, moved it to the array and started the array. It now tells me that the 2 WD drives are unreadable; I somewhat expected that, since it likely overwrote part of one or the other. The cache drive that started all of this was on a PCI controller; I removed the controller and moved the empty cache drive to one of my motherboard SATA ports.

 

So, how screwed am I? I do not have a backup, since getting to an offsite backup at 30 Mbps is painful. Losing years' worth of movies and all of my digitized CD collection will be painful.

 

I am in this state now... parity drive in array and array not starting, but it knows about my shares.

The system now thinks my array is 70% full, was 49% before this adventure.

 

Interesting stuff from the logs... Disk 3, the disk I swapped in place of the parity drive, shows corruption. Disk 5 (the real parity drive) shows an unsupported file system.

 

Dec 18 20:02:36 Tower  emhttpd: shcmd (154): mkdir -p /mnt/disk1
Dec 18 20:02:36 Tower  emhttpd: shcmd (155): mount -t xfs -o noatime,nouuid /dev/md1 /mnt/disk1
Dec 18 20:02:36 Tower kernel: SGI XFS with ACLs, security attributes, no debug enabled
Dec 18 20:02:36 Tower kernel: XFS (md1): Mounting V5 Filesystem
Dec 18 20:02:36 Tower kernel: XFS (md1): Ending clean mount
Dec 18 20:02:36 Tower  emhttpd: shcmd (156): xfs_growfs /mnt/disk1
Dec 18 20:02:36 Tower root: meta-data=/dev/md1               isize=512    agcount=4, agsize=91571160 blks
Dec 18 20:02:36 Tower root:          =                       sectsz=512   attr=2, projid32bit=1
Dec 18 20:02:36 Tower root:          =                       crc=1        finobt=1, sparse=1, rmapbt=0
Dec 18 20:02:36 Tower root:          =                       reflink=1    bigtime=1 inobtcount=1
Dec 18 20:02:36 Tower root: data     =                       bsize=4096   blocks=366284638, imaxpct=5
Dec 18 20:02:36 Tower root:          =                       sunit=0      swidth=0 blks
Dec 18 20:02:36 Tower root: naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
Dec 18 20:02:36 Tower root: log      =internal log           bsize=4096   blocks=178849, version=2
Dec 18 20:02:36 Tower root:          =                       sectsz=512   sunit=0 blks, lazy-count=1
Dec 18 20:02:36 Tower root: realtime =none                   extsz=4096   blocks=0, rtextents=0
Dec 18 20:02:36 Tower  emhttpd: shcmd (157): mkdir -p /mnt/disk2
Dec 18 20:02:36 Tower  emhttpd: shcmd (158): mount -t xfs -o noatime,nouuid /dev/md2 /mnt/disk2
Dec 18 20:02:36 Tower kernel: XFS (md2): Mounting V5 Filesystem
Dec 18 20:02:37 Tower kernel: XFS (md2): Ending clean mount
Dec 18 20:02:37 Tower  emhttpd: shcmd (159): xfs_growfs /mnt/disk2
Dec 18 20:02:37 Tower root: meta-data=/dev/md2               isize=512    agcount=4, agsize=91571160 blks
Dec 18 20:02:37 Tower root:          =                       sectsz=512   attr=2, projid32bit=1
Dec 18 20:02:37 Tower root:          =                       crc=1        finobt=1, sparse=1, rmapbt=0
Dec 18 20:02:37 Tower root:          =                       reflink=1    bigtime=1 inobtcount=1
Dec 18 20:02:37 Tower root: data     =                       bsize=4096   blocks=366284638, imaxpct=5
Dec 18 20:02:37 Tower root:          =                       sunit=0      swidth=0 blks
Dec 18 20:02:37 Tower root: naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
Dec 18 20:02:37 Tower root: log      =internal log           bsize=4096   blocks=178849, version=2
Dec 18 20:02:37 Tower root:          =                       sectsz=512   sunit=0 blks, lazy-count=1
Dec 18 20:02:37 Tower root: realtime =none                   extsz=4096   blocks=0, rtextents=0
Dec 18 20:02:37 Tower  emhttpd: shcmd (160): mkdir -p /mnt/disk3
Dec 18 20:02:37 Tower  emhttpd: shcmd (161): blkid -t TYPE='xfs' /dev/md3 &> /dev/null
Dec 18 20:02:37 Tower  emhttpd: shcmd (162): mount -t xfs -o noatime,nouuid /dev/md3 /mnt/disk3
Dec 18 20:02:37 Tower kernel: XFS (md3): Mounting V5 Filesystem
Dec 18 20:02:37 Tower kernel: XFS (md3): Corruption warning: Metadata has LSN (1:115531) ahead of current LSN (1:71954). Please unmount and run xfs_repair (>= v4.3) to resolve.
Dec 18 20:02:37 Tower kernel: XFS (md3): log mount/recovery failed: error -22
Dec 18 20:02:37 Tower kernel: XFS (md3): log mount failed
Dec 18 20:02:37 Tower root: mount: /mnt/disk3: wrong fs type, bad option, bad superblock on /dev/md3, missing codepage or helper program, or other error.
Dec 18 20:02:37 Tower root:        dmesg(1) may have more information after failed mount system call.
Dec 18 20:02:37 Tower  emhttpd: shcmd (162): exit status: 32
Dec 18 20:02:37 Tower  emhttpd: /mnt/disk3 mount error: Wrong or no file system
Dec 18 20:02:37 Tower  emhttpd: shcmd (163): umount /mnt/disk3
Dec 18 20:02:37 Tower root: umount: /mnt/disk3: not mounted.
Dec 18 20:02:37 Tower  emhttpd: shcmd (163): exit status: 32
Dec 18 20:02:37 Tower  emhttpd: shcmd (164): rmdir /mnt/disk3
Dec 18 20:02:37 Tower  emhttpd: shcmd (165): mkdir -p /mnt/disk4
Dec 18 20:02:37 Tower  emhttpd: shcmd (166): mount -t xfs -o noatime,nouuid /dev/md4 /mnt/disk4
Dec 18 20:02:37 Tower kernel: XFS (md4): Mounting V5 Filesystem
Dec 18 20:02:37 Tower kernel: XFS (md4): Ending clean mount
Dec 18 20:02:37 Tower  emhttpd: shcmd (167): xfs_growfs /mnt/disk4
Dec 18 20:02:37 Tower root: meta-data=/dev/md4               isize=512    agcount=4, agsize=122094660 blks
Dec 18 20:02:37 Tower root:          =                       sectsz=512   attr=2, projid32bit=1
Dec 18 20:02:37 Tower root:          =                       crc=1        finobt=1, sparse=1, rmapbt=0
Dec 18 20:02:37 Tower root:          =                       reflink=1    bigtime=1 inobtcount=1
Dec 18 20:02:37 Tower root: data     =                       bsize=4096   blocks=488378638, imaxpct=5
Dec 18 20:02:37 Tower root:          =                       sunit=0      swidth=0 blks
Dec 18 20:02:37 Tower root: naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
Dec 18 20:02:37 Tower root: log      =internal log           bsize=4096   blocks=238466, version=2
Dec 18 20:02:37 Tower root:          =                       sectsz=512   sunit=0 blks, lazy-count=1
Dec 18 20:02:37 Tower root: realtime =none                   extsz=4096   blocks=0, rtextents=0
Dec 18 20:02:37 Tower  emhttpd: shcmd (168): mkdir -p /mnt/disk5
Dec 18 20:02:37 Tower  emhttpd: shcmd (169): blkid -t TYPE='xfs' /dev/md5 &> /dev/null
Dec 18 20:02:37 Tower  emhttpd: shcmd (169): exit status: 2
Dec 18 20:02:37 Tower  emhttpd: shcmd (170): blkid -t TYPE='btrfs' /dev/md5 &> /dev/null
Dec 18 20:02:37 Tower  emhttpd: shcmd (170): exit status: 2
Dec 18 20:02:37 Tower  emhttpd: shcmd (171): blkid -t TYPE='reiserfs' /dev/md5 &> /dev/null
Dec 18 20:02:37 Tower  emhttpd: shcmd (171): exit status: 2
Dec 18 20:02:37 Tower  emhttpd: /mnt/disk5 mount error: Unsupported or no file system
Dec 18 20:02:37 Tower  emhttpd: shcmd (172): umount /mnt/disk5
Dec 18 20:02:38 Tower root: umount: /mnt/disk5: not mounted.
Dec 18 20:02:38 Tower  emhttpd: shcmd (172): exit status: 32
Dec 18 20:02:38 Tower  emhttpd: shcmd (173): rmdir /mnt/disk5
Dec 18 20:02:38 Tower  emhttpd: shcmd (174): sync
Dec 18 20:02:38 Tower  emhttpd: shcmd (175): mkdir /mnt/user0
Dec 18 20:02:38 Tower  emhttpd: shcmd (176): /usr/local/sbin/shfs /mnt/user0 -disks 30 -o default_permissions,allow_other,noatime  |& logger
Dec 18 20:02:38 Tower  emhttpd: shcmd (177): mkdir /mnt/user
Dec 18 20:02:38 Tower  emhttpd: shcmd (178): /usr/local/sbin/shfs /mnt/user -disks 31 -o default_permissions,allow_other,noatime -o remember=0  |& logger
Dec 18 20:02:38 Tower  emhttpd: shcmd (180): /usr/local/sbin/update_cron

 

Fix Common Problems is also reporting this for every share on the drive, which I think is somewhat expected.
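The two failures in the syslog excerpt above can be pulled out mechanically, which helps when comparing diagnostics taken before and after each step. This is a minimal Python sketch, not an Unraid tool: `scan_syslog` is a hypothetical helper, and the regular expressions assume the exact message formats shown in this 6.11.5 log. It also decodes the XFS warning on md3: an XFS log sequence number is printed as (cycle:block), so comparing the two as tuples shows the metadata claims to be newer than the log, which is plausibly what you would see after the disk's contents were disturbed by the parity swap.

```python
import re

# emhttpd's per-disk mount failure, e.g.
# "emhttpd: /mnt/disk3 mount error: Wrong or no file system"
MOUNT_ERR = re.compile(r"emhttpd: /mnt/(disk\d+) mount error: (.+)$")

# The XFS log-recovery warning, e.g.
# "XFS (md3): Corruption warning: Metadata has LSN (1:115531) ahead of current LSN (1:71954)."
LSN_WARN = re.compile(
    r"XFS \((md\d+)\): Corruption warning: Metadata has LSN \((\d+):(\d+)\) "
    r"ahead of current LSN \((\d+):(\d+)\)"
)


def scan_syslog(lines):
    """Return ({disk: error text}, {md device: (metadata LSN, log LSN)})."""
    mount_errors = {}
    lsn_warnings = {}
    for line in lines:
        line = line.rstrip()
        m = MOUNT_ERR.search(line)
        if m:
            mount_errors[m.group(1)] = m.group(2)
            continue
        w = LSN_WARN.search(line)
        if w:
            # An XFS LSN is printed as (cycle:block); comparing (cycle, block)
            # tuples orders them the same way the 64-bit LSN does.
            meta = (int(w.group(2)), int(w.group(3)))
            log = (int(w.group(4)), int(w.group(5)))
            lsn_warnings[w.group(1)] = (meta, log)
    return mount_errors, lsn_warnings


if __name__ == "__main__":
    # A few lines copied from the syslog excerpt above.
    sample = [
        "Dec 18 20:02:37 Tower kernel: XFS (md3): Corruption warning: Metadata has LSN (1:115531) ahead of current LSN (1:71954). Please unmount and run xfs_repair (>= v4.3) to resolve.",
        "Dec 18 20:02:37 Tower  emhttpd: /mnt/disk3 mount error: Wrong or no file system",
        "Dec 18 20:02:37 Tower  emhttpd: /mnt/disk5 mount error: Unsupported or no file system",
        "Dec 18 20:02:37 Tower kernel: XFS (md4): Ending clean mount",
    ]
    errors, warnings = scan_syslog(sample)
    for disk, msg in errors.items():
        print(f"{disk}: {msg}")
    for dev, (meta, log) in warnings.items():
        print(f"{dev}: metadata LSN {meta} ahead of log LSN {log}: {meta > log}")
```

Fed the full syslog from a diagnostics zip, it should list disk3 and disk5 as the only mount errors and md3 as the only LSN warning, matching what the GUI reports.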

 


You might still be able to recover at least some data; it will depend on the amount of damage done to parity and disk3. You can try to recover data from both the old disk3 and the emulated disk3 and see if one is better than the other.

 

When you are done with the backups, first restore the array:

 

-Tools -> New Config -> Retain current configuration: All -> Apply
-Check all assignments and re-assign parity to the correct slot
-IMPORTANT - Check both "parity is already valid" and "maintenance mode" and start the array (note that the GUI will still show that data on parity disk(s) will be overwritten, this is normal as it doesn't account for the checkbox, but it won't be as long as it's checked)
-Stop array
-Unassign disk3
-Start array (in normal mode now), and post the diagnostics.

5 hours ago, JorgeB said:

You might still be able to recover at least some data; it will depend on the amount of damage done to parity and disk3. You can try to recover data from both the old disk3 and the emulated disk3 and see if one is better than the other.

 

When you are done with the backups, first restore the array:

 

-Tools -> New Config -> Retain current configuration: All -> Apply
-Check all assignments and re-assign parity to the correct slot

This tool seems to get me in more trouble... Right now, I have the parity disk as disk 5, and the server says the array is stopped. So, do I move disk 5 to the parity slot before running New Config or after? It seems to me that if I move disk 5, it is going to tell me that it is missing, and that will be a problem. Also, in the current state I do not have the "parity is already valid" and "maintenance mode" boxes.
-IMPORTANT - Check both "parity is already valid" and "maintenance mode" and start the array (note that the GUI will still show that data on parity disk(s) will be overwritten, this is normal as it doesn't account for the checkbox, but it won't be as long as it's checked)
-Stop array
-Unassign disk3
-Start array (in normal mode now), and post the diagnostics.

OK, so I did the New Config and retained All. I went back to the array screen and it will not let me unassign disk 5. When I set it to "no device", it just leaves it as is.

 

I have attached my current diags. So, I am a network engineer for a major university and have been for 32 years, but a server guy I am not! I feel really dumb at the moment. Why do my array and shares appear to be online when the dashboard tells me the array is stopped? This is my array options screen; there is no option for valid parity or to start or stop the array.

 

tower-diagnostics-20221219-2105.zip

 

OK, I tried a different browser, one that has never even connected to the server... New Config, selected All and applied. Back on Main, I tried to reassign disk 5, but when I set it to "no device" it will not remove disk 5 no matter what. It still insists my array is offline, but I can still get to all of my shares.

 

I am still getting nightly emails from it telling me it is healthy...?

tower-diagnostics-20221220-0933.zip

OK, the reboot has been done and the new drive config is in place; the system was set to boot into maintenance mode and parity is marked valid. Should I start the array? How do I tell if I am in maintenance mode? Sorry to be so dense.

 

So, if I am following correctly... I should check maintenance mode and proceed as below?

 

-IMPORTANT - Check both "parity is already valid" and "maintenance mode" and start the array (note that the GUI will still show that data on parity disk(s) will be overwritten, this is normal as it doesn't account for the checkbox, but it won't be as long as it's checked)
-Stop array
-Unassign disk3
-Start array (in normal mode now), and post the diagnostics.

 

tower-diagnostics-20221220-1151.zip

I have tried with Firefox (my normal browser) and with Microsoft Edge. Both seem to yield the same results. BTW, I just did a reboot, as the only options on the "Array Operations" tab were to reboot or shut down. I can't use my phone (S10+); it does not render well enough to trust pushing buttons with my old eyes. This is so frustrating: I had this server running for years on 5.0 until my key died, and life got in the way so I could not take enough time to troubleshoot it. Then adding a cache drive and RAM, plus my lack of ability, got me to this point.

 

I will quit using Firefox during this process, it must have real issues with Unraid. 

 

After the reboot from Edge, I have the option to start the array and to enable maintenance mode when I do it.  The stale config message is now gone.

So, following the above guide...

If the file system is XFS or ReiserFS (but NOT BTRFS), then you must start the array in Maintenance mode, by clicking the Maintenance mode check box before clicking the Start button. This starts the unRAID driver but does not mount any of the drives.

 

So, I am in maintenance mode as requested in the doc.

You should see a page of options for that drive, beginning with various partition, file system format, and spin down settings. The section following that is the one you want, titled Check Filesystem Status. There is a box with the 2 words Not available in it. This is the command output box, where the progress and results of the command will be displayed. Below that is the Check button that starts the test or repair, followed by the options box where you can type in options for the test/repair command. The options box may already have default options in it, for a read-only check of the file system. For more help, click the Help button in the upper right.

 

Clicking disk 3 gives me none of the options I would expect to see from the doc. So, with an emulated drive, how do I get to the menus I need?

I do see the options on one of my installed and live drives.
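For reference, the Check Filesystem button described in the quoted doc is understood to run `xfs_repair` against the array's md device, and the same check can be run from the console while the array is started in maintenance mode. The sketch below only builds the command line; it does not run anything. It assumes the 6.11-style device naming visible in the syslog above (disk 3 is `/dev/md3`), and the helper name `xfs_repair_cmd` is hypothetical. `-n` is xfs_repair's no-modify flag (report only) and `-v` is verbose. Pointing it at the md device rather than the raw `/dev/sdX` disk keeps parity in sync and, for a disabled disk, should check the emulated contents.

```python
import shlex


def xfs_repair_cmd(disk_number, check_only=True, verbose=True):
    """Build (but do not run) an xfs_repair command for an Unraid array disk.

    Assumes 6.11-style md device names as seen in the syslog (/dev/md3 for
    disk 3); other Unraid releases may name array devices differently.
    """
    if disk_number < 1:
        raise ValueError("array data disks are numbered from 1")
    cmd = ["xfs_repair"]
    if check_only:
        cmd.append("-n")  # no-modify mode: report problems, change nothing
    if verbose:
        cmd.append("-v")
    # The md device, not /dev/sdX, so parity stays in sync with any repair.
    cmd.append(f"/dev/md{disk_number}")
    return cmd


if __name__ == "__main__":
    # Read-only check of (emulated) disk 3; run only with the array in maintenance mode.
    print(shlex.join(xfs_repair_cmd(3)))  # prints: xfs_repair -n -v /dev/md3
```

Only drop `-n` once the read-only output has been reviewed and posted in the thread, since a real repair writes to the disk.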
