Multiple issues at once: new cache drive in RAID1, Docker not starting, disk emulation


Karyudo


Going to have to leave for a while again. Plenty of stuff on the web about Midnight Commander.

 

The main thing you need to know is the path to your disks. The array disks are /mnt/disk1, /mnt/disk2, etc., and the Unassigned Device is going to be at /mnt/disks/whatever-you-mount-it-as
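If it helps to see those paths from a script rather than from mc, here is a rough Python sketch that lists them. The function name `list_unraid_mounts` and the `mnt_root` parameter are made up for illustration; the only thing taken from the post above is the layout (/mnt/disk1, /mnt/disk2, ... for the array and /mnt/disks/... for Unassigned Devices).

```python
from pathlib import Path
import re

def list_unraid_mounts(mnt_root="/mnt"):
    """Return (array_disks, unassigned_mounts) found under mnt_root.

    Hypothetical helper: assumes array disks are named disk<N> and that
    Unassigned Devices mounts sit under <mnt_root>/disks/.
    """
    root = Path(mnt_root)
    # "disk*" also matches the "disks" folder, so filter with a strict pattern.
    array_disks = sorted(
        (p for p in root.glob("disk*")
         if p.is_dir() and re.fullmatch(r"disk\d+", p.name)),
        key=lambda p: int(p.name[4:]),  # numeric sort: disk2 before disk10
    )
    ud_root = root / "disks"
    unassigned = sorted(p for p in ud_root.iterdir() if p.is_dir()) if ud_root.is_dir() else []
    return array_disks, unassigned
```

Calling `list_unraid_mounts()` on the server should return two lists of paths, one for the array and one for whatever is mounted through Unassigned Devices.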

 

Maybe someone else will pick this thread up before I get back. I will return later tonight.


I've got mc up and running, and I can find the drive (and array folders). What I can't figure out how to do is to select the other pane, or do any of the commands. One Linux video shows the guy doing it with the mouse, which doesn't work. And no key combination I've tried works, either. :/

 

Figured out my mistake: I ran mc from the unRAID terminal. Once I used bitvise (already had this installed, but didn't have PuTTY), then the mouse worked.

 

Thanks again for your continued help! Feels like I'm making progress, and learning a ton in the process.

Edited by Karyudo

Yeah, I thought Function keys were... key... to starting the actions at the bottom of the screen (e.g. "Copy"), but they didn't work when I used the internal Terminal to invoke mc. Everything's ticking along nicely in Bitvise/PuTTY, though. Few more hours of curated copying, and I should be in a position to start a rebuild. Which hopefully will get about halfway done overnight.


OK, I got all the important stuff backed up. I also shucked the new drive. I tried to run SMART diagnostics on a previously-iffy parity drive in Linux Mint (using GSmartControl) and it wouldn't run any tests. That seems a bit dire, but that's a problem for another day. I guess I'll use the Disk 8 that's temporarily out of service and the new drive to rebuild.

 

I went to stop the array (so I can add the new drive and remove Disk 1), and now there's a new thing I haven't seen before: it's taking forever (more than 10 minutes, at least) on "Array Stopping • Retry unmounting user share(s)..." and not finishing.

 

What fresh hell is this?!

 


I waited another 15 minutes or so, and then rebooted. The 4TB backup drive was missing from Unassigned Devices. I replaced the SATA cable and plugged it back in, and it came up. I copied off a few new files from the recent Drive 8, and then unmounted it. I went to unmount the backup drive, but it won't unmount!

 

I fired up mc again, and didn't see any background jobs pending. (I chose 'background' for some of the earlier copying, too, so I could queue some stuff for copying.)

 

Why won't unRAID allow this drive to be unmounted from Unassigned Devices? (Diagnostic zip attached.)

shinagawa-diagnostics-20180903-2224.zip


It occurred to me this morning before logging in that I have turned you loose with a dangerous weapon (mc) without giving you a very important warning.

 

Your user shares are at /mnt/user. It is very important that you not mix user shares and disks when moving or copying files. Do not mix paths starting with /mnt/user/sharename and paths starting with /mnt/disk# or /mnt/cache, either with mc or from the command line. See here for more details:

 

https://forums.unraid.net/topic/32836-user-share-copy-bug/
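To make the rule concrete, here is a small Python sketch that flags a source/destination pair as risky when one side is a user share path and the other is a disk or cache path. The helper names `path_kind` and `is_risky_pair` are hypothetical, and treating /mnt/user0 the same as /mnt/user is my own assumption. It only looks at path prefixes; it does not touch any files.

```python
from pathlib import PurePosixPath
import re

def path_kind(path):
    """Classify a path as 'user', 'disk', or 'other' by its /mnt prefix."""
    parts = PurePosixPath(path).parts
    if len(parts) < 3 or parts[:2] != ("/", "mnt"):
        return "other"
    top = parts[2]
    if top in ("user", "user0"):  # assumption: user0 is treated like user
        return "user"
    if top == "cache" or re.fullmatch(r"disk\d+", top):
        return "disk"
    return "other"  # e.g. /mnt/disks/... (Unassigned Devices)

def is_risky_pair(src, dst):
    """True when one side is a user share path and the other a disk/cache path."""
    return {path_kind(src), path_kind(dst)} == {"user", "disk"}
```

So `/mnt/user/Media` to `/mnt/disk1/Media` is flagged, while `/mnt/user/Media` to a drive under `/mnt/disks/` is not, because that drive is outside the user shares.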

 


I've heard about that "don't cross the streams" warning before (I think maybe it was in a Spaceinvader One video?), but it was never relevant before... and the thread you've linked to gets into the weeds awfully quickly. What I did was to copy (only! no writes) from a user share (somehow I have two: user and user0?) to the Unassigned Device-mounted drive, which is a drive that has never been part of the unRAID array. I should be OK, then, right?

 

As for PuTTY sessions, no, I don't still have a session open. Let me guess what you're thinking: I used 'background' to copy something, then closed the session before the background copy finished, and that's holding the drive hostage? Something like that?

Edited by Karyudo
1 hour ago, Karyudo said:

What I did was to copy (only! no writes) from a user share (somehow I have two: user and user0?) to the Unassigned Device-mounted drive, which is a drive that has never been part of the unRAID array. I should be OK, then, right?

Of course, a copy is a write, but since it goes to a disk outside the "fusion" of the user shares, it's OK. Just be aware that cache is part of the user shares just as the array disks are, so that warning also applies to mixing cache and user shares.

 

1 hour ago, Karyudo said:

As for PuTTY sessions, no, I don't still have a session open. Let me guess what you're thinking: I used 'background' to copy something, then closed the session before the background copy finished, and that's holding the drive hostage? Something like that?

What I really had in mind is a terminal, console, PuTTY, telnet, ssh, etc. session with a folder on a disk as the current working directory. As in, if you asked for a directory listing without specifying a path, which folder would it list? And I assume if you had mc still open, it probably has a working directory open in each pane.
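One way to check for that kind of lingering session is to look at each process's working directory under /proc. The Python sketch below does that; the function name `pids_with_cwd_under` and the `proc_root` parameter are hypothetical, and it only covers working directories, not files that a process still holds open on the disk. Run it as root, otherwise most processes cannot be inspected.

```python
import os

def pids_with_cwd_under(mount_point, proc_root="/proc"):
    """Return sorted PIDs whose current working directory is inside mount_point."""
    mount_point = os.path.normpath(mount_point)
    hits = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            # /proc/<pid>/cwd is a symlink to the process's working directory.
            cwd = os.readlink(os.path.join(proc_root, entry, "cwd"))
        except OSError:
            continue  # process exited, or we lack permission to inspect it
        if cwd == mount_point or cwd.startswith(mount_point + os.sep):
            hits.append(int(entry))
    return sorted(hits)
```

For example, `pids_with_cwd_under("/mnt/disks/backup")` (the mount name here is just a placeholder) would list any shell or mc instance still sitting in a folder on that drive.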

 


Another bit of info. user0 is the user shares excluding any files on cache. So it is only the "fusion" of the top level folders from all the array disks, while user includes the cache disk. If you actually know how the user shares work the generic warning about mixing them doesn't necessarily apply. You just have to be aware that your destination path is definitely not the same as the source path after the user shares fuse things.

 

Mover is probably the main reason user0 exists. The mover makes use of user0 by moving cache-yes shares from cache to user0 and cache-prefer shares from user0 to cache.
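Since user includes cache and user0 does not, comparing the two views of a share is a rough way to see which files currently exist only on cache. This is just a sketch under that assumption; the helper names `relative_files` and `files_only_on_cache` are made up, and it is read-only.

```python
import os

def relative_files(root):
    """Collect every file path under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found

def files_only_on_cache(share, user_root="/mnt/user", user0_root="/mnt/user0"):
    """Files visible in the share via user but not via user0."""
    in_user = relative_files(os.path.join(user_root, share))
    in_user0 = relative_files(os.path.join(user0_root, share))
    # Anything missing from user0 is not on an array disk, so it must be on cache.
    return sorted(in_user - in_user0)
```

A file that exists under the same name on both cache and an array disk would not show up in this list, so treat the result as a hint rather than a full inventory.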


Yeah, by "copy" I meant "copy from," which is read-only from the user share's point of view. Good point about the cache, though: I'm not sure I'd have known that was part of the user shares! And thanks for the explanation of user0.

 

I've rebooted my Linux Mint machine, and also re-opened and immediately exited the unRAID "Terminal" button, and the drive still won't unmount. I also checked Windows Task Manager on my "main" PC, and Bitvise SSH was still running in the background. So I killed that, and... still won't unmount.

 

I'm about to reboot my Windows machine. Wish me luck... (I'll be back with an edit). Aaand... nope. Still won't unmount.

 

How do I move forward from here? I want to use the drive that won't unmount as one of the new drives for the rebuild of Disk 1 and Disk 8.

Edited by Karyudo

You have a cable or controller problem. The controller keeps resetting the SATA link:

Sep  3 22:25:03 Shinagawa kernel: ata14: hard resetting link
Sep  3 22:25:03 Shinagawa kernel: ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Sep  3 22:25:03 Shinagawa kernel: ata14.00: configured for UDMA/66
Sep  3 22:25:03 Shinagawa kernel: ata14: EH complete

It's your Unassigned Devices disk. The controller has lost communication with it so it can't be unmounted when you try to shut down, hence the lock up.
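If you want to see how often each port is doing this, a few lines of Python can tally the resets from the syslog. This is only a sketch: the name `count_link_resets` is hypothetical, and the pattern is written against the exact message format in the excerpt above, so other kernel versions may word it differently.

```python
import re
from collections import Counter

# Matches lines like: "... kernel: ata14: hard resetting link"
RESET_RE = re.compile(r"kernel: (ata\d+): hard resetting link")

def count_link_resets(lines):
    """Count 'hard resetting link' events per ATA port in syslog lines."""
    counts = Counter()
    for line in lines:
        match = RESET_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

You could feed it the syslog from the diagnostics zip, for example `count_link_resets(open("syslog.txt"))`, where the file name is an assumption about how you saved it. A port that shows up over and over points at that cable or controller.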


Ah. So... how to proceed? Try a clean power down? And if that fails... hard power off?? (And then try a different port, surely?)

 

I was already thinking maybe I should get a new controller, since most of the recent problems seem to have been with drives attached to the Vantec controller and not the Supermicro mobo ports. Any "known great" controllers? Or, more importantly, "definitely avoid" PCI-E cards? (I'm sure there are other threads for this, but if there's a quick answer that'll save a bunch of hunting, it would be much appreciated.)

 

Yikes: Just checked Amazon, and the reviews for the Vantec card I've got (4 channel, 6 port, UGT-ST644R) are pretty bad. People are complaining about drives being randomly disconnected. So maybe I'll be adding this card to the "definitely avoid" list.

Edited by Karyudo
Added info about Vantec UGT-ST644R
17 minutes ago, Karyudo said:

Any "known great" controllers? Or, more importantly, "definitely avoid" PCI-E cards?

 

If you want a simple two port controller that fits in a PCIe x1 slot get something that uses an ASMedia controller, like an ASM1061 or 1062, which works out of the box. If you want an eight port controller that fits in a PCIe x8 slot get something with an LSI/Avago/Broadcom controller. The Dell Perc H310 is a popular choice but you have to cross flash the firmware, which can be a headache for some people. Avoid cards that use Marvell controllers.

 

Edited by John_M
x8

After a brief search, I think I'll go for several 2-port controllers. My mobo has 10 on-board SATA ports, and seven PCI-E slots, so the low port-per-card density isn't a problem.

 

Of course, looks like there's nothing I can get today or tomorrow that's not Marvell.

 

Can I / should I stop ignoring the cache problem, so I can recover a port by going back to a single SSD? The extra drive is a millstone around my neck: it's taking a SATA port and a drive slot I could use to get all the array drives off the faulty (?) controller. And, anecdotally, I didn't have any trouble until I added another drive (up from 2 to 3) on the Vantec controller.


Sorry for making you re-post that... I recall it's a link on Page 1, too.

 

I guess my real questions are:

• Is it OK to do a clean shutdown, even if the UD drive can't be unmounted? Or do I have to do something else first?

• Is my reasoning on wanting to do the cache drive removal now sound? To me it makes sense, but I think we've already established that I don't know what I'm doing / have Murphy-class luck.

• Do you foresee any problems with removing the cache drive, then shutting down, re-cabling to remove dependence on a likely-faulty controller, and starting up again?

12 minutes ago, Karyudo said:

Is it OK to do a clean shutdown, even if the UD drive can't be unmounted? Or do I have to do something else first?

 

If the UD drive can't be unmounted you won't get a clean shutdown. The best you can do is try and if it fails you'll need to force it. Before you do though, how far have you got with recovering your data? This thread is long and complicated with a lot of twists and turns and a few tracks off to the side. You were copying data from two emulated disks onto a UD drive so you're heavily reliant on all remaining array disks being perfectly readable. It seems like you've already forced an unclean shutdown and you can't continue as things are. So shut down as cleanly as you can and remove the UD drive. Then shrink your cache and use the freed up port to reinstate the UD drive so you can continue rescuing your data.

 

31 minutes ago, Karyudo said:

Do you foresee any problems with removing the cache drive, then shutting down, re-cabling to remove dependence on a likely-faulty controller, and starting up again?

 

That seems like the best way to move forward from where you currently are.


I'm happy with the data I got off the emulation. I got personal data, photos and stuff like that all squared away. So I'm not freaking out (anymore).

 

I forgot I had another couple of 2-port SATA cards in another box that I had previously been using with WHS. So I might use those to host two or three of the drives. Unfortunately, they're Marvell-based, too. But fingers crossed.

 

Thanks for your reasoned opinion. I'm going to think things through one more time, write down the plan, and then see how things go. (C'mon, unRAID, justify my dual parity drives!)

9 minutes ago, Karyudo said:

Unfortunately, they're Marvell-based, too. But fingers crossed.

 

If you insist on using them you might want to disable IOMMU (otherwise known as VT-d) in your BIOS. It could possibly save you some grief... or it might not. You won't be running any VMs until you've sorted your server out anyway.

 

 


Oh. That sounds quite a bit worse than what I sort of expected the Marvell problem to be. I don't use virtualization, but I also don't have a monitor and keyboard & mouse set up near the server in order to be able to fiddle with the BIOS, so I figured I'd check again whether I could get some better SATA cards. Turns out the "SYBA (I/O Crest) SATA III 2 Internal 6Gbps Ports PCI-e Controller Card (SY-PEX40039)" has the ASM1061 chipset, they're CA$21 each on Amazon.ca, and they'll be here tomorrow. So... ordered.

 

Once again, thank you for your patience and persistence in helping me solve this mess.


The rebuild has completed successfully. Thank you, John_M and trurl! I appreciate your patience and education.

 

I used SATA ports only on the mobo and ASM1061 controllers. I re-used Disk 8 as Disk 8, and a newly-shucked drive for Disk 1. The original Disk 1 is in Unassigned Devices. In the end, I didn't mess with pulling the 4TB drive out of the cache pool. Yet—that's coming up shortly.

 

Next steps are to move torrents from the cache to an Unassigned Devices drive (using Midnight Commander should be OK, right?), move stuff that should never have been on the cache drive to the array where it belongs (should I use MC for this, or can I make Mover do it?), re-enable Docker, and get back to normal operation.

 

 

ScreenClip3.png

shinagawa-diagnostics-20180906-1623.zip

Edited by Karyudo
2 hours ago, Karyudo said:

I didn't mess with pulling the 4TB drive out of the cache pool. Yet—that's coming up shortly.

 

Next steps are to move torrents from the cache to an Unassigned Devices drive (using Midnight Commander should be OK, right?), move stuff that should never have been on the cache drive to the array where it belongs (should I use MC for this, or can I make Mover do it?), re-enable Docker, and get back to normal operation.

I think you have this exactly right. Get data out of cache before you mess with the pool.

 

Midnight Commander is good for moving stuff to your Unassigned Devices.

 

Mover will move any cache-yes shares from cache to the array so you just need to make that setting and run mover. Then you can decide if and how you want any of your shares to use cache. I don't remember if I already gave this link but it is worth some study:

 

https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?page=2#comment-537383
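Before moving anything, it can help to see how much each top-level folder on cache actually holds, so you know what Mover or mc will be dealing with. The Python sketch below totals file sizes per folder; the name `share_sizes` is made up, and /mnt/cache as the default is taken from the paths mentioned earlier in the thread. It only reads, it does not move anything.

```python
import os

def share_sizes(cache_root="/mnt/cache"):
    """Return {top_level_folder: total_bytes} for everything under cache_root."""
    sizes = {}
    for entry in os.scandir(cache_root):
        if not entry.is_dir(follow_symlinks=False):
            continue  # only top-level folders correspond to shares
        total = 0
        for dirpath, _dirs, files in os.walk(entry.path):
            for name in files:
                try:
                    total += os.lstat(os.path.join(dirpath, name)).st_size
                except OSError:
                    pass  # file vanished or is unreadable; skip it
        sizes[entry.name] = total
    return sizes
```

Sorting the result by size gives a quick picture of which folders (torrents, for instance) are taking up the space on the pool.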

 
