[DOCKER CONTAINER] DUC - Disk Usage Charts (and duplicate file finding!)



Over the holidays, I built my first unRAID server and set up the CrashPlan docker container. I was interested in learning more about docker AND wanted to interactively browse my array for organization and to reduce duplicates.

 

I stumbled upon DUC which creates awesome interactive Disk Usage Charts like this:

 

example.png

 

I've created a DUC (and Apache) container based on the suggested Phusion baseimage, and created a template repository and template so that you can easily install it in unRAID.

 

I'm looking for others to test my container and provide feedback. I'm still working to reduce its size and to add more features. (I've forked DUC, and will do a pull request for my changes once they're complete.)

 

PLEASE CONSIDER THIS EXPERIMENTAL AND A WORK IN PROGRESS.

 

My template repo:              https://github.com/digitalman2112/docker-templates

(Update: the above URL didn't work for someone, but the following did work: https://github.com/digitalman2112/docker-templates/tree/master/digitalman2112)

 

My duc-docker build repo:  https://github.com/digitalman2112/duc-docker

My DUC fork repo:              https://github.com/digitalman2112/duc

My Docker Hub page:          https://registry.hub.docker.com/u/digitalman2112/duc/

 

When installing into unRAID, map your array (or a portion of it) to /data (this is the default if you use the docker tab and the template). It will default to READ ONLY, as the container has no need for write access to the data just to index it.

 

You also need to map port 80 from the docker container to a host port. The template defaults this to 2112. (updated port based on feedback)
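
Putting the volume and port mappings together, here's a minimal sketch of the equivalent docker run command (the image name comes from my Docker Hub page above; DUC is the container name the template uses, and /mnt/user is just one example of a host path to index):

docker run -d --name DUC \
  -v /mnt/user:/data:ro \
  -p 2112:80 \
  digitalman2112/duc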

 

Once started, you can access the DUC web page with interactive charts via the Web UI menu option for the running container. This works if you leave the port option set to 2112; otherwise, remap it as you see fit and visit http://<container_ip>:80/cgi-bin/duc.cgi

 

You will need to start an index operation by visiting the web interface and clicking Reindex (be patient; it doesn't update the page while doing the reindex...yet). You can add additional indexes, or try my new duplicate file utility in duc, by starting bash in the running container with docker exec. If you do this, look at the duc command-line options and be sure to specify the database location like this: -d /duc/duc.db. Update: this is no longer required, as I changed the HOME dir for the container to /duc and it will automatically place .duc.db there now :)
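
For example, here's a minimal sketch of dropping into the container and poking at the index (this assumes the container is named DUC, as the template sets it; info and ls are standard duc subcommands):

docker exec -it DUC /bin/bash
# inside the container - duc now finds /duc/.duc.db on its own:
duc info
duc ls /data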

 

Given that the container has access to your data, I strongly suggest keeping it internal to your network.

 

I'll be adding the duplicate file functions to the CGI in the coming weeks to avoid the need for command-line use; if you are really interested, ping me and I'll get you started with it. The matching is based on file attributes (name, extension, size), NOT a hash / CRC match, so it is very fast, but you still need to validate that the files are actually duplicates. I also have a mode that finds duplicate FOLDERS - which is extremely handy for finding mass sets of duplicate photos.
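
If you're curious what attribute matching means in practice, here is a rough shell analogue of the name + size idea (this is NOT duc's code - duc dup works from its index, while this one-liner scans the disk directly, so it's only a sketch of the concept):

# print basename and size for every file, then show combos appearing more than once
find /data -type f -printf '%f\t%s\n' | sort | uniq -d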

 

Any & all feedback welcomed as I'm new to unRAID, new to docker, and new to duc - but willing to learn :)

 

Ian

 

(Edits due to build updates)

Link to comment

Any particular reason you decided on defaulting to port 8080? That port is already used by a very popular addon with a long history here. Search for the unMenu thread for more details.

 

Total ignorance on my part. I'll change it to something else and update the template later tonight. Thanks for the tip!

Link to comment

You shouldn't have to rebuild it; we can change the host port when we set it up, or you can just change your XML.

 

Yeah, I'm a little slow to remember all the moving parts. Much easier to just change the xml.

 

One small thing... most of us have updated to phusion *.15

 

Ok, THAT will be a rebuild. :)

Link to comment

Updates based on feedback:

 

1) Updated to phusion *.15

2) Default port mapping changed to 2112

3) Removed the auto-indexing on startup. I'd added that before I added the ability to trigger an index from the website, and you may not want a big index job at startup.

4) Updated template xml description with some getting-started info.

5) Redirected from the webroot to the CGI script to save a little typing & help new users find it more easily

6) Combined some commands in the Dockerfile to reduce # of layers (and removed some old commented-out commands)

 

 

Also: I moved the icons to imgur as I saw in another template - but the icons still aren't working for some reason... Are others seeing the icons?

 

 

Link to comment

If anyone is interested in the duplicate file finding functions I've been adding (still a work in progress, but tests out ok on my data) - here's a screenshot to give you an idea of what it does:

 

76q7koQ.png

 

duc has a number of command-line utils; in this case I'm calling duc dup (which I added) with the following options (a ready-to-run example follows the list):

  • --database to specify the location of the duc index database (in my container it is at /duc/.duc.db) (NOTE: As of 1/9/2015, you no longer need to specify the database - and the location shown in the image is now incorrect.)
  • --megabytes to specify the minimum file or folder size to use for comparisons. I like to work on the biggest items first, and reduce the noise. This also makes it run extremely fast.
  • -f for folderscan (only compare folders, not files).
  • the path to scan, in this case /data. Note that this has to be an indexed path. In my container, we map external data to /data. You can specify a subpath of that index, or if you've used duc index at the command line to index another path, you can specify that. The key is to remember that the duc dup command works on the EXISTING DUC INDEX; it does not read the disk directly.
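
Putting those together, the scan in the screenshot looks roughly like this on a post-1/9/2015 build (no --database flag needed; the 100MB threshold is just an example):

duc dup --megabytes=100 -f /data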

 

Note that this scan took 0.16 seconds  ;D The same scan with --megabytes=1 returns 539 matches, and takes 0.44 secs.

 

All candidate matches are returned as a row (I left a few examples in the screenshot), and then a summary table is listed below. If you enable other match types, you will see the match type on the left - and a summary of matches by those types at the end of the scan.

Match types are:

  • Name + Size: always enabled
  • Extension + Size: enable with -e (not valid with folderscan -f)
  • Size: enable with -s
  • Name: enable with -n

 

You can also enable case insensitivity with -i.
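
For an illustrative file-level scan with every extra match type enabled (note that -f is dropped, since -e is not valid with folderscan):

duc dup --megabytes=10 -e -s -n -i /data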

 

At some point, I'll add this functionality to the web interface for DUC so that you don't need to docker exec into the container to run dup - but until that time, it is available on the command line.

 

Until then - here's how I run it (a complete session sketch follows the steps).

 

1) SSH into your unRAID box.

2) Start a bash shell in the DUC container using: docker exec -it DUC /bin/bash (this assumes you've used the template and the container is called DUC - you can see the name on the unRAID docker tab).

3) Use the command-line instructions above to run dup scans - on older builds, don't forget that database option or you will likely get a "Database corrupt and not usable" message.

4) If you remove duplicates, don't forget you'll need to rerun the index command (on the command line, or in the web interface) to get updated results. To reindex /data from the command line in the container bash shell, you'd use the command duc index /data

5) Exit the bash shell in the container when you are done, then close the SSH session to your unRAID server.
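
End to end, the whole session might look like this (the hostname is illustrative - tower is unRAID's default - and so is the size threshold):

ssh root@tower                     # step 1
docker exec -it DUC /bin/bash      # step 2: shell into the container
duc dup --megabytes=100 -f /data   # step 3: folder-level duplicate scan
duc index /data                    # step 4: reindex after any cleanup
exit                               # step 5: leave the container shell
exit                               #         ...then close the SSH session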

 

NOTE: This duplicate file finding does NOT use a CRC / Hash to compare files. It is returning duplicate CANDIDATES, and you need to personally validate that they are duplicates!
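
A quick way to validate a candidate pair is to hash both files and compare the output (the paths here are hypothetical - substitute two candidates from your own dup results):

md5sum "/data/Photos/2014/IMG_0001.jpg" "/data/Backup/IMG_0001.jpg"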

 

Please alert me if something doesn't work, or if these instructions need to be improved :)

 

If you want help getting started, just ping me.

 

 

Link to comment

Updated container (and xml template) published.

 

New parameters that can be set when installing the container:

  • -m or --maxlevels  Max # of levels shown in chart (web) - defaults to 5
  • -p or --pixels  Size of the chart (web) in pixels - defaults to 1000px
  • -l or --list  Include a directory listing with the chart (web) on / off - defaults to on
  • -i or --index  Ability to trigger an index operation from the web page can be turned off - defaults to on

 

These are specified TOGETHER, as one environment variable named DUC_CGI_OPTIONS. For example:
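
Here's a sketch of setting it at container creation (the exact option string is illustrative - any combination of the flags above should work - and the volume/port mappings are the defaults from earlier in the thread):

docker run -d --name DUC \
  -v /mnt/user:/data:ro \
  -p 2112:80 \
  -e DUC_CGI_OPTIONS="--maxlevels 7 --pixels 1200" \
  digitalman2112/duc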

 

http://i.imgur.com/lUB0ibR.png

 

 

Other changes:

  • Web pages always show the full path now (before, if you drilled in, the URL bar showed the previous path plus some numbers)
  • The indexing page now sends back some info and asks you to be patient - at least you know it is working
  • Fixed a pointer error that was showing up in the apache logs
     

 

Link to comment

I mapped /data to /mnt/user to get all my user shares; is that the basic way?

 

It loads, but just sits at /data with 0 0 0  listed (after hitting reindex).

 

Otherwise, maybe I'm not waiting long enough. What's the "time to index" for something like 6TB? Like 10 mins, or 2 hours, or what?

 

Looking forward to looking it over, and really want to hit up the duplicates feature to see if my "dumping ground" folder has gone crazy or not.

Link to comment


It will take a while. I never timed mine, but with 14TB I'm pretty sure it was only a few minutes.
Link to comment

This operation can take minutes on large paths, please be patient.

 

The indexing should continue even if this window is closed, however there will be no notification.

 

Indexed 21005 files and 1752 directories, (265.8GB total) in 01 minutes, and 31.77 seconds.

 

 

Hmm, I guess I need more CPU power / faster HDDs? :(

Link to comment

Started my docker 120 minutes ago and referenced /mnt/user, and I still cannot see anything - but I have about 37TB of data, so I assume this is expected? Is there a log I can check to see if it's still running? After 6 hours it still is not displaying anything. Now it has been over 9 hrs and still nothing is displayed, so something must have gone wrong somewhere.

Link to comment
