Ubuntu: A centralized apt package cache using squid-deb-proxy

It is common in secure production datacenters for internal hosts to be forced to go through a forward proxy (e.g. Squid) for public internet access. The same concept can be applied to apt package management, where setting up a centralized package proxy enables caching as well as security controls.

In a datacenter where you could have hundreds of host instances all needing the same package/kernel/security patch, having a cache of packages inside your network can save a significant amount of network bandwidth and operator time.

And just like an internet proxy that whitelists only specific domains, a package proxy can have a whitelist of apt repositories, as well as a blacklist of specific packages.

In this article we’ll go through installation and configuration of squid-deb-proxy, which is just a packaging of Squid3 with specific tunings for package caching.  Since most Security and Operations teams are familiar with Squid already, this makes it easier to get deployment approval versus other package caching solutions.

Install squid-deb-proxy

This package is available from the Ubuntu repositories, so installation is as simple as running apt-get, making sure we open the firewall for port 8000, which is where the service will be running.

$ sudo apt-get update -q
$ sudo ufw allow 8000/tcp
$ sudo apt-get install squid-deb-proxy -y

This installs a service for squid on port 3128 as well as squid-deb-proxy on port 8000, but we really don’t want the extra squid service so let’s disable it.

$ sudo service squid stop


# unlink service from /etc/rc.*
$ sudo update-rc.d -f squid remove

If you are on Ubuntu 16.04, which uses systemd, then in addition to unlinking with update-rc.d above, also disable the service.

$ sudo systemctl disable -f squid.service

Now we can validate that only squid-deb-proxy is running and on port 8000 with the following commands:

$ ps -ef | grep squid-deb-proxy
$ netstat -an | grep "LISTEN "
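The port check above can also be scripted so it returns a pass/fail status. This is a minimal sketch, assuming Linux "netstat -an" output where the local address is the fourth column and the state is the sixth; the function name port_is_listening is hypothetical and not part of squid-deb-proxy.

```shell
# Hypothetical helper: succeeds if netstat-style input on stdin shows a
# TCP listener on the given port. Assumes Linux `netstat -an` columns:
# Proto Recv-Q Send-Q Local-Address Foreign-Address State
port_is_listening() {
  awk -v port="$1" '
    $6 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
    END { exit !found }
  '
}

# Example:
#   netstat -an | port_is_listening 8000 && echo "squid-deb-proxy is listening"
#   netstat -an | port_is_listening 3128 || echo "plain squid is not listening"
```

Matching on the end of the local address keeps port 8000 from being confused with a port such as 18000.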

Cached files are stored under a hierarchy in “/var/cache/squid-deb-proxy” as verified by the following:

$ grep cache_dir /etc/squid-deb-proxy/squid-deb-proxy.conf
cache_dir aufs /var/cache/squid-deb-proxy 40000 16 256
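The fields on that line follow Squid's "cache_dir TYPE DIRECTORY MBYTES L1 L2" layout: the storage type, the cache path, the maximum cache size in megabytes, and the number of first-level and second-level subdirectories. As a rough sketch, the hypothetical helper below (describe_cache_dir is an illustrative name) labels each field so the values are easier to review:

```shell
# Hypothetical helper: read squid config text on stdin and label the
# fields of the first cache_dir directive it finds.
# Assumed layout: cache_dir TYPE DIRECTORY MBYTES L1 L2
describe_cache_dir() {
  awk '$1 == "cache_dir" {
    printf "type=%s dir=%s size_mb=%s l1_dirs=%s l2_dirs=%s\n", $2, $3, $4, $5, $6
    exit
  }'
}

# Example:
#   describe_cache_dir < /etc/squid-deb-proxy/squid-deb-proxy.conf
```

With the value shown above, the cache may grow to roughly 40 GB, so the partition holding “/var/cache/squid-deb-proxy” should be sized accordingly.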

And service logs are located in “/var/log/squid-deb-proxy”.

Client validation

From the squid-deb-proxy host, first tail the logs so you can see the client actions that will be taking place:

$ sudo chmod go+r+x /var/log/squid-deb-proxy -R
$ cd /var/log/squid-deb-proxy
$ tail -f access.log cache.log store.log

From a separate OS instance, or even the same host if you want, configure apt so that it goes through squid-deb-proxy on port 8000 if it wants to download packages.

$ echo "Acquire::http::Proxy \"http://192.168.2.125:8000\";" | sudo tee  /etc/apt/apt.conf.d/00proxy

$ sudo apt-get update -q
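If you need to roll this setting out to many clients, the quoting in the echo command above is easy to get wrong. The following sketch generates the same directive from a host and port; apt_proxy_line is a hypothetical name, and the default port of 8000 is an assumption based on the setup in this article.

```shell
# Hypothetical helper: print the apt proxy directive for a proxy host
# and port. The port defaults to 8000, the squid-deb-proxy port used here.
apt_proxy_line() {
  printf 'Acquire::http::Proxy "http://%s:%s";\n' "$1" "${2:-8000}"
}

# Example (writes the same file as the command above):
#   apt_proxy_line 192.168.2.125 | sudo tee /etc/apt/apt.conf.d/00proxy
```

To confirm that apt picked up the setting, “apt-config dump | grep -i proxy” should show the same value.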

From the client side (running ‘apt-get update’), everything should be transparent, and you will get the standard output showing the main Ubuntu repositories being accessed.

But from the squid-deb-proxy server side, you should see output from the access.log and store.log showing retrievals from repositories such as “archive.ubuntu.com”.

Cache validation

Now let’s validate that squid-deb-proxy is caching packages.  We will work with the ‘curl’ package that comes from .ubuntu.com.  Here is how you check its repository source.

$ sudo apt-cache policy curl

Now uninstall the package, clear the local .deb cache of the package, and then reinstall – which will force squid-deb-proxy to fetch and cache it.

$ sudo apt-get purge curl -y && sudo apt-get clean; sudo apt-get install curl -y

In the console where you are still tailing the three logs, you should see something like:

==> store.log <==
1518045482.217 SWAPOUT 00 00000004 233A483398FC046AF8466B37C4ABB2D2 200 1518045483 1517438617 1518131883 application/x-debian-package 138468/138468 GET http://archive.ubuntu.com/ubuntu/pool/main/c/curl/curl_7.47.0-1ubuntu2.6_amd64.deb 

==> access.log <==
1518045482.218 115 192.168.2.125 TCP_MISS/200 138900 GET http://archive.ubuntu.com/ubuntu/pool/main/c/curl/curl_7.47.0-1ubuntu2.6_amd64.deb - HIER_DIRECT/91.189.88.152 application/x-debian-package

access.log is showing a TCP_MISS pull of “curl_7.47.0-1ubuntu2.6_amd64.deb” from the main repository of archive.ubuntu.com which means it was not present in either the memory or disk cache.

store.log indicates a SWAPOUT  of the package which means a save to disk of the curl package pulled from the main repository.

Let’s see if this is actually stored on disk like SWAPOUT indicates by looking for all files in the “/var/cache/squid-deb-proxy” folder hierarchy.

$ sudo find /var/cache/squid-deb-proxy/ -type f | sudo xargs ls -l

This will list every file in the cache along with its size. There should be one that almost matches the size indicated by the access log (138900). It is not an exact match because the local cache file contains the HTTP response headers as well as the binary data, which you can verify by opening it in vi or emacs.

Now let’s verify that caching works by reinstalling curl.

$ sudo apt-get purge curl -y && sudo apt-get clean; sudo apt-get install curl -y

And now access.log will indicate a TCP_MEM_HIT meaning that the content was served directly from memory.

1518046777.909 0 192.168.2.125 TCP_MEM_HIT/200 138909 GET http://archive.ubuntu.com/ubuntu/pool/main/c/curl/curl_7.47.0-1ubuntu2.6_amd64.deb - HIER_NONE/- application/x-debian-package
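Reading individual log lines works for a single package, but a count per result code gives a quicker picture of how well the cache is doing. This is a minimal sketch, assuming Squid's native access.log format where the fourth field is “RESULT_CODE/HTTP_STATUS”, as in the lines shown above; count_result_codes is a hypothetical name.

```shell
# Hypothetical helper: count squid result codes (the part of field 4
# before the slash) in native-format access.log lines read from stdin.
count_result_codes() {
  awk '{ split($4, parts, "/"); count[parts[1]]++ }
       END { for (code in count) print code, count[code] }' | sort
}

# Example:
#   sudo cat /var/log/squid-deb-proxy/access.log | count_result_codes
```

A growing share of hit codes such as TCP_MEM_HIT relative to TCP_MISS suggests that clients are being served from the cache rather than from the upstream repository.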

Whitelisting repositories

By default, only the official Ubuntu repositories are whitelisted in “/etc/squid-deb-proxy/”.  If you add the OpenJDK ppa repository, then do an update you will get “Failed to fetch http://ppa.launchpad.net/openjdk-r/…”

$ sudo add-apt-repository ppa:openjdk-r/ppa -y
$ sudo apt-get update
$ sudo apt-cache policy openjdk

This does not work because ppa.launchpad.net is not yet whitelisted as a target repository.  Add this entry to “/etc/squid-deb-proxy/mirror-dstdomain.acl.d/10-default” and reload the configuration.

$ echo "ppa.launchpad.net" | sudo tee -a /etc/squid-deb-proxy/mirror-dstdomain.acl.d/10-default
$ sudo service squid-deb-proxy force-reload

$ sudo apt-get update
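Because “tee -a” appends unconditionally, running the whitelist command twice leaves a duplicate entry in the file. If the step is part of a provisioning script, a guard like the sketch below avoids that; add_acl_entry is a hypothetical helper, and it must run with enough privileges to write to the ACL file (for example from a script invoked with sudo).

```shell
# Hypothetical helper: append ENTRY ($2) to FILE ($1) only if no line
# in the file already matches it exactly, so repeated runs are harmless.
add_acl_entry() {
  grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Example (as root):
#   add_acl_entry /etc/squid-deb-proxy/mirror-dstdomain.acl.d/10-default ppa.launchpad.net
#   service squid-deb-proxy force-reload
```

The same helper would work for the other list files under “/etc/squid-deb-proxy/”, since they are also plain text with one entry per line.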

Now “sudo apt-get update” will complete correctly. Which OpenJDK package comes from the ppa site depends on whether you are running Ubuntu 14.04 or 16.04, and installing it through the proxy would cache the packages.

# in ppa for Ubuntu 16.04
$ sudo apt-cache policy openjdk-7-jdk

# in ppa for Ubuntu 14.04
$ sudo apt-cache policy openjdk-8-jdk

Blacklisting packages

Packages can also be blacklisted.  As an example, let’s stop the “curl” package from being downloaded.

$ echo "curl" | sudo tee -a /etc/squid-deb-proxy/pkg-blacklist.d/10-default
$ sudo service squid-deb-proxy force-reload

Then uninstall and attempt to install the “curl” package again.

$ sudo apt-get purge curl -y && sudo apt-get clean; sudo apt-get install curl -y

You will see a 403 forbidden fetch error from the console, and on the server side access.log you will see the same error.

==> access.log <==
1518050370.122 0 192.168.2.125 TCP_DENIED/403 4089 GET http://archive.ubuntu.com/ubuntu/pool/main/c/curl/curl_7.47.0-1ubuntu2.6_amd64.deb - HIER_NONE/- text/html
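When a client reports a fetch error, it helps to list exactly which URLs the proxy refused. The sketch below pulls them out of access.log, assuming the native log format shown above where field 4 holds the result code and field 7 the URL; list_denied_urls is a hypothetical name.

```shell
# Hypothetical helper: print each distinct URL (field 7) that squid
# denied, reading native-format access.log lines from stdin.
list_denied_urls() {
  awk '$4 ~ /^TCP_DENIED\// { print $7 }' | sort -u
}

# Example:
#   sudo cat /var/log/squid-deb-proxy/access.log | list_denied_urls
```

Requests to repositories that are not whitelisted should show up in the same list, so it is a reasonable first place to look for either kind of denial.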

SSL Repositories and caching

Squid-deb-proxy does not have an issue downloading packages from HTTPS-enabled repositories, but it cannot cache those packages because it cannot read the encrypted content of the communication.

So if you are getting packages from an HTTPS enabled repository, it will not have the advantage of caching.

If you absolutely must have a caching solution for HTTPS, you could look into Apt-Cacher-NG, which uses remapping: the URL looks like HTTP from the client host, but the server then communicates via TLS with the repository. This requires changes to the sources under “/etc/apt” on each client machine.
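To give a sense of what that client-side change looks like, the Apt-Cacher-NG manual describes a special “http://HTTPS///” URL prefix for this purpose. The sketch below rewrites sources lines into that form; it assumes that convention applies to your Apt-Cacher-NG version, the function name to_acng_https_form is hypothetical, and the output should be reviewed before it replaces any file under “/etc/apt”.

```shell
# Hypothetical helper: rewrite https:// repository URLs on deb/deb-src
# lines (read from stdin) to the http://HTTPS/// form described in the
# Apt-Cacher-NG manual. All other lines pass through unchanged.
to_acng_https_form() {
  sed '/^deb/ s|https://|http://HTTPS///|'
}

# Example (prints the rewritten lines without modifying the file):
#   to_acng_https_form < /etc/apt/sources.list
```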

 

REFERENCES

https://wiki.squid-cache.org/SquidFaq/OperatingSquid (managing squid cache)

http://www.squid-cache.org/Doc/config/refresh_pattern/ (squid refresh_pattern)

http://www.squid-cache.org/Doc/config/cache_replacement_policy/ (squid cache_replacement_policy)

http://www.comfsm.fm/computing/squid/FAQ-6.html (squid result codes: TCP_HIT, TCP_MISS, TCP_MEM_HIT)

http://blog.neu.edu.cn/elm/archives/99 (squid file systems and comparisons for performance: UFS, AUFS, DiskD, COSS)

https://gist.github.com/mjs/bfcea2e87e9a603420d7b32d25704b65 (example squid-deb-proxy.conf)

https://www.midnightfreddie.com/using-squid-to-cache-apt-updates-for-debian-and-ubuntu.html (package cache, squid only)

https://hub.docker.com/r/muccg/squid-deb-proxy/~/dockerfile/ (docker container)