Docker: determining container responsible for largest overlay directories

Whether you are running a Docker daemon on a development host or a GKE worker node that uses Docker as its container engine, it is important to understand how much disk space the containers are using.

If you navigate into the '/var/lib/docker/overlay2' directory, you will see cryptic hashed IDs representing the container layers instead of human-readable names. These can be mapped back to container names using docker inspect and some scripting.

# as root
sudo su

# grab the size and path of the 100 largest overlay dirs (excluding the overlay2 root itself)
du -h /var/lib/docker/overlay2 | sort -h | tail -n 100 | grep -vE "overlay2$" > large-overlay.txt
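
Because du without a depth limit reports every nested directory, the top of the list can fill up with subdirectories of the same layer. A minimal sketch of a per-layer variant follows, assuming GNU coreutils (for sort -h) and a du that accepts -d. The helper name largest_subdirs is hypothetical and not part of the original steps.

```shell
# Hypothetical helper (not from the original steps): list the N largest
# immediate subdirectories of DIR, smallest first, as "SIZE<TAB>PATH".
largest_subdirs() {
  local dir="${1%/}"      # drop a trailing slash so paths compare cleanly
  local count="${2:-10}"  # default to the 10 largest entries
  # -d 1 stops du at one level; awk drops the total line for DIR itself
  du -h -d 1 "$dir" | awk -F'\t' -v top="$dir" '$2 != top' | sort -h | tail -n "$count"
}

# Example (as root): largest_subdirs /var/lib/docker/overlay2 20
```

Each line of output is then a single layer directory, which keeps the later name lookup short.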

# make sure the jq JSON parser is installed (Debian/Ubuntu)
apt-get install -y jq
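
The install can be skipped when jq is already present. A small sketch, where have_cmd is a hypothetical helper name and the apt-get line in the usage comment assumes a Debian or Ubuntu host:

```shell
# Hypothetical helper: succeeds only if the named command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Usage (as root), installing jq only when it is missing:
#   have_cmd jq || apt-get install -y jq
```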

# construct mappings of container name to overlay merged dir
docker inspect $(docker ps -qa) | jq -r 'map([.Name, .GraphDriver.Data.MergedDir]) | .[] | "\(.[0])\t\(.[1])"' > docker-mappings.txt

# for each hashed path, print its size followed by the matching container name and merged dir
xargs -L 1 bash -c 'if match=$(grep -F -- "$1" docker-mappings.txt); then echo "$0 $match"; fi' < large-overlay.txt
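
The same lookup can be done in a single awk pass instead of one grep per line. This is a sketch under assumptions, not the method above: the function name match_overlay_sizes is hypothetical, both files are assumed to be tab-separated as written by du and the jq step, and a du path matches only when it is exactly a container's layer directory or its merged directory.

```shell
# Hypothetical helper: join du output with the name-to-path mappings.
#   $1: file of "SIZE<TAB>PATH" lines (as written by du)
#   $2: file of "NAME<TAB>MERGED_DIR" lines (as written by the jq step)
# Prints "SIZE<TAB>NAME<TAB>PATH" for every du path that belongs to a container.
match_overlay_sizes() {
  awk -F'\t' '
    NR == FNR {                      # first file read: the mappings
      layer = $2
      sub(/\/merged$/, "", layer)    # layer dir is the merged dir minus /merged
      name[$2] = $1                  # index by merged dir
      name[layer] = $1               # and by the layer dir itself
      next
    }
    ($2 in name) { print $1 "\t" name[$2] "\t" $2 }
  ' "$2" "$1"
}
```

With the two files produced above, it would be called as: match_overlay_sizes large-overlay.txt docker-mappings.txt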
