Python: TreeMap visualization of hierarchical Pandas DataFrame

The Plotly graphing library offers a wide array of visualizations for datasets. It also has native support for the Pandas DataFrame, which makes it convenient for data coming from many different sources.

One visualization that I find particularly useful for hierarchical data is the TreeMap, which groups cells by data lineage, uses the area of each cell as the primary measure, and uses the color of each cell as a secondary measure.

Below is an example TreeMap from the plotly site that shows human life expectancy grouped by continent/country, with the area representing the total population of each country, and the color denoting life expectancy.

Article overview

In this article, I will show you how to create a simple TreeMap with the plotly library using a Pandas DataFrame loaded with hierarchical data.

The dataset will be from a fictional pet store, which sells a variety of animals, grouped as mammals or reptiles.

Create Python3 virtual environment

Ensure you have the latest python3 and python3 virtual environment OS packages.

# make sure python3 venv is installed
sudo apt-get update
sudo apt-get install software-properties-common python3 python3-dev python3-pip python3-venv curl git -y

Then create the Python virtual environment where we will isolate the required modules.

# create directory for test
mkdir plotlytest
cd plotlytest

# create and enter virtual environment
python3 -m venv .
source bin/activate
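If you want to confirm from Python that the virtual environment is really active, a minimal check is to compare the interpreter prefixes. The helper name in_virtualenv is my own and not part of any library.

```python
import sys

def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix

print("virtual environment active:", in_virtualenv())
```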

Install Python modules into virtual environment

The modules and their dependencies need to be downloaded and isolated for use in the virtual environment.

# required modules
pip3 install pandas plotly kaleido
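To double-check that the three modules are visible to the interpreter, a small sketch using only the standard library is shown here. The helper name missing_modules and the REQUIRED tuple are hypothetical names chosen for this example.

```python
from importlib.util import find_spec

# Modules installed by the pip3 command above
REQUIRED = ("pandas", "plotly", "kaleido")

def missing_modules(names=REQUIRED):
    """Return the module names that the import system cannot locate."""
    return [name for name in names if find_spec(name) is None]

missing = missing_modules()
print("missing modules:", missing or "none")
```

An empty list means all of the modules can be imported from the current environment.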

Example to build TreeMap of pet store sales

We will download and run a simple Python script I wrote for this article, ExamplePlotlyTreeMap.py3.

# grab my example script
wget https://raw.githubusercontent.com/fabianlee/ga4-pandas-py3-treemap/main/ExamplePlotlyTreeMap.py3
chmod +x *.py3

# run script
./ExamplePlotlyTreeMap.py3

Hierarchical Data for pet store

Running the script above will output the full DataFrame to the console.

  category  sales  sales_delta    name
0  reptile      1           -2   snake
1  reptile      2            0  lizard
2  reptile      4            5  turtle
3   mammal      8           10     dog
4   mammal      1           -3     cat

The ‘category’ field is the hierarchical parent lineage that allows the TreeMap to group animals together, in this case into either mammals or reptiles.

The ‘sales’ field is how many have been sold this month, and ‘sales_delta’ is how that sales count differs from last month (positive means more than last month, negative means less than last month).
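To see how the ‘category’ lineage rolls up, the short sketch below sums sales per category with a Pandas groupby. These totals (7 for reptiles, 9 for mammals) correspond to the relative size of each parent box in the TreeMap. The DataFrame is rebuilt inline only so the snippet runs on its own.

```python
import pandas as pd

# Same rows as the console output above
df = pd.DataFrame({
    "category": ["reptile", "reptile", "reptile", "mammal", "mammal"],
    "sales": [1, 2, 4, 8, 1],
    "sales_delta": [-2, 0, 5, 10, -3],
    "name": ["snake", "lizard", "turtle", "dog", "cat"],
})

# Total sales per parent category
category_totals = df.groupby("category")["sales"].sum()
print(category_totals)
```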

TreeMap visualizations exported

Additionally, the script creates files named “/tmp/petstore-treemap.{png,html}”.

Below is what the static png image will look like.  You can open the html file with a browser to view an interactive treemap that has mouse-over values for each cell and click-into details.

This visualization shows that dogs are very popular in absolute terms (large area), while the color shows that cats and snakes may need more marketing because they are trending down from last month.

DataFrame creation

The pet store Pandas DataFrame is created as shown below.  The rows are hard-coded to keep this example simple, adding the reptiles first, and then the mammals.  But the data could just as easily have come from a csv file, or any other data source.

import pandas as pd

# main DataFrame
df = pd.DataFrame()

# add reptiles to dataframe
df_newrows = pd.DataFrame({"category":["reptile","reptile","reptile"],"sales":[1,2,4],"sales_delta":[-2,0,5],"name":["snake","lizard","turtle"]})
# DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so use pd.concat instead
#df = df.append(df_newrows,ignore_index=True)
df = pd.concat([df,df_newrows],ignore_index=True)

# add mammals to dataframe
df_newrows = pd.DataFrame({"category":["mammal","mammal"],"sales":[8,1],"sales_delta":[10,-3],"name":["dog","cat"]})
df = pd.concat([df,df_newrows],ignore_index=True)
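As a sketch of the csv alternative, the same rows can be loaded with pd.read_csv. The csv text is held in a string here so the snippet is self-contained; with a real file you would pass its path instead (the name petstore.csv in the comment is hypothetical).

```python
import io

import pandas as pd

# Stand-in for a file on disk; pd.read_csv("petstore.csv") would work the same way
csv_text = """category,sales,sales_delta,name
reptile,1,-2,snake
reptile,2,0,lizard
reptile,4,5,turtle
mammal,8,10,dog
mammal,1,-3,cat
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df)
```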

TreeMap parameters

Then we define the parameters to send into the plotly.express treemap function (px.treemap).

  • Specify ‘df’ as our DataFrame variable
  • Use ‘path’ to define the hierarchical data lineage (category > name)
  • Use ‘sales’ as the values, so the area of each cell represents total sales for the month
  • Use ‘sales_delta’ as the color of each cell, to show whether sales are growing or shrinking from the previous month
import plotly.express as px

# define TreeMap
fig = px.treemap(
    data_frame=df,
    path=['category','name'],
    values='sales',
    color='sales_delta',
    color_continuous_scale='blues'
    )

TreeMap export

Finally, we export this TreeMap visualization to our local file system, first as an interactive html page with mouse-over values, and then as a static PNG image.  The PNG export relies on the kaleido module installed earlier.

fig.write_html("/tmp/petstore-treemap.html")
fig.write_image("/tmp/petstore-treemap.png")

 

REFERENCES

plotly, treemap

plotly colorscales – blues, reds, etc.

github fabianlee, example TreeMap for petstore

python3 native virtual environments with venv