Python: Querying JSON files with JSONPath using jsonpath_rw_ext

python-logoThere are many tools that utilize json, and when it is relatively simple you can use standard modules or even custom coding to pull out the desired portions.  But once these data structures reach a certain level of complexity you really should consider a Python module that implements JSONPath (analogous to xPath for XML).

The ability to query JSON using JSONPath can be done with Python modules such as jsonpath_rw and jsonpath_rw_ext  which gives us the ability to easily specify objects deep in the graph.

Prerequisite modules

You will need to use pip to install the ‘jsonpath_rw’ and ‘jsonpath_rw_ext’ modules.

pip install jsonpath-rw jsonpath-rw-ext --user

Or if you want to install these libraries in a virtualenv.

virtualenv jsonpath
source jsonpath/bin/activate
pip install jsonpath-rw jsonpath-rw-ext

Example JSON

I am going to borrow the JSON used in the Mozilla tutorials, that defines a superhero squad and a list of heroes on that team.  Here is a github link to this json.

{
  "squadName": "Super hero squad",
  "homeTown": "Metro City",
  "formed": 2016,
  "secretBase": "Super tower",
  "active": true,
  "members": [
    {
      "name": "Molecule Man",
      "age": 29,
      "secretIdentity": "Dan Jukes",
      "powers": [
        "Radiation resistance",
        "Turning tiny",
        "Radiation blast"
      ]
    },
    {
      "name": "Madame Uppercut",
      "age": 39,
      "secretIdentity": "Jane Wilson",
      "powers": [
        "Million tonne punch",
        "Damage resistance",
        "Superhuman reflexes"
      ],
      aliases: [ "Lady Destructo", "Little Red" ]
    },
    {
      "name": "Eternal Flame",
      "age": 1000000,
      "secretIdentity": "Unknown",
      "powers": [
        "Immortality",
        "Heat Immunity",
        "Inferno",
        "Teleportation",
        "Interdimensional travel"
      ],
      aliases: [ "Power Flame", "Red Roaster" ]
    }
  ]
}

Program Run

I will go into further explanation, but for now let’s just run the program and look at the type of information being pulled out of the JSON above.  Here is the link to the Python script on github.

(jsonpath) $ python SquadTestJSONPath.py 

Squad: Super hero squad

Members:
Molecule Man
Madame Uppercut
Eternal Flame

Count members in list: 3
Count members using len extension: 3

Powers of Madame Uppercut
Madame Uppercut has the power of Million tonne punch
Madame Uppercut has the power of Damage resistance
Madame Uppercut has the power of Superhuman reflexes

Aliases?
Madame Uppercut has aliases: [u'Lady Destructo', u'Little Red']
Eternal Flame has aliases: [u'Power Flame', u'Red Roaster']

Does anyone have an alias that contains 'Red'?
Madame Uppercut has alias that contains 'Red', [u'Lady Destructo', u'Little Red']
Eternal Flame has alias that contains 'Red', [u'Power Flame', u'Red Roaster']

Which specific aliases contain the word 'Red'?
Alias that contains 'Red': Little Red
Alias that contains 'Red': Red Roaster

Any JSON library would allow us to pull this type of information, but hopefully as you how the JSONPath written, it will become evident this could be very useful in larger data structures.

Program setup

At the top, we import ‘jsonpath’ from the basic jsonpath_rw. And then ‘parse’ from the jsonpath_rw_ext, which gives us the extensions in functionality (filters, etc.)

from jsonpath_rw import jsonpath
from jsonpath_rw_ext import parse

# for calling extended methods
import jsonpath_rw_ext as jp

Read JSON from disk

# read file from disk
json_file = open("squad.json")
json_data=json.load(json_file)

Squad name, simple top level element

For a very simple query, we just ask for a simple top level element value.

print("Squad: {}".format( jp.match1("squadName",json_data) ) )

Members by name, element in each array item

for name in jp.match("$.members[*].name",json_data):
print(" {}".format(name))

Member count, len extension on array

# use extensions to provide direct count of number of members in array
print("Count members using len extension: {}".format( jp.match1("$.members.`len`",json_data ) ))

Powers of particular member, embedded array

lookFor="Madame Uppercut"
print("\nPowers of {}".format(lookFor))
powers = jp.match1("members[?name='" + lookFor + "'].powers",json_data)
for power in powers:
  print(" {} has the power of {}".format(lookFor,power))

Lookup array element where embedded structure has matching word

# find only array items that have nested element
print("\nDoes anyone have an alias that contains 'Red'?")
memberHasAliases=jp.match("members[?(aliases[*]~'.*Red.*')]",json_data)
for member in memberHasAliases:
  print("{} has alias that contains 'Red', {}".format( member['name'],member['aliases'] ))

Find nested array items that contain word

print("\nWhich specific aliases contain the word 'Red'?")
for thisalias in jp.match("members[*].aliases[?(@~'.*Red.*')]",json_data):
  print(" Alias that contains 'Red': {}".format( thisalias ))

REFERENCES

jsonpath-rw github

jsonpath-rw-ext project

jsonpath-rw-ext docs

Borrowed super hero squad json from Mozilla tutorial