There are many tools that utilize json, and when it is relatively simple you can use standard modules or even custom coding to pull out the desired portions. But once these data structures reach a certain level of complexity you really should consider a Python module that implements JSONPath (analogous to xPath for XML).
The ability to query JSON using JSONPath can be done with Python modules such as jsonpath_rw and jsonpath_rw_ext which gives us the ability to easily specify objects deep in the graph.
Prerequisite modules
You will need to use pip to install the ‘jsonpath_rw’ and ‘jsonpath_rw_ext’ modules.
pip install jsonpath-rw jsonpath-rw-ext --user
Or if you want to install these libraries in a virtualenv.
virtualenv jsonpath source jsonpath/bin/activate pip install jsonpath-rw jsonpath-rw-ext
Example JSON
I am going to borrow the JSON used in the Mozilla tutorials, that defines a superhero squad and a list of heroes on that team. Here is a github link to this json.
{ "squadName": "Super hero squad", "homeTown": "Metro City", "formed": 2016, "secretBase": "Super tower", "active": true, "members": [ { "name": "Molecule Man", "age": 29, "secretIdentity": "Dan Jukes", "powers": [ "Radiation resistance", "Turning tiny", "Radiation blast" ] }, { "name": "Madame Uppercut", "age": 39, "secretIdentity": "Jane Wilson", "powers": [ "Million tonne punch", "Damage resistance", "Superhuman reflexes" ], aliases: [ "Lady Destructo", "Little Red" ] }, { "name": "Eternal Flame", "age": 1000000, "secretIdentity": "Unknown", "powers": [ "Immortality", "Heat Immunity", "Inferno", "Teleportation", "Interdimensional travel" ], aliases: [ "Power Flame", "Red Roaster" ] } ] }
Program Run
I will go into further explanation, but for now let’s just run the program and look at the type of information being pulled out of the JSON above. Here is the link to the Python script on github.
(jsonpath) $ python SquadTestJSONPath.py Squad: Super hero squad Members: Molecule Man Madame Uppercut Eternal Flame Count members in list: 3 Count members using len extension: 3 Powers of Madame Uppercut Madame Uppercut has the power of Million tonne punch Madame Uppercut has the power of Damage resistance Madame Uppercut has the power of Superhuman reflexes Aliases? Madame Uppercut has aliases: [u'Lady Destructo', u'Little Red'] Eternal Flame has aliases: [u'Power Flame', u'Red Roaster'] Does anyone have an alias that contains 'Red'? Madame Uppercut has alias that contains 'Red', [u'Lady Destructo', u'Little Red'] Eternal Flame has alias that contains 'Red', [u'Power Flame', u'Red Roaster'] Which specific aliases contain the word 'Red'? Alias that contains 'Red': Little Red Alias that contains 'Red': Red Roaster
Any JSON library would allow us to pull this type of information, but hopefully as you how the JSONPath written, it will become evident this could be very useful in larger data structures.
Program setup
At the top, we import ‘jsonpath’ from the basic jsonpath_rw. And then ‘parse’ from the jsonpath_rw_ext, which gives us the extensions in functionality (filters, etc.)
from jsonpath_rw import jsonpath from jsonpath_rw_ext import parse # for calling extended methods import jsonpath_rw_ext as jp
Read JSON from disk
# read file from disk json_file = open("squad.json") json_data=json.load(json_file)
Squad name, simple top level element
For a very simple query, we just ask for a simple top level element value.
print("Squad: {}".format( jp.match1("squadName",json_data) ) )
Members by name, element in each array item
for name in jp.match("$.members[*].name",json_data): print(" {}".format(name))
Member count, len extension on array
# use extensions to provide direct count of number of members in array print("Count members using len extension: {}".format( jp.match1("$.members.`len`",json_data ) ))
Powers of particular member, embedded array
lookFor="Madame Uppercut" print("\nPowers of {}".format(lookFor)) powers = jp.match1("members[?name='" + lookFor + "'].powers",json_data) for power in powers: print(" {} has the power of {}".format(lookFor,power))
Lookup array element where embedded structure has matching word
# find only array items that have nested element print("\nDoes anyone have an alias that contains 'Red'?") memberHasAliases=jp.match("members[?(aliases[*]~'.*Red.*')]",json_data) for member in memberHasAliases: print("{} has alias that contains 'Red', {}".format( member['name'],member['aliases'] ))
Find nested array items that contain word
print("\nWhich specific aliases contain the word 'Red'?") for thisalias in jp.match("members[*].aliases[?(@~'.*Red.*')]",json_data): print(" Alias that contains 'Red': {}".format( thisalias ))
REFERENCES
Borrowed super hero squad json from Mozilla tutorial