Python: exploring the use of startswith against a list: tuple, regex, list comprehension, lambda

As with any programming language, Python has a multitude of ways to accomplish the same task.  In this article, I will explore the idea of taking a string and checking if it ‘startswith’ any of the strings from a predetermined list.

As context for the example code, RFC 1918 sets out several IPv4 ranges that can be considered part of a private IP space: 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16.  Given a single user-provided IPv4 address, we will implement a rudimentary ‘startswith’ check of whether this address matches any prefix from the list of private ranges.  I will implement this logic using:

  1. tuple support of str.startswith
  2. regex
  3. list comprehension with filter
  4. lambda filter

Full code is available on GitHub: experiment_startswith_list.py.

Defined variables

Each of the implementations below will use the following list of IP prefixes to determine if an IPv4 address is private.  The ‘userip’ variable holds the single IPv4 address that we want to classify as private or not.

# simplified list of string prefixes for the private ranges defined in RFC1918
# (note: '172.16.' covers only 172.16.x.x, not the full 172.16.0.0/12 range)
PRIVATE_IP_LIST = ['10.','172.16.','192.168.']

# IP address that we are testing against
userip="192.168.1.1"
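The prefix list is a deliberate simplification: a string prefix such as '172.16.' cannot express the full 172.16.0.0/12 range, which runs from 172.16.0.0 through 172.31.255.255.  For comparison only, here is a minimal sketch of an exact range check using the standard library ipaddress module.  The names RFC1918_NETWORKS and is_rfc1918 are hypothetical and are not part of the article's code.

```python
import ipaddress

# the three private ranges named in RFC 1918
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def is_rfc1918(address):
    """Hypothetical helper: True if address falls inside an RFC 1918 range."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in RFC1918_NETWORKS)

print(is_rfc1918("192.168.1.1"))           # True
print(is_rfc1918("172.20.5.9"))            # True, inside 172.16.0.0/12
print("172.20.5.9".startswith("172.16."))  # False, the string prefix misses it
```

The rest of this article stays with the string prefixes, since the point is to compare ways of applying ‘startswith’ against a list.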

#1 Using tuple support of str.startswith

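The first implementation relies on str.startswith accepting a tuple of prefixes, returning True if the string starts with any one of them.  The list is converted with tuple() because str.startswith raises a TypeError when given a list.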

if userip.startswith(tuple(PRIVATE_IP_LIST)):
  print("YES '{}' is a private IPv4 address".format(userip))
else:
  print("NO '{}' is not a private IPv4 address".format(userip))
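As a small sketch of the same idea, the tuple conversion can be done once and wrapped in a function.  The names PRIVATE_IP_PREFIXES and starts_with_private_prefix are hypothetical, introduced only for illustration.  The last few lines show what happens when the list is passed without converting it.

```python
PRIVATE_IP_LIST = ['10.', '172.16.', '192.168.']
# convert once, since str.startswith accepts a str or a tuple of str
PRIVATE_IP_PREFIXES = tuple(PRIVATE_IP_LIST)

def starts_with_private_prefix(address):
    """Hypothetical helper wrapping the tuple form of str.startswith."""
    return address.startswith(PRIVATE_IP_PREFIXES)

for candidate in ["192.168.1.1", "10.20.30.40", "8.8.8.8"]:
    print(candidate, starts_with_private_prefix(candidate))

# passing the list itself is rejected by str.startswith
try:
    "192.168.1.1".startswith(PRIVATE_IP_LIST)
except TypeError as err:
    print("TypeError:", err)
```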

#2 Using regex

The second implementation constructs a regex, then uses ‘re.match’ to determine if there was a match.

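import re

# re.escape makes each "." match a literal dot instead of any character
myregex = "^" + "|^".join(re.escape(prefix) for prefix in PRIVATE_IP_LIST)
testmatch = re.match(myregex,userip)
if testmatch:
  print("YES '{}' is a private IPv4 address".format(userip))
else:
  print("NO '{}' is not a private IPv4 address".format(userip))

The regex is constructed with the syntax “^value1|^value2|^value3”, where the caret anchors the match to the beginning of the string and the pipe sign is an OR so that multiple expressions are tested.  Each prefix is passed through re.escape first, because an unescaped dot matches any character and “10.” would otherwise also match an address such as “100.64.0.1”.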
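One detail worth checking in a pattern built from IP prefixes is the dot, which in a regex matches any character unless it is escaped.  The sketch below compares an unescaped and an escaped pattern against an address outside the RFC 1918 ranges, and also shows that the match object can report which prefix matched.  The variable names are illustrative only.

```python
import re

PRIVATE_IP_LIST = ['10.', '172.16.', '192.168.']

unescaped = re.compile("^" + "|^".join(PRIVATE_IP_LIST))
escaped = re.compile("^" + "|^".join(re.escape(p) for p in PRIVATE_IP_LIST))

other_ip = "100.64.0.1"  # not inside any RFC 1918 range
print(bool(unescaped.match(other_ip)))  # True: the "." in "10." matches the second "0"
print(bool(escaped.match(other_ip)))    # False: "\." only matches a literal dot

# group(0) holds the text that matched, which here is the prefix itself
match = escaped.match("192.168.1.1")
if match:
    print(match.group(0))  # 192.168.
```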

#3 Using list comprehension filter

The third implementation uses a list comprehension with an “if” filter to select only the values from the list that match.

cidr_matches = [ cidr for cidr in PRIVATE_IP_LIST if userip.startswith(cidr) ]

if len(cidr_matches)>0:
  print("YES '{}' is a private IPv4 address starting with this range: {}".format(userip,cidr_matches))
else:
  print("NO '{}' is not a private IPv4 address".format(userip))
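When only a yes/no answer is needed, a generator expression passed to any() is a close relative of the list comprehension.  This is a swapped-in variant, not one of the four implementations in this article: it stops at the first matching prefix and returns a bool rather than the list of matches.  The name is_private is hypothetical.

```python
PRIVATE_IP_LIST = ['10.', '172.16.', '192.168.']
userip = "192.168.1.1"

# list comprehension: collects every prefix that matches
cidr_matches = [cidr for cidr in PRIVATE_IP_LIST if userip.startswith(cidr)]

# generator expression with any(): stops at the first match, returns a bool
is_private = any(userip.startswith(cidr) for cidr in PRIVATE_IP_LIST)

print(cidr_matches)  # ['192.168.']
print(is_private)    # True
```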

#4 Using lambda filter

The fourth implementation uses a lambda filter to select only the values from the list that match.

# in Python 3, filter() returns a lazy iterator, so wrap it in list() before calling len()
cidr_matches = list(filter(lambda s: userip.startswith(s),PRIVATE_IP_LIST))

if len(cidr_matches)>0:
  print("YES '{}' is a private IPv4 address starting with this range: {}".format(userip,cidr_matches))
else:
  print("NO '{}' is not a private IPv4 address".format(userip))
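A point to be aware of with filter(): in Python 3 it returns a lazy iterator rather than a list, so len() cannot be called on the result directly.  The sketch below, assuming Python 3, shows the error and the list() conversion that avoids it.  The name lazy_matches is illustrative only.

```python
PRIVATE_IP_LIST = ['10.', '172.16.', '192.168.']
userip = "192.168.1.1"

# filter() alone yields matches on demand and has no length
lazy_matches = filter(lambda s: userip.startswith(s), PRIVATE_IP_LIST)
try:
    len(lazy_matches)
except TypeError as err:
    print("TypeError:", err)

# list() consumes the iterator and gives a real list to measure and print
cidr_matches = list(lazy_matches)
print(cidr_matches)  # ['192.168.']
```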

Discussion

I won’t tell you definitively that one of these is better for your situation or scale.  That is up to you to determine.
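If speed matters for your case, one way to find out is to measure.  The following is a rough sketch using the standard library timeit module.  The candidates dictionary, the iteration count, and the precompiled regex are my assumptions for illustration, and the numbers will vary by machine and Python version, so no results are shown.

```python
import re
import timeit

PRIVATE_IP_LIST = ['10.', '172.16.', '192.168.']
PRIVATE_IP_TUPLE = tuple(PRIVATE_IP_LIST)
PRIVATE_IP_REGEX = re.compile("^" + "|^".join(re.escape(p) for p in PRIVATE_IP_LIST))
userip = "192.168.1.1"

# one zero-argument callable per implementation
candidates = {
    "tuple startswith": lambda: userip.startswith(PRIVATE_IP_TUPLE),
    "regex": lambda: PRIVATE_IP_REGEX.match(userip) is not None,
    "list comprehension": lambda: [c for c in PRIVATE_IP_LIST if userip.startswith(c)],
    "lambda filter": lambda: list(filter(lambda s: userip.startswith(s), PRIVATE_IP_LIST)),
}

# time each callable over the same number of runs
timings = {}
for name, func in candidates.items():
    timings[name] = timeit.timeit(func, number=20000)
    print("{:20s} {:.4f}s".format(name, timings[name]))
```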

I think using the built-in tuple support (#1) is the easiest to read, and the easiest for programmers of any level to maintain.  However, it does not tell you which items from the list were matched.  We are also lucky that str.startswith() has tuple support for this very contrived example; if my example instead used str.find() for “contains” logic, we would not have built-in tuple support.

The regex (#2) has a lot of flexibility beyond just looking at ‘startswith’.  For example, you could use the dollar sign to match the end, or use wildcards and character classes to find a match.  However, the code shown only tests whether there was a match; finding out which item from the list matched means inspecting the match object (for example, testmatch.group(0) returns the matched text).

The list comprehension with filter (#3) and lambda filter (#4) have the benefit of providing the exact items from the list that matched.  In general, I favor list comprehensions for readability and maintenance, but that is personal preference.

The list comprehension and lambda both have the additional benefit of test flexibility.  Although we are using str.startswith() as an example here, you could use any function or test expression.  For example, doing “contains” logic with str.find() would be a simple replacement of the test expression.

cidr_matches = [ cidr for cidr in PRIVATE_IP_LIST if userip.find(cidr)!=-1 ]
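As a sketch of how the swapped test expression behaves, the example below runs “contains” logic against an address where the prefix appears in the middle.  It also shows the “in” operator, which is the usual Python idiom for a substring test and gives the same result as comparing str.find() against -1.  The variable names are illustrative only.

```python
PRIVATE_IP_LIST = ['10.', '172.16.', '192.168.']
userip = "1.10.2.3"

# "contains" logic: str.find() returns -1 when the substring is absent
with_find = [cidr for cidr in PRIVATE_IP_LIST if userip.find(cidr) != -1]

# the "in" operator expresses the same substring test
with_in = [cidr for cidr in PRIVATE_IP_LIST if cidr in userip]

# the original 'startswith' test, for comparison
with_startswith = [cidr for cidr in PRIVATE_IP_LIST if userip.startswith(cidr)]

print(with_find)        # ['10.']
print(with_in)          # ['10.']
print(with_startswith)  # []
```

Note that “contains” is a different test from ‘startswith’: here it matches “10.” inside “1.10.2.3”, so it would not be a suitable replacement for the private IP check itself.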

