Monday, April 2, 2012

Python: Regular Expression 101 Example Code

Reference: http://stackoverflow.com/q/9980381/1276534
Authors: Rajeev, George

In computer science theory class, we learned about regular expressions, but at first it was unclear what exactly they can do. Today I would like to introduce data validation as an example that uses the concept. Python, like other languages (so I have heard), has an implementation of regular expressions. It comes standard with Python too, see: http://docs.python.org/library/re.html

For example, say you would like to ask the user for a telephone number in the format 917-222-1234. If the input is not in the form XXX-XXX-XXXX, the program rejects it and asks the user again. Let's take a look at the sample code.
import re

while True:
    # Get the user's input into the string
    myString = input('Enter your telephone number: ')
    
    # Match it against the regular expression
    # isGoodTelephone is a match object if it matches, otherwise None
    isGoodTelephone = re.match(r'^[0-9]{3}-[0-9]{3}-[0-9]{4}$', myString)
    
    if (isGoodTelephone):
        print('Great! Got your phone number into the system')
        print('Entry:', myString)
    else:
        print('Not in the correct format. Ex: xxx-xxx-xxxx')
    print()

Output of csci133rep1.py:
Enter your telephone number: 917-123-1234
Great! Got your phone number into the system
Entry: 917-123-1234

Enter your telephone number: 9171231234
Not in the correct format. Ex: xxx-xxx-xxxx
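
The loop above keeps prompting even after a valid number has been entered. If you would rather stop as soon as a good number arrives, here is a minimal sketch of one way to do it. The names PHONE_PATTERN, ask_for_phone and read_line are made up for this example, and it assumes Python 3.4 or newer, where fullmatch is available.

```python
import re

# Compiled once and reused. fullmatch requires the whole string to match,
# so the ^ and $ anchors are not needed in this variant.
PHONE_PATTERN = re.compile(r'[0-9]{3}-[0-9]{3}-[0-9]{4}')

def ask_for_phone(read_line=input):
    """Keep prompting until the entry looks like xxx-xxx-xxxx, then return it.

    read_line is a hypothetical hook: it defaults to the built-in input,
    but any function that takes a prompt and returns a string will do.
    """
    while True:
        entry = read_line('Enter your telephone number: ')
        if PHONE_PATTERN.fullmatch(entry):
            return entry
        print('Not in the correct format. Ex: xxx-xxx-xxxx')
```

Calling ask_for_phone() with no arguments reads from the keyboard just like the original loop. Passing in your own read_line function makes it possible to try the loop without typing anything.
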
Actually, the basics of regular expressions are not too hard to learn. Take a look at the breakdown below and you will be able to figure out how to use them with no problem; it did not need many comments to be understandable. There are many more ways to use regular expressions than just telephone numbers, though.
^[0-9]{3}-[0-9]{3}-[0-9]{4}$
^       # mark the start of the telephone string
[0-9]   # any one of the digits 0123456789
{3}     # match it exactly three times, no more, no less
-       # a hyphen symbol
[0-9]   # any one digit between 0 and 9
{3}     # exactly three copies
-       # another hyphen symbol
[0-9]   # any one digit between 0 and 9
{4}     # exactly four times
$       # mark the end of the telephone string
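
The commented breakdown can also live inside the program itself. Python's re module has a re.VERBOSE flag that ignores whitespace in the pattern and treats everything after a # as a comment. This is only a sketch: the names PHONE_VERBOSE and is_phone are made up for this example.

```python
import re

# Same pattern as before, written out with re.VERBOSE so that
# each piece carries its own comment.
PHONE_VERBOSE = re.compile(r"""
    ^          # mark the start of the telephone string
    [0-9]{3}   # exactly three digits
    -          # a hyphen symbol
    [0-9]{3}   # exactly three more digits
    -          # another hyphen symbol
    [0-9]{4}   # exactly four digits
    $          # mark the end of the telephone string
""", re.VERBOSE)

def is_phone(text):
    """Return True if text is in the form xxx-xxx-xxxx."""
    return PHONE_VERBOSE.match(text) is not None
```

For example, is_phone('917-222-1234') accepts the number, while is_phone('9171231234') rejects it because the hyphens are missing.
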
