George Chan's Blog

Wednesday, March 28, 2012

csci133c7.py

also known as csci133cleanup.py

In this tutorial we will write a program that cleans up a string, which is one of the most classic programs. Almost every student is given a novel or some other input text file and asked to do something with the data. So the first step is to open and load the text file and collect its English letters into a new string. This tutorial looks long because I included the full source code of every version, but the changes between versions are minor. Read on!

# Version 1 of csci133cleanup.py
# Full implementation of cleanup
wordList = [] # Create a list to store our words
abc = 'abcdefghijklmnopqrstuvwxyz'

with open('novel.text') as book:
    for line in book:
        cleanline = ''
        for character in line.lower():
            if character in abc:
                cleanline += character
            else:
                # Important! We have to append a space!
                cleanline += ' '
        for word in cleanline.split():
            # Only add words we have not seen yet
            if word not in wordList:
                wordList.append(word)

The first version only cleans up the text, so there is nothing too special about it. But notice that in the else branch we append a space. Why? Take a moment to think about it, or try to clean 'Doctor--John' on a piece of paper.

Answer: we need the space to separate possible words. Take the string Doctor--John: if we did not append a space for each dash, we would get 'doctorjohn' as a single word. Since we want every single word in the file, we want to keep the words separate instead of gluing them together.

# without appending a space: Doctor--John results in doctorjohn
# with a space appended: Doctor--John results in doctor  john (YES!)
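To see the difference in action, here is a minimal sketch that wraps the cleaning loop in a function. The name clean_line and its keep_space flag are my own, for illustration only; the flag switches between the two behaviours in the comments above.

```python
ABC = 'abcdefghijklmnopqrstuvwxyz'

def clean_line(line, keep_space=True):
    """Lower-case the line and keep only the letters a-z.

    If keep_space is True, every other character becomes a space so words
    stay separated; if False, non-letters are simply dropped.
    """
    cleanline = ''
    for character in line.lower():
        if character in ABC:
            cleanline += character
        elif keep_space:
            cleanline += ' '
    return cleanline

print(clean_line('Doctor--John', keep_space=False).split())  # ['doctorjohn']
print(clean_line('Doctor--John').split())                    # ['doctor', 'john']
```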

Of course, this is not without its problems. For example, possessives such as John's leave behind a lot of stray 's' words, and common words show up over and over, so we check whether a word is already in the list (the if word not in wordList line); if it is, we don't append it again. Depending on your needs, you might also want to record the line number. See the next example.

# Version 2 of csci133cleanup.py
# Insert the line numbers into the dictionary
wordList = {} # Create a dictionary to store them
abc = 'abcdefghijklmnopqrstuvwxyz'

with open('novel.text') as book:
    lineNumber = 1 # Start at line 1 (set once, before the loop)
    for line in book:
        cleanline = ''
        for character in line.lower():
            if character in abc:
                cleanline += character
            else:
                # Important! We have to append a space!
                cleanline += ' '
        for word in cleanline.split():
            if word in wordList:
                wordList[word].append(lineNumber)
            else:
                # Store the value as a list that contains 1 item
                wordList[word] = [lineNumber]
        lineNumber += 1

Take a moment to read and compare the code. The first difference is the wordList line: we use a dictionary instead of a list, because checking whether a key is already in a dictionary is a fast built-in lookup, while checking a list means going through its items one by one. The other difference is that we now append the line number to a list of line numbers for each word. There is an interesting part to it; look at this line:

wordList[word] = [lineNumber]

Notice that we cannot use wordList[word] = lineNumber here. This line creates the first value for a new key, so we create that value as a list containing one integer. I was not aware of this when I was learning Python and kept running into errors, because I stored a single integer, and when I tried to append to that integer it did not work: integers have no append method.
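Here is a small sketch of that error, plus two standard-library shortcuts that create the list for you: dict.setdefault and collections.defaultdict. Using them is my suggestion, not what the course code does, and the words and line numbers are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample data: (word, line number) pairs
pairs = [('apple', 2), ('banana', 5), ('apple', 25)]

# What goes wrong: the first value is a plain int, which has no append()
broken = {'apple': 2}
try:
    broken['apple'].append(25)
except AttributeError as error:
    print('Error:', error)  # Error: 'int' object has no attribute 'append'

# Option 1: setdefault() inserts an empty list the first time a key is seen
wordLines = {}
for word, lineNumber in pairs:
    wordLines.setdefault(word, []).append(lineNumber)

# Option 2: defaultdict(list) creates the empty list automatically
wordLines2 = defaultdict(list)
for word, lineNumber in pairs:
    wordLines2[word].append(lineNumber)

print(wordLines)         # {'apple': [2, 25], 'banana': [5]}
print(dict(wordLines2))  # {'apple': [2, 25], 'banana': [5]}
```

Both options do the same job as the if/else in Version 2; the explicit if/else is still the clearest way to learn what is going on.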

In the last version we want to search: we look words up in the dictionary we just created. Take a look at the last few lines.

# Version 3 of csci133cleanup.py
# This version includes parts 1 - 3
wordList = {}
abc = 'abcdefghijklmnopqrstuvwxyz'

with open('novel.text') as book:
    lineNumber = 1
    for line in book:
        cleanline = ''
        for character in line.lower():
            if character in abc:
                cleanline += character
            else:
                cleanline += ' '
        for word in cleanline.split():
            if word in wordList:
                # do something, such as append line number
                wordList[word].append(lineNumber)
            else:
                wordList[word] = [lineNumber]
        lineNumber += 1

while True:
    # Lower-case the input so it matches the cleaned-up words
    word = input('Enter a word here: ').lower()
    if word in wordList:
        print('Found on lines:', wordList[word])
    else:
        print('Not found.')

# After loading, wordList looks something like this:
# {'apple': [2, 25, 55, 100], 'banana': [5, 10, 36, 90], ...}

This is the first time we see a while statement in Python. The structure of a while loop is simple: while the condition is true, it executes all the code inside it once, then checks the condition again. If it is still true, it runs the code again; if not, the loop exits and Python goes on to the next statement. Since we have True as the condition, this loop will run forever, until we stop it with a keyboard interrupt.

Keyboard interrupt hot key: Control + C
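If you would rather not rely on Ctrl+C, a common pattern is to break out of the while True loop on a sentinel value. Below is a minimal sketch, assuming a blank line means "quit"; run_search and its ask parameter are hypothetical names I made up so the loop can be tried without a keyboard.

```python
def run_search(wordList, ask=input):
    """Ask for words until the user enters a blank line.

    ask defaults to the built-in input(); it is a parameter only so the loop
    can be tried without typing (pass any function that takes a prompt).
    """
    results = []
    while True:
        word = ask('Enter a word here (blank line to quit): ').strip().lower()
        if word == '':
            break  # leave the while True loop without Ctrl+C
        if word in wordList:
            print('Found on lines:', wordList[word])
            results.append(wordList[word])
        else:
            print('Not found.')
            results.append(None)
    return results

# Usage with canned answers instead of the keyboard
answers = iter(['Apple', 'kiwi', ''])
found = run_search({'apple': [2, 25, 55, 100]}, ask=lambda prompt: next(answers))
print(found)  # [[2, 25, 55, 100], None]
```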

Tuesday, March 27, 2012

csci133ifelif.py

Reference: http://stackoverflow.com/questions/7052393/python-elif-or-new-if
Today, while reading the Python exercises, I came across an exercise program (in chapter 10) that uses elif. For a second I was not sure what it meant, because other languages spell it differently. But when I read the source file closely, it looked like it was replacing some of the if/else statements. When I finally looked it up online, I found out it is a little more than just if/else.

def foo(var):
    # Check if var is 5
    if var == 5:
        var = 6
    elif var == 6:
        var = 8
    else:
        var = 10
    return var

def bar(var):
    if var == 5:
        var = 6
    if var == 6:
        var = 8
    if var not in (5, 6):
        var = 10
    return var

print(foo(5)) # 6
print(bar(5)) # 10: each if sees the value the previous one changed

Look at foo(5): var is 5, so the first branch runs and the elif and else branches are skipped. elif is short for "else if": it chains conditions so that only the first one that matches runs. In bar, every if is checked separately, and each one sees the value changed by the one before it, which is why bar(5) returns 10 instead of 6. elif gives you a cleaner-looking program, because you avoid nested else/if blocks and keep the indentation level small, and it can be faster than a sequence of separate if statements, because Python stops checking as soon as one condition matches. Tip: put the most common condition at the top, so most values are settled after the first check.

For example, if you want to check whether a string is an English word, you could first check isalpha() and only then start cleaning up the letters. That way, you can stop as soon as you know the string contains non-letter characters.
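The tip above can be sketched as an if/elif chain. The categories and the classify name are my own, for illustration; only the first matching branch runs, and the most common case is tested first.

```python
def classify(text):
    """Sort a string into a rough category with an if/elif/else chain."""
    if text.isalpha():      # most common case first: an ordinary word
        return 'word'
    elif text.isdigit():
        return 'number'
    elif text == '':
        return 'empty'
    else:                   # anything else still has symbols in it
        return 'needs cleaning'

print(classify('Queen'))    # word
print(classify('1912'))     # number
print(classify("John's"))   # needs cleaning
```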

Friday, March 23, 2012

csci133class.py

The Tkinter module provides many data types, such as Frame, Label, and Button. They come with their own methods: some are specific, like get for Entry, and some are shared by all widgets, like pack and after. Wouldn't it be nice if we could do the same for the data types (classes) we create?

For our ice cream store, we might want to create membership accounts (objects):

standardMember = Account('George Chan')
standardMember.deposit(100)
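The Account class above is not defined anywhere in this post, so here is a minimal sketch of what it could look like, assuming an account only needs a name and a balance. The attribute names and the check for positive deposits are my own choices.

```python
class Account:
    """A hypothetical membership account that tracks a name and a balance."""

    def __init__(self, name):
        self.name = name
        self.balance = 0

    def deposit(self, amount):
        """Add money to the account and return the new balance."""
        if amount <= 0:
            raise ValueError('deposit must be positive')
        self.balance += amount
        return self.balance

standardMember = Account('George Chan')
standardMember.deposit(100)
print(standardMember.name, standardMember.balance)  # George Chan 100
```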

Or, for a worker timesheet program, we could create worker objects that contain objects of other data types. Here is a simple class for our ice cream members:

class iceCreamMember:
    def __init__(self, name, age):
        self.myName = name
        self.myAge = age
    def getAge(self):
        return self.myAge
    def getName(self):
        return self.myName

member1 = iceCreamMember('George', 23)
member2 = iceCreamMember('Gerry', 29)

print(member1.getAge())
print(member2.getName())

In Python, when we want to create a new data type (a new class), we use a class statement. The keyword class tells Python that we are creating a new data type. Note that it is good style to start a class name with an upper-case letter (IceCreamMember would be the conventional spelling).

When we want to create an instance of iceCreamMember, an individual object of this new type, we call the class name as if it were a function.

member1 = iceCreamMember('George', 23)

This creates a new iceCreamMember object, and member1 refers to it. Remember how a constructor is called when we create an instance of a class in C++? In Python, the __init__ method is called automatically right after the object is created.

def __init__(self, name, age):

Look at this line: __init__ takes 3 parameters, but in member1 = iceCreamMember('George', 23) we only pass 2. Why? Because Python automatically passes the newly created instance as the first argument. You could name that parameter anything, but since it always refers to the object itself, it makes sense to call it self. The order of ('George', 23) matters: name gets 'George' (a string) and age gets 23 (an integer).

Unlike C++, a Python class does not declare member variables per se. Instances have attributes, which you create by assigning to them with this syntax:

self.myName = name
self.myAge = age

To create a method for our user-defined class, we use def as usual, except that the first parameter is self. Notice that when we call .getName(), we don't pass it anything, because the self argument is supplied automatically.

But if we try print(member1), something weird happens. Try it! It teaches us another fact about how Python works.

>>> 
<__main__.iceCreamMember object at 0x0000000002EE1E48>

We get this memory-address-like output because we have not yet "taught" Python how we want this object printed, so it falls back to a default that only shows the class name and where the object lives. Let's add another method to our class:

def __str__(self):
    return self.myName + ', and age ' + str(self.myAge)

Now when we ask Python to print member1, it calls __str__ and knows what to print.
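Putting the pieces together, the whole class with __str__ added looks like this (same names as above); print(member1) now shows our text instead of the memory address.

```python
class iceCreamMember:
    def __init__(self, name, age):
        self.myName = name
        self.myAge = age

    def getAge(self):
        return self.myAge

    def getName(self):
        return self.myName

    def __str__(self):
        # print() and str() call this method to get the text to show
        return self.myName + ', and age ' + str(self.myAge)

member1 = iceCreamMember('George', 23)
print(member1)  # George, and age 23
```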

*Important*: self vs. the parent class name

class goldMember(iceCreamMember):
    def __init__(self, name, age, discount):
        # Let the parent class set up myName and myAge
        iceCreamMember.__init__(self, name, age)
        self.myDiscount = discount
    def __str__(self):
        return "{0}% discount for this member".format(self.myDiscount)

We can read this new class as: goldMember is a kind of iceCreamMember. goldMember inherits all the methods iceCreamMember has, so getName and getAge are provided to goldMember automatically.

iceCreamMember.__init__(self, name, age)

*Important*: We must call the parent's __init__ through the class name and pass self explicitly. If we wrote self.__init__(...) instead, we would be calling goldMember.__init__, which is the very method we are defining right now, so it would just call itself over and over instead of running the parent's setup.
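In Python 3 you can also write the parent call with super(), which looks up the parent class for you. A minimal sketch, assuming goldMember takes the member's name and age along with the discount, since the parent's __init__ needs them:

```python
class iceCreamMember:
    def __init__(self, name, age):
        self.myName = name
        self.myAge = age

    def getName(self):
        return self.myName


class goldMember(iceCreamMember):
    def __init__(self, name, age, discount):
        # super() finds the parent class for us; this does the same job as
        # iceCreamMember.__init__(self, name, age)
        super().__init__(name, age)
        self.myDiscount = discount

    def __str__(self):
        return '{0}% discount for {1}'.format(self.myDiscount, self.myName)


gold = goldMember('Gerry', 29, 15)
print(gold)            # 15% discount for Gerry
print(gold.getName())  # Gerry (inherited from iceCreamMember)
```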

Monday, March 19, 2012

csci133number.py

Reference: http://docs.python.org/library/stdtypes.html#numeric-types-int-float-long-complex
Please, please, please click on it and read it if you want the full details. The reference page tells you everything you need to know about them!

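There are four numeric types in Python 2: plain integers, floats, long integers, and complex numbers. (In Python 3, int and long were merged into a single int type with unlimited precision.)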

Integer - implemented using long in C, which gives at least 32 bits of precision. Integers can be positive or negative, but they are whole numbers (1, 2, 3, 4, 5, 0, -1, -2, ...).
Floating point - implemented using double in C. When you need a decimal point, you can use floating point, such as 1234.5 + 1234.5.
Long integers - have unlimited precision. They are useful if you are calculating the amount of debt the United States is under. (wink wink)
Complex - have a real and an imaginary part. I think they are very useful, but I am not familiar with them yet; I shall come back to them soon.

If floating point is not enough for your needs, you can also use the fractions and decimal modules. Python is very nice because it supports mixed arithmetic: when you combine two different numeric types, the number with the narrower type is converted to the wider type. For example: integer * floating point = floating point.

The ordering, from widest to narrowest, is: Complex > Floating point > Long integer > Integer
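Here is a quick check of the widening rule in Python 3 (where int and long are one type), plus the fractions and decimal modules mentioned above. The numbers are arbitrary examples.

```python
from decimal import Decimal
from fractions import Fraction

# The narrower int is widened to float, so the result is a float
result = 3 * 2.5
print(result, type(result))     # 7.5 <class 'float'>

# float is narrower than complex, so this result is complex
print(2.0 + 1j)                 # (2+1j)

# Floats are binary approximations, so 0.1 + 0.2 is not exactly 0.3 ...
print(0.1 + 0.2 == 0.3)         # False
# ... while Decimal and Fraction keep exact values
print(Decimal('0.1') + Decimal('0.2'))   # 0.3
print(Fraction(1, 3) + Fraction(1, 6))   # 1/2
```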

# Get the absolute value 
abs(-5)

# Convert something into integer
myNumberInString = '100'
anotherNumber = 50
result = 0
result += int(myNumberInString) + anotherNumber

# Raise 2 to the power of 3 (same as 2 ** 3), which gives 8
# Careful: in Python, 2 ^ 3 is bitwise XOR, not a power
pow(2, 3)

Python also has modules, which are packages of tools, somewhat like header files in C++. Remember the <cmath> math library in C++? Python has a similar module, also called math. We always have to import a module before we can use its tools. There are many of them; feel free to experiment, they provide a lot of useful functions.

# Sample code for the math module
import math
print(math.pi)
# Output: 3.141592653589793

csci133Buildin.py

Everything in Python is an object, and Python's built-in types are objects too. When you use the built-in types, you don't have to worry about things like memory allocation or implementing insert, search, sort, print, and get routines yourself, so we can start working on our own code immediately. (In C++ the functions that belong to a class are usually called member functions, in Java they are called methods, and in Python they are called methods too.) Here is a list of reasons why you should use built-in types as much as you can.

They are easy - if you need a simple program, they are great for fast development: easy to write, easy to debug, and easy for others to read and understand. You can write a program to calculate expenses in about 5 minutes using the built-in types.
They are useful - you can use them to build more complex objects. They are like Lego: you can stack them to form different tools.
Efficient - if you want performance, look no further: they are carefully optimized by the core developers and keep improving with each release. You are unlikely to write more efficient routines yourself, except perhaps for highly specialized input.
Always here - every Python installation comes with them, so you don't have to download anything extra, and they are standardized: everyone gets the same versions when they download Python.

Name	Example	Sample Code
Number	12345, 1234.5	csci133number.py
String	'Hello Python'	csci133string.py
List	[1, 2, 3], ['a', 'b', 'c']	csci133list.py
Dictionary	{'username': 'password'}	csci133dictionary.py
Boolean	True, False	csci133boolean.py

Here is an index of some of the basic built-in types offered by Python. Please note there are many more that I didn't have a chance to cover, although it is my goal to write about all the built-in data types :) . Python is dynamically typed (you don't declare a variable's type ahead of time; a name can refer to a value of any type) and strongly typed (Python will not silently treat a value of one type as another; for example, 'a' + 1 raises a TypeError instead of guessing). This is something to keep in mind when you are learning other languages.
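A short illustration of both terms (the values are arbitrary). The exact wording of the TypeError message differs between Python versions, so it is not shown here.

```python
# Dynamic typing: a name can refer to values of different types over time
value = 42
print(type(value).__name__)    # int
value = 'forty-two'
print(type(value).__name__)    # str

# Strong typing: Python will not silently mix a str and an int
try:
    'Age: ' + 23
except TypeError as error:
    print('TypeError:', error)

# You have to convert explicitly instead
label = 'Age: ' + str(23)
print(label)                   # Age: 23
```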

Friday, March 16, 2012

csci133p6.py

This is a first taste of graphical user interfaces, for me too. When a program runs with a GUI, it just feels so different: you can change options with a click of a button, and there are colors! You can change the background color easily, and even the very simple programs you write become much more interesting. But planning is now more important than ever, because you need to figure out what you want displayed on the screen before you start.

The toolkit we use in this series of tutorials is Tkinter, Python's standard interface to the Tk GUI toolkit.

from Tkinter import *  # Python 2; in Python 3 this is: from tkinter import *

root = Tk()
# Change the background color to light green
root['bg'] = 'light green'

# Create a title widget in the frame of root
simpleTitle = Label(root)
simpleTitle['text'] = 'Hello Tkinter!'
simpleTitle.pack()

mainloop()

The program above shows a simple window with the string 'Hello Tkinter!'. Try making the window bigger!

Do you see the green color? That's the light green background we set. It is hard to see when the window first opens, because the window shrinks to fit the label. Resize the window by moving your mouse pointer to its border and dragging it larger, and there you have it: the background color. You can also change the color of the text label if you want to.

Look at line 1:

from Tkinter import *

We have to import the Tkinter module: although Python ships with it, we still have to tell Python that we want to use it.

Next, we create a Tk() object named root, which acts as the base window. We can change its background color by assigning 'light green' to its ['bg'] option. If you want, you can change it to 'light pink' as well.

Look at line 8:

simpleTitle = Label(root)
simpleTitle['text'] = 'Hello Tkinter!'
simpleTitle.pack()

Here we create a Label named simpleTitle inside root's window; that's why we pass root to Label(). If there were another frame, we would pass that frame's name instead, for example myLabel = Label(anotherFrame). Just like the ['bg'] option, a Label has a ['text'] option, and we set it the same way we set the background color, by assigning a string to it. Finally, we call .pack() to ask Tk to place the widget in the window and draw it. You only need to pack a widget once; later changes to its options, such as a new ['text'], show up automatically. Notice that pack() is a function, and it actually takes parameters!

Reference: http://docs.python.org/library/tkinter.html#packer-options

Here is the list of possible options:

expand: if set to true, the widget is given extra space as the window grows. Ex: foo.pack(expand=YES)

fill: whether the widget stretches to fill the space it is given. The options are X (horizontally), Y (vertically), and BOTH.
Ex: foo.pack(fill=BOTH)

side: which side of the parent the widget is packed against: TOP (default), BOTTOM, LEFT, or RIGHT.
Ex: foo.pack(side=LEFT)

anchor: where the widget sits inside its space, named like the points of a compass: "n", "ne", "e", "se", "s", "sw", "w", "nw", plus "center" (the default).
Ex: foo.pack(anchor=W)

You can combine several options in one call:

foo.pack(expand=YES, side=LEFT, fill=X)

Wednesday, March 14, 2012

csci133c5.py

In Python, there is an extremely useful data type called a dictionary. A dictionary is a collection of (key, value) pairs that you look up by key rather than by position (dictionaries were unordered when this was written; since Python 3.7 they remember insertion order). Notice that each key maps to exactly one value, so if you add another value with the same key, it replaces the value originally stored there. That makes sense: if several values were mapped to the same key, you would have no idea which one to return.

Take a look at the example for the dictionary below

passwords = {'george':'dog', 'gerry':'cat', 'stephen':'chicken'}
name = input('Username: ')
password = input('Password: ')
if name in passwords and passwords[name] == password:
    print('Correct, welcome:', name)
else:
    print('Sorry, bad password.')

Output:

Username: george
Password: dog
Correct, welcome: george

Username: george
Password: cat
Sorry, bad password.

Reference: http://docs.python.org/tutorial/datastructures.html#dictionaries
Keys must be immutable, which is why we cannot use a list as a key: a list can be modified in place with index assignment, slice assignment, or methods like append(). You create a dictionary with a pair of braces, {}; on its own, {} is an empty dictionary. Entries are separated by commas, and each key is given its value with a colon: key: value is the syntax for a dictionary entry.

keys() returns all the keys used in the dictionary (a list in Python 2; in Python 3, a view you can turn into a list with list(d.keys()))
del dictionary[key] deletes that key:value pair
if you store a key that already has a value, the old value is gone
keys can be any immutable value, such as strings, numbers, or tuples of them
values can be objects of any kind
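The points above in a few lines, reusing the passwords dictionary from earlier (the new values are made up). dict.get() is an extra I'm adding: it returns None for a missing key instead of raising KeyError.

```python
passwords = {'george': 'dog', 'gerry': 'cat', 'stephen': 'chicken'}

print(list(passwords.keys()))   # ['george', 'gerry', 'stephen']

passwords['gerry'] = 'bird'     # storing an existing key replaces the old value
del passwords['stephen']        # remove a key:value pair
print(passwords)                # {'george': 'dog', 'gerry': 'bird'}

# get() returns None for a missing key instead of raising KeyError
print(passwords.get('nobody'))  # None

# Tuples are immutable, so they can be keys; lists are refused
points = {(1, 2): 'tuple key works'}
try:
    points[[1, 2]] = 'list key fails'
except TypeError as error:
    print('TypeError:', error)
```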

When you want to look up a key and see what its value is, do this:

passwords['george']
# Return value will be 'dog'

Look at line 4: there is an if statement. As in C++, if is a control flow statement. You can add an else clause that runs when the if condition is not satisfied, but you can also use an if statement by itself:

myNum = 10
if myNum > 5:
    print('My number is greater than 5') # Totally cool too

So now you know for loops, how to open a file, and if/else. Let's start creating some programs with them. It is like Lego: the more parts you have, the more complex (and powerful) the projects you can build.

Monday, February 20, 2012

csci133c4.py

Since we downloaded a .txt file last time, this tutorial is about opening it. Here is the code.

with open('book.txt') as book:
    # book is a file object; looping over it gives one line at a time
    for line in book:
        # If the line contain the string 'yourselves'
        if 'yourselves' in line:
            print(line)

The first line starts with the reserved Python keyword with, followed by open('filename'). What goes between the parentheses is actually a path name; if you give just a file name, Python looks for it in the current directory. If you have to access a file in a different folder, you give it a path with the file name, such as ../../csci133/ch1/csci133c4.py. A nice bonus of with is that it closes the file automatically when the block ends.

The name after as refers to the opened file object. You don't have to call it book; you can call it anything you want. Since it is a text file with line after line of text, it makes sense to call it something like data, book, or myFile. The file object is not a list, but you can iterate through it with a for loop, and each iteration gives you the next line:

for item in items:
    pass  # do something with item here

Example problem #1: Count the number of words contained in the txt file

count = 0
with open('book.txt') as book:
    for line in book:
        # count += x is the same as count = count + x
        count += len(line.split())
print('There are', count, 'words.')
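Here is the same count packaged as a function, as a sketch. count_words is my own name, and the sample text in a temporary file stands in for book.txt so it can run anywhere.

```python
import os
import tempfile

def count_words(path):
    """Return the number of whitespace-separated words in a text file."""
    count = 0
    with open(path) as book:
        for line in book:
            count += len(line.split())
    # the with block has ended, so the file is already closed here
    return count

# Try it on a small temporary file instead of book.txt
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('This is a sample line of text.\nAnd a second line.\n')
    sample_path = tmp.name

print('There are', count_words(sample_path), 'words.')  # There are 11 words.
os.remove(sample_path)
```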

Before we finish, there is also a way to get input from the user: set up a variable and assign it the value returned by input().
Example problem #2: Take an input from the user (like cin in C++)

name = input('What is your name?')
print('Hello, nice to meet you', name)

Python's list comprehension
It is a handy way to write code in a very short form; compare the two pieces of code below, they do the same thing. I have heard people say it is a way to pretend to be smart, but it does seem useful to be able to write code compactly. I slightly prefer the longer way, but I have a feeling that in the professional world everyone uses list comprehensions, because they are often a bit faster and leave less room for errors.

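# data, years and time are placeholders for your own variables
# Loop version: build the new list one item at a time
result = []
for number in years:
    result.append(data[number][time])

# List comprehension: [the data to be changed for object in objects]
result = [data[number][time] for number in years]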

# Regular code (IceCreamFlavor is not defined here; it stands for any class with a random() method)
iceCreams = IceCreamFlavor()
iceCreamMenu = [] # iceCreamMenu is a list
for number in range(5):
    iceCreamMenu.append(iceCreams.random())

# Same code with list comprehension
iceCreams = IceCreamFlavor()
iceCreamMenu = [iceCreams.random() for number in range(5)]

The more I use it, the more useful I think it is. Like most built-in functions and methods, it is meant to help us write programs faster and more easily, so we don't have to spend as much time getting what we want.
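Since IceCreamFlavor is not defined in this post, here is a version you can run directly. The numbers are arbitrary; the optional if clause at the end is standard comprehension syntax that the examples above don't use.

```python
numbers = [1, 2, 3, 4, 5]

# Regular loop: start empty and append one item at a time
squares = []
for number in numbers:
    squares.append(number * number)

# The same list with a comprehension: [expression for item in items]
squares2 = [number * number for number in numbers]

# A comprehension can also filter with an if at the end
evenSquares = [number * number for number in numbers if number % 2 == 0]

print(squares)      # [1, 4, 9, 16, 25]
print(squares2)     # [1, 4, 9, 16, 25]
print(evenSquares)  # [4, 16]
```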

Notice the range(5)? It produces 5 numbers, 0, 1, 2, 3, 4, so we don't have to write a = [0, 1, 2, 3, 4] and then loop with for number in a. A useful tool as well!

# The range(5)
for number in range(5):
    pass  # Statement

# Same as this
numbers = [0, 1, 2, 3, 4]
for number in numbers:
    pass  # Statement

Reference: http://docs.python.org/library/functions.html#range

range([start], stop[, step])
It is very commonly used in for loops. The arguments must be integers. If you don't give it a step argument, step defaults to 1; if you don't give it a start argument, start defaults to 0.

range(5) gives the numbers 0, 1, 2, 3, 4 (a list in Python 2; in Python 3, a range object you can loop over or pass to list()). Python is zero-based, so if you need to print 1 to 5, you don't need to offset the loop variable (as in print(number + 1)). The correct way is to use the start argument:

# To print 1-5 the correct way
for number in range(1,6):
    print(number)

Output:

>>> for number in range(1,6):
...     print(number)
... 
1
2
3
4
5
>>>

The step argument is the increment between numbers. For example, with a step of 2, range(0, 10, 2) gives you [0, 2, 4, 6, 8]; the stop value 10 itself is not included.
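A quick way to check what a range() call produces is to wrap it in list(), which is needed in Python 3 because range() there returns a lazy range object. The negative-step example goes a little beyond the post:

```python
evens = list(range(0, 10, 2))      # stop=10 is excluded
firstFive = list(range(5))         # start defaults to 0, step to 1
countdown = list(range(5, 0, -1))  # a negative step counts down

print(evens)      # [0, 2, 4, 6, 8]
print(firstFive)  # [0, 1, 2, 3, 4]
print(countdown)  # [5, 4, 3, 2, 1]
```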

Practice Problem:

# Write a range(start, stop, step) that generates each of the following results
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
[0, 2, 4, 6, 8, 10]
[1, 3, 5, 7, 9, 11]
[0, 50, 100]
[3, 6, 9, 12, 15, 18, 21]

Answers: notice that the stop value is never included, so any stop value just past the last number works. For the list ending in 10 with a step of 2, range(0, 11, 2) and range(0, 12, 2) give the same result; for the list ending in 100 with a step of 50, range(0, 101, 50) and range(0, 150, 50) are the same. I just like to stop at the next increment value; if you get the same output, you are good to go.

list(range(10))
list(range(1, 11))
list(range(0, 12, 2))
list(range(1, 13, 2))
list(range(0, 150, 50))
list(range(3, 23, 3))

Note: before re-reading the range() documentation, I thought the only way to print 1-5 was to offset the variable. This shows how important it is to read the reference.

Monday, February 6, 2012

csci133c3.py

Welcome to the 3rd set of notes for the Python programming lab. Let's look at the code and see what it does. Today we will talk about functions. A function is a block of code that does some kind of task, and it may or may not return a useful value (a function that returns nothing gives back None).

line = 'This is a sample line of text.'
# Measure the length of the line string
print(len(line))

# Split the line string by spaces, return a list
print(line.split())

# Measure how many word are there now
print(len(line.split()))

Output:

30
['This', 'is', 'a', 'sample', 'line', 'of', 'text.']
7

Pay close attention to line #3. When we call (use) a function, we can also pass it some data if the function accepts it; that data is called a parameter, or argument. So we passed line into the length function, len(line). The function returned 30, which was then passed to the print function and printed on the screen. That is why the output is 30.

Whenever we print something on the screen, we are actually calling the print function and passing it the number, letter, or word. What about the split function? What does it do? Here is what Python's online manual says:
http://docs.python.org/library/stdtypes.html#str.split

str.split([sep[, maxsplit]])

Return a list of the words in the string, using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done (thus, the list will have at most maxsplit+1 elements). If maxsplit is not specified, then there is no limit on the number of splits (all possible splits are made).

For example, ' 1 2 3 '.split() returns ['1', '2', '3'], and ' 1 2 3 '.split(None, 1) returns ['1', '2 3 '].

split is a method that has to be called on a string, which is why the manual writes it as str.split(). It also takes optional arguments, but for now we just call it with none, which splits on runs of whitespace (spaces, tabs, newlines). Note that each data type has its own set of built-in methods.
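To see the sep and maxsplit arguments in action (the sample strings are my own):

```python
line = 'apple,banana,,cherry'

bySep = line.split(',')           # every comma splits; the empty field is kept
oneSplit = line.split(',', 1)     # maxsplit=1: at most one split
byBlanks = '  1   2  3 '.split()  # no sep: runs of whitespace, no empty strings

print(bySep)     # ['apple', 'banana', '', 'cherry']
print(oneSplit)  # ['apple', 'banana,,cherry']
print(byBlanks)  # ['1', '2', '3']
```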

Example Problem #1: Turn a string into all lower case

word = 'THE QUEEN'
newWord = word.lower()
print('Before:', word)
print('After:', newWord)

Output:

Before: THE QUEEN
After: the queen

Example Problem #2: Turn every word into lower case, remove the symbols, and print the words one by one (one on each line). The string is the URL of a random novel on the web.

line = 'http://www.gutenberg.org/cache/epub/39133/pg39133.txt'
newLine = ''
for char in line.lower():
    if char in 'abcdefghijklmnopqrstuvwxyz':
        newLine = newLine + char
    else:
        newLine = newLine + ' '
print('Before:', line)
print('After:', newLine)
print('Split it:', newLine.split())
for word in newLine.split():
    print(word)

Output:

Before: http://www.gutenberg.org/cache/epub/39133/pg39133.txt
After: http   www gutenberg org cache epub       pg      txt
Split it: ['http', 'www', 'gutenberg', 'org', 'cache', 'epub', 'pg', 'txt']
http
www
gutenberg
org
cache
epub
pg
txt

Before we go, download the file from this link into your current directory, because we will use it in the next tutorial! http://www.gutenberg.org/cache/epub/39133/pg39133.txt Then rename it to book.txt.

Practice Problem: Write a program to clean up the following string, turn all letters into lower case, and print one word per line.

foo = '[_He approaches Fabiani._]'

Output:

he
approaches
fabiani
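One possible solution, following the same approach as Example Problem #2 (try it yourself first!):

```python
foo = '[_He approaches Fabiani._]'

newFoo = ''
for char in foo.lower():
    if char in 'abcdefghijklmnopqrstuvwxyz':
        newFoo += char
    else:
        newFoo += ' '   # replace symbols with spaces so words stay apart

for word in newFoo.split():
    print(word)
```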

Wednesday, February 1, 2012

Pydev in Eclipse on Mac

After you download PyDev for Eclipse, you have to set up the Python interpreter before you can start using it. Here are the instructions.

To configure a Python or Jython interpreter, go to:
Eclipse > Preferences > PyDev

When you are in Preferences, find PyDev and click on Interpreter - Python.

Choose New in the upper right and enter /usr/bin/python.
Eclipse will then take care of the rest for you, i.e. updating the $PYTHONPATH.

Reference & Credit: On Using Pydev on a Mac.