George Chan's Blog: March 2012

Saturday, March 31, 2012

Python: Get the most frequent elements from list when there is more than one

Reference: stackoverflow
Authors: james_kansas, Niklas B.

The question is: given an unsorted list, how do you get the most frequently occurring element, and in particular, what do you do when more than one element ties for the top count?

from collections import Counter

def myFunction(myDict):
    myMax = 0 # Keep track of the highest frequency seen so far
    myResult = [] # The list of keys to return
    for key in myDict:
        if myDict[key] == myMax:
            # Tie with the current maximum: keep this key as well
            myResult.append(key)
        elif myDict[key] > myMax:
            # New maximum: discard the old keys and start over
            myMax = myDict[key]
            del myResult[:]
            myResult.append(key)
    return myResult

foo = ['1', '1', '5', '2', '1', '6', '7', '10', '2', '2']
print('The list:', foo)
myCount = Counter(foo)
print(myCount)

print(myFunction(myCount))

Output

The list: ['1', '1', '5', '2', '1', '6', '7', '10', '2', '2']
Counter({'1': 3, '2': 3, '10': 1, '5': 1, '7': 1, '6': 1})
['1', '2']
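
The same result can also be had with less bookkeeping. Here is a minimal sketch, assuming a helper name of my own (most_frequent is not from the StackOverflow thread): find the highest count with max(), then keep every key that reaches it.

```python
from collections import Counter

def most_frequent(items):
    """Return every element tied for the highest count."""
    counts = Counter(items)
    if not counts:
        return []  # nothing to count in an empty list
    highest = max(counts.values())
    # Keep each key whose count equals the highest count
    return [item for item, count in counts.items() if count == highest]

foo = ['1', '1', '5', '2', '1', '6', '7', '10', '2', '2']
print(most_frequent(foo))
```

Counter.most_common(1) would only give back one of the tied elements, which is why the sketch compares every count against the maximum instead.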

More Reading: http://stackoverflow.com/questions/1518522/python-most-common-element-in-a-list

csci133allCombination.py

Reference: http://stackoverflow.com/q/9961077/1276534
Authors: PePe, Li-aung Yip

Overview: This example shows why nested while loops are a bad idea here, and that a while loop in Python requires you to reset the inner loop counters to 0 yourself. To get every combination, use either nested for loops with range(), or itertools.product(), which takes several iterables and returns every possible combination of one element from each. Also, xrange() was replaced by range() in Python 3.0.

Coming from C++, where it might not be recommended either, we may be tempted to write nested while loops like the following. The code runs, but it does not print every combination, and at first the reason is not obvious.

a = 0
b = 0
c = 0
while a <= 5:
    while b <=3:
        while c <= 8:
            print(a , b , c)
            c += 1
        b += 1
    a += 1

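The output is only nine lines, 0 0 0 through 0 0 8, instead of every combination. Why?

Answer: Because we need to remember to reset the inner loop counters ourselves: c every time b advances, and b (and c) every time a advances. It works, but this method is kind of funky.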

a = 0
b = 0
c = 0

while a <= 5:
    while b <=3:
        while c <= 8:
            print(a , b , c)
            c += 1
        b += 1
        c = 0 # reset
    a += 1
    b = 0 # reset
    c = 0 # reset

I think most Python programmers would prefer a for loop over the range() function. It is also interesting to learn that in Python 2.x, xrange() is the lazy version of range(), producing numbers one at a time instead of building a list. From 3.0 on, xrange() is gone and range() does that job, so use range() instead. :)

for a in range(5+1): # Note range(n) produces 0,1,2...(n-1) and does not include n.
    for b in range(3+1):
        for c in range(8+1):
            print(a, b, c)

But then wait... Li-aung Yip points out there is a better way. Check out this solution, which uses itertools.product():

import itertools
for a, b, c in itertools.product(range(5+1), range(3+1), range(8+1)):
    print(a, b, c)
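
As a quick check, here is a small sketch (the list names nested and flat are my own) that collects the triples instead of printing them, to confirm that itertools.product() visits the same combinations in the same order as the nested for loops.

```python
import itertools

# Collect the triples from the nested for loops
nested = []
for a in range(5 + 1):
    for b in range(3 + 1):
        for c in range(8 + 1):
            nested.append((a, b, c))

# Collect the triples from a single product() call
flat = list(itertools.product(range(5 + 1), range(3 + 1), range(8 + 1)))

print(len(flat))       # 6 * 4 * 9 combinations
print(flat == nested)  # same triples, same order
```

The rightmost range changes fastest in product(), just like the innermost loop does.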

For even more reading: David Goodger's "Code Like a Pythonista: Idiomatic Python".

Thought: The second way (the while loops with resets) resembles C++ the most to me, although I don't know if I would want to use a while loop for this even in C++. But it is great to learn another function from itertools, itertools.product(). It is also nice to see the use of for a, b, c. I think it is powerful, but I have never used it in my own code yet, so I should practice using it.

import itertools

colors = ['red', 'green', 'blue']
vehicles = ['car', 'train', 'ship', 'boat']
numbers = [1, 2, 3, 4, 5]

# Prints out all the possible combinations of number, color and vehicle
for color, vehicle, number in itertools.product(colors, vehicles, numbers):
    print(number, color, vehicle)

More reference: http://docs.python.org/library/itertools.html

Wednesday, March 28, 2012

csci133c7.py

or known as csci133cleanup.py

In this tutorial we will write a program that cleans up a string, one of the most classic programs. Almost every student will at some point be given a novel or other input text file and be asked to do something with the data. So the first thing is to open and load the text file, and get the English letters into a new string. This tutorial looks long because I included the full source code of every version of the program, but in fact each version only makes minor changes. Read on!

# Version 1 of csci133cleanup.py
# Full implementation of cleanup
wordList = [] # Create a list to store our words
abc = 'abcdefghijklmnopqrstuvwxyz'

with open('novel.text') as book:
    for line in book:
        cleanline = ''
        for character in line.lower():
            if character in abc:
                cleanline += character
            else:
                # Important! We have to append a space!
                cleanline += ' '
        for word in cleanline.split():
            if word not in wordList:
                wordList.append(word)

In the first version we are only cleaning up the text, so there is nothing too special about it. But notice that in the else branch we append a space. Why? Take a moment to think about it, or try to clean 'Doctor--John' on a piece of paper.

Answer: Because we need this mechanism to separate possible words. For example, take the string Doctor--John. If we did not append a space, we would get 'DoctorJohn' as one word. When we want every single word in the file, we want to separate them instead of gluing them together.

# without space append: Doctor--John, result in DoctorJohn
# with space append: Doctor--John, result in Doctor  John (YES!)
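
To try this without a pencil, here is a small sketch. The clean() helper and its filler parameter are my own names, made up so the two behaviours can be compared side by side; the loop inside is the same one as in Version 1.

```python
abc = 'abcdefghijklmnopqrstuvwxyz'

def clean(line, filler):
    """Keep lowercase letters; replace anything else with `filler`."""
    cleanline = ''
    for character in line.lower():
        if character in abc:
            cleanline += character
        else:
            cleanline += filler
    return cleanline.split()

print(clean('Doctor--John', ''))   # nothing appended: one glued word
print(clean('Doctor--John', ' '))  # space appended: two separate words
```

Note that split() with no arguments treats the two spaces in a row as a single separator, so no empty words sneak in.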

Of course this is not without its problems. For example, we will be left with a lot of stray 's' (from words like John's), so we check whether a word is already in the list (see the if word not in wordList line), and if it is, we do not append it again. Depending on your needs, you might want to record the line number as well. See the next example.

# Version 2 of csci133cleanup.py
# Insert the line numbers into the dictionary
wordList = {} # Create a dictionary to store them
abc = 'abcdefghijklmnopqrstuvwxyz'

with open('novel.text') as book:
    lineNumber = 1 # Starting at line 1, set once before the loop
    for line in book:
        cleanline = ''
        for character in line.lower():
            if character in abc:
                cleanline += character
            else:
                # Important! We have to append a space!
                cleanline += ' '
        for word in cleanline.split():
            if word in wordList:
                wordList[word].append(lineNumber)
            else:
                # Store the value as a list that contains 1 item
                wordList[word] = [lineNumber]
        lineNumber += 1

Take a moment to read and compare the code. The very first line is different: we are using a dictionary instead of a list. When we want to check whether a word is already stored, the dictionary's built-in membership test (word in wordList) looks the key up directly instead of going through the items one by one. The other difference is that we now append each line number to a list of them. There is an interesting part to it, the line that stores a word for the first time:

wordList[word] = [lineNumber]

Notice that we cannot use wordList[word] = lineNumber, because we are creating the first value for the dictionary's key, and later we will append to it. Instead we create this value as a list that contains one integer. I was actually not aware of this when I was learning Python, and I kept running into errors because I stored a single integer. When I tried to append to that single integer, it did not work.
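
Here is a tiny sketch of that mistake and its fix, using a made-up dictionary named index rather than the real wordList. The last two lines show dict.setdefault(), which is one possible way (not the way used in the versions above) to fold the "first time or not" check into a single call.

```python
index = {}

# Storing a bare integer works until the word shows up a second time.
index['apple'] = 2
try:
    index['apple'].append(25)
except AttributeError as error:
    print('Cannot append:', error)

# Storing a one-item list keeps later appends working.
index['apple'] = [2]
index['apple'].append(25)

# setdefault() creates the empty list only if the key is missing.
index.setdefault('banana', []).append(5)
index.setdefault('banana', []).append(10)

print(index)
```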

In the last version we want to search: we look words up in the dictionary we just created. Take a look at the last couple of lines.

# Version 3 of csci133cleanup.py
# This version include part 1 - 3
wordList = {}
abc = 'abcdefghijklmnopqrstuvwxyz'

with open('novel.text') as book:
    lineNumber = 1
    for line in book:
        cleanline = ''
        for character in line.lower():
            if character in abc:
                cleanline += character
            else:
                cleanline += ' '
        for word in cleanline.split():
            if word in wordList:
                # do something, such as append line number
                wordList[word].append(lineNumber)
            else:
                wordList[word] = [lineNumber]
        lineNumber += 1

while True:
    word = input('Enter a word here: ')
    if word in wordList:
        print('Found on lines:', wordList[word])
    else:
        print('Not found.')

# wordList will look something like: {'apple': [2, 25, 55, 100], 'banana': [5, 10, 36, 90], ...}

This is the first time we see a while statement in Python, and its structure is simple: while the condition is true, it executes all the code inside it once, then checks the condition again. If it is still true, it runs the body again; if not, it exits and goes on to the next statement. Here we have True as the condition, which means this loop will run forever, until we kill it with a keyboard interrupt.

Keyboard interrupt hot key: Control + C
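
If you would rather not rely on Control + C, a break statement can end a while True loop from the inside. The sketch below is only an illustration: the 'quit' keyword and the answers list (a stand-in for input(), so the example runs unattended) are my own assumptions and not part of csci133cleanup.py.

```python
wordList = {'apple': [2, 25, 55, 100], 'banana': [5, 10, 36, 90]}

# Stand-in for input(): a fixed list of answers
answers = iter(['apple', 'cherry', 'quit'])
results = []

while True:
    word = next(answers)
    if word == 'quit':
        break  # leave the loop without a keyboard interrupt
    if word in wordList:
        results.append((word, wordList[word]))
    else:
        results.append((word, None))  # None marks "Not found."

print(results)
```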

Tuesday, March 27, 2012

csci133ifelif.py

Reference: http://stackoverflow.com/questions/7052393/python-elif-or-new-if
Today, while reading the Python exercises, I came across an exercise program that uses elif (in chapter 10). For a second I was not sure what it meant, because other languages call it something different. But when I read the source file closely, it looked like it was replacing some nested if/else statements. Finally I looked it up online and found out it is a little bit more than just a series of if/else statements.

def foo(var):
    # Check if var is 5
    if var == 5:
        var = 6
    elif var == 6:
        var = 8
    else:
        var = 10
    return var

def bar(var):
    if var == 5:
        var = 6
    if var == 6:
        var = 8
    if var not in (5, 6):
        var = 10
    return var

print(foo(5)) # 6
print(bar(5)) # 10

Look at foo(5): var is 5, so the first branch runs and the elif and else branches are skipped. An elif is shorthand for an else containing a nested if. It is good (maybe) if you want a cleaner looking program, because without the nested else/if blocks the indent level stays small, and it is faster than a sequence of if, if, if statements, because once one branch matches, the remaining conditions are not checked. In bar(5), by contrast, every if is tested in turn: var becomes 6, then 8, and then 10 because 8 is not in (5, 6). Note: always try to put the most common condition at the top, so a match is found sooner.

For example: if you want to check whether a string is an English word, test isalpha() first and only then start cleaning up the letters. That way, your code exits as soon as it knows the string contains non-letter characters.
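
To see the "not checking every condition" part with your own eyes, here is a sketch that records each comparison as it happens. The helper names (is_five, is_six, with_elif, with_ifs) and the checks list are made up for this illustration.

```python
checks = []  # records which conditions were actually evaluated

def is_five(var):
    checks.append('== 5')
    return var == 5

def is_six(var):
    checks.append('== 6')
    return var == 6

def with_elif(var):
    if is_five(var):
        var = 6
    elif is_six(var):  # skipped once the first branch matched
        var = 8
    return var

def with_ifs(var):
    if is_five(var):
        var = 6
    if is_six(var):    # always evaluated, and sees the new value
        var = 8
    return var

print(with_elif(5), checks)
del checks[:]
print(with_ifs(5), checks)
```

The elif version stops after one comparison, while the plain if version makes two and ends up with a different answer.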

Friday, March 23, 2012

csci133class.py

The Tkinter module provides many data types, such as Frame, Label, and Button. They come equipped with their own subroutines: some are specific, like get for Entry, and some are standard, like pack and after. But wouldn't it be nice if we could do the same for the data types (classes) we create ourselves?

For our ice cream store, we want to create a membership account (an object).

standardMember = Account('George Chan')
standardMember.deposit(100)

Or, for a worker timesheet program, we can create a worker object that contains objects of other data types.

class iceCreamMember:
    def __init__(self, name, age):
        self.myName = name
        self.myAge = age
    def getAge(self):
        return self.myAge
    def getName(self):
        return self.myName

member1 = iceCreamMember('George', 23)
member2 = iceCreamMember('Gerry', 29)

print(member1.getAge())
print(member2.getName())

In Python, when we want to create a new data type (a new class), we use a class statement. The keyword class tells Python that we are creating a new data type. It is good style to start a class name with an upper-case letter (so IceCreamMember would be the conventional spelling of the name used here).

When we want to create an instance of iceCreamMember, an individual object of this new type, we use the class name as if it were a function.

member1 = iceCreamMember('George', 23)

This creates a new iceCreamMember object, and member1 refers to it. Remember how the constructor is called when we create an instance of an object in C++? In Python, the __init__ function is called right away in the same manner.

def __init__(self, name, age):

Look at this line: the __init__ function takes 3 parameters, but when we created member1 we only passed 2 arguments to it. Why? Because the first parameter always refers to the new instance we just created (self, itself). You can name it anything, but the first one always points to the object itself, so it makes sense to use 'self'. The order of ('George', 23) is important, because name gets 'George', a string, and age gets 23, an integer.

Unlike C++, a Python class instance does not have declared member variables per se. It has attributes, and you create them with this syntax:

self.myName = name
self.myAge = age

To create a function for our user-defined class, we just use def like we always do, with the difference of giving it a self parameter. Notice that when we call .getName(), we don't have to pass anything, since the self argument is supplied automatically.

But if we try print(member1), something weird will happen. Try it. It actually teaches us another fact about how Python works.

>>> 
<__main__.iceCreamMember object at 0x0000000002EE1E48>

The reason we get this memory-address-like output is that we have yet to "teach" Python how we want this object printed. Python is a computer program, and at this moment an object at some address is all an iceCreamMember is to it. Let's add another method to our class.

def __str__(self):
    return self.myName + ', and age ' + str(self.myAge)

Now when we ask Python to print the object, it knows what to print.
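
Put together, a minimal runnable sketch of the class with its __str__ method looks like this (the getter methods are left out to keep it short):

```python
class iceCreamMember:
    def __init__(self, name, age):
        self.myName = name
        self.myAge = age

    def __str__(self):
        # Build the text that print() should show for this object
        return self.myName + ', and age ' + str(self.myAge)

member1 = iceCreamMember('George', 23)
print(member1)  # print() calls str(member1), which calls __str__
```

The str(self.myAge) call is needed because a string and an integer cannot be joined with + directly.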

*Important*: self vs. the parent class name

class goldMember(iceCreamMember):
    def __init__(self, name, age, discount):
        iceCreamMember.__init__(self, name, age)
        self.myDiscount = discount
    def __str__(self):
        return "{0}% of discount for member".format(self.myDiscount)

We can understand this new class as: goldMember is a kind of iceCreamMember. goldMember inherits all the functions iceCreamMember has, so getName and getAge are provided to goldMember automatically.

iceCreamMember.__init__(self, name, age)

*Important*: We must call the parent's __init__ explicitly through the class name and pass self (plus name and age) along ourselves. If we used self.__init__() instead, we would be calling goldMember.__init__, which is the very function we are defining at this moment.
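
Here is a runnable sketch of the inheritance idea. It assumes the gold member's __init__ also takes name and age, so that it has something to hand to the parent's __init__; the object name vip and the values are made up.

```python
class iceCreamMember:
    def __init__(self, name, age):
        self.myName = name
        self.myAge = age

    def getAge(self):
        return self.myAge

    def getName(self):
        return self.myName

class goldMember(iceCreamMember):
    def __init__(self, name, age, discount):
        # Call the parent's __init__ through the class name, passing self
        iceCreamMember.__init__(self, name, age)
        self.myDiscount = discount

    def __str__(self):
        return "{0}% of discount for member".format(self.myDiscount)

vip = goldMember('Gerry', 29, 15)
print(vip.getName())  # inherited from iceCreamMember
print(vip)            # uses goldMember.__str__
```

In Python 3, super().__init__(name, age) is another common way to write the call to the parent's __init__.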

Monday, March 19, 2012

csci133number.py

Reference: http://docs.python.org/library/stdtypes.html#numeric-types-int-float-long-complex
Please, please, please click on it and read it if you want the full details. The reference page tells you everything you need to understand them!

There are a total of 4 number types (or numeric types) in Python 2: integer, floating point, long integer, and complex. (In Python 3, int and long are merged into a single int type.)

Integer - implemented using long in C, which gives at least 32 bits of precision. Integers can be positive or negative, but they are whole numbers (1, 2, 3, 4, 5, 0, -1, -2...).
Floating point - implemented using double in C. When you need a decimal point, use floating point, such as 1234.5 + 1234.5.
Long integers - have unlimited precision. They are useful if you are calculating the amount of debt the United States is under. (wink wink)
Complex - have a real and an imaginary component. I think they are very useful, but I am not familiar with them yet; I shall come back to them soon.

If floating point is not enough for your usage, you can also use the fractions and decimal modules. Python is very nice because it supports mixed arithmetic: when the operands have different types, Python converts the narrower type to the wider one. For example: Integer * Floating point = Floating point.

The relationship, from widest to narrowest, is as follows: Complex > Floating point > Long Integer > Integer
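
A short sketch to check how mixed arithmetic behaves, written for Python 3 (so there is no separate long type to show). The variable names are my own. It also touches the fractions and decimal modules mentioned above.

```python
from decimal import Decimal
from fractions import Fraction

int_times_float = 2 * 1.5            # the int is widened to float
float_plus_complex = 1.5 + (2 + 3j)  # the float is widened to complex

print(type(int_times_float).__name__, int_times_float)
print(type(float_plus_complex).__name__, float_plus_complex)

# Exact alternatives when binary floating point is not enough
third_plus_sixth = Fraction(1, 3) + Fraction(1, 6)
dime_plus_two = Decimal('0.1') + Decimal('0.2')
print(third_plus_sixth)
print(dime_plus_two)
```

The result always takes the wider of the two operand types, never the narrower one.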

# Get the absolute value 
abs(-5)

# Convert something into integer
myNumberInString = '100'
anotherNumber = 50
result = 0
result += int(myNumberInString) + anotherNumber

# Raise 2 to the power of 3 (same as 2**3, which is 8)
pow(2, 3)

There is a thing called a module; it is a package of tools, similar to header files in C++. Remember the math library in C++? There is one similar to it in Python, and it is called math too. We always have to import a module before we can use its tools. There are many of them, so feel free to experiment; they provide a lot of useful subroutines.

# Sample code for the math module
import math
print(math.pi)
>>> 3.1415926535897931

csci133Buildin.py

Everything in Python is an object, and Python's built-in types are objects too. When you use the built-in types, you don't have to worry about things such as memory allocation, or about implementing insert, search, sort, list, print, and get routines. We can start working on our own code immediately. *In C++ we usually call them functions and in Java we call them methods; in Python you will see both words (a method is a function that belongs to an object), and I will loosely call them routines. Here is a list of reasons why you should try to use built-in types as much as you can.

They are easy - if you need a simple program, they are great for fast development: easy to write, easy to debug, and easy for others to read and understand. You can write a program to calculate expenses in about 5 minutes using the built-in types.
They are useful - you can use them to build more complex objects. They are like Lego: you can stack them to form different tools.
They are efficient - if you want performance, look no further. They have been tuned by the Python developers and will only get better as more releases follow. It is unlikely you will write more efficient routines than them, except maybe for highly specialized input.
They are always here - every Python installation comes with them, so you don't have to download anything extra, and they are standardized: everyone gets the same copies when they download Python.

Name	Example	Sample Code	Reference
Number	12345, 1234.5	csci133number.py	Link
String	'Hello Python'	csci133string.py	Link
List	[1, 2, 3], ['a', 'b', 'c']	csci133list.py	Link
Dictionary	{'username': 'password'}	csci133dictionary.py	Link
Boolean	True, False	csci133boolean.py	Link

Here is an index of some of the basic built-in types offered by Python. Please note there are a lot more kinds that I didn't have a chance to cover, although it is my goal to write about all the built-in data types :) . Python is dynamically typed (types belong to values and are checked while the program runs, instead of being declared and checked at compile time) and strongly typed (you cannot apply one type's operations to a value of another type without converting it first). It is something to keep in mind when you are learning other languages.
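
A small sketch of what those two terms mean in practice (the variable names value and result are my own):

```python
# Dynamic typing: the same name can refer to values of different types
value = 12345
value = 'Hello Python'

# Strong typing: Python refuses to mix str and int without a conversion
try:
    result = 'age: ' + 23
except TypeError as error:
    result = 'TypeError: ' + str(error)

print(result)
print('age: ' + str(23))  # an explicit conversion works
```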

Friday, March 16, 2012

csci133p6.py

This is a taste of graphical user interfaces, for you and for myself too. When a program runs with a GUI, the feeling is just so much "different": you can change your options with the click of a button, and there are now colors! You can change the color of the background easily, and the very simple programs that you write are now so much more interesting. But planning is now more important than ever as well, because you need to figure out what you want displayed on the screen before you start.

The toolkit we use in this series of tutorials is Tkinter, also known as Tk.

from Tkinter import *

root = Tk()
# Change the background color to light green
root['bg'] = 'light green'

# Create a title widget in the frame of root
simpleTitle = Label(root)
simpleTitle['text'] = 'Hello Tkinter!'
simpleTitle.pack()

mainloop()

The output of the program above is a simple window with the string 'Hello Tkinter!'. Try resizing the window to make it bigger!

Do you see the green color? That's the light green background color we set. It is not shown when the GUI launches, because the window is just big enough for the label. You can resize the window by moving your mouse pointer to its border and dragging it larger. There you have it: now you see the background color. You can also change the color of the text label if you want to.

Look at line 1:

from Tkinter import *

We have to import the Tkinter module, because although Python ships with it, you have to let Python know you want to use it. (In Python 3 the module is named tkinter, all lowercase.)

We create a Tk() object named root, which is like a base frame. Then we can change the background color by accessing its ['bg'] option and assigning 'light green' to it. If you want, you can change it to 'light pink' as well.

Look at line 8:

simpleTitle = Label(root)
simpleTitle['text'] = 'Hello Tkinter!'
simpleTitle.pack()

Here we create a Label named simpleTitle. It is based on the root frame, and that is the reason root is passed as the argument. If there were another frame, we would replace root with the name of that frame. For example: myLabel = Label(anotherFrame). Just like the ['bg'] field, there is a ['text'] field associated with the label too. We access it the same way we access the background color, and we can assign a string to it. And at last, we have to call .pack() to ask Tk to place it on the screen; a widget that is never packed is never shown. Notice pack() is a function, and it actually takes parameters!

Reference: http://docs.python.org/library/tkinter.html#packer-options

Here is the list of possible options:

expand: if it is set to true, the widget expands as the window gets bigger.
Ex: foo.pack(expand=YES)

fill: whether the widget stretches to fill the space given to it. There are 3 options: X, Y, BOTH
Ex: foo.pack(fill=BOTH)

side: which side of the window the widget is placed against. TOP (default), BOTTOM, LEFT, RIGHT
Ex: foo.pack(side=LEFT)

anchor: which way the widget snaps to. Think of it as a compass with 8 directions, plus the center: "n", "ne", "e", "se", "s", "sw", "w", "nw", "center"
Ex: foo.pack(anchor="w")

Example combining several options:

foo.pack(expand=YES, side=LEFT, fill=X)

Wednesday, March 14, 2012

csci133c5.py

In Python, there is an extremely useful data type called a dictionary. A dictionary is a collection of unordered (key, value) pairs. Notice that each key maps to one value only, so if you try to add another value with the same key, it replaces the value originally assigned to that key. It makes sense, because if a few values were mapped to the same key, you would have no idea which is which.

Take a look at the example for the dictionary below

passwords = {'george':'dog', 'gerry':'cat', 'stephen':'chicken'}
name = input('Username: ')
password = input('Password: ')
if name in passwords and passwords[name] == password:
    print('Correct, welcome:', name)
else:
    print('Sorry, bad password.')

Output:

Username: george
Password: dog
Correct, welcome: george

Username: george
Password: cat
Sorry, bad password.

Reference: http://docs.python.org/tutorial/datastructures.html#dictionaries
You create a dictionary with a pair of braces, {}; on its own that is an empty dictionary. You separate the entries with commas, and you give a value to a key with a colon: key: value is the syntax for a dictionary entry. And lastly, we cannot use a list as a key, because a list can be modified in place through index assignment, slice assignment, or other methods.

keys() returns a list of all the keys used in the dictionary
del dictionary[key] deletes the key:value pair
if you store a value under a key that already has one, the old value will be gone
the keys can be strings and numbers
the values can be objects of any kind

When you want to look up a key and see what its value is, do this:

passwords['george']
# Return value will be 'dog'
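
The points listed above can be tried out in a few lines. This is only a sketch on the same passwords dictionary; the new password 'parrot' and the missing user 'nobody' are made up for the example.

```python
passwords = {'george': 'dog', 'gerry': 'cat', 'stephen': 'chicken'}

passwords['george'] = 'parrot'   # same key: the old value 'dog' is replaced
del passwords['gerry']           # remove the key:value pair
print(sorted(passwords.keys()))  # the keys that are left
print(passwords['george'])

# .get() returns None for a missing key instead of raising KeyError
print(passwords.get('nobody'))

# A list cannot be a key, because lists can be modified in place
try:
    passwords[['a', 'list']] = 'x'
except TypeError as error:
    print('TypeError:', error)
```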

Look at line 4: there is an if statement. Similar to C++, if is a control flow statement. You can write an else statement that is executed when the if condition is not satisfied, but you can also use an if statement by itself.

myNum = 10
if myNum > 5:
    print('My number is greater than 5') # Totally cool too

So, now you know the for loop, you know how to open a file, and you know if/else. Let's start to create some programs with them. It is like Lego: when you have more parts, you can create more complex (powerful) projects.