I moved your stuff but don't freak out cuz I left behind a symlink

Saturday, June 29, 2013

Learning new nix-craft is kinda like being handed shiny new electronics. You're instantly intrigued with what new stuff you can do with it. This entry is the first in a series about nix-craft.

I'm not sure when I first heard of symlinking, but the first time it hit me as "something I should get acquainted with" was while I was reading dotfiles.github.io. Context clues provided me with the idea that to symlink was to leave a little note to your OS about where you moved the file it expected to find there.

So, as I had endeavored to clean my virtual $HOME of dotfiles, it became apparent that I needed to get familiar with symlinks.

Leaving a note

To create a symlink, you'd type this into your terminal:

ln -s path/to/target path/to/link

In English, this reads as: Create a symlink at path/to/link that redirects to path/to/target.

Reading the note

Most programs play nicely with symlinks. For instance, if you were to symlink your ~/.vimrc to ~/dotfiles/vim/.vimrc, Vim would access ~/.vimrc but would end up reading from ~/dotfiles/vim/.vimrc instead. It's kind of like a HTTP 302 that redirects you to another URI, or in our case, another path on your filesystem. Microsoft Windows users might find symlinks to be similar to shortcuts. They're essentially the same thing with the caveat that Windows programs would unfortunately get cranky if you tried to pass the shortcut file as an executable file. In *nix systems, symlinks in place of the actual file is okay for the most part!

It's also helpful to know what path your existing symlinks are pointing to. A simple ls doesn't list out your symlinks. You'll want to run ls -l, and symlinks will look something like the second entry below:

-rw-------   1 sitong  staff    28054 Jun 29 17:52 .viminfo
lrwxr-xr-x   1 sitong  staff       33 Jun 29 16:57 .vimrc -> /Users/sitong/dotfiles/vim/.vimrc

I actually ran ls -al since the symlinks in my $HOME are all dotfiles.

Removing the note

Removing the symlink is just an rm to the symlink. For instance:

rm ~/.vimrc

A little bonus

So we saw that ln -s is the command for symlinks but what if you passed a naked ln command to your terminal?

You would actually be creating a hard link. Hard links point directly to a file's memory space instead of a filesystem path as with symlinks. This means you can move or rename the target file and the hard link will still resolve to that original target file. If you were to move or rename your symlinked target file then when some program tries to resolve the symlink, it won't find anything at that path. Try it out yourself!

[*nix-craft] [dotfiles] [shellscript]

Objects, attributes, functions, and methods

Sunday, April 07, 2013

I set some time out to open the box on some Python fundamentals this weekend. I have a few topics to write about but, for today, let's talk a little about objects.

Objects

Objects are a big deal in Python. And rightfully so, as most developers use Python in an object oriented manner, though you can certainly write imperatively or pseudo-functionally too. In fact, some people subscribe to the thought that everything in Python is an object. I'm not going to argue about whether that's true or not so we'll just tip-toe around it and state that there are an awful lot of objects.

So what is an object? Let's use Wikipedia's definition. Objects are characterized by these properties:

Identity (Objects are unique from each other)

class Trivial(object):
    pass

obj, diff_obj = Trivial(), Trivial()

id(obj) # 25851216
id(diff_obj) # 26133840

State (Can store data in object)

import os # modules are objects
os.my_data = 'arbitrary'
print os.my_data # arbitrary

Behavior (Can manipulate the object)

print ' '.join(['strings', 'are', 'objects', 'with', 'methods'])
# strings are objects with methods

Attributes and Methods

You're probably familiar with setting object attributes to static data or writing simple methods if you've worked with objects in the past.

class Lannister(object):
    def __init__(self, name):
        self.name = name

    def say_family_motto(self):
        print '{} says: "Hear Me Roar!"'.format(self.name)

tyrion = Lannister()

print tyrion.name
# Tyrion Lannister
tyrion.say_family_motto()
# Tyrion Lannister says: "Hear Me Roar!"

But you're actually not limited to such trivial behavior. For instance, since functions are first class objects in Python, you could easily do the following too.

def say_common_motto():
    print 'A Lannister Always Pays His Debts.'

tyrion.say_common_motto = say_common_motto

tyrion.say_common_motto()
# A Lannister Always Pays His Debts.

As you can see, we've set the .say_common_motto attribute to the function object say_common_motto, then invoked the attribute.

Before we continue, we should make a distinction between .say_common_motto and .say_family_motto.

type(tyrion.say_common_motto)
# <type 'function'>
type(tyrion.say_family_motto)
# <type 'instancemethod'>

Python resolves the type of .say_common_motto to a function object and .say_family_motto to an instance method object. What's the difference?

You can think of a method as a function with a unique difference. A method always takes the object it's associated with as it's first argument. You don't have a choice on the matter.

Under the hood, when Python sees tyrion.say_family_motto(), it first looks for a .say_family_motto attribute in tyrion (this means you can override an object method by writing to an object attribute with the same name). When it doesn't find that, the interpreter will jump up to the class level and look for say_family_motto function defined at the class level. Once found, it will invoke the function as a method - namely, by attaching the tyrion object to the head of the argument list. That's also why a method definition always includes a positional argument placeholder first (traditionally called self in Python).

Methods are really just a shortcut though. You can actually just define your own function with an argument placeholder for an object and attach the function to the object attribute. An example makes this clearer. Suppose we rewrote the say_common_motto function to make it equivalent to a method.

def say_common_motto(self):
    print '{} says: "A Lannister Always Pays His Debts."'.format(self.name)

tyrion.say_common_motto = say_common_motto

tyrion.say_common_motto(tyrion)
# Tyrion Lannister says: "A Lannister Always Pays His Debts."

tyrion.say_family_motto()
# Tyrion Lannister says: "Hear Me Roar!"

As you can see, defining a method in the class is a lot cleaner than attaching a function to the instance object. But both styles get you to the same resolution eventually.

Anyways, hopefully you know a little more about Python objects now.

[objects] [python] [functions]

Sustained momentum

Wednesday, March 20, 2013

In the next couple months, I'm planning on writing more about Javascript as I pick it up in work and from my own reading. Towards the end of Hacker School, I decided I wanted to explore it as my second programming language. Reasoning: I can return to my functional programming roots and I can round out my understanding of web development by getting intimate with how the browser works.

However, this doesn't mean I will stop developing in Python. Aside from Django exposure at work, I'm planning to continue building web applications for myself - and hopefully with more robust front-end UI.

But, that's not quite good enough. If there was one thing I took away from Hacker School, it was the idea that I should always be pushing myself to become a better programmer. Web development is fun and lets me exercise my creative side, but I want to step outside my comfort zone and pursue other avenues of technology too. I'm not entirely sure what direction I'll look but I think my best bet is to attach myself to an open source project and develop a deep understanding of that problem space.

Anyways, I'll think on that and come back with my decision within the next month. Hold me to it!

[future] [javascript] [python] [django] [open source] [hacker school]

No man is an island: Collaboration with git

Tuesday, March 12, 2013

I'm currently in the middle of a Git crash course at work. Though to be honest, it's not really a course. It's more just me crashing. Luckily git is forgiving while I'm expanding my terminology past add, commit, push, status, diff, and clone. In this post, I'll summarize what I'm learning in my own words, but you should read the amazingly clear Pro Git text if you have the time.

Version control for the masses

When you're just a lone soul contributing to your own project, git is basically just a backup system. You're really only keeping snapshots of your project progress and not really worrying about concurrent user editing. But add other contributors into the equation and suddenly you'll need to be in control of all the moving parts of your codebase, or you may have a mess of things.

Growing a forest from a single tree

Every git repository holds a wealth of information about a project. Since each repository houses an entire project and it is dead simple to pull down repository clones from websites like Github and BitBucket, it is easy to see how a popular project might spread its roots over many machines. This can lead to a lot of chaos though, and teams that are working together on projects will need to work on atomizing their changes. Individual commits do fulfill this fairly well on the small scale, but you also need to organize your commits into groups. A good way of conducting this separation is through git's branching model.

git branch [new_branch]

This command creates a new branch separate from your current development branch. Branching is useful to organize paths of development, as you're literally splitting off a new path. For instance, you might want to make a new branch for every bug-fix/feature request of your website or for both server and client sides of your new game. If not given a parameter, git branch will simply list your outstanding branches and show you which branch you're currently working in.

To actually move branches, you'll want to run this command:

git checkout [existing_branch]

Then you can either git branch or git status to see where you are.

Finding your way back home

Great, so you've fixed three bugs and added two features, including pillow armor for your game's main character. How do you get all of that jazz back onto your main development or master branch?

Generally there's two ways about this.

git merge [existing_side_branch]

While you're in a branch (such as the main/master branch) and you want to merge in a side branch, you'll generally be safe with this command. If your current branch never moved forward, the merge will just fast forward your current branch to the position of the side branch. If your current branch has progressed separately from the side branch, the merge will recursively find the two branches' common ancestor, then perform a three-way merge. You'll end up with an extra commit that merges your two branches. This extra commit will have two parents instead of one (one for each branch that it was merged from).

You can avoid having two parent commits and have a more straightforward development path with the second method of merging.

git rebase [existing_main_branch]

You'll start in your side branch. The rebase command starts with the divergent main branch and tries to sequentially stack the each side branch's commits on top. If there are no merge conflicts, you'll end up with a straight development path.

Knowing where your keys are

The true beauty of git isn't realized until you spread out repositories onto different machines and servers. Remote locations are tracked in the remote variable.

git remote

This lists all saved remote locations. Your list may contain origin (Github/Bitbucket), heroku (Heroku), or the name for your own private server.

Anyways, that's essentially the foundation you need for version control. Stay tuned to a later post for more advanced git control.

[git] [version control] [crash course] [rebase] [github] [heroku] [bitbucket]

Understanding heaps

Saturday, February 02, 2013

I'm spending this weekend filling in some of the holes in my programming background, data structures in particular. As part of the learning process, I want to share my new understanding of heaps.

What is it?

A heap is a type of priority queue, which is just a queue where dequeued elements are primarily selected based on their "priority" and secondarily selected based on the "first in first out" principle. A heap prioritizes by key values of objects it stores, which leads to the two flavors of heaps: the max heap and min heap. The max heap will dequeue the objects with the highest keys first. If there are multiple maxes, the heap will dequeue by order of insertion. As you'd expect, the min heap is the same except it dequeues objects with the lowest keys first.

Introducing the max heap

Although a heap is technically a priority queue, and although we are going to implement it as an array, heaps are commonly represented as a binary tree. Here is the max heap we're going to work with:

There are two conditions that must be met for a binary tree to be a max heap.

Every parent node must be at least as large as either child nodes. This means the max key will always be at the top of the tree.
The tree must be complete at all levels except the deepest. The deepest level must be populated from the left. This means if the above tree had another node, it would be the left child of the node 8.

We're going to implement this heap as an array. This boils down to extracting nodes left to right, top to bottom. Here's what it will look like:

[11, 5, 8, 3, 4]

Normally heaps hold objects with the keys we are showing in the tree and array representation. For simplicity, we'll just assume the objects and keys are one and the same.

We also need to keep track of the parent and children of any given node. That's pretty natural for binary trees but how is that implemented in arrays? Turns out it's pretty simple with one-indexing, due to the way we are pulling elements off the tree.

  1  2  3  4  5  <-- One-indexing
[11, 5, 8, 3, 4]
                             EXAMPLE:
parent(i) = floor(i/2)            parent(2) = 1
left_child(i) = i*2           left_child(2) = 4
right_child(i) = i*2 + 1     right_child(2) = 5

Let's code up what we have. I prefer to have zero-indexing so the math on the parent/children calculation will have to compensate.

class MaxHeap(object):
    def __init__(self):
        self.arr = []

    def parent(self, i):
        if i == 0:
            return 0
        return (i + 1)//2 - 1

    def children(self, i):
        left = (i + 1) * 2 - 1
        right = left + 1
        return left, right

Now, a heap has two basic operations: insert and extract_max (extract_min). These are guaranteed to run in O(logN) time, or in other words, the height of the binary tree.

Insert

What does an insert look like in our binary tree model? Following the conditions of a binary heap, we have to insert the new object at the deepest level, moving in from the left. That corresponds to the node marked X below:

It should be pretty obvious that the object we insert could break the heap property of this tree. Whatever key we end up with, we first need to compare it with its parent node. The parent node's key is 8, so if we insert an object with key <= 8 then we wouldn't have to do anything further. But what if the key is something higher like 15? Then our insertion would break heap condition #2.

In this situation, the next step is to swap our inserted node with it's parent node.

Cool. But now when we check 15 against its new parent, the root, we notice we're still in trouble. So we swap again. These swap acts are also known as "bubbling up" or "heapifying up", as the inserted object is promoted levels due to its priority.

Now that 15 has made it's way up to the root node, we can stop. As you can see, the new tree follows all conditions to be classified as a max heap. Here's the code for the insert operation.

    def insert(self, key):
        self.arr.append(key)
        i = len(self.arr) - 1
        parent = self.parent(i)
        while self.arr[i] > self.arr[parent]:
            self.arr[i], self.arr[parent] = self.arr[parent], self.arr[i]
            if parent:
                i = parent
                parent = self.parent(i)
            else: # parent is root
                break

As you can see, we first insert to the end of the array, then while the inserted key is larger than its parent, we swap it upward until it finds its place.

Extract Max

I believe this is also known as Delete Max, but in either case, this operation removes the max from the heap. And because we're working with a heap, we know the max is the root of the tree (or first element of the array), so we certainly don't need to traverse through the entire structure to determine it.

Now, as mentioned before, the first step is to remove the root node. That leaves us with a tricky situation. What node will replace the root?

Let's go through our options.

Pull up the 5. Well, now we fail both conditions. The root is now less than the right child, and more importantly we have an incomplete binary tree.
Pull up the 8. Cool, our root is the max, but now we have an entire empty branch on the right. That seems like a big problem and we can't just fix it by pulling from an entirely separate branch.
Pull up the 3. Breaks the complete tree again.
Pull up the 4. We still have a complete tree (in terms of a heap at least). We have a bad root but maybe we can shift that node back down.

So the general solution seems to be, promote the last element of the tree/array.

So now the new root is no longer the max. We need to swap it with one of its children. If we swap with 5, we'll still have the same problem. We'll need to swap with 8. This should lead us to the realization that we'll always want to swap with the higher of the two children, otherwise we'll just be making more of a mess of things.

Now we have our max extracted heap!

Here's the code:

    def extract_max(self):
        self.arr[0] = self.arr[-1]
        del self.arr[-1]
        i = 0
        left, right = self.children(i)
        while right < len(self.arr): # if right exists, so does left
            if self.arr[i] < self.arr[left] or self.arr[i] < self.arr[right]:
                if self.arr[left] > self.arr[right]:
                    self.arr[left], self.arr[i] = self.arr[i], self.arr[left]
                    i = left
                else:
                    self.arr[right], self.arr[i] = self.arr[i], self.arr[right]
                    i = right
                left, right = self.children(i)
            else: # we have a heap again
                return
        if left < len(self.arr):
            if self.arr[left] < self.arr[i]:
                self.arr[left], self.arr[i] = self.arr[i], self.arr[left]

I'm not actually returning the max here, but you can build that in pretty easily.

extract_max is a bit more complicated than insert. Swapping the new root down takes a few more comparision steps as we have to decide which child to swap with. We're still within O(logN) time though, since we will never need to swap more than the height of the binary tree.

So what are heaps good for?

In general, when you constantly need to access the min or max of some data, heaps are a logical choice, as you just need to pluck it from the top of the tree/front of the array and do a little rearranging. This means they are useful for tasks like managing bandwith on a router (always send the prioritized traffic first) or handling asynchronous event processing (firing off shortest tasks first).

Heaps are also useful in speeding up certain algorithms that require multiple min or max computations. One example is Dijkstra's shortest path algorithm, where you are constantly computing the minimum path for each node.

So, hopefully you've now got a solid understanding of heaps. Next time you're working on something and you find yourself repeatedly taking minimums or maximums, you should give them a try. They will make your life easier.

[heaps] [data structures] [python]

<< Page 3 / 5 >>