Free Professional Web Development

I'm pleased to announce my immediate availability for some freelance Web development projects. Lately, I've been quite swamped with a very awesome project at my day job. Many things had to be put on hold as a result of that project. However, it looks like the worst is now behind me, and I find myself with much more free time than before!

As such, I'm opening the doors to any and all (except full-time, because I love my day job) Web development opportunities. I guarantee absolute satisfaction and reliable, efficient Web sites, and I deliver quickly. All you need to do is provide the requirements for your project in the comments section of this article. I'll pump out at the very least a proof of concept that becomes your property (free of charge), no questions asked.

You will not be disappointed. Use this as an opportunity 1) to get free stuff and 2) to test your requirements-gathering/communicating skills. Let the battle begin!

Follow-Up to Weighted Sorting in Python

The activity on my latest blog post has been tremendous! I never expected that much activity within an hour or two of posting the article.

The aim of this article is to provide an alternative solution to my weighted sort when you're after increased performance. It might just be useful to those who came here in search of a way to do weighted sorting in Python. I need to give a shout out to Jeremy Brown for suggesting this solution. He's so awesome :P

While the example I posted in my previous article addressed my needs just fine, it is definitely not the fastest option. A better solution would be to completely remove the special IDs from the object list altogether and just place them in front of the list:

import itertools
import random

object_ids = [random.randint(0, 100) for i in range(20)]
special_ids = [random.choice(object_ids) for i in range(5)]

not_special_ids = (i for i in object_ids if i not in special_ids)
for i in itertools.chain(special_ids, not_special_ids):
    # do stuff with each ID
    pass

This solution is quite different from my weighted sort, as there's no sorting going on at all, just a simple generator and using itertools to chain two collections together.

Here's a way you can benchmark see which solution is faster:

from copy import copy
import cProfile
import itertools
import random

object_ids = [random.randint(0, 100) for i in range(20)]
special_ids = [random.choice(object_ids) for i in range(5)]

ITERATIONS = 1000000

def sorting():
    for i in xrange(ITERATIONS):
        l = copy(object_ids)
        l.sort(key=lambda i: int(i in special_ids) * -2 + 1)
        for i in l:
            pass

def chaining():
    for i in xrange(ITERATIONS):
        l = (i for i in object_ids if i not in special_ids)
        for i in itertools.chain(special_ids, l):
            pass

cProfile.run('sorting()')
cProfile.run('chaining()')

Sample output on my box is:

$ python weighted_sort.py
         24000003 function calls in 18.411 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   18.411   18.411 <string>:1(<module>)
  1000000    0.580    0.000    0.580    0.000 copy.py:112(_copy_with_constructor)
  1000000    0.791    0.000    1.510    0.000 copy.py:65(copy)
        1    1.397    1.397   18.411   18.411 weighted_sort.py:11(sorting)
 20000000    8.907    0.000    8.907    0.000 weighted_sort.py:14(<lambda>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
  1000000    0.139    0.000    0.139    0.000 {method 'get' of 'dict' objects}
  1000000    6.597    0.000   15.503    0.000 {method 'sort' of 'list' objects}


         16000003 function calls in 7.381 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    7.381    7.381 <string>:1(<module>)
        1    2.744    2.744    7.381    7.381 weighted_sort.py:18(chaining)
 16000000    4.636    0.000    4.636    0.000 weighted_sort.py:20(<genexpr>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

So, you can see that the chaining solution is easily twice as fast as the sorting solution over 1 million iterations. Both of these solutions work perfectly well for my purposes, and I will probably end up switching to the chaining solution sometime in the future.

EDIT After reading lqc's comment on my previous article, I've decided to update this one with more appropriate benchmarks. The information that lqc has shared makes the speed of these solutions much closer.

Here's my updated test script:

$ python weighted_sort.py
         4000003 function calls in 8.437 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    8.437    8.437 <string>:1(<module>)
  1000000    0.558    0.000    0.558    0.000 copy.py:112(_copy_with_constructor)
  1000000    0.741    0.000    1.431    0.000 copy.py:65(copy)
        1    1.319    1.319    8.437    8.437 weighted_sort.py:11(sorting)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
  1000000    0.133    0.000    0.133    0.000 {method 'get' of 'dict' objects}
  1000000    5.688    0.000    5.688    0.000 {method 'sort' of 'list' objects}


         17000003 function calls in 7.545 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    7.545    7.545 <string>:1(<module>)
        1    2.818    2.818    7.545    7.545 weighted_sort.py:18(chaining)
 17000000    4.726    0.000    4.726    0.000 weighted_sort.py:20(<genexpr>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

So if you only gain a second over 1 million iterations, I think I prefer the sort(key=special_ids.__contains__) solution! I hope these two articles will help you get started on your adventures with handling special objects before others!

Simple Weighted Sort in Python

Last night I found myself in need of a simple weighted sort function in Python. I had a list of integers which represented object IDs in my project. Some of the objects needed to be processed before the others while iterating over the list of integers, and I already knew which object IDs those were. The order the rest of the object IDs were processed didn't matter at all. I just wanted the special object IDs to arrive at the beginning of the list, and the remaining object IDs could be in any order.

I was surprised at how simple it was to produce such a weighted sort. Here's an example of what I did:

import random
object_ids = [random.randint(0, 100) for i in range(20)]
special_ids = [random.choice(object_ids) for i in range(5)]
print 'Object IDs:', object_ids
print 'Special IDs:', special_ids

object_ids.sort(key=special_ids.__contains__, reverse=True)
print 'Object IDs:', object_ids

And some sample output:

Object IDs: [13, 97, 67, 5, 77, 58, 24, 99, 29, 20, 29, 75, 100, 31, 79, 5, 27, 11, 6, 1]
Special IDs: [13, 1, 27, 6, 67]
Object IDs: [13, 67, 27, 6, 1, 97, 5, 77, 58, 24, 99, 29, 20, 29, 75, 100, 31, 79, 5, 11]

Notice that each of the "special" IDs have shifted from their original position in the object_ids list to be at the beginning of the list after the sort.

The Python documentation for sort says that the key argument "specifies a function of one argument that is used to extract a comparison key from each list element." I'm using it to check to see if a given element in the list is in my special_ids list. If the element is present in the special_ids list, it will be shifted to the left because of the way the special_ids.__contains__ works.

In sorting, a value of 1 (or other positive integer) out of a comparison function generally means "this belongs to the right of the other element." A value of -1 (or other negative integer) means "this belongs to the left of the other element." A value of 0 means "these two elements are equal" (for the purposes of sorting). I'm assuming it works similarly with the key argument. Please correct me if I'm wrong!

As lqc states in the comments below, the key argument works differently. It creates a new sequence of values which is then sorted. Before lqc jumped in, I was using key=int(i in special_ids) * -2 + 1 to do the sorting, which is pretty dumb. Using key=special_ids.__contains__ is much more appropriate. Thanks lqc!!

This sort of weighted sort might not be just right for your needs, but hopefully it will give you a place to start to build your customized weighted sort!

PIR Motion Sensor + LCD Screen + Arduino Uno

For one reason or another, I've recently had the urge to follow in my father's footsteps and start tinkering with electronics. He's basically a wizard. He has fixed a lot of appliances that others tossed out the door without the slightest bit of investigation. I think he did this with the monitor I had hooked up to my very first computer. He just popped open the case, found the problem, soldered some solution in there, and that monitor worked for probably a decade afterwards. Amazing stuff.

Anyway, I have an itch to create a laser trip wire (don't ask). I took a basic electronics class my freshman year in college, so I am already familiar with resistors, capacitors, ICs, breadboards, soldering, etc. It doesn't seem like a very daunting task to create that trip wire, but I'm starting fresh with electronics (it's been a good 8 or 9 years since I last soldered or anything like that).

One of my co-workers mentioned the Arduino as something to get me started on the trip wire, so I started doing some research here and there. The more I read, the more excited I got. I wanted soo badly to buy all of the junk that I'd need to start tinkering with an Arduino, but I didn't think my wife would appreciate that--especially around Christmas time.

Several of my awesome relatives sent me very generous gift cards for Christmas this year, which finally gave me the opportunity to buy a load of stuff for the Arduino without feeling bad. So I ordered loads of stuff. Most of it is here, some of it is still on its way. My Arduino Uno arrived around noon today, and I haven't been able to stop tinkering! It's really a lot of fun, and stupid easy even if you're not a programmer!

I started with the basic "oooh, blinking LEDs!" sort of projects (or "sketches" in Arduino parlance). Then I moved on to tweak those to work with multiple LEDs. There were a few different scenarios I ran though because two of my LEDs weren't lighting up very well. I guess I either hooked them up wrong or I didn't seat them properly... whatever.

The next project was when I hooked up a PIR (passive infrared) motion sensor ($9.99 at RadioShack) and installed a demo sketch that I found on the Arduino wiki. I wasn't quite sure how to wire everything for the PIR sensor, so I took a look at this Make Magazine video to see one way of setting up the circuit. That one simply lights up an LED and makes some noise. I just made mine light up and LED since I don't have a buzzer (yet).

Next I moved on to the LCD. The package came with a 20x4 green-on-black backlit LCD display, a 2.2k Ohm resistor, and a set of pins to connect the LCD to my breadboard or whatever. I actually soldered the pins onto the LCD display board (wow, was that even more difficult than I remember!) so I would have less of a hassle getting all of the contacts working. Getting the LCD to work was pretty easy after that.

Then I took those two projects a bit further. I'm certain others have done this years ago, but I didn't use a sketch that was completely written for me this time so I felt special enough to share :) I added the PIR circuit from before less the LED, merged pieces from both demo programs, and came up with a motion sensor that would flash "INTRUDER ALERT!!" on the LCD screen a few times when triggered. Based on my testing, the sensor works extremely well (once calibrated!!), and it will detect motion well across the open area of my apartment (maybe a bit further than 20 feet)!

I'm quite happy with my work, especially for not having "dealt with" electronics for such a long time. When I tried to share my success with some of my friends, they wanted a video to prove I'm awesome (I guess?). So here it is. Trust me, I know the video is horrible--I speak too quietly (baby sleeping in next room), I stumble over my words (that's just me), and I neglected to even offer you some music to rock out to as I blabber about this stuff.

Here's the program I wrote/modified:

 /*
  * //////////////////////////////////////////////////
  * //making sense of the Parallax PIR sensor's output
  * //////////////////////////////////////////////////
  *
  * Switches a LED according to the state of the sensors output pin. Determines
  * the beginning and end of continuous motion sequences.
  *
  * @author: Kristian Gohlke / krigoo (_) gmail (_) com / http://krx.at
  * @author: Josh VanderLinden / codekoala (.) gmail (@) com / http://www.codekoala.com
  * @date:   3 Jan 2011
  *
  * kr1 (cleft) 2006
  * Released under a creative commons "Attribution-NonCommercial-ShareAlike 2.0" license
  * http://creativecommons.org/licenses/by-nc-sa/2.0/de/
  *
  *
  * The Parallax PIR Sensor is an easy to use digital infrared motion sensor module.
  * (http://www.parallax.com/detail.asp?product_id=555-28027)
  *
  * The sensor's output pin goes to HIGH if motion is present. However, even if
  * motion is present it goes to LOW from time to time, which might give the
  * impression no motion is present. This program deals with this issue by
  * ignoring LOW-phases shorter than a given time, assuming continuous motion is
  * present during these phases.
  *
  */

 #include <LiquidCrystal.h>

 int calibrationTime = 10;   // seconds to calibrate PIR
 long unsigned int pause = 5000; // timeout before we "all" motion has ceased
 long unsigned int lowIn; // the time when the sensor outputs a low impulse

 boolean lockLow = true;
 boolean takeLowTime;

 int flashCnt = 4;  // number of times the LCD will flash when there's motion
 int flashDelay = 500; // number of ms to wait while flashing LCD
 int pirPin = 7;    // the digital pin connected to the PIR sensor's output
 int lcdPin = 13;   // pin connected to LCD
 LiquidCrystal lcd(12, 11, 10, 5, 4, 3, 2);

 void setup() {
     // Calibrates the PIR

     Serial.begin(9600);
     pinMode(pirPin, INPUT);
     pinMode(lcdPin, OUTPUT);

     digitalWrite(pirPin, LOW);

     clearLcd();

     // give the sensor some time to calibrate
     lcd.setCursor(0,0);
     lcd.print("Calibrating...");

     for(int i = 0; i < calibrationTime; i++){
         lcd.print(".");
         delay(1000);
     }

     lcd.print("done");
     delay(50);
 }

 void clearLcd() {
     // Clears the LCD, turns off the backlight
     lcd.begin(20, 4);
     lcd.clear();
     digitalWrite(lcdPin, LOW);
 }

 void alertLcd() {
     // Turns on the LCD backlight and notifies user of motion
     lcd.setCursor(2, 1);
     digitalWrite(lcdPin, HIGH);
     lcd.print("INTRUDER ALERT!!");
     lcd.setCursor(2, 2);
     lcd.print("================");
 }

 void loop() {
     // Main execution loop

     if(digitalRead(pirPin) == HIGH) {
         // flash an alert a few times
         for (int c = 0; c < flashCnt; c++) {
             alertLcd();
             delay(flashDelay);
             lcd.clear();
             delay(flashDelay);
         }

         if(lockLow) {
             // makes sure we wait for a transition to LOW before any further
             // output is made:
             lockLow = false;
             delay(50);
         }
         takeLowTime = true;
     }

     if (digitalRead(pirPin) == LOW) {
         clearLcd();

         if (takeLowTime) {
             // save the time of the transition from high to LOW
             // make sure this is only done at the start of a LOW phase
             lowIn = millis();
             takeLowTime = false;
         }

         // if the sensor is low for more than the given pause,
         // we assume that no more motion is going to happen
         if (!lockLow && millis() - lowIn > pause) {
             // makes sure this block of code is only executed again after
             // a new motion sequence has been detected
             lockLow = true;
             delay(50);
         }
     }
 }

Django-Tracking 0.3.5

I've finally gotten around to looking at a bunch of tickets that had been opened for django-tracking in the past year and a half or so. I feel horrible that it's really taken that long for me to get to them! Every time I got a ticket notification, I told myself, "Okay, I'll work on that this weekend." Many have weekends have passed without any work on any of my projects. I'm going to get better about that!

Anyway, several fixes have gone into the latest version of django-tracking. Some have to do with unicode problems (thanks ramusus!). Others have to do with overall performance, while yet others have to do with overall stability.

The first interesting change in this release is that django-tracking no longer relies on the GeoIP Python API. Instead it's now using django.contrib.gis.utils.GeoIP. I had hoped that this would remove the dependency on the GeoIP C API, but it appears that I was mistaken. Oh well.

Perhaps the biggest improvement in this new release is the use of caching. With caching in place, the middleware classes don't slam the database nearly as badly as they used to. There's still more that could be done with caching to improve performance, but I think what I've got now will be a big help.

Another noteworthy change, in my opinion, is the use of logging. I've sprinkled mildly useful logging messages throughout the code so you can learn when something bad happens that is silently handled. I hope that this will help me improve the quality of the code as it will allow anyone who uses the project (and pays attention to the log messages, of course) to tell me when bad things are happening.

Finally, the packaging code has been updated to be much more simple. Version 0.3.5 has been uploaded to PyPI and is available via pip or easy_install. If you prefer to have the latest copy of the code, the official code repositories are (in order of my personal preference):

I can't wait for your feedback!

VirtualBox Slowing Down Mah Linux (gasp!)

As I booted up my laptop tonight, I noticed that things were extremely sluggish. It didn't take long at all to realize just how painfully slow things were. The problem didn't appear to be during boot, as the boot sequence was about as fast as it usually is. No, the speed was attacked as X was being loaded.

It has been a few days since I had time to use my laptop, and I thought I might have had to do a hard shutdown the last time I used it (for whatever reason). This lead me to wonder if it could have been something with the filesystem being corrupted. Usually a reboot solves these sort of problems for me, so that's what I chose to do. Unfortunately, it didn't do the trick. Upon rebooting, the laptop started up fine during boot, but as soon as X started up everything was slow again.

Some time when I was trying to see why my computer was running so slow, I pulled up the system monitor to see if any processes were obviously hoarding the CPU power. There were a few processes that stood out, and they all started with VBox. The next thing I thought of was the last system update I did. It involved a new kernel, and that last system update was done the last time I used my laptop.

It dawned on me that I hadn't recompiled my VirtualBox drivers since I did my system update. I kicked off the usual /etc/init.d/vboxdrv setup command. As soon as it was done, my computer was all the sudden very responsive--the way it usually is.

Django-Articles 2.1.1 Released

I've been working on some neat changes to django-articles recently, and I've just released version 2.1.1. The most noticeable feature in this release is Auto-Tagging. Since I feel like I've described the feature fairly well in the README, I'll just copy/paste that section here.

The auto-tagging feature allows you to easily apply any of your current tags to your articles. When you save an Article object with auto-tagging enabled for that article, django-articles will go through each of your existing tags to see if the entire word appears anywhere in your article's content. If a match is found, that tag will be added to the article.

For example, if you have tags "test" and "art", and you wrote a new auto-tagged Article with the text:

This is a test article.

django-articles would automatically apply the "test" tag to this article, but not the "art" tag. It will only apply the "art" tag automatically when the actual word "art" appears in the content.

Auto-tagging does not remove any tags that are already assigned to an article. This means that you can still add tags the good, old-fashioned way in the Django Admin without losing them. Auto-tagging will only add to an article's existing tags (if needed).

Auto-tagging is enabled for all articles by default. If you want to disable it by default (and enable it on a per-article basis), set ARTICLES_AUTO_TAG to False in your settings.py file.

Auto-Tagging does not attempt to produce any keywords that magically represent the content of your articles. Only existing tags are used!!

I sure had fun programming this little feature. I know it will be particularly useful for my own site.

Another item I'd like to mention about this release: I've finally started using South migrations in this app. This is a move I've been planning to make for quite some time now.

Head on over to http://bitbucket.org/codekoala/django-articles or use pip install -U django-articles (or easy_install django-articles if you must)! Enjoy!

Vim Tip: Global Delete

Today I was asked to help debug a problem with our product's patcher. All of the debug information for the entire product goes into a single log file, and some processes are quite chatty. The log file that contained the information I was interested in for the patcher problems was some 26.5MB by the time I got it.

All of the lines I was interested in were very easy to find, because they contained specific strings (yay). The problem was that they were scattered throughout the log, in between debug output for other processes. At first, I tried to just delete lines that were meaningless for me, but that got old very quickly. This is how I made my life easier using Vim.

It's possible to do a "global delete" on lines that don't contain the stuff you are interested in. The lines I wanted to see contained one of two words, but I'll just use foo and bar for this example:

:g!/\v(foo|bar)/d

This command will look for any line that does not contain foo or bar and delete it. Here's the breakdown:

  • :g - This is the command for doing some other command on any line that matches a pattern
  • ! - Negate the match (perform the pending command on any line that does not contain the pattern)
  • /\v(foo|bar)/ - The regular expression pattern
    • \v - Use of \v means that in the pattern after it all ASCII characters except '0'-'9', 'a'-'z', 'A'-'Z' and '_' have a special meaning (very magic). Basically, it removes the need to escape almost everything in your regex.
    • (foo|bar) - Find either foo or (|) bar
  • d - The command to perform on matching lines, which is delete in this case

So, executing that command in the Vim window with the log file wiped out all of the lines that didn't have my magical keywords in them.

When I showed my co-worker how awesome Vim was, he was mildly impressed, and then he asked, "What about multiline log messages?" My particular case didn't have any multiline messages, but I wanted to figure it out anyway. I haven't been able to figure out an exact method for deleting the lines that don't match, but I have found a way to show only the lines that match:

:g!/\v^".+(foo|bar)\_.{-}^"/p

This command is pretty close to the previous one.

  • :g - Global command on lines that match a pattern
  • ! - Negate the match (seems a little backward this time)
  • /\v^".+(foo|bar)\_.{-}^"/ - The regular expression pattern
    • \v - Very magic
    • ^" - Find a line that starts with a double quote ("). Each of our individual log messages starts with a double quote that is guaranteed to be at the beginning of the line, so this is specific to our environment.
    • .+ - One or more characters between the " and foo or bar
    • (foo|bar) - Find either foo or (|) bar
    • \_.{-}^" - Non-greedy multiline match. Matches any character, including newlines (because of the \_), and continues matching until it reaches the next line that begins with ^". Again, that double quote is specific to our environment. The {-} is what makes this a "non-greedy" match--it's like using *, but it matches matches as few as possible of the preceding atom.
  • p - The command to perform on matching lines, which is print in this case. This brings up a separate little window that displays each match (which is why I mentioned the negation seemed a bit backward to me). Navigation and whatnot in this window appears to be similar to less on the command line.

And there you have it! I hope you find this information as useful as it has been for me!

Quick And Easy Execution Speed Testing

There have been many times when I've been programming, encounter a problem that probably involves a loop of some sort, and I think of two or more possible ways to achieve the same end result. At this point, I usually think about which one will probably be the fastest solution (execution-wise) while still being readable/maintainable. A lot of the time, the essentials of the problem can be tested in a few short lines of code.

A while back, I was perusing some Stack Overflow questions for work, and I stumbled upon what I consider one of the many hidden jewels in Python: the timeit module. Given a bit of code, this little guy will handle executing it in several loops and giving you the best time out of three trials (you can ask it to do more than 3 runs if you want). Once it completes its test, it will offer some very clean and useful output.

For example, today I encountered a piece of code that was making a comma-separated list of an arbitrary number of "%s". The code I saw essentially looked like this:

",".join(["%s"] * 50000)

Even though this code required no optimization, I thought, "Hey, that's neat... I wonder if a list comprehension could possibly be any faster." Here's an example of the contender:

",".join(["%s" for i in xrange(50000)])

I had no idea which would be faster, so timeit to the rescue!! Open up a terminal, type a couple one-line Python commands, and enjoy the results!

$ python -mtimeit 'l = ",".join(["%s"] * 50000)'
1000 loops, best of 3: 1.15 msec per loop
$ python -mtimeit 'l = ",".join(["%s" for i in xrange(50000)])'
100 loops, best of 3: 3.23 msec per loop

Hah, the list comprehension is certainly slower.

Now, for other more in-depth tests of performance, you might consider using the cProfile module. As far as I can tell, simple one-liners can't be tested directly from the command line using cProfile--they apparently need to be in a script. You can use something like:

python -mcProfile script.py

...in such situations. Or you can wrap function calls using cProfile.run():

import cProfile

def function_a():
    # something you want to profile

def function_b():
    # an alternative version of function_a to profile

if __name__ == '__main__':
    cProfile.run('function_a()')
    cProfile.run('function_b()')

I've used this technique for tests that I'd like to have "hard evidence" for in the future. The output of such a cProfile test looks something like this:

3 function calls in 6.860 CPU seconds

Ordered by: standard name

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000    6.860    6.860 <string>:1(<module>)
     1    6.860    6.860    6.860    6.860 test_enumerate.py:5(test_enumerate)
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

This is useful when your code is calling other functions or methods and you want to find where your bottlenecks are. Hooray for Python!

What profiling techniques do you use?

Whoa! Another Reason To Love Vim

I've been struggling with some misconfigured appliances at work for the past couple of days, and I was getting tired of manually diff-ing things. On a whim, I decided to ask Google if there is a better way. Turns out there is, and it uses what I already know and love: VIM. Here's a command that lets you diff two remote file using vimdiff:

vimdiff scp://user@host//path/to/file scp://user@otherhost//path/to/file

This is going to save me so much time! I hope it is as useful to you all as it is to me.