Quick and Dirty Amazon S3 Integration

Unless you’ve been hiding under a rock, you probably know about Amazon S3 by now. As far as I’m concerned, it’s about as cheap and easy to use as cloud-based storage gets. If you’re writing your own application, there’s a very simple API for getting stuff into and out of it. If you’ve got a baked application, it’s a bit more complicated. Sometimes you want to take advantage of the cost savings and scalability of S3, but can’t (or don’t want to) modify the web application to use the s3 API directly.

I came up with a quick and dirty workaround for this by installing a little utility called s3cmd to move the files for me. Here are the steps I took. I’m using Ubuntu on this server, but this should really work with minor adaptation on any Unix variant.

The basic theory of operation is this. The files in a local directory will get synced to s3, then your site will redirect (via .htaccess) visitors to the s3 files instead of the local files.

  1. Have the same setup as me: Apache under Ubuntu.
  2. Set up an s3 account.
  3. Install s3cmd:
    sudo apt-get install s3cmd
  4. Configure s3cmd. It will prompt you for your API keys, which you can get from your account page.
    s3cmd --configure
  5. Create a bucket:
    s3cmd mb s3://mah-bucket
  6. Create a script somewhere called s3sync.sh (or whatever you like). Be sure to change “mah-bucket” to something more meaningful.
    #!/bin/sh
    s3cmd sync --acl-public /path/to/local/content/folder/ s3://mah-bucket
  7. Make sure your script is executable:
    chmod 0755 s3sync.sh
  8. Run your script. All your images will be copied to your s3 bucket:
    ./s3sync.sh
  9. Make sure Apache has mod_rewrite turned on, then edit your .htaccess file so it includes the lines below. Don’t forget to change “mah-bucket” to whatever you called your bucket in step 5. Also change “url/path/to/folder/” to the URL path of the folder from step 6, and “uploads/” to wherever those files ended up in your bucket. (Note the presence of “^” and the lack of a slash at the beginning.)
    <IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /
    RewriteRule ^url/path/to/folder/(.*) http://mah-bucket.s3.amazonaws.com/uploads/$1 [R,L]
    </IfModule>
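
One way to confirm the rewrite rule is doing its job is to request a file through your own domain and look at the Location header that comes back. The sketch below is only an illustration: redirect_target is a helper name made up for this example, and www.example.com and photo.jpg are placeholders for your own site and one of your synced files.

```shell
#!/bin/sh

# Print the value of the Location header from HTTP response headers on stdin.
redirect_target() {
    # Strip carriage returns, then match the header name case-insensitively.
    tr -d '\r' | awk 'tolower($1) == "location:" { print $2; exit }'
}

# Example (placeholder URL): send a HEAD request and show where it redirects.
#   curl -sI "http://www.example.com/url/path/to/folder/photo.jpg" | redirect_target
```

If the rule is working, the output should be a URL on mah-bucket.s3.amazonaws.com. No output at all means the server did not send a redirect.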

That’s it! Your site will now redirect requests for the original files to your s3 bucket. The only caveat is that you will need to run your sync script every time you upload new files so they get copied to s3. I run mine manually via ssh, but this could become a pain if there were a lot of new files. One option would be to cron it to run every few minutes, but bear in mind that s3 is billed by the request and it will add up over time. A better way of auto-syncing might be to write a script that polls your uploads folder for changes, then calls your s3 script if it finds something different.

Here’s a modified s3sync.sh that uses md5deep to see if the folder contents have changed. You can cron this as often as you like and it will only talk to s3 if there are actually changes.

  #!/bin/sh

  # Sync LOCAL_DIR to s3 only when md5deep reports new or changed files.
  HASH_FILE=/path/to/home/dir/hashes.txt
  HASH_DIFF_FILE=/path/to/home/dir/tmp_diff_hashes.txt
  LOCAL_DIR=/path/to/local/content/folder
  S3_BUCKET=s3://mah-bucket/

  # First run: record a baseline hash for every file.
  if [ ! -f "$HASH_FILE" ]
  then
    printf '\nCreating new hash file %s\n\n' "$HASH_FILE"
    md5deep -rl "$LOCAL_DIR" > "$HASH_FILE"
  fi

  # Clear out any stale diff left over from a previous run.
  rm -f "$HASH_DIFF_FILE"

  # -x lists only the files whose hashes are NOT in the known list.
  md5deep -x "$HASH_FILE" -r "$LOCAL_DIR" > "$HASH_DIFF_FILE"

  # A non-empty diff means something changed: sync, then rebuild the baseline.
  if [ -s "$HASH_DIFF_FILE" ]
  then
    s3cmd sync --acl-public "$LOCAL_DIR" "$S3_BUCKET"
    md5deep -rl "$LOCAL_DIR" > "$HASH_FILE"
  fi

  rm -f "$HASH_DIFF_FILE"
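
If you do want to cron it, something like the following could add the entry for you. This is a sketch under assumptions: cron_line and install_cron_line are names invented for this example, the five-minute interval is arbitrary, and /path/to/s3sync.sh stands in for wherever you saved the script.

```shell
#!/bin/sh

# Build a crontab line that runs a script every N minutes.
cron_line() {
    minutes="$1"
    script="$2"
    printf '*/%s * * * * %s\n' "$minutes" "$script"
}

# Append that line to the current user's crontab, keeping existing entries.
install_cron_line() {
    ( crontab -l 2>/dev/null; cron_line "$1" "$2" ) | crontab -
}

# Example (not run automatically):
#   install_cron_line 5 /path/to/s3sync.sh
```

Running install_cron_line twice will add the entry twice, so check crontab -l afterwards.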

And that’s that! I hope someone finds this useful. If you did, let me know!

Deployment-First Development

In the ever-shifting landscape that is developing web applications, there are a few things that have remained relatively constant:

  1. It will always take longer to develop an application than anyone involved thinks it will.
  2. No matter when your deadline is, you will always run short on time towards the end.
  3. No matter what project management methodology you use (or what software supports it), at crunch time, people will resort to sticking Post-it notes to your monitor. (…but that’s another show.)
  4. No piece of software will ever be perfect, but any “extra time” will generally be eaten by trying to make it such.
  5. Deployment (putting the application into production) will always be a rush job done at the last minute.

The first four things are more or less laws of nature as far as I am concerned. No matter how hard everyone works or how on the ball everyone is, software is hard to write and it’s even harder to say how long it will take. What isn’t a constant, though, is that last one, despite the fact that most people do it that way. In general, there is a fair amount of wiggle room in the area of what order things get done in.

Generally, when you’re building an app, the minute you have a loose idea of what’s getting built you want to build something and get it in front of people so they can start picking it apart. This idea, in fact, is at the core of a lot of very well-meaning project management methodologies. (I’m looking at you, Agile.) The sooner you can show something working, the better the people who sign the checks will feel about your project. (At least at the beginning, anyway.)

The flaw in this approach is that it gives everyone a false sense of security about the schedule. Even as work proceeds at a brisk pace, most people ignore that most crucial of steps: getting the thing to work in “real life”. Deployment is left for the last possible second, when the project isn’t quite done, but must be pushed live. There’s generally a lot of hand-wringing and brow-mopping and a lot of impatient people at this stage. (A good PM might even build in “go live” time for this, but it will always be eaten up by last-minute development.) In the worst case, there will be no actual deployment strategy and someone from the development team will end up dragging files from one machine to another, manually. Things that worked perfectly on all the developers’ machines will break catastrophically on the live servers. Someone will have forgotten to purchase a license for some critical piece of software the app needs to run. The firewall ports that the web server uses to talk to the database server won’t be open, and neither will the help desk of the hosting company. All of this is, of course, terrible. To paraphrase Scott Hanselman, “You get to drag files from your local machine to a server for deployment once. After that, if you’re not typing ‘deploy’ somewhere to push your application, YOU ARE STUPID.”

The second worst case is that an already under-the-gun developer will be pulled off some other very important task (you know, like, finishing the damned thing) to crap out some way of deploying the app to production. It will often end up being a brittle, undocumented kludge that some poor intern will have to sift through later. Everyone will hate it and it will reflect poorly on the team that spent six months developing an otherwise amazing application.

Why is deployment (and the setup therein) such a loathed task? I would say it’s because it’s a necessary but highly unglamorous task. It’s also one that very few people will get. Show your client the cool new feature you just added to the image gallery and they’ll buy you a pizza. Show them your one-button application deployment and they’ll give you a blank stare. For this reason, deployment has always been a last-minute rush job.

I would like to propose a better way. Deploy first. By “first”, I mean that as soon as you have checked the very skeleton of your application into your version control system, figure out how and where and by what method you will put the thing into a production environment. Then do it. Do it a bunch of times. Get it so that it’s a completely natural, totally painless process that you don’t even have to think about. While it’s my opinion that a deployment-first strategy is good practice for any project, I think it’s absolutely critical to any project using a new or unfamiliar development platform. (Also, as an aside, if you do not have a version control system, please stop reading this. Clean out your desk, leave your ID card with security, walk out of the building and don’t look back. Seriously. Shame on you.)

To deploy your app, you will generally need to take care of all (or most) of the following:

  • setup and configuration of production machines (be they real or virtual)
  • installation of prerequisite software for your application (web servers, application platforms, RDBMSes, etc.)
  • load balancing
  • network configuration
  • development of a deployment system

If you get this all squared away first, you can merrily write code up until the last possible second, because when you’re done, you push a button and the site goes live. Maybe there will be a speed bump or two in that process, but you won’t be starting from scratch.

Make no mistake. Writing software may be hard, but deployment is always harder. It also (generally) involves coordinating lots of different people, often at different companies, many of whom will not share your sense of urgency. Deploying first exposes a bunch of issues that you can square away long before they become emergencies.

That last item on my list there is often a source of confusion for a lot of people. Why would you “develop” a deployment system? There are a lot of tools that do this already, right? Well, sure, but you have to set them up and configure them, and in most cases, write scripts that will tell them what to do. This is often a lot less trivial than the makers of those tools would have you believe. Take the time and figure out what you’re going to use and how you’re going to use it before you even really start developing.

As for how you go about building deployment systems, there are lots and lots of options. The best one for your project will largely depend on what you’re using. In most cases, you can get your build tool (Ant, Maven, NAnt, Capistrano, etc.) to do it for you. In a lot of cases this can get overly complicated, though. My personal favorite way is to write a short, project-specific shell script (though I recently started using Python for this) with good inline comments. Whatever you use, here’s a loose set of steps for any deployment script:

  • Pull a copy of the latest code from the version control system (never deploy off of your current development tree)
  • Increment the version number (I like using a text file called VERSION in the root of my project)
  • Tag (or whatever your VCS calls it) the revision with your version number
  • Copy required libraries into the build tree
  • Create a compressed archive of your build
  • Transfer the archive to the production system
  • Unpack the archive into the right place (remotely, perhaps with passwordless SSH or PowerShell; some build tools have this functionality as well)
  • Perform any required administration tasks that a new build would need (update database structure, restart web server, etc.)
  • Email a notification that a build was pushed out
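
The steps above might be sketched as a shell script along these lines. Treat it as a starting point under assumptions, not a finished tool: it assumes Git, a VERSION file holding plain numbers like 1.2.3 (no leading zeros), and passwordless SSH to the server. REPO_URL, DEPLOY_HOST, DEPLOY_DIR, the “myapp” name, and the bump_version and deploy functions are all hypothetical placeholders.

```shell
#!/bin/sh
# Deployment sketch. All names and hosts here are hypothetical placeholders.
REPO_URL="${REPO_URL:-git@example.com:me/myapp.git}"
DEPLOY_HOST="${DEPLOY_HOST:-deploy@prod.example.com}"
DEPLOY_DIR="${DEPLOY_DIR:-/var/www/myapp}"

# Increment the last dot-separated number in a VERSION file
# (1.2.9 becomes 1.2.10) and print the new version.
bump_version() {
    version_file="$1"
    old=$(cat "$version_file")
    case "$old" in
        *.*) new="${old%.*}.$(( ${old##*.} + 1 ))" ;;
        *)   new=$(( $old + 1 )) ;;
    esac
    printf '%s\n' "$new" > "$version_file"
    printf '%s\n' "$new"
}

# Runs in a subshell so that "set -eu" and "cd" do not leak into the caller.
deploy() (
    set -eu
    work=$(mktemp -d)

    # Fresh checkout, never the tree you are developing in.
    git clone --quiet "$REPO_URL" "$work/src"
    cd "$work/src"

    # Bump the version, then commit and tag it.
    version=$(bump_version VERSION)
    git commit --quiet -am "Release $version"
    git tag "v$version"
    git push --quiet origin HEAD "v$version"

    # Build a compressed archive of the tagged tree.
    archive="myapp-$version.tar.gz"
    git archive --format=tar.gz -o "$work/$archive" "v$version"

    # Copy it to production and unpack it over SSH.
    scp -q "$work/$archive" "$DEPLOY_HOST:/tmp/$archive"
    ssh "$DEPLOY_HOST" "mkdir -p '$DEPLOY_DIR' && tar -xzf '/tmp/$archive' -C '$DEPLOY_DIR'"

    rm -rf "$work"
    echo "Deployed version $version to $DEPLOY_HOST"
)

# Only deploy when asked explicitly: sh deploy.sh deploy
if [ "${1:-}" = "deploy" ]; then
    deploy
fi
```

Copying in libraries, running database updates, restarting the web server, and sending the notification email are left out here because they depend entirely on your stack. They would slot in around the ssh step.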

You may need more steps, you may need fewer. Whatever you can reasonably automate, do it. Every step you make automatic will be one less manual headache at the 11th hour. Also, the more manual steps you remove, the more you reduce the likelihood of human error and dependency on a particular human to do deployment.

I hope this will inspire you to think about incorporating deployment-first development into your next project. I can tell you that doing things this way has greatly improved my life as a developer, especially at crunch time.

The Hundred Pound Countdown

Here’s a picture of me, taken a few days ago:

Here’s a picture of me taken about a year and a half ago:

The first picture is the heaviest I’ve ever been. 280 pounds. The second picture is the thinnest I’ve been in recent history, probably about 240. I was still overweight, but not nearly as bad as I am now.

Why am I telling you this? Because today is the beginning of a new era for me. One in which I am not fat. One where I do not get exhausted after half an hour of moderate activity. One where my back doesn’t hurt all the time. One where my clothes fit. One where I am not cruising headlong for Type 2 Diabetes and early death. I am telling you this because I am going to use fear of public humiliation as my primary weight-loss tool.

The Hundred Pound Countdown

As previously stated, I weigh almost 300 pounds. I really should weigh more like 200 pounds. That’s a difference of (wait for it) like 100 pounds. Just for the sake of argument, let’s call my goal weight 200 pounds, meaning I’ve got to lose like 80. Okay, so not quite 100, but “eighty pound countdown” doesn’t have the same ring to it. So, inspired by this guy, here’s my plan:

  1. I will get down to 200 lbs and stay there.
  2. I will weigh myself every Monday and post it on Twitter. (#100lbcountdown)
  3. I will take a picture of myself every Monday and post it on Twitter. (#100lbcountdown)
  4. I will only snack on fruits and vegetables.
  5. I will eat three meals a day, and they will be awesome and delicious.
  6. I will not eat after dinner.
  7. I will not eat crap in the form of processed food or fast food.
  8. I will walk outside for no less than 30 minutes every day.
  9. I will take a fiber supplement (because, well, you know.)
  10. If I break a rule, I will post the infraction on Twitter. (#100lbcountdown)

Do Me a Favor

Don’t “encourage” me. F@%k encouragement. What I need right now is derision. Tell me what a fat, unhealthy slob I am. Mock me mercilessly when I cock this up (as I inevitably will). This isn’t something fun I’m doing just for the hell of it. I’m fat and sad and I don’t want to live with it or die from it. Here’s hoping my ego is bigger than my paunch.

Fridgetoons: Magnets and Markers

I seem to have a sickness that causes me to start new web projects. The latest symptom of this is something I’m calling fridgetoons. Essentially, I have a bunch of magnetic letters on my fridge next to a whiteboard. My housemates come along and arrange the letters into a caption. I then illustrate the caption on the whiteboard. I take a picture of the results and post them on the Internet. Once in a while, they end up being funny.

It’s been a pretty satisfying project so far, mainly because I can draw them quickly and they aren’t “precious”. My earlier attempts at comics failed because it took me more than a week to do each page. I suppose if the outcomes were awesome, I could have justified it (there are those that can) but the results were always sub-par. These comics, on the other hand, are intentionally sub-par, which is somehow psychologically liberating, resulting in more output. This is all to say that I’ve produced quite a few of them in a pretty short period of time. There’s even an accidental guest strip by a totally famous web cartoonist who happened to be getting something out of my fridge.

I Can’t F#%king See: Part II

Well, the votes are in and I have returned from the optometrist and spectacle monger. I more or less went with #5 from the poll. It’s like my grandfather always said, “Don’t buy glasses until you ask the Internet.” Okay, he never said that, but he would have if he had been around for the Internet. As it stands, he sort of just missed it.

I really like how these look, if I do say so myself. Some kind of weird cross between Henry Rollins in Johnny Mnemonic and Ray Smuckles in every Achewood strip since 2004. I guess I most closely resemble M.C. Serch, which is pretty rad, considering that he has basically been my hero since middle school. He is, even still. Even after The (White) Rapper.

I suppose that while this is clearly a win for fashion, the really awesome part is that I can actually use a computer without getting a pounding headache.

So thanks for telling me which pair of glasses to buy, Internet! I won’t forget this!