2007-06-23

Things I do when I’m bored, don’t feel like working, can’t leave the office, and have a computer.

Posted in Mildly Depressing at 22:42:06 by streetprogramming

1.) Draw obscene images in Firefox via optimoz

2.) Check my email, even though I know nobody has emailed me in the last five seconds

3.) Ask people if they want to video chat, then ignore them when they respond

4.) Read through my RSS headlines, even though I know nothing has changed in the past five seconds

5.) Write annoying posts in annoying but useful blogs

2007-05-15

AOL CSS Parser

Posted in Practical at 20:36:05 by streetprogramming

A little validation goes a long way …

A client called to mention that their site looked like it had been up all night snorting cocaine with our President when viewed in AOL. After complaining a good deal about AOL, who uses that crap anymore, and so forth, I decided to check it out. As it turns out, AOL does seem to embed Internet Explorer as a COM component. Unfortunately, it does not appear to use IE’s CSS parser.

After more time than I’m willing to admit to, it was pointed out that the page did not appear to be using all of the CSS files it specified. Validating those CSS files turned up the culprit, a single character at the end of a file that ruined the entire site: *.
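A real validator is the right tool here. As a much cruder stand-in, the sketch below flags any non-whitespace text left after the last closing brace of a stylesheet, which is exactly where that stray * was hiding. trailing_junk is a hypothetical helper, not part of any library, and it is a heuristic only: it strips /* */ comments with a regex and knows nothing else about CSS syntax.

```perl
use strict;
use warnings;

# Crude heuristic, NOT a CSS validator: report any non-whitespace text that
# follows the last closing brace of a stylesheet once comments are removed.
sub trailing_junk {
  my ($css) = @_;
  $css =~ s{/\*.*?\*/}{}gs;            # strip /* ... */ comments
  my $last = rindex( $css, '}' );      # position of the final closing brace
  return if $last < 0;                 # no rule blocks at all; nothing to say
  ( my $tail = substr( $css, $last + 1 ) ) =~ s/\s+//g;   # ignore whitespace
  return length($tail) ? $tail : undef;
}

my $sheet = "body { color: black; }\n*";
my $junk  = trailing_junk($sheet);
print defined $junk ? "stray text after last rule: '$junk'\n" : "looks clean\n";
```

Run against the sample sheet it reports the lone *; a file ending in a closing brace, whitespace or a comment comes back clean.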

2007-03-03

Unique URIs

Posted in Practical at 17:34:03 by streetprogramming

Here’s an obscurity: if you’re acquainted with Perl’s URI package, you might have written code such as the following:

use URI;

my $base = URI->new( 'http://some.com/folder/' );
my $uri = URI->new_abs( '../file.xml', $base );

print "$uri\n";

http://some.com/file.xml

The relative path is resolved against the base URI: ../file.xml climbs out of folder/ and lands at the top of the site. This is very useful for following relative links on a site, since the resolution is handled for you. But there is odd behavior afoot. Consider this:


use URI;

print URI->new("../../../foo")->abs("http://some.com/deep/folder/"), "\n";

http://some.com/../foo

Note how the .. is kept in the absolute URI. Now consider this example, taken right from the URI POD:


use URI;

print URI->new("../../../foo")->abs("http://some.com/deep/folder/"), "\n";

$URI::ABS_REMOTE_LEADING_DOTS = 1;

print URI->new("../../../foo")->abs("http://some.com/deep/folder/"), "\n";

http://some.com/../foo

http://some.com/foo

Cool! It took out the extra .., effectively normalizing the URI. But wait:

use URI;

$URI::ABS_REMOTE_LEADING_DOTS = 1;

my $uri = URI->new( 'http://some.com/folder/../file.xml' );

print "$uri\n";

http://some.com/folder/../file.xml

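Why didn’t this normalize the URL? I haven’t dug into the URI.pm code to confirm it, but the URI documentation presents $URI::ABS_REMOTE_LEADING_DOTS as a setting for the abs() method, so it has no effect on a URI built with new. A URI instantiated directly seems to be left mostly intact, in the hope that the user knows what they are doing.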

Now why is this a problem? Imagine writing a robot that needs to traverse an entire site but should visit each link only once. You and I can tell that http://some.com/folder/../file.xml and http://some.com/file.xml are the same, but how do we get our storage mechanism (likely an associative array, a hash in Perl) to treat them as the same key?

Usually this isn’t a problem, because most sites use ordinary relative links such as ../file.xml, which resolve cleanly against the page they appear on. But what if a site has http://some.com/folder/../file.xml as a link? Don’t say it won’t happen; I have found such a site, and it pains me to no end.
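To make the problem concrete, here is a minimal sketch of the usual visit-once bookkeeping, a hash keyed on the raw link text. The link list is made up for illustration. Exact repeats are caught, but the two spellings of the same file both count as new.

```perl
use strict;
use warnings;

# Both links name the same file, but a hash keyed on the raw link text
# cannot tell, so each spelling looks new the first time it shows up.
my @links = (
  'http://some.com/file.xml',
  'http://some.com/folder/../file.xml',
  'http://some.com/file.xml',
);

my ( %seen, @visited );
for my $link (@links) {
  next if $seen{$link}++;    # catches exact repeats only
  push @visited, $link;
}

print "visit $_\n" for @visited;
```

The robot would fetch file.xml twice. Normalizing each link before it is used as a hash key is what closes that gap.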

The solution then?


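sub normalize_uri {
  my $uri = shift();

  # Drop '.' segments (they are meaningless) and walk the path backwards.
  my @segments = reverse( grep { $_ ne '.' } $uri->path_segments );
  my @new_segments;

  # Number of '..' segments seen that have not yet cancelled a segment.
  # A counter (not a flag) also handles runs such as a/b/../../c.
  my $skip = 0;
  for my $segment ( @segments ) {
    if ( $segment eq '..' ) {
      $skip++;
      next;
    }

    if ( $skip ) {
      $skip--;
      next;
    }

    # unshift restores the original order as we go.
    unshift( @new_segments, $segment );
  }

  $uri->path_segments( @new_segments );

  return $uri;
}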

Notice that we start by stripping every . segment from the path; a . refers to the current directory, so it contributes nothing. Next, we reverse the path segments so that .. is easier to handle: walking backwards, each .. we meet means that one of the segments still ahead of us must be dropped. Finally, there is no need to reverse the array again if we use unshift instead of push, that is, add each kept segment to the head of the list instead of the tail.
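The same job can also be done walking forwards with a stack, in the spirit of the remove_dot_segments step of RFC 3986 (section 5.2.4), though this is a sketch and not a line-by-line port. It works on a plain path string, so it needs only core Perl. remove_dot_segments here is a hypothetical helper, and it assumes an absolute path (one starting with /, as URI’s path method returns for an http URI with a non-empty path); relative paths are not handled the RFC way.

```perl
use strict;
use warnings;

# Remove '.' and '..' segments from an ABSOLUTE path string using a forward
# walk and a stack. Assumes the path starts with '/' (or is empty).
sub remove_dot_segments {
  my ($path) = @_;
  my @in = split m{/}, $path, -1;    # -1 keeps empty trailing segments
  my @out;
  for my $i ( 0 .. $#in ) {
    my $seg = $in[$i];
    if ( $seg eq '.' || $seg eq '..' ) {
      # '..' drops the previous segment but never the leading '' (the root).
      pop @out if $seg eq '..' && @out > 1;
      # A trailing '.' or '..' still names a directory: keep the final slash.
      push @out, '' if $i == $#in;
      next;
    }
    push @out, $seg;
  }
  return join '/', @out;
}

print remove_dot_segments('/folder/../file.xml'), "\n";
print remove_dot_segments('/a/b/../../c'), "\n";
```

This prints /file.xml and then /c. On a URI object it could be applied as $uri->path( remove_dot_segments( $uri->path ) ). Walking forwards means each .. simply pops whatever came before it, so no skip bookkeeping is needed.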

The Proof


#!/usr/bin/perl

use strict;
use warnings;

use URI;

my %tests = (
  'http://toplevel.com/./index.php/folder/deep/../file.xml' => 'http://toplevel.com/index.php/folder/file.xml',
  'http://some.net/../folder/../deep/../file.xml' => 'http://some.net/file.xml',
  'http://some.com/a/b/../../c.xml' => 'http://some.com/c.xml',
  'http://www.place.com' => 'http://www.place.com',
  'http://www.place.com/to/rest/and/eat.html' => 'http://www.place.com/to/rest/and/eat.html'
);

foreach ( sort keys %tests ) {
  my ( $is, $should_be ) = ( $_, $tests{ $_ } );
  my $uri = URI->new( $is );
  my $was = normalize_uri( $uri );

  if ( $was ne $should_be ) {
    print STDERR "$was is not $should_be\n";
  }
  else {
    print "$was == $should_be\n";
  }
}

sub normalize_uri {
  my $uri = shift();
  # Drop '.' segments and walk the path backwards.
  my @segments = reverse( grep { $_ ne '.' } $uri->path_segments );
  my @new_segments;

  # Pending '..' count; each one cancels the next real segment.
  my $skip = 0;
  for my $segment ( @segments ) {
    if ( $segment eq '..' ) {
      $skip++;
      next;
    }

    if ( $skip ) {
      $skip--;
      next;
    }

    unshift( @new_segments, $segment );
  }

  $uri->path_segments( @new_segments );

  return $uri;
}

Plugs & Shoutouts

A shameless plug to Carousel 30, who pays the bills.

A shout out to Red Tree Systems, LLC (we’ll explore why they have street cred in a later installment).

Ghetto Java keeps it real, and has a much stronger focus than I do.

2007-02-17

CURL Ups

Posted in Practical at 00:26:02 by streetprogramming

Let me put this bluntly: if you’re a web developer without CURL in your arsenal, you’re weak. You’ll get eaten alive out there, kid. Missing this part of your training probably means that you’re missing out on a lot of the lower-level details of HTTP, and possibly of networking in general, even TCP/IP. Not that you need all of this information, of course, but it does make your understanding a lot deeper, and will therefore allow you to solve a much greater range of problems.

The website http://curl.haxx.se has the following words to describe it:

curl is a command line tool for transferring files with URL syntax, supporting FTP, FTPS, HTTP, HTTPS, SCP, SFTP, TFTP, TELNET, DICT, FILE and LDAP. curl supports SSL certificates, HTTP POST, HTTP PUT, FTP uploading, HTTP form based upload, proxies, cookies, user+password authentication (Basic, Digest, NTLM, Negotiate, kerberos…), file transfer resume, proxy tunneling and a busload of other useful tricks.

That’s an understatement using entirely too many words. CURL is many things, but here it is our tool for testing and inspecting low-level details such as headers and cookies. It can also be used, wget-style, to download remote files.

An easy example – Remote Viewing

You know about cat, right? RIGHT? Well, here’s a simple rcat, or remote cat:

$> curl http://www.google.com

CURL, with a minimal number of arguments, simply prints the body of the response. In this case, the HTML for Google’s home page is returned in an ugly format. But CURL can do so much more. Let’s see how we can check out the full response from Google:

$> curl --include http://www.google.com
HTTP/1.1 200 OK
Cache-Control: private
Content-Type: text/html
Set-Cookie: PREF=ID=54955a80f222999f:TM=1171683757:LM=1171683757:S=5inJ1k22Or-gt3sO; expires=Sun, 17-Jan-2038 19:14:07 GMT; path=/; domain=.google.com
Server: GWS/2.1
Transfer-Encoding: chunked
Date: Sat, 17 Feb 2007 03:42:37 GMT

Note that this time we got the response headers in addition to the body (only the headers are shown above). We can see that Google’s server gives us the 200 OK response code and is serving text/html. It wants to set a cookie that won’t expire for decades, and it tells us what it believes to be the current date, in GMT.
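If you want to pick a header block like that apart in a script, here is a minimal core-Perl sketch. parse_response_head is a hypothetical helper, not a library function. It assumes one field per line, ignores obsolete folded continuation lines, and keeps only the last value of a repeated field such as Set-Cookie.

```perl
use strict;
use warnings;

# Split a raw HTTP response header block into the status line parts and a
# hash of header fields (names lowercased, since they are case-insensitive).
sub parse_response_head {
  my ($text) = @_;
  my @lines = split /\r?\n/, $text;
  my $status_line = shift @lines;
  my ( $version, $code, $reason ) = $status_line =~ m{^(HTTP/\d\.\d) (\d{3}) ?(.*)$}
    or die "not an HTTP status line: $status_line\n";
  my %headers;
  for my $line (@lines) {
    last if $line eq '';    # a blank line ends the header block
    my ( $name, $value ) = $line =~ /^([^:]+):\s*(.*)$/ or next;
    $headers{ lc $name } = $value;    # a repeated field overwrites earlier ones
  }
  return ( $version, $code, $reason, \%headers );
}

my $head = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nTransfer-Encoding: chunked\r\n\r\n";
my ( $version, $code, $reason, $h ) = parse_response_head($head);
print "$code $reason, type $h->{'content-type'}\n";
```

For the sample block this prints 200 OK, type text/html.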

Dig the reverse:

$> curl --verbose http://www.google.com
* About to connect() to www.google.com port 80
* Trying 216.239.37.99... * connected
* Connected to www.google.com (216.239.37.99) port 80
> GET / HTTP/1.1
User-Agent: curl/7.13.1 (powerpc-apple-darwin8.0) libcurl/7.13.1 OpenSSL/0.9.7l zlib/1.2.3
Host: www.google.com
Pragma: no-cache
Accept: */*
< HTTP/1.1 200 OK
< Cache-Control: private
< Content-Type: text/html
< Set-Cookie: PREF=ID=71dceb8afa870409:TM=1171685077:LM=1171685077:S=006IFHwoAhP5YKnt; expires=Sun, 17-Jan-2038 19:14:07 GMT; path=/; domain=.google.com
< Server: GWS/2.1
< Transfer-Encoding: chunked
< Date: Sat, 17 Feb 2007 04:04:37 GMT

This time we can see what we sent to the server, as well as what it sent back in addition to the response body. Note the information we send about ourselves; this is not unlike what your browser or any other user agent passes along. User-Agent describes the agent making the request. The Host header names the site, by domain name, that you want to reach. This is what makes “virtual” hosting possible: several names served from the same IP address.
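To see how little text that request really is, here is a sketch that builds a minimal HTTP/1.1 GET request, roughly the one in the verbose transcript. build_get_request is a hypothetical helper; it only builds the string and does not send it. Extra header lines, such as a custom token header, can be passed through as-is.

```perl
use strict;
use warnings;

# Build the text of a minimal HTTP/1.1 GET request. HTTP/1.1 requires the
# Host header, which is what lets one IP address serve many names.
sub build_get_request {
  my ( $host, $path, @extra ) = @_;    # @extra: optional "Name: value" lines
  my @lines = (
    "GET $path HTTP/1.1",
    "Host: $host",                      # bare host name, no "http://"
    'Accept: */*',
    @extra,
  );
  # Every line ends in CRLF, and an empty line ends the header block.
  return join( "\r\n", @lines ) . "\r\n\r\n";
}

print build_get_request( 'www.google.com', '/' );
```

Writing that string to port 80 of the host (for example with the core IO::Socket::INET module) is essentially what curl does for a plain http:// URL, minus the User-Agent and Pragma lines it adds.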

Wow. Cool. More stuff.

$> curl --trace TRACE.txt http://www.google.com >/dev/null

Forget about the output this time. Check out the bad-ass TRACE.txt file. It shows you everything CURL is doing, which becomes important when you start using libcurl in your own apps. What I find especially interesting are the chunked reads.
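Those chunked reads come from the Transfer-Encoding: chunked header seen earlier. libcurl undoes the chunking for you, but the format itself is simple: a hex size line, the data, a CRLF, repeated until a zero-size chunk. A minimal decoding sketch follows; decode_chunked is a hypothetical helper that ignores chunk extensions after ; and skips any trailers.

```perl
use strict;
use warnings;

# Decode a "Transfer-Encoding: chunked" body: each chunk is a hex size line,
# CRLF, that many bytes, CRLF; a chunk of size zero ends the body.
sub decode_chunked {
  my ($raw) = @_;
  my $body = '';
  my $pos  = 0;
  while (1) {
    my $eol = index( $raw, "\r\n", $pos );
    die "truncated chunk size line\n" if $eol < 0;
    my ($hex) = substr( $raw, $pos, $eol - $pos ) =~ /^([0-9A-Fa-f]+)/
      or die "bad chunk size line\n";
    my $size = hex $hex;
    last if $size == 0;                 # last chunk; trailers are ignored
    $pos = $eol + 2;                    # start of the chunk data
    die "truncated chunk data\n" if $pos + $size > length $raw;
    $body .= substr( $raw, $pos, $size );
    $pos += $size + 2;                  # skip the data and its CRLF
  }
  return $body;
}

print decode_chunked("4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"), "\n";
```

The sample prints Wikipedia: two chunks of 4 and 5 bytes glued back together.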

How about a custom header?

$> curl --verbose --header 'X-MyApp-Token: 23fa3af3af3eda3efa3f' http://localhost/
* About to connect() to localhost port 80
* Trying ::1... * connected
* Connected to localhost (::1) port 80
> GET / HTTP/1.1
User-Agent: curl/7.13.1 (powerpc-apple-darwin8.0) libcurl/7.13.1 OpenSSL/0.9.7l zlib/1.2.3
Host: localhost
Pragma: no-cache
Accept: */*
X-MyApp-Token: 23fa3af3af3eda3efa3f

You can see that we’re passing a custom header, in the form <name>: <value>, to the server. This might be useful when communicating with a third party, for passing along a token or some other identifying piece of information.

Well, there’s lots more, and we haven’t even scratched the surface, but your ignorance sickens me. I must go. Take your time on this. Marinate. Digest. One of these days I’ll show you how to hijack a session, presumably yours. It’s more than just childish pranks – it’s useful.

2007-02-07

The Importance of 304

Posted in Abstract at 21:28:02 by streetprogramming

When we left our static websites behind last decade, we lost a lot of things we had come to take for granted, things like the basic HTTP protocol. All too often we write our applications without a thought for caching and performance. One step toward improving this is the response code 304.

We’re all familiar with HTTP response codes in some form. Even a lot of non-technical people know what it is to receive a 404 (which we can also blame on our industry). A lot of us have received a 401 or 403 at times, and ThinkGeek even sells humorous undergarments with these codes branded on them (413, for example).

So What’s 304?

The HTTP/1.1 specification (RFC 2616, also available from the W3C at http://w3c.org) defines HTTP 304 as “Not Modified”. Appropriate! It goes on to say: “If the client has performed a conditional GET request and access is allowed, but the document has not been modified, the server SHOULD respond with this status code.”

The conditional request referred to here relies on a very common header that your browser sends transparently: If-Modified-Since. It carries a date, expressed in GMT, that identifies the copy in the local cache; normally this is simply the Last-Modified value the server sent along with that copy.

Responding to an If-Modified-Since request consists of checking whether the time sent along is greater than or equal to the last time the requested resource was modified. If it is, the client’s copy is still current, and it is simply a matter of sending back the response code “HTTP/1.1 304 Not Modified”, with no body.

Note that this is an entirely moot point if you are not sending the user agent a Last-Modified header and/or an Expires header to begin with. The user agent will not retain a local copy unless it knows that it is cacheable, and the date it sends back in If-Modified-Since normally comes from the Last-Modified header it received.

How Abstract. Let’s see an example.

Since this is a discussion on HTTP headers, you can follow along with your favorite language. For simplicity’s sake the following example is presented using PHP.

<?php

// Request headers as sent by the user agent (available when PHP runs under Apache).
$headers = apache_request_headers();

// The client's cached timestamp, or false if it sent none or it won't parse.
$ifModifiedSince = ( isset( $headers[ 'If-Modified-Since' ] )
                     ? strtotime( $headers[ 'If-Modified-Since' ] )
                     : false );

// When this script was last modified; stands in for "when the content changed".
$ourModificationTime = getlastmod();

header( "Last-Modified: " . gmdate( "D, d M Y H:i:s", $ourModificationTime ) . " GMT" );
header( 'Expires: ' . gmdate( "D, d M Y H:i:s", ( time() + 86400 ) ) . " GMT" );

// Not modified since the client's copy: tell it to use its cache, send no body.
if ( $ifModifiedSince !== false
     && ( $ourModificationTime <= $ifModifiedSince ) )
{
  header( 'HTTP/1.1 304 Not Modified' );
  exit( 0 );
}

header( 'Content-Type: text/plain' );

print 'This is short text, but imagine if it were long, or binary content.';

?>

The first thing to note is that we’re directly accessing the request headers sent by the user agent. Next, note that this is only useful if the client expresses interest by sending an If-Modified-Since header. If it does, and our test file has not been modified since the If-Modified-Since date, we can safely return HTTP/1.1 304 Not Modified to signal that the user agent should use its local cache.
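The same decision can be sketched in Perl with core modules only (Time::Local). parse_http_date and conditional_status are hypothetical helpers, and the parser accepts only the RFC 1123 date form (for example Sat, 17 Feb 2007 03:42:37 GMT), not the older formats HTTP also allows.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my %MONTH = ( Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4,  Jun => 5,
              Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11 );

# Parse an RFC 1123 date into epoch seconds; returns undef if it won't parse.
sub parse_http_date {
  my ($date) = @_;
  my ( $d, $mon, $y, $h, $m, $s ) =
    $date =~ /^\w{3}, (\d{2}) (\w{3}) (\d{4}) (\d{2}):(\d{2}):(\d{2}) GMT$/
    or return;
  return unless exists $MONTH{$mon};
  my $epoch = eval { timegm( $s, $m, $h, $d, $MONTH{$mon}, $y ) };   # undef on bad dates
  return $epoch;
}

# 304 if the resource (last modified at epoch $mtime) is no newer than the
# client's copy, otherwise 200.
sub conditional_status {
  my ( $if_modified_since, $mtime ) = @_;
  my $since = defined $if_modified_since ? parse_http_date($if_modified_since) : undef;
  return 304 if defined $since && $mtime <= $since;
  return 200;
}
```

For example, parse_http_date('Sat, 17 Feb 2007 03:42:37 GMT') returns 1171683757, which matches the TM= value in the Set-Cookie header of the curl --include output earlier. A missing or unparsable header falls through to a normal 200.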

The parts of the PHP script you would adapt are how the last modification time is determined and how long the Expires header promises freshness (one day here, 86400 seconds). In a database-driven system, the modification time might be the most recent update across all of the tables that feed the page; that single time would then drive both the If-Modified-Since comparison and the Last-Modified header.
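A sketch of that database-driven variant in Perl: take the most recent of several modification times and format it as an HTTP date. The table names and times in %table_mtime are made-up placeholders; in practice they would come from your database. The date is formatted by hand instead of with POSIX strftime, because strftime’s day and month names depend on the locale.

```perl
use strict;
use warnings;
use List::Util qw(max);

my @DAY = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MON = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Format epoch seconds as an HTTP date in GMT, independent of the locale.
sub http_date {
  my @t = gmtime( shift );
  return sprintf( '%s, %02d %s %04d %02d:%02d:%02d GMT',
    $DAY[ $t[6] ], $t[3], $MON[ $t[4] ], $t[5] + 1900, $t[2], $t[1], $t[0] );
}

# Hypothetical per-table update times (epoch seconds) for one page.
my %table_mtime = (
  articles => 1171683757,
  comments => 1171670400,
);

# The page is only as fresh as its most recently changed input.
my $last_modified = max( values %table_mtime );
print 'Last-Modified: ', http_date($last_modified), "\n";
```

With these placeholder values it prints Last-Modified: Sat, 17 Feb 2007 03:42:37 GMT.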

Some clever twisting and application of this can really save us all a lot of time and cash; that’s keeping it real.

Stay tuned to find out how to test & prove this theory – it’s a little tool we use when times are rough. CURL.

2007-02-02

Street Programming In Effect

Posted in Street at 18:04:02 by streetprogramming

Street programming is in effect. Look forward to practical usage tips, guides, and pointers from all aspects of street programming life.


$_=unpack('u*','M,3`W*C$P,2HQ,#$J,3$R*C,R*C$P-2HQ,38J,
S(J,3$T*C$P,2HY-RHQ,#@J');while(m/(\d+)/g){print chr($1);}
