In the online world, the URL is ever-present. It is the digital home of businesses large and small, the currency of social exchange, and the thread that holds the hyperlinked web together. There can be few more important building blocks of the Internet as we know it, but all too often the humble URL has been abused or even downright ignored as newer technologies have pushed their way into the limelight.

The URL has a long history. As with most of the building blocks of the Internet, the specification for a URL can be found online, in the form of RFC 1738, a technical document laying out the structure of the http://domain/path addresses we all know so well. RFC (Request for Comments) 1738 was published in 1994 and specifies how a URL is composed of the scheme (we know http best), the host (the domain), an optional port, the path of the resource, and any query parameters.

This can be written as <scheme>://<host>:<port>/<path>?<query>
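As a rough illustration of the template above, Python's standard `urllib.parse` module can split a URL into these parts. The URL below is a made-up example (an assumption, not taken from any real site), chosen only because it uses every component.

```python
from urllib.parse import urlsplit

# Hypothetical URL that exercises every part of the template.
url = "http://www.example.com:8080/books/list?author=raw&page=2"

parts = urlsplit(url)

print(parts.scheme)    # http
print(parts.hostname)  # www.example.com
print(parts.port)      # 8080
print(parts.path)      # /books/list
print(parts.query)     # author=raw&page=2
```

When the port is left out of the URL, `parts.port` is `None` and the client falls back to the default port for the scheme.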

Knowing how to break a URL such as http://www.w3.org/Provider/Style/URI.html down into its constituents can be useful. For example, many people do not know that the query part (everything after the first question mark) can be further broken down into name=value pairs, separated by ampersand (&) symbols.

As an example, the URL http://www.google.com/search?q=rfc1738&btnI=1 takes me straight to the RFC. In this case, setting the parameter “btnI” equal to “1” is like pressing “I’m Feeling Lucky” on a google.com search.
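The same breakdown can be sketched in a few lines of Python. This is only an illustration using the standard library's `parse_qsl`; the variable names are arbitrary.

```python
from urllib.parse import urlsplit, parse_qsl

url = "http://www.google.com/search?q=rfc1738&btnI=1"

# Everything after the first question mark is the query string.
query = urlsplit(url).query

# Split the query on "&" and then on "=" to get (name, value) pairs.
pairs = parse_qsl(query)
params = dict(pairs)

print(query)           # q=rfc1738&btnI=1
print(pairs)           # [('q', 'rfc1738'), ('btnI', '1')]
print(params["btnI"])  # 1
```

Note that the values come back as strings, so "1" here is text rather than a number.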

The URL is great because it lends itself to being a readable, copyable identifier for bits of content on the web, rather like cards in a gigantic library index system, allowing us to reference information precisely, quickly and easily. However, URLs have not always been used in this manner, and over the years many technologies have either ignored or abused them, often with significant consequences.

In the ignored category, a front-runner must surely be Flash websites. While Flash does in theory support deep-linking to content within a Flash application, in practice this is rarely done, and a Flash site is presented as a single URL no matter how deep into the (no doubt immersive) content one has travelled. This makes sharing that content difficult, and causes no end of problems for applications that consume these sites, such as search engines. In turn, Flash has suffered greatly from its opacity, leading (in part) to its waning popularity today.

Also in the ignored category is Facebook. Despite the immense volume of content being produced on Facebook, there exists no consistent way to get a link to a post on Facebook, let alone to a comment on that post. This criticism could perhaps be extended to most blog comments, but at Facebook’s scale it’s either deliberate or a gross oversight. We should not have to screenshot posts and comments to share them with our friends!

Abusing URLs, on the other hand, is something almost everyone was guilty of in the past. Ten years ago (before the rise of SEO, social media and Web 2.0), URLs were often very long and difficult to read, as developers created frameworks and sites without regard for the aesthetics or utility of their URLs. Luckily, clean, concise URLs containing relevant keywords became important, and URL abuse began to recede. It’s really great to see platforms like Ruby on Rails use thoughtful convention over configuration to make URLs like http://www.mylibrary.com/books/list commonplace.

More recently, however, the URL has been under siege from something called the hashbang. These are URLs that look like http://twitter.com/#!/craigraw, containing a hash symbol and a bang (exclamation mark). The interesting thing here is that everything after the hash is not transmitted to the server. RFC 1738 treats the hash as the delimiter between the URL and a fragment identifier, and fragments are designed to be used locally on a single page, to jump to different areas on that page without needing to go back to the server. With Twitter, however, the use is entirely different: the JavaScript on the homepage looks at the part after the hash and updates the page dynamically with my tweets. If I change something on the page, the hash can be updated by the JavaScript, and if I paste the URL into another browser tab the same updated page should be rendered.
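To make the point about the hash concrete, here is a minimal Python sketch of what the server would see for a hashbang URL. The helper `request_target` is a hypothetical name introduced for this illustration; it approximates the path-and-query that a browser puts on the HTTP request line, and is not taken from any library.

```python
from urllib.parse import urlsplit


def request_target(url: str) -> str:
    """Approximate the path-and-query sent to the server (hypothetical helper)."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # parts.fragment is deliberately left out: it stays in the browser.
    return target


url = "http://twitter.com/#!/craigraw"

print(urlsplit(url).fragment)  # !/craigraw
print(request_target(url))     # /
```

Under these assumptions the server only ever receives a request for "/", so anything it does in response (a redirect, for instance) cannot depend on the part after the hash.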

Sounds fine, except when I share a hashbang link with a friend who’s on mobile, and the server redirects him to the mobile site. Since the server doesn’t receive what comes after the hash, it cannot possibly send my friend to the actual content. (Until recently, Gizmodo suffered from this.) Worse, if you’re a search engine spider without JavaScript, that content is completely unavailable. Google have tried to fix this, but ultimately it’s a fix for a broken way of doing things. The problem lies with abusing the URL — the hash was only meant to be used to jump around a single page, not to identify the different pages in a JavaScript application.

Luckily, in the case of hashbangs, HTML5 has come to our collective rescue and we have the History API, enabling us to have gorgeous JavaScript-enabled applications that still allow deep linking to content with normal URLs. Given the importance of the URL, it’s a critical feature as we move to a world of JavaScript-driven sites and applications.

Hopefully we can learn from history and give the URL the proper respect it deserves. Indeed, we have no alternative as we embrace the concept of a semantic web and all the benefits it promises. In closing, consider these three points:
o Ignore the URL at your peril. History has shown that technologies that do so are ultimately swimming upstream.
o Cool URLs don’t change. Tim Berners-Lee made this point back in 1998, and it’s still true today.
o Consider URLs modern-day business cards. They represent you, your products and your data, so make them usable and they are more likely to get used.

Craig Raw

Craig Raw is CTO of Quirk Marketing Agency. A youth of programming, starting with BASIC on a ZX Spectrum, led to a career of software architecture and development. Craig...
