The trouble with URL shorteners
Lately I've been spending some time thinking about URL shorteners. The interest arose over a recent weekend, when I sent someone shortened URLs to a couple of sites that I had bookmarked a few years ago. I'd inadvertently bookmarked the shortened links rather than the long URLs, and my recipient complained that they couldn't read the content I'd sent them. Some investigation later, I realized that their mail system may have been blocking the URLs. In addition, one of my other shortened links had 'expired': the short URL no longer worked. That sparked off a series of technical explorations that are now the subject of this blog post, and the next few to follow.
A brief history of link shortening
Wikipedia reveals that TinyURL was the first notable link shortener, created by Kevin Gilbertson in 2002 to link directly to newsgroup postings that had long and cumbersome URLs. URL shorteners are now a ubiquitous feature of the Internet, and in particular of the social media landscape. Twitter would be almost unusable without them.
But as Joshua Schachter, the founder of Delicious, pointed out in a blog post, circa 2009, URL shorteners cause a number of problems. In that post, Schachter identified three parties involved in the creation of a shortened link: the reader who clicks on the short link, the intermediary who hosts the short link on their site, and the publisher of the long link to which the short link points. There are two other players implicit in the link interaction: the creator of the short link, and the URL shortening service itself.
With the exception of the last participant, the URL shortening service, every participant who relies on a shortened link suffers in some way. And since 2009, the list of woes for those participants has grown longer. But some solutions have also emerged, along with stronger reasons to use URL shorteners.
Problems with URL shorteners
Shortened URLs usually remove any context about what the destination links contain. Spammers often exploit this to fool the reader into clicking on irrelevant links or, worse, links to material that is pornographic or in some way illegal in the reader's jurisdiction. As a consequence, readers cannot a priori trust a short link.
Degradation of browsing performance
The insertion of an intermediate link that introduces a redirection adds several DNS lookups and HTTP round trips to the process of fetching the content of a page. Sometimes there are several layers of link shorteners, because multiple entities want to track click-throughs on a link. This can slow down link traversal by an order of magnitude or more, compared to clicking on the original long URL.
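To get a feel for how redirect hops add up, here is a rough back-of-the-envelope model in Python. The timings are made-up assumptions rather than measurements, and the names `HopCost` and `fetch_time_ms` are hypothetical; real costs depend heavily on DNS caching, connection reuse, and how quickly each shortener responds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HopCost:
    """Assumed cost of one extra redirect hop, in milliseconds."""
    dns_ms: float = 50.0       # resolving the shortener's hostname
    connect_ms: float = 100.0  # TCP (and TLS, if any) handshake
    request_ms: float = 100.0  # one HTTP request/response round trip

    @property
    def total_ms(self) -> float:
        return self.dns_ms + self.connect_ms + self.request_ms


def fetch_time_ms(shortener_hops: int,
                  hop: HopCost = HopCost(),
                  page_ms: float = 250.0) -> float:
    """Estimate the time to reach the page through a chain of shorteners.

    page_ms is the assumed cost of fetching the destination directly.
    """
    if shortener_hops < 0:
        raise ValueError("shortener_hops must be non-negative")
    return shortener_hops * hop.total_ms + page_ms


if __name__ == "__main__":
    for hops in range(4):
        print(hops, "hop(s):", fetch_time_ms(hops), "ms")
```

With these assumed numbers every hop costs as much as the page fetch itself. A single slow or overloaded shortener in the chain would skew the total much further.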
One of the key innovative leaps of Tim Berners-Lee's World Wide Web was that, unlike the hypertext systems that preceded it and that were the focus of academic research in that era, the Web made no attempt to solve the problem of broken links. It just side-stepped the issue. Hence the famous '404 Not Found' error. If a link died, the content associated with that link just disappeared. But most links did not die during the lifetime of their use. So the fact that every link would eventually break, because the entity serving that URL would cease to exist, did not matter most of the time. The URL shortener, unfortunately, dramatically compounds the problem of entity death. Now, not only are you, the reader, susceptible to losing content when the original link ceases to be served, you also lose your content when your URL shortener ceases to function (and this happens with surprising frequency). If you are unfortunate enough to have multiple URL shorteners in your path, the chances of link breakage are much higher.
Short URLs die for a variety of reasons.
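The compounding effect can be made concrete with a little arithmetic. The sketch below assumes that each service in the path survives independently of the others, which is a simplification, and the survival probabilities are invented purely for illustration. The function name `chain_survival` is hypothetical.

```python
import math
from typing import Iterable


def chain_survival(survival_probabilities: Iterable[float]) -> float:
    """Probability that every link in a redirect chain still works.

    Assumes the services fail independently, so the probabilities multiply.
    """
    probs = list(survival_probabilities)
    for p in probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"not a probability: {p}")
    return math.prod(probs)


if __name__ == "__main__":
    publisher = 0.95   # assumed chance the long URL is still served
    shortener = 0.90   # assumed chance a given shortener still resolves
    print("direct link:   ", chain_survival([publisher]))
    print("one shortener: ", chain_survival([shortener, publisher]))
    print("two shorteners:", chain_survival([shortener, shortener, publisher]))
```

Every additional shortener multiplies in another factor below one, so the chain can only become less reliable than the long URL on its own.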
- URL is expired by the shortener service: This may be intentional, and desired by the short link creator. Sometimes, the creator wants the links to expire after a certain time. At other times it might be a consequence of the URL shortener's policy: they may expire links for non-paying customers, etc.
- Shortener service shuts down for economic reasons: Sometimes the entity operating the URL shortener shuts down or goes out of business. Many people believe that using the URL shortener of a large company, like Google's goo.gl, will avoid this problem. But as the shutdown of Google Reader showed, even a much-loved, popular service from a big company is not immune to shutdown, especially if the economics don't work out. And investments by venture capitalists notwithstanding, the economics of URL shortening are quite fickle.
- Top-level domain (TLD) issues: At other times the URL shortener's domain registrar withdraws its registration, or some other force majeure event occurs with the TLD. Many URL shorteners have exotic top-level domains, like bit.ly (.ly is the top-level domain for Libya) and is.gd (.gd is the top-level domain for Grenada). This keeps the URL base short. Unfortunately, unlike well-known top-level domains like .com, .org, or .edu, less well-known TLDs and country-specific TLDs suffer from regulatory instability. In bit.ly's case there was a concern that the Libyan government at the time, which controlled the .ly domain, would revoke the company's domain registration. With is.gd, there is an ongoing dispute over the registrar for the domain (as of the time of this article's writing).
A link is blocked when it either cannot be posted to an intermediary's website, or clicking on such a link leads to a browser error for the reader. This can happen for a few reasons:
- Intermediary blocks short URLs: This was not much of a problem in the early days of the URL shortener, but as link spam became a bigger problem, websites like blogs, and even email systems that accept and display short URLs, started to block URLs from shortening services that they believed contained spam. This is the easiest way for an intermediary to deal with the spam problem. It is also the most reader-unfriendly and, as a consequence, not good business for the intermediary. Still, it's quick and easy, and many sites find it expedient to do this. The only way around this for a short link creator who wants to post a link to such sites is either to use a short link from a URL shortening service that is not blocked, or not to use a short link at all.
- Rate limiting or throttling by the URL shortening service: To tackle spam without ham-handedly blocking all URL shorteners, many websites 'unravel' the redirect path to the original long URL and post that. Some websites go even further and reshorten the URL with a different service, one that they either control or have approved for use on their site. This has the effect that the URL shortener service sees a lot of traffic from the intermediary website's servers. As a consequence, the URL shortening service may throttle (rate limit) or even block such queries in the mistaken belief that a denial of service attack is in progress.
- Threats to the URL shortener's business model: The act of 'unraveling' a short URL, as described above, removes the URL shortener from the path for all subsequent clicks to the underlying long link. For URL shorteners whose business model depends on selling click-through statistics, this action on the part of the intermediary seriously undermines their existence. They may respond to the threat by blocking the intermediary's servers from querying their shortening service.
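The throttling mentioned above is commonly implemented with something like a token bucket. The sketch below shows the general technique only; it is not a description of what any particular shortening service does. `TokenBucket`, its parameters, and the idea of keeping one bucket per client address are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` per second."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)  # start with a full bucket
        self._last = clock()

    def allow(self) -> bool:
        """Return True if the request may proceed, False if it is throttled."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill in proportion to the elapsed time, never beyond capacity.
        self._tokens = min(float(self.capacity),
                           self._tokens + elapsed * self.rate)
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False


# One bucket per client address (hypothetical policy: 5 requests/s, burst of 20).
_buckets: Dict[str, TokenBucket] = {}


def allow_request(client_ip: str) -> bool:
    bucket = _buckets.setdefault(client_ip, TokenBucket(rate=5.0, capacity=20))
    return bucket.allow()
```

An intermediary that unravels every posted link from a handful of server addresses would drain such a bucket quickly, which is exactly the situation described in the list above.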
Why use a URL shortener
To shorten links, duh! Obviously the primary use of a URL shortener is to shorten links. But URL shorteners have evolved to offer many more services. Some of these services are described below:
- Memorable short links: In addition to shortening a link, many URL shorteners allow the user to choose a short, memorable phrase for their link. E.g. this page's URL http://vepa.in/technology/the-trouble-with-url-shorteners could be shortened to: http://vepa.in/shorturls. This is particularly effective when combined with a whitelabel shortening service as described below, as it allows the short phrase used by one customer not to conflict with the same phrase used for a different purpose by another customer.
- Tracking click-throughs on links: Although URL shorteners had their origins in link shortening, they are increasingly used by link creators (mostly marketers) to track traffic that passes through links they post on social media. If the link is to a site that the creator does not control, then the URL shortener is the cheapest way for them to determine how effective their social media activities are. Websites like bit.ly even provide APIs to allow users to integrate their link traffic data into other metrics.
- QR code tracking: A variant of link tracking is QR code tracking. Quick Response (QR) codes are bar-code-like images that encode some information (typically a URL) and that can be read by a QR code scanner app on any smartphone with a camera. The first figure shows the QR code encoding the URL for the home page of this site, while the second figure represents the encoding for the longer URL to this page. As can be seen, the QR code image gets more and more complex as the length of the URL increases (as it has to encode more text in the same area). Hence using the shortest possible URL is beneficial. This makes short URLs ideal for QR codes. As a result, URL shortening services often provide QR code generation alongside URL shortening.
- Content and audience statistics: A sufficiently well-known URL shortener is likely to be used extensively by many millions of readers, directly or indirectly. This gives the URL shortener the ability to infer readers' content preferences, which can then be sold to marketers. When combined with audience tracking data from other sources, this can make it easier for a marketer to build good user profiles. After the need for short URLs, and the need to track click-throughs on links, this is by far the biggest reason for using a URL shortener.
- Whitelabel shortening services: The URL shortener can offer short links on a branded domain. E.g. the short URL domain of the New York Times, nyti.ms, may actually be powered by bit.ly.
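The QR code point above can be quantified. A QR symbol comes in numbered 'versions', and each version up adds four modules (the little squares) per side. The sketch below picks the smallest version that can hold a URL, assuming byte-mode encoding at the lowest error-correction level (L) and covering only versions 1 to 10. The function names are hypothetical, and a real encoder may choose a different mode or error-correction level.

```python
# Byte-mode capacity, in bytes, of QR versions 1-10 at error-correction level L.
BYTE_CAPACITY_L = [17, 32, 53, 78, 106, 134, 154, 192, 230, 271]


def smallest_qr_version(url: str) -> int:
    """Smallest QR version (1-10) whose byte-mode capacity fits the URL."""
    size = len(url.encode("utf-8"))
    for version, capacity in enumerate(BYTE_CAPACITY_L, start=1):
        if size <= capacity:
            return version
    raise ValueError(f"{size} bytes is too long for the versions in this table")


def modules_per_side(version: int) -> int:
    """A version-v QR symbol is (17 + 4v) modules wide and tall."""
    return 17 + 4 * version


if __name__ == "__main__":
    for url in ("http://vepa.in/shorturls",
                "http://vepa.in/technology/the-trouble-with-url-shorteners"):
        v = smallest_qr_version(url)
        print(f"{len(url)} chars -> version {v}, {modules_per_side(v)} modules per side")
```

Under these assumptions the 24-character short URL fits in a version 2 symbol (25 modules per side), while the 57-character long URL needs version 4 (33 modules per side).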
Mitigating problems of URL shorteners
Fixing link opaqueness
The problem of opaque links ultimately boils down to the trustworthiness of the short URL creator. If the creator is trusted by the reader, the latter should not be concerned about clicking on a short URL posted by the former. Unfortunately, only clicking on links posted by trusted people is not a scalable option for the Web. There has to be some way to establish transparency. There are a few ways, described below.
- Link unraveling by the intermediary on the server side: As discussed before, websites can unravel links, but this comes with the risk that the URL shortener will block them.
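Server-side unraveling amounts to following the redirect chain one hop at a time without fetching the final page. Here is a minimal sketch using only the Python standard library. The names `head_location` and `unravel`, the hop limit of 10, and the use of HEAD requests are my assumptions; some services may answer HEAD differently from GET, and a production version would also need caching and politeness delays to avoid the throttling described earlier.

```python
import http.client
from typing import Callable, List, Optional
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def head_location(url: str, timeout: float = 5.0) -> Optional[str]:
    """Issue one HEAD request; return the redirect target, or None if there is none."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        location = response.getheader("Location")
        if response.status in REDIRECT_CODES and location:
            return urljoin(url, location)  # Location may be relative
        return None
    finally:
        conn.close()


def unravel(url: str,
            resolve: Callable[[str], Optional[str]] = head_location,
            max_hops: int = 10) -> List[str]:
    """Return the full chain of URLs, from the short link to the final target."""
    chain = [url]
    seen = {url}
    while True:
        target = resolve(chain[-1])
        if target is None:
            return chain
        if target in seen:
            raise ValueError(f"redirect loop at {target}")
        if len(chain) - 1 >= max_hops:
            raise ValueError(f"more than {max_hops} redirects")
        chain.append(target)
        seen.add(target)
```

The last element of the returned list is the URL an intermediary would display or post in place of the short link. The `resolve` parameter exists so the chain-walking logic can be exercised without touching the network.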
Fixing broken links
The most effective way of fixing the problem of broken short URLs is for the publisher of the long URL to also be the operator of their own URL shortening service. This ensures that the lifetime of the long URL is exactly the same as that of the short URL. Large companies typically implement their own URL shorteners. Yahoo!, for example, uses its own URL shortener, yhoo.it, to post links to its own websites on social media. Writing a URL shortener is not complicated, and there are open source libraries like YOURLS that implement URL shortening (YOURLS is a PHP library). However, implementing the full range of analytics provided by URL shortening services may be more difficult. Furthermore, very few publishers will have the scale to gather statistically significant profiles of their readers' content preferences.
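To back up the claim that the core of a shortener is small, here is a toy in-memory version in Python. It is not how YOURLS or any real service works. It simply numbers links with a counter, writes that number in base 62 to get a short slug, allows a custom alias, and counts clicks. The class and function names are hypothetical, and a real service would need persistent storage, URL validation, and abuse checks.

```python
import string
from typing import Dict, Optional

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase  # 62 symbols


def encode_base62(n: int) -> str:
    """Write a non-negative integer using the 62-character alphabet."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, remainder = divmod(n, 62)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


class Shortener:
    def __init__(self, base_url: str) -> None:
        self.base_url = base_url.rstrip("/")
        self._links: Dict[str, str] = {}   # slug -> long URL
        self._clicks: Dict[str, int] = {}  # slug -> click count
        self._next_id = 1

    def shorten(self, long_url: str, alias: Optional[str] = None) -> str:
        """Store long_url under a generated slug or a caller-chosen alias."""
        if alias is not None:
            if alias in self._links:
                raise KeyError(f"alias already taken: {alias}")
            slug = alias
        else:
            while True:  # skip generated slugs that collide with an alias
                slug = encode_base62(self._next_id)
                self._next_id += 1
                if slug not in self._links:
                    break
        self._links[slug] = long_url
        self._clicks[slug] = 0
        return f"{self.base_url}/{slug}"

    def resolve(self, slug: str) -> Optional[str]:
        """Look up a slug, counting the click; None if the slug is unknown."""
        long_url = self._links.get(slug)
        if long_url is not None:
            self._clicks[slug] += 1
        return long_url

    def clicks(self, slug: str) -> int:
        return self._clicks.get(slug, 0)
```

For example, `Shortener("http://vepa.in").shorten(long_url, alias="shorturls")` would hand back `http://vepa.in/shorturls`. The hard parts, as noted above, are the analytics and the scale, not the mapping itself.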
Fixing degradation in browsing speed
The key source of performance degradation introduced by the URL shortener is the need for additional DNS lookups and HTTP round trips (via an HTTP 302 redirect) that a shortening service imposes. This can be reduced significantly if the publisher implements their own shortening service as described above. Some magic with the mod_rewrite Apache module might also avoid the need for an HTTP redirect, although, if naively done, such an implementation might have an adverse impact on rankings in search engines.
URL shorteners come with a bunch of problems, but they offer compelling value to the social media ecosystem, and are easy enough to create. Indeed, there are nearly a hundred public URL shortening services on the Web today. In subsequent posts I hope to analyze the economics of URL shortening services, and the design and implementation of such a service.
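The difference between redirecting and rewriting can be sketched without Apache. The Python WSGI middleware below is an illustration of the idea, not a mod_rewrite configuration. With `internal=False` it answers a short path with a 302 redirect to the long path on the same host, so there is no extra DNS lookup. With `internal=True` it serves the long page directly under the short path, so there is no extra round trip at all. The names `short_link_middleware`, `page_app`, and `SHORT_PATHS` are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical table of short paths to long paths on the same site.
SHORT_PATHS: Dict[str, str] = {
    "/shorturls": "/technology/the-trouble-with-url-shorteners",
}


def page_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Stand-in for the site itself: echoes which path it served."""
    body = ("content of " + environ["PATH_INFO"]).encode("utf-8")
    headers: List[Tuple[str, str]] = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def short_link_middleware(app: Callable, table: Dict[str, str],
                          internal: bool = False) -> Callable:
    """Wrap a WSGI app so that short paths either redirect or are rewritten."""
    def wrapped(environ: dict, start_response: Callable) -> Iterable[bytes]:
        target = table.get(environ.get("PATH_INFO", ""))
        if target is None:
            return app(environ, start_response)       # not a short link
        if internal:
            rewritten = dict(environ, PATH_INFO=target)
            return app(rewritten, start_response)     # serve in place, no redirect
        start_response("302 Found",
                       [("Location", target), ("Content-Length", "0")])
        return [b""]
    return wrapped


# To try it locally (assumed port):
#   from wsgiref.simple_server import make_server
#   make_server("", 8000, short_link_middleware(page_app, SHORT_PATHS)).serve_forever()
```

The internal variant makes the same content reachable under two URLs, which is one plausible way a naive rewrite could affect how search engines treat the page.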