Giz Explains: Why Are URLs Full Of Garbage Characters?

August 5, 2015 at 10:00 am

#?&%! Why do some URLs look like cuss words in comic books? We have the answer.

As you probably know, URLs (uniform resource locators) are basically “addresses” for websites. Type one into your browser, and you’re instantly reading news, watching videos, scheduling sketchy tête-à-têtes and so on.

That’s why URLs were originally supposed to be easy to remember. And yet a lot of today’s web addresses are incredibly long and indecipherable. Why does that happen? And what do all those characters mean?

Anatomy of an URL

It helps to start with the basic structure of an URL. We’ll use http://www.gizmodo.com.au as our first example.

http:// is what’s called the “protocol”, which tells your computer how to interact with the server of the site you want to visit. In this case, HTTP tells your computer to expect to receive data that’s been structured for websites.

www.gizmodo.com.au — this part indicates the name of the server you want to interact with. Think of it like a street address or a phone number.

Now, let’s stop here for a sec. Back in the early days of the internet, these basic URL components were enough. At first, web pages were simple documents that linked to each other. Sean O’Connor, lead application engineer at URL-shortening site Bit.ly, recalls:

In this relatively simple world, only so much information is required to reference one page from another: what protocol should I use (http://), what server should I ask (www.example.com), and what document do I want on that server (/articles/cool-info.html).
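To make those three pieces concrete, here is a minimal sketch using Python's standard urllib.parse module. The URL is simply O'Connor's example pieces glued together, so treat it as an illustration rather than a real page.

```python
from urllib.parse import urlsplit

# O'Connor's three pieces joined into one illustrative URL.
parts = urlsplit("http://www.example.com/articles/cool-info.html")

protocol = parts.scheme   # which protocol to use
server = parts.netloc     # which server to ask
document = parts.path     # which document on that server

print(protocol, server, document)
```

Note that the parser calls the protocol the "scheme" and the server name the "netloc"; the ideas are the same as the ones described above.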

However, as the web evolved, so did websites’ capabilities, and thus so did URLs. People wanted their computers to do more interesting, dynamic things beyond fetching static pages. And that’s when URLs started to get more detailed.

Anytime you see a “?” in an URL, for example, the characters that follow it are what are called “query parameters”. With these extra bits of information, the server can respond dynamically, giving you a web page based on what you want to see. It might automatically put your name into a field, or provide relevant links based on your web search.

Hence nowadays, links can be long and full of apparent gibberish. Indeed, there are so many different symbols that can now be included in URLs that the Internet Society ginned up a handy directory of them all.

URLs in the Modern World

Let’s look at what’s happening with some example URLs after the .com part of the address.

http://www.gizmodo.com.au/tags/giz-explains — Those slashes (/tags/giz-explains) organise the “path” of the request, or where to go within the many files hosted on the server that hosts gizmodo.com.au. The slashes mark hierarchy within the path, sort of like nested folders.
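If you want to see that hierarchy for yourself, the path can be split on its slashes. This is a rough sketch using Python's standard library; how a server actually maps a path to files or pages is up to the site, so the "nested folders" picture is only an analogy.

```python
from urllib.parse import urlsplit

# Pull out just the path portion of the example URL.
path = urlsplit("http://www.gizmodo.com.au/tags/giz-explains").path

# Split on slashes and drop the empty piece before the leading slash.
segments = [piece for piece in path.split("/") if piece]

print(path)
print(segments)
```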

How about this one? I googled “I like Gizmodo” and here’s what popped up:

https://www.google.com/search?q=i%20like%20gizmodo&rct=j

Now this is where things start to get crazy, but the structure probably also looks familiar, no? This is the kind of URL that appears after you initiate a search. The parameters you set (like the keywords you search for) show up in the URL, and each parameter is separated from the next with an ampersand. The spaces between your keywords get encoded too: here as %20, though elsewhere you will often see a plus sign doing the same job. (Remember, all the search parameters in a given URL follow a question mark.)
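Here is a short sketch of how a program might pull that search URL apart, again with Python's standard urllib.parse module. What Google does with each parameter on its end is not something this code can tell you; it only shows how the text after the question mark breaks down.

```python
from urllib.parse import urlsplit, parse_qs

url = "https://www.google.com/search?q=i%20like%20gizmodo&rct=j"

# Everything after the question mark is the query string.
query = urlsplit(url).query

# parse_qs splits the query into parameters and decodes escapes like %20.
# Each value is a list, because a key may legally appear more than once.
params = parse_qs(query)

print(query)
print(params)
```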

But, wait! What if your actual search query already has funky, non-alphanumeric characters in it? What happens to the URL then? Does it explode?

Nope. A short code simply replaces the original character. So if you were to google “What is this?”, a per cent sign followed by two hexadecimal digits (%3F) would replace the question mark. We need the bare question mark to signal in the URL that what follows are search parameters, remember? This is a process called escaping.

Here’s an example: ?term=what+is+this%3F&public=true

“In this case of the ‘what is this?’ value, the question mark would get confusing, given the meaning of question marks within URLs,” O’Connor explains. He continues:

Accordingly, there’s a process called escaping. When you escape, you replace a meaningful character with an alternate representation that won’t cause trouble, but that can be turned back into the original value. Examples of that here are replacing spaces with plus signs and replacing the value’s question mark with %3F.

You might see numbers in a search results URL too. Like, “%20” sandwiched in between words. That’s a form of escaping too — it represents a space.
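Escaping is easy to try out yourself. The sketch below uses three functions from Python's standard library: quote_plus (spaces become plus signs), quote (spaces become %20) and unquote_plus (which reverses the process). It is one common way to do it, not necessarily what any particular website uses behind the scenes.

```python
from urllib.parse import quote, quote_plus, unquote_plus

# Spaces become plus signs, and the question mark becomes %3F.
escaped_plus = quote_plus("what is this?")

# With quote(), spaces become %20 instead of plus signs.
escaped_percent = quote("i like gizmodo")

# Escaping is reversible: this turns the escaped text back into the original.
restored = unquote_plus(escaped_plus)

print(escaped_plus)
print(escaped_percent)
print(restored)
```

The 3F in %3F is just the question mark's character code written in hexadecimal, and 20 is the code for a space.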

If you see any equals signs in an URL, they’re for separating keys from values in any key-value pairs, and ampersands separate different pairs. A key-value pair could be like, “page=5”. Here, we’re talking about the “page” of the website as a key, and “5” is the value, or fifth page.
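As a rough illustration, the query string from the earlier example (term=what+is+this%3F&public=true) can be built from key-value pairs and taken apart again with Python's standard library. The keys "term" and "public" come from that example; what a real site would do with them is anyone's guess.

```python
from urllib.parse import urlencode, parse_qsl

# Build a query string: "=" joins each key to its value, "&" joins the pairs.
query = urlencode({"term": "what is this?", "public": "true"})

# Go the other way: split the query back into (key, value) pairs.
pairs = parse_qsl(query)

print(query)
print(pairs)
```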

&rct=j — Let’s look back at “I like Gizmodo”. In some cases, like this one, it can be impossible to figure out exactly what a given chunk of URL vomit means. “That being said, it is pretty common for parameters to be used for keeping track of information that only has meaning to the site that is using them,” O’Connor says. “Accordingly, they may not be publicly documented or explained.”

#section-result — Finally, the pound sign. (Or hashtag, depending on how old you are.) It’s a URL fragment and acts as a caboose. Says O’Connor: “Everything at the end of a URL after a hash sign is special in that it is never sent to the server and it is exclusively used by the web browser. Often this is used to refer to sections within a document but sometimes it can be used for other purposes.”
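A quick sketch of the split O'Connor describes, using urldefrag from Python's standard library. The URL here is an assumption made up for illustration: it bolts the #section-result fragment onto the earlier example.com address.

```python
from urllib.parse import urldefrag

# Hypothetical URL: the earlier example.com page plus the fragment above.
url = "http://www.example.com/articles/cool-info.html#section-result"

# Split off the fragment. A browser requests only the first part from the
# server and keeps the fragment for itself.
without_fragment, fragment = urldefrag(url)

print(without_fragment)
print(fragment)
```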

Static and Dynamic

Now that we’ve got that cleared up, you should know that URLs can be categorized into two types, according to how many crazy characters are included. The two types are static and dynamic.

Static URLs are those that contain only letters, numbers, dots, slashes, dashes and underscores. They tend to attract more traffic than dynamic URLs and rank higher in Google searches, since they’re easier to read and remember.

The wackier dynamic ones are a grab bag: question marks, ampersands, equals signs, exclamation points, asterisks and other keyboard symbols snake their way into address bars. These URLs are impossible to remember, totally unusable in branding campaigns, and generally see lower click-through rates.

I mean, obviously, no one is going to use a dynamic URL from a search query in some kind of marketing mission or plaster it on a business card. But people want to tweet specialised URLs that point to very specific content, or share them in a presentation, and pesky character limits get in the way. When you shrink URLs with Bitly or TinyURL or Google URL Shortener or Ow.ly, those services aren’t getting rid of the goofy characters in dynamic URLs; they simply store that information somewhere else. When a user clicks on the shortened link, they’re redirected to where the original, longer link leads.
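The store-and-redirect idea can be sketched in a few lines of Python. To be clear, this is a toy and not how Bitly or any other service actually works: the class name, the short.example domain and the hash-based codes are all assumptions made up for illustration, and a real service would use a database and issue an HTTP redirect.

```python
import hashlib


class ToyShortener:
    """Hypothetical shortener: keeps long URLs in a dictionary keyed by a short code."""

    def __init__(self, base="https://short.example/"):
        self.base = base   # made-up domain, for illustration only
        self.store = {}    # short code -> original long URL

    def shorten(self, long_url):
        # Derive a 7-character code from a hash of the long URL.
        # A toy like this ignores the (unlikely) case of two URLs colliding.
        code = hashlib.sha256(long_url.encode("utf-8")).hexdigest()[:7]
        self.store[code] = long_url
        return self.base + code

    def resolve(self, short_url):
        # Look up where the short link should send the visitor.
        code = short_url[len(self.base):]
        return self.store[code]


shortener = ToyShortener()
long_url = "https://www.google.com/search?q=i%20like%20gizmodo&rct=j"
short_url = shortener.shorten(long_url)

print(short_url)
print(shortener.resolve(short_url))
```

The goofy characters never go away; they just sit in the shortener's storage until someone follows the short link.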

It’s a somewhat complicated system, but not one that’s going away or being replaced any time soon. (The Twitter Age’s link shorteners have been the closest thing to a revolution.) And in the future, our direct contact or familiarity with URLs might decrease further, especially since lots of content, like news articles, is shared on Facebook, and many people reach sites by browsing social media feeds. (Or in some cases, content is now being directly published on sites like Facebook, which really eliminates the need to manually type in an URL.)

In the near future, URLs could end up like phone numbers: they’re everywhere and we use them every day, but we only know the important ones off the top of our heads.
