The proposed filtering technique is based on exact HTTP URLs, not on IP addresses or domain names. A URL (Uniform Resource Locator) is the full address that you might type into your web browser’s address bar. For example: http://www.gizmodo.com.au/2010/07/the-evolution-of-labor-internet-filter-policy/
This URL can be broken down into sections thus:
http <- the protocol for accessing the online resource. 'http' is but one way to access online resources. Others you may have seen include "https" (for secure web sites) and "ftp" (for file transfer), but there are many more.
www.gizmodo.com.au <- the name of the web server. (See below for how this is transformed into an IP address.)
:80 <- the port. If none is specified, ‘port 80’ is implied: it is the default doorway through which you access content on this web server. A web server can also choose to share content through many other ports.
/2010/07/the-evolution-of-labor-internet-filter-policy/ <- the remainder is the specific path to your file. Invisible, but again implied, is a default file name (probably ‘index.html’ or similar).
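For readers who want to see this breakdown done mechanically, Python’s standard `urllib.parse.urlsplit` pulls a URL apart into the same pieces. This is a minimal sketch: the `DEFAULT_PORTS` table is an assumption covering only a few common schemes, and the implied default file name never appears here because the web server decides it, not the URL.

```python
from urllib.parse import urlsplit

# The example URL from the article.
url = "http://www.gizmodo.com.au/2010/07/the-evolution-of-labor-internet-filter-policy/"

# Assumed defaults for a few schemes, used when the URL leaves the port implied.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}

def describe(url: str) -> dict:
    """Split a URL into the protocol, server, port and path discussed above."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "server": parts.hostname,
        # urlsplit reports None when no port is written, so fall back to the scheme default.
        "port": parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme),
        "path": parts.path,
        "query": parts.query,
    }

print(describe(url))
```

An explicit port such as `https://example.com:8080/x` would be reported as 8080 rather than the default.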
This matters because the government’s proposed URL filter only matches the entire URL, not its constituent parts. So if you (as a content publisher) change the protocol, the name of the web server, the port it runs on, the path to the file or the specific name of the file, or even exploit features of how URLs are interpreted, then that URL will no longer match an entry on the ‘blocked’ list, and users will be able to access it.
For a simple example that users can try themselves, add a question mark to the end of the URL thus: http://www.gizmodo.com.au/2010/07/the-evolution-of-labor-internet-filter-policy/?
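To make the exact-match problem concrete, here is a minimal sketch of a naive filter that blocks a URL only when the whole string is on its list. The blocklist contents and the `is_blocked` helper are hypothetical; whether each variant really reaches the same page depends on the publisher actually serving it that way (for example, running HTTPS or a second port). The code only shows that the string comparison stops matching.

```python
# Hypothetical blocklist holding one exact URL string, as the proposal describes.
BLOCKED = {
    "http://www.gizmodo.com.au/2010/07/the-evolution-of-labor-internet-filter-policy/",
}

def is_blocked(url: str) -> bool:
    """Naive exact-match filter: block only if the whole string is listed."""
    return url in BLOCKED

base = "http://www.gizmodo.com.au/2010/07/the-evolution-of-labor-internet-filter-policy/"

variants = [
    base,                                                        # the listed URL itself
    base.replace("http://", "https://", 1),                      # different protocol
    base.replace("gizmodo.com.au/", "gizmodo.com.au:8080/", 1),  # different port
    base.replace("policy/", "policy_2/", 1),                     # renamed path
]

for v in variants:
    print(is_blocked(v), v)
```

Only the first line prints `True`; every change to a single component produces a string the list has never seen.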
This ‘new’ URL would not match the entry on the blocked list, allowing users to see it.
The government might then choose to add both URLs to avoid this, but then you could add a dummy value to create another URL: http://www.gizmodo.com.au/2010/07/the-evolution-of-labor-internet-filter-policy/?mydog=hasnonose
Now this is a different URL which passes a nonsense value to the web page (which will be ignored by the Gizmodo web server), again allowing the user to see the web page. There are far too many permutations available to the user for a blacklist of 10,000 URLs to capture them all – and this is for one specific web page!
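A few lines of code are enough to show how quickly the permutations pile up. This sketch reuses the article’s `mydog` parameter and assumes, as the text does, that the server ignores parameters it does not recognise; the `query_variants` helper is hypothetical.

```python
base = "http://www.gizmodo.com.au/2010/07/the-evolution-of-labor-internet-filter-policy/"

def query_variants(base: str, count: int) -> list[str]:
    """Generate `count` distinct URLs for the same page via throwaway query strings."""
    urls = [base + "?"]  # the bare question mark from the first example
    for i in range(count - 1):
        urls.append(f"{base}?mydog=hasnonose{i}")
    return urls

variants = query_variants(base, 10_000)

# Suppose the government lists both the original URL and the '?' form.
blocklist = {base, base + "?"}

unblocked = [u for u in variants if u not in blocklist]
print(len(variants), len(unblocked))
```

One page yields ten thousand distinct URLs, and all but one of them slip past a list that already knows two forms of it.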
As a content distributor, if you became aware that your URL was blocked but you wanted to help your users access that content, you could easily change the path name or file name on your web server (to, say, ‘2010/07/the-evolution-of-labor-internet-filter-policy_2/’) and relink it from your front page in under five minutes.
And all this is without users even having to consider non-HTTP traffic options or the use of proxies and VPNs.
So if URL filtering won’t work, what about IP address filtering? While it’s not the government’s proposal at this time, it’s still worth knowing why that option won’t work either:
What is the difference between IP addresses and domain names?
An IP address is simply a string of numbers. You can think of it as analogous to a telephone number, only the number is longer (and frankly, that number may only get you to ‘reception’).
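The ‘telephone number’ comparison is quite literal for IPv4: the familiar dotted form is just a readable way of writing a single 32-bit number. Python’s standard `ipaddress` module shows the conversion; 192.0.2.1 is taken from a range reserved for documentation examples, not a real site’s address.

```python
import ipaddress

# 192.0.2.1 comes from a block reserved for documentation (RFC 5737).
addr = ipaddress.IPv4Address("192.0.2.1")

# The dotted form is a friendlier spelling of one 32-bit number.
as_number = int(addr)
print(as_number)                         # 3221225985
print(ipaddress.IPv4Address(as_number))  # 192.0.2.1
```

IPv6 addresses work the same way, only with 128-bit numbers, which makes them even harder to remember.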
Now human beings aren’t terribly accurate when it comes to remembering very long numbers, so the Domain Name System (DNS) came about so we could remember words instead. To continue with the telephone analogy, DNS is like having directory assistance. You could ask a DNS server for the IP address of ‘gizmodo.com.au’ and it would respond with something like ‘18.104.22.168’.
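Asking ‘directory assistance’ is a one-line call in most languages. This minimal sketch uses Python’s standard `socket.getaddrinfo`, which hands the question to the operating system’s resolver (and, for real domain names, to DNS). On a connected machine, `resolve("gizmodo.com.au")` would return whatever addresses the site uses today, which will almost certainly differ from the example above; the usage below sticks to `localhost` so it works offline.

```python
import socket

def resolve(name: str) -> list[str]:
    """Ask the system resolver for every address registered for a name."""
    addresses = []
    # Each result is (family, type, proto, canonname, sockaddr); the address is sockaddr[0].
    for *_, sockaddr in socket.getaddrinfo(name, None):
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses

print(resolve("localhost"))  # typically ['127.0.0.1'] and/or ['::1']
```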
How easy is it to change a site’s IP address?
Because most people type ‘http://gizmodo.com.au/’ into their web browsers rather than ‘18.104.22.168’, you can change the IP address incredibly often, and it would be surprising if anybody noticed, so long as you or your web hosting partner keep your DNS entry (your directory entry) up to date.
The only thing that stops you changing your IP address too often is that it may take time for the change to propagate to all relevant DNS servers on the planet. The usual maximum time that sysadmins quote is around 72 hours (because of caching – it makes responses faster but updates slower). But even 72 hours is orders of magnitude quicker than governmental processing of complaints could ever be.
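A toy model shows why an IP-based list falls behind as soon as the directory entry changes. Here a plain dictionary stands in for DNS, and the name `example-site.test` and both addresses are invented placeholders from reserved ranges; the `reachable` helper is hypothetical and assumes the filter can only see the resolved address.

```python
# A toy "directory": hostname -> current IP address (placeholder values).
dns = {"example-site.test": "198.51.100.7"}

# Hypothetical IP blocklist built from whatever the name resolved to on complaint day.
ip_blocklist = {dns["example-site.test"]}

def reachable(host: str) -> bool:
    """The user types the name; the filter only ever sees the resolved IP."""
    return dns[host] not in ip_blocklist

print(reachable("example-site.test"))  # False: the listed address is blocked

# The site owner moves to a new address and updates the directory entry.
dns["example-site.test"] = "198.51.100.8"
print(reachable("example-site.test"))  # True: same name, unlisted address
```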
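The trade-off in the caching comment can be sketched as a resolver that answers from memory until a record’s time-to-live (TTL) runs out. This is a simplified model, not how any particular DNS server is built: real resolvers honour the TTL the site owner sets on each record, and the 72-hour figure is simply the rule of thumb quoted above. `CachingResolver` and the placeholder name and addresses are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class CachedAnswer:
    address: str
    expires_at: float  # seconds since an arbitrary start time

class CachingResolver:
    """Toy cache: serves remembered answers until their TTL expires."""

    def __init__(self, authoritative: dict[str, str], ttl: float):
        self.authoritative = authoritative  # the site's real, current entries
        self.ttl = ttl
        self.cache: dict[str, CachedAnswer] = {}

    def lookup(self, name: str, now: float) -> str:
        entry = self.cache.get(name)
        if entry is not None and now < entry.expires_at:
            return entry.address  # fast, but possibly stale
        address = self.authoritative[name]
        self.cache[name] = CachedAnswer(address, now + self.ttl)
        return address

HOUR = 3600.0
records = {"example-site.test": "198.51.100.7"}
resolver = CachingResolver(records, ttl=72 * HOUR)

print(resolver.lookup("example-site.test", now=0))          # 198.51.100.7 (fetched, then cached)
records["example-site.test"] = "198.51.100.8"               # owner changes the IP
print(resolver.lookup("example-site.test", now=24 * HOUR))  # 198.51.100.7 (still cached)
print(resolver.lookup("example-site.test", now=72 * HOUR))  # 198.51.100.8 (TTL expired)
```

Even in the worst case, the new address is visible everywhere within the TTL, which is still far faster than any complaints process.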
Why is a blacklist based on IP addresses a problem?
Apart from the ease with which you can change your IP address, it is actually not that common for exactly one web site to run on exactly one server:
Firstly, many sites can co-exist on the same IP address and often do, particularly when a company purchases web hosting space from an external provider. That IP address may only get you to the ‘front gate’ so to speak. So if you blacklist by IP address, you are likely to block many innocent sites when you choose to block one bad apple.
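The ‘front gate’ effect is easy to quantify in a toy model. On shared hosting the web server usually picks which site to show from the name the browser asks for (the HTTP `Host` header), so one address can front many unrelated sites. Every site name and address below is invented, and `collateral_damage` is a hypothetical helper.

```python
# Hypothetical shared-hosting setup: site name -> IP address it is served from.
hosting = {
    "bad-apple.test": "203.0.113.10",
    "bakery.test": "203.0.113.10",
    "football-club.test": "203.0.113.10",
    "local-council.test": "203.0.113.10",
    "other-site.test": "203.0.113.20",
}

def collateral_damage(target: str) -> list[str]:
    """Innocent sites blocked as a side effect of blocking the target's IP."""
    ip = hosting[target]
    return sorted(site for site, addr in hosting.items() if addr == ip and site != target)

print(collateral_damage("bad-apple.test"))
# ['bakery.test', 'football-club.test', 'local-council.test']
```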
Secondly, some sites are ‘load balanced’ across multiple servers on multiple IP addresses.
If you wanted to look at SBS Television’s World Cup coverage on their website, I am pretty sure you wouldn’t be alone at the moment. To handle so many requests at once, and to allow for redundancy in case one server fails, SBS would share that load across multiple servers on many IP addresses. So if a complaint was upheld and the decision was made to block SBS by IP address (because the complainant so despises the sound of the vuvuzela), the filter would fail to block the site, as more than one IP address can respond. (And conversely, if SBS were aware that, say, Senator Conroy’s filter was IP address based and they didn’t want anybody to block their coverage, they could simply change that IP address and presto, the filter would no longer work.)
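One common way to spread load is round-robin: successive requests are handed to successive servers. SBS’s actual setup isn’t described here, so this is only a sketch of the general idea with a hypothetical pool of four placeholder addresses, one of which the filter knows about.

```python
from itertools import cycle

# Hypothetical pool of servers behind one site name (documentation addresses).
pool = ["192.0.2.10", "192.0.2.11", "192.0.2.12", "192.0.2.13"]
next_server = cycle(pool)  # round-robin: each request goes to the next address

ip_blocklist = {"192.0.2.10"}  # the filter only knows one of the addresses

requests = 100
served = 0
for _ in range(requests):
    if next(next_server) not in ip_blocklist:
        served += 1

print(f"{served} of {requests} requests still got through")  # 75 of 100
```

This counts each blocked request as lost; a client that retries another address would do even better.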
But the reality of video streaming is that many companies choose to delegate that work to external specialists like Akamai. Akamai helps companies like SBS Television stream data, such as the video on the SBS Tour de France website (see http://www.akamai.com/html/solutions/media_delivery.html). The general gist is that Akamai’s servers are distributed across many locations with many IP addresses, so any given video feed could be coming to you from a large selection of IP addresses, and those addresses are recycled constantly. Blocking such a load-balanced site by IP address therefore not only fails to block the content, because it remains available on other IP addresses, but also blocks whichever other client’s content is later served from that same reused IP address.
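Both failures can be shown in one toy timeline. This is not how Akamai actually allocates addresses; it simply assumes two edge addresses that swap customers between two time slots. The customer labels, addresses and `filter_outcomes` helper are all hypothetical.

```python
# Toy CDN: time slot -> {edge IP address: customer whose content it serves then}.
schedule = {
    0: {"198.51.100.1": "sbs-video", "198.51.100.2": "bank-site"},
    1: {"198.51.100.1": "bank-site", "198.51.100.2": "sbs-video"},
}

# The complaint is handled during slot 0, so the filter lists whatever
# address was serving the video at that moment.
ip_blocklist = {ip for ip, customer in schedule[0].items() if customer == "sbs-video"}

def filter_outcomes(schedule: dict, ip_blocklist: set) -> list[tuple]:
    """Return (slot, customer, blocked?) for every edge address in every slot."""
    return [
        (slot, customer, ip in ip_blocklist)
        for slot, assignments in schedule.items()
        for ip, customer in sorted(assignments.items())
    ]

for outcome in filter_outcomes(schedule, ip_blocklist):
    print(outcome)
```

By slot 1 the video has moved to an unlisted address, while the unrelated `bank-site` inherits the block.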
So this is a fairly lengthy list of reasons why it can’t work, and we’ve only just scratched the surface. There are many more issues and many more workarounds available to both users and content providers. (And that is without even exploring non-technical issues such as censorship and freedom of speech). Even Enex Labs’ commissioned report on this issue to the government listed 37 different methods by which such a filter could be bypassed.
Industry experts (such as SAGE-AU members) are all saying the same thing: that legislating to force ISPs to perform such filtering is a costly exercise in futility.
Andy Leyden serves on the national executive of the System Administrators Guild of Australia. A programmer and system administrator, he works in and around the web every day, seeing the medium as an opportunity – not as a threat.