What Did People Use Before Google to Search the Web?

Illustration: Angelica Alzona, Gizmodo

The year is 1997. You’re wearing whatever people wore back then — some kind of jean jacket, I’m guessing — and talking to your friend about your new favourite movie, the recently-released Mike Myers vehicle Austin Powers. You’re quoting the movie, and your friend thinks this is hilarious. Then things take a dark turn. “I thought Randy Quaid was excellent,” your friend says. “Randy Quaid?” you think, trying hard not to punch the wall. “Randy Quaid wasn’t in Austin Powers.” You try explaining this to your friend — “I believe,” you say tersely, “that you’re thinking of Clint Howard” — but your friend is adamant. To settle this dispute, and salvage what remains of your friendship, you boot up your 41 kg computer tower. Forty minutes later, you have made it onto the internet. The question now is: where do you go? How, before Google, did people settle asinine disputes, and/or find other sorts of information? For this week’s

Christine L. Borgman

Distinguished Research Professor, Information Studies, University of California Los Angeles, and the author of Big Data, Little Data, No Data: Scholarship in the Networked World

In the ’90s, Yahoo and AltaVista did pretty well. But computerised information retrieval is a very old field, dating back at least to the 1950s. The first commercial online remote access systems date back to the early 1970s.

Google did not invent information-retrieval by any means — it built on very old methods of documentation, such as those of Paul Otlet, who invented the Universal Decimal Classification in the 1930s, and was among the parents of modern information science.

The history of online information-retrieval is discipline-specific — very deep specialist indexing in the fields of medicine, metallurgy, materials science, chemistry, engineering, education, the social sciences. We had very good databases online by the early 1970s that were commercially available — you paid by the connect minute.

Some of Google’s most basic principles derive from tf-idf, or Term Frequency times Inverse Document Frequency, a notion that came out of a Cambridge doctoral dissertation in 1958 by Karen Spärck Jones. Her method involved looking at how frequently a term appears in a document, and weighting that by the inverse of how many documents in the collection contain the term. She’s really a pioneer, and would later consult for Google, along with many other notable information scholars. Page and Brin were definitely deeply schooled in this history.

Google came out of the Digital Libraries Initiative, a project led by the National Science Foundation and involving 8 or 10 different federal agencies. I had funding from it, and recall the all-hands meeting, at which Brin and Page had a poster proposing Google. I remember thinking: this is really cool, they’ve reinvented bibliometrics for the Web.

Bibliometrics is a means to create links between documents and then follow the network. This method is especially useful to pursue topics where terminology changes over time. For example, if you wanted to find what preceded modern abortion discussions, you’d go to a Roe v. Wade discussion from the mid-1970s and look for everything it cited and everything that cited it, so you can go in both directions.
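To make the "both directions" idea concrete, here is a minimal Python sketch of a citation graph. The document names and the links between them are invented for illustration, and the helper names (build_index, neighbours) are hypothetical rather than taken from any real bibliometrics tool.

```python
from collections import defaultdict

# Hypothetical citation data: each document maps to the documents it cites.
CITES = {
    "roe_discussion_1975": ["earlier_ruling_1965", "medical_review_1970"],
    "policy_paper_1992": ["roe_discussion_1975"],
    "modern_essay_2015": ["policy_paper_1992", "roe_discussion_1975"],
}


def build_index(cites):
    """Return (forward, backward) link maps for a citation graph."""
    forward = defaultdict(set)   # document -> documents it cites
    backward = defaultdict(set)  # document -> documents that cite it
    for doc, refs in cites.items():
        for ref in refs:
            forward[doc].add(ref)
            backward[ref].add(doc)
    return forward, backward


def neighbours(doc, cites):
    """Everything the document cites, and everything that cites it."""
    forward, backward = build_index(cites)
    return sorted(forward[doc]), sorted(backward[doc])


cited, citing = neighbours("roe_discussion_1975", CITES)
print("cites:", cited)
print("cited by:", citing)
```

Starting from one document, the backward map leads to later work that may use entirely different terminology, which is why following links can succeed where keyword matching fails.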

The Science Citation Index, also begun in the 1950s, brought old principles of library science to modern technology. Bibliometrics and citation indexing are ideas that may be traced back centuries to developments like biblical annotation.

Safiya Umoja Noble

Associate Professor of Information Studies and Co-Director of the UCLA Centre for Critical Internet Inquiry at UCLA, and the author of Algorithms of Oppression: How Search Engines Reinforce Racism

One of the most important dimensions of early internet information sharing was that subject matter experts, from librarians to scholars to expert hobbyists, were harnessed to cultivate and organise knowledge. What this did was make the humans involved in these practices visible, even as AI and search tools were developed. We understood that people power is what made sharing happen online, and we sought to figure out what was credible based on pockets of websites managed by organisations, especially by universities and research organisations.

The first search engines were, in fact, virtual libraries, and many people understood the value of libraries as a public good. As automation increased, and librarians and experts were replaced with AI, we lost a lot. The public good that could have been realised was replaced by massive advertising platforms, like Yahoo! and Google.

Now, expertise is outsourced and often optimised content, paid for by the highest bidder in AdWords. This has led to a big gap between knowledge and advertising in search engines, especially when trying to understand complex issues. In some ways, search has undermined our trust in expertise and critical thinking, backed by investigated facts and research, and left us open to manipulation by propaganda. Search engines may be great in helping us find banal information, but they have also desensitised us to the value of slow, deliberate investigation — the kind that makes for a more informed democracy.

Ian Milligan

Associate Professor, History, University of Waterloo, and the author of History in the Age of Abundance: How the Web is Transforming Historical Research

Google was, of course, not the first search engine for the web. Web search dates back to 1993, when the Wandex (or World Wide Web Wanderer) measured the web and led to a searchable index; Lycos and Infoseek followed in 1994, and directories like Yahoo! in 1995.

A lot of these early search engines or directories, however, were fairly clunky. If you were a website creator, you would in many cases have to fill out a form to be added to the directory, or would need to insert fairly cumbersome meta tags into your HTML. By the mid-1990s, as more and more people began to create websites and host them on third-party platforms, they did not always register their sites.

Part of this is because early websites could rely on hyperlinks, far more than we do today in our age of search, to bring visitors to their sites.

The WebRing is a great example of this. The WebRing was developed in 1995 by a young software developer named Sage Weil. WebRings were groups of websites that were topically unified. So, people interested in old cars would join an automobile enthusiast WebRing, cat lovers a cat-focused WebRing, and so forth. On the bottom of these pages would be a WebRing interface, encouraging users to go to the “next” site or the “previous” site, or even to an overall index of everybody who had joined the ring.

This was a pretty democratic and accessible method for discovering sites. Anybody could start a WebRing, and anybody could join one if the administrator thought they fit into the community. Crucially, they formed a new way to connect people. The heyday of WebRings lasted until around 2000, when the technology ended up in the hands of Yahoo! and some management changes ended up alienating users.
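The "next", "previous" and index navigation described above amounts to a circular list. The following Python sketch shows that idea only; the WebRing class, its method names and the example addresses are all hypothetical and are not a reconstruction of the original WebRing software.

```python
class WebRing:
    """A toy ring of member sites with wrap-around navigation."""

    def __init__(self, topic):
        self.topic = topic
        self.sites = []  # members, in the order they joined

    def join(self, url):
        # A site appears in the ring at most once.
        if url not in self.sites:
            self.sites.append(url)

    def next_site(self, url):
        i = self.sites.index(url)
        return self.sites[(i + 1) % len(self.sites)]

    def previous_site(self, url):
        i = self.sites.index(url)
        return self.sites[(i - 1) % len(self.sites)]

    def index(self):
        # The overall list of everybody who has joined the ring.
        return list(self.sites)


ring = WebRing("cats")
for url in ["cats-a.example", "cats-b.example", "cats-c.example"]:
    ring.join(url)

print(ring.next_site("cats-c.example"))      # wraps around to the first member
print(ring.previous_site("cats-a.example"))  # wraps around to the last member
```

Because the ring wraps around, every member is reachable from every other member by clicking "next" often enough, with no search engine involved.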

I don’t want to be unduly nostalgic: I wouldn’t want to go back to a world where we discovered content mostly through hyperlinks, and I use Google as much as anybody else. But the way that Google works, thanks to PageRank, is that the more links that a site has coming into it from influential venues, the higher up in the search results pages it goes. This has the effect of funnelling traffic to a few big winners. If I search for “cats,” I might explore the top dozen or so of almost four billion results. Somewhere in those billions of pages there are undoubtedly cool homepages by people who just really love their cats. In 1998, clicking through a WebRing, there was a chance I would have serendipitously discovered some fascinating content, or begun to feel some community through finding like-minded people. That’s harder to find with Google.

Ethan Zuckerman

Associate Professor of the Practice in Media Arts and Sciences at MIT Media Lab, Director of the Centre for Civic Media at MIT, and the author of Digital Cosmopolitans: Why We Think the Internet Connects Us, Why It Doesn’t, and How to Rewire It

Well, in those dark days, we used several different search engines, which ran on two different philosophies: TF-IDF and human curation.

TF-IDF stands for “Term Frequency Inverse Document Frequency.” What that means is that a search engine takes your query, say “mule power”, and looks for documents that contain those terms. But it also considers how common each term is across the corpus as a whole, to avoid overmatching on very common terms. So in searching for “mule power”, a TF-IDF engine is likely to prefer documents that mention mules over those that mention power, because power is a more common word than mule.
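As a rough illustration of the scoring just described, here is a small Python sketch. It uses one common textbook formulation (term count divided by document length, multiplied by the log of total documents over documents containing the term); real engines of the era used their own variants, and the four tiny documents below are invented for the example.

```python
import math

# Invented mini-corpus: "power" appears in three documents, "mule" in one.
DOCS = {
    "mule_farm": "the mule pulled the cart",
    "power_grid": "the power grid lost power",
    "solar": "solar power is cheap power",
    "wind": "wind power for the town",
}


def tokenize(text):
    return text.lower().split()


def tf(term, text):
    """Term frequency: share of the document's words that are the term."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return tokens.count(term) / len(tokens)


def idf(term, docs):
    """Inverse document frequency: rarer terms get a larger weight."""
    containing = sum(1 for text in docs.values() if term in tokenize(text))
    if containing == 0:
        return 0.0
    return math.log(len(docs) / containing)


def score(query, text, docs):
    return sum(tf(term, text) * idf(term, docs) for term in tokenize(query))


def rank(query, docs):
    scored = [(name, score(query, text, docs)) for name, text in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


for name, value in rank("mule power", DOCS):
    print(f"{name}: {value:.3f}")
```

With this corpus the document about mules comes out on top, even though two other documents mention "power" twice, because "mule" is the rarer and therefore more heavily weighted term.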

TF-IDF is vulnerable to a very specific sort of hacking. If I want to sell you my new mule-powered web browser (they were all the rage in the early 1990s…), I just post a web page that says “mule power” over and over. There’s no document on the web that’s a better match than that for your query, so I’ll come up #1 every time. That’s the weakness that led Larry Page and Sergey Brin to work on PageRank. The idea was that pages like my spam page would be unlikely to be linked to, whereas helpful pages would have lots of incoming links. Google basically married TF-IDF to PageRank to launch their initial search engine. (People figured out how to game PageRank as well, creating farms of webpages that all said “mule power” and linked to one another. Google created more complex algorithms in response. Progress. People stopped using mule-powered browsers and the steam browser became the new hotness. Literally: you could burn yourself really badly on one if you weren’t careful.)
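The link-based idea can be sketched in a few lines of Python. This is the simplified textbook form of PageRank (repeated redistribution of scores along links, with a damping factor of 0.85 chosen here as an assumption), not Google's production algorithm, and the four-page web below is made up for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of scores along links."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Score held by pages with no outgoing links is shared among all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {
            p: (1.0 - damping) / n + damping * dangling / n for p in pages
        }
        # Each page passes its score on, split evenly among the pages it links to.
        for page, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank


# Invented example: the spam page links out, but nobody links to it.
LINKS = {
    "mule_news": ["mule_club", "mule_vet"],
    "mule_club": ["mule_news"],
    "mule_vet": ["mule_news", "mule_club"],
    "spam_page": ["mule_news"],
}

ranks = pagerank(LINKS)
for page, value in sorted(ranks.items(), key=lambda pair: pair[1], reverse=True):
    print(f"{page}: {value:.3f}")
```

However many times the spam page repeats "mule power", it ends up with the lowest score here because no other page links to it. The link farms mentioned above work by manufacturing exactly those missing incoming links.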

Lycos, which I briefly worked for after they bought Tripod, the company I helped launch, ran on TF-IDF, as did Excite, HotWired and AltaVista, which I remember as being the best of the bunch.

TF-IDF never worked especially well. As time went on, smart search engines discovered that 30%-50% of queries could be solved with hand-curated search pages. For instance, if you searched “mule race results,” finding you a page that prominently mentioned that phrase was probably not helpful; sending you to the front page of the AMF (the American Muleracing Federation) would be a better result. Lycos served at least 30% hand-crafted results pages when I left in 1999.

Yahoo, by contrast, initially ran on a completely human-curated basis. It wasn’t a search engine, but a directory. When you searched for “mule racing”, it would show you where mule racing fit in various hierarchies:

Sports -> Sports Leagues -> Racing -> Mule Racing

and then link to AMF, OOM (Only Ornery Mules) and ESPN (Entertainment and Livestock Programming Network)

Law -> Animal Abuse -> Mule Racing

and then to PET’eM (People for the Ethical Treatment of Mules)

What was great about this is that it could show you how one entity (AMF) fit into the larger world of mule-racing. It was particularly terrific if you were researching companies, as you could quickly find potential competitors or different suppliers. But it was a royal pain in the arse to build, requiring actual human taxonomists to look at sites and figure out where they landed in the hierarchy. And god help you when someone invented something new, like the steam-powered racing mule. Does that go under mule racing, or steam power? Both? Or a new category entirely to recognise the advent of new sporting leagues like NASCAR (National Active Steam Cattle Associated Racing)?
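The directory approach can be pictured as a hand-built mapping from category paths to sites. The Python sketch below reuses the joking categories and site names from the example above; the data layout and the function names (search, peers) are assumptions made for illustration, not how Yahoo actually stored its directory.

```python
# Category paths and sites from the example above, entered by hand,
# which is exactly the labour-intensive part of running a directory.
DIRECTORY = {
    ("Sports", "Sports Leagues", "Racing", "Mule Racing"): ["AMF", "OOM", "ESPN"],
    ("Law", "Animal Abuse", "Mule Racing"): ["PET’eM"],
}


def search(term, directory):
    """Return every category path whose final category matches the term."""
    wanted = term.lower()
    return {
        " -> ".join(path): sites
        for path, sites in directory.items()
        if path[-1].lower() == wanted
    }


def peers(site, directory):
    """Other sites filed in the same categories as the given site."""
    found = set()
    for sites in directory.values():
        if site in sites:
            found.update(s for s in sites if s != site)
    return sorted(found)


for path, sites in search("mule racing", DIRECTORY).items():
    print(path, "=>", ", ".join(sites))
print("Filed alongside AMF:", peers("AMF", DIRECTORY))
```

The same category name can sit at the end of several paths, which is how one query surfaces both the sporting and the legal view of mule racing. It is also why a genuinely new thing is awkward: someone has to decide which paths it belongs under before it can be found at all.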

Yahoo! worked really well for the first few years of the web, but it was unwieldy and breaking down by 1997 or so — they began outsourcing their search to other companies (Excite at first… Bing now.) I do miss it, if only because it was fascinating to see the ways people had chosen to organise the whole of human knowledge. (Melvil Dewey assigned the 200s to “religion” and then dedicated 220-280 to various different topics about the Bible. The 290s are about “other religions”… including Buddhism, Hinduism, etc.)

It’s hard to imagine Yahoo coming back — it’s just too much damned work. In a sense, human-curated search pages have made something of a comeback. Much of the Google results page is not a TF-IDF type of web search but a page constructed out of various database queries – search for the weather, and Google uses geolocation to determine where you are and finds local weather news from a db. I actually think pages curated by humans – librarians working together Wikipedia style, for instance – might be a great solution to how to handle rapidly emerging topics that tend to be hijacked by political extremists or disinfo merchants.

As for what I miss: I miss the mules. My mule-powered Netscape browser was slow, but I miss those gentle rhythms of grazing the web.
