Think about the early 90s, late 80s, pretty much any time between the beginning of Compuserve and that of Google. You’d log on through what was probably AOL and find content waiting for you – their content. The walled garden systems of old assumed that your ISP knew what kind of content you should consume. They taxonomized services according to the categories they knew and billed by the hour for what they provided.
It took a while for the idea of a free, open pipe to the rest of everything to catch on. I was late to the Internet and am not sure when. Probably somewhere between the fall of AOL and the rise of search over the web directory (another kind of walled garden in its own sense), people naturally gravitated towards entering a term they made up into a search box to scan for relevant content. No more trying to self-pigeonhole into the established taxonomy. Finding what you want became a game of querying and adjusting, figuring out the best way to tell the Internet what you wanted.
There were probably many reasons why this happened, but I’m going to assume that the content was just better. Not necessarily because it was objectively better (though PageRank probably helped that too), but because a prebuilt taxonomy has some natural limits to its capacity to capture the interests and wants of human beings. If your interests happened to lie perpendicular to the web directory taxonomy, chances are you’d find results that were maybe 10% or 20% relevant in a related category – and not only would you be sifting through tons of irrelevant junk, you’d also miss out on everything the curator didn’t specifically decide to put into a category barely relevant to your own. So if you happened to like, say, Kabuki theater, you’d probably peruse the theater section for anything relevant and be lucky to find an article or two that you really wanted. On a search engine, however, you could specify exactly what you wanted and know that the voracious Googlebot would have crawled nearly every accessible page.
So problem solved, right? We have Google, Bing, Yahoo, DuckDuckGo… plenty of ways to search. Old news.
Old news – that’s exactly the problem. Search is built on the same assumption as browsing – that you’re the one moving around, and the content is going to sit and wait to be found. So we can search, but when the content stops being perennially valuable, those results are probably stale before we see them.
Now think about what tabs you have open. I can tell you mine. Facebook, Twitter, HackerNews, Gmail, 7 Youtube videos (mostly music), a couple Wikipedia pages that I checked before I wrote this, a map, and a Google search. The Wikipedia pages do their jobs as well as could be expected – I have something I need to look up, and I can find it. Youtube I’m sort of misusing as a lazy man’s Spotify/Pandora hybrid, but that’s a whole other issue.
My concern is with the feeds – the 4 tabs of news and social media I’m constantly refreshing, plus the 10 or so blogs/webcomics that I generally close and re-open. I find 3 articles that look interesting on HackerNews. One of them is relevant to one of my interests (startup fundraising) but not very useful. Another is emotional linkbait that might’ve been interesting to know, but probably isn’t important to what I’m doing. The 3rd… same deal. The rest of the pool is clearly a mix of technical and newsworthy topics that I don’t have any particular interest in. So not only am I sifting through 30 articles that are mostly, obviously irrelevant, but the fact that I can’t judge by headlines alone means I’m bound to get halfway through a post before I really know what it’s about. Facebook and Twitter are much worse – while HN is probably about 10% relevant on average, the social networks are showing random selections of content taken from people I friended/followed because… we’re friends. I have 8 “outstanding” requests for games I don’t want to play, the stereotypical cat and baby pictures, a couple checkins to bars in cities I don’t live in, and maybe somewhere in there an event I actually wanted to go to but might not even see. In theory, I could pick the people I want to hear from by carefully managing subscriptions – but not the topic or quality of their posts.
Furthermore, these updating media represent 4-10 tabs in my normal browsing. There is no unified way to follow all the new stuff about Scala (a programming language in which I’ve had recent interest). I could maybe hand-aggregate a bunch of the blogs and news sites on that particular topic, spending hours on aggregation that could be machine-driven and missing sources I just don’t have the time to find, and then plug them into Google Reader. And what if the topic is more specific or obscure – what if I’m really only interested in Scala for a particular feature (such as parallel collections) and don’t really care about all the theoretical debates on programming style? My RSS feed would be full of crap.
I could of course try to Google the subject. This takes me back to that static web – I get 3 relevant results on the 1st page – the official website for the language, the Wikipedia page, and the site for the popular Lift web framework. All of these I’ve read before – generally to the point of boredom.
Then there are Google News Alerts – alternatively trap.it or DailyPerfect. These work pretty well for finding the news. I just don’t always want “the news,” which seems to have narrowly become defined as content published by a certain collection of media and journalism outlets. I want blog posts. I want Tweets linking to them, interesting new discussions on forums, and release notes when they appear. And I want this content searched and organized – don’t give me 10 articles on MongoDB scalability because the RSS reader’s regex filter found “scala” embedded in “scalable,” and don’t make me handpick 50 feeds that might have relevance when there are 10,000 of them out there on the web.
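The “scala” inside “scalable” mix-up is what a bare substring filter produces. Here is a minimal sketch of the difference a word-boundary match makes, assuming a hypothetical `mentions_topic` helper and made-up headlines. It only removes that one class of false positive; it does nothing for relevance in the broader sense.

```python
import re

def mentions_topic(text: str, topic: str) -> bool:
    """Return True only if `topic` appears as a whole word in `text`."""
    # \b anchors the match at word boundaries, so "scala" no longer
    # matches inside "scalable" or "escalate".
    pattern = r"\b" + re.escape(topic) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

# Illustrative headlines, not real feed data.
headlines = [
    "MongoDB: building a scalable cluster",
    "Parallel collections in Scala",
]

# Naive substring filter: keeps both headlines.
naive = [h for h in headlines if "scala" in h.lower()]

# Whole-word filter: keeps only the headline that is actually about Scala.
strict = [h for h in headlines if mentions_topic(h, "scala")]
```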
There’s a tradeoff. Freshness, relevance, or automation: pick 2.
The post-2.0 web, that of updates and user-generated content, hasn’t yet caught on to this notion of an open Internet. We are still in the world of privately walled gardens – Twitter, Facebook, Google+, Bloomberg News, Github, CNN… each with its own content stream, separate from and frequently incompatible with everyone else’s.
We (finally) have basic search on each of these streams, individually. I can look for hashtags or keywords on Twitter, or on Google+/Sparks.
We have real-time search: Topsy, Greplin, etc. – pick your provider really. But search isn’t quite right – again, it’s based on this idea that my interest in what I’m looking for lasts for the hopefully-less-than-a-minute span between when I realized I had to look it up and when I found it. That’s not what I’m trying to do when I “search” Twitter.
What we need is an analog to how search is a portal. When I go on the internet with a new interest, search gives me a best-case-scenario way of saying what I want and getting it – that’s why it initially beat the crap out of categorized directories. It’s like somebody is making up a category on the spot, and that category corresponds to precisely what I am hoping to find.
We need to do that totally personalized category thing, but turn it around. The web no longer sits and waits to be discovered – it happens. I’m the one waiting. I want to know when that super-insightful blog entry tells me exactly how to hack Scala 10x better, or when my friend 2 blocks away checks in to that cool jazz bar I’ve been meaning to investigate. I don’t want to “search” for this information, because it would get tedious – imagine keeping a list of 10 Google searches to make every hour just to know what’s happening.
So instead of searching for new/momentary interests across the web of pages that rarely change, I have relatively constant interests and want to receive news based on them. So just flip things around. I enter a “search.” I leave it. Items accrue to that search as they happen. I check back in 5 seconds, 5 hours… 5 days – it shouldn’t matter. I’ll rely on the system to keep track for me. It should know what I’ve seen, what I’ve missed, what I want and what I care about. And when I check again, it should present it like a secretary.
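As a rough sketch of that flipped model, the core bookkeeping might look something like the following. Everything here is hypothetical: the `StandingSearch` and `Item` names, the naive rule that every term must appear as a word, and the in-memory storage are assumptions for illustration, not a description of any existing service.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Item:
    source: str  # illustrative label such as "blog" or "twitter"
    text: str

@dataclass
class StandingSearch:
    """A saved query that keeps collecting matches until someone checks in."""
    terms: set[str]
    matched: list[Item] = field(default_factory=list)
    seen: int = 0  # how many matched items have already been shown

    def offer(self, item: Item) -> None:
        # Called whenever new content appears, from any source.
        words = set(item.text.lower().split())
        if self.terms <= words:  # every term must be present as a word
            self.matched.append(item)

    def check_in(self) -> list[Item]:
        # Return only what accrued since the last visit, then mark it seen.
        unseen = self.matched[self.seen:]
        self.seen = len(self.matched)
        return unseen

search = StandingSearch(terms={"scala", "parallel"})
search.offer(Item("blog", "Tuning parallel collections in Scala"))
search.offer(Item("news", "MongoDB and scalable clusters"))
first = search.check_in()   # the matching blog post only

search.offer(Item("twitter", "New Scala release improves parallel sorting"))
second = search.check_in()  # only the tweet; the blog post was already seen
```

The point of the `seen` counter is that the gap between visits stops mattering: 5 seconds or 5 days later, `check_in` hands back exactly what arrived in between.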
Does this exist? Can it? It’s trying to sometimes. It’s close. But while I don’t think it does yet, I think it most eminently can.
I don’t know how much we’re missing out on due to the non-existence of software that does what I just described. Maybe it’s merely an additional feature on top of the growing notion of “aggregation” – a fusion of Tweetdeck, RSS, and Greplin. But maybe there’s something more here. Maybe we are missing not just a few pieces of Internet, but a substantial fraction of the entire thing. Maybe we have all grown accustomed to consuming that which is 10% relevant, because we’ve never known anything else.