The main problem with search engines isn't with the engines themselves, or with any of their methodologies. It's with the haystack we call the Web. And the fact that it's a haystack instead of a directory.



Take every directory you can name. The yellow pages. The Sears catalog. The list of companies in the foyer of a high-rise. The library card catalog. The inventory at a book store. The student listings at a school.



The Web has nothing like any of them. Beyond DNS, it has no directory structure, and no directory. It's a haystack.



Of course, this is a virtue in many ways. The absence of a directory structure (a required way to organize everything to the right of the first single slash of every URL) is one of the graces that allows the Web to grow and persist as a wild and woolly place. Thanks to the absence of a directory, the Web has no hierarchies beyond the lengths of path names (/yada/yada/yada/etc.), which are made less hierarchical by the hyperlink. (Hyperlinks subvert hierarchy, Dr. Weinberger famously says.)



So we have search engines that go through 3-billion-stalk piles of hay, looking for needles. And doing a fine job, considering.



That's our model, right?



It's still the model whose functions Marc seeks to expand when he writes "A new kind of people search is needed." A sample:



I want a new kind of search engine that combines the full-text approach that Feedster.com uses with the inbound-link analysis that Technorati does. So, I want to see webloggers who are most likely to talk about quilting in the future.



There's an interesting difference, however, between the giant heap of hay we call the World Wide Web and the small corner of that heap where blogs live. This is the corner Technorati calls the Live Web (a term Allen first coined for GlobeAlive). That difference is RSS. Simply put, the Live Web is syndicated. It sends out live notifications when something is published.



And, it seems to me, syndication implies a directory of some kind, at least at the source side of the notifications.



See, Technorati's spiders don't go out and crawl anything until they receive a notification that something has just been published. Aside from the efficiencies involved (Technorati doesn't waste bandwidth or server patience searching for stuff that may not have changed), it's interesting to me that most sources of notifications are organized and archived. This makes them different in kind from the Wide parts of the Web.
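
Here's a toy sketch of that notify-then-crawl idea, just to make the mechanics concrete. It is not Technorati's actual code, which I haven't seen. The class name, the method names and the stand-in fetch function are all hypothetical, made up for illustration. The only point it makes is that nothing gets fetched until a source says it has published something.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List


@dataclass
class PingDrivenCrawler:
    """Hypothetical crawler that fetches a source only after that source pings it."""

    # Assumption: fetch takes a URL and returns the page text.
    fetch: Callable[[str], str]
    pending: Deque[str] = field(default_factory=deque)
    index: Dict[str, str] = field(default_factory=dict)

    def notify(self, url: str) -> None:
        """Record a 'something was just published' notification from a source."""
        if url not in self.pending:  # repeated pings before a crawl collapse into one
            self.pending.append(url)

    def crawl_pending(self) -> List[str]:
        """Fetch only the sources that pinged since the last crawl."""
        fetched = []
        while self.pending:
            url = self.pending.popleft()
            self.index[url] = self.fetch(url)
            fetched.append(url)
        return fetched


def stand_in_fetch(url: str) -> str:
    """Stand-in for a real HTTP request, so the sketch runs offline."""
    return "content of " + url


crawler = PingDrivenCrawler(fetch=stand_in_fetch)
crawler.notify("http://example.com/blog")
crawler.notify("http://example.com/blog")  # duplicate ping, still one fetch
fetched_urls = crawler.crawl_pending()
```

With no pings, crawl_pending does nothing at all, which is the bandwidth and server-patience saving described above.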



To show what I mean, consider the difference between a blog's archive and the complete absence of any kind of history at most of the places on the Wide Web we call "sites." Same goes for publications syndicated by Scoop, PHP, Slash, Drupal and similar content management systems. They have history. And they organize that history. The schemas and naming conventions may not be the same, but they are all chronological, and they all respect the need to save archives, and the enormous importance of those archives. (Hmm... makes me think of blogs as The World Deep Web.)



Interesting, no?



By the way, everything I know about directories I learned from Craig Burton, who has been a voice in the wilderness against the proliferation of namespaces for as long as I've known him (which dates back to when he was kicking Microsoft's — and everybody else's — ass, with great humor, at Novell in the '80s).



At Novell, and later at The Burton Group, Craig and Jamie Lewis (who still runs TBG) decided they needed to change the network conversation from one about who had the best silo of private "pipes and protocols" to one about types of services, and how to make them interoperable. And they succeeded. Whether we know it or not, we all talk today inside the structure of terms Craig and Jamie laid on the world with the Network Services Model (NSM) for understanding what a network is, and what it does.



The NSM sees the network in terms of services, rather than in terms of wiring, protocols and other network mechanics. Services in the old days included file, print, messaging, management, directory and security. Today Web (hypertext) is a service too. There is no limit to how many we can have. But we won't have them until they're ubiquitous parts of the Net's infrastructure.



So an interesting irony of the Net is that it lacks many of the services that were taken for granted on LANs back when Craig and Jamie came up with the Network Services Model in the first place. For example, we have messaging for mail (SMTP, POP, IMAP, MIME, etc.). But we have little, if anything, for directory, file, print or security (no, firewalls don't count).



More than three years ago, Craig put out a new version of the Network Services Model, for the Net. He called it The Internet Services Model, and provided it as a way of measuring progress, as well as a way to organize understanding.



To describe the dawn state of progress where we languished then, he coined the term Web Noir.



When I look around at how far we've come, it looks to me like we're still there. But I do believe RSS, the concepts of notification and syndication, and the growing size and importance of the Live Web, bring the dawn at least a little bit closer.



[Doc Searls]