<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
        "http://www.w3.org/TR/html4/strict.dtd">
<HTML LANG="en-US"><HEAD>
<TITLE>Search Engine Snob: Teoma reviewed</TITLE>
<base href="http://www.bauser.com/websnob/engines/Teoma.html">
<META NAME="DESCRIPTION" CONTENT="A review of the Teoma search engine.">
<meta name="KEYWORDS" content="Teoma, search engine">
<meta name="DCTERMS.created" scheme="DCTERMS.W3CDTF" content="2002-05-05">
<!--#exec cgi="../cgi/head.pl"-->
<link rel="DC.subject" href="http://www.teoma.com/">
<script src="pop.js" type="text/javascript"></script>
</HEAD><BODY>

<p class="advert"><!--#exec cgi="../adverts/ad_1.pl"--></p>

<H1>Search Engine Review: Teoma.com</h1>

<p>A beta version of the <a href="http://www.teoma.com/">Teoma search
engine</a> was unveiled in May 2001 by the company of the same name.
Founded by researchers from <a href="http://www.rutgers.edu/">Rutgers
University</a>, both Teoma the company and Teoma the search engine were
"built to be sold", and immediately began looking for a more established
company to sell out to. Four months after unveiling their beta site, Teoma
was purchased by <a href="http://www.ask.com/">Ask Jeeves</a>. Jeeves
finally took Teoma out of beta in April 2002, incorporating Teoma's
technology in the Ask Jeeves site, shuttering their previous acquisition,
DirectHit.com, and relaunching Teoma.com in the media.</p>

<p>Teoma's "hook" is what it calls "Subject-Specific Popularity", which
rests on the notion that quality pages about a subject will always link to
other quality pages about the same subject. Pages that are linked
<em>from</em> many similar pages move up in the Teoma index, while sites
that link <em>to</em> many similar pages get sifted out and identified as
"resource sites" for a subject. In practice, the first function of
Subject-Specific Popularity works a lot like <a
href="http://www.google.com/technology/index.html">Google's
PageRank</a>.</p>
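<p>To make the two halves of that idea concrete, here is a minimal sketch in
Python. It is not Teoma's actual algorithm, which isn't public; it only
counts links between pages already judged to be about the same subject. The
function name <code>subject_link_counts</code>, the sample URLs, and the
idea of using raw link counts as scores are all assumptions made for
illustration.</p>

```python
from collections import defaultdict


def subject_link_counts(links, subject_pages):
    """Count links that stay inside one subject.

    links: iterable of (source, target) URL pairs.
    subject_pages: set of URLs judged to be about the subject.
    Returns two dicts: inbound counts and outbound counts per page.
    """
    inbound = defaultdict(int)   # links received from on-subject pages
    outbound = defaultdict(int)  # links made to on-subject pages
    for source, target in links:
        if source == target:
            continue  # ignore self-links
        if source in subject_pages and target in subject_pages:
            inbound[target] += 1
            outbound[source] += 1
    return dict(inbound), dict(outbound)


# Hypothetical link graph; x.example is not about the subject.
links = [
    ("a.example", "c.example"),
    ("b.example", "c.example"),
    ("d.example", "a.example"),
    ("d.example", "b.example"),
    ("d.example", "c.example"),
    ("x.example", "c.example"),
]
subject_pages = {"a.example", "b.example", "c.example", "d.example"}

inbound, outbound = subject_link_counts(links, subject_pages)
top_result = max(inbound, key=inbound.get)      # most linked-from-peers page
top_resource = max(outbound, key=outbound.get)  # page linking to most peers
```

<p>In this toy graph, the page with the most on-subject inbound links would
move up the results, while the page with the most on-subject outbound links
is the candidate "resource site". The link from the off-subject page counts
for nothing, which is the part that separates this idea from a plain
popularity count.</p>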

<p>Teoma is the latest search engine to be promoted as <a
href="/websnob/engines/#fourth">the Fourth Coming</a>, especially by that
sect of search engine optimizers who resent being marginalized by <a
href="http://www.google.com/">Google</a>'s dominance of the industry. The
popular press, however, has been mixed, and some searchers (including this
websnob) remain unimpressed by Teoma.</p>

<h2 id="webmasters">The Webmasters' Side</h2>

<p>I normally cover the searchers' side of a search engine first, but Teoma
has a lot of webmaster issues that I feel affect the end-user as well, so
we're going to discuss those issues first. To be blunt, Teoma's got a lot
of problems that are going to keep it out of Google's league.</p>

<h3>Teoma has no free submission</h3>

<p>That's right. If you want to submit a site to Teoma, you have to pay.
(Paid sites also get spidered more often than free sites. <em>A lot</em>
more often; see the next complaint.) Some non-paid sites do enter the
database through random crawling by Teoma's robot, but free entries are
relatively rare.</p>
  
<h3>Teoma doesn't spider often enough</h3>

<p>Although Teoma has a robot looking for new web pages, it doesn't send
the robot out very often. As a result, Teoma's search results for sites
that haven't paid for spidering appear to be five months (or more) out of
date. Think I'm kidding? Take a look at these search results for "Google",
as captured on 5 May 2002:</p>

<blockquote><img src="../bin/Teoma1.png" height="200" width="450"
alt="Happy Holidays from Google Web Images Groups Directory ? Advanced
Search ? Preferences ? Language Tools Advertise with Us -
Add..."></blockquote>

<p>The phrase "Happy Holidays from Google" in that sample is the <a
href="http://www.w3.org/TR/html401/struct/objects.html#alternate-text"
>alternate text</a> from Google's logo in December 2001. That's right: It's
<span lang="es">Cinco de Mayo</span>, but Teoma hasn't spidered the Number
2 site on the Web since Christmas. Do those look like fresh results to
you?</p>

<h3>Teoma doesn't support the robot exclusion protocols</h3>

<p>I'm not completely sure of this one, but so far the evidence is that
Teoma's spider (when it actually ventures out onto the Web) doesn't honor
the usual <a href="http://www.robotstxt.org/wc/exclusion.html">robot
exclusion protocols</a>. Teoma doesn't mention them anywhere on its site,
and <a href="http://www.robotstxt.org/wc/active/html/teoma_agent1.html">the
latest information about Teoma at robotstxt.org</a> reports that the Teoma
robot ignores robots.txt.</p>

<p>While there's no law requiring that a web spider follow these protocols,
it's highly unusual for a major site (or someone who wants to be a major
site) to ignore them completely. It suggests that Teoma either doesn't care
about good citizenship, or isn't planning to do much actual spidering. The
latter, of course, would just be another sign that Teoma intends to
concentrate on sites that pay to get in the database.</p>
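<p>For comparison, honoring robots.txt is not much work. The sketch below
uses Python's standard <code>urllib.robotparser</code> module to show the
check a well-behaved spider makes before fetching a page. The robots.txt
rules, the <code>ExampleBot</code> user agent, and the example.com URLs are
made up for illustration; nothing here describes how Teoma's robot is
built.</p>

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every robot is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_fetch(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


allowed = may_fetch(
    ROBOTS_TXT, "ExampleBot", "http://www.example.com/index.html")
blocked = may_fetch(
    ROBOTS_TXT, "ExampleBot", "http://www.example.com/private/notes.html")
```

<p>A polite spider skips any URL for which the check comes back false. A
real crawler would download the site's robots.txt first (the same module
can do that with <code>set_url()</code> and <code>read()</code>) instead of
parsing a hard-coded string.</p>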

<h3>Teoma has problems with <acronym title="Extensible HyperText Markup
Language">XHTML</acronym></h3>
 
<p>Here's another search result from Teoma, also captured on <span
lang="es">Cinco de Mayo</span>. It shows what happens when Teoma indexes an
XHTML page:</p>

<blockquote><img src="../bin/Teoma2.png" height="135" width="450"
 alt="Don't feel bad. He doesn't know who you are, either. ... ?xml
version=&quot;1.0&quot;? michael &#64; bauser .com Third Person Michael Bauser is a web
provocateur..."></blockquote>

<p>The phrase between question marks isn't part of the page's human-readable
text; it's the page's XML declaration. Search engines should not be
indexing those.</p>
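<p>Keeping the declaration out of an index doesn't take anything exotic. As
a rough sketch, Python's standard <code>html.parser</code> module reports
an XML declaration as a processing instruction, not as text, so an
extractor that only collects character data never sees it. The
<code>TextExtractor</code> class, the <code>indexable_text</code> function,
and the sample page are assumptions for illustration, not a description of
any real search engine's indexer.</p>

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect human-readable text, ignoring markup and declarations."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Only character data arrives here. The XML declaration goes to
        # handle_pi(), which is left as the default no-op.
        stripped = data.strip()
        if stripped:
            self.chunks.append(stripped)


def indexable_text(document):
    """Return the text of document that a search index might store."""
    extractor = TextExtractor()
    extractor.feed(document)
    extractor.close()
    return " ".join(extractor.chunks)


# Made-up XHTML page that starts with an XML declaration.
page = (
    '<?xml version="1.0"?>\n'
    '<html><head><title>Third Person</title></head>\n'
    '<body><p>Michael Bauser is a web provocateur.</p></body></html>'
)
text = indexable_text(page)
```

<p>With this approach the stored text begins with the page title, and the
words "xml version" never reach the index.</p>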

<h2 id="users">The Users' Side</h2>

<p>Teoma's home page follows the trend towards plain-and-simple search
pages, offering a lone search box with one search option (phrase
searching). Teoma's search results pages are more complex.</p>

<p>At the top of the results are "Sponsored Results", currently taken from
<a href="http://www.overture.com/">Overture</a>. The same Sponsored Results
repeat on each page of search results. The descriptions of Sponsored
Results are written by the sponsors (advertisers).</p>

<p>Teoma's main results follow underneath the Sponsored Results. The site
descriptions combine the description from sites' <a
href="http://www.w3.org/TR/html401/struct/global.html#h-7.4.4.2">meta
tags</a> with the leading text of the pages themselves.</p>

<p>The right margin of Teoma's results pages contains the features that are
Teoma's "hooks". The "Refine" menu suggests additional searches related to
your original search. Refined searches are based on analysis of the pages
that appear in your original search: Teoma looks at all the pages it found
for your original search, identifies phrases that appear more often than
average, and suggests those phrases as ways to narrow the search. This can
be a hit-or-miss proposition. In my tests, nonsensical phrases like
"Product People" and "Free, Get" often show up.</p>
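<p>Here is one way the "more often than average" step could work, as a
minimal Python sketch. Teoma hasn't published its method, so the details
are guesses: the sketch treats a "phrase" as two adjacent words, counts
each phrase once per result page, and keeps phrases whose page count is
above the mean. The function name <code>suggest_refinements</code> and the
sample results are hypothetical.</p>

```python
from collections import Counter


def suggest_refinements(result_texts, top_n=3):
    """Suggest two-word phrases that are unusually common in the results."""
    counts = Counter()
    for text in result_texts:
        words = text.lower().split()
        # Count each adjacent word pair once per page.
        counts.update(set(zip(words, words[1:])))
    if not counts:
        return []
    average = sum(counts.values()) / len(counts)
    frequent = [(phrase, n) for phrase, n in counts.items() if n > average]
    # Most common first; ties broken alphabetically for stable output.
    frequent.sort(key=lambda item: (-item[1], item[0]))
    return [" ".join(phrase) for phrase, n in frequent[:top_n]]


# Made-up text from three result pages.
results = [
    "search engine review",
    "search engine news",
    "search engine optimization tips",
]
suggestions = suggest_refinements(results)
```

<p>A purely statistical pass like this has no idea whether a phrase means
anything, which may be one reason word pairs that merely sit next to each
other on a lot of pages end up in the menu.</p>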

<p>Teoma's "Resources" listings point to pages that contain links to many
of the pages listed in your search result. Teoma considers such pages to be
potentially useful resources on the topic you're researching. So far, the
"Resources" results seem more reliable than the "Refine" results.</p>

<h3>Teoma's index is too small</h3>

<p>Normally, I don't pick on new sites for having small indexes, but
Teoma's run by a crew that won't shut up about challenging Google. By most
estimates, Teoma's total database is about 200 million pages. That's
<em>ten percent</em> the size of Google's database. Even AltaVista has a
larger database than Teoma, and they're the search engine we all consider a
lumbering dinosaur.</p>
 
<p>Why is Teoma's database so small? Because there's no free submission and
it never spiders anything, that's why! (Now you know why I listed webmaster
issues first.) As long as Teoma concentrates all its growth on paid-for
listings, it's going to have a smaller, less representative database than
free-ranging sites like Google and <a
href="http://www.altavista.com/">AltaVista</a>.</p>
 
<h2 id="conclusions">Conclusions</h2>

<p>As of May 2002, Teoma.com is nowhere near being the engine that will
unseat Google. Its index is too small and stale and its ability to expand
with the web is limited. Teoma.com has made a lousy first impression.</p>

<p>First impressions are important, especially if you're waiting for that
Fourth Age of Search Engines to start. The engines of the first three ages
(<a href="http://www.yahoo.com/">Yahoo</a>, AltaVista, and Google, in that
order) weren't necessarily the first or largest sites of their ages. They
were the search sites that most impressed the Web's early adopters, The
Geeks. The Geeks are the ones who recommend search engines to the rest of
us, and search engines that don't get word of mouth from the Geeks have a
much steeper mountain to climb if they want to reach the top.</p>

<p>Teoma is not impressing the geeks. At best, they've said it looks
promising. At worst, they've already dismissed it as a wannabe. Even forums
that aren't devoted to search engines have taken pot shots at Teoma; witness
the dismissals from <a
href="http://www.dotcomscoop.com/article.php?sid=295">Dotcom Scoop</a>, <a
href="http://www.lextext.com/icann/2002/04/17.html#a287">icann.Blog</a>,
and <a
href="http://slashdot.org/article.pl?sid=02/04/01/0127218">Slashdot</a>.
Teoma is rapidly accumulating bad karma with the most influential members
of the Web audience.</p>

<p>Search sites that don't impress the geeks of the Web have two choices:
Live off a partnership with a major-leaguer that can drive them traffic (as
<a href="http://www.looksmart.com/">Looksmart</a> lives off its partnership
with <a href="http://www.msn.com/">MSN</a>) or die. Unfortunately for
Teoma, their only major partner is Ask Jeeves, who've already proven
they can do more harm to their own search engines (look at Direct Hit)
than they can to the competition's.</p>

<p>Unless Teoma finds a way to radically improve its relevancy, or manages
to partner with some better sites, it's always going to be an also-ran with
delusions of grandeur. I can't recommend investing time or money in Teoma,
whether you're a searcher or a webmaster.</p>

<p class="advert"><!--#exec cgi="../adverts/ad_2.pl"--></p>
<!--#exec cgi="../cgi/menu.pl"-->
<!--#exec cgi="../cgi/2002"-->
</BODY></HTML>

