luckyrobot.com

Search, context and the user revolution

February 22nd, 2008 · 1 Comment

I had the opportunity to give a keynote this week at the annual Fast Search and Transfer (soon to be a part of Microsoft) partner event. The overall topic was “The User Revolution” and my specific piece focused on unlocking the next generation of user satisfaction through the application of semantic context to the search equation.

My underlying premise is that what we currently know as search is neither powerful enough nor well suited to solve the problem we all face every day: out of the 6,803,098 documents relevant to your personal quest, which ones matter? Which ones contain key information that will allow you to find not only WHAT you want but WHY it’s important to your need?

Going beyond that, how can you find out what’s related, what’s important that you’re NOT looking for but need to know? These are the things that lead to true insight, and as an industry we haven’t improved on this in over 10 years. That’s a major indictment of all of us proud folks who have been building search products on the web for 10+ years.

The answer is simple but not easy: we need technologies that empower users to sift through the information and actually traverse the concepts (structure) that underlie the vast amount of free text (unstructured).

That’s where text analytics, semantic analysis and the whole realm of metadata and tagging come in.

There’s a logjam, though. Metadata is not easy to produce, and it’s never consistent. Frequently, it’s not even part of the CMS workflow.

The problem has been vexing. Until now.

Bang your content against the Calais API and it comes back tagged in W3C-standard RDF. Use it for SEO, use it for navigation. Use it to bring in related feeds to retain users on your site (pageviews) and to target your ads (ad revenue). It’s all great.
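As a rough illustration of the “related feeds” idea, here is a minimal Python sketch that ranks other documents by how many entity tags they share with the current one. The tag sets are hypothetical stand-ins for what a semantic tagging service such as Calais might return; the sketch does not call the Calais API or parse its RDF, and Jaccard similarity is simply one reasonable choice of overlap measure, not necessarily what any real site uses.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Share of tags two documents have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def related_documents(
    doc_id: str, tags: dict[str, set[str]], top_n: int = 3
) -> list[tuple[str, float]]:
    """Rank every other document by tag overlap with doc_id, best first."""
    current = tags[doc_id]
    scored = [
        (other, jaccard(current, other_tags))
        for other, other_tags in tags.items()
        if other != doc_id
    ]
    # Drop documents with no shared tags, then sort by descending score
    # (ties broken by id so the output is deterministic).
    scored = [(d, s) for d, s in scored if s > 0]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


# Hypothetical entity tags per article (illustrative only, not real Calais output).
tags = {
    "avastin-approval": {"FDA", "Avastin", "Breast Cancer", "Genentech"},
    "genentech-earnings": {"Genentech", "Avastin", "Roche"},
    "fda-budget": {"FDA", "Congress"},
    "search-keynote": {"Fast Search and Transfer", "Microsoft"},
}

print(related_documents("avastin-approval", tags))
# [('genentech-earnings', 0.4), ('fda-budget', 0.2)]
```

The same tag index could drive navigation (browse by entity) or ad targeting; only the scoring step would change.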

I have to give a huge shout-out to Tom Tague, who is the mastermind and driver, as well as to Barak Pridor and the entire Reuters/ClearForest team working on this.

So, if you’re interested, here’s the FastForward08 presentation, and here’s a link to my interview afterwards. I will put up the link to the full length video as soon as it’s available.

Tags: Calais · Speaking

1 response so far

  • 1 VikrantGoswami // Mar 3, 2008 at 8:46 am

    Well, this is surely a defining step in the right direction for tackling the challenge of unstructured content.

    However, I don’t think that this is enough! Granted, you have extracted the relevant tags from a particular article, but where does that take you, and how different is that from the current situation? I guess not too far, as the context is still missing, though you have defined content anchor points (which in itself is no mean task, and one that OpenCalais facilitates).

    As you have emphasised, “context” is the key to the discovery of content. You need to have the environment defined as ontologies with relationships and rules on top of them. Once you do this, extraction of the context may be possible with the help of the extracted entities.

    For example, let’s consider sentiment analysis using OpenCalais and some domain ontologies.

    Take a news item such as “FDA Approves Avastin For Breast Cancer” (http://www.orglex.com/permalink/0ba63fbdfdf3985/fda-approves-avastin-for-breast-cancer/).

    Using OpenCalais, you can extract entities such as FDA, Avastin, Breast Cancer and Genentech Inc from the text (and probably the fact that Avastin is owned by Genentech). Now you need the domain ontology to make sense of them.

    In the domain ontology, “FDA” is “defined” (a relationship) as the “administrative” “organization” for the “pharma” industry. You could define a simple rule (in SWRL, the Semantic Web Rule Language) stating that any drug approval from the FDA implies a “positive” sentiment for the parent organization.

    This is surely a much more actionable deduction. Moreover, you could calibrate the strength of the derived “positive” sentiment based on other data, such as market size, company annual turnover, the competitive drug landscape or gaps, and other relevant structured data. This structured data could be brought in as required, using ontology models mapped to the existing databases (one such vendor is Metatomix). This is possible only if you have a clearly defined domain ontology.

    Thus a combination of OpenCalais, domain ontologies and rules on top can provide more potent capabilities, which can be used by algorithmic trading systems or for better search for humans.

    Capturing the domain as ontologies and rules is another critical step we need to take to move closer to making sense of unstructured content and to bringing the unstructured and structured worlds together.
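The commenter’s FDA example can be made concrete with a toy forward-chaining rule over subject-predicate-object triples, the same shape as RDF statements. This Python sketch only mimics what a SWRL rule would express; it is not SWRL and not OpenCalais output. The predicate names (`approved_by`, `owned_by`, `regulates`, `has_sentiment`), the sample triples, and the `sentiment_strength` calibration heuristic are all hypothetical illustrations.

```python
from typing import Callable

# Facts as (subject, predicate, object) triples. All names are hypothetical.
Triple = tuple[str, str, str]
Rule = Callable[[set[Triple]], set[Triple]]

facts: set[Triple] = {
    ("FDA", "regulates", "pharma"),        # from the domain ontology
    ("Avastin", "approved_by", "FDA"),     # extracted from the news item
    ("Avastin", "owned_by", "Genentech"),  # extracted from the news item
}


def approval_rule(kb: set[Triple]) -> set[Triple]:
    """If a drug is approved by a pharma regulator, infer positive sentiment
    for the drug's owner. Roughly the SWRL-style rule:
    approved_by(?d, ?r) ^ regulates(?r, pharma) ^ owned_by(?d, ?o)
        -> has_sentiment(?o, positive)
    """
    inferred: set[Triple] = set()
    for drug, pred, regulator in kb:
        if pred != "approved_by" or (regulator, "regulates", "pharma") not in kb:
            continue
        for subj, pred2, owner in kb:
            if subj == drug and pred2 == "owned_by":
                inferred.add((owner, "has_sentiment", "positive"))
    return inferred


def forward_chain(kb: set[Triple], rules: list[Rule]) -> set[Triple]:
    """Apply every rule repeatedly until no new triples appear."""
    kb = set(kb)  # work on a copy so the input facts are not mutated
    while True:
        new = set().union(*(rule(kb) for rule in rules)) - kb
        if not new:
            return kb
        kb |= new


def sentiment_strength(
    market_size: float, annual_revenue: float, competing_drugs: int
) -> float:
    """Hypothetical heuristic: weight the inferred sentiment by the drug's
    market size relative to company revenue, diluted by competing drugs,
    and capped at 1.0."""
    relative_impact = market_size / annual_revenue
    return min(1.0, relative_impact / (1 + competing_drugs))


kb = forward_chain(facts, [approval_rule])
print(("Genentech", "has_sentiment", "positive") in kb)  # True
print(sentiment_strength(5e9, 10e9, 1))  # 0.25
```

A real system would keep the ontology and rules in OWL/SWRL and run them through a reasoner; the point here is only that entity extraction supplies the facts, while the ontology and rules supply the context that turns them into a deduction.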
