luckyrobot.com – Gerry Campbell

Semantics, Search and Big Honking Databases

December 5th, 2008

*

In 2003, when I had been heading up AOL Search for a while, we began to bang structured data into search results pages on a query-by-query basis. We used pre-formatted JavaScript templates that were selected based on keywords and filled from the freshest, most relevant data we could find.

In today’s parlance we had widgetized search. We called them widgets back then, too. The entire system had to be built from scratch, and it was both costly and time consuming to build.

It was worth it. For the user, this meant that a search for “Turkey Recipe” would pull up – amazingly – a turkey recipe right at the top of the page. A search for “Austin Powers” would take your zipcode plus Moviefone data and present you with reviews and showtimes with a single click to purchasing tickets at the theaters near you.

Users and press raved.

It really was a big deal. In fact, we had this “search programming” on 20% of all queries, across all known categories – sports, autos, entertainment… This was the “Google and More” plan that allowed AOL to go into a relationship with the future juggernaut with confidence – we were going to use real content and clever editors to build an experience well beyond what the bluelinks could provide. It was great being Google’s biggest partner as they entered the world of paid search.

This search editorial program wasn’t an accident. I, as well as several of my contemporaries, had been working on the opportunity for several years by then. (In fact, you could claim that this all began when Ram Shriram’s Junglee announced “the Internet Is the Database.”)

It grew out of a few different streams of activity that had been going on: at AltaVista the Web Search team had been doing a small version of this, and I was working on getting shopping data into results completely structured and pre-widgetized. Lycos had been trying things out as well. (AltaVista and AOL people may remember our friend Tim Robinson, who was a major visionary behind this.)

This program is exactly why I went to AOL. AOL had just merged with Time Warner and an entirely new range of content – content from a REAL media company – would be available for search enrichment. What an amazing opportunity. My biggest frustration at AltaVista was the lack of content resources to enrich the experience – to keep people from flowing straight through the system and out to the Web without adding real value.

To make an already too long story shorter, we (AOL) made a boatload of cash on search with Google and the TW merger cratered. I moved on to implement a similar vision within the bounded vertical of Finance and News. That left the world’s most evolved and enriched version of search – AOL’s Fullview – at the mercy of aggressive cost cutting, and Fullview was determined to be off strategy. No sour grapes at all. Just a missed opportunity.

Since then, former colleague Jason Calacanis has gone on to create Mahalo under a similar premise, Wikipedia has evolved into an amazing resource, and Google is still pumping out bluelinks, plus just a little more.

Anyway, the title promises that this post is about semantics, and it is. This is just necessary background.

The opportunity is still there, whether in search or in the online world at large, to create a virtual fabric of content that can be experienced (browsed and searched) – and even more importantly assembled on-the-fly – based on its relatedness.

What do I mean by that? Whether in search or in socially-relevant widgets or in feed aggregation, we need to link and connect content based on its MEANING and not keyword similarity. Specifically: when I am looking at a Microsoft earnings report and I see related links to “gates” I want them to be about the person, not the thing. The same applies to java, apple and about six million other things. We live in an ambiguous world.

For this to become reality, two things need to happen:

  1. Content needs to be accessible in a format that is native to its type of data. For example, the fundamental information about Microsoft (P/E, market cap, etc) will be in fields, just like a spreadsheet. MSFT’s price history is going to be formatted into a huge list of Bid, Ask and transaction prices (gross generalization). News about the latest earnings will be in text blobs. You can’t index this data with a traditional crawler, and it can’t all be mushed into a single format without losing the unique value.
  2. There needs to be a consistent way, across formats of data, to call out and associate similar items. MSFT is related to Microsoft is related to Steve Ballmer is related to Bill Gates. With this type of linking, we can then understand the interrelatedness of things. In its simplest form, this is semantics.
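The two requirements above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Item` class, the `ALIASES` and `RELATED` tables, and the ID scheme (`org:microsoft` and so on) are all hypothetical and hand-written here, not anything from AOL, Freebase or OpenCalais. Each item keeps its payload in its native shape, and the only thing shared across formats is a set of canonical entity IDs.

```python
from dataclasses import dataclass, field

# Hypothetical alias table: surface forms mapped to one canonical entity ID.
ALIASES = {
    "msft": "org:microsoft",
    "microsoft": "org:microsoft",
    "steve ballmer": "person:steve_ballmer",
    "bill gates": "person:bill_gates",
}

# Hypothetical relations between canonical entities.
RELATED = {
    "org:microsoft": {"person:steve_ballmer", "person:bill_gates"},
}


@dataclass
class Item:
    kind: str        # "fields", "series" or "text"
    payload: object  # left in its native shape, never flattened
    entities: set = field(default_factory=set)


def tag(item, mentions):
    """Attach canonical IDs for every mention found in the alias table."""
    for mention in mentions:
        entity_id = ALIASES.get(mention.lower())
        if entity_id:
            item.entities.add(entity_id)
    return item


def related_items(entity_id, items):
    """Return items tagged with the entity or with anything related to it."""
    wanted = {entity_id} | RELATED.get(entity_id, set())
    return [item for item in items if item.entities & wanted]


# Placeholder payloads: no real figures are implied.
fundamentals = tag(Item("fields", {"pe": None, "market_cap": None}), ["MSFT"])
prices = tag(Item("series", []), ["MSFT"])
news = tag(Item("text", "Ballmer comments on earnings"), ["Steve Ballmer"])
unrelated = tag(Item("text", "Garden gates on sale"), ["gates"])

hits = related_items("org:microsoft", [fundamentals, prices, news, unrelated])
```

The bare word “gates” is deliberately absent from the alias table, so the garden item picks up no entity and stays out of the Microsoft results. Resolving that word properly is the ambiguity problem itself, and a lookup table does not solve it.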

Now, to the point of this post.

If you look above, there are two things that need to happen for the content/information experience on the Web to be dramatically improved: we need content in a universally accessible repository (or repositories) and we need a technique for connecting it all together. Also, the web is evolving and we now have the challenge of making that all socially aware and realtime.

Let’s take those two chunks separately.

Big Honking Database of Content – In the last 18 months we have seen an amazing set of resources applied against this.

  • Freebase is the first company of note. It promises to be a huge content stash in the sky and is funded to do it. Very very promising.
  • Amazon released public datasets this week. So now if you want economics and scientific data it’s there. And you can make your own data available via AWS if you allow it to be freely accessible. This is a HUGE step.
  • Fluidinfo and Terry Jones. It seems fashionable to say “I know Terry Jones” these days. Here’s why: Terry has quite possibly created the database to handle structured, semistructured, tagged and attributed, social and realtime data. In its native format. That’s why pundits like Tim O’Reilly and Robert Scoble are openly excited about Terry and Fluidinfo. Terry and I have spent many hours together contemplating this. (see, I know Terry Jones too!)

Semantic technologies – There is more work to be done here, but it’s on the way. First of all, to fit into the broad model I have laid out, the semantic tagging technology needs to be at the tool, or platform level. This rules out most of the activity in the Semantic space.

Here’s what I mean: If you want to create, say, a music fan-site application that pulls together artist bios, discography, tour reviews from the Web, user generated content and the ability to purchase both tickets and CDs, you would assemble the content and then you’d need a semantic tool to tag and generate the connections between bands, releases, tour dates and purchasing.

You can’t do that with a semantic application that only provides related links or delivers search results on only the information in its own index/database. You need a tool you can run on all of the sources to generate consistent metadata. Not that Hakia, Twine, Powerset (now MSFT) and Zemanta aren’t useful, but they’re individual applications on top of a semantic engine. Builders need access to the engine itself in order to build a wide range of products and open up the power of the technology.
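To make the “tool, not application” point concrete, here is a minimal sketch in which one tagger is run over every source so that all of them end up with the same tags. Everything in it is an assumption made for illustration: the `VOCAB` table, the band and album names, and the naive substring matching stand in for whatever a real semantic engine would do.

```python
from collections import defaultdict

# Hypothetical shared vocabulary: surface form -> (type, canonical tag).
VOCAB = {
    "the examples": ("band", "band:the_examples"),
    "first album": ("release", "release:first_album"),
}


def tag_text(text):
    """Return the sorted canonical tags whose surface form appears in the text."""
    lowered = text.lower()
    return sorted(tag for form, (_type, tag) in VOCAB.items() if form in lowered)


def build_index(sources):
    """Run the same tagger over every source and index documents by tag."""
    index = defaultdict(list)
    for source_name, docs in sources.items():
        for doc_id, text in docs.items():
            for tag in tag_text(text):
                index[tag].append((source_name, doc_id))
    return index


# Made-up content from three different kinds of source.
sources = {
    "bios": {"bio-1": "The Examples formed in a garage."},
    "reviews": {"rev-1": "The Examples played First Album in full."},
    "store": {"sku-1": "First Album (CD)"},
}
index = build_index(sources)
```

Because the bio, the review and the store listing were tagged by the same function, the review now connects the band to the release, and the release to something you can buy. A related-links application that tags only its own index cannot produce that cross-source join.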

Unsurprisingly, I am highly in favor of the OpenCalais approach by Thomson Reuters.

The best part is that v 4.0 of Calais will be releasing the “Linked Data Cloud.” It goes after this in a truly powerful way, providing users not only the ability to get their own tags, but to see how those tags relate conceptually to other things in the OpenCalais data model. Rocket Science.

But this post isn’t about OpenCalais either.

This post is about finding opportunity in the world that is evolving.

If I haven’t lost you yet, and you can agree that content+semantic tagging is useful, you can see that there are some problems and opportunities.

  • Completeness of data – DMOZ (aka The Open Directory) used to be under my domain at AOL. I never could invest in it because the community management model was flawed: communities only want to curate the things they’re interested in (thanks to Andrew Cohen for the analogy). Investment in the infrastructure was only going to feed the weediness and patchiness of the garden. Freebase is showing signs of content spottiness and AWS will too. It’s an issue of primary importance. So I think there is an emerging opportunity to provide curation on top of these open services. Like Red Hat to Linux.
  • Quality of data – Just as with completeness, if anyone can publish to the datasets, there’s a risk of problematic information. This is where branding, and the associated quality control, comes in. The opportunity is for companies who create content to establish and promote their brand as a sign of quality. Quality wins out over crap time and time again.
  • Universality of tags – Zemanta is admirably trying to get a tagging standard adopted across semantic engines. Whether by agreement (standard) or market leadership (default) the emerging content world will benefit from consistent tags to operate on. More things will be “connectable.”
  • Applications Applications Applications! – This is where I get excited. Really excited. After 15+ years of helping people find what they’re looking for using technology, I scan the horizon and see the building blocks to finally get it done. We (the tech community at-large) now have raw content feeds, open and free databases, functionality APIs, open source platforms and development methodologies that free up our minds to think about how users really want their content. We are right on the edge of being able to build what we can imagine – quickly and cheaply. We’ve got the tools to measure it and the social context to present it in with personal relevance.

Jason Calacanis recently posted about the responsibility we all have to push forward through this downturn with 120% effort… The part I found specifically valuable was where he calls on entrepreneurs and those with the resources to get out there and start something. I can agree with that.

So, the tools are there and hopefully I’ve given you at least one way to think about it… I am definitely making my bets on where this is going and will probably join in on the app-building side soon.

And yes, this ties in with the Splintering of Media. I’ll get to that soon…

(* Photo “Sound and Vision” copyright Rogiro from Flickr)

Tags: Uncategorized · search

Viewing 12 Comments

    • ^
    • v
    I read your post and was amazed at how close our work is to what you describe here as semantic technology. We have developed a technology for semantic search and text analysis which leverages Wikipedia knowledge to derive concept meanings and relationships. By now Wikipedia has grown into a massive, up-to-date database of such relationships. We would like to show our technology to you, as it implements nearly everything that you described in your post: disambiguation, semantic tagging, semantic similarity to find related content/concepts, and more. Could you please email me at maxim@grinev.net and I will reply with more details. Thank you.
    • ^
    • v
    How about the burgeoning cloud of RDF based Linked Data?

    Links:
    1. http://virtuoso.openlinksw.com/images/dbpedia-l...
    2. http://esw.w3.org/topic/SweoIG/TaskForces/Commu...
    3. http://dbpedia.org/resource/Linked_Data - cross linked with Freebase and many other structured data spaces
    • ^
    • v
    With the 2010 Census on the horizon, I'm thinking about applying for a job with the Census just to see if I can help make sense of it all. Wouldn't it be great if the Census data actually provided us with data that all Americans could actually benefit from? Your discussion of tagging data is extremely important in this regard.

    As it relates to tagging words, isn't this just XML? And don't we also need to tag whole phrases and not just words?
    • ^
    • v
    Gerry's right about 'semantics' being an overly used expression.

    Terry's argument regarding the objective definition of meaning refers to the term's traditional philosophical usage, whereas Gerry and direwolf are talking about contextual ambiguity.

    I believe pursuing semantics (philosophical) in computing is a futile endeavor until machines are able to feel the wind on their faces.

    Resolving contextual ambiguity, however, is a much more attainable and in many ways more useful goal. How many times have you Googled something only to be returned hundreds of pages with the 'other' use of your key word?

    Whether progress is made via changes in representation, better algorithms or even some sort of stochastic analysis is largely irrelevant (to me).

    The key point is that whoever makes progress in this space will, as the VCs like to say, take away a lot of pain.
    • ^
    • v
    Does the nature of the task (and this discussion) change if we talk about it as codifying *relationships*? That's really where I am going.

    I am not sure it makes any difference at all WHAT the thing is, it's more about the interrelatedness of one word to other words. In that case, the ambiguity is represented in a set of linkages that are more or less exclusive.

    For example - the linkages to gates the thing vs gates the person would be different. Even in the case of that double entendre, the two sets could be statistically separable.
    • ^
    • v
    and can't we use co-occurrence, etc to establish that relatedness...
    • ^
    • v
    Gerry, I would say the goal is to *infer* context rather than codify it.

    For example, in a document that is tagged as being about MSFT, references to Gates are statistically more likely to refer to the person than to the object. So, instead of tagging (codifying) each individual reference to Gates in the document, context can be inferred from one single tag, and hence the ambiguity resolved.
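As a rough sketch of that inference in Python: the `SENSES` table below is hypothetical and written by hand for illustration. A real system would presumably learn such associations from co-occurrence statistics rather than list them. The function picks whichever sense of an ambiguous word shares the most context with the document's tags.

```python
# Hypothetical sense inventory: each sense of an ambiguous word lists context
# tags that tend to co-occur with it. Hand-written here, not learned from data.
SENSES = {
    "gates": {
        "person:bill_gates": {"MSFT", "software", "earnings"},
        "thing:gate": {"fence", "garden", "hardware"},
    },
}


def infer_sense(word, doc_tags):
    """Pick the sense sharing the most tags with the document; None if no overlap."""
    candidates = SENSES.get(word.lower(), {})
    tags = set(doc_tags)
    best, best_overlap = None, 0
    for sense, context in candidates.items():
        overlap = len(context & tags)
        if overlap > best_overlap:
            best, best_overlap = sense, overlap
    return best


in_earnings_report = infer_sense("Gates", ["MSFT"])
in_garden_catalog = infer_sense("gates", ["garden", "fence"])
no_context = infer_sense("gates", ["weather"])
```

A single document-level tag is enough to separate the two senses here, and when there is no overlap at all the function returns `None` rather than guessing. That matches the spirit of the thread: accept some error and make the uncertainty visible.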
    • ^
    • v
    Great post Gerry and I'm dealing w/some of these issues today in the video space. Before I forget, you should list DBpedia as one of the publicly accessible databases. It is an RDF-normalized version of Wikipedia that now enables programmatic use of Wikipedia information.

    Combining your comments w/some of what Terry said below about our ambiguous world, I'm reminded of a magazine story a long time ago describing Microsoft's poor behavior which was titled "The Gates of Hell". Now if you consider the use of "gates" here, it's a double entendre. It's both a play on the doorway and on the person. Not very clean semantics ;)

    Great points you're making and now the business questions have to be satisfied as well as the financial incentives for the participants (content creators or curators) to do all of this work. When I think of all the work that sites have done in the past 5 yrs for SEO purposes, it's been all about findability. Making themselves more clearly indexed by the search engines and in turn more findable by people using search engines.

    Yesterday I spoke w/a stealth startup that is facilitating companies' ability to more easily make their data accessible to apps but w/business rules and metering services layered into their platform. I love that because companies have data that they s/b making more easily accessible for various uses, but also need to have an ability to monetize that and control to whom and how it is made available. Enabling easy access to it, but still having a spigot to open, filter and close its access to apps seems like the balance needed to open things up faster.

    In a world where data access is opened up because the right business rules can be put into place, and findability continues to be important to companies for the content, products and services they offer, the justification for properly marking up their content does exist. The challenge, which as you're pointing out is slowly being remedied, is a "chicken and egg" one. Until the apps exist that make use of this marked up content in rich and useful ways, publishers and merchants do not want to go through the trouble of doing all of this work. On the other side of the house, app developers feel stifled from doing a lot of work since they don't have easy access to rich content sources for cool apps. An interesting company that is creating some good justifications for doing the mark-up work is Dapper in the semantic advertising space.

    It is getting easier, and I know that in my company's case, we're increasingly finding open access data sources to incorporate into our processes and apps, but there's still a long way to go. One of the best data sources we make use of for company and product information, does not make their stuff accessible in an RDF or even XML formatted way. Hence, we have to convert their data and upload it into our databases to then make it useful. It would have been nicer if they played a more open game, but they're small and have a hard time justifying the work to do this.

    I'm a lil' all over the place in this comment, but in essence, I agree w/Terry's comments, which echo aspects of yours. Making a light weight technology to make content easily accessible at an organizational level, not so much in a big honking database, is really the way to go. It reminds me of the late '99/'00 time frame when RSS was still very early in its use. The company I was working with had chosen to develop syndication tools for the ICE (Information and Content Exchange) syndication protocol which Vignette was supporting. It was more secure and reliable than RSS, and for the pro content providers of the time (e.g. Reuters), this was very important as they warmed up to distributing their content to online publishers. ICE was a bulky technology, however. RSS by contrast was light weight and was easy for almost anyone to use. While it didn't offer much in the way of security, it turns out that this didn't matter. As well, its growth was secured the same way that eBay's and YouTube's was, by individuals w/a need (i.e. bloggers and their readers). After RSS garnered so much attention, the professional content providers realized that they needed to make their content available via RSS if they were to remain relevant (as has happened w/pro merchants on eBay and pro video content providers on YouTube). I believe the same could happen w/a light weight technology that helps content providers make their content available quickly and easily in these marked up ways.

    However, where publishers and merchants won't do the work at all, then a big honking database (a la Freebase) could take the work out of their hands and enable someone else to benefit from the value of aggregating and structuring all of this information for meaningful uses. The big search engines have an obvious advantage here since they could theoretically start doing work to make their aggregated info available in interesting formats. Microsoft's acquisition of Powerset may have aspects of that to come, but maybe not.

    Anyway, I'll concur w/your thesis that immense biz opportunities do exist to those who figure out how to pull all of this together.
    • ^
    • v
    Hi again Gerry

    I think you probably know most of my thoughts on all this, but I'll summarize quickly.

    I guess I hate semantics. In fact I don't think there's any such thing as meaning, or understanding. Those are just words. We poor humans take comfort from imagining that they correspond to some underlying "thing" (if I were Husserl or Heidegger, I'd use another word than "thing"), but they do not. As you might put it, we live in an ambiguous world. Deeply ambiguous. I also don't think you can ever answer any question that starts with the word "Why", but that's another (closely related) subject.

    Ahem.

    What I do believe is that if you want to try to build applications that give the illusion of semantics :-) then you should build them on the most flexible architecture possible. Because as your application becomes increasingly heavily used, or as you increasingly realize that you didn't really know what you were doing in the first place, you're really just starting to plumb the depths of ambiguity and if your underlying architecture runs out of flexibility, then you have a problem.

    I also think the base architecture has to be dead simple. Even if it's dead simple it's going to be very hard to build it properly and have it scale. Freebase and SimpleDB didn't come into existence overnight. SimpleDB is certainly not the answer to any of what you're imagining. Freebase is much more interesting. Fluidinfo's FluidDB has some pretty striking departures from Freebase (which I'll save for now). It's safe to say that we're going after different regions of the same space. And it's a very big space. One difference is that Freebase are really into big honking datasets, whereas that's not my initial interest at all.

    I would also add Google's BigTable to the mix, as well as Neo4j (http://neo4j.org/) - let me know if you want an intro. Again, there are differences in emphasis, again within a big honkin' space of possibilities and value (except in the case of my company, which is apparently unfundable :-)).

    Thanks for taking all the time to write this up. Your experience is bigger than N for almost all values of N. I hope Fluidinfo can move quickly enough on the tech side that we'll find a way to do something together before you're off into some other irresistible project.

    Terry
    • ^
    • v
    I love a vigorous debate, but actually there isn't one here. I pretty much agree with you.

    We choose to state it differently (I think).

    The word "semantics" has been used to cover so many things that it may have lost some precision in its definition. I may be guilty of using a broadened version here.

    I think of it this way: Google solved one of the vexing problems of search: Out of millions of results, which one comes first in the rankings? They created PageRank to approximate what *most humans* would find to be the best result. That was a hard problem and they stepped up to OWN the solution, even creating the "I'm Feeling Lucky" to emphasize that they had a solution to the problem.

    But it's still wrong a portion of the time for a large percentage of users... So they leave the other 999,999,999 results just in case.

    That simple approximation, and the willingness to accept some error while committing to improvement has revolutionized search.

    The exact thing applies to understanding meaning. If we can accept error, and use words like "semantics" or otherwise to describe what we're trying to do, we can make progress.

    If you want to coin a new term for this I'll gladly use it and give you attribution. ;-)

    What we seem to agree on is that there's no room for purity and absolutism here...
    • ^
    • v
    Hi again Gerry

    I wasn't being very nuanced in my original comments. That's partly due to lack of time, partly due to liking a more colorful debate. So here are a few more thoughts, and some pointers.

    Consider Artificial Intelligence and its pursuit of intelligence. We once thought it took real intelligence to play chess (for example). But as we got better and better at engineering, and we thought up smarter (but completely mechanical and non-mysterious) algorithms, we moved the goalposts. I.e., we decided that actually you didn't need to be "intelligent" to play chess after all.

    I don't believe that "intelligence" corresponds to any "thing" either, just like I think "meaning" and "understanding" are also just words. What I do believe however is in engineering and tool-building. We're primates, and primates are pretty good tool builders. So I often suggest to people that they spend less time (and investment monies :-)) on chasing abstract words and more time on building tools.

    The lesson of AI seems clear. If your tools are good enough, you can give the illusion of intelligence up to and beyond (i.e., beyond grandmaster) where it matters in any practical sense. The computer plays chess so well that you might as well say it's intelligent, or not - it just doesn't matter anymore.

    And I believe the same is true of semantics, and going after meaning and understanding. Those things can perfectly well not really exist while at the same time we can practically achieve them (i.e., the convenient and practical illusion, as with intelligence for the purposes of chess playing) by just focusing on engineering and tools.

    Make sense?

    From that POV, I argue that huge strides can be made by improving representation. If you get representation right, things that look like problems can simply go away. If you get the representation right, you may not even need a clever algorithm. Can you do an end-run around Google's armies of PhDs by changing representation? I.e., don't challenge them on the algorithm front, where you're bound to lose, but change the ground under them. You won't be surprised to hear that I think the answer is yes. I'm not talking about "beating" Google as a company, but of taking search - and how we work with information in general - to a new level.

    I wrote about this at some length, back before it was so fashionable to be me :-)

    The main posting is http://www.fluidinfo.com/terry/2007/03/19/why-d...

    And there are several others, including some that give very simple examples of why representation is so important, at http://www.fluidinfo.com/terry/category/represe...

    In summary, I don't think the words matter much. I think we can achieve amazing results (things that look like real intelligence, real understanding, that somehow capture meaning, etc) simply by focusing on engineering. My best bet about where to focus is on representation. What are the implications of the various new ways of representing information that we're exploring? I've been pondering that for over a decade! :-) My own bet, via Fluidinfo, definitely has some strong advantages and some strong weaknesses. It's a tradeoff, like so many things in computer science. Other approaches represent different tradeoffs. It's far from clear what will "win". But as I said in my earlier comment, it's a vast space we're starting to explore, and, I like to imagine, there's plenty of value to go around.

    I hope that's a clearer and a more useful answer.
    • ^
    • v
    Oh Gerry, I just flashed through the above before heading off to see my kids. Now I'm going to have to read it at length!

    I'll comment properly tonight. You make me smile & I hope vice-versa.
 
