Saturday, May 31, 2008

Internet Theory

Internet: an electronic communications network that connects computer networks and organizational computer facilities around the world (webster.com)

All that binds the Internet together is communication protocols.

So what are we communicating?
Communication: the imparting or interchange of thoughts, opinions, or information by speech, writing, or signs. (dictionary.com)

So... everything. How is everything organized? That's quite an impossible question. Everyone seems to have their own particular view of the world, and the world itself is a moving target. So how are we supposed to organize our communications over the Internet if the state of the information is in constant flux?

We approximate using keywords, semantics, and behavioral models mapping some input set of information to some output set of information.

Why do we do this? So we can find the information we need, or find the places information needs to go.

In the early stages of the Web, keywords were it: we extracted surface data from the input and surface data from the output. Behavioral models focused on improving what was known about a particular set of inputs, and they were a huge step forward because they made the computer slightly smarter than the person using it. Even better, it's a model that gets better with use and is capable of self-correction, provided the inputs don't change radically. Eventually these techniques were applied to improve the analysis of the output data as well (see PageRank from Google).
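To make the difference concrete, here is a rough Python sketch of surface keyword matching with a simple behavioral adjustment layered on top. Everything in it is an assumption made for illustration: the function names, the sample documents, and the idea of using a click count as the behavioral signal are all hypothetical, not how any particular search engine works.

```python
from collections import Counter


def keyword_score(query, document):
    """Surface match only: count occurrences of the query words in the document."""
    query_words = set(query.lower().split())
    doc_words = Counter(document.lower().split())
    return sum(doc_words[word] for word in query_words)


def rank(query, documents, clicks=None):
    """Order document ids by keyword score plus a hypothetical click bonus.

    `clicks` maps (query, doc_id) to how often users picked that document
    for that query. This stands in for the "behavioral model".
    """
    clicks = clicks or {}
    scored = {
        doc_id: keyword_score(query, text) + clicks.get((query, doc_id), 0)
        for doc_id, text in documents.items()
    }
    return sorted(scored, key=lambda doc_id: scored[doc_id], reverse=True)


# Made-up sample data: the keyword alone cannot tell these two apart.
documents = {
    "a": "jaguar speed and habitat in the wild",
    "b": "jaguar car dealership prices",
}

keyword_only = rank("jaguar", documents)
with_behavior = rank("jaguar", documents, clicks={("jaguar", "b"): 3})
print(keyword_only, with_behavior)
```

With keywords alone the two documents tie. Once the (invented) click history is added, document "b" moves to the front, which is the "gets better with use" part in miniature.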

This works most of the time. However, problems often crop up when trying to deal with ambiguous or uncommon inputs. The answer to this is supposedly to apply semantics to both the input and the output set. This makes sense, as we are trying to detect the deeper meaning of what is provided.

What doesn't make sense is that we are imposing an artificially created sense of meaning. This becomes very obvious when looking at slang terminology or even regional dialects. Technically we could encode all of those dialects into our semantic model, but to what purpose? They are going to continuously change no matter what.

With semantic technology we teach the computer how to relate various chunks of text to one another. This is where semantics changes what the Internet is. Before, we were simply communicating documents, emails, web pages, etc. With semantics, we can communicate meaning. We can communicate what it is about a particular document, web page, or email that is useful to the end user without ever actually showing the original source of information.

Rather than input and output, we have a network of information related by its meaning rather than a set of information mapped into an organizational scheme by some model.
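One way to picture such a network is as a set of (subject, relation, object) links, where documents, emails, and concepts are all just nodes. The Python below is only a toy sketch under that assumption: the class name `MeaningNetwork`, the relation labels, and the sample items are all made up, and it sidesteps the hard part, which is deciding what the links should be in the first place.

```python
class MeaningNetwork:
    """A toy network of items linked by labeled relations (hypothetical design)."""

    def __init__(self):
        # Each link is stored as a (subject, relation, object) tuple.
        self.links = set()

    def relate(self, subject, relation, obj):
        """Record that `subject` is connected to `obj` by `relation`."""
        self.links.add((subject, relation, obj))

    def related(self, item):
        """Return (relation, other_item) pairs for every link touching `item`."""
        found = set()
        for subject, relation, obj in self.links:
            if subject == item:
                found.add((relation, obj))
            elif obj == item:
                found.add((relation, subject))
        return found


# Made-up sample data: a web page and an email that share a meaning.
net = MeaningNetwork()
net.relate("page-1", "mentions", "jaguar (animal)")
net.relate("email-7", "mentions", "jaguar (animal)")
net.relate("page-2", "mentions", "jaguar (car)")

# Everything connected to one meaning, regardless of the kind of source.
print(net.related("jaguar (animal)"))
```

Asking the network about "jaguar (animal)" returns the page and the email that share that meaning, and leaves out the page about the car, without any keyword lookup or ranking model in between.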

Of course, if creating semantics on the computer were easy, it would have already been done...