Financial Times, March 4, 2008
Edited by Andy Ross
A new wave of AI technology, based on a collection of technologies that
includes natural language processing, image recognition, and expert systems,
may lead to intelligent machines. Thinking Machines founder Danny Hillis: "I
had some hope you could just put everything into some big neural network
that would just start to think, but it doesn't take long working in AI to
realise it's much more complex than that."
The basic building block
for this new technology movement is the semantic web, the brainchild of Sir
Tim Berners-Lee, who invented the present World Wide Web. Sir Tim imagined a
new web formed by linking the data contained inside the documents. That way
the data, not just the documents, would become accessible to machines.
This semantic web is the product of a set of core standards promoted by
the World Wide Web Consortium, the organisation that Sir Tim leads. Now some
supporters say enough pieces are in place to make the first semantic web
services a reality.
But there are some big obstacles. At the heart of
the problem is the need to make information on the web "understandable" to
machines, so that it can be extracted, processed and made useful. To make
this possible, machine-readable "tags" need to be attached to each piece of
data to describe what type of information it represents.
Attaching these tags to every piece of information on the web is a huge task. Without
new semantic services, there is no incentive to undertake the laborious work
of tagging data, but creating the services is pointless unless the data
exist in the first place. To try to overcome the problem, the semantic web
depends on a set of "ontologies", or dictionaries that help to create common
definitions that can be universally applied. These are designed to establish
a basic common level of understanding about language, so that machines can do useful work with the tagged data.
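The tagging-and-ontology idea can be sketched in a few lines: each fact becomes a machine-readable subject-predicate-object triple, and a shared ontology supplies common definitions for the predicate tags. This is a minimal illustration in plain Python; the predicate names and documents below are invented, not part of any real vocabulary.

```python
# Shared "ontology": agreed definitions for the predicate tags,
# so every party interprets them the same way. (Illustrative names.)
ONTOLOGY = {
    "hasAuthor": "the person who created the document",
    "hasDate":   "the publication date of the document",
}

# Tagged data: each fact is a (subject, predicate, object) triple.
triples = [
    ("doc1", "hasAuthor", "Tim Berners-Lee"),
    ("doc1", "hasDate",   "1989-03-12"),
    ("doc2", "hasAuthor", "Danny Hillis"),
]

def query(predicate, obj):
    """Return every subject linked to `obj` via `predicate`."""
    return [s for (s, p, o) in triples if p == predicate and o == obj]

print(query("hasAuthor", "Danny Hillis"))  # ['doc2']
```

Because the data, not just the page text, carries the tags, a machine can answer the query without parsing any prose.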
One such technology, first developed for use in AI, is natural
language processing. Even simple words or concepts can mean very different
things to different people, and their meaning changes depending on
circumstances. While the human mind can make the necessary adjustments,
computers that follow strict rules about language find it hard to grasp the
many context-specific meanings.
Companies trying to employ natural
language processing maintain that technical advances in recent years have at
last given it a level of practical application. By using software to "read"
text, services such as Powerset aim to add tags to data automatically. The
natural language approach also raises the possibility of new applications,
for example being able directly to answer questions posed by a user.
Powerset is using technology licensed from Parc to try to solve the problems
of natural language processing. The software is based on similar ideas to
those in quantum physics. A number of potential meanings for all the
elements in the text are allowed to co-exist as equally accurate during the
"reading", until the most likely answer is singled out at the end.
Combining this approach with other techniques of data analysis can lift the
accuracy level further. One method relies on predicting the meaning of a
word based on the probabilities of its proximity to other words in the text.
As words do not appear in random sequences, the fact that one word has been
used in a sentence increases the chance that a particular other word will
also turn up.
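The co-occurrence idea described above can be sketched as a bigram model: since words do not appear in random sequences, the probability that a word follows another can be estimated from counts of adjacent pairs. The corpus below is a toy example; real systems estimate these probabilities from very large text collections.

```python
from collections import Counter

# Toy corpus (real systems use far larger text collections).
corpus = "the semantic web links the data inside the documents".split()

# Count adjacent word pairs, and how often each word appears
# in a position where it has a successor.
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus[:-1])

def p_next(current, nxt):
    """Estimated probability that `nxt` immediately follows `current`."""
    if unigrams[current] == 0:
        return 0.0
    return bigrams[(current, nxt)] / unigrams[current]

# "the" occurs three times with a successor; "data" follows it once:
print(p_next("the", "data"))  # 0.333...
```

Seeing "the" thus raises the model's expectation of "data", "semantic", or "documents" relative to words never observed after it.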
Most expect the impact of the technology to be felt in
stages. The early advances are likely to be incremental improvements. Search
engines should return higher quality results, and services that rely on
personalisation should make better guesses about your preferences, while
targeted advertising systems should become more accurate.
The Charms of Wikipedia
The New York Review of Books
Volume 55, Number 4, March 20, 2008
Edited by Andy Ross
Wikipedia: The Missing Manual
By John Broughton
Wikipedia is just an incredible thing. It has 2.2 million articles and it's
very often the first hit in a Google search. It was constructed, in less
than eight years, by strangers who disagreed about all kinds of things but
who were drawn to a shared, not-for-profit purpose.
It worked and
grew because it tapped into the heretofore unmarshaled energies of the
uncredentialed. This was an effort to build something that made sense apart
from one's own opinion, something that helped the whole human cause roll forward.
Wikipedia was the point of convergence for the self-taught
and the expensively educated. All everyone knew was that the end product had
to make legible sense and sound encyclopedic. The need for the outcome of
all edits to fit together as readable, unemotional sentences muted natural
antagonisms. Wikipedians see vandalism as a problem, but a Diogenes-minded
observer would submit that Wikipedia would never have been the prodigious
success it has been without its demons.
Co-founder Jimmy "Jimbo"
Wales: "The main thing about Wikipedia is that it is fun and addictive."
John Broughton: "This Missing Manual helps you avoid beginners' blunders
and gets you sounding like a pro from your first edit."
SAP NetWeaver TREX
TREX search engine
TREX is a search engine in the SAP NetWeaver integrated technology platform
produced by SAP AG. The TREX engine is a standalone component that can be
used in a range of system environments but is used primarily as an integral
part of such SAP products as Enterprise Portal, Knowledge Warehouse, and
Business Intelligence (BI, formerly SAP Business Information Warehouse). In
SAP NetWeaver BI, the TREX engine powers the BI Accelerator, which is a
plug-in appliance for enhancing the performance of online analytical
processing. The name "TREX" stands for Text Retrieval and information
EXtraction, but it is not a registered trade mark of SAP and is not used as an official product name.
TREX supports various kinds of
text search, including exact search, boolean search, wildcard search,
linguistic search (grammatical variants are normalized for the index search)
and fuzzy search (input strings that differ by a few letters from an index
term are normalized for the index search). Result sets are ranked using term
frequency-inverse document frequency (tf-idf) weighting, and results can
include snippets with the search terms highlighted.
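The tf-idf weighting used to rank result sets can be shown in a few lines: a term scores highly in a document when it occurs often there (tf) but rarely across the collection (idf). This is a generic sketch of the standard formula, not SAP's implementation; the documents are invented.

```python
import math

# Toy document collection.
docs = [
    "the quick brown fox",
    "the lazy dog",
    "the quick dog barks",
]

def tf_idf(term, doc_index):
    """tf-idf weight of `term` in document `doc_index`:
    tf = term count in the document,
    idf = log(N / df), df = number of documents containing the term."""
    doc = docs[doc_index].split()
    tf = doc.count(term)
    df = sum(1 for d in docs if term in d.split())
    idf = math.log(len(docs) / df) if df else 0.0
    return tf * idf

# "quick" occurs once in doc 0 and appears in 2 of 3 documents:
print(tf_idf("quick", 0))  # 1 * log(3/2) ≈ 0.405
```

A word such as "the", present in every document, gets idf = log(1) = 0 and so never dominates the ranking.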
TREX also supports text mining and classification using a vector space model. Groups of
documents can be classified using query based classification, example based
classification, or a combination of these plus keyword management.
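Example-based classification in a vector space model can be sketched as follows: each document becomes a term-count vector, and a new document is assigned the class of the example it is most similar to under cosine similarity. This is a generic illustration of the technique, not TREX's internals; the example documents are invented.

```python
import math
from collections import Counter

# One labelled example document per class (invented data).
examples = {
    "finance": "bank statement account balance payment",
    "travel":  "flight hotel booking airport ticket",
}

def cosine(a, b):
    """Cosine similarity of two texts as term-count vectors."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(text):
    """Assign `text` to the class of its most similar example."""
    return max(examples, key=lambda label: cosine(text, examples[label]))

print(classify("payment posted to my bank account"))  # finance
```

Query-based classification works the other way round: each class is defined by a stored query, and a document joins every class whose query it matches.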
TREX supports structured data search not only for document metadata but also
for mass business data and data in SAP business objects. Indexes for
structured data are implemented compactly using data compression and the
data can be aggregated in linear time, to enable large volumes of data to be
processed entirely in memory.
Sir Tim: Google could be superseded
By Jonathan Richards
Times Online, March 12, 2008
Edited by Andy Ross
Google may eventually be displaced as the pre-eminent brand on the internet
by a company that harnesses the power of next-generation web technology,
says Tim Berners-Lee. The web of the future would allow any piece of
information — such as a photo or a bank statement — to be linked to any other.
Tim Berners-Lee said that in the same way, the "current craze"
for social networking sites would eventually be superseded by networks that
connected all types of things. The semantic web will enable direct
connectivity between much more low-level pieces of information — a written
street address and a map, for instance — which in turn will give rise to new kinds of application.
Tim Berners-Lee: "Using the semantic web, you can build
applications that are much more powerful than anything on the regular web.
Imagine if two completely separate things — your bank statements and your
calendar — spoke the same language and could share information with one
another. You could drag one on top of the other and a whole bunch of dots
would appear showing you when you spent your money."
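Once two datasets "speak the same language", the overlay Berners-Lee describes reduces to a simple join on the shared field. A minimal sketch, assuming both the bank statement and the calendar expose a common date field; all records below are invented.

```python
# Two independent datasets that share a "date" field (invented data).
statements = [
    {"date": "2008-03-01", "amount": -40.0, "payee": "Restaurant"},
    {"date": "2008-03-03", "amount": -12.5, "payee": "Bookshop"},
]
calendar = [
    {"date": "2008-03-01", "event": "Dinner with friends"},
    {"date": "2008-03-02", "event": "Dentist"},
]

# Index the calendar by date, then overlay each transaction on it.
events_by_date = {e["date"]: e["event"] for e in calendar}
overlay = [
    (s["date"], s["payee"], s["amount"],
     events_by_date.get(s["date"], "(no event)"))
    for s in statements
]
for row in overlay:
    print(*row)
```

The hard part the semantic web addresses is not the join itself but getting both sources to publish their dates in an agreed, machine-readable form in the first place.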
Berners-Lee invented the World Wide Web in 1989 while a fellow at CERN in Switzerland.
Asked about the type of application that the Google of the future would
develop, he said it would likely be a type of mega-mashup, where information
is taken from one place and made useful in another context using the web.
Tim Berners-Lee is now a director of the Web Science Research
Initiative, a collaborative project between the Massachusetts Institute of
Technology and the University of Southampton.