Google Translator: The Universal Language
At the end of the 19th century, L. L. Zamenhof proposed Esperanto; it was intended as a global language to be spoken and understood by everyone. The inventor was hoping that a common language could resolve global problems that lead to conflict. Esperanto as a planned language might have had some success, but today, English is much more universal. Thirty countries have it as an official language, and in many other countries it is taught in school and understood fairly well. The internet can be expected to further increase the adoption of English.
Still, many people can’t speak English. The collected, shared knowledge that makes up the web is therefore only partly accessible to them. The reverse, of course, is true as well. When you surf the web, you will sometimes come across languages and characters you don’t understand – like Chinese, Arabic, Korean, French, German, Italian, Spanish, or Japanese. If you were able to read these languages fluently, those sites wouldn’t be a dead end for you. You would discover a wealth of knowledge, and more importantly, opinions. If you’re a US citizen, how many Arabic, German or French sources do you read to get a good understanding of how the world sees the US? How many blogs do you read in foreign languages? Probably not many, unless you’re fluent in those languages.
At the recent webcast of the Google Factory Tour, researcher Franz Och presented the current state of the Google Machine Translation Systems. He compared translations from the current Google translator with those from the system under development at the Google Research Lab. The results were highly impressive. An Arabic sentence that the current translator renders as the nonsensical “Alpine white new presence tape registered for coffee confirms Laden” is translated in the Research Labs as “The White House Confirmed the Existence of a New Bin Laden Tape.”
At I.B.M., That Google Thing Is So Yesterday
Suddenly, the computer world is interesting again. The last three months of 2004 brought more innovation, faster, than users have seen in years. The recent flow of products and services differs from those of previous hotly competitive eras in two ways. The most attractive offerings are free, and they are concentrated in the newly sexy field of "search."
Google, current heavyweight among systems for searching the Internet, has not let up from its pattern of introducing features and products every few weeks. Apart from its celebrated plan to index the contents of several university libraries, Google has recently released "beta" (trial) versions of Google Scholar, which returns abstracts of academic papers and shows how often they are cited by other scholars, and Google Suggest, a weirdly intriguing feature that tries to guess the object of your search after you have typed only a letter or two. Give it "po" and it will show shortcuts to poetry, Pokémon, post office, and other popular searches. (If you stop after "p" it will suggest "Paris Hilton.") In practice, this is more useful than it sounds.
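The kind of prefix completion described above can be sketched in a few lines. This is only an illustration, not Google's implementation: the function name `suggest`, the sample queries, and their popularity counts are all made up for the example.

```python
def suggest(prefix, query_counts, limit=3):
    """Return the most popular stored queries that start with `prefix`.

    `query_counts` maps a query string to a popularity count.
    The counts used below are invented for illustration.
    """
    prefix = prefix.lower()
    matches = [(q, n) for q, n in query_counts.items() if q.startswith(prefix)]
    # Most popular first; ties broken alphabetically for a stable order.
    matches.sort(key=lambda pair: (-pair[1], pair[0]))
    return [q for q, _ in matches[:limit]]


# Hypothetical popularity data, not real search statistics.
sample_counts = {
    "poetry": 50,
    "pokemon": 80,
    "post office": 65,
    "paris hilton": 90,
    "python": 40,
}

print(suggest("po", sample_counts))
print(suggest("p", sample_counts, limit=1))
```

A production system would presumably use a prefix tree or a precomputed index rather than scanning every query, but the ranking idea is the same.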
Microsoft, heavyweight of the rest of computerdom, has scrambled to catch up with search innovations from Google and others. On Dec. 10, a company official made a shocking disclosure. For years Microsoft had emphasized the importance of "WinFS," a fundamentally new file system that would make it much easier for users to search and manage information on their own computers. Last summer, the company said that WinFS would not be ready in time for inclusion with its next version of Windows, called Longhorn. The latest news was that WinFS would not be ready even for the release after that, which pushed its likely delivery at least five years into the future. This seemed to put Microsoft entirely out of the running in desktop search. But within three days, it had released a beta version of its new desktop search utility, which it had previously said would not be available for months.
Meanwhile, a flurry of mergers, announcements and deals from smaller players produced a dazzling variety of new search possibilities. Early this month Yahoo said it would use the excellent indexing program X1 as the basis for its own desktop search system, which it would distribute free to its users. The search company Autonomy, which has specialized in indexing corporate data, also got into the new competition, as did Ask Jeeves, EarthLink, and smaller companies like dTSearch, Copernic, Accoona and many others.
I have most of these systems running all at once on my computer, and if they don't melt it down or blow it up I will report later on how each works. But today's subject is the virtually unpublicized search strategy of another industry heavyweight: I.B.M.
From Factoids to Facts
What is the next stage in the evolution of internet search engines? AltaVista demonstrated that indexing the entire world wide web was feasible. Google's success stems from its uncanny ability to sort useful web pages from dross. But the real prize will surely go to whoever can use the web to deliver a straight answer to a straight question. And Eric Brill, a researcher at Microsoft, intends that his firm will be the first to do that.
Dr Brill's initial crack at the problem is a system called "Ask MSR" (MSR stands for Microsoft Research). This program uses information on web pages to respond to questions to which the answer is a single word or phrase — such as "When was Marilyn Monroe born?" Ask MSR starts by manipulating the question in various ways: by identifying the verb, for example, and then changing its tense or moving it into different positions in the sentence ("Marilyn was Monroe born", "Marilyn Monroe was born" and so on). The resulting phrases are then fed into a search engine, and documents containing matching strings of words are retrieved. It sounds a promiscuous strategy, but gibberish phrases produce few matches, so, as Dr Brill puts it, "being wrong is very cheap."
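The question-rewriting step can be sketched as follows. This is a minimal illustration of the idea as described above, not the Ask MSR code: the function name `rewrite_question`, the short list of question words, and the short list of verbs are assumptions, and the tense-changing step is left out.

```python
def rewrite_question(question, verbs=("is", "was", "are", "were")):
    """Generate candidate search phrases by moving the verb through the question.

    The question-word set and the `verbs` tuple are small, assumed lists
    used only for illustration.
    """
    tokens = question.rstrip("?").split()
    # Drop a leading question word such as "When".
    if tokens and tokens[0].lower() in {"when", "where", "who", "what"}:
        tokens = tokens[1:]
    # Locate the first token that appears in the assumed verb list.
    verb_index = next(
        (i for i, t in enumerate(tokens) if t.lower() in verbs), None
    )
    if verb_index is None:
        return [" ".join(tokens)]
    verb = tokens[verb_index]
    rest = tokens[:verb_index] + tokens[verb_index + 1:]
    # Re-insert the verb at every possible position.
    return [" ".join(rest[:i] + [verb] + rest[i:]) for i in range(len(rest) + 1)]


for phrase in rewrite_question("When was Marilyn Monroe born?"):
    print(phrase)
```

Most of the generated phrases are gibberish, but as the excerpt notes, gibberish matches few documents, so the useful rewrite ("Marilyn Monroe was born") dominates the retrieved results.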
Summarizer Gets the Idea
The flow of a document, including the topics covered and the ways those topics relate to each other, is clear to people. It would be useful if computer systems that process documents could also learn how to consider topic information.
Teaching a computer to discern a document's topics and create a summary that puts the topics in the correct order is a bit like teaching it how to put together the pieces of a jigsaw puzzle. Current methods focus on finding the right match for a given piece.
MIT and Cornell University researchers have developed a system that does the equivalent of putting pieces that show parts of a mountain and pieces that show parts of the sky into separate groups, and putting the sky pieces above the mountain pieces.
After training on subject-specific sets of documents and document summaries, the researchers' automatic classification algorithm, or content model, can extract the topic structure of a group of related topics. It selects and orders topics to generate a summary.
"Aristotle" (The Knowledge Web)
(DANNY HILLIS:) I have always envied Alexander the Great, because he had Aristotle as a personal tutor. In those days, Aristotle knew pretty much everything there was to know. Even better, Aristotle understood the mind of Alexander. He understood which topics interested Alexander, what Alexander knew and did not know, and what kinds of explanations Alexander preferred. Aristotle had been a student of Plato, and he was himself a great teacher. We know from his writings that he was full of examples, explanations, arguments, and stories. Through Aristotle, Alexander had the knowledge of the world at his command.
Of course no one today knows all that is known, in the sense that Aristotle did. Now there is far too much knowledge for that to be possible. The scientific revolution, and the technological revolution that followed it, led to a self-reinforcing explosion of knowledge. The explosion continues. Today not even the most highly trained scientist, the most scholarly historian, or the most competent engineer can hope to have more than a general overview of what is known. Only specialists understand most of the new discoveries in science, and even the specialists have trouble keeping up.
This problem isn't new. In 1945, Vannevar Bush wrote an essay for Atlantic Monthly about the problem of too much knowledge. He wrote,
Researchers develop computer application to 'read' medical literature, find significant data relationships
Until recently, researchers and their assistants spent countless hours poring over seemingly endless volumes of journals and scientific literature for information pertinent to their studies in fields such as cancer, AIDS, pediatrics and cardiology.
But thanks to new software developed by bioinformatics researchers at UT Southwestern Medical Center at Dallas, scientists can now easily identify obscure commonalities in research data and directly relate them to their studies, saving money and speeding the process of discovery.
The computer application is unique because it "emulates the scientific thought process" in researching data, said Dr. Harold "Skip" Garner, professor of biochemistry and internal medicine, who with former graduate student Dr. Jonathan Wren developed the system.
Software paraphrases sentences
We paraphrase all the time, often without thinking about it. Try to give a computer the means to reword a sentence, however, and it becomes apparent that figuring out how to say it differently is complicated.
Researchers at Cornell University have tapped a pair of unlike sources -- on-line journalism and computational biology -- to make it possible to automatically paraphrase whole sentences. The researchers used gene comparison techniques to identify word patterns from different news sources that described the same event.
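To give a feel for how alignment exposes word patterns, here is a rough sketch that aligns two sentences word by word. It uses Python's standard `difflib.SequenceMatcher` as a stand-in for the gene comparison techniques the researchers used, so it is a simplification rather than their method; the function name `align_sentences` and the two sample sentences are invented for the example.

```python
from difflib import SequenceMatcher


def align_sentences(first, second):
    """Align two sentences word by word.

    Returns (shared, slots): `shared` lists the word runs both sentences
    have in common, and `slots` lists the pairs of differing segments.
    """
    words_a, words_b = first.split(), second.split()
    matcher = SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    shared, slots = [], []
    for tag, a1, a2, b1, b2 in matcher.get_opcodes():
        if tag == "equal":
            shared.append(" ".join(words_a[a1:a2]))
        else:
            # Differing stretches are candidate argument slots.
            slots.append((" ".join(words_a[a1:a2]), " ".join(words_b[b1:b2])))
    return shared, slots


# Made-up sentences standing in for two news reports of similar events.
shared, slots = align_sentences(
    "the storm hit Florida on Monday",
    "the storm hit Texas on Friday",
)
print(shared)
print(slots)
```

The shared runs form a template ("the storm hit X on Y"), and the differing segments are the slots to fill. Comparing such templates across sources that report the same event is what would let a system pair up different ways of saying the same thing.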
The method could eventually allow computers to more easily process natural language, produce paraphrases that could be used in machine translation, and help people who have trouble reading certain types of sentences.
Pick a Language, Any Language
Like the elite group of government agents on the 1960s television show, a group of computer scientists and natural language experts were given a "mission" earlier this week: within a month, build a program that translates between English and a randomly chosen language.
The project, funded by the Defense Advanced Research Projects Agency, challenges researchers to quickly build translation tools when unforeseen needs arise.
Intel talks up lip-reading software
Intel has released software that lets computers read lips, a step forward that could lead to better voice recognition applications.
The power of voice
CHEAP STORAGE MAKES it feasible to save voice recordings of many of our meetings, teleconferences, interviews, and other conversations. In some environments -- call centers and certain sectors of finance and government -- that already happens. But audio surveillance isn't yet routine, and the thorny legal, social, and cultural issues it raises haven't yet been widely debated. That's because, until now, there was no practical way to mine voice data.
As with other forms of practical obscurity, this artificial barrier was bound to topple, and now it has. Fast-Talk Communications' revolutionary phonetic indexing and search technology brings the magic of full-text search to the formerly opaque realms of audio recordings and video soundtracks. If you consider the way in which Google has already become everyone's indispensable "outboard brain," and extrapolate that to all the voice data that exists -- and to the vast quantities that soon will exist -- it's hard to avoid the conclusion that Fast-Talk is one of the most disruptive technologies in the pipeline.
Jane: an experiment in audio-based pro-active, remote collaboration
A common concept in wearable computing is the audio-only interface. We attempted a "Wizard of Oz" study in order to assess the effectiveness of such an interface. Results of this study revealed more about the limitations of human collaboration over audio channels than the hypothetical computer assistant we wished to study. This paper explores the implications of using audio-only communication to enable a mobile ad hoc collaboration between a user and a remote, pro-active human assistant.
OpenCyc.org
| Name: | OpenCyc.org | |
| URL: | http://www.opencyc.org/ | |
| Categories: | Natural language | Knowledge representation | Expert systems | AI | |
| Referred: | 509 | |
GATE text processing
| Name: | GATE text processing | |
| URL: | http://www.gate.ac.uk/ | |
| Categories: | Natural language | |
| Referred: | 322 | |
Speech Recognition Follies
Speech recognition software is stymied by word combinations that sound alike (homophones), says columnist David Pogue.
Prosody and speech recognition
Computers will really understand what you say when they know how you feel when you say it.
Sometimes it's not what you say, but how you say it. That's a truism most people can relate to--but computers can't. While speech recognition software has gotten quite good at understanding words, it still can't discern punctuation like periods and commas, or choose between ambiguous sentences whose meanings depend on the speaker's emotion. That's because such software still can't make sense of the
