Search concepts, not keywords, IBM tells business
IBM plans to give away key search technologies for corporate data retrieval that use concepts and facts instead of simpler "keyword" searches relied upon by consumer Web companies such as Google Inc., the world's largest computer company said on Monday.
While simple but powerful keyword searches have revolutionized how Internet users locate and retrieve information, IBM is looking to transform how office workers sift through the piles of data stored inside organizations.
"I don't see any of the major players moving into this area," Arthur Ciccolo, head of search technology at IBM Research, said, noting that major consumer Internet search companies such as Google, Yahoo Inc. and Microsoft have focused on the public Internet rather than on retrieving data from private records.
IBM plans to openly offer other software developers its Unstructured Information Management Architecture (UIMA), a technology that can analyze text within documents and other media to understand latent meanings, relationships and facts.
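To make "analyze text to understand relationships and facts" a little more concrete, here is a toy sketch of the general pattern UIMA is built around: a chain of analysis components ("annotators"), each adding stand-off annotations (type plus character offsets) to a shared structure that later stages can read. Real UIMA is a Java framework with its own Common Analysis Structure (CAS) API; every class and function below (`AnalysisStructure`, `dictionary_annotator`, `head_of_annotator`, `run_pipeline`) is hypothetical and only illustrates the idea, assuming simple dictionary lookups and one hand-written pattern.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    kind: str                  # e.g. "Person", "Organization", "HeadOf"
    begin: int                 # character offset where the span starts
    end: int                   # character offset where the span ends (exclusive)
    features: dict = field(default_factory=dict)


@dataclass
class AnalysisStructure:
    """Hypothetical stand-in for a shared analysis structure: the raw text
    plus stand-off annotations that point back into it by offset."""
    text: str
    annotations: list = field(default_factory=list)

    def select(self, kind):
        return [a for a in self.annotations if a.kind == kind]


def dictionary_annotator(kind, names):
    """Build an annotator that marks every literal occurrence of the given names."""
    def annotate(cas):
        for name in names:
            for m in re.finditer(re.escape(name), cas.text):
                cas.annotations.append(Annotation(kind, m.start(), m.end()))
    return annotate


def head_of_annotator(cas):
    """Toy relation annotator: 'PERSON, head of ... at ORG' becomes a HeadOf fact.
    It only works because earlier stages already found Person/Organization spans."""
    for person in cas.select("Person"):
        for org in cas.select("Organization"):
            between = cas.text[person.end:org.begin]
            if org.begin > person.end and between.startswith(", head of"):
                cas.annotations.append(Annotation(
                    "HeadOf", person.begin, org.end,
                    {"person": cas.text[person.begin:person.end],
                     "org": cas.text[org.begin:org.end]}))


def run_pipeline(text, annotators):
    cas = AnalysisStructure(text)
    for annotate in annotators:  # each stage sees the output of earlier stages
        annotate(cas)
    return cas
```

Run over the sentence quoted above, with `Person` and `Organization` dictionaries for "Arthur Ciccolo" and "IBM Research", the pipeline yields one `HeadOf` fact linking the two, something a plain keyword index would never record.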
At I.B.M., That Google Thing Is So Yesterday
Suddenly, the computer world is interesting again. The last three months of 2004 brought more innovation, faster, than users have seen in years. The recent flow of products and services differs from those of previous hotly competitive eras in two ways. The most attractive offerings are free, and they are concentrated in the newly sexy field of "search."
Google, current heavyweight among systems for searching the Internet, has not let up from its pattern of introducing features and products every few weeks. Apart from its celebrated plan to index the contents of several university libraries, Google has recently released "beta" (trial) versions of Google Scholar, which returns abstracts of academic papers and shows how often they are cited by other scholars, and Google Suggest, a weirdly intriguing feature that tries to guess the object of your search after you have typed only a letter or two. Give it "po" and it will show shortcuts to poetry, Pokémon, post office, and other popular searches. (If you stop after "p" it will suggest "Paris Hilton.") In practice, this is more useful than it sounds.
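How Google Suggest works internally is not described here. A minimal sketch of the basic behavior, assuming nothing more than a table of past queries with popularity counts (the counts below are made up), is prefix lookup followed by ranking by popularity. Keeping the queries sorted means every query that starts with a given prefix sits in one contiguous run, which `bisect` can locate directly.

```python
import bisect


class Suggester:
    """Toy prefix completer over a table of past queries and their popularity."""

    def __init__(self, query_counts):
        # Sorted query strings: all queries sharing a prefix form one contiguous run.
        self.queries = sorted(query_counts)
        self.counts = dict(query_counts)

    def suggest(self, prefix, limit=3):
        # First position whose query is >= prefix; the matching run starts here.
        start = bisect.bisect_left(self.queries, prefix)
        run = []
        for q in self.queries[start:]:
            if not q.startswith(prefix):
                break
            run.append(q)
        # Most popular first; ties broken alphabetically.
        run.sort(key=lambda q: (-self.counts[q], q))
        return run[:limit]
```

With hypothetical counts such as `{"poetry": 120, "pokemon": 300, "post office": 200, "paris hilton": 500, "potatoes": 50}`, typing "po" yields `["pokemon", "post office", "poetry"]`, while stopping at "p" puts the most popular "p" query first.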
Microsoft, heavyweight of the rest of computerdom, has scrambled to catch up with search innovations from Google and others. On Dec. 10, a company official made a shocking disclosure. For years Microsoft had emphasized the importance of "WinFS," a fundamentally new file system that would make it much easier for users to search and manage information on their own computers. Last summer, the company said that WinFS would not be ready in time for inclusion with its next version of Windows, called Longhorn. The latest news was that WinFS would not be ready even for the release after that, which pushed its likely delivery at least five years into the future. This seemed to put Microsoft entirely out of the running in desktop search. But within three days, it had released a beta version of its new desktop search utility, which it had previously said would not be available for months.
Meanwhile, a flurry of mergers, announcements and deals from smaller players produced a dazzling variety of new search possibilities. Early this month Yahoo said it would use the excellent indexing program X1 as the basis for its own desktop search system, which it would distribute free to its users. The search company Autonomy, which has specialized in indexing corporate data, also got into the new competition, as did Ask Jeeves, EarthLink, and smaller companies like dTSearch, Copernic, Accoona and many others.
I have most of these systems running all at once on my computer, and if they don't melt it down or blow it up I will report later on how each works. But today's subject is the virtually unpublicized search strategy of another industry heavyweight: I.B.M.
Summarizer Gets the Idea
The flow of a document, including the topics covered and the ways those topics relate to each other, is clear to people. It would be useful if computer systems that process documents could also learn how to consider topic information.
Teaching a computer to discern a document's topics and create a summary that puts the topics in the correct order is a bit like teaching it how to put together the pieces of a jigsaw puzzle. Current methods focus on finding the right match for a given piece.
MIT and Cornell University researchers have developed a system that does the equivalent of putting pieces that show parts of a mountain and pieces that show parts of the sky into separate groups, and putting the sky pieces above the mountain pieces.
After training on subject-specific sets of documents and document summaries, the researchers' automatic classification algorithm, or content model, can extract the topic structure of a group of related documents. It then selects and orders topics to generate a summary.
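The "sky pieces above the mountain pieces" idea can be sketched as a topic-transition model: count how often one topic follows another in training documents, then order the topics chosen for a summary so that the sequence is as likely as possible under those counts. This is a deliberately simplified stand-in, not the researchers' algorithm: it assumes topic labels are already given (their content model learns topics from the text itself), uses plain bigram counts with add-one smoothing, covers only the ordering half of "selects and orders", and brute-forces permutations, which is only practical for the handful of topics in a short summary. The earthquake topics in the example are invented.

```python
import itertools
import math
from collections import Counter

START, END = "<start>", "<end>"


def train_transitions(topic_sequences):
    """Count topic-to-topic transitions (including document start/end)."""
    pair_counts, from_counts, topics = Counter(), Counter(), set()
    for seq in topic_sequences:
        path = [START, *seq, END]
        topics.update(seq)
        for a, b in zip(path, path[1:]):
            pair_counts[(a, b)] += 1
            from_counts[a] += 1
    return pair_counts, from_counts, topics


def log_prob(a, b, model):
    pair_counts, from_counts, topics = model
    # Add-one smoothing over every possible next state (all topics plus END).
    return math.log((pair_counts[(a, b)] + 1) / (from_counts[a] + len(topics) + 1))


def order_topics(selected, model):
    """Return the ordering of `selected` with the highest log-probability."""
    def score(order):
        path = [START, *order, END]
        return sum(log_prob(a, b, model) for a, b in zip(path, path[1:]))
    return list(max(itertools.permutations(selected), key=score))
```

Trained on three toy reports that move from location to magnitude to casualties to aid, the model orders the selection `["aid", "location", "casualties"]` as `["location", "casualties", "aid"]`.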
"Aristotle" (The Knowledge Web)
(DANNY HILLIS:) I have always envied Alexander the Great, because he had Aristotle as a personal tutor. In those days, Aristotle knew pretty much everything there was to know. Even better, Aristotle understood the mind of Alexander. He understood which topics interested Alexander, what Alexander knew and did not know, and what kinds of explanations Alexander preferred. Aristotle had been a student of Plato, and he was himself a great teacher. We know from his writings that he was full of examples, explanations, arguments, and stories. Through Aristotle, Alexander had the knowledge of the world at his command.
Of course no one today knows all that is known, in the sense that Aristotle did. Now there is far too much knowledge for that to be possible. The scientific revolution, and the technological revolution that followed it, led to a self-reinforcing explosion of knowledge. The explosion continues. Today not even the most highly trained scientist, the most scholarly historian, or the most competent engineer can hope to have more than a general overview of what is known. Only specialists understand most of the new discoveries in science, and even the specialists have trouble keeping up.
This problem isn't new. In 1945, Vannevar Bush wrote an essay for The Atlantic Monthly about the problem of too much knowledge. He wrote,
Researchers develop computer application to 'read' medical literature, find significant data relationships
Until recently, researchers and their assistants spent countless hours poring over seemingly endless volumes of journals and scientific literature for information pertinent to their studies in fields such as cancer, AIDS, pediatrics and cardiology.
But thanks to new software developed by bioinformatics researchers at UT Southwestern Medical Center at Dallas, scientists can now easily identify obscure commonalities in research data and directly relate them to their studies, saving money and speeding the process of discovery.
The computer application is unique because it "emulates the scientific thought process" in researching data, said Dr. Harold "Skip" Garner, professor of biochemistry and internal medicine, who with former graduate student Dr. Jonathan Wren developed the system.
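The article doesn't spell out the algorithm, so the following is only a sketch of the general "A-B-C" idea from literature-based discovery, not Garner and Wren's system: if term A co-occurs with B in some papers, and B co-occurs with C in others, but A and C never appear together, then C is a candidate hidden connection to A, ranked by how many intermediate B terms link them. The toy "abstracts" below are hand-made term lists (echoing the well-known fish oil and Raynaud's syndrome example), not real data, and real systems would also weigh co-occurrence statistics rather than mere presence.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence(abstracts):
    """Map each term to the set of terms it appears with in at least one abstract."""
    links = defaultdict(set)
    for terms in abstracts:
        for a, b in combinations(set(terms), 2):
            links[a].add(b)
            links[b].add(a)
    return links


def implicit_links(links, a):
    """Terms C never mentioned alongside A but reachable through a shared term B.
    Returns {C: sorted list of bridging B terms}, most bridges first."""
    candidates = defaultdict(set)
    for b in links[a]:
        for c in links[b]:
            if c != a and c not in links[a]:
                candidates[c].add(b)
    ranked = sorted(((c, sorted(bs)) for c, bs in candidates.items()),
                    key=lambda item: (-len(item[1]), item[0]))
    return dict(ranked)
```

Here no single abstract mentions both "fish oil" and "raynaud", yet the function surfaces the pair and shows the two bridging terms that connect them.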
A Fountain of Knowledge
The great strength of computers is that they can reliably manipulate vast amounts of data very quickly. Their great weakness is that they don’t have a clue as to what any of that data actually means.
Computer scientists have been laboring for decades to eliminate that weakness, with some limited successes in some limited domains. Now, IBM Corp. appears to have made a major breakthrough in the field of machine understanding. The results could spell big business not just for IBM but for data miners, content providers, retailers, political consultants, market analysts, and any other group that relies on information as part of its stock in trade. IBM’s breakthrough is called WebFountain: half a football field’s worth of rack-mounted processors, routers, and disk drives running a huge menagerie of programs. All this hardware and software is dedicated to one purpose: making sense of the churning ocean of information, opinion, and falsehood that roils the Internet every second of every day.
Bow: A Toolkit for Statistical Language Modeling, Text Retrieval, Classification and Clustering
Bow (or libbow) is a library of C code useful for writing statistical text analysis, language modeling and information retrieval programs. The current distribution includes the library, as well as front-ends for document classification (rainbow), document retrieval (arrow) and document clustering (crossbow).
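Bow itself is C, and its actual functions are not shown here. As a language-neutral illustration of the kind of statistical document classification a front-end like rainbow performs, here is a minimal multinomial naive Bayes classifier with add-one smoothing, a standard technique for this task. The class name, tokenizer, and tiny training set are all assumptions made for the example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.doc_counts = Counter(labels)          # documents per class (for priors)
        self.word_counts = defaultdict(Counter)    # word frequencies per class
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(doc):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Trained on two short finance sentences and two short sports sentences, the classifier assigns "market shares" to finance and "who won the match" to sports. Working in log space avoids the numeric underflow that multiplying many small word probabilities would cause on real documents.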
Intertwingularity
Intertwingularity is not generally acknowledged - people keep pretending they can make things deeply hierarchical, categorizable and sequential when they can't. Everything is deeply intertwingled.
- Ted Nelson
The power of voice
Cheap storage makes it feasible to save voice recordings of many of our meetings, teleconferences, interviews, and other conversations. In some environments -- call centers and certain sectors of finance and government -- that already happens. But audio surveillance isn't yet routine, and the thorny legal, social, and cultural issues it raises haven't yet been widely debated. That's because, until now, there was no practical way to mine voice data.
As with other forms of practical obscurity, this artificial barrier was bound to topple, and now it has. Fast-Talk Communications' revolutionary phonetic indexing and search technology brings the magic of full-text search to the formerly opaque realms of audio recordings and video soundtracks. If you consider the way in which Google has already become everyone's indispensable "outboard brain," and extrapolate that to all the voice data that exists -- and to the vast quantities that soon will exist -- it's hard to avoid the conclusion that Fast-Talk is one of the most disruptive technologies in the pipeline.
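Fast-Talk's actual method isn't described here, so this is only a sketch of the general idea behind phonetic search, under stated assumptions: a recognizer has already turned the audio into a timestamped stream of phonemes, the query is converted to phonemes with a pronunciation dictionary (the two-entry `PRONUNCIATIONS` table is hypothetical), and matching allows a small edit distance so that a misrecognized sound doesn't hide a hit. Searching sounds rather than transcribed words also means names and jargon missing from a recognizer's vocabulary can still be found.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (x != y)))   # substitution (or match)
        prev = cur
    return prev[-1]


# Hypothetical mini pronunciation dictionary (ARPAbet-style symbols).
PRONUNCIATIONS = {
    "fast": ["f", "ae", "s", "t"],
    "talk": ["t", "ao", "k"],
}


def to_phones(query):
    phones = []
    for word in query.lower().split():
        phones.extend(PRONUNCIATIONS[word])
    return phones


def phonetic_search(stream, query, max_errors=1):
    """stream: list of (time_in_seconds, phone) pairs, as a recognizer might emit.
    Returns start times of query-length windows within `max_errors` edits."""
    target = to_phones(query)
    n = len(target)
    phones = [p for _, p in stream]
    hits = []
    for start in range(len(phones) - n + 1):
        if edit_distance(phones[start:start + n], target) <= max_errors:
            hits.append(stream[start][0])
    return hits
```

For a stream in which the recognizer heard "aa" instead of "ao" in "talk", the search still reports the match at the 0.2-second mark with `max_errors=1`, and finds nothing when exact matching is required.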
Big Idea, Bad Idea
Is it possible to catalogue every human idea? Japan-based researcher Darryl Macer thinks so, and last month he proposed in the journal Nature to count the number of human ideas and map them. This plan, while a clever attention grabber, will not succeed and demonstrates a worrisome mode of thinking.
Macer, an associate professor of biological sciences at the University of Tsukuba in Japan, writes that "although the human mind appears to be infinitely complex ... I would propose that the number of ideas that human beings have is finite, and call for a project to map the ideas of the human mind."
