Data-mining

Need body text here.

Key ideas:

Benefits
Discovery of security threats
Discovery of health threats

Costs
Need to investigate false positives
Threat to privacy
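
The cost of investigating false positives can be made concrete with a little arithmetic. The sketch below is illustrative only: the population size, number of real threats, detection rate and false-alarm rate are hypothetical figures chosen for the example, not data from any of the articles on this page.

```python
def screening_outcomes(population, actual_threats, detection_rate, false_alarm_rate):
    """Estimate how many flagged cases are real when mining a large population.

    All inputs are hypothetical; the point is the ratio, not the numbers.
    """
    true_positives = actual_threats * detection_rate
    false_positives = (population - actual_threats) * false_alarm_rate
    flagged = true_positives + false_positives
    # Share of flagged cases that are real threats.
    precision = true_positives / flagged if flagged else 0.0
    return true_positives, false_positives, precision


# Hypothetical scenario: 100 real threats hidden among 1,000,000 records,
# with a detector that catches 99% of them and wrongly flags 1% of the rest.
tp, fp, precision = screening_outcomes(1_000_000, 100, 0.99, 0.01)
print(f"real threats flagged: {tp:.0f}")
print(f"false positives to investigate: {fp:.0f}")
print(f"share of flags that are real: {precision:.2%}")
```

Under these assumed rates, nearly every flag is a false alarm, because the harmless records vastly outnumber the real threats. That imbalance is what makes investigation costly.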

Search concepts, not keywords, IBM tells business

IBM plans to give away key search technologies for corporate data retrieval that use concepts and facts instead of simpler "keyword" searches relied upon by consumer Web companies such as Google Inc., the world's largest computer company said on Monday.

While simple but powerful keyword searches have revolutionized how Internet users locate and retrieve information, IBM is looking to transform how office workers sift through the piles of data stored inside organizations.

"I don't see any of the major players moving into this area," Arthur Ciccolo, head of search technology at IBM Research, said of how major consumer Internet search companies such as Google, Yahoo Inc. and Microsoft have focused on the public Internet instead of private record data retrieval.

IBM plans to openly offer other software developers its Unstructured Information Management Architecture (UIMA), a technology that can analyze text within documents and other media to understand latent meanings, relationships and facts.

Data-mining | Expert systems | Knowledge management | Knowledge representation | Semantic web | Technology | Topic maps | Efficiency

IBM expands corporate search ambitions

IBM's mission to spice up corporate search and become a "Google for the enterprise" continues in earnest.

By the end of the year, Big Blue intends to release an update to its corporate information-management tools, which are designed to bring order to potentially thousands of data sources in a company's network.

Code-named Serrano, the product will use technologies including artificial intelligence and data mining to derive more meaning from corporate documents. It will also have a revamped search engine and front-end tool designed to make hunting for company information as straightforward as searching the Web, according to IBM.

Agents | Data-mining | Knowledge management | Productivity | Technology

At I.B.M., That Google Thing Is So Yesterday

Suddenly, the computer world is interesting again. The last three months of 2004 brought more innovation, faster, than users have seen in years. The recent flow of products and services differs from those of previous hotly competitive eras in two ways. The most attractive offerings are free, and they are concentrated in the newly sexy field of "search."

Google, current heavyweight among systems for searching the Internet, has not let up from its pattern of introducing features and products every few weeks. Apart from its celebrated plan to index the contents of several university libraries, Google has recently released "beta" (trial) versions of Google Scholar, which returns abstracts of academic papers and shows how often they are cited by other scholars, and Google Suggest, a weirdly intriguing feature that tries to guess the object of your search after you have typed only a letter or two. Give it "po" and it will show shortcuts to poetry, Pokémon, post office, and other popular searches. (If you stop after "p" it will suggest "Paris Hilton.") In practice, this is more useful than it sounds.
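
The behavior described for Google Suggest can be approximated with a simple prefix lookup over past queries ranked by popularity. This is a minimal sketch and not Google's method, which the article does not describe; the `suggest` function and the query log are hypothetical.

```python
from collections import Counter


def suggest(prefix, query_log, limit=3):
    """Return the most frequent logged queries that start with the prefix."""
    counts = Counter(q.lower() for q in query_log)
    prefix = prefix.lower()
    matches = [(q, n) for q, n in counts.items() if q.startswith(prefix)]
    # Most frequent first; ties are broken alphabetically for a stable order.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [q for q, _ in matches[:limit]]


# Hypothetical query log, with popular searches repeated.
log = (
    ["paris hilton"] * 4
    + ["poetry"] * 3
    + ["post office"] * 2
    + ["pokémon"]
)

print(suggest("po", log))
print(suggest("p", log, limit=1))
```

A production system would need a precomputed index such as a trie to answer within a keystroke, but the ranking idea is the same.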

Microsoft, heavyweight of the rest of computerdom, has scrambled to catch up with search innovations from Google and others. On Dec. 10, a company official made a shocking disclosure. For years Microsoft had emphasized the importance of "WinFS," a fundamentally new file system that would make it much easier for users to search and manage information on their own computers. Last summer, the company said that WinFS would not be ready in time for inclusion with its next version of Windows, called Longhorn. The latest news was that WinFS would not be ready even for the release after that, which pushed its likely delivery at least five years into the future. This seemed to put Microsoft entirely out of the running in desktop search. But within three days, it had released a beta version of its new desktop search utility, which it had previously said would not be available for months.

Meanwhile, a flurry of mergers, announcements and deals from smaller players produced a dazzling variety of new search possibilities. Early this month Yahoo said it would use the excellent indexing program X1 as the basis for its own desktop search system, which it would distribute free to its users. The search company Autonomy, which has specialized in indexing corporate data, also got into the new competition, as did Ask Jeeves, EarthLink, and smaller companies like dTSearch, Copernic, Accoona and many others.

I have most of these systems running all at once on my computer, and if they don't melt it down or blow it up I will report later on how each works. But today's subject is the virtually unpublicized search strategy of another industry heavyweight: I.B.M.

Agents | Collective intelligence | Data-mining | Expert systems | Intelligence amplification | Knowledge management | Knowledge representation | Natural language | Semantic web | Technology | Efficiency

Google Is Adding Major Libraries to Its Database

Google, the operator of the world's most popular Internet search service, plans to announce an agreement today with some of the nation's leading research libraries and Oxford University to begin converting their holdings into digital files that would be freely searchable over the Web.

It may be only a step on a long road toward the long-predicted global virtual library. But the collaboration of Google and research institutions that also include Harvard, the University of Michigan, Stanford and the New York Public Library is a major stride in an ambitious Internet effort by various parties. The goal is to expand the Web beyond its current valuable, if eclectic, body of material and create a digital card catalog and searchable library for the world's books, scholarly papers and special collections.

Altruism | Collective intelligence | Cooperation, competition, conflict | Copyright | Data-mining | Digital divide | e-books | Enlightened self-interest | Intellectual property | Intelligence amplification | Knowledge management | Memetics | Openness | Progress | Technology | Technology and Society | Efficiency | Extropy

Summarizer gets the idea

The flow of a document, including the topics covered and the ways those topics relate to each other, is clear to people. It would be useful if computer systems that process documents -- like search engines and programs that generate summaries of news articles -- could also learn to consider topic information.

Teaching a computer to discern a document's topics and create a summary that puts the topics in the correct order is a bit like teaching it how to put together the pieces of a jigsaw puzzle. Current methods focus on finding the right match for a given piece.

Researchers from the Massachusetts Institute of Technology and Cornell University have developed a system that does the equivalent of putting pieces that show parts of a mountain and pieces that show parts of the sky into separate groups, and putting the sky pieces above the mountain pieces, said Lillian Lee, an associate professor of computer science at Cornell University.

Data-mining | Knowledge management

Summarizer Gets the Idea

The flow of a document, including the topics covered and the ways those topics relate to each other, is clear to people. It would be useful if computer systems that process documents could also learn how to consider topic information.

Teaching a computer to discern a document's topics and create a summary that puts the topics in the correct order is a bit like teaching it how to put together the pieces of a jigsaw puzzle. Current methods focus on finding the right match for a given piece.

MIT and Cornell University researchers have developed a system that does the equivalent of putting pieces that show parts of a mountain and pieces that show parts of the sky into separate groups, and putting the sky pieces above the mountain pieces.

After training on subject-specific sets of documents and document summaries, the researchers' automatic classification algorithm, or content model, can extract the topic structure of a group of related topics. It selects and orders topics to generate a summary.

Computing | Data-mining | Knowledge representation | Natural language | Semantic web

"Aristotle" (The Knowledge Web)

(DANNY HILLIS:) I have always envied Alexander the Great, because he had Aristotle as a personal tutor. In those days, Aristotle knew pretty much everything there was to know. Even better, Aristotle understood the mind of Alexander. He understood which topics interested Alexander, what Alexander knew and did not know, and what kinds of explanations Alexander preferred. Aristotle had been a student of Plato, and he was himself a great teacher. We know from his writings that he was full of examples, explanations, arguments, and stories. Through Aristotle, Alexander had the knowledge of the world at his command.

Of course no one today knows all that is known, in the sense that Aristotle did. Now there is far too much knowledge for that to be possible. The scientific revolution, and the technological revolution that followed it, led to a self-reinforcing explosion of knowledge. The explosion continues. Today not even the most highly trained scientist, the most scholarly historian, or the most competent engineer can hope to have more than a general overview of what is known. Only specialists understand most of the new discoveries in science, and even the specialists have trouble keeping up.

This problem isn't new. In 1945, Vannevar Bush wrote an essay for Atlantic Monthly about the problem of too much knowledge. He wrote,

AI | Cooperation, competition, conflict | Creativity | Data-mining | Expert systems | Futurology | Groupware | Human interface | Intelligence amplification | Knowledge management | Knowledge representation | Learning | Mental enhancement | Mind mapping | Natural language | PDAs | Problem-solving | Semantic web | Serendipity | Technology | Technology and Society | The Arrow of Morality | Topic maps | Troubleshooting | Ubiquitous computing | Visualization | Efficiency | Extropy

Scirus

"Scirus is the most comprehensive science-specific search engine on the Internet. Driven by the latest search engine technology, Scirus searches over 167 million science-specific Web pages, enabling you to quickly: • Pinpoint scientific, scholarly, technical and medical data on the Web. • Find the latest reports, peer-reviewed articles and journals that other search engines miss. • Offer unique functionalities designed for scientists and researchers. Scirus has proved so successful at locating science-specific results on the Web that the Search Engine Watch Awards voted Scirus 'Bes
Data-mining | Knowledge management

In search of the deep Web

When Yahoo announced its Content Acquisition Program on March 2, press coverage zeroed in on its controversial paid inclusion program, whereby customers can pony up in exchange for enhanced search coverage and a vaunted "trusted feed" status. But lost amid the inevitable search-wars storyline was another, more intriguing development: the unlocking of the deep Web.

Those of us who place our faith in the Googlebot may be surprised to learn that the big search engines crawl less than 1 percent of the known Web. Beneath the surface layer of company sites, blogs and porn lies another, hidden Web. The "deep Web" is the great lode of databases, flight schedules, library catalogs, classified ads, patent filings, genetic research data and another 90-odd terabytes of data that never find their way onto a typical search results page.

Today, the deep Web remains invisible except when we engage in a focused transaction: searching a catalog, booking a flight, looking for a job. That's about to change. In addition to Yahoo, outfits like Google and IBM, along with a raft of startups, are developing new approaches for trawling the deep Web. And while their solutions differ, they are all pursuing the same goal: to expand the reach of search engines into our cultural, economic and civic lives.

Data-mining | Digital divide | Economics | Knowledge management | Sociology | Technology and Society | Transparency and Privacy

Researchers develop computer application to 'read' medical literature, find significant data relationships

Until recently, researchers and their assistants spent countless hours poring over seemingly endless volumes of journals and scientific literature for information pertinent to their studies in fields such as cancer, AIDS, pediatrics and cardiology.

But thanks to new software developed by bioinformatics researchers at UT Southwestern Medical Center at Dallas, scientists can now easily identify obscure commonalities in research data and directly relate them to their studies, saving money and speeding the process of discovery.

The computer application is unique because it "emulates the scientific thought process" in researching data, said Dr. Harold "Skip" Garner, professor of biochemistry and internal medicine, who with former graduate student Dr. Jonathan Wren developed the system.

Cooperation, competition, conflict | Cyc | Data-mining | Knowledge management | Knowledge representation | Natural language | Semantic web | Technology | Topic maps | Efficiency

A Fountain of Knowledge

The great strength of computers is that they can reliably manipulate vast amounts of data very quickly. Their great weakness is that they don’t have a clue as to what any of that data actually means.

Computer scientists have been laboring for decades to eliminate that weakness, with some limited successes in some limited domains. Now, IBM Corp. appears to have made a major breakthrough in the field of machine understanding. The results could spell big business not just for IBM but for data miners, content providers, retailers, political consultants, market analysts, and any other group that relies on information as part of its stock in trade.

IBM’s breakthrough is called WebFountain—half a football field’s worth of rack-mounted processors, routers, and disk drives running a huge menagerie of programs. All this hardware and software is dedicated to one purpose: making sense of the churning ocean of information, opinion, and falsehood that roils the Internet every second of every day.

AI | Data-mining | Knowledge management | Knowledge representation | Reputation | Semantic web | Topic maps | Transparency and Privacy | Efficiency

Utopia theory

From theories of pedestrian movement and traffic flow to voting processes, economic markets and war, researchers are striving towards a physics of society

"It may be", said US sociologist George Lundberg in 1939, "that the next great developments in the social sciences will come not from professed social scientists, but from people trained in other fields." Take a look at any issue of a physical-sciences journal in the past five years and you will see one such field staking its claim vigorously. Physics is muscling its way into social science. Not content with explaining the behaviour of atoms and electrons, semiconductors, sand and space-time, physicists are now setting out to understand the behaviour of people.

Complexity | Data-mining | Game theory | Group behavior | Simulation | Technology and Society | Empathy

On the Web, Research Work Proves Ephemeral

Electronic Archivists Are Playing Catch-Up in Trying to Keep Documents From Landing in History's Dustbin.

It was in the mundane course of getting a scientific paper published that physician Robert Dellavalle came to the unsettling realization that the world was dissolving before his eyes.

The world, that is, of footnotes, references and Web pages.

Cooperation, competition, conflict | Data-mining | Knowledge management | Technology | Efficiency

Bow: A Toolkit for Statistical Language Modeling, Text Retrieval, Classification and Clustering

Bow (or libbow) is a library of C code useful for writing statistical text analysis, language modeling and information retrieval programs. The current distribution includes the library, as well as front-ends for document classification (rainbow), document retrieval (arrow) and document clustering (crossbow).
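
To give a sense of what a bag-of-words document classifier does, here is a minimal naive Bayes sketch in Python. It does not use Bow's C API or reproduce its implementation; the function names, the choice of naive Bayes with add-one smoothing, and the training documents are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lowercase words; word order is ignored (bag of words)."""
    return text.lower().split()


def train(labeled_docs):
    """Count words per label from (label, text) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for label, text in labeled_docs:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocabulary


def classify(text, model):
    """Return the label with the highest naive Bayes log-probability."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # skip words never seen in training
            # Add-one smoothing so unseen word/label pairs are not zero.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training set with two topics.
model = train([
    ("search", "index query ranking search engine"),
    ("search", "crawl web pages and index them"),
    ("health", "patients cancer study clinical trial"),
    ("health", "medical journal cancer research"),
])

print(classify("cancer research study", model))
print(classify("web search index", model))
```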

Data-mining | Knowledge representation | Language | Semantic web | Software platforms