Information Society Technologies

Search In Audio Visual Content Using Peer-to-peer IR

IST FP6 Project


The Vision


Existing web search technologies are limited to text-based search, yet roughly 99% of the information on the web is audio-visual content that can be searched only through its associated metadata, not its actual content. This striking restriction raises the question of how search technologies can tap this reservoir of information. SAPIR aims to develop cutting-edge technology that breaks this barrier and enables search engines to search large-scale audio-visual information by content. Furthermore, SAPIR's distributed P2P technology will open the door to smaller search providers, rather than relying on the very few search giants that can afford to run centralized services. Finally, SAPIR will make this information available on many devices, including mobile devices, maximising the availability and ease of distribution of information.

The broad scope of SAPIR is to develop theories and technologies for next-generation search techniques that will effectively and efficiently deliver relevant information from exponentially growing, highly dynamic volumes of distributed multimedia data. Fundamental to our approach is the development of scalable solutions that address the requirements of future generations of massively distributed data produced by a variety of applications. The scale of the problem can be gauged from the fact that almost everything we see, read, hear, write, and measure will soon be available to computerized information systems.

While structured search methods apply to attribute-type data and return records that match the query exactly, SAPIR takes a more modern approach: similarity searching, as used in content-based retrieval for queries involving complex data such as images, videos, time series, text documents, and DNA sequences. The notion of similarity has been studied extensively in psychology and plays an important role in the cognitive sciences, as characterized by the following quotation: "An ability to assess similarity lies close to the core of cognition. The sense of sameness is the very keel and backbone of our thinking. An understanding of problem solving, categorization, memory retrieval, inductive reasoning, and other cognitive processes require that we understand how humans assess similarity." (Goldstone, R.L. Similarity. In R.A. Wilson & F.C. Keil (eds.), MIT Encyclopedia of the Cognitive Sciences, Cambridge, MA: MIT Press, 1999, pp. 763-765.)

Similarity search is based on gradual rather than exact relevance, using a distance metric that, together with the database, forms a mathematical metric space. The obvious advantage of similarity search is that results can be ranked according to their estimated relevance. However, current similarity search structures are mostly centralized and scale linearly with the size of the searched data, which is not sufficient for the data volumes this problem is expected to reach. With the increasing diversity of digital data types covering practically all forms of fact representation, computerized data processing must provide adequate tools for similarity searching.
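To make the idea concrete, the following minimal sketch (in Python, purely illustrative and not SAPIR code) shows similarity search over a metric space: a distance function over fixed-length feature vectors, and a naive linear-scan index that ranks results by distance. The linear scan also illustrates the scalability problem noted above, since every object must be compared against the query.

# A minimal sketch of similarity search over a metric space, assuming
# fixed-length feature vectors with the Euclidean distance as the metric.
# The names (FeatureIndex, knn_query) are illustrative, not SAPIR APIs.
import heapq
import math


def euclidean(a, b):
    """Distance metric: non-negative, symmetric, satisfies the triangle inequality."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class FeatureIndex:
    """Naive linear-scan index that ranks objects by distance to the query.

    Real metric index structures (e.g. M-trees) prune comparisons via the
    triangle inequality, but a centralized index still grows with the data.
    """

    def __init__(self):
        self._objects = []  # (object_id, feature_vector) pairs

    def insert(self, object_id, vector):
        self._objects.append((object_id, vector))

    def knn_query(self, query_vector, k):
        """Return the k objects most similar to the query, nearest first."""
        return heapq.nsmallest(
            k,
            ((euclidean(vector, query_vector), object_id)
             for object_id, vector in self._objects),
        )


index = FeatureIndex()
index.insert("img-001", [0.1, 0.8, 0.3])
index.insert("img-002", [0.9, 0.2, 0.4])
index.insert("img-003", [0.2, 0.7, 0.4])
print(index.knn_query([0.15, 0.75, 0.35], k=2))  # ranked by distance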
Google-like web search engines utilise specialised search mechanisms for web pages, based on text indexes and link indexes. Since less than 1% of web data is in textual form, while the rest has a multimedia/streaming nature, next-generation search technology must be extended to accommodate these heterogeneous types. We believe that the diversity and uncertainty of terminologies and schema-like annotations will make precise querying elusive, if not hopeless, both on the web and within large-scale federations of intra- and inter-organisational data sources. Consequently, the following innovative technologies are required to support similarity-based, multi-modal search of large volumes of distributed audio-visual content:

  • Content analysis and salient feature extraction for indexing (unlike text-only methods used today).
  • An architecture that can support processing of content and indexing close to the data (unlike the centralized architecture used today).
  • The ability to pose queries by example, with similarity measures that compare the sample content to the indexed content while also taking into account contextual clues and user annotations (unlike most text-based queries today); a sketch of such combined scoring follows this list.
  • An architecture that enables sharing of data and pushing content to an index (unlike the "pull"-based methods that are common today).
  • Methods to upload as well as search for content from mobile devices (unlike the mostly browser-based interfaces in use today).
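As an illustration of the query-by-example requirement above, the following hedged sketch blends content similarity (cosine similarity over extracted feature vectors) with a simple annotation overlap (Jaccard similarity over tags). The weights and the two measures are assumptions chosen for illustration, not SAPIR's actual ranking function.

# A hedged sketch of query-by-example scoring that blends content similarity
# with user annotations (tags). The weights and the tag-overlap measure are
# illustrative assumptions, not SAPIR's actual ranking function.


def content_similarity(query_vec, item_vec):
    """Cosine similarity between extracted feature vectors
    (in [0, 1] for non-negative features)."""
    dot = sum(q * i for q, i in zip(query_vec, item_vec))
    norm_q = sum(q * q for q in query_vec) ** 0.5
    norm_i = sum(i * i for i in item_vec) ** 0.5
    return dot / (norm_q * norm_i) if norm_q and norm_i else 0.0


def annotation_similarity(query_tags, item_tags):
    """Jaccard overlap between the query's context tags and user annotations."""
    q, i = set(query_tags), set(item_tags)
    return len(q & i) / len(q | i) if q | i else 0.0


def combined_score(query_vec, query_tags, item_vec, item_tags,
                   w_content=0.7, w_tags=0.3):
    # Weighted blend; in practice the weights would be tuned or learned.
    return (w_content * content_similarity(query_vec, item_vec)
            + w_tags * annotation_similarity(query_tags, item_tags))


# Query by example: an image's feature vector plus contextual tags.
score = combined_score([0.2, 0.9, 0.1], ["beach", "sunset"],
                       [0.25, 0.8, 0.15], ["sunset", "holiday"])
print(round(score, 3))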

The SAPIR consortium brings together academic and research institutes and leading industrial companies, including an SME.
SAPIR will provide major innovations for powerful P2P search of audio-visual content. It will be based on a scalable, completely decentralized, largely self-organizing P2P system with unique capabilities for query routing and for advanced querying and ranking using similarity measures over combinations of several media. SAPIR will combine user and community annotations with features automatically extracted from multimedia content. Audio-visual content produced on mobile devices by end-user peers will be indexed and made searchable in real time across the P2P network, while respecting intellectual property rights and protecting against spam.
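A rough sketch, in Python, of the decentralized search pattern described above: a query is fanned out to peers, each peer ranks its own content locally, and the per-peer result lists are merged into a global ranking. Peer selection, routing, and the placeholder scoring function here are simplified assumptions for illustration; a real self-organizing overlay routes queries only to promising peers instead of contacting all of them.

# A minimal sketch of decentralized query fan-out and result merging across
# peers. Routing tables and the scoring function are simplified assumptions;
# no central index is involved, matching the P2P design described above.
import heapq


def score(query, obj):
    # Placeholder similarity: overlap of descriptor keywords (Jaccard).
    return len(set(query) & set(obj)) / max(len(set(query) | set(obj)), 1)


class Peer:
    def __init__(self, peer_id, local_index):
        self.peer_id = peer_id
        self.local_index = local_index  # {object_id: descriptor keywords}

    def local_search(self, query, k):
        """Each peer ranks its own content; results carry the peer's id."""
        scored = ((score(query, obj), f"{self.peer_id}/{oid}")
                  for oid, obj in self.local_index.items())
        return heapq.nlargest(k, scored)


def p2p_search(peers, query, k):
    """Merge the per-peer top-k lists into a single global ranking."""
    merged = [hit for peer in peers for hit in peer.local_search(query, k)]
    return heapq.nlargest(k, merged)


peers = [
    Peer("peer-A", {"vid-1": ["surf", "beach"], "vid-2": ["city"]}),
    Peer("peer-B", {"img-9": ["beach", "sunset"]}),
]
print(p2p_search(peers, ["beach", "sunset"], k=2))  # best matches first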


Partners

  • IBM Haifa Research Laboratory
  • ISTI - CNR
  • Max-Planck Institute for Informatics
  • University of Padova
  • Eurix
  • Xerox Research Centre Europe
  • Masaryk University
  • Telefonica Investigacion y Desarrollo
  • Telenor