Data complexity and diversity have expanded rapidly in recent years, spanning from large amounts of unstructured and semi-structured data to semantically rich knowledge, driven by ever-increasing sophistication in data management requirements. Numerous applications in domains such as social media, healthcare, telecommunication, e-commerce and web analytics, business intelligence, and cyber-security require new methods and tools for collecting and extracting entities and their relationships from unstructured and heterogeneous data sources, so that they can be transformed into useful knowledge and provide insights. While many useful facts are added daily to a multitude of web and enterprise data sources, they remain hidden behind barriers of language constraints, data heterogeneity and ambiguity, and the lack of proper query interfaces. In addition, novel search and data mining methods are required to provide discovery capabilities that are expressive and powerful, yet intuitive enough, for exploring the large amounts of entity-relationship data.



This workshop will serve as an open forum for discussing the new research challenges in search and mining of large-scale entity-relationship (ER) data extracted from a multitude of unstructured and semi-structured data sources, driven by recent industry trends and requirements in various domains and by increasing academic interest. The workshop will bring together researchers from different communities working on similar problems in the context of ER and other semantic data, allowing for cross-fertilization between areas. During the workshop, we will identify common problems and their various solution approaches in DB, KM, and IR.



  • 08:50-09:00 Gathering and Welcome
  • 09:00-10:00 Invited Talk - "Knowledge for/from People for/from Computers", Michael Witbrock
  • 10:00-10:30 Full paper presentation: "Adding Smarter Systems Instead of Human Annotators: Re-ranking for System Combination"
  • 10:30-11:00 Coffee break
  • 11:00-12:00 Invited Talk - "Finding Related Entities", Krisztian Balog
  • 12:00-12:30 Full paper presentation: "Nut Case: What does It Mean?: Understanding Semantic Relationship between Nouns in Noun Compounds through Paraphrasing and Ranking the Paraphrases"
  • 12:30-14:00 Lunch
  • 14:00-15:00 Invited Talk - "Is This Entity Relevant to Your Needs?", David Carmel
  • 15:00-15:30 Full paper presentation: "Enabling Type/Condition-Specified Entity/Fact Retrieval Using Semantic Knowledge Extracted from Wikipedia"
  • 15:30-16:00 Coffee break
  • 16:00-16:30 Poster session
  • 16:30-17:20 Open panel
  • 17:20-17:30 Closing remarks

Invited Talks


  • Knowledge for/from People for/from Computers
    Michael Witbrock
    Vice President for Research at Cycorp

    Abstract: To enable true human/computer collaboration, knowledge needs to be freely communicated between the forms that each finds most useful (text, speech and images, notably for people; logic, program fragments, probabilities, databases and numeric values, notably for machines). Free flow between these kinds of representation has not yet been achieved, but we are making progress. In this talk, I will focus on elements of this progress at Cycorp, where a partial ability to map between logical and textual representations, sometimes interactively, is beginning to significantly enhance our ability to build broad-coverage, reasoning-based applications.

    Bio: Michael has a PhD in Computer Science from Carnegie Mellon University and is currently Vice President for Research at Cycorp. Before joining Cycorp in 2001 to direct its knowledge formation and dialogue processing efforts, he was Principal Scientist at Terra Lycos, working on integrating statistical and knowledge-based approaches to understanding web user behavior; a research scientist at Just Systems Pittsburgh Research Center, working on statistical summarization; and a systems scientist at Carnegie Mellon on the Informedia spoken document information retrieval project. He also performed dissertation work in the area of speaker modeling. He is the author of numerous publications in areas ranging across neural networks, parallel computer architecture, multimedia information retrieval, web browser design, genetic design, computational linguistics, and speech recognition.

  • Finding Related Entities
    Krisztian Balog
    Norwegian University of Science and Technology

    Abstract: Over the past decade, increasing attention has been devoted to Information Retrieval tasks that go beyond document retrieval. Indeed, a large fraction of search queries are better answered by returning specific objects (or their properties) instead of just any type of documents that merely mention them.

    In 2009, the Entity track was launched at the Text REtrieval Conference (TREC) with the aim to evaluate entity-oriented search tasks on the Web. The main task investigated is related entity finding (REF): given a source entity, a relation and a target type, identify entities that stand in the required relation with the source entity, while satisfying the target type constraint.

    In this talk I will present an overview of the TREC Entity benchmarking effort, highlight prominent approaches to the REF task, and also discuss challenges and future research directions in entity-oriented search.

    Bio: Dr. Krisztian Balog is a postdoctoral researcher at the Computer and Information Science Department of the Norwegian University of Science and Technology (NTNU) in Trondheim, Norway. Previously, he was a postdoc at the Information and Language Processing Systems (ILPS) group at the Informatics Institute of the University of Amsterdam, where he received his PhD in Computer Science in 2008 under the supervision of Prof. Dr. Maarten de Rijke. He holds M.Sc. degrees in Computer Science from the Eötvös Loránd University, Budapest (2005) and from the Vrije Universiteit, Amsterdam (2004). He received the Best Doctoral Consortium Paper Award at the ACM SIGIR Conference in 2007 for his dissertation topic, and was awarded the Victorine van Schaickprijs for his PhD dissertation in 2009. He also holds the Best Paper Award from the 2011 ECIR conference. Balog is a leading author behind influential models for expertise retrieval. His general research interests lie in the use and development of information retrieval, information extraction, and machine learning techniques for intelligent information access tasks. His current research concerns entity retrieval and formal models, specifically probabilistic models, such as language models.

  • Is This Entity Relevant to Your Needs?
    David Carmel
    IBM Research - Haifa

    Abstract: Relevance is a fundamental concept, though not completely understood, in Information Science as well as Information Retrieval (IR). While there is still no clear consensus on the meaning of this concept, many successful IR models have been developed for ranking search results based on their "relevance likelihood". Most existing approaches for relevance approximation are based on measuring some kind of similarity between the user's query and retrieved items, an approach that was found to be extremely successful for retrieving data relevant to the user's needs. The blurriness of the relevance concept also arises in new emerging IR domains such as searching over Entity Relationship Data (ERD). Search in this domain is driven by the identification, extraction, and exploitation of real-world entities and their relationships, as represented in unstructured or semi-structured textual sources. What makes such entities relevant to the user? Is it the same question that the IR community has been dealing with for many decades? Can we adopt existing IR models into this new domain in a straightforward manner? Is similarity measurement between entities and the user's query enough for identifying relevant items? In this talk I'll provide an overview of some approaches that deal with relevance approximation in several related areas such as question answering, faceted search, and XML search. Then I'll raise some research directions that are related to the fundamental questions mentioned above in the ERD domain. Relevance in this domain can be approximated by measuring the proximity between items in the entity graph, integrated with traditional similarity measurements between items. I'll argue that, in general, using only one of these two fundamental approaches is inferior to the hybrid approach for relevance approximation. I will also argue that for many information needs in the ERD domain, exploratory search is essential, as users should interactively explore the rich and complicated domain for relevant entities, either by restricting the search results to specific facets such as the entity type or other entity attributes, or through graph navigation.

    Bio: Dr. David Carmel is a Research Staff Member in the Information Retrieval group at IBM Haifa Research Lab. David's research is focused on search in the enterprise, query performance prediction, social search, and text mining. For several years David taught the Introduction to IR course at the CS department at Haifa University. At IBM, David is a key contributor to IBM enterprise search offerings. David is a co-founder of the Juru search engine, which provides integrated search capabilities to several IBM products and was used as a search platform for several studies in the TREC conferences. David has published more than 80 papers in IR and Web journals and conferences, and serves on the editorial board of the IR journal and as a senior PC member or an Area Chair of many conferences (SIGIR, WWW, WSDM, CIKM). He organized a number of workshops and taught several tutorials at SIGIR and WWW. David is co-author of the book "Estimating the Query Difficulty for Information Retrieval", published by Morgan & Claypool in 2010, and co-author of the paper "Learning to estimate query difficulty", which won the Best Paper Award at SIGIR 2005. David earned his PhD in Computer Science from the Technion, Israel Institute of Technology in 1997.



  • Haggai Roitman, IBM Research - Haifa, Israel
  • Ralf Schenkel, Saarland University and Max-Planck-Institut Informatik - Saarbrücken, Germany
  • Marko Grobelnik, J. Stefan Institute, Department for Intelligent Systems, Slovenia

Program Committee


  • Krisztian Balog, Norwegian University of Science and Technology, Norway
  • Roi Blanco, Yahoo! Research, Spain
  • David Carmel, IBM Research, Israel
  • Kevin C.C. Chang, University of Illinois at Urbana-Champaign, USA
  • Lise Getoor, University of Maryland, USA
  • Yosi Mass, IBM Research, Israel
  • Einat Minkov, Haifa University, Israel
  • Oren Kurland, Technion, Israel
  • Pavel Serdyukov, Yandex LLC, Russia
  • Kavitha Srinivas, IBM Research, USA
  • Martin Theobald, Max-Planck Inst., Germany
  • Sivan Yogev, IBM Research, Israel
  • Elad Yom-Tov, Yahoo! Research, USA
  • Cong Yu, Google, USA

Accepted Papers


  • Enabling Type/Condition-Specified Entity/Fact Retrieval Using Semantic Knowledge Extracted from Wikipedia, Sofia J. Athenikos
  • "Nut Case: What does It Mean?": Understanding Semantic Relationship between Nouns in Noun Compounds through Paraphrasing and Ranking the Paraphrases, Derry Tanti Wijaya, Philip Gianfortoni
  • Adding Smarter Systems Instead of Human Annotators: Re-ranking for Slot Filling System Combination, Suzanne Tamang
  • Context and Target Configurations for Mining RDF Data, Ziawasch Abedjan, Felix Naumann
  • An Authorization Model for Entity Search, Haggai Roitman, Sivan Yogev
  • Enabling ER Search and Data mining using the Historical Thesaurus of English, Jean Anderson, Marc Alexander, Christian Kay, Muhammad Sarwar

Topics of Interest


The workshop has two main themes. The first is search and discovery over rich entity-relationship data. The second is entity-relationship data mining methods. More specifically, the workshop covers the following topics:

  • ER data collection methods.
  • ER data extraction, cleansing, representation, and processing.
  • ER data resolution and disambiguation.
  • Efficient indexing methods.
  • Query languages and interfaces (keyword-based, semantic, hybrid, visual), query processing and optimization.
  • Ranking methods and top-k queries over ER data.
  • Similarity and proximity search.
  • Context-based retrieval over ER data.
  • Temporal aspects in ER search and data mining.
  • Exploratory search and faceted search over ER data.
  • Personalized search over ER data.
  • ER data mining (e.g., feature extraction, clustering, classification, authority and link analysis, trust, recommendation, etc.).
  • ER data fusion, integration, and lineage.
  • Privacy models for ER search and data mining.
  • Large scale ER search and data mining methods.
  • Search and data mining over incomplete or noisy ER data.
  • Search and data mining over multilingual ER data.
  • Novel applications using ER search and data mining.
  • Evaluation methodologies.
  • Usability methods for ER data exploration.

Submission Guidelines


We invite you to submit both long (6 pages) and short (2 pages) papers in ACM format. Long papers will be presented in a session of talks. All papers (long and short) will also be presented in a poster session, held during lunch (if the local setup allows) or right after it, possibly including demos of systems.

Manuscripts should be formatted using the ACM camera-ready templates (for both MS Word and LaTeX) available at http://www.acm.org/sigs/pubs/proceed/template.html. There are two styles on the website. Both the "Strict Adherence to SIGS" style and the "Tighter Alternate" style are allowed. Papers cannot exceed 6 pages in length for long papers and 2 pages for short papers. Accepted papers will be published in the ACM Digital Library.

Manuscripts should be submitted using the following EasyChair link: https://www.easychair.org/conferences/?conf=smer2011



  • Oct 3, 2011: Workshop final programme was added

  • Aug 4, 2011: Accepted papers and invited talks were added

  • June 29, 2011: Paper submission deadline extended to July 9

  • May 11, 2011: Submission guidelines were added.

  • May 9, 2011: Welcome to the SMER'11 workshop webpage.
    Details on paper submission and the keynote will be added soon.



For any questions, please email Haggai Roitman (firstname@il.ibm.com)