Research Unit: University of Trento

Department of Information and Communication Technology

 

Research Program of the Unit (model B)

Research Program Coordinator of the Unit

Prof. Bouquet Paolo

Department of INFORMATION and COMMUNICATION TECHNOLOGY
Faculty of ECONOMY
University of TRENTO

Via Sommarive, 14 - 38100 Trento, Italy
Tel: +39 0461 882088
Fax: +39 0461 882093

E-mail: bouquet@dit.unitn.it
Home page: http://dit.unitn.it/~bouquet

Participants in this Research Unit
 
Participant          Department                                        Qualification
BOUQUET PAOLO        Dep. of Information and Communication Technology  Researcher
BLANZIERI ENRICO     Dep. of Information and Communication Technology  Researcher
GIUNCHIGLIA FAUSTO   Dep. of Information and Communication Technology  Full Professor
ROSCHELOVA ALBENA    Dep. of Information and Communication Technology  Ph.D. Student
SHVAIKO PAVEL        Dep. of Information and Communication Technology  Ph.D. Student
ZANOBINI STEFANO     Dep. of Information and Communication Technology  Ph.D. Student

Specific Title of the Research Program of this Unit

Languages, models, techniques and tools for discovering, representing and managing semantic mappings across heterogeneous, distributed domain ontologies and schemas.

 
Description of the Research Program of this Unit

One of the key challenges in the development of open distributed systems, including today's World-Wide Web, organizational intranets, and the emerging Semantic Web, is enabling the exchange of meaningful information across applications that may use autonomously developed schemas for organizing locally available data. Typical examples are databases using different schemas and document repositories using different classification structures. Interoperability among these applications depends critically on the ability to discover and use mappings between these heterogeneous schemas. Today, mappings are still largely produced by hand, in a labor-intensive and error-prone process. As a consequence, semantic integration has become a key bottleneck in the deployment of a wide variety of information management applications. The high cost of this bottleneck has motivated numerous research activities on methods for describing mappings, manipulating them, and generating them automatically (or semi-automatically).
The last of these problems, mapping generation, can be formally defined as the problem of generating mappings between elements (or sets of elements) belonging to heterogeneous schemas. A mapping M is a set of triples (m, n, R), called mapping-elements, where m is a node in a schema S, n is a node in a schema S', and R is the relation holding between the two nodes.
The approaches proposed in the literature can be analyzed along three dimensions: the general architecture, the techniques used for generating mappings and the returned results.
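
As an illustration of the definition above, a mapping can be sketched as a set of (m, n, R) triples. This is only a minimal sketch: the relation vocabulary, the class names and the example node paths are assumptions introduced here for illustration and are not fixed by the program.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    # Hypothetical relation vocabulary; the text only speaks of "the relation R".
    EQUIVALENT = "equivalent"
    MORE_GENERAL = "more_general"
    LESS_GENERAL = "less_general"
    DISJOINT = "disjoint"


@dataclass(frozen=True)
class MappingElement:
    """One triple (m, n, R): node m in schema S, node n in schema S'."""
    m: str
    n: str
    relation: Relation  # how m relates to n


# A mapping M is a set of mapping-elements (example nodes are invented).
mapping = {
    MappingElement("S/Images/Europe", "S'/Europe/Pictures", Relation.EQUIVALENT),
    MappingElement("S/Images/Europe/Italy", "S'/Europe/Pictures", Relation.LESS_GENERAL),
}


def targets_of(mapping, m):
    """Return the (n, R) pairs that the mapping associates with node m."""
    return {(e.n, e.relation) for e in mapping if e.m == m}
```

Because the elements are immutable and hashable, M behaves as a mathematical set: duplicate triples collapse and two mappings can be combined with ordinary set operations.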

THEME 1


The main objective of THEME 1 is the study of proposed solutions for the semantic representation of the contents of information resources on the Web, with particular reference to data-intensive resources and resources with poorly structured content.
The representation and integration of such information resources requires devising a language for expressing domain ontologies, with the main goal of using them for query execution and answering (THEME 3).
With respect to THEME 1, the unit at the University of Trento will be involved in the following activities:


PHASE 1
Contribution to the report on the state of the art of languages and emerging standards for the representation of ontologies and classifications (deliverable D1.R1). The unit will focus especially on the representation of concept hierarchies (taxonomies), as they are very common on the Web (see e.g. the Web directories of Google or Yahoo, or the directory structure of web sites). The discovery of mappings across them is a special but very relevant case, which is particularly useful in document sharing and knowledge management applications.

PHASE 2
Contribution to the definition of a language for representing domain ontologies and classifications (deliverable D1.R2).
The unit will focus on the part of the specification which deals with concept hierarchies and semantically annotated taxonomies in general. This work will build on the definition of CTXML, an XML-based language proposed by Bouquet, Magnini, Serafini and Zanobini at the AAAI-02 Workshop on Meaning Negotiation (Edmonton, Canada, July 2002).
Such a language must be compatible with the W3C standards (XML, RDF, RDFS, XML Schema, OWL) and with the query languages defined in THEME 3.

PHASE 3
Design and development of a tool for the automatic population of predefined classifications (deliverable D1.P5). This tool will provide a simple way of associating documents with a predefined classification schema, and will be used to add non-structured web resources to the system by organizing them into homogeneous clusters of documents on the same topic. The techniques used will include natural language processing, text mining and case-based reasoning methods.
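
To make the idea of populating a predefined classification concrete, the following is a deliberately naive sketch that assigns a document to the category whose keyword set it overlaps most. It is not the tool foreseen by deliverable D1.P5: the keyword-overlap scoring, the function names and the example categories are all assumptions made for illustration, standing in for the natural language processing, text mining and case-based reasoning techniques mentioned above.

```python
import re


def tokenize(text):
    """Lower-case the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def classify(document, categories):
    """Return the category sharing the most keywords with the document.

    Returns None when no category shares any keyword, so that the
    document can be left for manual classification.
    """
    tokens = tokenize(document)
    best, best_score = None, 0
    for name, keywords in categories.items():
        score = len(tokens & keywords)
        if score > best_score:
            best, best_score = name, score
    return best


# Hypothetical predefined classification: category name -> keywords.
categories = {
    "Sports": {"football", "match", "team"},
    "Finance": {"bank", "market", "stock"},
}
```

Documents that receive the same category form the homogeneous clusters mentioned above; returning None rather than forcing a choice keeps uncertain cases visible to a human editor.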

THEME 2


The main aim of THEME 2 is the design and development of techniques for the (semi-)automatic generation of mappings holding between domain ontologies.

PHASE 1
Report on the state of the art of languages and techniques for mapping domain ontologies (deliverable D2.R1). The objective of this activity is twofold:
* the definition of a common framework for domain ontology mapping, including a common definition of what a mapping should include;
* the analysis and comparison of state-of-the-art techniques for mapping ontologies, including an assessment of the contribution that each technique may provide to the discovery of the kind of mappings defined in the previous item.

PHASE 2
In the second phase, new techniques for creating mappings across domain ontologies will be elaborated (deliverable D2.R2). In particular, this will lead to:
a. the definition of a language for representing complex mappings between heterogeneous ontologies and classifications. Such a language will be used to represent the kind of mapping defined in the common framework (previous item);
b. the specification of an algorithm for the (semi-)automatic generation of complex mappings between heterogeneous domain ontologies. The algorithm will be based on the CTXMATCH algorithm elaborated at the University of Trento.
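
The kind of output such an algorithm produces can be illustrated with a toy matcher. The sketch below is not CTXMATCH and makes no claim about how CTXMATCH works: it simply compares the sets of labels found along each node's path and proposes candidate triples for a human to review, which is one crude way to read "semi-automatic". The function names, the relation names and the path-label heuristic are assumptions.

```python
def path_labels(path):
    """Lower-cased set of labels along a '/'-separated path from the root."""
    return {label.lower() for label in path.split("/") if label}


def candidate_mappings(schema_s, schema_t):
    """Propose (m, n, relation) candidates from path-label set comparison.

    A node whose path carries more labels is read as more specific, so a
    strict superset of labels yields "less_general" (m is narrower than n).
    Pairs whose label sets are not comparable are skipped.
    """
    candidates = []
    for m in schema_s:
        for n in schema_t:
            a, b = path_labels(m), path_labels(n)
            if a == b:
                relation = "equivalent"
            elif a > b:
                relation = "less_general"
            elif a < b:
                relation = "more_general"
            else:
                continue
            candidates.append((m, n, relation))
    return candidates
```

For example, comparing the paths Images/Europe and Images/Europe/Italy against Europe/Images proposes an equivalence for the first and a "less general" relation for the second. Purely syntactic comparison of this kind misses synonyms and ambiguous labels, which is precisely the gap that semantic techniques are meant to close.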

PHASE 3
Development of a platform for discovering and managing semantic mappings across domain ontologies (deliverable D2.P1). The platform can be viewed as a service which can be invoked to generate mappings across domain ontologies: a highly modular and domain-independent system, in which single components can be plugged in, unplugged or suitably customized.
Depending on the architectural choices made in the project, the platform may be used as a shared service at a global level, or at a local level in a peer-to-peer fashion.
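
The modularity requirement can be pictured with a small sketch of a service whose matching components are registered and removed at run time. The class, method and component names below are hypothetical and do not describe the actual platform design, which is left to the project.

```python
class MappingService:
    """Hypothetical mapping service with pluggable matcher components."""

    def __init__(self):
        self._matchers = {}

    def plug(self, name, matcher):
        """Register a component: a callable (schema_s, schema_t) -> set of triples."""
        self._matchers[name] = matcher

    def unplug(self, name):
        """Remove a component; unknown names are ignored."""
        self._matchers.pop(name, None)

    def generate(self, schema_s, schema_t):
        """Return the union of the triples proposed by all plugged components."""
        mapping = set()
        for matcher in self._matchers.values():
            mapping.update(matcher(schema_s, schema_t))
        return mapping


def exact_label_matcher(schema_s, schema_t):
    """Toy component: labels equal up to case are proposed as equivalent."""
    return {
        (m, n, "equivalent")
        for m in schema_s
        for n in schema_t
        if m.lower() == n.lower()
    }


service = MappingService()
service.plug("exact", exact_label_matcher)
```

The same object could be exposed as a single shared service or instantiated by each peer, which is the architectural choice the text leaves open.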

THEME 3


The main objective of this theme is the development of techniques for query elaboration based on the use of domain ontologies (THEME 1) and of mappings across them (THEME 2).

PHASE 1
Contribution to the analysis of the query languages and of query rewriting techniques based on ontologies and classifications (deliverable D3.R1).

PHASE 2
Contribution to the definition of a query language and of some rewriting techniques based on ontologies (deliverable D3.R3).
In particular, the contribution of our unit will be the definition of a notion of 'semantic distance' between a concept used in a query based on an ontology T1 and other concepts (belonging to different ontologies) to which the original concept is linked through semantic mappings. A threshold on this distance will be one of the criteria used to define the notion of a "good" answer to a query, especially when the execution of such a query requires the use of mappings across different domain ontologies.
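
How a semantic distance could act as a threshold can be shown with a small sketch. The actual notion of distance is a result the project still has to define, so everything here is an assumption: the numeric distances attached to each relation, the relation names and the example concepts are invented purely to show the filtering step.

```python
# Hypothetical distance for each mapping relation (how m relates to n).
RELATION_DISTANCE = {
    "equivalent": 0.0,
    "more_general": 0.5,  # the target concept is narrower than the query concept
    "less_general": 1.0,  # the target concept is broader than the query concept
}


def good_targets(concept, mapping, max_distance):
    """Return, in sorted order, the mapped concepts within max_distance.

    mapping is an iterable of (m, n, relation) triples; only triples whose
    source m is the query concept are considered.
    """
    return sorted(
        n
        for (m, n, relation) in mapping
        if m == concept and RELATION_DISTANCE[relation] <= max_distance
    )


# Invented example: a concept of ontology T1 mapped into three other ontologies.
mapping = [
    ("T1/Cars", "T2/Automobiles", "equivalent"),
    ("T1/Cars", "T3/Vehicles", "less_general"),
    ("T1/Cars", "T4/SportsCars", "more_general"),
]
```

With a threshold of 0.5 the query on T1/Cars is rewritten over T2/Automobiles and T4/SportsCars only, while the broader T3/Vehicles is excluded; lowering the threshold to 0 keeps only the equivalent concept.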