One of the key challenges in
the development of open distributed systems, including today's
World-Wide Web, organizational intranets, and the emerging Semantic
Web, is enabling the exchange of meaningful information across
applications which may use autonomously developed schemas for
organizing locally available data. Typical examples are databases
using different schemas, and document repositories using different
classification structures. Interoperability among these applications
depends critically on the ability to discover and use
mappings between these heterogeneous schemas. Today, mappings
are still largely done by hand, in a labor-intensive and error-prone
process. As a consequence, semantic integration issues have
now become a key bottleneck in the deployment of a wide variety
of management applications. The high cost of this bottleneck has
motivated numerous research activities on methods for describing
mappings, manipulating them, and generating them automatically.
The last of these problems can be formally
defined as that of generating mappings between elements
(or sets of elements) belonging to heterogeneous schemas. A mapping
M is a set of triples (m,n,R) (called mapping-elements), where
m is a node in a schema S, n is a node in a schema S' and R
is the relation holding between the two nodes.
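As an illustration only, the definition above can be written down as a small data structure. The following Python sketch assumes a hypothetical vocabulary of relations (the text does not fix which relations R may denote), and the node paths are invented examples.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    # Hypothetical relation vocabulary; the definition above leaves R open.
    EQUIVALENT = "equivalent"
    MORE_GENERAL = "more general"
    LESS_GENERAL = "less general"
    DISJOINT = "disjoint"


@dataclass(frozen=True)
class MappingElement:
    """One triple (m, n, R): node m of schema S, node n of schema S', relation R."""
    m: str
    n: str
    relation: Relation


# A mapping M is a set of mapping elements (node paths are invented).
mapping = {
    MappingElement("S/Images/Europe", "S2/Pictures/Europe", Relation.EQUIVALENT),
    MappingElement("S/Images/Europe/Italy", "S2/Pictures/Europe", Relation.LESS_GENERAL),
}


def relations_for(mapping, m):
    """Return every (n, R) pair that the mapping records for source node m."""
    return {(e.n, e.relation) for e in mapping if e.m == m}
```

Making the triple immutable lets it be stored in a set, which mirrors the definition of M as a set of mapping elements.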
The approaches proposed in the literature can be analyzed along
three dimensions: the general architecture, the techniques used
for generating mappings and the returned results.
The main objective of THEME 1 is the study of proposed solutions
for semantic representation of the contents of information resources
on the Web, particularly referring to data-intensive resources
and resources with scarcely structured contents.
The representation and integration of such information resources
requires devising a language for expressing domain ontologies,
with the main goal of using them for query execution and answering.
With respect to THEME 1, the unit at the University of Trento
will be involved in the following activities:
Contribution to the report on the state of the art of languages
and emerging standards for representation of ontologies and
classifications (deliverable D1.R1). The unit will focus
especially on the representation of concept hierarchies (taxonomies),
as they are very common on the Web (see, e.g., the Web directories
of Google or Yahoo, or the directory structure of web sites),
and the discovery of mappings across them is a special (but
very relevant) case, which is particularly useful in document sharing
Contribution to the definition of a language for representing
domain ontologies and classifications (deliverable D1.R2).
The unit will focus on the part of the specification which deals
with concept hierarchies and semantically annotated taxonomies
in general. This work will build on the definition of CTXML,
an XML-based language proposed by Bouquet, Magnini, Serafini, and
Zanobini at the AAAI-02 Workshop on Meaning Negotiation (Edmonton,
Canada, July 2002).
Such a language must be compatible with the W3C standards (XML,
RDF, RDFS, XML Schema, OWL) and with the query languages defined
in THEME 3.
Design and development of a tool for the automatic population
of predefined classifications (deliverable D1.P5). This tool
will provide a simple way of associating documents with a predefined
classification schema, and will be used to add non-structured
web resources to the system by organizing them into homogeneous
clusters of documents on the same topic. The techniques used will
include natural language processing, text mining and case-based
The main aim of THEME 2 is the design and development of
techniques for (semi-) automatic generation of mappings holding
between domain ontologies.
Report on the state of the art of languages and techniques for
mapping domain ontologies (deliverable D2.R1). The objective
of this activity is twofold:
* the definition of a common framework for domain ontology mapping,
including a common definition of what a mapping should include;
* the analysis and comparison of state-of-the-art techniques
for mapping ontologies, including an assessment of the contribution
that each technique may provide to the discovery of the kind
of mappings defined in the previous item.
In the second phase, new techniques for creating mappings across
domain ontologies will be elaborated (deliverable D2.R2). In
particular, this will lead to:
a. the definition of a language for representing complex mappings
between heterogeneous ontologies and classifications. Such a
language will be used to represent the kind of mapping defined
in the common framework (previous item);
b. the specification of an algorithm for allowing the (semi-)
automatic generation of complex mappings between heterogeneous
domain ontologies. The algorithm will be based on the CTXMATCH
algorithm elaborated at the University of Trento.
Development of a platform for discovering and managing semantic
mappings across domain ontologies (deliverable D2.P1). The platform
can be viewed as a service which can be invoked to generate
mappings across domain ontologies, namely a highly modular and
domain-independent system, where single components can be plugged in,
unplugged or suitably customized.
Depending on the architectural choices made in the project,
the platform may be used as a shared service at a global level,
or used at a local level in a peer-to-peer fashion.
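To make the idea of a modular service more concrete, the following Python sketch shows one possible way in which matching components could be plugged and unplugged. It is only an assumption about the design: the class name, the matcher signature and the toy label matcher are hypothetical and are not part of the platform specification.

```python
from typing import Callable, Dict, List, Tuple

# Assumed signature: a matcher takes two lists of node labels and
# returns candidate (source, target, relation) triples.
Triple = Tuple[str, str, str]
Matcher = Callable[[List[str], List[str]], List[Triple]]


class MappingService:
    """Hypothetical registry in which matching components can be plugged and unplugged."""

    def __init__(self) -> None:
        self._matchers: Dict[str, Matcher] = {}

    def plug(self, name: str, matcher: Matcher) -> None:
        self._matchers[name] = matcher

    def unplug(self, name: str) -> None:
        self._matchers.pop(name, None)

    def generate(self, source: List[str], target: List[str]) -> List[Triple]:
        """Run every plugged matcher and merge the results without duplicates."""
        found: List[Triple] = []
        for matcher in self._matchers.values():
            for triple in matcher(source, target):
                if triple not in found:
                    found.append(triple)
        return found


def exact_label_matcher(source: List[str], target: List[str]) -> List[Triple]:
    """Toy component: labels that are equal ignoring case are treated as equivalent."""
    return [(s, t, "equivalent") for s in source for t in target if s.lower() == t.lower()]
```

The same object could be exposed as a shared global service or instantiated by each peer, since it keeps no state beyond the registered components.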
The main objective of this theme is the development of techniques
for query elaboration based on the use of domain ontologies
(THEME 1) and of mappings across them (THEME 2).
Contribution to the analysis of the query languages and of query
rewriting techniques based on ontologies and classifications
Contribution to the definition of a query language and of some
rewriting techniques based on ontologies (deliverable D3.R3).
In particular, the contribution of our unit will be the
definition of a notion of 'semantic distance' between a concept
used in a query based on an ontology T1 and other concepts (belonging
to different ontologies) onto which the original concept is
semantically mapped. This distance will be one of the thresholds
used to define the notion of a "good" answer to a
query, especially when the execution of such a query requires
the usage of mappings across different domain ontologies.
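The role of the distance as a threshold can be sketched as follows. The example assumes that each mapped concept already carries a numeric distance from the query concept; how that distance is computed is exactly what remains to be defined, so the values and concept names below are invented.

```python
def good_targets(mapped_concepts, max_distance):
    """Keep mapped concepts whose distance from the query concept is within the threshold.

    mapped_concepts: dict from target concept name to an assumed numeric distance.
    Returns (concept, distance) pairs sorted from closest to farthest.
    """
    kept = [(c, d) for c, d in mapped_concepts.items() if d <= max_distance]
    return sorted(kept, key=lambda pair: pair[1])


# Invented concepts from ontologies other than T1, with invented distances.
candidates = {"T2:Photographs": 0.1, "T3:Media": 0.6, "T4:Art": 0.9}
```

A query rewritten over other ontologies would then be answered only through the concepts returned by such a filter.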