THEME 1

DESIGN AND EXTENSION OF A DOMAIN ONTOLOGY

Units involved in this Theme

University of Modena and Reggio Emilia

University of Trento

University of Bologna

University of Roma Tre


Goals of this Theme

The first goal is the definition of an ontology language able to express, in terms of metadata, the structural and semantic descriptions of a source's contents, compliant with the W3C standards (XML, RDF, RDFS, XML Schema, OWL). In particular, to support the identification of the sources that are useful for answering a query, the language has to support a synthetic characterization of the source contents (instances). The study of emerging standards will focus on managing the evolution of an ontology and on the integrated management of heterogeneous and autonomously developed ontologies. Another issue is the generation of a wrapper to extract data from a source. To address this issue, innovative and scalable techniques for automatically generating wrappers will be developed; in particular, algorithms for inferring the schema of a data-intensive web site will be used to generate a set of wrappers for extracting data from the whole site. For web sites offering unstructured contents, the research will concentrate on the classification of documents in hierarchical representations of concepts (taxonomies) and on the discovery of mappings among taxonomies.

 

 
Working Phases

Phase 1 (6 months: December 1, 2004 - May 31, 2005)

During the first phase, the research activities will be conducted jointly by all the research units. The objective of this phase is to study the solutions proposed in the literature for ontology definition languages (deliverable D1.R1). The study of emerging standards will focus on managing the evolution of an ontology and on the integrated management of heterogeneous and autonomously developed ontologies.

Phase 2 (6 months: June 1, 2005 - November 30, 2005)

During this phase, the research units involved will define the language for the specification of the domain ontology (deliverable D1.R2). The ontology language will be developed from the ODLI3 language, which has been defined in MOMIS. The proposed language will satisfy the following requirements: first, it will be compliant with the W3C standards; second, it will be sufficiently expressive to allow the integrated management of heterogeneous data sources; third, it will be able to represent extensional concepts in order to ease the task of querying relevant sources. Another issue studied during this phase will be the evolution of the domain ontology due to the insertion of a new information source.
The insertion of a new information source raises two main issues. The first issue is the generation of a wrapper to extract data from the source. To address this issue, innovative and scalable techniques for automatically generating wrappers will be developed; in particular, the research unit of Roma Tre will define algorithms for inferring the schema of a data-intensive web site (deliverable D1.R5); the schema will describe the main classes of pages offered by the site, and it will then be used to generate a set of wrappers for extracting data from the whole site. For web sites offering unstructured contents, the research will concentrate on the classification of documents in hierarchical representations of concepts (taxonomies) and on the discovery of mappings among taxonomies.
The second issue to address when adding a new information source is the management of the inconsistencies that the new source can introduce. In particular, a change in the concepts of the ontology can introduce inconsistencies both among related concepts and with respect to the mappings to other sources. The research activities will focus on the design and development of a prototype to integrate a new source by means of a lexicon-based semi-automatic process (deliverable D1.P1).
A further activity will be the development of techniques to produce "content summaries", in order to provide a "profile" of the information sources. Such a profile will focus on statistical properties that characterize the information source for querying purposes (deliverable D1.R3).
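As a rough illustration of what such a profile could contain, the sketch below builds a toy content summary from a handful of sampled documents. It is only a minimal sketch under assumptions: the function name `content_summary`, the choice of statistics (number of documents and per-term document frequency), and the sample texts are all hypothetical and are not taken from the project deliverables.

```python
from collections import Counter
import re


def content_summary(documents):
    """Toy content summary (hypothetical): document count and per-term document frequency."""
    doc_freq = Counter()
    for text in documents:
        # Count each term at most once per document (document frequency).
        terms = set(re.findall(r"[a-z0-9]+", text.lower()))
        doc_freq.update(terms)
    return {"num_docs": len(documents), "doc_freq": dict(doc_freq)}


# Invented sample documents standing in for data extracted from a source.
sample = [
    "Hotel in Modena with parking",
    "Hotel in Trento near the lake",
    "Restaurant in Modena",
]
summary = content_summary(sample)
print(summary["num_docs"], summary["doc_freq"]["hotel"])
```

A query processor could compare the terms of a query against such summaries to decide which sources are worth querying.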
Finally, a critical analysis of techniques for the extraction of lexical chains will be conducted. The aim is to develop semantic methods that improve the effectiveness of traditional keyword-based search engines (deliverable D1.R4).
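To make the notion of a lexical chain concrete, the following sketch groups the words of a text into chains of semantically related terms. It is a simplified greedy illustration, not one of the techniques to be analyzed in the project: the `RELATED` table is a hypothetical hand-made stand-in for a real lexicon, and the function names are invented for the example.

```python
# Hypothetical hand-made relatedness table standing in for a real lexicon.
RELATED = {
    frozenset(("car", "vehicle")),
    frozenset(("vehicle", "truck")),
    frozenset(("bank", "money")),
}


def related(a, b):
    """Two words are related if they are equal or listed as a pair in RELATED."""
    return a == b or frozenset((a, b)) in RELATED


def lexical_chains(words):
    """Greedy chaining: attach each word to the first chain holding a related word."""
    chains = []
    for word in words:
        for chain in chains:
            if any(related(word, member) for member in chain):
                chain.append(word)
                break
        else:
            # No existing chain is related to this word: start a new chain.
            chains.append([word])
    return chains


chains = lexical_chains(["car", "bank", "vehicle", "money", "truck"])
print(chains)
```

Each resulting chain gathers words about one topic, which is the kind of signal a search engine could use beyond plain keyword matching.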

Phase 3 (12 months: December 1, 2005 - November 30, 2006)

The first activity consists of semantically enriching the schema of a data-intensive web site. It will be based on the technique of lexical chains, for which novel linear algorithms will be developed to efficiently represent web documents (deliverable D1.R6).
In the last phase of Theme 1, four software prototypes will be released. The first prototype will implement techniques for probing (querying) the sources and will produce content summaries of the results obtained (deliverable D1.P2); it will take into account the ontological information of the sources and the constraints they entail. The second prototype will build lexical chains extracted from the analysis of a web site (deliverable D1.P3). The third prototype will associate loosely structured web documents with given classification schemes (deliverable D1.P5). The fourth prototype will automatically infer the schema of a data-intensive web site (deliverable D1.P4).


 

 

Mailing List :  wisdom@dbgroup.unimo.it

Webmaster: Enrico Ronchetti