UofT DBGroup  Flavio Rizzolo, Ph.D.

  Active Projects
  • DDI: The Data Documentation Initiative (DDI) is an effort to create an international standard for describing data from the social, behavioral, and economic sciences. Expressed in XML, the DDI metadata specification now supports the entire research data life cycle. DDI metadata accompanies and enables data conceptualization, collection, processing, distribution, discovery, analysis, repurposing, and archiving. DDI 4.0 will have an abstract data model in UML (Unified Modelling Language) together with different implementations in XML Schema, RDF/OWL ontology, relational database schema and other languages. This next-generation DDI, developed within the DDI Moving Forward project, will be easier to use and better able to quickly adapt to changing future needs.
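To give a flavour of XML-expressed study metadata, the sketch below builds a small, simplified variable description. The element names are illustrative only, not the official DDI schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified DDI-style metadata for one study variable;
# element and attribute names are illustrative, not the official DDI schema.
variable = ET.Element("Variable", id="v1")
ET.SubElement(variable, "Name").text = "age"
ET.SubElement(variable, "Label").text = "Age of respondent in years"
rep = ET.SubElement(variable, "NumericRepresentation")
rep.set("min", "0")
rep.set("max", "120")

xml_text = ET.tostring(variable, encoding="unicode")
print(xml_text)
```

In the real specification such variable descriptions travel with the data through collection, processing, and archiving, which is what makes the metadata reusable across the life cycle.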

  • CSPA: The Common Statistical Production Architecture (CSPA) provides a reference architecture for official statistics. It complements and uses pre-existing frameworks (like GSBPM and GSIM) by describing the mechanisms to design, build and share components with well-defined functionality that can easily be integrated into multiple processes. CSPA focuses on relating the strategic directions of the High-Level Group for the Modernisation of Statistical Production and Services (HLG) to shared principles, practices and guidelines for defining, developing and deploying Statistical Services in order to produce statistics more efficiently. CSPA will facilitate the sharing and reuse of Statistical Services both across and within statistical organizations. It also provides a starting point for concerted development of statistical infrastructure and shared investment across statistical organizations. CSPA is sometimes referred to as a "plug and play" architecture: the idea is that replacing Statistical Services should be as easy as pulling out a component and plugging another one in. CSPA is a project of the HLG.

  • GSIM: The Generic Statistical Information Model (GSIM) provides a common language to describe information that supports the whole statistical production process, from the identification of user needs through to the dissemination of statistical products. It is designed to bring together statisticians, methodologists and IT specialists to modernize and streamline the production of official statistics. GSIM is a reference framework of internationally agreed definitions, attributes and relationships that describe the pieces of information (called “information objects” in GSIM) that are used in the production of official statistics. By defining objects common to all statistical production, regardless of subject matter, GSIM enables statistical organizations to rethink how their business could be more efficiently organized. GSIM is a project of the High-Level Group for the Modernisation of Statistical Production and Services (HLG).

  Past Projects
  • Contextual data quality for business intelligence: Business analytics applications require data quality assessment at high levels of abstraction, where subjectivity, usefulness, sense and interpretation play a central role. From this perspective, the meaning and quality of the data are context dependent. In our framework, the context is given as a system of integrated data and metadata, of which the data source under quality assessment is a particular, distinguished component. In addition to the data under assessment, the context can have an expanded schema, additional data, or even be virtually defined as a system of integrated views. Clean answers to queries posed to the data under assessment are relative to what is available in the context. More details on our contextual framework can be found here. This project is part of the Business Intelligence Network.
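The idea of answers being clean only relative to a context can be illustrated with a toy sketch (the names and the filtering rule below are invented for illustration; the actual framework integrates data, metadata and views far more generally):

```python
# Source under assessment: raw temperature readings.
readings = [
    {"sensor": "s1", "temp": 36.5},
    {"sensor": "s2", "temp": 41.2},
    {"sensor": "s3", "temp": 36.9},
]

# Context: metadata not present in the source itself — here, a set of
# certified sensors and the plausible range for body temperature.
context = {"certified": {"s1", "s3"}, "range": (35.0, 40.0)}

def clean_answers(rows, ctx):
    """Answers to 'all readings', relative to what the context sanctions."""
    lo, hi = ctx["range"]
    return [r for r in rows
            if r["sensor"] in ctx["certified"] and lo <= r["temp"] <= hi]

print(clean_answers(readings, context))  # s2 is excluded: not certified
```

With a different context (say, a larger certified set or wider range), the same source yields different clean answers, which is precisely the context dependence of quality.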

  • User-centric, model-driven data integration: Data warehouses were envisioned to facilitate reporting and analysis by providing a model for the flow of data from operational systems to decision support environments. Typically, there is an impedance mismatch between the conceptual, high-level view of business intelligence users (and tools) accessing the data warehouse and the physical representation of the multidimensional data, often stored in DBMSs. To bridge these two levels of abstraction we developed the Conceptual Integration Modeling (CIM) Framework. The CIM tool takes the user's high-level visual specification and compiles it into a complex set of mappings and views to be used at runtime by business analytics applications, as described here. The CIM models and system architecture are described in CoRR (arXiv:1009.0255). CIM can also be integrated with a business layer by providing data to business processes and key performance indicators via mappings, as described here. This project is part of the Business Intelligence Network.
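A minimal sketch of what "compiling a high-level specification into views" can look like, assuming a hypothetical conceptual spec (the table and column names, and the spec format, are invented for illustration and are not the actual CIM tool's API):

```python
# Hypothetical conceptual spec: a fact, one measure, one dimension level.
spec = {
    "fact": "sales",
    "measure": "amount",
    "dimension": {"table": "store", "key": "store_id", "level": "region"},
}

def compile_view(s):
    """Compile the conceptual spec into a relational view for the BI layer."""
    d = s["dimension"]
    return (
        f"CREATE VIEW {s['fact']}_by_{d['level']} AS\n"
        f"SELECT d.{d['level']}, SUM(f.{s['measure']}) AS total_{s['measure']}\n"
        f"FROM {s['fact']} f JOIN {d['table']} d ON f.{d['key']} = d.{d['key']}\n"
        f"GROUP BY d.{d['level']};"
    )

print(compile_view(spec))
```

The point of the model-driven approach is that the user edits only the conceptual spec; the mappings and views that bridge to the physical multidimensional store are generated, not hand-written.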

  • Papyrus: A multinational European project for building a cross-discipline digital library engine that draws content from one domain and makes it available to a community of users who belong to a totally different discipline. Some ontology management research issues in this context include modeling concept evolution and semantic updates, and support for dynamic attributes (attributes for which domains and ranges are specified declaratively). 

  • DescribeX: A Framework for Exploring and Querying XML Web Collections. My PhD thesis introduced a framework that supports constructing heterogeneous XML synopses that can be declaratively defined and manipulated by means of regular expressions on XPath axes. The tool implementing this framework is tailored to data-intensive applications in information integration, XQuery/XPath evaluation, XML retrieval and Web services. The thesis can be downloaded from CoRR (arXiv:0807.2972) and the University of Toronto Libraries TSpace website.
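One of the simplest synopses in this family is an incoming-path summary, which groups element nodes by their root-to-node label path. The sketch below is a toy version of that idea (DescribeX itself supports heterogeneous summaries defined by regular expressions over arbitrary XPath axes, not just the ancestor path):

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

doc = ET.fromstring(
    "<lib><book><title/><author/></book><book><title/></book></lib>"
)

def path_summary(root):
    """Group element nodes into extents keyed by their root-to-node label path."""
    extents = defaultdict(list)
    def walk(node, path):
        p = path + "/" + node.tag
        extents[p].append(node)
        for child in node:
            walk(child, p)
    walk(root, "")
    return extents

summary = path_summary(doc)
print({p: len(nodes) for p, nodes in sorted(summary.items())})
# e.g. the extent of /lib/book has size 2
```

Such a summary acts as a structural index: a path query can be answered from the extents directly, without traversing the whole document.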

  • Temporal XML: A proposal for modeling and querying temporal data in XML. Our implementation validates temporal XML documents against the temporal constraints imposed by the model and summarizes metadata by adding the time dimension to structural path summaries.
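A basic example of the kind of constraint such validation enforces is temporal containment: an element's validity interval must lie within its parent's. The sketch below checks only that one constraint, with invented attribute names (the actual model is richer than this):

```python
import xml.etree.ElementTree as ET

# Each element carries an illustrative [start, end] validity interval.
doc = ET.fromstring(
    '<company start="2000" end="2010">'
    '<dept start="2001" end="2009">'
    '<employee start="2003" end="2008"/>'
    '</dept>'
    '</company>'
)

def interval(e):
    return int(e.get("start")), int(e.get("end"))

def valid(parent):
    """True iff every descendant's interval is contained in its parent's."""
    ps, pe = interval(parent)
    for child in parent:
        cs, ce = interval(child)
        if cs < ps or ce > pe or not valid(child):
            return False
    return True

print(valid(doc))  # True for this document
```

A document where, say, a department's interval started before its company's would fail the check, which is the kind of inconsistency the validator is meant to catch.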

  • ToX (the Toronto XML Server): A repository of XML data and metadata that provides the key functions in document management, including registering documents, indexing document structure (with ToXin), defining logical views of distributed data sources, and querying document content and structure. My master's thesis introducing ToXin can be downloaded from here and the University of Toronto Libraries TSpace website.

Last Updated December 2014