Using data documentation standards - DDI and NADA

Geoffrey Greenwell, OECD

The Data Documentation Initiative (DDI) is a metadata specification for the social and behavioural sciences and an international standard for describing data collections that result from social science research. The DDI has two development lines: the DDI-codebook and the DDI-Lifecycle. The DDI-codebook is the original schema; it remains supported, has been continually extended since Version 1.0 was released in 2000, and currently stands at version 2.5. This is the schema used by the International Household Survey Network (IHSN), which is specifically interested in the archiving and dissemination of household survey microdata, and it is implemented in the software tools used by the IHSN. The “input” tool is the NESSTAR Metadata Editor, an XML editor developed by NESSTAR. It is a free, standalone desktop application that data archivists and curators use to document survey data in the DDI format. The “output” tool is the National Data Archive (NADA), open-source software developed by the IHSN. The preservation of the DDI-codebook has been part of an international effort, led by the World Bank, to retain the DDI as a core standard used by national statistical offices worldwide. The tools have been disseminated in over 70 countries, have been effective in promoting good practices in data documentation and dissemination, and have contributed to the Data Revolution. National Data Catalogs are available online with searchable metadata and inform researchers and policy makers of the survey data that are available. Over 4,000 surveys have been documented using the DDI since the establishment of the IHSN in 2006.
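
To give a sense of what a DDI-codebook document looks like to software, the following minimal sketch reads a small, hand-written XML fragment with Python's standard library. The fragment is heavily simplified and the survey, identifier and variables are invented for illustration; files exported from real documentation tools contain many more elements. The helper name summarize is likewise an assumption, not part of any IHSN tool.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written fragment for illustration only.
# The survey title, ID and variables below are invented.
DDI_SAMPLE = """<codeBook xmlns="ddi:codebook:2_5">
  <stdyDscr>
    <citation>
      <titlStmt>
        <titl>Example Household Survey 2010</titl>
        <IDNo>XXX-EHS-2010-v01</IDNo>
      </titlStmt>
    </citation>
  </stdyDscr>
  <dataDscr>
    <var name="hhsize"><labl>Household size</labl></var>
    <var name="region"><labl>Region of residence</labl></var>
  </dataDscr>
</codeBook>"""

# Namespace prefix mapping used in the element paths below.
NS = {"ddi": "ddi:codebook:2_5"}
TITLE_STMT = "ddi:stdyDscr/ddi:citation/ddi:titlStmt"


def summarize(ddi_xml: str) -> dict:
    """Return the study title, study ID and variable labels (hypothetical helper)."""
    root = ET.fromstring(ddi_xml)
    variables = {
        var.get("name"): var.findtext("ddi:labl", namespaces=NS)
        for var in root.findall("ddi:dataDscr/ddi:var", NS)
    }
    return {
        "title": root.findtext(f"{TITLE_STMT}/ddi:titl", namespaces=NS),
        "id": root.findtext(f"{TITLE_STMT}/ddi:IDNo", namespaces=NS),
        "variables": variables,
    }


summary = summarize(DDI_SAMPLE)
print(summary["title"], summary["id"], sorted(summary["variables"]))
```

Because the study-level and variable-level descriptions live in one structured file, a catalog can index both without access to the underlying microdata.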

This presentation provides an initial exposure to the DDI-codebook, to the documentation tool developed and supported by NESSTAR, and to the output tool known as the National Data Archive. It examines the possibility of using the National Data Archive as the preferred tool for federating the documentation effort across the independent data producers of a national statistical system. Used in this fashion, the tools would leverage the extensibility of the DDI and provide a common platform that informs researchers and survey planners of the survey data held by different producers. Beyond survey data, the same standard can be used across producers to document available administrative data. The architecture of the National Data Archive lets each producer manage its own collections and retain control over access to its data, while sharing the common metadata in one central, searchable survey catalog.
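
The federated arrangement described above can be pictured with a small conceptual sketch. It is not NADA's actual data model: the class names StudyRecord, Collection and CentralCatalog, and the example producers and studies, are all assumptions made for illustration. Each producer owns and edits its own collection, while a single search runs across the shared metadata.

```python
from dataclasses import dataclass, field


@dataclass
class StudyRecord:
    """Minimal study-level metadata (hypothetical structure)."""
    study_id: str
    title: str
    producer: str


@dataclass
class Collection:
    """One producer's independently managed set of studies."""
    producer: str
    studies: list = field(default_factory=list)

    def add(self, study_id: str, title: str) -> None:
        # Records are always stamped with the owning producer.
        self.studies.append(StudyRecord(study_id, title, self.producer))


class CentralCatalog:
    """Shared catalog: holds references to collections, not copies of data."""

    def __init__(self) -> None:
        self.collections = []

    def register(self, collection: Collection) -> None:
        self.collections.append(collection)

    def search(self, keyword: str) -> list:
        # Case-insensitive title search across every registered collection.
        kw = keyword.lower()
        return [
            study
            for collection in self.collections
            for study in collection.studies
            if kw in study.title.lower()
        ]


# Invented producers and studies, for illustration only.
stats_office = Collection("National Statistical Office")
stats_office.add("NSO-LFS-2012", "Labour Force Survey 2012")
health_ministry = Collection("Ministry of Health")
health_ministry.add("MOH-HFS-2011", "Health Facility Survey 2011")

catalog = CentralCatalog()
catalog.register(stats_office)
catalog.register(health_ministry)
for hit in catalog.search("survey"):
    print(hit.producer, "|", hit.title)
```

The design choice to highlight is the separation of ownership from discovery: producers keep control of their collections, and the central catalog only needs their metadata.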

The National Data Archive also supports the dissemination of survey microdata at different levels of access. The appropriate level is defined within the software and can be managed by each data producer. The dissemination regimes are public use files, licensed data files, and data preserved in an enclave. The presentation will review work undertaken in Latin America (namely Colombia and its federated system of data dissemination) and in Portuguese-speaking countries where the IHSN tools have been widely disseminated through the Accelerated Data Program (ADP), including Mozambique, Angola and Cape Verde.
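
The three dissemination regimes can be sketched as a simple access policy. This is an illustrative model only, not NADA's implementation: the names AccessLevel and can_download, the study identifiers, and the specific rules (public use files freely downloadable, licensed files requiring an approved licence, enclave data never leaving the producer's premises) are assumptions chosen to make the distinction concrete.

```python
from enum import Enum


class AccessLevel(Enum):
    """The three dissemination regimes named in the text."""
    PUBLIC_USE = "public use file"
    LICENSED = "licensed data file"
    ENCLAVE = "data enclave"


def can_download(level: AccessLevel, licence_approved: bool = False) -> bool:
    """Decide whether a user may download the microdata (assumed rules)."""
    if level is AccessLevel.PUBLIC_USE:
        return True
    if level is AccessLevel.LICENSED:
        # Assumption: the producer has reviewed and approved a licence request.
        return licence_approved
    # Assumption: enclave data are only analysed on site, never downloaded.
    return False


# Each producer sets the level for its own studies (invented identifiers).
producer_policy = {
    "NSO-LFS-2012": AccessLevel.PUBLIC_USE,
    "NSO-HBS-2010": AccessLevel.LICENSED,
    "NSO-CEN-2007": AccessLevel.ENCLAVE,
}

for study_id, level in producer_policy.items():
    print(study_id, level.value, can_download(level, licence_approved=True))
```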

This presentation will also demonstrate an additional feature of the NADA software, the Citation Manager, which links research publications to specific surveys and so supports easy online literature reviews.