The project

Safe stewardship, preservation and unified access to heterogeneous scientific datasets in the environmental domain.

This project aims to provide the scientific community with an integrated, cost-efficient and sustainable infrastructure that follows established standards for data documentation, archiving, search and exchange. The main emphasis is on coordinating efforts and requirements, enhancing interoperability and user interfaces to existing facilities, and developing toolboxes for data documentation in order to reduce technical and governance obstacles. The infrastructure must provide online access to data and facilitate long-term preservation, in order to maximise the benefit of the public funds invested in the datasets. The intention is to link existing institutional and discipline-specific systems to promote science regardless of geographical and institutional location. The proposed approach is to further develop the idea of the DOKIPY system: to include more catalogues at each institution, to augment it with user support tools and online data access at all nodes, and eventually to establish a unified catalogue for multidisciplinary climate and environmental scientific data. The proposal is intentionally not discipline-specific. However, to demonstrate the benefit of synchronised catalogues across institutions and disciplines, a stepwise approach is suggested, in which potential integration with other disciplines (e.g. human sciences and life sciences) is delayed until an acceptable level of interoperability has been achieved for climate and environmental data.

The following objectives have been identified:

  1. Development of a sustainable governance structure
    1. Promote and support the interdisciplinary ADC Polar Metadata Profile universally – also outside of the Polar Regions – to link scientists across disciplines and nations.
    2. Promote international accreditation of the participating data centres.
    3. Assess funding models to ensure a sustainable interdisciplinary scientific infrastructure.
    4. Coordinate requests for NorStore functionality from the climate and environmental data community.
  2. Implementation and maintenance of distributed heterogeneous data management
    1. Maintain the data management infrastructure developed during the IPY to support projects and scientists, and renew it in line with international data management efforts, emphasising the semantic brokering required to promote interdisciplinary science.
    2. Extend the data management infrastructure from the IPY to more catalogues hosted by the participating institutions in order to establish a unified catalogue.
    3. Maintain, improve and further standardise dataset documentation to help data filtering and long-term stewardship of heterogeneous datasets.
    4. Develop tools that simplify data documentation and submission for data providers, with emphasis on generating discovery and use metadata according to international standards.
    5. Standardise the internal interfaces between data centres in order to support a cost-effective, unified view of process-oriented data.
    6. Provide direct, online access to datasets, in accordance with the usage or distribution constraints described in the metadata of each dataset.
    7. Provide users with best practices, guides and tools for proper use of data and information infrastructures.
  3. Long-term preservation of datasets
    1. Provide permanent, unified access to preserved datasets. (Actual data storage must be handled by existing and emerging systems external to this project, at the host institutions and elsewhere.)
    2. Develop data preservation guidelines for data centres and data producers (scientists), including the use of unique, persistent and citable identifiers.
    3. Identify potential higher order services that are useful and needed by scientists to minimise technical obstacles encountered in scientific work.
    4. Develop and implement higher order services.

Figure: Outline of NorDataNet integrations with data repositories. Not all are fully integrated.

The Norwegian Scientific Data Network (NorDataNet) is a distributed infrastructure where existing systems are linked using index metadata to constitute a virtual infrastructure. The initial outline is shown in the accompanying figure. This is neither a new infrastructure nor a hardware proposal. It is an adaptation of existing systems intended to provide the end user with seamless search and retrieval of data relevant to their scientific work. Most of the tasks can be undertaken with very modest investments in hardware. The most costly hardware element is the long-term preservation of datasets, which is already largely covered by funding allocated to the NorStore archive for “homeless datasets”.

This project is intended to link infrastructure projects like NorStore, NORMAP and NMDC with existing mandated archives at the participating institutions using internationally accepted standards. Further intentions are to create two-way linkages with corresponding international facilities and to establish a unified interface that allows scientists to search a number of catalogues simultaneously without having to access each individual portal. This approach will provide global visibility to Norwegian datasets and scientists.
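
The idea of linking catalogues through index metadata and searching them through one interface can be illustrated with a minimal sketch. This is not the NorDataNet implementation: the record fields, the catalogue names (`catalogue_a`, `catalogue_b`), the identifiers and the functions `build_unified_index` and `search` are all hypothetical, chosen only to show how a small set of discovery-level records from several sources might be merged and queried in one place.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class IndexRecord:
    """Minimal discovery-level metadata for one dataset (hypothetical fields)."""
    identifier: str            # persistent identifier of the dataset
    title: str
    keywords: Tuple[str, ...]
    source: str                # catalogue that holds the full record


def build_unified_index(catalogues: Dict[str, List[IndexRecord]]) -> Dict[str, IndexRecord]:
    """Merge index records from several catalogues, keyed by identifier.

    If the same identifier appears in more than one catalogue, the first
    occurrence is kept, so each dataset is listed once in the unified view.
    """
    unified: Dict[str, IndexRecord] = {}
    for records in catalogues.values():
        for record in records:
            unified.setdefault(record.identifier, record)
    return unified


def search(unified: Dict[str, IndexRecord], term: str) -> List[IndexRecord]:
    """Return records whose title or keywords contain the term (case-insensitive)."""
    needle = term.lower()
    hits = [
        r for r in unified.values()
        if needle in r.title.lower() or any(needle in k.lower() for k in r.keywords)
    ]
    return sorted(hits, key=lambda r: r.identifier)


# Illustrative data only: two made-up catalogues that share one dataset.
catalogues = {
    "catalogue_a": [
        IndexRecord("id:1", "Sea ice thickness", ("sea ice",), "catalogue_a"),
        IndexRecord("id:2", "Air temperature", ("meteorology",), "catalogue_a"),
    ],
    "catalogue_b": [
        IndexRecord("id:1", "Sea ice thickness", ("sea ice",), "catalogue_b"),
        IndexRecord("id:3", "Zooplankton counts", ("biology", "sea ice edge"), "catalogue_b"),
    ],
}

unified = build_unified_index(catalogues)
hits = search(unified, "sea ice")
for hit in hits:
    print(hit.identifier, hit.title, "from", hit.source)
```

In this sketch only the index records are gathered centrally, while the data themselves stay with the source catalogue named in each record, which mirrors the "virtual infrastructure" described above. A real deployment would harvest the records over a standard metadata exchange protocol rather than from in-memory lists.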

Initial services will be provided for geophysical and biological data from the data centres that contributed to the IPY. In the future the services may be extended to include other types of research data, in particular economics and the human sciences, to establish a toolbox that can help answer questions about the consequences of environmental change.