This course introduces the FAIR guiding principles for data management and their specific implementations within geoscience, supported by practical exercises. Practical steps towards Findable, Accessible, Interoperable and Reusable data are discussed and exercised from both the data provider and the data consumer perspective. The practical introductions cover: discovery metadata and use metadata; persistent identifiers (e.g. Digital Object Identifiers) and how they help traceability of decisions (e.g. through scientific citation of data); containers for data (e.g. NetCDF); semantics for geoscientific data (glossaries, thesauri, taxonomies and ontologies) in an interdisciplinary context, with terminology as a mechanism for scientific collaboration; tools for generating FAIR data (e.g. how to work with Rosetta and other tools for converting data, how to use Python); how to work with FAIR data; how to publish data with the help of data centres; how to publish data with the help of schema.org (focusing on discoverability by Google); national structures that facilitate data sharing (e.g. Norwegian Marine Data Centre, Norwegian Scientific Data Network, Norwegian Infrastructure for Research Data) and how these are connected; and how to work with Data Management Plans that are or will be required by funding agencies and resource providers (e.g. UNINETT Sigma2). Practical work is based on students bringing their own data, evaluating its FAIRness, and improving that FAIRness by using Rosetta and Python to create NetCDF files that follow the Climate and Forecast (CF) Convention with the Attribute Convention for Dataset Discovery (ACDD) embedded.
At the end of the course, students will know the FAIR guiding principles, best practices for FAIR data within geoscience, and practical approaches to achieving FAIR data using Rosetta and Python, as well as how to work with data management plans in their future careers.
The first day will be a full day (6 hours) of lectures introducing different concepts; one day will be for self study, where students work on their own dataset. Lecturers will be available by Zoom (open room outside lecture hours) and through a dedicated Slack channel for the full week to support students. A more detailed outline of the lectures will be provided online. Students are required to describe and upload the dataset they will work with. At the end of the course (last day), each student presents the status of FAIRness of their data following the exercises undertaken. This session is scheduled for 5 hours (a 10-minute presentation by each student and a longer discussion session).
Course home page:
- Øystein Godøy (firstname.lastname@example.org, Head of Division for Remote Sensing and Data Management, The Norwegian Meteorological Institute)
- Markus Fiebig (email@example.com, Senior Scientist at the Atmosphere and Climate Department, Norwegian Institute for Air Research)
- Torill Hamre (Torill.Hamre@nersc.no, Research Leader/Senior Researcher at the Scientific Data Management Division, Nansen Environmental and Remote Sensing Center)
- Lara Ferrighi (firstname.lastname@example.org, Research Scientist at the Remote Sensing and Data Management Division, The Norwegian Meteorological Institute)
Further contacts: to contact the project about this course, we recommend using the contact page at https://www.nordatanet.no/en/contact. This will ensure a more prompt answer to your enquiry than writing emails directly to one of the lecturers.
Relevant info: Some of your questions/issues might already be tackled in the FAQ
- Online meeting ethics
- Use of break-out rooms
- Use of “raise hand” function
- Interactive course!
09:15-10:15: Motivation: Why do we need data management? (Øystein)
- Why do we need data management?
- Data Sharing and Management Snafu in 3 Short Acts
- Science life cycle/Data life cycle
- How to change data sharing culture.
- What are the FAIR data principles?
- How do they help with good data management?
- External boundary conditions by funding agencies and publishers, scientific data as service.
- Data management plan.
10:30-11:45: The basics: data and metadata (Lara, Markus)
- What are data? What are metadata?
- Discovery, site, and use metadata.
- What is provenance?
- Plan your experiment. Which data and metadata do you need to record?
- How to record various types of metadata.
- Metadata templates (Arven etter Nansen, EBAS)
- Gap handling for metadata (missing elements).
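As a hedged illustration of gap handling, the sketch below reports which discovery metadata elements are missing or empty in a set of global attributes. It assumes the attributes are already available as a plain Python dictionary. The attribute names are taken from ACDD 1.3, but the "recommended" list is only a small hand-picked subset, and the function name `find_metadata_gaps` is hypothetical.

```python
# ACDD 1.3 "highly recommended" global attributes.
HIGHLY_RECOMMENDED = ("title", "summary", "keywords", "Conventions")

# A small, hand-picked subset of ACDD 1.3 "recommended" attributes (not the full list).
RECOMMENDED_SUBSET = (
    "id",
    "date_created",
    "creator_name",
    "license",
    "time_coverage_start",
    "time_coverage_end",
    "geospatial_lat_min",
    "geospatial_lat_max",
    "geospatial_lon_min",
    "geospatial_lon_max",
)


def find_metadata_gaps(attributes):
    """Return the attribute names that are absent, None, or blank."""

    def missing(names):
        gaps = []
        for name in names:
            value = attributes.get(name)
            if value is None or not str(value).strip():
                gaps.append(name)
        return gaps

    return {
        "highly_recommended": missing(HIGHLY_RECOMMENDED),
        "recommended": missing(RECOMMENDED_SUBSET),
    }


# Example: a record with a blank summary and no keywords or Conventions.
example = {"title": "CTD profile, station 12", "summary": "  ", "license": "CC-BY-4.0"}
gaps = find_metadata_gaps(example)
print(gaps["highly_recommended"])
```

A report like this only flags missing elements. Deciding whether a gap can be filled afterwards, or must be recorded as unknown, remains a judgement for the data provider.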
11:45-12:30: Lunch break
12:30-13:45: Data structure/formatting (Øystein, Markus)
- NetCDF/CF grid, trajectory, profile, timeseries
- Granularity requirements
- Standard names, vocabularies
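To make the structure topics above more concrete, here is a minimal sketch of how a single-station time series could be laid out following CF conventions, written as plain Python dictionaries rather than as an actual NetCDF file. The function names, the station values and the choice of `CF-1.8` are assumptions for illustration. The attribute names (`standard_name`, `units`, `cf_role`, `featureType`, `coordinates`) are CF attributes.

```python
from datetime import datetime, timezone

# Reference time for the time coordinate, expressed as a CF units string.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
TIME_UNITS = "seconds since 1970-01-01T00:00:00Z"


def encode_time(timestamps):
    """Convert timezone-aware datetimes to seconds since the reference time."""
    return [(t - EPOCH).total_seconds() for t in timestamps]


def timeseries_layout(station, latitude, longitude, timestamps, air_temperature_k):
    """Describe a CF-style timeSeries for one station as nested dictionaries."""
    if len(timestamps) != len(air_temperature_k):
        raise ValueError("time and data must have the same length")
    return {
        "global_attributes": {"Conventions": "CF-1.8", "featureType": "timeSeries"},
        "dimensions": {"time": len(timestamps)},
        "variables": {
            "station_name": {
                "dimensions": (),
                "attributes": {"cf_role": "timeseries_id"},
                "data": station,
            },
            "latitude": {
                "dimensions": (),
                "attributes": {"standard_name": "latitude", "units": "degrees_north"},
                "data": latitude,
            },
            "longitude": {
                "dimensions": (),
                "attributes": {"standard_name": "longitude", "units": "degrees_east"},
                "data": longitude,
            },
            "time": {
                "dimensions": ("time",),
                "attributes": {"standard_name": "time", "units": TIME_UNITS},
                "data": encode_time(timestamps),
            },
            "air_temperature": {
                "dimensions": ("time",),
                "attributes": {
                    "standard_name": "air_temperature",
                    "units": "K",
                    "coordinates": "time latitude longitude station_name",
                },
                "data": list(air_temperature_k),
            },
        },
    }


# Hypothetical station and values, for illustration only.
layout = timeseries_layout(
    "demo_station",
    78.2,
    15.6,
    [datetime(1970, 1, 1, 1, 0, tzinfo=timezone.utc)],
    [271.3],
)
print(layout["variables"]["time"]["data"])
```

The standard name and units on each variable are what let a consumer interpret the values without contacting the provider. Trajectories and profiles follow the same idea with different coordinate variables.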
14:00-15:30: Summary of the day
Group work. Groups will present a summary of today’s lessons in their own words. One group per section. (Moderator: Lara)
09:00-10:15: Documentation of data (Torill)
- Tools for documenting data
- Rosetta (web application), NCO/CDO (command line), Python (netCDF4), R
- More detail on Python
- Validation tools for NetCDF-CF.
- What is actually validated?
- NorDataNet validator, PUMA validator
- Rosetta in more detail
- Profiles, time series, trajectory
- Template concept, benefits for processing multiple datasets, possibilities for collaboration (e.g. place template files in GitHub)
- Examples of e.g. CTD profile from Seabird sensor
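As a rough sketch of the Python route listed above, the example below collects a few CF and ACDD global attributes and writes one temperature profile with the third-party `netCDF4` package. The function names, file layout and sample values are assumptions, not the course material. The result is a simplified file rather than a complete CF profile representation, and it would still need to be checked with a validator.

```python
from datetime import datetime, timezone


def build_global_attributes(title, summary, keywords, creator_name):
    """Collect a small set of CF and ACDD global attributes."""
    return {
        "Conventions": "CF-1.8, ACDD-1.3",
        "title": title,
        "summary": summary,
        "keywords": ", ".join(keywords),
        "creator_name": creator_name,
        "date_created": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


def write_profile(path, depths_m, temperatures_degc, global_attributes):
    """Write one temperature profile to a NetCDF file (requires netCDF4)."""
    # Imported here so that the helper above can be used without netCDF4 installed.
    import netCDF4

    with netCDF4.Dataset(path, "w", format="NETCDF4") as ds:
        ds.setncatts(global_attributes)
        ds.createDimension("depth", len(depths_m))

        depth = ds.createVariable("depth", "f4", ("depth",))
        depth.standard_name = "depth"
        depth.units = "m"
        depth.positive = "down"
        depth[:] = depths_m

        temperature = ds.createVariable("sea_water_temperature", "f4", ("depth",))
        temperature.standard_name = "sea_water_temperature"
        temperature.units = "degree_Celsius"
        temperature[:] = temperatures_degc


# Hypothetical metadata, for illustration only.
attrs = build_global_attributes(
    title="Example CTD profile",
    summary="Temperature profile from a single CTD cast.",
    keywords=["oceanography", "temperature"],
    creator_name="A. Student",
)
# write_profile("profile.nc", [1.0, 5.0, 10.0], [4.2, 3.9, 3.5], attrs)
```

Keeping the metadata in a separate function mirrors the template idea: the same attribute set can be reused when processing many similar datasets.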
10:30-11:45: Workshop: Document your own data (Torill, Øystein, Lara, Markus)
11:45-12:30: Lunch break
12:30-13:45: Publishing your data (Øystein, Markus)
- Mandated and long term archives
- Data publications
- PIDs (with explicit mention of DOIs)
- Data policies / Licensing
- Tracking usage (using DOI)
- NorDataNet (distributed network of data centres)
- NIRD RDA
- GAW repositories
- Repositories for model data
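One way to picture how a published dataset with a DOI can be described for web search engines is a schema.org `Dataset` record serialised as JSON-LD. The sketch below is an assumption-laden illustration: the function name is hypothetical, the DOI `10.1234/example` and all metadata values are placeholders, and only a handful of schema.org properties are shown.

```python
import json


def dataset_jsonld(name, description, doi, license_url, keywords, creator_name):
    """Build a minimal schema.org Dataset description as a dictionary."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        # The DOI is expressed as a resolvable URL so it can be followed directly.
        "identifier": f"https://doi.org/{doi}",
        "license": license_url,
        "keywords": list(keywords),
        "creator": {"@type": "Person", "name": creator_name},
    }


# Placeholder values, for illustration only.
record = dataset_jsonld(
    name="Example CTD profile",
    description="Temperature profile from a single CTD cast.",
    doi="10.1234/example",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    keywords=["oceanography", "temperature"],
    creator_name="A. Student",
)
print(json.dumps(record, indent=2))
```

In practice a data centre usually generates such a record on the dataset landing page, so this is mainly useful for understanding what is being exposed.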
Group work. Groups will present a summary of today’s discoveries in their own words. (Moderator: Markus)
09:00-10:15: How to exploit / process further / consume data (Torill)
- Interfaces to data
- Examples of benefits when using truly interoperable data.
- Interfaces: WMS, OGC API, OpenAPI, OPeNDAP, RESTful (Restful in general)
- Integration in tools e.g.:
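As a small sketch of the OPeNDAP interface listed above, the function below builds a DAP2 request URL that asks a server for a subset of one variable instead of the whole file. The server address and variable name are hypothetical, and the function name is made up for this example. The `[start:stride:stop]` index syntax and the `.ascii`, `.dods`, `.dds` and `.das` suffixes come from DAP2.

```python
def opendap_subset_url(dataset_url, variable, slices, response="ascii"):
    """Build a DAP2 URL requesting a hyperslab of one variable.

    slices holds one (start, stride, stop) index triple per dimension;
    the stop index is inclusive.
    """
    if response not in {"ascii", "dods", "dds", "das"}:
        raise ValueError(f"unsupported response type: {response}")
    constraint = variable + "".join(
        f"[{start}:{stride}:{stop}]" for start, stride, stop in slices
    )
    return f"{dataset_url}.{response}?{constraint}"


# Hypothetical server and variable: request the first ten time steps.
url = opendap_subset_url(
    "https://thredds.example.org/dodsC/demo.nc",
    "air_temperature",
    [(0, 1, 9)],
)
print(url)
```

Subsetting on the server side is one of the practical benefits of interoperable data: the consumer transfers only the slice that is needed.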
10:30-11:45: Workshop: Analysing data (Moderator: Torill, Øystein)
11:45-12:30: Lunch break
12:30-13:45: Data sharing ethics & culture, and how NorDataNet services help. (Øystein)
- Data sharing ethics, certainly before publishing
- Data Life Cycle and its relation to the scientific workflow, revisited from a scientist's point of view
- Data sharing in a cultural perspective and relations to the scientific workflow
- NorDataNet service overview
14:00-15:30: Student summary of the course, what has been useful (and not). (Moderator: Lara)
Group work. Groups will present a summary of today’s discoveries in their own words.