The core partner data centres that are integrated in NorDataNet are listed in https://www.nordatanet.no/en/node/69. In addition to this NorDataNet harvests information on relevant datasets from a number of other data centres. The data centre responsible for the data presented is usually (but not always) listed in the discovery metadata. In essence NorDataNet is an aggregating service that combines information from a number of existing data centres.
Citation of data and service
If you use data retrieved through this portal, please acknowledge our funding source:
Research Council of Norway, project number 245967/F50, Norwegian Scientific Data Network.
Always remember to cite data when used!
Citation information for individual datasets is often provided in the metadata. However, not all datasets have this information embedded in the discovery metadata. On a general basis a citation of a dataset include the same components as any other citation:
author, title,
year of publication,
publisher (for data this is often the archive where it is housed),
edition or version,
access information (a URL or persistent identifier, e.g. DOI if provided)
All partner repositories of NorDataNet support Digital Object Identifiers (DOI), but not all datasets are minted. Whether or not minted depends often on source of the data (e.g. operational data are often yet not minted). However, all data centres support persistent identifiers according to local systems. The information required to properly cite a dataset is normally provided in the discovery metadata the datasets.
Brief user guide
The Data Access Portal has information in 3 columns. An outline of the content in these columns is provided above. When first entering the search interface, all potential datasets are listed. Datasets are indicated in the map and results tabulation elements which are located in the middle column. The order of results can be modified using the "Sort by" option in the left column. On top of this column is normally relevant guidance information to user presented as collapsible elements.
If the user want to refine the search, this can be done by constraining the bounding box search. This is done in the map - the listing of datasets is automatically updated. Date constraints can be added in the left column. For these to take effect, the user has to push the button marked search. In the left column it is also possible to specific text elements to search for in the datasets. Again pushing the button marked "Search" is necessary for these to take action. Complex search patterns can be constructed using logical operators identified in the drop down menu with and phrases embedded in quotation marks. Prefixing a phrase with '-' negates the phrase (i.e. should not occur in the results). Searches are case insensitive.
Other elements indicated in the left and right columns are facet searches, i.e. these are keywords that are found in the datasets and all datasets that contain these specific keywords in the appropriate metadata elements are listed together. Further refinement can be done using full text, date or bounding box constraints. Individuals, organisations and data centres involved in generating or curating the datasets are listed in the facets in the right column. The combination of search fields (including facets) is based on a logical "AND" combination of the fields, i.e. all conditions are fulfilled for the results provided.
These datasets contains shotgun metagenomic sequencing taxonomic results, an amplicon sequence variant ID (ASV) table from 16S rRNA gene amplicon sequencing, and metagenomic sequencing nitrogen cycling genes from samples collected in Kongsfjorden and Rijpfjorden, in 2016, during the MOSJ and ICE cruises. Accompanying physical and biogeochemical data may be found in another dataset (https://doi.org/10.21334/npolar.2024.4d4de169). The sampling and analysis of this genetic data was carrried out by colleagues from CIIMAR – Interdisciplinary Centre of Marine and Environmental Research (Portugal) in collaboration with the NPI.
Title: Shotgun metagenomic sequencing taxonomic results relative to Kongsfjorden and Rijpfjorden 2016
File: “metagenomes_taxonomy.csv”
This dataset contains the results of raw read processing of shotgun metagenomics, relative to taxonomy.
Variables:
- Level: taxonomic level;
- Name: name of taxon;
- Taxon_id: unique id for each taxonomic name;
- Reads: number of reads attributed to Taxon_id in specific sample;
- Percentage: percentage of reads in sample.
Bioinformatic processing of shotgun metagenomics sequencing results
Full details are available in Costa et al., 2024 (in review): «The raw shotgun metagenomic reads were trimmed with Trimmomatic v0.36, to remove adapter sequences, short reads (<36 bp), and reads with an average quality score <15 within 4-base windows (Bolger et al., 2014). Taxonomic annotation of the paired reads was carried out using Kaiju v1.9.0, with default parameters (Menzel et al., 2016). De novo assembly of the reads was performed using metaSPAdes - v3.15.3 (Nurk et al., 2017; Prjibelski et al., 2020), with a minimum contig length of 2000 bp.»
Keywords: metagenomics; microbial composition
# References Bolger, A. M., Lohse, M., & Usadel, B. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics, 30(15), 2114–2120. https://doi.org/10.1093/bioinformatics/btu170 Menzel, P., Ng, K. L., & Krogh, A. (2016). Fast and sensitive taxonomic classification for metagenomics with Kaiju. Nature Communications, 7, 1–9. https://doi.org/10.1038/ncomms11257 Nurk, S., Meleshko, D., Korobeynikov, A., & Pevzner, P. A. (2017). MetaSPAdes: A new versatile metagenomic assembler. Genome Research, 27(5), 824–834. https://doi.org/10.1101/gr.213959.116 Prjibelski, A., Antipov, D., Meleshko, D., Lapidus, A., & Korobeynikov, A. (2020). Using SPAdes De Novo Assembler. Current Protocols in Bioinformatics, 70(1). https://doi.org/10.1002/cpbi.102
Title: ASV table from 16S rRNA gene amplicon sequencing
File: ASV_table_16S.csv
This dataset contains the abundance table of ASVs obtained from 16S rRNA gene amplicon sequencing.
Variables:
- ASV - amplicon sequence variant ID;
- Abundance - abundance score of each ASV;
- Kingdom - equivalent to domain level taxonomy of the ASV;
- Phylum - phylum level taxonomy of ASV;
- Class - class level taxonomy of ASV;
- Order - order level taxonomy of ASV;
- Family - family level taxonomy of ASV;
- Genus - genus level taxonomy of ASV;
- Species - species level taxonomy of ASV.
Bioinformatic processing of V4-V5 16S rRNA gene amplicon sequencing results
From Costa et al., 2024 (in review): «Bioinformatic analysis was conducted as described in detail in Semedo et al. (2021). Primers from the raw FastQ files obtained from Illumina MiSeq sequencing were removed using “cutadapt v.1.16”. Files were imported into R (v 4.1.1) and analyzed following the DADA2 R package (v 1.20.0) (Callahan et al., 2016). Sample filtering, trimming (Forward = 240 nt, Reverse = 160 nt), error rates learning, dereplication and Amplicon Sequence Variant (ASV) inference were performed with default settings. Chimeras were removed using the function removeBimeraDenovo with the “consensus” method. Taxonomy was assigned using the DADA2 native implementation of the naive Bayesian classifier (Wang et al., 2007) with the GTDB v202 reference database (Cole et al., 2014; Parks et al., 2018). Taxonomy was filtered by removing the undesirable lineages “Eukaryota”, “Mitochondria”, “Chloroplast” and “unknown” from the dataset.»
References Semedo, M., Lopes, E., Baptista, M. S., Oller-Ruiz, A., Gilabert, J., Tomasino, M. P., & Magalhães, C. (2021). Depth Profile of Nitrifying Archaeal and Bacterial Communities in the Remote Oligotrophic Waters of the North Pacific. Frontiers in Microbiology, 12(3), 1–18. https://doi.org/10.3389/fmicb.2021.624071 Callahan, B. J., McMurdie, P. J., Rosen, M. J., Han, A. W., Johnson, A. J. A., & Holmes, S. P. (2016). DADA2: High-resolution sample inference from Illumina amplicon data. Nature Methods, 13(7), 581–583. https://doi.org/10.1038/nmeth.3869 Wang, Q., Garrity, G. M., Tiedje, J. M., & Cole, J. R. (2007). Naive Bayesian Classifier for Rapid Assignment of rRNA Sequences into the New Bacterial Taxonomy. Applied and Environmental Microbiology, 73(16), 5261–5267. https://doi.org/10.1128/AEM.00062-07 Cole, J. R., Wang, Q., Fish, J. A., Chai, B., McGarrell, D. M., Sun, Y., Brown, C. T., Porras-Alfaro, A., Kuske, C. R., & Tiedje, J. M. (2014). Ribosomal Database Project: Data and tools for high throughput rRNA analysis. Nucleic Acids Research, 42(D1), 633–642. https://doi.org/10.1093/nar/gkt1244 Parks, D. H., Chuvochina, M., Waite, D. W., Rinke, C., Skarshewski, A., Chaumeil, P.-A., & Hugenholtz, P. (2018). A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nature Biotechnology, 36(10), 996–1004. https://doi.org/10.1038/nbt.4229
Title: Shotgun metagenomic sequencing nitrogen cycling genes results relative to Kongsfjorden and Rijpfjorden 2016
File: “genes_meta_complete.csv”
This dataset contains the results of raw read processing of shotgun metagenomics, relative to genes associated with the nitrogen cycle.
Variables:
- originalFile: internal file used in construction of the dataset;
- geneCallersID: unique ID relative to a gene accession;
- source: software source for the gene accession;
- accession: gene obtained from the source;
- gene: common name for gene accession;
- geneFunction: function of the gene;
- contig: name of the contig used to identify gene;
- start: start position of contig;
- stop: stop position of contig;
- contigLength: length of contig;
- mappedReads: number of reads from the contig;
- geneCoverage: coverage of the gene;
- refGeneCoverage: coverage of reference gene;
- normalizedCoverage: normalized gene coverage.
Bioinformatic processing of shotgun metagenomics sequencing results
Full details are available in Costa et al., 2024 (in review): «The raw shotgun metagenomic reads were trimmed with Trimmomatic v0.36, to remove adapter sequences, short reads (<36 bp), and reads with an average quality score <15 within 4-base windows (Bolger et al., 2014). De novo assembly of the reads was performed using metaSPAdes - v3.15.3 (Nurk et al., 2017; Prjibelski et al., 2020), with a minimum contig length of 2000 bp. Functional annotation of genes of the assembled contigs was done using PROKKA v1.14.5 (Seemann, 2014), and gene abundance was estimated by mapping the trimmed paired reads back into the contigs using bowtie2 (Langmead & Salzberg, 2012), with local alignment mode and allowing 1 bp mismatch. The number of reads mapped to the target genes of this study (genes implicated in the nitrogen cycle, see Table S4) was counted, in each of the metagenomes, using SAMtools (Danecek et al., 2021). To quantify the coverage of each of the nitrogen cycle target genes found in the metagenomes, the number of reads mapping to the contig was divided by the length (in bp) of the gene. To take in account the differences of sequencing depth between samples, the gene coverage was then normalized against the mean coverage of three reference single-copy genes: RecA protein (recA), DNA gyrase subunit B (gyrB), and DNA-directed RNA polymerase subunit beta (rpoB), and expressed as “Average Genomic Copy Number”, according to what has been described in detail by Semedo and Song (2023).»
Keywords: metagenomics; nitrogen cycle
# References Bolger, A. M., Lohse, M., & Usadel, B. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics, 30(15), 2114–2120. https://doi.org/10.1093/bioinformatics/btu170 Menzel, P., Ng, K. L., & Krogh, A. (2016). Fast and sensitive taxonomic classification for metagenomics with Kaiju. Nature Communications, 7, 1–9. https://doi.org/10.1038/ncomms11257 Nurk, S., Meleshko, D., Korobeynikov, A., & Pevzner, P. A. (2017). MetaSPAdes: A new versatile metagenomic assembler. Genome Research, 27(5), 824–834. https://doi.org/10.1101/gr.213959.116 Prjibelski, A., Antipov, D., Meleshko, D., Lapidus, A., & Korobeynikov, A. (2020). Using SPAdes De Novo Assembler. Current Protocols in Bioinformatics, 70(1). https://doi.org/10.1002/cpbi.102 Seemann, T. (2014). Prokka: Rapid prokaryotic genome annotation. Bioinformatics, 30(14), 2068–2069. https://doi.org/10.1093/bioinformatics/btu153 Semedo, M., & Song, B. (2023). Sediment metagenomics reveals the impacts of poultry industry wastewater on antibiotic resistance and nitrogen cycling genes in tidal creek ecosystems. Science of the Total Environment, 857(July 2022), 159496. https://doi.org/10.1016/j.scitotenv.2022.159496