Chapter 2 bddwc

2.1 bddwc & Darwin Core

bdDwC is an R package that supplies an interactive Shiny app and a set of functions for standardizing field names in compliance with the Darwin Core (DwC) format. bdDwC is a key element of the bdverse, a collection of tools that form a general framework for facilitating biodiversity science in R.

Figure: bdDwC in the bdverse

What is the Darwin Core standard?

Darwin Core (DwC) is a global standard for publishing biodiversity data. Its goal is to facilitate the sharing of biodiversity information by providing identifiers, labels, and definitions (Wieczorek et al. 2012). DwC was established as an evolving, community-developed standard by Biodiversity Information Standards (TDWG, www.tdwg.org). It is a library of definitions of common biodiversity data terms, each of which represents a field in a dataset. There are around 200 such fields (not including DwC extensions); the full set of DwC terms with their descriptions is available in the Quick Reference Guide (http://rs.tdwg.org/dwc/terms). For more information, see the Learn more section.

Why it’s important to “Darwinize” a dataset

Running the Darwinizer standardizes many of the field names in your dataset. This allows the bdverse to handle data from various biodiversity portals seamlessly, and lets you use all of the bdverse's features regardless of how field names vary among publishers.

2.2 Shiny app

2.2.2 App overview

Figure: bdDwC app overview

On the first screen, you load your biodiversity data, choose a dictionary, and run the Darwinizer. There are two ways to load data: from a file on your computer, or by fetching it from a web-based data provider.

2.2.3 Data upload

2.2.3.1 From a local file

A CSV file or a Darwin Core Archive (DwC-A) zip file can be uploaded.
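Before uploading, it can help to check which field names a CSV file actually contains. The following is a minimal base R sketch, not part of bdDwC; the file contents and column names are made up for illustration.

```r
# A tiny, made-up occurrence table written to a temporary CSV file.
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("sciname,lat,lon",
    "Naja naja,12.97,77.59",
    "Calotes versicolor,19.07,72.87"),
  csv_path
)

# Read it back; stringsAsFactors = FALSE keeps text columns as character.
occurrences <- read.csv(csv_path, stringsAsFactors = FALSE)

# These field names are what the Darwinizer will try to standardize.
names(occurrences)
```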

Figure: Data upload from a local file

2.2.4 From an online database

Data can also be retrieved directly from various online biodiversity databases. You only need to:

  • Select the database.
  • Specify the desired scientific name.
  • Specify the number of records (upper limit of 50,000).
  • Check the box if records must have coordinates.
  • Wait for data to be downloaded.

Figure: Data upload from online biodiversity databases

2.3 Dictionaries

A dictionary is a key component when Darwinizing a dataset. It is a lookup table that lists possible variations of a field name alongside the corresponding DwC name.
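To make the idea concrete, here is a small base R sketch of such a lookup table. The column names `fieldname` and `standard` and the listed variants are assumptions chosen for illustration; they are not taken from the Darwin Cloud dictionary or from bdDwC itself.

```r
# A made-up dictionary: each row maps one field-name variant to a DwC term.
dictionary <- data.frame(
  fieldname = c("lat", "latitude", "lon", "long", "sciname"),
  standard  = c("decimalLatitude", "decimalLatitude",
                "decimalLongitude", "decimalLongitude", "scientificName"),
  stringsAsFactors = FALSE
)

# Look up the DwC name for a few field names; unknown names give NA.
user_fields <- c("lat", "long", "collector_notes")
dwc_names <- dictionary$standard[match(user_fields, dictionary$fieldname)]
dwc_names
```

Several variants can point to the same DwC term, which is why the table is keyed on the variant rather than on the standard name.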

2.3.1 The Darwin Cloud dictionary

The Darwin Cloud dictionary (Wieczorek et al. 2017) is a lookup table that accumulates the variations in DwC field names used by different publishers. This critical dictionary was created and is maintained by the Kurator project (http://kurator.acis.ufl.edu/kurator-web/), which provides workflow tools for improving the quality of biodiversity data through a user-friendly web interface. The development of bdDwC was inspired by Kurator’s own Darwinizer.

Updating the Darwin Cloud dictionary

We recommend updating the Darwin Cloud dictionary file. To do so, click the Update DC button.

Figure: Updating the Darwin Cloud dictionary

2.3.2 Custom dictionary

It’s also possible to add your own dictionary by creating a CSV file with two columns, one for the field names and one for the standard names. After uploading the custom dictionary, you need to specify which column holds the ‘User field names’ and which holds the ‘Standard (DwC) field names’.
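A two-column dictionary file of this kind can be prepared in base R. This is only a sketch: the column headers, the field names, and the file name `custom_dictionary.csv` are hypothetical, and the app lets you indicate which column is which after upload.

```r
# Hypothetical custom dictionary: your field names and their DwC equivalents.
custom_dictionary <- data.frame(
  fieldname = c("species_name", "obs_date", "country_name"),
  standard  = c("scientificName", "eventDate", "country"),
  stringsAsFactors = FALSE
)

# Save it as a two-column CSV that can then be uploaded in the app.
dictionary_path <- file.path(tempdir(), "custom_dictionary.csv")
write.csv(custom_dictionary, dictionary_path, row.names = FALSE)

# Read the file back to confirm it has the expected two columns.
written_dictionary <- read.csv(dictionary_path, stringsAsFactors = FALSE)
written_dictionary
```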

Figure: Uploading your own dictionary

2.4 Darwinizing your dataset

Once a dataset is uploaded, the ‘Submit to Darwinizer’ button is activated. Clicking it begins the interactive ‘Darwinize the dataset’ process.

Figure: Submit to Darwinizer button

2.4.1 Darwinizer results

Results page overview

Figure: Darwinizer results

To rename a field manually, choose the two corresponding fields and click the Rename button.
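For comparison, the same single-field rename can be done by hand in base R. This sketch does not use bdDwC, and the data are made up for illustration.

```r
# Made-up data with one non-standard field name.
occurrences <- data.frame(
  sciname = c("Naja naja", "Calotes versicolor"),
  decimalLatitude = c(12.97, 19.07),
  stringsAsFactors = FALSE
)

# Rename a single field by hand: "sciname" becomes the DwC term.
names(occurrences)[names(occurrences) == "sciname"] <- "scientificName"

names(occurrences)
```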

Figure: Manually renaming fields

Hovering over a DwC standard name will display its description.

2.4.2 Download your Darwinized data

2.4.3 Closing the app

To close the app, close its browser tab; the R session will be terminated. To reopen the app, run runDwC() in the R console.

2.5 Command line operations

Note: the function names in this section still need to be synchronized with the new version of the package.

2.6 Load package

Load the bdDwC package

2.7 Darwinizing a dataset

bdDwC includes an Indian reptile dataset, bdDwC:::dataReptiles.

The function to Darwinize a dataset is darwinizeNames. You can replace bdDwC:::dataReptiles with your own dataset.

Rename your dataset's field names to their Darwinized names using renameUserData.
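The two steps, matching field names against a dictionary and then applying the renames, can be pictured with a base R sketch. This is not bdDwC's implementation and it does not call darwinizeNames or renameUserData. The helper names `match_fields` and `apply_renames`, the case-insensitive matching rule, and all data are assumptions made for illustration.

```r
# Hypothetical helper: pair each user field with a DwC name from a dictionary.
# Matching ignores case; fields without a dictionary entry are left out.
match_fields <- function(field_names, dictionary) {
  hit <- match(tolower(field_names), tolower(dictionary$fieldname))
  data.frame(
    name_old = field_names[!is.na(hit)],
    name_new = dictionary$standard[hit[!is.na(hit)]],
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: apply the old -> new pairs to a data frame's names.
apply_renames <- function(data, renames) {
  idx <- match(renames$name_old, names(data))
  names(data)[idx] <- renames$name_new
  data
}

# Made-up dictionary and dataset.
dictionary <- data.frame(
  fieldname = c("lat", "lon", "sciname"),
  standard  = c("decimalLatitude", "decimalLongitude", "scientificName"),
  stringsAsFactors = FALSE
)
occurrences <- data.frame(
  SciName = "Naja naja", Lat = 12.97, Lon = 77.59, notes = "roadside",
  stringsAsFactors = FALSE
)

renames <- match_fields(names(occurrences), dictionary)
darwinized <- apply_renames(occurrences, renames)
names(darwinized)
```

Keeping the match table (`renames`) separate from the renaming step mirrors the two-function workflow described above: the proposed mapping can be inspected before any field name is changed. Fields with no dictionary entry, such as `notes` here, keep their original names.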

2.8 Updating the Darwin Cloud dictionary

To get the newest version of the Darwin Cloud data, run the package's update function, which downloads the data from the remote repository and extracts the field and standard names.

2.9 A case study

Case study title


[ TBA ]

References

Wieczorek, John, David Bloom, Robert Guralnick, Stan Blum, Markus Döring, Renato De Giovanni, Tim Robertson, and David Vieglais. 2012. “Darwin Core: An Evolving Community-Developed Biodiversity Data Standard.” PLoS ONE 7 (1): e29715.

Wieczorek, John, Paul J. Morris, James Hanken, David B. Lowery, Bertram Ludäscher, James Macklin, Timothy McPhillips, Robert A. Morris, and Qian Zhang. 2017. “Darwin Cloud: Mapping Real-World Data to Darwin Core.” Biodiversity Information Science and Standards 1: e20486. https://doi.org/10.3897/tdwgproceedings.1.20486.