bdDwC: user-level standardization of biodiversity data

bdDwC in the bdverse

bdDwC is an R package that provides an interactive Shiny app and a set of functions for standardizing field names to comply with the Darwin Core (DwC) format. Running bdDwC on your dataset standardizes its field names, which allows the bdverse to handle data from various biodiversity portals seamlessly and lets you use all of its features regardless of how publishers name their fields. The development of bdDwC was inspired by the Kurator project’s ‘Darwinizer’ tool. bdDwC uses the Darwin Cloud dictionary (Wieczorek et al. 2017), a lookup table maintained by the Kurator team that accumulates variations of DwC field names. You can also supply your own dictionary by creating a CSV file with two columns, one for the field names and one for the standard names.
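The dictionary-based renaming described above can be sketched in base R. This is a minimal illustration of the idea, not bdDwC's actual implementation: the helper `standardize_names()`, the dictionary column names `fieldname` and `standard`, the sample data, and the case-insensitive matching are all assumptions made for the example.

```r
# Hypothetical two-column dictionary. In practice it could be loaded from a
# CSV file, e.g. read.csv("my_dictionary.csv"); the column names are assumed.
dictionary <- data.frame(
  fieldname = c("lat", "long", "species_name"),
  standard  = c("decimalLatitude", "decimalLongitude", "scientificName"),
  stringsAsFactors = FALSE
)

# Hypothetical user data with non-standard field names.
user_data <- data.frame(
  lat          = c(32.1, 31.8),
  long         = c(34.8, 35.2),
  species_name = c("Gazella gazella", "Capra nubiana"),
  notes        = c("a", "b"),
  stringsAsFactors = FALSE
)

# Illustrative helper (not part of bdDwC): rename every column that has an
# entry in the dictionary and leave all other columns untouched.
standardize_names <- function(data, dictionary) {
  # Position of each column name in the dictionary (NA when absent);
  # matching is case-insensitive by assumption.
  idx <- match(tolower(names(data)), tolower(dictionary$fieldname))
  matched <- !is.na(idx)
  names(data)[matched] <- dictionary$standard[idx[matched]]
  data
}

standardized <- standardize_names(user_data, dictionary)
names(standardized)
```

In this sketch the `notes` column has no dictionary entry, so it keeps its original name; only the three listed fields are renamed to their DwC equivalents.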

Architecture overview

Major challenges ahead

Establishing and maintaining a robust workflow for feeding the Darwin Cloud. To address this issue, we’ll consult key members of the biodiversity informatics community.
“Darwinizing” a dataset is a basic task for all bdverse tools and workflows, so an intensive QA shell needs to be developed around it.

Future plans

Enhance the UI.
Explore the idea of creating and maintaining a specific dictionary for each data publisher.
Experiment with fuzzy matching techniques to generate suggestions for matching fields.
Explore techniques for enforcing recommended DwC vocabulary.
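
As one possible reading of the fuzzy matching item above, the sketch below suggests the closest DwC term for an unrecognized field name using edit distance (`utils::adist()` in base R). The helper `suggest_term()`, the short list of terms, and the distance threshold of 3 are assumptions chosen for illustration; the plans listed here do not commit to any particular technique.

```r
# A small sample of DwC terms, used here only for illustration.
dwc_terms <- c("decimalLatitude", "decimalLongitude", "scientificName",
               "eventDate", "countryCode")

# Illustrative helper (hypothetical): return the term with the smallest
# edit distance to `field`, or NA when nothing is within `max_distance`.
suggest_term <- function(field, terms, max_distance = 3) {
  # adist() returns a 1 x length(terms) matrix of edit distances;
  # both sides are lower-cased so case differences do not count.
  d <- utils::adist(tolower(field), tolower(terms))[1, ]
  best <- which.min(d)
  if (d[best] <= max_distance) terms[best] else NA_character_
}

suggest_term("decimallatitud", dwc_terms)   # close to decimalLatitude
suggest_term("scientific_name", dwc_terms)  # close to scientificName
suggest_term("collector", dwc_terms)        # nothing close enough
```

A threshold keeps the helper from proposing a term for fields that simply have no counterpart in the list, which matters if suggestions are shown to users for confirmation.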

The bdverse

An ensemble of R biodiversity packages