Chapter 3 bdchecks

3.1 What is bdchecks

bdchecks supplies a Shiny app and a set of functions to perform and manage various data checks for biodiversity data.

bdchecks in the bdverse –> To replace with new bdchecks overview <–

What are biodiversity data checks?

Data checks can include format checks, completeness checks, reasonableness checks, limit checks, and so on. These processes usually result in suspect records being flagged, documented, and then corrected or removed. The checks must be tailored to the structure of the data at hand: in our case, the Darwin Core standard. Ideally, each data check bundles its functionality together with its relevant metadata.
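As a minimal sketch of what a single check does, the base R function below implements a limit check on the Darwin Core term dwc:decimalLatitude, whose valid values lie between -90 and 90. The function name checkLatitudeRange() and the toy records are hypothetical and not part of bdchecks. The sketch only shows how each record ends up flagged as passed (TRUE), failed (FALSE), or missing (NA).

```r
# Hypothetical limit check (not a bdchecks function): a latitude passes if it
# lies within [-90, 90]. Comparisons with NA yield NA, so records with a
# missing latitude are flagged as "missing" rather than as passed or failed.
checkLatitudeRange <- function(decimalLatitude) {
  decimalLatitude >= -90 & decimalLatitude <= 90
}

# Toy occurrence records using Darwin Core term names as column names
records <- data.frame(
  scientificName  = c("Myotis myotis", "Nyctalus noctula",
                      "Pipistrellus pipistrellus", "Eptesicus serotinus"),
  decimalLatitude = c(48.2, 123.5, NA, -33.9),
  stringsAsFactors = FALSE
)

# Flag each record; suspect records can then be documented, corrected or removed
records$latitudeInRange <- checkLatitudeRange(records$decimalLatitude)
print(records)
```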

What can bdchecks do for you?

bdchecks offers features for R users of every level of experience:

  • Using the Shiny app, inexperienced R users can easily perform all data checks and filter the data accordingly. See the Shiny app section.
  • Experienced R users can perform all data checks with a few R functions, from the command line or within an R script. See the Command line operations section.
  • Advanced R users can even add, edit, and manage their own collection of data checks with little effort. See the Data checks infrastructure section.

3.2 Shiny app

3.2.1 Data upload and download

Data upload and download work as in bddwc (see data upload and data download).

3.2.2 Choose data checks

Choose a data check by checking its box –> To replace with the new app <–

Hovering over a data check name shows a short description –> To replace with the new app <–

3.2.3 Checks results and data filtering

Overview

Results page overview

–> To replace with the new app <–

Filtering the data based on the results

Choose specific results to filter out

–> To replace with the new app <–

Filter the data and download your filtered data

–> To replace with the new app <–

3.2.4 Closing the app

Simply close the app's browser tab, and the R session will be terminated. To reopen the app, run runbdchecks() in the R console.

3.3 Command line operations

–> To synchronize functions with the new version!!!! <–

3.4 Load package

Load the bdchecks package with library(bdchecks).

3.5 Perform data checks

bdchecks contains a dataset on bats named dataBats.

To perform all data checks on dataBats, use performDataCheck().

Replace bdchecks::dataBats with the name of your own dataset.
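Conceptually, performing all data checks means applying every check to the relevant Darwin Core terms and collecting one outcome per record and check. The toy sketch below illustrates that idea in base R with two hypothetical checks. It is not how performDataCheck() is implemented, and the check names are assumptions for illustration only.

```r
# Two hypothetical checks; each takes the data and returns TRUE/FALSE/NA per record
checks <- list(
  latitudeInRange  = function(d) d$decimalLatitude  >= -90  & d$decimalLatitude  <= 90,
  longitudeInRange = function(d) d$decimalLongitude >= -180 & d$decimalLongitude <= 180
)

# Toy records using Darwin Core term names as columns
records <- data.frame(
  decimalLatitude  = c(48.2, 123.5, NA),
  decimalLongitude = c(16.4, 10.0, 200.0)
)

# Run every check: the result is a logical matrix with one row per record
# and one column per check (column names come from the list names)
results <- vapply(checks, function(check) check(records), logical(nrow(records)))
print(results)
```

Using vapply() with a fixed logical(nrow(records)) template guarantees that every check returns exactly one value per record, so a buggy check fails loudly instead of silently producing a misaligned result.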

3.6 Review performed checks

See which data checks were performed:

Review the data check results (the percentage of records that passed, failed, or had missing data).
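The percentages themselves are simple to compute. Assuming the outcomes are arranged as a logical matrix with one row per record and one column per check (TRUE = passed, FALSE = failed, NA = missing), a base R sketch could look like this. The helper summariseCheck() and the check names are hypothetical, not bdchecks functions.

```r
# Hypothetical results matrix: one row per record, one column per check
results <- cbind(
  latitudeInRange  = c(TRUE, FALSE, NA, TRUE),
  longitudeInRange = c(TRUE, TRUE, FALSE, FALSE)
)

# Percentage of records in each outcome for a single check.
# `%in%` never returns NA, so missing values are counted only by is.na().
summariseCheck <- function(x) {
  n <- length(x)
  c(passed  = 100 * sum(x %in% TRUE)  / n,
    failed  = 100 * sum(x %in% FALSE) / n,
    missing = 100 * sum(is.na(x))     / n)
}

# One row per check, one column per outcome
summaryTable <- t(apply(results, 2, summariseCheck))
print(summaryTable)
```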

3.6.1 Filtering your data

[ TBA ]
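As an illustration of the idea only (not the bdchecks interface), filtering amounts to subsetting the records by their check outcomes. The sketch assumes the outcome of one check is available as a logical vector aligned with the records; all names are hypothetical. Records with missing data (NA) need an explicit decision: keep them or drop them.

```r
# Toy records and the (hypothetical) outcome of one check for each record
records <- data.frame(
  occurrenceID    = c("occ1", "occ2", "occ3", "occ4"),
  decimalLatitude = c(48.2, 123.5, NA, -33.9),
  stringsAsFactors = FALSE
)
latitudeInRange <- c(TRUE, FALSE, NA, TRUE)

# Strict filter: keep only records that passed (%in% treats NA as no match)
passedOnly <- records[latitudeInRange %in% TRUE, ]

# Lenient filter: drop only records that failed, keeping those with missing data
withoutFailed <- records[!(latitudeInRange %in% FALSE), ]

print(passedOnly)
print(withoutFailed)
```

Indexing with `%in%` rather than with the raw logical vector avoids the rows of NAs that `records[latitudeInRange, ]` would produce for missing outcomes.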

3.7 Data checks infrastructure

[Write short explanation + give a link to the relevant section in the developer-guide]

3.7.1 Data checks YAML file

[Move section to dev-guide; add Shiny management app; explain unit testing framework + synchronization system]

The YAML file holds the code and metadata of all data checks. The checks are derived from a core suite of tests and assertions being developed by TDWG’s Biodiversity Data Quality Task Group 2 (Data Quality Tests and Assertions). More information and links can be found in the Learn more section.

3.7.2 Data check example

DC_b23110e7-1be7-444a-a677-cdee0cf4330c:
  name: countryMismatch
  meta:
    Description:
      Main: Check whether the given country matches the given country code.
      InputQuestion: Do country and country code match?
      Example:
        Fail: Country name (dwc:country) and ISO country code (dwc:countryCode) do
          not match
        Pass: Country name (dwc:country) and ISO country code (dwc:countryCode) match
        InputFail: country=Australia, countryCode=4
        InputPass: country=Australia, countryCode=AU
        OutputFail: Failed
        OutputPass: Passed
      Resolution:
        Record: SingleRecord
        Term: MultiTerm
      DarwinCoreClass: Location
      Keywords: location,iso,country
      guid: b23110e7-1be7-444a-a677-cdee0cf4330c
    Flags:
      Severity: Warning
      Warning: Inconsistent
      Output: Validation
      Dimension: Consistency
    Pseudocode: |
      get.Country($countryCode) == $country
    Source:
      Reference:
      CreatedBy: Povilas Gibas
      MaintainedBy: Povilas Gibas
      CreationDate: 2018-06-27
      ModificationDate: 2018-06-27
      ModificationHist:
  Input:
    Target: country,countryCode
    Dependency:
      DependencyType: Internal
      DataChecks:
      Rpackages: rgbif 
      Data: isocodes$name,isocodes$code
  Functionality: |
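      FUNC <- function() {
          # One value per record: NA if a term is missing, otherwise TRUE when
          # the country name and ISO code point to the same lookup-table row
          result <- vapply(seq_along(TARGET1), function(i) {
              if (is.na(TARGET1[i]) || is.na(TARGET2[i])) {
                  NA
              } else {
                  any(which(DEPEND1 == TARGET1[i]) %in% which(DEPEND2 == TARGET2[i]))
              }
          }, logical(1))
          return(result)
      }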

To replace with the new structure!!!
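The Functionality field stores the check's R code as text, with placeholders (TARGET1 and TARGET2 for the target terms, DEPEND1 and DEPEND2 for the dependency data) that must be filled in before the code runs. The sketch below shows one way such text could be turned into a working function with base R's parse() and eval(). Binding the placeholders as variables in an environment is an assumption for illustration, not necessarily how bdchecks does it, and the two-row lookup table is a toy stand-in for isocodes$name and isocodes$code. The inputs mirror the InputPass and InputFail examples in the YAML entry, plus one record with a missing country.

```r
# The Functionality field is stored as plain text; here, a compact variant of
# the countryMismatch code (placeholders TARGET1/TARGET2/DEPEND1/DEPEND2 kept)
funcText <- "
FUNC <- function() {
  vapply(seq_along(TARGET1), function(i) {
    if (is.na(TARGET1[i]) || is.na(TARGET2[i])) {
      NA
    } else {
      any(which(DEPEND1 == TARGET1[i]) %in% which(DEPEND2 == TARGET2[i]))
    }
  }, logical(1))
}"

# Assumed binding (for illustration only): each placeholder becomes a variable
checkEnv <- new.env()
checkEnv$TARGET1 <- c("Australia", "Australia", NA)   # dwc:country
checkEnv$TARGET2 <- c("AU", "4", "AU")                # dwc:countryCode
checkEnv$DEPEND1 <- c("Australia", "Austria")         # toy lookup table: names
checkEnv$DEPEND2 <- c("AU", "AT")                     # toy lookup table: codes

# Parse and evaluate the text inside checkEnv, so FUNC can see the placeholders
eval(parse(text = funcText), envir = checkEnv)
result <- checkEnv$FUNC()
print(result)
```

With these inputs the result is TRUE for the matching pair, FALSE for the unknown code 4, and NA for the record without a country.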

3.7.3 Manage your own data checks

To encourage or not to encourage mmm…

After adding, removing, or editing checks in the YAML file, you can load the data checks into R using the getDC() function.

You can also export the data checks in your YAML file to .rda files and roxygen2 comments.