Chapter 3 bdchecks
3.1 What is bdchecks
bdchecks supplies a Shiny app and a set of functions to perform and manage various data checks for biodiversity data.
–> To replace with new bdchecks overview <–
What are biodiversity data checks?
Data checks can include format checks, completeness checks, reasonableness checks, limit checks, etc. These processes usually result in flagging, documenting, and subsequently correcting or eliminating suspect records. The checks must be specifically tailored to the structure of the data at hand, in our case, the Darwin Core standard. Ideally, a data check bundles its functionality together with its relevant metadata.
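The kinds of checks listed above can be illustrated with a minimal base R sketch. It does not use bdchecks itself: the records, the column selection, and the three checks are hypothetical and only assume Darwin Core term names as column names.

```r
# Hypothetical occurrence records using Darwin Core term names.
records <- data.frame(
  scientificName  = c("Myotis myotis", NA, "Plecotus auritus"),
  decimalLatitude = c(48.2, 51.5, 123.4),
  eventDate       = c("2018-06-27", "27/06/2018", "2017-01-15"),
  stringsAsFactors = FALSE
)

# Completeness check: is a scientific name present?
name_present <- !is.na(records$scientificName)

# Limit check: latitude must lie within [-90, 90].
latitude_in_range <- records$decimalLatitude >= -90 &
  records$decimalLatitude <= 90

# Format check: event date written as YYYY-MM-DD.
date_is_iso <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", records$eventDate)

# Flag suspect records instead of deleting them, so that each failure
# stays documented and can be corrected or removed later.
records$suspect <- !(name_present & latitude_in_range & date_is_iso)
```

In this sketch the second record is flagged for a missing name and a non-ISO date, and the third for an out-of-range latitude; the first record passes all three checks.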
What can bdchecks do for you?
bdchecks offers features for R users at every level of experience:
- Using the Shiny app, inexperienced R users can easily perform all data checks and filter the data accordingly. See the Shiny app section.
- Experienced R users can perform all data checks with a few R functions from the command line or within an R script. See the Command line operations section.
- Advanced R users can edit, add, and manage their own collection of data checks. See the Data checks infrastructure section.
3.2 Shiny app
3.2.1 Data upload and download
As in bddwc: data upload, data download.
3.2.2 Choose data checks
–> To replace with the new app <–
3.2.3 Checks results and data filtering
Overview

Results page overview
–> To replace with the new app <–
Filtering the data based on the results

Choose specific results to filter out
–> To replace with the new app <–

Filter the data and download your filtered data
–> To replace with the new app <–
3.2.4 Closing the app
To close the app, close its browser tab; the R session will be terminated. To reopen the app, run runbdchecks() in the R console.
3.3 Command line operations
–> To synchronize functions with the new version!!!! <–
3.5 Perform data checks
bdchecks contains a dataset on bats named dataBats.
To perform all data checks, pass a dataset to performDataCheck. The bundled example dataset is bdchecks::dataBats; replace it with the name of your own dataset.
3.6 Review performed checks
See which data checks were performed:
Review the data check results (the percentage of records that passed, failed, or have missing data):
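The percentages mentioned above can be sketched in base R as follows. This is not the bdchecks output format: it assumes, for illustration only, that a single check yields one logical value per record (TRUE for passed, FALSE for failed, NA for missing data), and summarise_check is a hypothetical helper.

```r
# Hypothetical result of one data check: TRUE = passed, FALSE = failed,
# NA = the record lacked the data needed to run the check.
check_result <- c(TRUE, TRUE, FALSE, NA, TRUE, FALSE, NA, TRUE)

# Hypothetical helper: percentage of records in each outcome class.
summarise_check <- function(x) {
  n <- length(x)
  c(
    passed  = 100 * sum(x %in% TRUE) / n,   # NA %in% TRUE is FALSE
    failed  = 100 * sum(x %in% FALSE) / n,
    missing = 100 * sum(is.na(x)) / n
  )
}

summary_pct <- summarise_check(check_result)
```

Using `%in%` rather than `==` keeps missing values out of the passed and failed counts, so the three percentages always add up to 100.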
3.6.1 Filtering your data
[ TBA ]
3.7 Data checks infrastructure
[Write short explanation + give a link to the relevant section in the developer-guide]
3.7.1 Data checks YAML file
[Move section to dev-guide; add Shiny management app; explain unit testing framework + synchronization system]
The YAML file holds the code and metadata of all data checks. The checks are derived from a core suite of tests and assertions being developed by TDWG’s Biodiversity Data Quality Task Group 2 (Data Quality Tests and Assertions). More information and links can be found in the Learn more section.
3.7.2 Data check example
DC_b23110e7-1be7-444a-a677-cdee0cf4330c:
  name: countryMismatch
  meta:
    Description:
      Main: Check if given country match given country code.
      InputQuestion: Does country and country code match?
      Example:
        Fail: Country name (dwc:country) and ISO country code (dwc:countryCode) do
          not match
        Pass: Country name (dwc:country) and ISO country code (dwc:countryCode) match
        InputFail: country=Australia, countryCode=4
        InputPass: country=Australia, countryCode=AU
        OutputFail: Failed
        OutputPass: Passed
      Resolution:
        Record: SingleRecord
        Term: MultiTerm
      DarwinCoreClass: Location
      Keywords: location,iso,country
      guid: b23110e7-1be7-444a-a677-cdee0cf4330c
    Flags:
      Severity: Warning
      Warning: Inconsistent
      Output: Validation
      Dimension: Consistency
    Pseudocode: |
      get.Country($countryCode) == $country
    Source:
      Reference:
      CreatedBy: Povilas Gibas
      MaintainedBy: Povilas Gibas
      CreationDate: 2018-06-27
      ModificationDate: 2018-06-27
      ModificationHist:
  Input:
    Target: country,countryCode
    Dependency:
      DependencyType: Internal
      DataChecks:
      Rpackages: rgbif
      Data: isocodes$name,isocodes$code
  Functionality: |
    FUNC <- function() {
      result <- sapply(seq_along(TARGET1), function(i) {
        if (is.na(TARGET1[i]) | is.na(TARGET2[i])) {
          NA
        } else {
          which(DEPEND1 == TARGET1[i]) == which(DEPEND2 == TARGET2[i])
        }
      })
      result <- unlist(result)
      return(result)
    }
To replace with the new structure!!!
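The logic in the Functionality field of the example can be tried outside the package with the base R sketch below. It is a standalone illustration, not the bdchecks implementation: the three-row isocodes table is a hypothetical stand-in for the real lookup data, country_mismatch is a hypothetical name, and the function takes its targets as arguments instead of the TARGET and DEPEND placeholders. It also swaps which() for match(), which returns NA for values absent from the lookup table, so the result always has one element per record; treating such values as failures is a choice made for this sketch.

```r
# Hypothetical stand-in for the ISO lookup table
# (isocodes$name, isocodes$code in the example).
isocodes <- data.frame(
  name = c("Australia", "Lithuania", "Israel"),
  code = c("AU", "LT", "IL"),
  stringsAsFactors = FALSE
)

# Hypothetical function name; mirrors the countryMismatch logic.
country_mismatch <- function(country, country_code, lookup = isocodes) {
  # Row of the lookup table matched by each country name and each code
  # (NA when the value is not in the table).
  name_row <- match(country, lookup$name)
  code_row <- match(country_code, lookup$code)

  # Pass only when both values are found and point to the same row.
  result <- !is.na(name_row) & !is.na(code_row) & name_row == code_row

  # Missing input cannot be checked, so report NA rather than a failure.
  result[is.na(country) | is.na(country_code)] <- NA
  result
}

check <- country_mismatch(
  country      = c("Australia", "Australia", "Israel", NA),
  country_code = c("AU", "4", "LT", "AU")
)
```

Here the first record passes, the second fails because "4" is not a known code (the InputFail case of the example), the third fails because name and code belong to different countries, and the fourth is NA because the country is missing.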
3.7.3 Manage your own data checks
To encourage or not to encourage mmm…
After adding, removing, or editing data checks in the YAML file, you can load them into R using the getDC() function.
You can also export data checks from your YAML file to .rda and roxygen2 comments.