Resources

There are many approaches to Bat Data Science; this section lists the resources used to make these web pages. The references provide the background to the example code and, together with the web pages, show how it can be extended or adapted to generate your own reports on bat surveys. All of this can be undertaken with the R statistical programming language (R Core Team 2023) through RStudio (Posit team 2022); the software and data used are open source.

A prime resource for learning Bat Data Science with R is the collection of online books[1]; a comprehensive guide to these books and other R resources is the Big Book of R by Oscar Baruffa. This page references the online books, packages, websites and other resources with a focus on Bat Data Science:

1 General

R for Data Science (R4DS), now in its 2nd edition, is an excellent overview of data science with R. It introduces the tidyverse, a collection of packages providing essential data science tools (Wickham et al. 2019; Wickham 2023); many of these individual packages are referenced below. The tidyverse packages have been widely adopted by R data scientists, and all of them share an underlying design philosophy, grammar, and data structures.

There are many free and other learning resources online; well structured courses are:

Other references:

Modern Data Science with R by Benjamin S. Baumer, Daniel T. Kaplan, and Nicholas J. Horton; a comprehensive guide to data science with R.

For an understanding of the Data Science versus Statistics debate (many argue they are the same), see David Donoho's paper 50 Years of Data Science (Donoho 2017).

To understand the link between digital skills and data science, see the Royal Statistical Society article.

For why spreadsheets aren’t great for data science, listen to Tim Harford’s More or Less on BBC Sounds. For a litany of mathematical mistakes, many involving spreadsheets, see Matt Parker’s book Humble Pi: A Comedy of Maths Errors (Parker 2019).

A comparison between R and Excel for data wrangling, conveying the advantages of R, has been published by Jumping Rivers. Interestingly, their blog also has a post on learning Excel as an R user; a good read for Excel users.

2 Tidy Data

Getting data into R from CSV and Excel files can be done with the readr (Wickham, Hester, and Bryan 2023) and readxl (Wickham and Bryan 2023) packages respectively. See also Data import in R4DS.

Once loaded into the R environment, the data is stored as a tibble (Müller and Wickham 2023). A tibble is tabulated data, in R terms a simplified data frame, which makes working in the tidyverse a little easier.
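As a minimal sketch of the import step, the following reads a small CSV with readr and confirms that the result is a tibble. The survey records and column names (date, species, passes) are hypothetical and are inlined so the example is self-contained; a real survey would pass a file path to read_csv(), or use readxl::read_excel() for a workbook.

```r
library(readr)
library(tibble)

# Hypothetical bat pass records, inlined instead of being read from a file
csv_text <- "date,species,passes
2023-06-01,Pipistrellus pipistrellus,12
2023-06-01,Myotis daubentonii,3
2023-06-02,Pipistrellus pipistrellus,7"

# I() marks the string as literal data rather than a file path
bats <- read_csv(I(csv_text), show_col_types = FALSE)

is_tibble(bats)   # the imported data is a tibble
class(bats$date)  # readr guesses each column type from its values
```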

Data wrangling is made easy with functions from the dplyr (Wickham, François, et al. 2023) and tidyr (Wickham, Vaughan, and Girlich 2023) packages. See also Tidy Data and Data transformation in R4DS.
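A short sketch of typical wrangling, assuming hypothetical columns night, species and passes: dplyr totals the passes per night and species, and tidyr reshapes the result into one column per species for a report table.

```r
library(dplyr)
library(tidyr)

# Hypothetical records: one row per detector file
bats <- tibble::tribble(
  ~night,       ~species,                    ~passes,
  "2023-06-01", "Pipistrellus pipistrellus", 12,
  "2023-06-01", "Pipistrellus pipistrellus",  5,
  "2023-06-01", "Myotis daubentonii",         3,
  "2023-06-02", "Pipistrellus pipistrellus",  7
)

# Total passes per night and species (long, tidy form)
nightly <- bats %>%
  group_by(night, species) %>%
  summarise(passes = sum(passes), .groups = "drop")

# One column per species; species absent on a night are filled with 0
wide <- nightly %>%
  pivot_wider(names_from = species, values_from = passes, values_fill = 0)

wide
```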

Text was manipulated with the stringr (Wickham 2022) package. See also Strings in R4DS.

The philosophy of tidy data is described by Wickham (2014) and extended to missing data by Tierney and Cook (2023).
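For illustration, a few stringr functions applied to inconsistently typed species labels; the labels themselves are made up for this sketch.

```r
library(stringr)

# Hypothetical, inconsistently typed species labels
labels <- c(" pipistrellus  pipistrellus", "Myotis daubentonii ", "NYCTALUS noctula")

# Remove stray whitespace, then capitalise only the first letter
clean <- str_to_sentence(str_squish(labels))

# Take the first word of each name as the genus
genus <- str_extract(clean, "^\\w+")

# Flag the Myotis records
is_myotis <- str_detect(clean, "^Myotis")
```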

Data validation is made effective through the validate (van der Loo and de Jonge 2022) package.
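A minimal sketch of rule-based checking with validate, assuming a hypothetical data frame in which pass counts must be non-negative and every record must name a species. The rules are declared once with validator() and applied with confront().

```r
library(validate)

# Hypothetical records containing two deliberate errors
bats <- data.frame(
  species = c("Pipistrellus pipistrellus", "Myotis daubentonii", NA),
  passes  = c(12, -1, 4)
)

# Declare the rules the data should satisfy
rules <- validator(
  passes >= 0,
  !is.na(species)
)

# Apply the rules and tabulate passes and fails per rule
checked <- confront(bats, rules)
result <- summary(checked)
result
```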

3 Meta Data

Computation with dates and times can be accomplished with the lubridate (Spinu, Grolemund, and Wickham 2023) package. See also Dates and times in R4DS.

Sunrise and sunset times can be obtained with the suncalc (Thieurmel and Elmarhraoui 2022) package.
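As an illustration, lubridate can parse a detector timestamp and pull out its components. The timestamp and its day/month/year layout are assumptions made for this sketch.

```r
library(lubridate)

# Hypothetical detector timestamp in day/month/year order, local UK time
stamp <- dmy_hms("01/06/2023 21:45:10", tz = "Europe/London")

hour(stamp)            # hour of the night the pass was recorded
as_date(stamp)         # calendar date of the recording
stamp + minutes(30)    # arithmetic with time periods
with_tz(stamp, "UTC")  # the same instant expressed in UTC
```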
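A sketch of how sun times might be used in a bat survey: getSunlightTimes() returns sunrise and sunset for a date and location, from which the time of a bat pass relative to sunset can be calculated. The site coordinates and the time of the first pass are hypothetical.

```r
library(suncalc)

# Hypothetical survey site (approximate coordinates for central London)
sun <- getSunlightTimes(
  date = as.Date("2023-06-01"),
  lat  = 51.5,
  lon  = -0.12,
  keep = c("sunrise", "sunset"),
  tz   = "Europe/London"
)

# Minutes between sunset and a hypothetical first bat pass
first_pass <- as.POSIXct("2023-06-01 21:45:10", tz = "Europe/London")
mins_after_sunset <- as.numeric(difftime(first_pass, sun$sunset, units = "mins"))
mins_after_sunset
```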

The hms (Müller 2022) package provides a simple class for storing durations or time-of-day values and displaying them in the hh:mm:ss format.
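For example, a time of day can be stored without a date, which is convenient when comparing activity across nights. The value is hypothetical.

```r
library(hms)

# Hypothetical time of a bat pass, stored without a date
pass_time <- as_hms("21:45:10")

format(pass_time)      # displayed as hh:mm:ss
as.numeric(pass_time)  # stored internally as seconds since midnight
```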

The rnrfa (Vitolo 2022; Vitolo et al. 2016) package has a useful function, osg_parse(), for converting British National Grid (BNG) references to latitude and longitude in the WGS84 (Google Earth) coordinate system (EPSG code: 4326).
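A minimal sketch of the conversion, using an arbitrary six-figure grid reference. It assumes that osg_parse() returns a list with easting and northing by default, and lon and lat when coord_system = "WGS84"; check the package documentation for the installed version.

```r
library(rnrfa)

# Arbitrary example grid reference (not a real survey site)
grid_ref <- "TQ722213"

# Default output: easting and northing in metres on the British National Grid
bng <- osg_parse(grid_refs = grid_ref)

# Longitude and latitude in WGS84 (EPSG:4326)
wgs <- osg_parse(grid_refs = grid_ref, coord_system = "WGS84")

bng
wgs
```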

4 Aggregation

Tables have been produced with the gt (Iannone et al. 2022), gtExtras (Mock 2022) and flextable (Gohel and Skintzos 2023) packages.

The broman (Broman 2022) package provided some useful R functions.

The glue (Hester and Bryan 2022) package allows variables to be passed directly into strings.
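For instance, values computed in R can be dropped straight into a sentence for a report; the variable names and values here are illustrative.

```r
library(glue)

# Illustrative values that would normally come from the survey data
species  <- "Pipistrellus pipistrellus"
n_passes <- 17

# Expressions inside {} are evaluated and inserted into the string
msg <- glue("{n_passes} passes of {species} were recorded.")
msg
```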

5 Visualisation

The graphics have been produced using the R package ggplot2 (Wickham, Chang, et al. 2023).
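A minimal ggplot2 sketch, assuming a hypothetical summary table of passes per species. A plot is built in layers: data, aesthetic mappings, a geometry, then labels and a theme.

```r
library(ggplot2)

# Hypothetical summary of bat passes per species
bats <- data.frame(
  species = c("Pipistrellus pipistrellus", "Myotis daubentonii", "Nyctalus noctula"),
  passes  = c(24, 3, 6)
)

# Horizontal bars keep long species names readable
p <- ggplot(bats, aes(x = passes, y = species)) +
  geom_col() +
  labs(x = "Bat passes", y = NULL, title = "Bat passes by species") +
  theme_minimal()

p
```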

There are many packages that extend ggplot’s capability:

Online books:

  • ggplot2 by Hadley Wickham, Danielle Navarro, and Thomas Lin Pedersen; helps understand how ggplot works, giving the power to tailor any plot specifically.
  • Fundamentals of Data Visualization by Claus O. Wilke; aims to provide a guide to making visualizations that reflect the data, tell a story, and look professional.

See also Graphics for communication in R4DS.

Colour can play a large part in visualisation and colours are easily misused; for an understanding of the issues see the paper Misuse of Colour in Science Communication (Crameri, Shephard, and Heron 2020).

The UK civil servants working in government analysis have produced constructive guidance on data visualisation through charts.

6 Maps

The excellent online book Geocomputation with R, by Robin Lovelace, Jakub Nowosad and Jannes Muenchow, teaches a range of spatial skills, including: reading, writing and manipulating geographic data; making static and interactive maps; applying geocomputation to solve real-world problems; and modelling geographic phenomena.

sf (Pebesma 2018, 2023) provides support for simple features, a standardized way to encode spatial vector data[2].

ggspatial (Dunnington 2022) allows spatial data to be plotted with the power of ggplot2. It also gives access to OpenStreetMap tiles.
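As a sketch of working with simple features, a point recorded in British National Grid coordinates (EPSG:27700) can be turned into an sf object and reprojected to WGS84 (EPSG:4326). The site name and coordinates are hypothetical.

```r
library(sf)

# Hypothetical site with British National Grid easting and northing in metres
roost <- data.frame(site = "Example roost", easting = 572200, northing = 121300)

# Build a simple features object, declaring the coordinate reference system
roost_bng <- st_as_sf(roost, coords = c("easting", "northing"), crs = 27700)

# Reproject to longitude and latitude
roost_wgs <- st_transform(roost_bng, crs = 4326)

st_coordinates(roost_wgs)  # X is longitude, Y is latitude
```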

osmdata (Padgham et al. 2023) is an R package for downloading and using data from OpenStreetMap (OSM). Unlike the ggspatial package, which facilitates the download of raster tiles, osmdata provides access to the vector data underlying OSM.

elevatr (Hollister 2022) a package for accessing elevation data from various sources.

terra (Hijmans 2023b) a package of methods for spatial data analysis with vector (points, lines, polygons) and raster (grid) data.

tanaka (Giraud 2022) a package that performs the Tanaka method, enhancing the representation of topography on a map using shaded contour lines.

metR (Campitelli 2022) a package with several functions and utilities that make R better for handling meteorological data; used here for contour plots.

raster (Hijmans 2023a) a package for reading, writing, manipulating, analyzing and modeling of spatial data.

rnaturalearth (Massicotte and South 2023) a package with Natural Earth data including world and country maps.

7 Statistics

R, specifically base R (R Core Team 2023), is a comprehensive software environment for statistical computing and graphics.

Summary statistics have been produced with the mosaic (Pruim, Kaplan, and Horton 2022) package.

broom (Robinson, Hayes, and Couch 2023) a package that takes the messy output of built-in functions in R and turns them into tidy tibbles; these can be easily tabulated.
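To illustrate, the output of a linear model can be converted into tibbles that are ready for tabulation. The built-in mtcars data set is used here purely as a stand-in for survey data.

```r
library(broom)

# Fit a simple linear model on a built-in data set (a stand-in for survey data)
fit <- lm(mpg ~ wt, data = mtcars)

# One row per model term: estimate, standard error, statistic, p-value
coefs <- tidy(fit)

# One row per model: R-squared, AIC and other fit statistics
fit_stats <- glance(fit)

coefs
```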

dunn.test (Dinno 2017) a package that performs Dunn’s test of multiple comparisons using rank sums.

infer (Bray et al. 2022) a package for statistical inference that coheres with the tidyverse design framework, for example bootstrapping.
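A sketch of a bootstrap confidence interval for mean nightly passes with infer; the ten counts are invented for the example. The pipeline reads as: specify the variable, generate resamples, calculate the statistic, then summarise.

```r
library(infer)

set.seed(2023)  # make the resampling reproducible

# Hypothetical nightly pass counts
bats <- data.frame(passes = c(12, 5, 3, 7, 9, 15, 4, 8, 11, 6))

# Bootstrap distribution of the mean
boot <- bats %>%
  specify(response = passes) %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "mean")

# 95% percentile confidence interval for the mean
ci <- get_confidence_interval(boot, level = 0.95, type = "percentile")
ci
```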

vegan (Oksanen et al. 2022) a package of ordination methods, diversity analysis and other functions for community and vegetation ecologists.
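As an example of diversity analysis, species richness and the Shannon index can be computed from a site-by-species matrix of counts. The sites and counts are hypothetical.

```r
library(vegan)

# Hypothetical pass counts: rows are sites, columns are species
counts <- matrix(
  c(12, 3, 0,
     7, 7, 7),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    c("Site A", "Site B"),
    c("P. pipistrellus", "M. daubentonii", "N. noctula")
  )
)

richness <- specnumber(counts)                  # number of species per site
shannon  <- diversity(counts, index = "shannon") # Shannon diversity per site

richness
shannon
```

Site B has three equally abundant species, so its Shannon index equals log(3), the maximum for three species.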

Online books:

Introduction to Modern Statistics by Mine Çetinkaya-Rundel and Johanna Hardin; a contemporary guide to statistical thinking and methods.

Statistical Inference via Data Science: A ModernDive into R and the Tidyverse by Chester Ismay and Albert Y. Kim.

Modern Statistics with R by Måns Thulin; covers everything from wrangling and exploring data to inference and predictive modelling.

Other references:

The Office for Statistics Regulation, the independent regulatory arm of the UK Statistics Authority, has produced two key reference documents that are relevant to data scientists who publish in the public domain[3].

8 Reporting

Reports can be produced through literate programming (Knuth 1984) with R Markdown (Allaire et al. 2023; Xie, Allaire, and Grolemund 2018; Xie, Dervieux, and Riederer 2020) and Quarto®. To use Quarto with R, the rmarkdown R package should be installed; installing rmarkdown also installs the knitr package (Xie 2014, 2015, 2023), which ensures that documents containing R code render.

Rendering reports into Microsoft Word or PowerPoint can be greatly enhanced by:

officedown (Gohel and Ross 2023) a package facilitating the formatting of Microsoft Word documents produced by R Markdown.

officer (Gohel 2023) a package that lets R users manipulate Word (.docx) and PowerPoint (.pptx) documents.

Online books:

officeverse by David Gohel; reporting from R with the packages officer, officedown and flextable.

R Markdown Cookbook by Yihui Xie, Christophe Dervieux, Emily Riederer; a book designed to provide a range of examples on how to extend the functionality of R Markdown documents.

R Markdown: The Definitive Guide by Yihui Xie, J. J. Allaire, Garrett Grolemund; details the large number of tasks that you could do with R Markdown.

A special mention should go to John MacFarlane, who created Pandoc, a tool that converts Markdown and R Markdown documents (and many other types of document) to a large variety of output formats.

Online videos:

R Markdown Advanced Tips to Become a Better Data Scientist… | With Tom Mock

Welcome to Quarto Workshop! | Led by Tom Mock, RStudio

9 Interactive

leaflet (Cheng, Karambelkar, and Xie 2023) one of the most popular open-source JavaScript libraries for interactive maps.
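A minimal interactive map sketch with leaflet: a base layer of OpenStreetMap tiles with a single marker. The marker location and popup text are hypothetical.

```r
library(leaflet)

# Hypothetical detector location (longitude and latitude in WGS84)
m <- leaflet() %>%
  addTiles() %>%                        # default OpenStreetMap base layer
  addMarkers(lng = 0.45, lat = 50.96,
             popup = "Example detector location")

m  # prints the widget in RStudio or a rendered HTML report
```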

plotly (Sievert et al. 2022) R graphing library that makes interactive, publication-quality graphs.

DT (Xie, Cheng, and Tan 2023) DataTables displaying R matrices or data frames as interactive HTML tables that support filtering, pagination, and sorting.

reactable (Lin 2023) interactive data tables for R.

See also htmlwidgets for R

References

Allaire, JJ, Yihui Xie, Jonathan McPherson, Javier Luraschi, Kevin Ushey, Aron Atkins, Hadley Wickham, Joe Cheng, Winston Chang, and Richard Iannone. 2023. Rmarkdown: Dynamic Documents for R. https://CRAN.R-project.org/package=rmarkdown.
Arnold, Jeffrey B. 2021. Ggthemes: Extra Themes, Scales and Geoms for Ggplot2. https://github.com/jrnold/ggthemes.
Bray, Andrew, Chester Ismay, Evgeni Chasnovski, Simon Couch, Ben Baumer, and Mine Cetinkaya-Rundel. 2022. Infer: Tidy Statistical Inference. https://CRAN.R-project.org/package=infer.
Broman, Karl W. 2022. Broman: Karl Broman’s R Code. https://github.com/kbroman/broman.
Campitelli, Elio. 2022. metR: Tools for Easier Analysis of Meteorological Fields. https://github.com/eliocamp/metR.
Cheng, Joe, Bhaskar Karambelkar, and Yihui Xie. 2023. Leaflet: Create Interactive Web Maps with the JavaScript Leaflet Library. https://rstudio.github.io/leaflet/.
Crameri, Fabio, Grace E Shephard, and Philip J Heron. 2020. “The Misuse of Colour in Science Communication.” Nature Communications 11 (5444). https://doi.org/10.1038/s41467-020-19160-7.
Dinno, Alexis. 2017. Dunn.test: Dunn’s Test of Multiple Comparisons Using Rank Sums. https://CRAN.R-project.org/package=dunn.test.
Donoho, David. 2017. “50 Years of Data Science.” Journal of Computational and Graphical Statistics 26 (4): 745–66. https://doi.org/10.1080/10618600.2017.1384734.
Dunnington, Dewey. 2022. Ggspatial: Spatial Data Framework for Ggplot2. https://CRAN.R-project.org/package=ggspatial.
Giraud, Timothée. 2022. Tanaka: Design Shaded Contour Lines (or Tanaka) Maps. https://github.com/riatelab/tanaka/.
Gohel, David. 2023. Officer: Manipulation of Microsoft Word and PowerPoint Documents. https://CRAN.R-project.org/package=officer.
Gohel, David, and Noam Ross. 2023. Officedown: Enhanced R Markdown Format for Word and PowerPoint. https://CRAN.R-project.org/package=officedown.
Gohel, David, and Panagiotis Skintzos. 2023. Flextable: Functions for Tabular Reporting. https://CRAN.R-project.org/package=flextable.
Hester, Jim, and Jennifer Bryan. 2022. Glue: Interpreted String Literals. https://CRAN.R-project.org/package=glue.
Hijmans, Robert J. 2023a. Raster: Geographic Data Analysis and Modeling. https://rspatial.org/raster.
———. 2023b. Terra: Spatial Data Analysis. https://rspatial.org/terra/.
Hollister, Jeffrey. 2022. Elevatr: Access Elevation Data from Various APIs. https://github.com/jhollist/elevatr/.
Iannone, Richard, Joe Cheng, Barret Schloerke, Ellis Hughes, and JooYoung Seo. 2022. Gt: Easily Create Presentation-Ready Display Tables. https://CRAN.R-project.org/package=gt.
Knuth, Donald E. 1984. “Literate Programming.” Comput. J. 27 (2): 97–111. https://doi.org/10.1093/comjnl/27.2.97.
Lin, Greg. 2023. Reactable: Interactive Data Tables for R. https://CRAN.R-project.org/package=reactable.
Massicotte, Philippe, and Andy South. 2023. Rnaturalearth: World Map Data from Natural Earth. https://docs.ropensci.org/rnaturalearth/ https://github.com/ropensci/rnaturalearth.
Mock, Thomas. 2022. gtExtras: Extending Gt for Beautiful HTML Tables. https://CRAN.R-project.org/package=gtExtras.
Müller, Kirill. 2022. Hms: Pretty Time of Day. https://CRAN.R-project.org/package=hms.
Müller, Kirill, and Hadley Wickham. 2023. Tibble: Simple Data Frames. https://CRAN.R-project.org/package=tibble.
Oksanen, Jari, Gavin L. Simpson, F. Guillaume Blanchet, Roeland Kindt, Pierre Legendre, Peter R. Minchin, R. B. O’Hara, et al. 2022. Vegan: Community Ecology Package. https://github.com/vegandevs/vegan.
Padgham, Mark, Bob Rudis, Robin Lovelace, Maëlle Salmon, and Joan Maspons. 2023. Osmdata: Import OpenStreetMap Data as Simple Features or Spatial Objects. https://CRAN.R-project.org/package=osmdata.
Parker, Matt. 2019. Humble Pi: A Comedy of Maths Errors. Penguin.
Pebesma, Edzer. 2018. “Simple Features for R: Standardized Support for Spatial Vector Data.” The R Journal 10 (1): 439–46. https://doi.org/10.32614/RJ-2018-009.
———. 2023. Sf: Simple Features for R. https://CRAN.R-project.org/package=sf.
Posit team. 2022. RStudio: Integrated Development Environment for R. Boston, MA: Posit Software, PBC. http://www.posit.co/.
Pruim, Randall, Daniel T. Kaplan, and Nicholas J. Horton. 2022. Mosaic: Project MOSAIC Statistics and Mathematics Teaching Utilities. https://CRAN.R-project.org/package=mosaic.
R Core Team. 2023. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Robinson, David, Alex Hayes, and Simon Couch. 2023. Broom: Convert Statistical Objects into Tidy Tibbles. https://CRAN.R-project.org/package=broom.
Sievert, Carson, Chris Parmer, Toby Hocking, Scott Chamberlain, Karthik Ram, Marianne Corvellec, and Pedro Despouy. 2022. Plotly: Create Interactive Web Graphics via Plotly.js. https://CRAN.R-project.org/package=plotly.
Slowikowski, Kamil. 2023. Ggrepel: Automatically Position Non-Overlapping Text Labels with Ggplot2. https://github.com/slowkow/ggrepel.
Spinu, Vitalie, Garrett Grolemund, and Hadley Wickham. 2023. Lubridate: Make Dealing with Dates a Little Easier. https://CRAN.R-project.org/package=lubridate.
Thieurmel, Benoit, and Achraf Elmarhraoui. 2022. Suncalc: Compute Sun Position, Sunlight Phases, Moon Position and Lunar Phase. https://github.com/datastorm-open/suncalc.
Tierney, Nicholas, and Dianne Cook. 2023. “Expanding Tidy Data Principles to Facilitate Missing Data Exploration, Visualization and Assessment of Imputations.” Journal of Statistical Software 105 (1): 1–31. https://doi.org/10.18637/jss.v105.i07.
van der Loo, Mark, and Edwin de Jonge. 2022. Validate: Data Validation Infrastructure. https://github.com/data-cleaning/validate.
Vitolo, Claudia. 2022. Rnrfa: UK National River Flow Archive Data from R. https://ilapros.github.io/rnrfa/.
Vitolo, Claudia, Matthew Fry, and Wouter Buytaert. 2016. “Rnrfa: An R Package to Retrieve, Filter and Visualize Data from the UK National River Flow Archive.” The R Journal 8 (2): 102–16. https://journal.r-project.org/archive/2016/RJ-2016-036/index.html.
Wickham, Hadley. 2014. “Tidy Data.” The Journal of Statistical Software 59. http://www.jstatsoft.org/v59/i10/.
———. 2022. Stringr: Simple, Consistent Wrappers for Common String Operations. https://CRAN.R-project.org/package=stringr.
———. 2023. Tidyverse: Easily Install and Load the Tidyverse. https://CRAN.R-project.org/package=tidyverse.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.
Wickham, Hadley, and Jennifer Bryan. 2023. Readxl: Read Excel Files. https://CRAN.R-project.org/package=readxl.
Wickham, Hadley, Winston Chang, Lionel Henry, Thomas Lin Pedersen, Kohske Takahashi, Claus Wilke, Kara Woo, Hiroaki Yutani, and Dewey Dunnington. 2023. Ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. https://CRAN.R-project.org/package=ggplot2.
Wickham, Hadley, Romain François, Lionel Henry, Kirill Müller, and Davis Vaughan. 2023. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.
Wickham, Hadley, Jim Hester, and Jennifer Bryan. 2023. Readr: Read Rectangular Text Data. https://CRAN.R-project.org/package=readr.
Wickham, Hadley, and Dana Seidel. 2022. Scales: Scale Functions for Visualization. https://CRAN.R-project.org/package=scales.
Wickham, Hadley, Davis Vaughan, and Maximilian Girlich. 2023. Tidyr: Tidy Messy Data. https://CRAN.R-project.org/package=tidyr.
Xie, Yihui. 2014. “Knitr: A Comprehensive Tool for Reproducible Research in R.” In Implementing Reproducible Computational Research, edited by Victoria Stodden, Friedrich Leisch, and Roger D. Peng. Chapman; Hall/CRC.
———. 2015. Dynamic Documents with R and Knitr. 2nd ed. Boca Raton, Florida: Chapman; Hall/CRC. https://yihui.org/knitr/.
———. 2023. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://yihui.org/knitr/.
Xie, Yihui, J. J. Allaire, and Garrett Grolemund. 2018. R Markdown: The Definitive Guide. Boca Raton, Florida: Chapman; Hall/CRC. https://bookdown.org/yihui/rmarkdown.
Xie, Yihui, Joe Cheng, and Xianying Tan. 2023. DT: A Wrapper of the JavaScript Library DataTables. https://github.com/rstudio/DT.
Xie, Yihui, Christophe Dervieux, and Emily Riederer. 2020. R Markdown Cookbook. Boca Raton, Florida: Chapman; Hall/CRC. https://bookdown.org/yihui/rmarkdown-cookbook.

Footnotes

  1. Nearly all the online reference books on R are created within the R environment, most commonly with R Markdown or Quarto.

  2. See https://en.wikipedia.org/wiki/Simple_Features

  3. In support of a planning application, for example.