Bats face many challenges; making the most of data helps. © Steve Markham

Web site last updated 07 March 2024
An extended book version of these pages is available here.

Introduction

These web pages complement the data science chapter in the 4th Edition of BCT’s Bat Survey Guidelines, and hopefully encourage ecologists to make the most of their bat survey data. They also demonstrate literate programming with Quarto® [1] and R Markdown [2], which can vastly improve workflow [3] (welcome to the world beyond Excel).

The term data science is used because it covers data collection, management, processing, analysis, visualisation, interpretation, reporting and reproducibility. Statisticians would state this is what they have always done in statistics! There is no doubt that data science is growing: most major universities now offer a degree course in the subject and, together with the increasing power of computer algorithms, this makes data science more than just a rebranding of statistics (Donoho 2017).

The data science is applied through literate programming, outlined in Figure 1. This enables efficient reporting of bat data [4], from a simple table, such as a count of bats, to the output and interpretation of machine learning in a fully formatted report, plus everything in between, all accomplished through open source R [5] (R Core Team 2023) and RStudio (Posit team 2022). The beauty of literate programming is reproducibility, an essential tenet of all scientific study; in the commercial and legal world it makes for defensible reporting. The Reporting page has literate programming examples for a Word report and PowerPoint presentation.
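As a minimal sketch of the "simple table" end of that range, the base R chunk below counts bat passes per species and builds a sentence from the result, which is the kind of step a literate programming document would run each time the report is rendered. The data frame `bat_passes` and its contents are hypothetical, made up for illustration; they are not taken from the iBats package or from any real survey.

```r
# Hypothetical example data: one row per recorded bat pass (not real survey data)
bat_passes <- data.frame(
  Species = c("Pipistrellus pipistrellus", "Pipistrellus pipistrellus",
              "Myotis nattereri", "Nyctalus noctula",
              "Pipistrellus pipistrellus", "Myotis nattereri"),
  stringsAsFactors = FALSE
)

# Count passes per species
counts <- as.data.frame(table(Species = bat_passes$Species),
                        responseName = "Passes",
                        stringsAsFactors = FALSE)

# Sort from the most to the least recorded species
counts <- counts[order(-counts$Passes), ]
rownames(counts) <- NULL
print(counts)

# Build a sentence directly from the data, so the text cannot drift
# away from the numbers when the data change
top <- counts[1, ]
summary_sentence <- sprintf(
  "%s was the most recorded species (%d of %d passes).",
  top$Species, top$Passes, sum(counts$Passes)
)
print(summary_sentence)
```

Because both the table and the sentence are computed from `bat_passes`, re-running the chunk on updated data refreshes them together, with no copying and pasting.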

Figure 1: Literate Programming

Much is said about the digital skills gap [6]; in a small way, these data science pages aim to improve digital skills by demonstrating modern data science methods [7]. For a balanced understanding of the link between digital skills and data science see the Royal Statistical Society article.

You may ask what’s wrong with the spreadsheet for data science? On a practical level, spreadsheets are hard to maintain, it is hard to find errors in them or even to see that there was an error in the first place, they are poor at handling dates [8], and they are difficult to share with others. For spreadsheet blunders listen to Tim Harford’s More or Less on BBC Sounds [9]; for a litany of mathematical mistakes, many involving spreadsheets, see Matt Parker’s book Humble Pi: A Comedy of Maths Errors (Parker 2019). On a positive note, spreadsheets are handy and easy to use for a few lines of data.
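To illustrate the point about dates, the sketch below shows one way R can be made to treat dates explicitly: entries stay as text until they are converted with a stated format, and anything that does not match becomes `NA` instead of a guessed date. The vector `entries` is hypothetical example input, and the day/month/year format is an assumption about how the dates were typed.

```r
# Hypothetical entries as they might be typed into a survey sheet
entries <- c("1/1", "01/06/2023", "15/07/2023")

# Read as text, nothing is converted silently
print(class(entries))

# Convert only with an explicit day/month/year format;
# entries that do not match the format become NA rather than a guessed date
dates <- as.Date(entries, format = "%d/%m/%Y")
print(dates)

# Flag the entries that need checking by hand
needs_checking <- entries[is.na(dates)]
print(needs_checking)
```

The incomplete entry "1/1" is flagged for checking instead of being turned into a date, so the problem is visible at the point of import.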

Getting Started

To help ecologists on their data science journey, all the code making the graphs and analysis in these web pages is free to copy and use; just click on Show the code, copy to the clipboard [10], paste into the R environment and run. If new to R and RStudio, see the Install R, RStudio and Packages section.

A Show the code example is given below; it produces Figure 2. The code copied to the clipboard is designed to run as a standalone chunk (or R script) [11]; it loads the required R libraries and data.

Code
### Libraries Used
library(tidyverse) # Data Science packages - see https://www.tidyverse.org/
library(treemapify) # extension to ggplot for plotting treemaps -
# see https://cran.r-project.org/web/packages/treemapify/vignettes/introduction-to-treemapify.html
library(ggthemes) # for colour palette "Tableau 10"

# Install devtools if not installed
# devtools is used to install the iBats package from GitHub
if (!require(devtools)) {
  install.packages("devtools")
}

# If iBats not installed load from Github
if (!require(iBats)) {
  devtools::install_github("Nattereri/iBats")
}
library(iBats)
###

# Add date and time information to the iBats statics bat survey data set using iBats::date_time_info()
statics_plus <- iBats::date_time_info(statics)

graph_data <- statics_plus %>%
  group_by(Species, Month) %>%
  tally()

ggplot(graph_data, aes(area = n, fill = Month, label = Species, subgroup = Month)) +
  scale_fill_tableau(palette = "Tableau 10") + # "Tableau 10" colour palette from ggthemes
  geom_treemap(colour = "white", size = 2, alpha = 0.9) +
  geom_treemap_subgroup_border(colour = "black", size = 5, alpha = 0.9) +
  geom_treemap_subgroup_text(place = "centre", grow = TRUE, alpha = 0.9, colour = "grey20", min.size = 0) +
  geom_treemap_text(colour = "grey90", place = "topleft", fontface = "italic", reflow = TRUE, min.size = 0, alpha = 0.9) +
  theme_bw() +
  theme(legend.position = "none") # No legend

Figure 2: Example Graph: Monthly Bat Activity from the statics data set in the iBats Package
Coding Tip

Rather than write code from scratch, adapt working code to your own purposes.
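As a sketch of what adapting working code can look like, the chunk below reuses the `group_by()` and `tally()` step from the example above and changes only the grouping, counting passes per species per night instead of per month. The data frame `passes` and its `Night` column are hypothetical, made up for illustration; with real data the column names would need to match your own data set.

```r
library(dplyr)

# Hypothetical example data: one row per recorded bat pass (not real survey data)
passes <- data.frame(
  Species = c("Pipistrellus pipistrellus", "Pipistrellus pipistrellus",
              "Myotis nattereri", "Pipistrellus pipistrellus",
              "Myotis nattereri", "Nyctalus noctula"),
  Night = as.Date(c("2023-06-01", "2023-06-01", "2023-06-01",
                    "2023-06-02", "2023-06-02", "2023-06-02")),
  stringsAsFactors = FALSE
)

# Same pattern as the treemap example; only the grouping columns differ
nightly <- passes %>%
  group_by(Species, Night) %>%
  tally() %>%
  ungroup()

print(nightly)
```

Only the grouping line differs from the original, which keeps the adapted code easy to check against the version known to work.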

Literate programming facilitates the use of coding languages other than R, such as Python [12] and Julia [13]. Computer languages can be mixed in the same literate programming document; for example, a chunk of R code can do the data manipulation and another chunk of Python code can perform the machine learning. Coding languages applied to data science are developing rapidly in terms of their ability, speed of execution, and user friendliness [14]; literate programming provides the framework for ecologists to keep their data science skills moving forward.

Evidence Led Reporting

Literate programming assists data science and reproducibility, promoting evidence-led reporting and decision making. Reports are often produced for regulatory bodies, central government or local authorities; these organisations have mandatory strategies for the use of science, evidence and evaluation in their advice and actions, and the legality of their decisions.

Install R, RStudio and Packages

  1. Download and install the latest version of R for your operating system; R can be downloaded for Windows, Mac and Linux. The Windows installer is at https://cran.r-project.org/bin/windows/base/.
  2. It is recommended R is used through the RStudio IDE. Download and install the latest version of RStudio from their web page https://www.rstudio.com/products/rstudio/#Desktop. Download the free desktop version.
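Once both are installed, a quick check along the following lines can confirm that R is working. This is a minimal sketch: the test for RStudio relies on the `RSTUDIO` environment variable, which is an assumption about how RStudio sessions identify themselves and is not described on these pages.

```r
# Report the installed R version
print(R.version.string)

# Assumption: RStudio sets the RSTUDIO environment variable to "1" in its sessions
in_rstudio <- identical(Sys.getenv("RSTUDIO"), "1")

if (in_rstudio) {
  message("Running inside RStudio.")
} else {
  message("Not running inside RStudio (or the RSTUDIO variable is not set).")
}
```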

Install the iBats Package from GitHub

The iBats package contains example data and functions that help with the Data Science of bat survey results. To install this package use the code below in the RStudio Console; one line at a time. The package is installed from GitHub.

Code
install.packages("devtools")

devtools::install_github("Nattereri/iBats")
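After installation, a short check such as the sketch below can confirm that the package is available before running the examples on these pages. It uses the `statics` data set referred to in the example code above and only looks at its structure; the variable `ibats_ok` is a hypothetical name introduced for illustration.

```r
# Check whether iBats is installed without stopping with an error if it is not
if (requireNamespace("iBats", quietly = TRUE)) {
  # Show the installed version and the structure of the example data set
  print(utils::packageVersion("iBats"))
  str(iBats::statics)
  ibats_ok <- TRUE
} else {
  message("iBats is not installed yet; run the installation code above.")
  ibats_ok <- FALSE
}
```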

Acknowledgements

Free and Open Source Software (FOSS) constitutes 70-90% of any modern software solution [15]. R and RStudio are open source software that have made data science more open, intuitive, accessible, and collaborative. As a Public Good [16], the value of FOSS is yet to be fully recognised. FOSS is provided by a large community, without whom these web pages would not have been written; some of this community are acknowledged as individuals in the references section of the Resources page.

References

Donoho, David. 2017. “50 Years of Data Science.” Journal of Computational and Graphical Statistics 26 (4): 745–66. https://doi.org/10.1080/10618600.2017.1384734.
Parker, Matt. 2019. Humble Pi: A Comedy of Maths Errors. Penguin.
Posit team. 2022. RStudio: Integrated Development Environment for R. Boston, MA: Posit Software, PBC. http://www.posit.co/.
R Core Team. 2023. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Footnotes

  1. Quarto® is an open-source scientific and technical publishing system: see https://quarto.org/.

  2. R Markdown: see https://rmarkdown.rstudio.com/.

  3. For example, literate programming eliminates the countless copy-and-paste steps of a traditional reporting workflow.

  4. As exported from sound analysis software.

  5. Windows: https://cran.r-project.org/bin/windows/base/; Mac (Intel & ARM): https://cran.r-project.org/bin/macosx/

  6. The UK government has quantified the UK Data Skills Gap: see https://www.gov.uk/government/publications/quantifying-the-uk-data-skills-gap/quantifying-the-uk-data-skills-gap-full-report.

  7. The data skills gap is relevant to professional bodies, such as the Chartered Institute of Ecology and Environmental Management (CIEEM), a leading institute for professional ecologists; their competency framework, which members are required to fulfil, makes no mention of statistics or data science.

  8. Excel will convert a data entry into a date even if it is not one, e.g. an entry of “1/1” or “1-1” would return “01-Jan”!

  9. More or Less (Spreadsheet disasters) was released by the World Service on 11 Feb 2023 and is available for over a year.

  10. The clipboard icon is in the top right-hand corner of the code window.

  11. Many R scripts are required in applying literate programming to bat data science; these are best organised through Quarto or R Markdown documents where the R scripts form code chunks.

  12. https://www.python.org/

  13. https://julialang.org/

  14. Julia has a language syntax similar to Python, runs fast, has a statistical library like R and linear programming skills similar to MATLAB.

  15. https://www.linuxfoundation.org/blog/blog/a-summary-of-census-ii-open-source-software-application-libraries-the-world-depends-on

  16. A commodity or service that is provided without profit to all members of a society, either by the government or by a private individual or organization.