Aggregation

Data aggregation is one of the pillars of data analysis, the process where tidy data (plus meta data) is gathered or grouped and then expressed in summary form. Bat survey data tends to be aggregated in two ways:

  1. over a given time period, or
  2. spatially for locations or a factor related to a location (e.g. woodland)

The bat survey data statics in the iBats package has 6,930 bat observations; a random selection of 5 rows is shown in Table 1.

Code
statics %>% # statics is a tidy data set from the iBats package
  sample_n(5) %>% 
  arrange(DateTime) %>% 
  # Table made with gt
  gt() %>% 
  tab_style(
    style = list(
      cell_fill(color = "black"),
      cell_text(color = "white", weight = "bold")
      ),
    locations = cells_column_labels(
      columns = c(everything())
    )
  ) %>% 
  # Make bat scientific name italic
  tab_style(
    style = list(
      cell_text(style = "italic")
      ),
    locations = cells_body(
      columns = c(Species)
    )
  ) %>% 
  tab_options(data_row.padding = px(2)) %>% 
  cols_align(
    align = "left",
    columns = DateTime)
Table 1:

Statics Bat Survey Data (5 random selected rows)

Description DateTime Species Longitude Latitude
Static 3 2016-05-27 22:26:38 Pipistrellus pipistrellus -3.593018 50.33101
Static 4 2016-07-27 02:36:50 Pipistrellus pipistrellus -3.591748 50.33133
Static 4 2016-07-28 04:50:16 Pipistrellus pipistrellus -3.591908 50.33141
Static 4 2016-07-31 22:45:12 Pipistrellus pipistrellus -3.591808 50.33126
Static 4 2016-08-24 21:43:37 Pipistrellus pipistrellus -3.591768 50.33133

1 Count of Bats

The simplest form of aggregation is a count of bats1; as shown in Table 2

Code
# Libraries (Packages) used
library(tidyverse)
library(iBats)
library(gt)

statics %>%
  group_by(Species) %>% 
  count() %>% 
  #arrange descending
  arrange(desc(n)) %>% 
  # rename n as count
  rename(`Bat Species` = Species, Count = n) %>% 
  # so table is produced with individual species on one row
  ungroup() %>% 
  # Table made gt()
  gt() %>% 
  tab_style(
    style = list(
      cell_fill(color = "black"),
      cell_text(color = "white", weight = "bold")
      ),
    locations = cells_column_labels(
      columns = c(everything())
    )
  ) %>% 
  # Make bat scientific name italic
  tab_style(
    style = list(
      cell_text(style = "italic")
      ),
    locations = cells_body(
      columns = c(`Bat Species`)
    )
  ) %>% 
  tab_options(data_row.padding = px(2)) 
Table 2:

A Count by Species

Bat Species Count
Pipistrellus pipistrellus 4972
Myotis spp. 435
Barbastella barbastellus 419
Nyctalus noctula 315
Pipistrellus spp. 283
Rhinolophus ferrumequinum 160
Plecotus spp. 136
Pipistrellus pygmaeus 86
Rhinolophus hipposideros 73
Pipistrellus nathusii 32
Eptesicus serotinus 12
Nyctalus leisleri 3
Myotis nattereri 2
Nyctalus spp. 2

More informative tables can be made by adding meta data such as the Month and Year of the observations. This is easy done on a tidy data conforming to the minimal data requirement with the date_time_info function in the iBats package. Table 3 shows the monthly count (bat passes) of Annex II species2; .

Code
# Libraries (Packages) used
library(tidyverse)
library(iBats)
library(flextable)
library(officer)

# Add data and time information to the statics data using the iBats::date_time_info
statics_plus <- iBats::date_time_info(statics)

# Vector of month names used in the factor function
month_levels_full <- c("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December")

# re-order months column with rev so months run top to bottom of the table
statics_plus <- statics_plus %>%
   mutate(MonthFull = factor(MonthFull, levels = rev(month_levels_full), ordered = T))

AnnexII <- c("Barbastella barbastellus", "Rhinolophus ferrumequinum", "Rhinolophus hipposideros")

table_border <- fp_border(color = "black", width = 1) # from library(officer)

statics_plus %>%
  filter(Species %in% AnnexII) %>%
  group_by(MonthFull, Species) %>%
  count() %>%
  # arrange descending
  arrange(desc(MonthFull)) %>%
  # rename n as count
  rename(Month = MonthFull, `Bat Species` = Species, Count = n) %>%
  # Table made with flextable
  flextable(col_keys = colnames(.)) %>%
  italic(j = 2, italic = TRUE, part = "body") %>% 
  fontsize(part = "header", size = 12) %>%
  fontsize(part = "body", size = 12) %>%
  colformat_double(j = "Count", digits = 4, big.mark = ",") %>%
  width(j = 1, width = 2) %>%
  width(j = 2, width = 2.5) %>%
  width(j = 3, width = 1) %>%
  merge_v(j = 1) %>%
  border_inner_h(part = "body", border = table_border) %>%
  hline_bottom(part = "body", border = table_border) %>%
  bg(bg = "black", part = "header") %>%
  color(color = "white", part = "header")
Table 3:

A Count of Annex II Bats Grouped by Month

Month

Bat Species

Count

May

Barbastella barbastellus

7

Rhinolophus ferrumequinum

9

Rhinolophus hipposideros

10

June

Barbastella barbastellus

198

Rhinolophus ferrumequinum

47

Rhinolophus hipposideros

11

July

Barbastella barbastellus

49

Rhinolophus ferrumequinum

7

Rhinolophus hipposideros

4

August

Barbastella barbastellus

109

Rhinolophus ferrumequinum

31

Rhinolophus hipposideros

23

September

Barbastella barbastellus

9

Rhinolophus ferrumequinum

64

Rhinolophus hipposideros

17

October

Barbastella barbastellus

47

Rhinolophus ferrumequinum

2

Rhinolophus hipposideros

8

Tables of numbers can be frustrating to read. To help readability the table can be annotated highlighting the count on a colour scale linked to the magnitude of the result. This is shown in Table 4 for the count of bats shown in Table 3.

Code
# Libraries (Packages) used
library(tidyverse)
library(flextable)
library(officer)
library(iBats)

# Add data and time information to the statics data using the iBats::date_time_info
statics_plus <- iBats::date_time_info(statics)

# Vector of month names used in the factor function
month_levels_full <- c("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December")

# re-order months column with rev so months run top to bottom of the table
statics_plus <- statics_plus %>%
   mutate(MonthFull = factor(MonthFull, levels = rev(month_levels_full), ordered = T))

AnnexII <- c("Barbastella barbastellus", "Rhinolophus ferrumequinum", "Rhinolophus hipposideros")

table_border <- fp_border(color = "black", width = 1) # from library(officer)

table_data <- statics_plus %>%
  filter(Species %in% AnnexII) %>%
  group_by(MonthFull, Species) %>%
  count() %>%
  # arrange descending
  arrange(desc(MonthFull)) %>%
  # rename n as count
  rename(Month = MonthFull, `Bat Species` = Species, Count = n) 

# Find max and min counts
maxCount <- max(table_data$Count, na.rm = T)
minCount <- min(table_data$Count, na.rm = T)

#Make coloured palette
colourer <- scales::col_numeric(
  palette = c("transparent", "orangered2"),
  domain = c(minCount, maxCount)) 

# Table made with flextable
table_data %>% 
  flextable(col_keys = colnames(.)) %>%
  italic(j = 2, italic = TRUE, part = "body") %>% 
  fontsize(part = "header", size = 12) %>%
  fontsize(part = "body", size = 12) %>%
  colformat_double(j = "Count", digits = 4, big.mark = ",") %>%
  width(j = 1, width = 2) %>%
  width(j = 2, width = 2.5) %>%
  width(j = 3, width = 1) %>%
  merge_v(j = 1) %>%
  # Scale colour to count
  bg(bg = colourer, j = "Count", part = "body") %>%
  border_inner_h(part = "body", border = table_border) %>%
  hline_bottom(part = "body", border = table_border) %>%
  bg(bg = "black", part = "header") %>%
  color(color = "white", part = "header")
Table 4:

A Count (Highlighted) of Annex II Bats Grouped by Month

Month

Bat Species

Count

May

Barbastella barbastellus

7

Rhinolophus ferrumequinum

9

Rhinolophus hipposideros

10

June

Barbastella barbastellus

198

Rhinolophus ferrumequinum

47

Rhinolophus hipposideros

11

July

Barbastella barbastellus

49

Rhinolophus ferrumequinum

7

Rhinolophus hipposideros

4

August

Barbastella barbastellus

109

Rhinolophus ferrumequinum

31

Rhinolophus hipposideros

23

September

Barbastella barbastellus

9

Rhinolophus ferrumequinum

64

Rhinolophus hipposideros

17

October

Barbastella barbastellus

47

Rhinolophus ferrumequinum

2

Rhinolophus hipposideros

8

2 Summary Statistics

Summary statistics for the Common pipistrelle (Pipistrellus pipistrellus) observations at all static locations for each night is shown in Table 5; to make the table easier to read, the median column is highlighted with a graduated colour scale. The summary statistics are calculated with the favstats function from the mosaic package.

Code
# Libraries (Packages) used
library(tidyverse)
library(mosaic)
library(gt)
library(gtExtras)
library(iBats)

# Add data and time information to the statics data using the iBats::date_time_info
statics_plus <- iBats::date_time_info(statics)

# Group by Description and Night and Count the Observations
grouped_data <- statics_plus %>% 
  filter(Species == "Pipistrellus pipistrellus") %>% 
  group_by(Description, Night) %>% 
  tally()

# The summary statistics are saved into a variable riven_cond_stats 
cond_stats <- favstats(n~Description, data = grouped_data)

# riven_cond_stats is made into a the table (using the code below)
cond_stats %>% 
  # Create the table with the gt package
  gt() %>% 
  # Style the header to black fill and white text
  tab_style(
    style = list(
      cell_fill(color = "black"),
      cell_text(color = "white", weight = "bold")),
    locations = cells_column_labels(
      columns = c(everything())
    )
  ) %>% 
  gt_color_rows(median, palette = "ggsci::yellow_material") %>% 
  tab_options(data_row.padding = px(2)) 
Table 5:

Common Pipstrelle Observations (Passes) at the Static Locations

Description min Q1 median Q3 max mean sd n missing
Static 1 1 1.0 2 3.5 73 6.391304 15.570190 23 0
Static 2 1 6.0 10 17.5 42 12.031746 8.820738 63 0
Static 3 1 3.5 6 11.0 40 9.740741 10.323955 27 0
Static 4 10 33.5 78 155.5 438 115.129032 121.519749 31 0
Static 5 1 4.0 7 9.0 31 8.103448 6.586484 29 0

Table 6 shows the summary statistics for Annex II Bat Species3 observations by month; with the maximum number of nightly bat passes highlighted. The table can be grouped by Month through switching Common and MonthFull names in the code below; this alternative table is shown in Table 7 with the mean (average) number of nightly bat passes highlighted.

Code
# Libraries (Packages) used
library(tidyverse)
library(mosaic)
library(gt)
library(gtExtras)
library(iBats)

# Annex II Bats (in the static)
AnnexII <- c("Barbastella barbastellus", "Rhinolophus ferrumequinum", "Rhinolophus hipposideros")

# Add data and time information to the statics data using the iBats::date_time_info
statics_plus <- iBats::date_time_info(statics)

# List of bat common names and the scientific names
BatCommon <- c(
  "Barbastella barbastellus" = "Barbastelle",
  "Rhinolophus ferrumequinum" = "Greater horseshoe",
  "Rhinolophus hipposideros" = "Lesser horseshoe")

# From Scientific name create a Common Name Vector 
statics_plus$Common <-  unname(BatCommon[statics_plus$Species])

# Group by Description and Night and Count the Observations
grouped_data <- statics_plus %>% 
  filter(Species %in% AnnexII) %>%
  group_by(Common, MonthFull, Night) %>% 
  tally() %>% 
  summarise(Minimum = round(min(n, na.rm = T), 2),
              Q1 = round(quantile(n, c(0.25), na.rm = T), 2),
              Mean = round(mean(n, na.rm = T), 2),
              Median = round(median(n, na.rm = T), 2),
              Q3 = round(quantile(n, c(0.75), na.rm = T), 2),
              Maximum = round(max(n, na.rm = T), 2),
              SD = round(sd(n, na.rm = T), 2),
              Nr = n())

# riven_cond_stats is made into a the table (using the code below)
grouped_data %>% 
  # Create the table with the gt package
  gt(rowname_col = "MonthFull",
    groupname_col = "Common") %>% 
  # Style the header to black fill and white text
  tab_style(
    style = list(
      cell_fill(color = "black"),
      cell_text(color = "white", weight = "bold")),
    locations = cells_column_labels(
      columns = c(everything())
    )
  ) %>% 
  tab_style(
    style = list(
      cell_fill(color = "midnightblue"),
      cell_text(color = "white"),
      cell_text(weight = "normal")
      ),
    locations = cells_body(
      columns = MonthFull
    )) %>% 
  gt_color_rows(Maximum, palette = "ggsci::purple_material") %>% 
  tab_options(data_row.padding = px(2)) 
Table 6:

Nightly Species Observations (Passes) by Month

Minimum Q1 Mean Median Q3 Maximum SD Nr
Barbastelle
May 1 1.00 1.75 1.5 2.25 3 0.96 4
June 1 1.50 18.00 18.0 31.50 42 15.67 11
July 4 5.75 8.17 8.0 10.25 13 3.43 6
August 1 3.00 7.27 6.0 8.00 22 6.13 15
September 1 1.00 1.80 1.0 2.00 4 1.30 5
October 4 5.00 6.71 6.0 7.00 13 2.98 7
Greater horseshoe
May 2 2.50 3.00 3.0 3.50 4 1.00 3
June 1 1.50 3.13 3.0 4.00 7 1.81 15
July 1 1.00 1.75 1.5 2.25 3 0.96 4
August 1 1.00 2.07 2.0 3.00 6 1.39 15
September 3 3.25 10.67 4.0 6.25 43 15.91 6
October 1 1.00 1.00 1.0 1.00 1 0.00 2
Lesser horseshoe
May 1 1.00 2.50 1.5 3.00 6 2.38 4
June 1 1.00 1.38 1.0 2.00 2 0.52 8
July 1 1.50 2.00 2.0 2.50 3 1.41 2
August 1 1.00 2.30 1.5 3.00 6 1.70 10
September 1 1.50 2.43 3.0 3.00 4 1.13 7
October 1 1.00 2.67 1.0 3.50 6 2.89 3
Code
# Libraries (Packages) used
library(tidyverse)
library(mosaic)
library(gt)
library(gtExtras)
library(iBats)

# Annex II Bats (in the static)
AnnexII <- c("Barbastella barbastellus", "Rhinolophus ferrumequinum", "Rhinolophus hipposideros")

# Add data and time information to the statics data using the iBats::date_time_info
statics_plus <- iBats::date_time_info(statics)

# List of bat common names and the scientific names
BatCommon <- c(
  "Barbastella barbastellus" = "Barbastelle",
  "Rhinolophus ferrumequinum" = "Greater horseshoe",
  "Rhinolophus hipposideros" = "Lesser horseshoe")

# From Scientific name create a Common Name Vector 
statics_plus$Common <-  unname(BatCommon[statics_plus$Species])

# Group by Description and Night and Count the Observations
grouped_data <- statics_plus %>% 
  filter(Species %in% AnnexII) %>%
  group_by(Common, MonthFull, Night) %>% 
  tally() %>% 
  summarise(Minimum = round(min(n, na.rm = T), 2),
              Q1 = round(quantile(n, c(0.25), na.rm = T), 2),
              Mean = round(mean(n, na.rm = T), 2),
              Median = round(median(n, na.rm = T), 2),
              Q3 = round(quantile(n, c(0.75), na.rm = T), 2),
              Maximum = round(max(n, na.rm = T), 2),
              SD = round(sd(n, na.rm = T), 2),
              Nr = n())



# riven_cond_stats is made into a the table (using the code below)
grouped_data %>% 
  # Create the table with the gt package
  gt(rowname_col = "Common",
    groupname_col = "MonthFull") %>% 
  # Style the header to black fill and white text
  tab_style(
    style = list(
      cell_fill(color = "black"),
      cell_text(color = "white", weight = "bold")),
    locations = cells_column_labels(
      columns = c(everything())
    )
  ) %>% 
  gt_color_rows(Mean, palette = "ggsci::green_material") %>% 
  tab_options(data_row.padding = px(2)) 
Table 7:

Nightly Species Observations by Month (Alternative Grouping)

Minimum Q1 Mean Median Q3 Maximum SD Nr
May
Barbastelle 1 1.00 1.75 1.5 2.25 3 0.96 4
Greater horseshoe 2 2.50 3.00 3.0 3.50 4 1.00 3
Lesser horseshoe 1 1.00 2.50 1.5 3.00 6 2.38 4
June
Barbastelle 1 1.50 18.00 18.0 31.50 42 15.67 11
Greater horseshoe 1 1.50 3.13 3.0 4.00 7 1.81 15
Lesser horseshoe 1 1.00 1.38 1.0 2.00 2 0.52 8
July
Barbastelle 4 5.75 8.17 8.0 10.25 13 3.43 6
Greater horseshoe 1 1.00 1.75 1.5 2.25 3 0.96 4
Lesser horseshoe 1 1.50 2.00 2.0 2.50 3 1.41 2
August
Barbastelle 1 3.00 7.27 6.0 8.00 22 6.13 15
Greater horseshoe 1 1.00 2.07 2.0 3.00 6 1.39 15
Lesser horseshoe 1 1.00 2.30 1.5 3.00 6 1.70 10
September
Barbastelle 1 1.00 1.80 1.0 2.00 4 1.30 5
Greater horseshoe 3 3.25 10.67 4.0 6.25 43 15.91 6
Lesser horseshoe 1 1.50 2.43 3.0 3.00 4 1.13 7
October
Barbastelle 4 5.00 6.71 6.0 7.00 13 2.98 7
Greater horseshoe 1 1.00 1.00 1.0 1.00 1 0.00 2
Lesser horseshoe 1 1.00 2.67 1.0 3.50 6 2.89 3