Wednesday Birders 2018

Stats and plots using R

R
python
birding
Author

Mark Isken

Published

February 4, 2019

In a previous post I described how I used a mix of Python and R to acquire, clean, and analyze eBird data from our birding group’s weekly walks. In this post, I’ll add plots for 2018 and do a little more analysis.

To summarize the basic approach (described in the first post), we:

Data prep

All of the data prep and analysis is done in R. We’ll need a few libraries:

library(dplyr)
library(ggplot2)
library(lubridate)

Before diving into analysis and plots, a little data prep was needed:

  • downloaded data from 2018 and converted the JSON to csv (used appropriately modified Python script detailed in first post)
  • read CSV file into an R dataframe
  • convert datetime fields to POSIXct
  • include only the lists from our Wednesday morning walks
  • combined the 2018 data with the data from 2015-1017 and saved as an RDS file
obs_df <- readRDS(file = "./data/observations.rds")

There are a bunch of fields in the observation records from eBird. Here’s just a sample focusing on the stuff we need. Each row in the data frame corresponds to an individual species count on a certain date in a certain location.

obs_df <- obs_df %>% 
  select(checklistId, obsDt, locName, comName, sciName, howMany, subId)

head(obs_df)
  checklistId               obsDt             locName                comName
1     CL24105 2015-03-18 09:35:00 Charles Ilsley Park           Canada Goose
2     CL24105 2015-03-18 09:35:00 Charles Ilsley Park         Turkey Vulture
3     CL24105 2015-03-18 09:35:00 Charles Ilsley Park        Red-tailed Hawk
4     CL24105 2015-03-18 09:35:00 Charles Ilsley Park         Sandhill Crane
5     CL24105 2015-03-18 09:35:00 Charles Ilsley Park Red-bellied Woodpecker
6     CL24105 2015-03-18 09:35:00 Charles Ilsley Park       Downy Woodpecker
               sciName howMany     subId
1    Branta canadensis       2 S22414352
2       Cathartes aura       1 S22414352
3    Buteo jamaicensis       2 S22414352
4  Antigone canadensis       2 S22414352
5 Melanerpes carolinus       1 S22414352
6   Picoides pubescens       1 S22414352

Plots of Species Counts

How many birds of each species have we seen? How frequently are each species seen?

Let’s start with simple bar charts:

  • one bar per species, one year per graph,
  • bar length is number of birds seen,
  • color of bar is related to percentage of lists on which that species seen,
  • number at end of bar is percentage of lists on which that species seen.

Here are the plots from 2015-2017:

knitr::include_graphics("images/top30_2015.png")

knitr::include_graphics("images/top30_2016.png")

knitr::include_graphics("images/top30_2017.png")

Creating the plot for 2018

`summarise()` has grouped output by 'year', 'obsDt', 'locName'. You can
override using the `.groups` argument.
`summarise()` has grouped output by 'comName'. You can override using the
`.groups` argument.

The plots above are easy to create from a dataframe that looks like this:

                   comName birding_year num_lists tot_birds totlists   pctlists
1           American Robin         2018        32       466       41 0.78048780
2             Canada Goose         2018        23       239       41 0.56097561
3       American Goldfinch         2018        32       212       41 0.78048780
4     Red-winged Blackbird         2018        33       204       41 0.80487805
5            Mourning Dove         2018        35       176       41 0.85365854
6            Cedar Waxwing         2018        18       142       41 0.43902439
7                 Blue Jay         2018        41       139       41 1.00000000
8   Black-capped Chickadee         2018        41       125       41 1.00000000
9            American Crow         2018        36       106       41 0.87804878
10        Eastern Bluebird         2018        23        97       41 0.56097561
11       European Starling         2018        15        89       41 0.36585366
12       Northern Cardinal         2018        40        81       41 0.97560976
13            Song Sparrow         2018        26        81       41 0.63414634
14          Turkey Vulture         2018        17        52       41 0.41463415
15                 Mallard         2018        18        51       41 0.43902439
16        Ring-billed Gull         2018        12        51       41 0.29268293
17            Lesser Scaup         2018         1        50       41 0.02439024
18                 Redhead         2018         1        50       41 0.02439024
19         Dark-eyed Junco         2018        15        46       41 0.36585366
20         Tufted Titmouse         2018        26        45       41 0.63414634
21 White-breasted Nuthatch         2018        24        43       41 0.58536585
22          Common Grackle         2018        16        41       41 0.39024390
23  Red-bellied Woodpecker         2018        31        41       41 0.75609756
24        Downy Woodpecker         2018        27        38       41 0.65853659
25           Field Sparrow         2018        15        37       41 0.36585366
26             Rock Pigeon         2018         6        37       41 0.14634146
27            Barn Swallow         2018        12        36       41 0.29268293
28        Northern Flicker         2018        21        36       41 0.51219512
29            Tree Swallow         2018        11        35       41 0.26829268
30    Brown-headed Cowbird         2018        12        33       41 0.29268293

Here’s how you can make such a data frame.

# Create num species by list df
numsp_bylist <- obs_df %>%
  group_by(year=year(obsDt), obsDt, locName, subId) %>%
  summarize(
    n = n(),
    totbirds = sum(howMany)
  ) %>% 
  arrange(year, subId)
`summarise()` has grouped output by 'year', 'obsDt', 'locName'. You can
override using the `.groups` argument.
# Using numsp_bylist, create num lists by date
numlists_bydt <- numsp_bylist %>%
  group_by(obsDt) %>%
  summarise(
    numlists = n()
  ) %>%
  filter(numlists >= 1) %>%
  arrange(obsDt)

# Usings numlists_bydt, create num lists by year
numlists_byyear <- numlists_bydt %>%
  group_by(birding_year=year(obsDt)) %>%
  summarise(
    totlists = sum(numlists)
  )

# Now ready to compute species by year
species_byyear <- obs_df %>%
  group_by(comName, birding_year=year(obsDt)) %>%
  summarize(
    num_lists = n(),
    tot_birds = sum(howMany)
  ) %>%
  arrange(birding_year, desc(tot_birds))
`summarise()` has grouped output by 'comName'. You can override using the
`.groups` argument.
# Join to numlists_byyear so we can compute pct of lists each species
# appeared in.
species_byyear <- left_join(species_byyear, numlists_byyear, by = 'birding_year') 

bird_year <- 2018

top_obs_byyear <- species_byyear %>%
  filter(birding_year == bird_year) %>%
  mutate(pctlists = num_lists / totlists) %>%
  arrange(desc(tot_birds)) %>%
  head(30)

Make the plot.

ggplot(top_obs_byyear) + 
    geom_bar(aes(x=reorder(comName, tot_birds), 
                 y=tot_birds, fill=pctlists), stat = "identity") +
                 scale_fill_gradient(low='#05D9F6', high='#5011D1') +
                 labs(x="", 
                      y="Total number of birds sighted", 
                      fill="Pct of lists",
                      title = paste0("Top 30 most sighted birds by Wednesday Birders in ", bird_year),
                      subtitle = "(number at right of bar is % of lists on which the species appeared)") +
                 coord_flip() +
    geom_text(data=top_obs_byyear,
              aes(x=reorder(comName,tot_birds),
                  y=tot_birds,
                  label=format(pctlists, digits = 1),
                  hjust=0
              ), size=3) + ylim(0,500)

Patterns and changes over time

Now that we have four full years of eBird sightings data from the four parks we visit each month, let’s see how things look over time (and location).

Here’s an overall time series plot of number of species appearing on each list.

numsp_bylist %>% 
  ggplot() + geom_point(aes(x = as.Date(obsDt), y = n)) +
  ggtitle("Number of species sighted for each list", subtitle = "2015-2018") +
  scale_x_date(date_breaks = "3 months",
               date_labels = "%m-%y") +
  xlab("List date (month-year)") +
  ylab("Number of species")

Same plot but faceted by park.

numsp_bylist %>% 
  ggplot() + geom_point(aes(x = as.Date(obsDt), y = n)) +
  facet_grid(locName ~ .) +
  ggtitle("Number of species sighted for each list", subtitle = "2015-2018") +
  scale_x_date(date_breaks = "3 months",
               date_labels = "%m-%y") +
  xlab("List date (month-year)") +
  ylab("Number of species")

Let’s look at distribution of list size during the peak period of April-October.

numsp_bylist %>% 
  filter(month(obsDt) %in% 4:10) %>% 
  ggplot() + geom_histogram(aes(x = n, y = ..density..), bins = 20) +
  ggtitle("Distribution of list size", subtitle = "April-October") +
  xlab("List size (number of species)") +
  geom_density(aes(x = n))
Warning: The dot-dot notation (`..density..`) was deprecated in ggplot2 3.4.0.
ℹ Please use `after_stat(density)` instead.

Use boxplots and group by park.

numsp_bylist %>% 
  filter(month(obsDt) %in% 4:10) %>% 
  ggplot() + geom_boxplot(aes(x = locName, y = n)) +
  ggtitle("Distribution of list size", subtitle = "April-October") +
  xlab("Location") + ylab("List size (number of species)")

Let’s repeat the above but using total number of birds counted per list instead of number of species per list.

numsp_bylist %>% 
  filter(month(obsDt) %in% 4:10) %>% 
  ggplot() + geom_boxplot(aes(x = locName, y = totbirds)) +
  ggtitle("Distribution of number of total birds per list", subtitle = "April-October") +
  xlab("Location") + ylab("List size (number of birds counted)")

Let’s look at time series plots of number of birds counted for each of the species in the Top 30 for 2018. We’ll need the following data frame to power the plots.

top_species_byyear <- top_obs_byyear %>% 
  select("comName") %>% 
  inner_join(species_byyear, by = "comName") %>% 
  mutate(pctlists = num_lists / totlists)

top_species_byyear
# A tibble: 143 × 6
# Groups:   comName [30]
   comName        birding_year num_lists tot_birds totlists pctlists
   <fct>                 <dbl>     <int>     <dbl>    <int>    <dbl>
 1 American Robin         2015        26       313       33    0.788
 2 American Robin         2016        32       491       39    0.821
 3 American Robin         2017        34       269       48    0.708
 4 American Robin         2018        32       466       41    0.780
 5 American Robin         2019        33       390       41    0.805
 6 Canada Goose           2015        20       538       33    0.606
 7 Canada Goose           2016        19       518       39    0.487
 8 Canada Goose           2017        23       200       48    0.479
 9 Canada Goose           2018        23       239       41    0.561
10 Canada Goose           2019        22       395       41    0.537
# ℹ 133 more rows

Bar height is total number of birds counted and bar color is related to percentage of lists on which this bird appeared.

ggplot(top_species_byyear) + 
  geom_bar(aes(x = birding_year, y = tot_birds, fill=pctlists), 
           stat = "identity") +
  scale_fill_gradient(low='#05D9F6', high='#5011D1') +
  ggtitle("Number of birds counted by year") +
  facet_wrap(~comName, ncol = 5, scales = "free_y") +
  labs(x="Year", 
       y="Total number of birds sighted", 
       fill="Pct of lists")

A few things pop out:

  • overall number of Black-capped Chicadees counted has declined, though they are still sighted very frequently. Maybe we’ve become somewhat immune to their constant chatter and just aren’t recording as many. “Oh, just another chicadee.”
  • Downy Woodpecker sightings have declined both in terms of number and frequency as have White-breasted Nuthatch sightings.
  • Mourning Doves are up in both number and frequency.
  • After 2015, sightings of Eastern Bluebirds have been remarkably constant.

Now swap roles of total birds and frequency of appearing on a list. Bar height is percentage of lists on which this bird appeared and bar color is related to total number of birds counted.

ggplot(top_species_byyear) + 
  geom_bar(aes(x = birding_year, y = pctlists, fill = tot_birds), 
           stat = "identity") +
  scale_fill_gradient(low='#05D9F6', high='#5011D1') +
  ggtitle("Number of birds counted by year") +
  facet_wrap(~comName, ncol = 5) +
  labs(x="Year", 
       y="Pct of lists", 
       fill="Total birds counted")

Next steps

Now our group can try to make sense of these and I’m sure we’ll have ideas for more analysis. In my next post I’m going to explore those springtime favorites, the warblers.

Reuse

Citation

BibTeX citation:
@online{isken2019,
  author = {Isken, Mark},
  title = {Wednesday {Birders} 2018},
  date = {2019-02-04},
  langid = {en}
}
For attribution, please cite this work as:
Isken, Mark. 2019. “Wednesday Birders 2018.” February 4, 2019.