<?xml version="1.0" encoding="UTF-8"?>
<rss  xmlns:atom="http://www.w3.org/2005/Atom" 
      xmlns:media="http://search.yahoo.com/mrss/" 
      xmlns:content="http://purl.org/rss/1.0/modules/content/" 
      xmlns:dc="http://purl.org/dc/elements/1.1/" 
      version="2.0">
<channel>
<title>Bits of Analytics</title>
<link>https://bitsofanalytics.org/</link>
<atom:link href="https://bitsofanalytics.org/index.xml" rel="self" type="application/rss+xml"/>
<description>An analytics blog with a Python and R focus</description>
<generator>quarto-1.4.551</generator>
<lastBuildDate>Wed, 11 Mar 2026 04:00:00 GMT</lastBuildDate>
<item>
  <title>Another cautionary tale for using LLMs for business analytics</title>
  <dc:creator>Mark Isken</dc:creator>
  <link>https://bitsofanalytics.org/posts/caution_llm_weighted_avg/caution_llm.html</link>
  <description><![CDATA[ 




<p>In a <a href="https://bitsofanalytics.org/posts/caution_llm/caution_llm">previous post</a>, I explored a data analysis error made repeatedly over the course of several years by multiple LLMs. Recently, while grading homework in my <a href="https://pcda.misken.org/">Practical Computing for Data Analytics</a> class, another such LLM misstep grabbed my attention. This one is less subtle than the one raised in the previous post and, to me, more troubling.</p>
<p>But the point of this post is not to highlight some shortcoming of LLMs. There are plenty of examples of that out there already. I’m more interested in the response that Gemini gives me (and, really, my students, who are the intended audience) when I ask it to reflect on why it made the errors it did.</p>
<section id="exploring-country-metrics" class="level2">
<h2 class="anchored" data-anchor-id="exploring-country-metrics">Exploring country metrics</h2>
<p>One of the homework assignments in my class focuses on learning the basics of the R package <a href="https://dplyr.tidyverse.org/">dplyr</a>. The students are given a dataset containing various economic, population, energy, and health-related metrics obtained from the well-known <a href="https://ourworldindata.org/">Our World in Data</a> site. Each row in the dataset is for a single country in a single year.</p>
</section>
<section id="ask-the-students" class="level2">
<h2 class="anchored" data-anchor-id="ask-the-students">Ask the students</h2>
<p>During the assignment, students are asked to filter the dataframe to a single year and then compute the mean of GDP per capita by continent. Pretty much everyone did the following:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1">country_metrics  <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span>  </span>
<span id="cb1-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(year <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2022</span>)  <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span>  </span>
<span id="cb1-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(continent)  <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span>  </span>
<span id="cb1-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(</span>
<span id="cb1-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mean_gdp_pc =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(gdp_per_capita, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>),</span>
<span id="cb1-6">  ) </span></code></pre></div>
</div>
<p>The problem is that the units of <code>gdp_per_capita</code> are dollars per person, and the populations of the countries within a continent can vary tremendously. To take this into account, we need a weighted average. A simple average of <code>gdp_per_capita</code> for one country with 300 million people and another with 20 million people doesn’t make sense if we want the average GDP per capita for all 320 million people. We can compute the population-weighted average like this.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1">country_metrics  <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span>  </span>
<span id="cb2-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(year <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2022</span>)  <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span>  </span>
<span id="cb2-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(continent)  <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span>  </span>
<span id="cb2-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(</span>
<span id="cb2-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mean_gdp_pc =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(gdp_per_capita <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> population) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(population),</span>
<span id="cb2-6">  ) </span></code></pre></div>
</div>
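<p>To make the difference concrete, here is a minimal sketch using the two population sizes mentioned above. The GDP per capita values are made up purely for illustration, and the variable names are hypothetical.</p>

```r
# Hypothetical two-country "continent"; all values are made up for illustration
gdp_per_capita <- c(60000, 20000)  # dollars per person
population     <- c(300e6, 20e6)   # people

# Simple average: treats both countries as equally important
simple_mean <- mean(gdp_per_capita)

# Weighted average: total GDP divided by total population
weighted_mean <- sum(gdp_per_capita * population) / sum(population)

# Base R's weighted.mean() computes the same quantity
weighted_builtin <- weighted.mean(gdp_per_capita, w = population)

simple_mean    # 40000
weighted_mean  # 57500
```

<p>With these made-up numbers, the simple average is well below the weighted average because the much larger country is also the richer one. With different data, the simple average could just as easily be too high.</p>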
<p>This is a common oversight that I’ve seen over multiple semesters, and I discuss it as part of my general feedback to the class. It also pops up with metrics such as life expectancy or under-five mortality rates. But, this semester, I thought I’d see what one of the big LLMs would do with this. Our university has an agreement with Google and all students have free access to Google Gemini. So, I used Google Gemini (Gemini 3 Flash).</p>
</section>
<section id="ask-the-llm---question-1-compute-mean-gdp-per-capita" class="level2">
<h2 class="anchored" data-anchor-id="ask-the-llm---question-1-compute-mean-gdp-per-capita">Ask the LLM - Question 1: Compute mean GDP per capita</h2>
<p>I started with this prompt.</p>
<div class="callout callout-style-default callout-tip no-icon callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Prompt
</div>
</div>
<div class="callout-body-container callout-body">
<p>I have an R dataframe where each row is a country and I have columns with the following variables: GDP per capita, life expectancy, and total energy use. I also have a column called continent that indicates which continent each country is a part of. Please show me the R code to compute the mean GDP per capita by continent, mean life expectancy by continent and mean total energy use by continent. Use the dplyr library.</p>
</div>
</div>
<p>Both GDP per capita and life expectancy require weighted averaging. For total energy use, a summation probably makes more sense than any type of mean (weighted or not). Let’s see what the LLM comes up with.</p>
<section id="geminis-response-to-question-1" class="level3">
<h3 class="anchored" data-anchor-id="geminis-response-to-question-1">Gemini’s response to Question 1</h3>
<p>Using dplyr is the way to go here. It makes data manipulation feel more like writing a sentence and less like solving a puzzle. To get those averages, you’ll want to use the <strong>split-apply-combine</strong> strategy: split the data by continent, apply the mean() function to your variables, and combine it all back into a neat summary table.</p>
<p><strong>The R Code</strong></p>
<p>Assuming your dataframe is named <code>df</code>, here is the cleanest way to compute those means:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(dplyr)</span>
<span id="cb3-2"></span>
<span id="cb3-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Summary statistics by continent</span></span>
<span id="cb3-4">continent_summary <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb3-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(continent) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb3-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(</span>
<span id="cb3-7">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mean_gdp =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">GDP per capita</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>),</span>
<span id="cb3-8">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mean_life_exp =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">life expectancy</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>),</span>
<span id="cb3-9">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mean_energy =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">total energy use</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span>
<span id="cb3-10">  )</span>
<span id="cb3-11"></span>
<span id="cb3-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># View the results</span></span>
<span id="cb3-13"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">print</span>(continent_summary)</span></code></pre></div>
</div>
<p><strong>Why this works:</strong></p>
<ul>
<li><strong>group_by(continent)</strong>: This tells R to treat each continent as its own little mini-dataframe for the next step.</li>
<li><strong>summarize()</strong>: This collapses each group into a single row.</li>
<li><strong>na.rm = TRUE</strong>: This is a lifesaver. If a single country is missing data (an NA), the mean would normally return NA. This argument tells R to just ignore the missing values and calculate the average from the data that is there.</li>
</ul>
<p>Pro Tip: If you have many more columns and don’t want to type them all out, you can use across() to do them all at once:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1">df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb4-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(continent) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb4-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">across</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">GDP per capita</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">life expectancy</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">total energy use</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span>), mean, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>))</span></code></pre></div>
</div>
<p>Would you like me to show you how to visualize these continent averages using a bar chart or a bubble plot?</p>
</section>
<section id="reflections-on-geminis-answer-to-question-1" class="level3">
<h3 class="anchored" data-anchor-id="reflections-on-geminis-answer-to-question-1">Reflections on Gemini’s answer to Question 1</h3>
<p>Obviously, Gemini provided code to do simple averaging for all three statistics. Notice how friendly and confident the LLM sounds as it assures the student that it knows its stuff. It cheerily explains why it did what it did in plain English.</p>
<p>I thought, well, I didn’t actually give Gemini a population field to work with. Let’s help it out.</p>
</section>
</section>
<section id="ask-the-llm---question-2-including-population" class="level2">
<h2 class="anchored" data-anchor-id="ask-the-llm---question-2-including-population">Ask the LLM - Question 2: Including population</h2>
<div class="callout callout-style-default callout-tip no-icon callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Prompt
</div>
</div>
<div class="callout-body-container callout-body">
<p>Oops, I forgot to mention that I also have a column with the population of each country. Please repeat the previous analysis but also include a computation of the total population by continent. It should still be done with a single dplyr command.</p>
</div>
</div>
<section id="geminis-response-to-question-2" class="level3">
<h3 class="anchored" data-anchor-id="geminis-response-to-question-2">Gemini’s response to Question 2</h3>
<p>No problem! Adding the population is a quick tweak. Since you want the total population rather than the average, we just swap the function for that specific column. Here is the updated code to handle everything in one go:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(dplyr)</span>
<span id="cb5-2"></span>
<span id="cb5-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Comprehensive summary by continent</span></span>
<span id="cb5-4">continent_summary <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb5-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(continent) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb5-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(</span>
<span id="cb5-7">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mean_gdp =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">GDP per capita</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>),</span>
<span id="cb5-8">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mean_life_exp =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">life expectancy</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>),</span>
<span id="cb5-9">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mean_energy =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">total energy use</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>),</span>
<span id="cb5-10">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">total_pop =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(population, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Summing for total population</span></span>
<span id="cb5-11">  )</span>
<span id="cb5-12"></span>
<span id="cb5-13"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># View the results</span></span>
<span id="cb5-14"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">print</span>(continent_summary)</span></code></pre></div>
</div>
<p><strong>Key Adjustments</strong></p>
<ul>
<li><code>sum(population)</code>: Unlike the other variables where we want the average, we use sum() here to get the aggregate headcount for the entire continent.</li>
<li><strong>The Power of the Pipe (%&gt;%)</strong>: Notice how we keep everything inside a single block. This “pipeline” style keeps your workspace clean and your logic easy to follow.</li>
</ul>
<p><strong>Quick Tip on Data Types</strong></p>
<p>If your population numbers are very large, R might display them in scientific notation (like 1.4e09). If you prefer seeing the full numbers, you can run <code>options(scipen = 999)</code> before printing your summary.</p>
<p>Would you like to see how to rank these continents from highest to lowest total population within the same command?</p>
</section>
<section id="reflections-on-geminis-answer-to-question-2" class="level3">
<h3 class="anchored" data-anchor-id="reflections-on-geminis-answer-to-question-2">Reflections on Gemini’s answer to Question 2</h3>
<p>Not surprisingly, it correctly added the code to sum the <code>population</code> field. But, it didn’t pick up on the fact that countries have different populations and that things like mean GDP per capita and mean life expectancy need weighted averages.</p>
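<p>For comparison, here is a rough sketch of the kind of summary that would have been appropriate: population-weighted means for the per-person metrics and a plain sum for total energy use. The toy data frame and its column names (<code>gdp_per_capita</code>, <code>life_expectancy</code>, <code>total_energy_use</code>) are assumptions made for illustration, not the actual homework dataset.</p>

```r
library(dplyr)

# Toy data; every value and column name here is made up for illustration
df <- data.frame(
  continent        = c("A", "A", "B", "B"),
  population       = c(300e6, 20e6, 50e6, 50e6),
  gdp_per_capita   = c(60000, 20000, 10000, 30000),
  life_expectancy  = c(80, 70, 60, 75),
  total_energy_use = c(900, 50, 100, 200)
)

continent_summary <- df |>
  group_by(continent) |>
  summarize(
    total_pop     = sum(population),
    # Per-person metrics: weight each country by its population.
    # With na.rm = TRUE, weighted.mean() drops missing values together
    # with their weights (missing weights would still give NA).
    mean_gdp_pc   = weighted.mean(gdp_per_capita, w = population, na.rm = TRUE),
    mean_life_exp = weighted.mean(life_expectancy, w = population, na.rm = TRUE),
    # Country-level totals: add them up rather than averaging
    total_energy  = sum(total_energy_use, na.rm = TRUE)
  )

print(continent_summary)
```

<p>Whether a sum is the right summary for energy use depends on how that column is defined; if it were reported per capita, a weighted mean would apply there too.</p>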
</section>
</section>
<section id="ask-the-llm---question-3-challenge-the-previous-answers" class="level2">
<h2 class="anchored" data-anchor-id="ask-the-llm---question-3-challenge-the-previous-answers">Ask the LLM - Question 3: Challenge the previous answers</h2>
<p>Now it was time to call out the LLM on its analysis.</p>
<div class="callout callout-style-default callout-tip no-icon callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Prompt
</div>
</div>
<div class="callout-body-container callout-body">
<p>Wait, isn’t that the wrong way to compute mean GDP per capita and mean life expectancy? Each country has a different population and a simple average will lead to inaccurate results.</p>
</div>
</div>
<section id="geminis-response-to-question-3" class="level3">
<h3 class="anchored" data-anchor-id="geminis-response-to-question-3">Gemini’s response to Question 3</h3>
<p>You are absolutely right. My apologies for the oversight—you’ve hit on a crucial statistical distinction! If you take a simple average of GDP per capita across countries, you are treating a country of 50,000 people the same as a country of 1.4 billion. To get an accurate picture of the “average person’s” experience on a continent, you need to calculate <strong>weighted means</strong>.</p>
<p>For GDP per capita, the mathematically accurate way to find the continent average is to divide the <strong>Total GDP</strong> of the continent by its <strong>Total Population</strong>. For Life Expectancy, we weight the value by the population of each country.</p>
<p><strong>The Corrected “Weighted” Analysis</strong></p>
<p>Here is how you do that accurately in a single dplyr pipe:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(dplyr)</span>
<span id="cb6-2"></span>
<span id="cb6-3">continent_analysis <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb6-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(continent) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb6-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(</span>
<span id="cb6-6">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Total population for the continent</span></span>
<span id="cb6-7">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">total_pop =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(population, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>),</span>
<span id="cb6-8">    </span>
<span id="cb6-9">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Weighted Mean Life Expectancy: sum(life_exp * pop) / sum(pop)</span></span>
<span id="cb6-10">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">weighted_mean_life_exp =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">life expectancy</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> population, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> total_pop,</span>
<span id="cb6-11">    </span>
<span id="cb6-12">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># True GDP per capita: Total GDP / Total Population</span></span>
<span id="cb6-13">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># We first calculate Total GDP for each country (GDP_per_cap * pop)</span></span>
<span id="cb6-14">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mean_gdp_per_capita =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">GDP per capita</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> population, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> total_pop,</span>
<span id="cb6-15">    </span>
<span id="cb6-16">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Mean Energy Use (Weighted by population)</span></span>
<span id="cb6-17">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">weighted_mean_energy =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">total energy use</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> population, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> total_pop</span>
<span id="cb6-18">  )</span>
<span id="cb6-19"></span>
<span id="cb6-20"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">print</span>(continent_analysis)</span></code></pre></div>
</div>
<p><strong>Why this is more accurate:</strong></p>
<ul>
<li><strong>The Problem:</strong> If “Country A” has 10 people and a GDP per capita of <img src="https://latex.codecogs.com/png.latex?%5C$100">, and “Country B” has 90 people and a GDP per capita of <img src="https://latex.codecogs.com/png.latex?%5C$10">, a simple average says the mean is <img src="https://latex.codecogs.com/png.latex?%5C$55">.</li>
<li><strong>The Reality:</strong> In that combined group of 100 people, the actual wealth per person is only <img src="https://latex.codecogs.com/png.latex?%5C$19">.</li>
<li><strong>The Logic:</strong> By multiplying the variable by the population before summing, we ensure that larger countries have a proportional “vote” in the final continental average.</li>
</ul>
<p><strong>A Note on Energy Use</strong></p>
<p>Depending on your specific goals, “total energy use” is often reported as an aggregate for the whole country. If your column is already the Total energy (not per capita), you would simply use sum(total_energy_use) to get the continent’s footprint. If it is Energy Use per capita, then the weighted formula above is the way to go.</p>
<p>Would you like to see how to use weighted.mean()—a built-in R function—to make this code even shorter?</p>
</section>
<section id="reflections-on-geminis-answer-to-question-3" class="level3">
<h3 class="anchored" data-anchor-id="reflections-on-geminis-answer-to-question-3">Reflections on Gemini’s answer to Question 3</h3>
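<p>The arithmetic in Gemini’s Country A / Country B example is easy to check. A minimal sketch, using only the numbers from its response (the variable names are mine):</p>

```r
# Numbers taken from Gemini's Country A / Country B example
population     <- c(A = 10, B = 90)
gdp_per_capita <- c(A = 100, B = 10)

# Simple average of the two per-capita values
simple_mean <- mean(gdp_per_capita)

# Population-weighted average: total GDP / total population
weighted_mean <- sum(gdp_per_capita * population) / sum(population)

simple_mean    # 55
weighted_mean  # 19
```

<p>Both figures match what Gemini reported, so the corrected explanation is at least numerically consistent.</p>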
<p>So, I tell the LLM it’s wrong but I don’t tell it why it’s wrong. It comes up with the correct weighted averaging approach. Why? Did it not know that GDP per capita needs weighted averaging? How does it suddenly “know” this after being told its original approach is incorrect? Let’s ask Gemini.</p>
</section>
</section>
<section id="ask-the-llm---question-4-why-did-you-botch-this-in-the-first-place" class="level2">
<h2 class="anchored" data-anchor-id="ask-the-llm---question-4-why-did-you-botch-this-in-the-first-place">Ask the LLM - Question 4: Why did you botch this in the first place?</h2>
<div class="callout callout-style-default callout-tip no-icon callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Prompt
</div>
</div>
<div class="callout-body-container callout-body">
<p>I’m curious as to why you gave an incorrect answer to begin with when this is a rather basic example of needing a weighted average? Is there something about the way LLMs are designed that makes these kinds of errors unavoidable?</p>
</div>
</div>
<section id="geminis-response-to-question-4" class="level3">
<h3 class="anchored" data-anchor-id="geminis-response-to-question-4">Gemini’s response to Question 4</h3>
<p>You’ve hit on one of the most fascinating “glitches in the matrix” of how Large Language Models (LLMs) like me function. You’re right—statistically, a weighted average is the only way to get a meaningful continental metric, yet I initially defaulted to a simple <code>mean()</code>.</p>
<p>There are three main reasons why LLMs frequently “trip” on this specific type of logic:</p>
<p><strong>1. Pattern Matching vs.&nbsp;Intent Modeling</strong></p>
<p>I am trained on billions of lines of code. In the vast majority of R tutorials and Stack Overflow answers, the “basic” way to use group_by and summarize is paired with the <code>mean()</code> function.</p>
<ul>
<li><strong>The Pattern:</strong> “Compute the average X by Y” <img src="https://latex.codecogs.com/png.latex?%5Crightarrow"> <code>summarize(mean(X))</code>.</li>
<li><strong>The Flaw:</strong> My “brain” recognized the request for a summary and reached for the most common code pattern associated with that request, rather than first performing a deep “sanity check” on the underlying statistical validity of applying that pattern to population-based data.</li>
</ul>
<p><strong>2. The “Path of Least Resistance” Bias</strong></p>
<p>LLMs are designed to be helpful and direct. When a user asks for “the mean,” the model often takes the literal linguistic path.</p>
<ul>
<li><strong>Literal Interpretation:</strong> You asked for “the mean,” so I provided the <code>mean()</code> function.</li>
<li><strong>Analytical Interpretation:</strong> You asked for a “continental average,” which requires understanding that a continent is a collection of weighted units.</li>
</ul>
<p>Unless a prompt explicitly mentions weighting or “per capita adjustments,” models often default to the simplest arithmetic interpretation because that is what satisfies the “literal” requirement of the prompt most often.</p>
<p><strong>3. Lack of a Persistent “World Model”</strong></p>
<p>While I “know” that China has more people than Iceland, I don’t “hold” that fact in active memory while writing the code unless I am specifically prompted to consider the relationship between the columns. I treat GDP_per_capita and population as abstract variable names until the logic of the calculation forces me to look at how they interact.</p>
<p><strong>The Architecture of the Error</strong></p>
<p>Most LLMs operate on Autoregressive Prediction. I am predicting the next most likely token (word or symbol) in a sequence. <img src="https://latex.codecogs.com/png.latex?P(w_%7Bn%7D%20%7C%20w_%7B1%7D,%20...,%20w_%7Bn-1%7D)"> In the sequence “summarize mean…”, the probability of the next token being the column name is very high. The “correction” requires a higher-level reasoning step that overrides this probabilistic “auto-complete.”</p>
<p><strong>How to “Stress Test” an AI in the future</strong></p>
<p>To get better results from any AI when dealing with data, it helps to use “Chain of Thought” prompting. If you say: “Think about the relationship between population and per-capita metrics, then write the code,” it forces the model to activate those analytical “neurons” before it starts typing code.</p>
<p>Would you like to see how to use the <code>weighted.mean()</code> function to make that previous code even more concise?</p>
</section>
<section id="reflections-on-geminis-answer-to-question-4" class="level3">
<h3 class="anchored" data-anchor-id="reflections-on-geminis-answer-to-question-4">Reflections on Gemini’s answer to Question 4</h3>
<p>Gemini’s response is a good reminder to students who use LLMs for analysis work. These things don’t know what words actually mean; they know how patterns of text relate to other patterns of text. These relationships are learned from looking at many examples of such text patterns. One particular sentence really stood out:</p>
<p><strong>I treat GDP_per_capita and population as abstract variable names until the logic of the calculation forces me to look at how they interact.</strong></p>
<p>There is no world view nor sense of contextual meaning of terms like <em>GDP per capita</em> or <em>population</em>. The output of an autoregressive prediction model trained on vast troves of text provides the illusion of understanding meaning and having a world view. It wasn’t until I added certain words to the model’s context window that it found different text pattern relationships that actually ended up answering the question.</p>
<p><em>Wait, isn’t that the wrong way to compute mean GDP per capita and mean life expectancy? Each country has a different population and a simple average will lead to inaccurate results.</em></p>
<p>I never use the term “weighted average”. I do mention things like “wrong”, “different population”, “simple average” and “inaccurate”. Which terms pushed the model to find different next token probabilities?</p>


</section>
</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{isken2026,
  author = {Isken, Mark},
  title = {Another Cautionary Tale for Using {LLMs} for Business
    Analytics},
  date = {2026-03-11},
  url = {https://bitsofanalytics.org//posts/caution_llm_weighted_avg/caution_llm.html},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-isken2026" class="csl-entry quarto-appendix-citeas">
Isken, Mark. 2026. <span>“Another Cautionary Tale for Using LLMs for
Business Analytics.”</span> March 11, 2026. <a href="https://bitsofanalytics.org//posts/caution_llm_weighted_avg/caution_llm.html">https://bitsofanalytics.org//posts/caution_llm_weighted_avg/caution_llm.html</a>.
</div></div></section></div> ]]></description>
  <category>R</category>
  <category>llm</category>
  <category>ai</category>
  <guid>https://bitsofanalytics.org/posts/caution_llm_weighted_avg/caution_llm.html</guid>
  <pubDate>Wed, 11 Mar 2026 04:00:00 GMT</pubDate>
</item>
<item>
  <title>A cautionary tale for using LLMs for business analytics</title>
  <dc:creator>Mark Isken</dc:creator>
  <link>https://bitsofanalytics.org/posts/caution_llm/caution_llm.html</link>
  <description><![CDATA[ 




<p>By now it has become passé to show examples of large language models (LLMs) making errors. We know they make errors, often called <em>hallucinations</em>, and there is some evidence that recent models have actually <a href="https://www.nytimes.com/2025/05/05/technology/ai-hallucinations-chatgpt-google.html">gotten worse with respect to such behavior</a>. Yet, there is one particular case I want to highlight here. I have been exploring this error since ChatGPT came on the scene in November of 2022 and it raises a number of issues regarding the use of LLMs for teaching, learning and doing business analytics.</p>
<section id="the-bike-share-analysis-problem" class="level2">
<h2 class="anchored" data-anchor-id="the-bike-share-analysis-problem">The bike share analysis problem</h2>
<p>Bike share systems are common in large cities. Bicycles are available for rental from numerous geographically dispersed stations. Riders can return their bike to any station leading to challenging and interesting bike inventory rebalancing and capacity planning problems. These systems produce a large volume of transaction data that can be analyzed to aid such decision making problems <span class="citation" data-cites="freundMinimizingMultimodularFunctions2022 haider2018inventory schuijbroek2017inventory jianSimulationOptimizationLargescale2016">(Freund, Henderson, and Shmoys 2022; Haider et al. 2018; Schuijbroek, Hampshire, and Van Hoeve 2017; Jian et al. 2016)</span>.</p>
<p>For each rental, we know where the rental originated and where the bike was returned. Timestamps for both of these events make it possible to do detailed analysis of rental volume patterns. System users might be classified as either a <em>member</em> or a <em>short-term pass holder</em> and analysis may reveal very different rental patterns for these different user classes.</p>
<p>Kaggle has several samples of bike share data including data from Seattle (<a href="https://www.kaggle.com/datasets/pronto/cycle-share-dataset" class="uri">https://www.kaggle.com/datasets/pronto/cycle-share-dataset</a>) and Washington D.C. (<a href="https://www.kaggle.com/datasets/lakshmi25npathi/bike-sharing-dataset" class="uri">https://www.kaggle.com/datasets/lakshmi25npathi/bike-sharing-dataset</a>). We will be using the Seattle dataset for this example.</p>
<p>The three main data tables are the <code>trip</code>, <code>station</code> and <code>weather</code> tables. The <code>trip</code> table contains one row per bike rental. The field names, data types and example data are shown in the trip table dictionary below.</p>
<div id="table:trip_dict">
<table class="table">
<caption>Trip table dictionary</caption>
<colgroup>
<col style="width: 25%">
<col style="width: 25%">
<col style="width: 25%">
<col style="width: 25%">
</colgroup>
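<thead>
<tr class="header">
<th style="text-align: left;">Field</th>
<th style="text-align: left;">Type</th>
<th style="text-align: left;">Description</th>
<th style="text-align: center;">Example</th>
</tr>
</thead>
<tbody>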
<tr class="odd">
<td style="text-align: left;"><code>trip_id</code></td>
<td style="text-align: left;">integer</td>
<td style="text-align: left;">unique trip identifier</td>
<td style="text-align: center;">4242</td>
</tr>
<tr class="even">
<td style="text-align: left;"><code>starttime</code></td>
<td style="text-align: left;">string</td>
<td style="text-align: left;">datetime bike rented from source</td>
<td style="text-align: center;">“10/20/2014 17:43”</td>
</tr>
<tr class="odd">
<td style="text-align: left;"><code>stoptime</code></td>
<td style="text-align: left;">string</td>
<td style="text-align: left;">datetime bike returned to destination</td>
<td style="text-align: center;">“10/20/2014 17:56”</td>
</tr>
<tr class="even">
<td style="text-align: left;"><code>bike_id</code></td>
<td style="text-align: left;">string</td>
<td style="text-align: left;">unique bike identifier</td>
<td style="text-align: center;">“SEA00255”</td>
</tr>
<tr class="odd">
<td style="text-align: left;"><code>trip_duration</code></td>
<td style="text-align: left;">float</td>
<td style="text-align: left;">time bike rented in seconds</td>
<td style="text-align: center;">792.294</td>
</tr>
<tr class="even">
<td style="text-align: left;"><code>from_station_name</code></td>
<td style="text-align: left;">string</td>
<td style="text-align: left;">common name of source station</td>
<td style="text-align: center;">“3rd Ave &amp; Broad St”</td>
</tr>
<tr class="odd">
<td style="text-align: left;"><code>to_station_name</code></td>
<td style="text-align: left;">string</td>
<td style="text-align: left;">common name of destination station</td>
<td style="text-align: center;">“Dexter Ave N &amp; Aloha St”</td>
</tr>
<tr class="even">
<td style="text-align: left;"><code>from_station_id</code></td>
<td style="text-align: left;">string</td>
<td style="text-align: left;">source station identifier</td>
<td style="text-align: center;">“BT-01”</td>
</tr>
<tr class="odd">
<td style="text-align: left;"><code>to_station_id</code></td>
<td style="text-align: left;">string</td>
<td style="text-align: left;">destination station identifier</td>
<td style="text-align: center;">“SLU-02”</td>
</tr>
<tr class="even">
<td style="text-align: left;"><code>usertype</code></td>
<td style="text-align: left;">string</td>
<td style="text-align: left;">member or short-term</td>
<td style="text-align: center;">“Member”</td>
</tr>
</tbody>
</table>
</div>
<p>In the Seattle version of the dataset available on Kaggle, there are 286857 records in the trip table spanning a period of 689 days (98 weeks). On average, there are a little over 2900 rentals per week.</p>
<p>I have used this dataset in my <a href="https://ba.misken.org/">spreadsheet modeling class</a> for an assignment on exploratory data analysis. Students are presented with a scenario in which they play the role of a consultant tasked with answering several analysis questions and creating a presentation for a new manager of the bike share system. For example:</p>
<ol type="1">
<li>How does the volume of rides vary over time?</li>
<li>Which stations are most popular to rent from? Which stations are popular destinations? Which source and destination pairs are most popular?</li>
<li>How does rental volume vary by time of day and day of week?</li>
<li>What does the distribution of ride duration look like?</li>
</ol>
<p>Each of these questions can be further refined by seeing how the results vary for Members and Short-term Pass Holders. Using the weather data, we can see if and how different weather conditions impact rental volume and ride duration.</p>
<p>I have also used this dataset in my <a href="https://pcda.misken.org/">Practical Computing for Data Analytics</a> course for modules and assignments on EDA and predictive modeling using R and Python. One of the questions I usually pose is to ask for the average number of bikes rented by day of week. Seems simple enough, but there are a few subtleties that make this a little trickier than it first appears. These subtleties become even more important for more complex variants of this question such as asking for the mean and 95th percentile of the number of bikes rented by day of week and hour of day for a specific station. We will elaborate on these issues in subsequent sections. I also use airline data such as the well known <a href="https://cran.r-project.org/web/packages/nycflights13/index.html">nycflights13</a> R package and ask similar questions with respect to the number of flights by day of week for certain airlines.</p>
<p>When ChatGPT was released, one of the first things I tried was to ask it to generate Python code to answer these questions involving average rentals by day of week - spoiler alert: the original results were abysmal. Over the past few years as LLMs have improved and more models have been made available, I have retried these questions on a few different LLMs and think that the results are illuminating both with respect to the advances made as well as the problems that still remain. The results are quite relevant for those of us trying to prepare the next generation of data analysts and data scientists.</p>
</section>
<section id="the-first-attempt" class="level1">
<h1>The first attempt</h1>
<p>In April of 2023, I noticed that DataCamp, a popular online learning hub for data science, had made an AI Assistant available which was powered by the LLMs created by OpenAI. I was able to upload the <strong>trip.csv</strong> file into a workspace and then ask questions directly to the AI Assistant.</p>
<div class="callout callout-style-default callout-tip no-icon callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Prompt
</div>
</div>
<div class="callout-body-container callout-body">
<p>Import the trip.csv file into a pandas dataframe. The starttime and stoptime fields are datetimes.</p>
</div>
</div>
<p>The AI Assistant responded with the following perfectly good Python code.</p>
<div class="callout callout-style-default callout-tip no-icon callout-titled" title="Response">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Response
</div>
</div>
<div class="callout-body-container callout-body">
<div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pandas <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> pd</span>
<span id="cb1-2">df <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.read_csv(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'trip.csv'</span>, parse_dates<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'starttime'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'stoptime'</span>])</span></code></pre></div>
</div>
</div>
<p>Not wanting to try anything too complex yet, I followed up with a question about popular rental stations.</p>
<div class="callout callout-style-default callout-tip no-icon callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Prompt
</div>
</div>
<div class="callout-body-container callout-body">
<p>List the 8 most popular stations from which to rent a bike.</p>
</div>
</div>
<p>The generated code used a field name, <code>start station name</code>, that is not part of the <code>df</code> dataframe.</p>
<div class="callout callout-style-default callout-tip no-icon callout-titled" title="Response">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Response
</div>
</div>
<div class="callout-body-container callout-body">
<div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1">popular_stations <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'start station name'</span>].value_counts().head(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">8</span>)</span>
<span id="cb2-2"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(popular_stations)</span></code></pre></div>
</div>
</div>
<p>Perhaps there is another cycle share dataset out there that does use this column name. Apparently, the AI Assistant’s underlying LLM could not run code and use the results in follow-up questions. When I clicked the <strong>Fix Error</strong> button provided by the AI Assistant, I got:</p>
<div class="callout callout-style-default callout-tip no-icon callout-titled" title="Response">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Response
</div>
</div>
<div class="callout-body-container callout-body">
<p>The error is caused by the key ‘start station name’ not being present in the DataFrame ‘df’. This could be due to a misspelling or the key not existing in the DataFrame. To fix the error, we can check the column names in the DataFrame and use the correct key. Here’s the corrected code:</p>
<div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1">popular_stations <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Start Station'</span>].value_counts().head(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">8</span>)</span>
<span id="cb3-2"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(popular_stations)</span></code></pre></div>
</div>
</div>
<p>So, it was unable to find a potentially correct field name but did help find the root cause of the error. I manually fixed the code. From there I moved on to other relatively simple tasks such as creating a plot based on the <code>popular_stations</code> variable and plotting the number of rentals by date. Again, the LLM returned code that used non-existent field names and made other basic, easy to spot, errors. Even though my confidence level was extremely low, I decided to see what the AI Assistant would do with a more difficult question. For reference, the plot of number of rentals by date shows that the overall average is in the ballpark of 415 rides/day. It varies over time and has a seasonal pattern, but the overall magnitude is on the order of several hundred bike rentals per day.</p>
<div class="callout callout-style-default callout-tip no-icon callout-titled" title="Prompt">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Prompt
</div>
</div>
<div class="callout-body-container callout-body">
<p>Compute the average number of bike rentals by day of week.</p>
</div>
</div>
<p>The AI Assistant returned the following code to do the computation:</p>
<div class="callout callout-style-default callout-tip no-icon callout-titled" title="Response">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Response
</div>
</div>
<div class="callout-body-container callout-body">
<div class="sourceCode" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Convert starttime to datetime object</span></span>
<span id="cb4-2">df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'starttime'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.to_datetime(df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'starttime'</span>])</span>
<span id="cb4-3"></span>
<span id="cb4-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Group by day of week</span></span>
<span id="cb4-5">grouped <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df.groupby(df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'starttime'</span>].dt.dayofweek)</span>
<span id="cb4-6"></span>
<span id="cb4-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Calculate number of bike rentals for each day of week</span></span>
<span id="cb4-8">rentals <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> grouped.size()</span>
<span id="cb4-9"></span>
<span id="cb4-10"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Calculate average number of bike rentals by day of week</span></span>
<span id="cb4-11">average <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rentals.mean()</span>
<span id="cb4-12"></span>
<span id="cb4-13"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Print average</span></span>
<span id="cb4-14"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(average)</span>
<span id="cb4-15"><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">40979.57142857143</span></span></code></pre></div>
</div>
</div>
<p>The average was not computed for each day of the week. The single value returned is just the 286857 total rentals divided by 7, and even if we read it as a weekly rental volume, a number near 41000 is not even close to the actual value of 2916 rentals per week.</p>
<p>After making it clear that I wanted the average for each of the days of the week, the AI Assistant generated the following:</p>
<div class="callout callout-style-default callout-tip no-icon callout-titled" title="Response">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Response
</div>
</div>
<div class="callout-body-container callout-body">
<div class="sourceCode" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># create a new column for day of week</span></span>
<span id="cb5-2">df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'starttime'</span>].dt.day_name()</span>
<span id="cb5-3"></span>
<span id="cb5-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># group the data by day of week and calculate the average number of rides</span></span>
<span id="cb5-5">avg_rides_by_day <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df.groupby(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>)[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'tripduration'</span>].count() <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>].nunique()</span>
<span id="cb5-6"></span>
<span id="cb5-7"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(avg_rides_by_day)</span>
<span id="cb5-8"></span>
<span id="cb5-9">day_of_week</span>
<span id="cb5-10">Friday       <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">6185.714286</span></span>
<span id="cb5-11">Monday       <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">6029.142857</span></span>
<span id="cb5-12">Saturday     <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">5496.000000</span></span>
<span id="cb5-13">Sunday       <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">4729.714286</span></span>
<span id="cb5-14">Thursday     <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">6278.000000</span></span>
<span id="cb5-15">Tuesday      <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">6126.571429</span></span>
<span id="cb5-16">Wednesday    <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">6134.428571</span></span>
<span id="cb5-17">Name: tripduration, dtype: float64</span></code></pre></div>
</div>
</div>
<p>Again, these results are off by an order of magnitude. I shared the results of my experiment with one of my classes and told them they needed to double check any code they received from an LLM and use analytical common sense to check the plausibility of the numeric results. Nevertheless, several students turned in eerily similar code that led to similarly blatantly incorrect results in response to a question I posed about the average number of airline flights by day of week for flights out of a specific airport during a given year. I wrote up this experience as a <a href="https://bitsofanalytics.org/posts/datacamp_ai_1/">blog post</a> which ended with these parting thoughts:</p>
<blockquote class="blockquote">
<p>I certainly wasn’t surprised that the AI Assistant failed miserably on this last task as it’s not a simple query and the correct approach is unlikely to be very prominent in the training data. I was a bit surprised how often the AI Assistant would use non-existent column names or filenames, or try to do datatype conversions on columns that were already of the desired data type. It is certainly capable of creating boilerplate code for simple things which can then be manually patched up (e.g.&nbsp;fixing column names). I’m sure it will improve over time via some sort of reinforcement learning or non-LLM based tweaks to prevent things like nonexistent column name use. For now, I’m sticking with StackOverflow and writing my own code.</p>
</blockquote>
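<p>To make the failure concrete, here is a minimal sketch, on a tiny made-up dataset, of why dividing weekday totals by the number of distinct weekday names inflates the answer, along with one plausible two-step alternative: count rentals per calendar date, then average those daily counts within each day of week. The dates and counts are hypothetical, and the sketch ignores dates with zero rentals, which would need to be filled in before averaging on real data.</p>

```python
import pandas as pd

# Hypothetical rentals per calendar date: three Mondays and three Tuesdays.
rentals_per_date = {
    "2014-10-13": 3, "2014-10-20": 5, "2014-10-27": 4,  # Mondays
    "2014-10-14": 1, "2014-10-21": 3, "2014-10-28": 2,  # Tuesdays
}
# Expand to one row per rental, mimicking the structure of the trip table.
starts = [date for date, n in rentals_per_date.items() for _ in range(n)]
df = pd.DataFrame({"starttime": pd.to_datetime(starts)})

# Flawed pattern: total rentals per weekday name, divided by the number of
# distinct weekday names rather than by the number of dates observed.
day_name = df["starttime"].dt.day_name()
flawed = df.groupby(day_name).size() / day_name.nunique()

# Two-step pattern: count rentals per calendar date, then average those
# daily counts within each day of week.
daily = df.groupby(df["starttime"].dt.normalize()).size()
avg_by_dow = daily.groupby(daily.index.day_name()).mean()

print(flawed)      # Monday 6.0, Tuesday 3.0
print(avg_by_dow)  # Monday 4.0, Tuesday 2.0
```

<p>With only three weeks of data the flawed values are already 50% too high; with roughly 98 weeks of data and 7 weekday names, the same mistake produces the order of magnitude error seen above.</p>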
<section id="march-2025" class="level2">
<h2 class="anchored" data-anchor-id="march-2025">March 2025</h2>
<p>Fast forward to 2025 and LLMs have made significant strides. The main proprietary general purpose models include ChatGPT from OpenAI, Gemini from Google, and Claude from Anthropic. There are numerous versions of these models as the companies compete for market domination and are frequently releasing new variants of their base models. How would this new generation of LLMs fare with the bike share analysis problems?</p>
<p>I decided to use Anthropic’s Claude 3.5 Sonnet model.</p>
<div class="callout callout-style-default callout-tip no-icon callout-titled" title="Prompt">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Prompt
</div>
</div>
<div class="callout-body-container callout-body">
<p>You have a csv file named trip.csv. Each row is a bike rental from a bike share system. The fields starttime and stoptime are datetime values specifying when the bike is rented and returned. The fields from_station_id and to_station_id are strings indicating the bike rental stations that the bike was rented from and returned to. Using Python with pandas, generate the code to compute the total number of rentals from each station.</p>
</div>
</div>
<div class="callout callout-style-default callout-tip no-icon callout-titled" title="Claude code response">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Claude code response
</div>
</div>
<div class="callout-body-container callout-body">
<div class="sourceCode" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pandas <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> pd</span>
<span id="cb6-2"></span>
<span id="cb6-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Read the CSV file</span></span>
<span id="cb6-4">df <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.read_csv(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'trip.csv'</span>)</span>
<span id="cb6-5"></span>
<span id="cb6-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Count rentals from each station</span></span>
<span id="cb6-7">station_rentals <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'from_station_id'</span>].value_counts()</span>
<span id="cb6-8"></span>
<span id="cb6-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Convert to a DataFrame for better formatting</span></span>
<span id="cb6-10">rental_summary <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.DataFrame({</span>
<span id="cb6-11">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'station'</span>: station_rentals.index,</span>
<span id="cb6-12">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'total_rentals'</span>: station_rentals.values</span>
<span id="cb6-13">})</span>
<span id="cb6-14"></span>
<span id="cb6-15"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Sort by number of rentals in descending order</span></span>
<span id="cb6-16">rental_summary <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rental_summary.sort_values(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'total_rentals'</span>, ascending<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>)</span>
<span id="cb6-17"></span>
<span id="cb6-18"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Reset index for cleaner display</span></span>
<span id="cb6-19">rental_summary <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rental_summary.reset_index(drop<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb6-20"></span>
<span id="cb6-21"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Display the results</span></span>
<span id="cb6-22"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">Total Rentals by Station:"</span>)</span>
<span id="cb6-23"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(rental_summary.to_string(index<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>))</span></code></pre></div>
</div>
</div>
<p>The code ran with no errors and produced expected results.</p>
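<p>As a point of comparison, here is a more compact sketch of the same count on a tiny made-up dataframe (the station ids below are hypothetical, not taken from <code>trip.csv</code>). Since <code>value_counts()</code> already returns the counts in descending order, the explicit sort and index reset in the response above are not strictly needed.</p>

```python
import pandas as pd

# Hypothetical toy data standing in for trip.csv
toy = pd.DataFrame({'from_station_id': ['A', 'B', 'A', 'C', 'A', 'B']})

# value_counts() sorts by count, largest first, by default
rental_summary = (
    toy['from_station_id']
    .value_counts()
    .rename_axis('station')
    .reset_index(name='total_rentals')
)

print(rental_summary.to_string(index=False))
```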
<div class="callout callout-style-default callout-tip no-icon callout-titled" title="Prompt">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Prompt
</div>
</div>
<div class="callout-body-container callout-body">
<p>Great. Now using Python and pandas, compute the average length of time in hours that bikes are rented.</p>
</div>
</div>
<p>Things get more interesting, as the response goes beyond what was asked. There was one error: a necessary library (<code>matplotlib</code>) was not imported. Here is the code, along with a few minor changes.</p>
<div class="callout callout-style-default callout-tip no-icon callout-titled" title="Claude code response">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Claude code response
</div>
</div>
<div class="callout-body-container callout-body">
<div class="sourceCode" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pandas <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> pd</span>
<span id="cb7-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">## MY FIX: added necessary import</span></span>
<span id="cb7-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> matplotlib.pyplot <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> plt</span>
<span id="cb7-4"></span>
<span id="cb7-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Read the CSV file</span></span>
<span id="cb7-6">df <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.read_csv(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'trip.csv'</span>)</span>
<span id="cb7-7"></span>
<span id="cb7-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Convert datetime strings to datetime objects</span></span>
<span id="cb7-9">df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'starttime'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.to_datetime(df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'starttime'</span>])</span>
<span id="cb7-10">df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'stoptime'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.to_datetime(df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'stoptime'</span>])</span>
<span id="cb7-11"></span>
<span id="cb7-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Calculate duration in hours for each rental</span></span>
<span id="cb7-13">df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'rental_duration'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'stoptime'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'starttime'</span>]).dt.total_seconds() <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3600</span></span>
<span id="cb7-14"></span>
<span id="cb7-15"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Calculate average duration</span></span>
<span id="cb7-16">average_duration <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'rental_duration'</span>].mean()</span>
<span id="cb7-17"></span>
<span id="cb7-18"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Calculate additional statistics</span></span>
<span id="cb7-19">duration_stats <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> {</span>
<span id="cb7-20">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'average_hours'</span>: average_duration,</span>
<span id="cb7-21">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'median_hours'</span>: df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'rental_duration'</span>].median(),</span>
<span id="cb7-22">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'min_hours'</span>: df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'rental_duration'</span>].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">min</span>(),</span>
<span id="cb7-23">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'max_hours'</span>: df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'rental_duration'</span>].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">max</span>(),</span>
<span id="cb7-24">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'std_hours'</span>: df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'rental_duration'</span>].std()</span>
<span id="cb7-25">}</span>
<span id="cb7-26"></span>
<span id="cb7-27"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Print results</span></span>
<span id="cb7-28"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">Rental Duration Statistics (in hours):"</span>)</span>
<span id="cb7-29"><span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> stat, value <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> duration_stats.items():</span>
<span id="cb7-30">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>stat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>replace(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'_'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">' '</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>title()<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>value<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:.2f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb7-31"></span>
<span id="cb7-32"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create a histogram of rental durations for rentals under 24 hours</span></span>
<span id="cb7-33">short_rentals <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df[df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'rental_duration'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">24</span>]</span>
<span id="cb7-34">plt.figure(figsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>))</span>
<span id="cb7-35">plt.hist(short_rentals[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'rental_duration'</span>], bins<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">48</span>, edgecolor<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'black'</span>)</span>
<span id="cb7-36">plt.title(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Distribution of Rental Durations (&lt;= 24 hours)'</span>)</span>
<span id="cb7-37">plt.xlabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Duration (hours)'</span>)</span>
<span id="cb7-38">plt.ylabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Number of Rentals'</span>)</span>
<span id="cb7-39">plt.grid(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>, alpha<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.3</span>)</span>
<span id="cb7-40">plt.show()</span></code></pre></div>
</div>
</div>
<p>The output includes a statistical summary and a histogram of ride duration. I had only asked for the mean but Claude decided to give me more. Ok, now for the real question.</p>
<div class="callout callout-style-default callout-tip no-icon callout-titled" title="Prompt">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Prompt
</div>
</div>
<div class="callout-body-container callout-body">
<p>Great. Now using Python and pandas, compute the average number of bike rentals by day of week.</p>
</div>
</div>
<p>The response by Claude is interesting. Its overall approach to solving the problem is far better than the early attempts described in the previous section. Yet, it makes a non-trivial coding error in which it uses an accessor that does not exist on the object it is applied to. It also overlooks the more subtle conceptual issue that lies at the heart of this entire exercise.</p>
<p>As in the previous example, Claude forgets to load a necessary library and then proceeds to read the data into a pandas dataframe named <code>df</code>. Then it begins to build up the pieces to compute the average number of rentals by day of week. First, for the numerator, we get code to count the number of rentals that occurred on each day of the week.</p>
<div class="callout callout-style-default callout-tip no-icon callout-titled" title="Claude code response">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Claude code response
</div>
</div>
<div class="callout-body-container callout-body">
<div class="sourceCode" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Convert start_rental to datetime</span></span>
<span id="cb8-2">df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'starttime'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.to_datetime(df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'starttime'</span>])</span>
<span id="cb8-3"></span>
<span id="cb8-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Extract day of week (0 = Monday, 6 = Sunday)</span></span>
<span id="cb8-5">df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'starttime'</span>].dt.dayofweek</span>
<span id="cb8-6"></span>
<span id="cb8-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create day name column for better readability</span></span>
<span id="cb8-8">df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_name'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'starttime'</span>].dt.day_name()</span>
<span id="cb8-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Calculate average rentals by day of week</span></span>
<span id="cb8-10">daily_rentals <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df.groupby(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_name'</span>)[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'starttime'</span>].count().reset_index()</span>
<span id="cb8-11">daily_rentals.columns <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'total_rentals'</span>]</span></code></pre></div>
</div>
</div>
<p>Displaying the value of <code>daily_rentals</code> gives the following output.</p>
<div class="callout callout-style-default callout-tip no-icon callout-titled" title="Claude code output">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Claude code output
</div>
</div>
<div class="callout-body-container callout-body">
<table class="table">
<tbody>
<tr class="odd">
<td></td>
<td style="text-align: center;">day_of_week</td>
<td style="text-align: center;">total_rentals</td>
</tr>
<tr class="even">
<td>0</td>
<td style="text-align: center;">Friday</td>
<td style="text-align: center;">43300</td>
</tr>
<tr class="odd">
<td>1</td>
<td style="text-align: center;">Monday</td>
<td style="text-align: center;">42204</td>
</tr>
<tr class="even">
<td>2</td>
<td style="text-align: center;">Saturday</td>
<td style="text-align: center;">38472</td>
</tr>
<tr class="odd">
<td>3</td>
<td style="text-align: center;">Sunday</td>
<td style="text-align: center;">33108</td>
</tr>
<tr class="even">
<td>4</td>
<td style="text-align: center;">Thursday</td>
<td style="text-align: center;">43946</td>
</tr>
<tr class="odd">
<td>5</td>
<td style="text-align: center;">Tuesday</td>
<td style="text-align: center;">42886</td>
</tr>
<tr class="even">
<td>6</td>
<td style="text-align: center;">Wednesday</td>
<td style="text-align: center;">42941</td>
</tr>
</tbody>
</table>
</div>
</div>
<p>Next comes the code to compute how many times each weekday appears among the dates represented in the dataframe, to serve as the denominator.</p>
<div class="callout callout-style-default callout-tip no-icon callout-titled" title="Claude code response">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Claude code response
</div>
</div>
<div class="callout-body-container callout-body">
<div class="sourceCode" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Calculate the number of unique dates for each day of week</span></span>
<span id="cb9-2">unique_days <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df.groupby(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_name'</span>)[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'starttime'</span>].dt.date.nunique().reset_index()</span></code></pre></div>
</div>
</div>
<p>Running this code leads to the following error.</p>
<div class="callout callout-style-default callout-tip no-icon callout-titled" title="Claude code output">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Claude code output
</div>
</div>
<div class="callout-body-container callout-body">
<pre><code>AttributeError: 'SeriesGroupBy' object has no attribute 'dt'</code></pre>
</div>
</div>
<p>While the <a href="https://pandas.pydata.org/docs/reference/api/pandas.Series.dt.html">dt accessor</a> is usable with a <code>Series</code> object, it’s not usable with a <code>SeriesGroupBy</code> object. For someone just beginning with Python and pandas, this is not a particularly simple error to diagnose or fix. Beginning programmers often do not read error messages carefully or, if they do, have trouble using the information in the message to diagnose the problem.</p>
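<p>The difference is easy to reproduce on a small example. The sketch below uses a made-up four-row dataframe (the timestamps are hypothetical) to show that <code>.dt</code> works on the column itself but not on the grouped column, and that pulling the date out into its own column before grouping avoids the problem.</p>

```python
import pandas as pd

# Hypothetical toy data: two rentals on a Monday, two on the following Tuesday
toy = pd.DataFrame({
    'starttime': pd.to_datetime([
        '2024-01-01 08:00', '2024-01-01 17:30',
        '2024-01-02 09:15', '2024-01-02 12:00',
    ])
})
toy['day_name'] = toy['starttime'].dt.day_name()

# .dt is available on a Series of datetimes ...
toy['date'] = toy['starttime'].dt.date

# ... but not on the SeriesGroupBy object returned by groupby()[...]
grouped = toy.groupby('day_name')['starttime']
try:
    grouped.dt
    dt_on_groupby = True
except AttributeError as err:
    dt_on_groupby = False
    print(err)

# Grouping the already-extracted date column works fine
unique_days_toy = toy.groupby('day_name')['date'].nunique()
print(unique_days_toy)
```

<p>In other words, any date extraction has to happen on the <code>Series</code>, before the grouping step.</p>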
<p>Here’s how we can do this while still using the same approach as Claude.</p>
<div class="sourceCode" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1">unique_dates_df <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.DataFrame(df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'starttime'</span>].dt.date.unique(), columns <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'date'</span>])</span>
<span id="cb11-2">unique_dates_df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'date'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.to_datetime(unique_dates_df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'date'</span>])</span>
<span id="cb11-3">unique_dates_df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> unique_dates_df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'date'</span>].dt.dayofweek</span>
<span id="cb11-4">unique_dates_df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_name'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> unique_dates_df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'date'</span>].dt.day_name()</span>
<span id="cb11-5"></span>
<span id="cb11-6">unique_days <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> unique_dates_df.groupby(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_name'</span>)[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'date'</span>].size().reset_index()</span>
<span id="cb11-7">unique_days.columns <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'number_of_days'</span>]</span>
<span id="cb11-8"></span>
<span id="cb11-9"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(unique_days)</span></code></pre></div>
<table class="table">
<tbody>
<tr class="odd">
<td></td>
<td style="text-align: center;">day_of_week</td>
<td style="text-align: right;">number_of_days</td>
</tr>
<tr class="even">
<td>0</td>
<td style="text-align: center;">Friday</td>
<td style="text-align: right;">98</td>
</tr>
<tr class="odd">
<td>1</td>
<td style="text-align: center;">Monday</td>
<td style="text-align: right;">99</td>
</tr>
<tr class="even">
<td>2</td>
<td style="text-align: center;">Saturday</td>
<td style="text-align: right;">98</td>
</tr>
<tr class="odd">
<td>3</td>
<td style="text-align: center;">Sunday</td>
<td style="text-align: right;">98</td>
</tr>
<tr class="even">
<td>4</td>
<td style="text-align: center;">Thursday</td>
<td style="text-align: right;">98</td>
</tr>
<tr class="odd">
<td>5</td>
<td style="text-align: center;">Tuesday</td>
<td style="text-align: right;">99</td>
</tr>
<tr class="even">
<td>6</td>
<td style="text-align: center;">Wednesday</td>
<td style="text-align: right;">99</td>
</tr>
</tbody>
</table>
<p>After this fix, the rest of the code works perfectly.</p>
<div class="callout callout-style-default callout-tip no-icon callout-titled" title="Claude code response">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Claude code response
</div>
</div>
<div class="callout-body-container callout-body">
<div class="sourceCode" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb12-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Merge and calculate average</span></span>
<span id="cb12-2">daily_averages <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.merge(daily_rentals, unique_days, on<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>)</span>
<span id="cb12-3">daily_averages[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'average_rentals'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> daily_averages[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'total_rentals'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> daily_averages[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'number_of_days'</span>]</span>
<span id="cb12-4"></span>
<span id="cb12-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Sort by day of week (Monday first)</span></span>
<span id="cb12-6">day_order <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Monday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Tuesday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Wednesday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Thursday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Friday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Saturday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Sunday'</span>]</span>
<span id="cb12-7">daily_averages[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.Categorical(daily_averages[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>], categories<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>day_order, ordered<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb12-8">daily_averages <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> daily_averages.sort_values(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>)</span>
<span id="cb12-9"></span>
<span id="cb12-10"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Round averages to 1 decimal place</span></span>
<span id="cb12-11">daily_averages[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'average_rentals'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> daily_averages[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'average_rentals'</span>].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb12-12"></span>
<span id="cb12-13"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Display results</span></span>
<span id="cb12-14"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">Average Daily Rentals by Day of Week:"</span>)</span>
<span id="cb12-15"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(daily_averages[[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'average_rentals'</span>]].to_string(index<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>))</span>
<span id="cb12-16"></span>
<span id="cb12-17"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create a bar plot</span></span>
<span id="cb12-18">plt.figure(figsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>))</span>
<span id="cb12-19">plt.bar(daily_averages[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>], daily_averages[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'average_rentals'</span>])</span>
<span id="cb12-20">plt.title(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Average Number of Bike Rentals by Day of Week'</span>)</span>
<span id="cb12-21">plt.xlabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Day of Week'</span>)</span>
<span id="cb12-22">plt.ylabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Average Number of Rentals'</span>)</span>
<span id="cb12-23">plt.xticks(rotation<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">45</span>)</span>
<span id="cb12-24">plt.grid(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>, alpha<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.3</span>)</span>
<span id="cb12-25">plt.tight_layout()</span>
<span id="cb12-26">plt.show()</span></code></pre></div>
</div>
</div>
<div id="fig-output_dow1" class="quarto-figure quarto-figure-center quarto-float anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-output_dow1-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://bitsofanalytics.org/posts/caution_llm/images/output_dow1.png" class="img-fluid figure-img">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-output_dow1-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;1: Average rentals by DOW
</figcaption>
</figure>
</div>
<p>This approach works fine for computing the mean number of rentals under the assumption that we have at least one rental on every date in the range spanned by the dataset. But, if there is a date having zero rides, that date will not get included in the denominator counts and all of the statistics will be inflated. For overall mean rental volume by day of week, it is extremely unlikely to have zero rental demand for an entire day. However, if we modify the question to ask about mean rental volume by day of week for a particular station, there might very well be days with zero rentals.</p>
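<p>To make the denominator problem concrete, here is a minimal sketch using made-up data. The toy <code>trips</code> dataframe, the two-week date range, and the variable names are all assumptions for illustration and are not part of the original analysis.</p>

```python
import pandas as pd

# Hypothetical toy data: two weeks of rentals (2024-01-01 was a Monday).
# The second Monday, 2024-01-08, has no rentals and so no rows at all.
trips = pd.DataFrame({
    "starttime": pd.to_datetime([
        "2024-01-01 08:00", "2024-01-01 09:00",
        "2024-01-01 17:00", "2024-01-01 18:00",   # Monday: 4 rentals
        "2024-01-02 08:00",                       # Tuesday: 1 rental
        "2024-01-09 08:00",                       # Tuesday: 1 rental
    ])
})

is_monday = trips["starttime"].dt.day_name() == "Monday"
monday_rentals = int(is_monday.sum())

# Naive denominator: only the Mondays that happen to appear in the data
mondays_in_data = trips.loc[is_monday, "starttime"].dt.date.nunique()

# Calendar denominator: every Monday in the analysis date range
calendar = pd.date_range("2024-01-01", "2024-01-14", freq="D")
mondays_in_range = int((calendar.day_name() == "Monday").sum())

naive_mean = monday_rentals / mondays_in_data      # 4 rentals / 1 Monday
correct_mean = monday_rentals / mondays_in_range   # 4 rentals / 2 Mondays

print(f"naive: {naive_mean}, calendar-aware: {correct_mean}")
```

<p>With one of the two Mondays missing from the data, the naive mean is twice the calendar-aware mean. The smaller the denominator relative to the true number of days, the larger the inflation.</p>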
<div class="callout callout-style-default callout-tip no-icon callout-titled" title="Prompt">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Prompt
</div>
</div>
<div class="callout-body-container callout-body">
<p>Great. Now using Python and pandas, compute the average number of bike rentals by day of week and hour of day.</p>
</div>
</div>
<p>Claude proceeds to generate code that suffers from similar shortcomings as in the previous example. It makes the same error in using a library method that does not exist. It also fails to correctly compute the denominator, the number of day of week and hour combinations over the analysis date range. It ends with this description of what its code does.</p>
<div class="callout callout-style-default callout-tip no-icon callout-titled" title="Claude code output">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Claude code output
</div>
</div>
<div class="callout-body-container callout-body">
<p>This code creates a comprehensive analysis of rental patterns by day and hour:</p>
<ul>
<li><p>Creates a heatmap showing:</p>
<ul>
<li>Days of week on the y-axis</li>
<li>Hours of day on the x-axis</li>
<li>Color intensity indicating average number of rentals</li>
<li>Actual values annotated in each cell</li>
</ul></li>
<li><p>Provides summary statistics including:</p>
<ul>
<li>Top 5 peak usage times (day and hour combinations)</li>
<li>Busiest hours averaged across all days</li>
</ul></li>
<li><p>Uses proper averaging by:</p>
<ul>
<li>Counting total rentals for each day-hour combination</li>
<li>Dividing by the number of each unique day in the dataset</li>
<li>Handling day sorting correctly (Monday through Sunday)</li>
</ul></li>
</ul>
<p>The heatmap visualization makes it easy to:</p>
<ul>
<li>Identify rush hour patterns</li>
<li>Compare weekday vs weekend patterns</li>
<li>Spot peak usage times</li>
<li>See the daily cycle of rentals</li>
</ul>
<p>Would you like me to modify the analysis? I could:</p>
<ul>
<li>Add seasonal patterns to see how these patterns change throughout the year</li>
<li>Break down patterns by station</li>
<li>Add weather data correlation if available</li>
<li>Create separate visualizations for weekdays vs weekends</li>
<li>Calculate additional statistics about peak vs off-peak usage</li>
</ul>
</div>
</div>
<p>The summary and suggestion for further analysis make sense. It all sounds very certain and correct.</p>
<p>After fixing the error involving the <code>dt</code> accessor, the code runs and produces a beautiful heatmap with plausible-looking numbers in each colored cell. As long as there was at least one rental on each date, these results are correct. Next, we will ask to include an upper percentile along with the mean number of rentals by day of week. The approach that Claude has used so far will not work, as we need the number of rentals by date as an intermediate calculation as opposed to just a total count of rentals by day of week.</p>
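<p>A small sketch of why a percentile needs the per-date counts. The two series of daily counts below are invented for illustration; they share the same total and the same number of days, yet their 95th percentiles differ.</p>

```python
import pandas as pd

# Hypothetical daily rental counts for four Mondays and four Tuesdays.
# Both weekdays have the same total (40) and therefore the same mean (10).
monday_counts = pd.Series([10, 10, 10, 10])
tuesday_counts = pd.Series([1, 1, 1, 37])

same_total = bool(monday_counts.sum() == tuesday_counts.sum())

# The 95th percentile depends on how the total is spread across dates,
# so it cannot be recovered from the total and the number of days alone.
monday_p95 = monday_counts.quantile(0.95)
tuesday_p95 = tuesday_counts.quantile(0.95)

print(same_total, monday_p95, tuesday_p95)
```

<p>A total divided by a day count is enough for a mean, but any percentile requires the distribution of rentals over individual dates.</p>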
<div class="callout callout-style-default callout-tip no-icon callout-titled" title="Prompt">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Prompt
</div>
</div>
<div class="callout-body-container callout-body">
<p>Using Python and pandas, can you compute the mean and 95th percentile of the number of bikes rented by day of week?</p>
</div>
</div>
<p>After reading in the data and making sure that the <code>starttime</code> column was a datetime, Claude proceeded with the following code to correctly compute the number of rentals by date.</p>
<div class="callout callout-style-default callout-tip no-icon callout-titled" title="Claude code response">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Claude code response
</div>
</div>
<div class="callout-body-container callout-body">
<div class="sourceCode" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb13-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Extract date and day of week</span></span>
<span id="cb13-2">df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'date'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'starttime'</span>].dt.date</span>
<span id="cb13-3">df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'starttime'</span>].dt.day_name()</span>
<span id="cb13-4"></span>
<span id="cb13-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Get daily counts</span></span>
<span id="cb13-6">daily_counts <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df.groupby([<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'date'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>]).size().reset_index(name<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'count'</span>)</span></code></pre></div>
</div>
</div>
<p>Again, as long as there was at least one rental for each date in the analysis date range, the <code>daily_counts</code> dataframe can be used to compute the mean and percentiles of interest of the number of rentals by day of week. Claude provided that code and produced a very nice summary table. With the preceding as a warm-up, now it was time to put Claude to the test. What would it do when the possibility of zero volume days was almost certain?</p>
<div class="callout callout-style-default callout-tip no-icon callout-titled" title="Prompt">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Prompt
</div>
</div>
<div class="callout-body-container callout-body">
<p>Using Python and pandas, can you compute the mean and 95th percentile of the number of bikes rented by day of week for station CH-06?</p>
</div>
</div>
<p>Like the example above, Claude first generated a dataframe of the number of rentals by date originating from the station of interest. In Figure&nbsp;2, we can see that there are 653 rows in the dataframe of rentals by date. Recall that our raw data spans 689 days. There were 36 dates on which there were no rentals from this station. We need those “zero dates” in order to correctly compute the statistics of interest using the approach that Claude used in the previous query.</p>
<div id="fig-station_date_counts" class="quarto-figure quarto-figure-center quarto-float anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-station_date_counts-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://bitsofanalytics.org/posts/caution_llm/images/output_station_date_counts.png" class="img-fluid figure-img">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-station_date_counts-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;2: Rentals by date
</figcaption>
</figure>
</div>
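<p>One way to bring the zero dates back is to reindex the per-date counts against a complete calendar before summarizing. The following is a minimal sketch under stated assumptions, not the code used to produce the figures in this post: the function name <code>station_dow_summary</code>, its <code>start</code> and <code>end</code> parameters, and the toy data are hypothetical, and the analysis range is assumed to default to the dates spanned by the whole dataset.</p>

```python
import pandas as pd


def p95(counts):
    """95th percentile of a Series of daily counts."""
    return counts.quantile(0.95)


def station_dow_summary(trips, station_id, start=None, end=None):
    """Mean and 95th percentile of daily rentals by day of week, zero days included.

    Hypothetical helper; column names follow the ones used in this post.
    """
    # The analysis range defaults to the dates spanned by the whole dataset,
    # not just the dates on which this station had rentals.
    start = trips["starttime"].min().normalize() if start is None else pd.Timestamp(start)
    end = trips["starttime"].max().normalize() if end is None else pd.Timestamp(end)
    all_dates = pd.date_range(start, end, freq="D")

    station_trips = trips[trips["from_station_id"] == station_id]
    daily = station_trips.groupby(station_trips["starttime"].dt.normalize()).size()

    # Dates missing from the grouped result are the zero-rental dates
    daily = daily.reindex(all_dates, fill_value=0)

    day_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]
    return (
        daily.groupby(daily.index.day_name())
        .agg(mean="mean", p95=p95)
        .reindex(day_order)
        .rename_axis("day_of_week")
        .reset_index()
    )


# Toy data: the dataset spans 2024-01-01 to 2024-01-14 (two of each weekday).
# Station "A" has rentals on only three of those fourteen dates.
trips = pd.DataFrame({
    "starttime": pd.to_datetime([
        "2024-01-01 08:00", "2024-01-01 09:00",
        "2024-01-01 17:00", "2024-01-01 18:00",
        "2024-01-02 08:00", "2024-01-09 08:00",
        "2024-01-03 12:00", "2024-01-14 12:00",
    ]),
    "from_station_id": ["A", "A", "A", "A", "A", "A", "B", "B"],
})

summary = station_dow_summary(trips, "A")
print(summary)
```

<p>In the toy data, station A's two Mondays have 4 and 0 rentals, so the Monday mean is 2 rather than the 4 that would result from dropping the zero date. Whether the range should span the whole dataset or only the period in which the station was in service is a separate judgment call.</p>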
<p>Indeed, Claude generated the same code logic as in the previous example and produced the summary statistics shown in the left panel of Figure&nbsp;3. They look plausible. They are wrong. Not by a lot, but they are wrong; see the right panel for the correct values.</p>
<div id="fig-ch06_comparison" class="quarto-layout-panel">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-ch06_comparison-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<div class="quarto-layout-row">
<div class="quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/caution_llm/images/output_station_ch06_summary_claude.png" class="img-fluid figure-img"></p>
<figcaption>Summary statistics per Claude code</figcaption>
</figure>
</div>
</div>
<div class="quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/caution_llm/images/output_station_ch06_summary_correct.png" class="img-fluid figure-img"></p>
<figcaption>Summary statistics per correct code</figcaption>
</figure>
</div>
</div>
</div>
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-ch06_comparison-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;3: Station CH-06
</figcaption>
</figure>
</div>
<p>While the statistics do not differ much for station CH-06, this is not the case for other stations. For example, Figure&nbsp;4 shows the statistical summaries for station UD-02. The impact of zero volume days is quite large.</p>
<div id="fig-ud02_comparison" class="quarto-layout-panel">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-ud02_comparison-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<div class="quarto-layout-row">
<div class="quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/caution_llm/images/output_station_ud02_summary_claude.png" class="img-fluid figure-img"></p>
<figcaption>Summary statistics per Claude code</figcaption>
</figure>
</div>
</div>
<div class="quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/caution_llm/images/output_station_ud02_summary_correct.png" class="img-fluid figure-img"></p>
<figcaption>Summary statistics per correct code</figcaption>
</figure>
</div>
</div>
</div>
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-ud02_comparison-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;4: Station UD-02
</figcaption>
</figure>
</div>
<p>This raises a related question. Are the zero volume days scattered throughout the range of dates in the dataset or was the station offline for some period of time during this date range? If it was offline for a period, we might want to adjust the date range upon which to base our analysis. You can see the distribution of zero days in Figure&nbsp;5 for station CH-06 and in Figure&nbsp;6 for station UW-11.</p>
<div id="fig-ch06_zero_days" class="quarto-figure quarto-figure-center quarto-float anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-ch06_zero_days-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://bitsofanalytics.org/posts/caution_llm/images/zero_days_CH06.png" class="img-fluid figure-img">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-ch06_zero_days-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;5: Rentals by Date for CH-06
</figcaption>
</figure>
</div>
<p>A look at the same plot for station UW-11 reveals a different story.</p>
<div id="fig-uw11_zero_days" class="quarto-figure quarto-figure-center quarto-float anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-uw11_zero_days-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://bitsofanalytics.org/posts/caution_llm/images/zero_days_UW11.png" class="img-fluid figure-img">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-uw11_zero_days-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;6: Rentals by Date for UW-11
</figcaption>
</figure>
</div>
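<p>The scattered-versus-offline question can also be checked programmatically by looking for runs of consecutive zero days. This is a rough sketch; the helper name <code>zero_day_runs</code> and the toy counts are hypothetical, and the input is assumed to be a series of daily counts that already includes the zero days.</p>

```python
import pandas as pd


def zero_day_runs(daily_counts):
    """Return one row per run of consecutive zero-rental dates.

    `daily_counts` is assumed to be a Series of rentals per date, indexed by a
    complete daily DatetimeIndex (zero days already filled in).
    """
    is_zero = daily_counts.eq(0)
    # A new run starts wherever the zero/non-zero flag changes value
    run_id = is_zero.ne(is_zero.shift()).cumsum()

    zero_frame = pd.DataFrame({
        "date": daily_counts.index[is_zero.to_numpy()],
        "run": run_id[is_zero].to_numpy(),
    })
    return (
        zero_frame.groupby("run")["date"]
        .agg(start="min", end="max", n_days="count")
        .reset_index(drop=True)
    )


# Toy daily counts for ten dates, with isolated zeros and one four-day gap
dates = pd.date_range("2024-01-01", periods=10, freq="D")
daily_counts = pd.Series([3, 0, 2, 0, 0, 0, 0, 1, 0, 2], index=dates)

runs = zero_day_runs(daily_counts)
print(runs)
```

<p>Many short runs suggest ordinary low-demand days, while one long run suggests the station was out of service and that the analysis date range may need to be adjusted.</p>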
<p>Care must be taken to specify exactly how zero volume days should be treated when computing volume statistics. So, in the case of using tools like Claude, is there a prompting problem? In the absence of being told to do anything different, Claude defaulted to using only the dates that appear in the data. Just as the onus would be on the analyst to take zero volume days into account when doing the analysis themselves, they are equally responsible for making sure that tools like Claude do the analysis properly. While the act of writing code and working through the analysis forces the analyst to think about all of the steps required, using an LLM might lead to cognitive offloading and to subtle data analysis issues being overlooked in subsequent prompting and analysis. It is argued in <span class="citation" data-cites="oakleyMemoryParadoxWhy2025">Oakley et al. (2025)</span> that such cognitive offloading “undermines reasoning, impedes learning, and diminishes productivity”.</p>
<p>Does the act of iteratively building up analysis code lead to more thinking and better analysis? If one can bypass this thinking and offload to an LLM, are we getting less nuanced analysis? Anyone who analyzes data for a living knows that data is messy and that there are dragons hidden in it. Data analysts are often the best people to find these dragons and either slay them or alert others who can. LLMs may circumvent this important exploratory phase of the data analysis process.</p>
<p>How did I know to ask about zero volume days? I had graphed volume over time for the entire system and individual stations. I knew Python and pandas and SQL very well and knew that an aggregate query counting rentals by date was not going to have rows for those dates with zero volume. And, I realized that this would lead to statistic inflation because I had done so many similar analyses over the years and it is just common analytical sense that if the denominator is artificially small, the ratio is artificially large. Unfortunately, I didn’t have enough common sense to tell Claude about the missing zeros to see what it would do. I would rectify this oversight in July of 2025.</p>
</section>
<section id="july-2025" class="level2">
<h2 class="anchored" data-anchor-id="july-2025">July 2025</h2>
<p>After Claude 4 was released, I wanted to see if it was any better in dealing with this zero days problem. This time I structured the prompt a little differently to make it easy to test the results. I also decided to submit the same prompt to Gemini and ChatGPT.</p>
<div class="callout callout-style-default callout-tip no-icon callout-titled" title="Prompt">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Prompt
</div>
</div>
<div class="callout-body-container callout-body">
<p>You have a pandas dataframe named trips. Each row is a bike rental from a bike share system. The fields starttime and stoptime are datetime values specifying when the bike is rented and returned. The fields from_station_id and to_station_id are strings indicating the bike rental stations that the bike was rented from and returned to. Using Python, write a function that will compute the mean and 95th percentile of the number of bike rentals by day of week for a specific station that is specified by the user. The user will pass in to the function the dataframe containing the rental data as well as a string indicating which station to compute the statistics for. The function should be called bike_summary_dow_station. The function should return a dataframe with three columns. The first column should be a string representing the day of the week. The second column should be called mean and should contain the mean number of bikes rented for that day of the week. The third column should be named p95 and contain the 95th percentile of the number of bikes rented by day of week.</p>
</div>
</div>
<p>Claude 4 responded with</p>
<div class="callout callout-style-default callout-tip no-icon callout-titled" title="Claude response">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Claude response
</div>
</div>
<div class="callout-body-container callout-body">
<p>This function does the following:</p>
<ol type="1">
<li><strong>Filters the data</strong> to only include trips that start from the specified station using the <code>from_station_id</code> field</li>
<li><strong>Extracts date and day of week</strong> from the <code>starttime</code> column</li>
<li><strong>Groups by date and day of week</strong> to count the number of rentals per day</li>
<li><strong>Calculates statistics</strong> by grouping by day of week and computing the mean and 95th percentile of daily rental counts</li>
<li><strong>Orders the results</strong> so days of the week appear in the standard Monday-Sunday order</li>
<li><strong>Returns a DataFrame</strong> with the requested three columns: <code>day_of_week</code>, <code>mean</code>, and <code>p95</code></li>
</ol>
<p>The function handles the key aspects of your requirements:</p>
<ul>
<li>Uses the <code>starttime</code> field to determine when bikes were rented</li>
<li>Filters by the <code>from_station_id</code> to focus on rentals from the specified station</li>
<li>Computes both mean and 95th percentile statistics</li>
<li>Returns results organized by day of week in a clean format</li>
</ul>
<p>To use this function, you would call it like:</p>
<div class="sourceCode" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb14-1">result <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> bike_summary_dow_station(trips, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"station_123"</span>)</span></code></pre></div>
</div>
</div>
<div class="callout callout-style-default callout-tip no-icon callout-titled" title="Claude code response">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Claude code response
</div>
</div>
<div class="callout-body-container callout-body">
<div class="sourceCode" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb15-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pandas <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> pd</span>
<span id="cb15-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> np</span>
<span id="cb15-3"></span>
<span id="cb15-4"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> bike_summary_dow_station(trips, station_id):</span>
<span id="cb15-5">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">"""</span></span>
<span id="cb15-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    Compute mean and 95th percentile of bike rentals by day of week for a specific station.</span></span>
<span id="cb15-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    </span></span>
<span id="cb15-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    Parameters:</span></span>
<span id="cb15-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    trips (pd.DataFrame): DataFrame with bike rental data containing columns:</span></span>
<span id="cb15-10"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">                         - starttime: datetime of bike rental</span></span>
<span id="cb15-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">                         - from_station_id: string ID of rental station</span></span>
<span id="cb15-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    station_id (str): Station ID to compute statistics for</span></span>
<span id="cb15-13"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    </span></span>
<span id="cb15-14"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    Returns:</span></span>
<span id="cb15-15"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    pd.DataFrame: DataFrame with columns:</span></span>
<span id="cb15-16"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">                 - day_of_week: string representing day of week</span></span>
<span id="cb15-17"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">                 - mean: mean number of bikes rented for that day</span></span>
<span id="cb15-18"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">                 - p95: 95th percentile of bikes rented for that day</span></span>
<span id="cb15-19"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    """</span></span>
<span id="cb15-20">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Filter trips for the specified station</span></span>
<span id="cb15-21">    station_trips <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> trips[trips[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'from_station_id'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> station_id].copy()</span>
<span id="cb15-22">    </span>
<span id="cb15-23">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Extract date and day of week from starttime</span></span>
<span id="cb15-24">    station_trips[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'date'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> station_trips[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'starttime'</span>].dt.date</span>
<span id="cb15-25">    station_trips[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> station_trips[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'starttime'</span>].dt.day_name()</span>
<span id="cb15-26">    </span>
<span id="cb15-27">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Count rentals per day</span></span>
<span id="cb15-28">    daily_counts <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> station_trips.groupby([<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'date'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>]).size().reset_index(name<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'rental_count'</span>)</span>
<span id="cb15-29">    </span>
<span id="cb15-30">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Calculate mean and 95th percentile by day of week</span></span>
<span id="cb15-31">    summary <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> daily_counts.groupby(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>)[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'rental_count'</span>].agg([</span>
<span id="cb15-32">        (<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'mean'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'mean'</span>),</span>
<span id="cb15-33">        (<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'p95'</span>, <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span> x: np.percentile(x, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">95</span>))</span>
<span id="cb15-34">    ]).reset_index()</span>
<span id="cb15-35">    </span>
<span id="cb15-36">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Ensure proper day of week ordering (Monday first)</span></span>
<span id="cb15-37">    day_order <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Monday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Tuesday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Wednesday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Thursday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Friday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Saturday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Sunday'</span>]</span>
<span id="cb15-38">    summary[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.Categorical(summary[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>], categories<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>day_order, ordered<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb15-39">    summary <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> summary.sort_values(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>).reset_index(drop<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb15-40">    </span>
<span id="cb15-41">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> summary</span></code></pre></div>
</div>
</div>
<p>Again, the zero days were ignored and this code will lead to inflated statistics whenever there are zero days for the specified station. This time I followed up with a question.</p>
<div class="callout callout-style-default callout-tip no-icon callout-titled" title="Prompt">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Prompt
</div>
</div>
<div class="callout-body-container callout-body">
<p>Thank you. I think you forgot to consider the possibility of there being zero rentals on some days from the specified station. Please modify the code to account for this possibility.</p>
</div>
</div>
<p>Claude responded:</p>
<div class="callout callout-style-default callout-tip no-icon callout-titled" title="Claude response">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Claude response
</div>
</div>
<div class="callout-body-container callout-body">
<p>You’re absolutely right! If there are days with zero rentals for a station, those days won’t appear in the grouped data, which would skew the statistics. Let me modify the function to account for this. Now the function properly handles days with zero rentals by:</p>
<ol type="1">
<li><strong>Creating a complete date range</strong> from the minimum to maximum date in the entire dataset</li>
<li><strong>Generating all date/day-of-week combinations</strong> so we have every single day represented</li>
<li><strong>Merging the actual rental counts</strong> with this complete date framework using a left join</li>
<li><strong>Filling missing rental counts with 0</strong> for days when the station had no rentals</li>
</ol>
<p>This ensures that days with zero rentals are properly included in the mean and percentile calculations, giving you more accurate statistics that reflect the true distribution of daily rental activity for each station.</p>
</div>
</div>
<p>Claude included this code:</p>
<div class="callout callout-style-default callout-tip no-icon callout-titled" title="Claude code response">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Claude code response
</div>
</div>
<div class="callout-body-container callout-body">
<div class="sourceCode" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb16-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pandas <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> pd</span>
<span id="cb16-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> np</span>
<span id="cb16-3"></span>
<span id="cb16-4"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> bike_summary_dow_station(trips, station_id):</span>
<span id="cb16-5">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">"""</span></span>
<span id="cb16-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    Compute mean and 95th percentile of bike rentals by day of week for a specific station.</span></span>
<span id="cb16-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    </span></span>
<span id="cb16-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    Parameters:</span></span>
<span id="cb16-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    trips (pd.DataFrame): DataFrame with bike rental data containing columns:</span></span>
<span id="cb16-10"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">                         - starttime: datetime of bike rental</span></span>
<span id="cb16-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">                         - from_station_id: string ID of rental station</span></span>
<span id="cb16-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    station_id (str): Station ID to compute statistics for</span></span>
<span id="cb16-13"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    </span></span>
<span id="cb16-14"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    Returns:</span></span>
<span id="cb16-15"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    pd.DataFrame: DataFrame with columns:</span></span>
<span id="cb16-16"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">                 - day_of_week: string representing day of week</span></span>
<span id="cb16-17"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">                 - mean: mean number of bikes rented for that day</span></span>
<span id="cb16-18"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">                 - p95: 95th percentile of bikes rented for that day</span></span>
<span id="cb16-19"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    """</span></span>
<span id="cb16-20">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Filter trips for the specified station</span></span>
<span id="cb16-21">    station_trips <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> trips[trips[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'from_station_id'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> station_id].copy()</span>
<span id="cb16-22">    </span>
<span id="cb16-23">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Extract date and day of week from starttime</span></span>
<span id="cb16-24">    station_trips[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'date'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> station_trips[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'starttime'</span>].dt.date</span>
<span id="cb16-25">    station_trips[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> station_trips[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'starttime'</span>].dt.day_name()</span>
<span id="cb16-26">    </span>
<span id="cb16-27">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Get the date range from the entire dataset</span></span>
<span id="cb16-28">    all_dates <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.date_range(start<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>trips[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'starttime'</span>].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">min</span>().date(), </span>
<span id="cb16-29">                             end<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>trips[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'starttime'</span>].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">max</span>().date(), </span>
<span id="cb16-30">                             freq<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'D'</span>)</span>
<span id="cb16-31">    </span>
<span id="cb16-32">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create a complete date/day_of_week combination</span></span>
<span id="cb16-33">    complete_dates <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.DataFrame({</span>
<span id="cb16-34">        <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'date'</span>: all_dates.date,</span>
<span id="cb16-35">        <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>: all_dates.day_name()</span>
<span id="cb16-36">    })</span>
<span id="cb16-37">    </span>
<span id="cb16-38">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Count rentals per day for the station</span></span>
<span id="cb16-39">    daily_counts <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> station_trips.groupby([<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'date'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>]).size().reset_index(name<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'rental_count'</span>)</span>
<span id="cb16-40">    </span>
<span id="cb16-41">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Merge with complete dates to include days with zero rentals</span></span>
<span id="cb16-42">    complete_daily_counts <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> complete_dates.merge(daily_counts, on<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'date'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>], how<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'left'</span>)</span>
<span id="cb16-43">    complete_daily_counts[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'rental_count'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> complete_daily_counts[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'rental_count'</span>].fillna(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)</span>
<span id="cb16-44">    </span>
<span id="cb16-45">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Calculate mean and 95th percentile by day of week</span></span>
<span id="cb16-46">    summary <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> complete_daily_counts.groupby(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>)[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'rental_count'</span>].agg([</span>
<span id="cb16-47">        (<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'mean'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'mean'</span>),</span>
<span id="cb16-48">        (<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'p95'</span>, <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span> x: np.percentile(x, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">95</span>))</span>
<span id="cb16-49">    ]).reset_index()</span>
<span id="cb16-50">    </span>
<span id="cb16-51">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Ensure proper day of week ordering (Monday first)</span></span>
<span id="cb16-52">    day_order <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Monday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Tuesday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Wednesday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Thursday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Friday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Saturday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Sunday'</span>]</span>
<span id="cb16-53">    summary[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.Categorical(summary[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>], categories<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>day_order, ordered<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb16-54">    summary <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> summary.sort_values(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>).reset_index(drop<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb16-55">    </span>
<span id="cb16-56">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> summary</span></code></pre></div>
</div>
</div>
<p>I followed up with:</p>
<div class="callout callout-style-default callout-tip no-icon callout-titled" title="Prompt">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Prompt
</div>
</div>
<div class="callout-body-container callout-body">
<p>Why do you think you overlooked the possibility of zero rental days in your first response?</p>
</div>
</div>
<p>Claude said:</p>
<div class="callout callout-style-default callout-tip no-icon callout-titled" title="Claude response">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Claude response
</div>
</div>
<div class="callout-body-container callout-body">
<p>Looking back, I think I overlooked it because I was focused on the mechanics of the aggregation and grouping operations without fully thinking through the data characteristics. When working with real-world data like bike rentals, it’s quite common to have days with zero activity - especially for individual stations that might be in less busy areas, during bad weather, or on certain days of the week.</p>
<p>My initial approach was more of a “happy path” solution where I assumed the grouped data would naturally represent all the relevant days. I was thinking about the pandas operations (groupby, agg) rather than considering the business logic of what the data actually represents.</p>
<p>It’s a good reminder that when working with count-based or frequency data, you often need to explicitly account for the “absence” of events (zero counts) since they don’t naturally appear in grouped datasets. This is particularly important for time-series analysis where missing periods can significantly skew statistical measures.</p>
<p>Thanks for catching that - it’s exactly the kind of edge case that can lead to misleading results in real analytics work.</p>
</div>
</div>
<p>Judging from its response, Claude definitely “understands” the issue of zero rental days. What I find striking is how its response is structured as if a human analyst had written it. Claude is an LLM, yet it says things like “I was focused”, “without fully thinking through the data characteristics”, “I assumed”, and “I was thinking”. The user is nudged to think of Claude as a human-like entity using the same analytical processes a human might use. Yet, deep down, at the architecture level, lies an enormous neural network used to do very sophisticated and probabilistic pattern matching. Wouldn’t the human be better served if Claude admitted that it was the additional mention of the possibility of zero rental days in the follow-up prompt that led to inputs that produced the correct outputs? Of course, Claude can’t “admit” anything. It’s not a reflective thinking being. The more analysts remember this, the better the chance that they will use it cautiously and reflectively themselves.</p>
<p>Then I submitted the same prompts to ChatGPT and Google Gemini (2.5 Flash). Both made the same error as Claude, with almost identical code. Once prompted about the possibility of zero rental days, they were able to explain the implications in language very similar to Claude’s and to generate correct code. Their answers to my question about why they made the error were also very similar to Claude’s.</p>
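<p>To make the size of this error concrete, here is a minimal sketch on made-up data. The toy <code>trips</code> table, the single station "A", and the variable names are all hypothetical and are not part of the homework dataset. Because there is only one station, the date range is taken from that station's own rentals; with real data it should come from the entire dataset, as in the corrected code above.</p>

```python
import pandas as pd

# Hypothetical toy data: one station over two weeks (2024-01-01 is a Monday).
# Four rentals on the first Monday, none on the second Monday.
trips = pd.DataFrame({
    "starttime": pd.to_datetime([
        "2024-01-01 08:00", "2024-01-01 09:00",
        "2024-01-01 17:00", "2024-01-01 18:00",
        "2024-01-14 12:00",  # a Sunday rental, so the data spans both Mondays
    ]),
    "from_station_id": ["A"] * 5,
})
trips["date"] = trips["starttime"].dt.normalize()

# Naive approach: only dates with at least one rental survive the groupby
daily = trips.groupby("date").size()
naive_mean = daily.groupby(daily.index.day_name()).mean()

# Zero-aware approach: reindex onto every calendar day, filling gaps with 0
all_days = pd.date_range(daily.index.min(), daily.index.max(), freq="D")
daily_full = daily.reindex(all_days, fill_value=0)
full_mean = daily_full.groupby(daily_full.index.day_name()).mean()

print(naive_mean["Monday"], full_mean["Monday"])
```

<p>In this toy case the naive Monday mean is 4.0, while the zero-aware Monday mean is 2.0, because the second Monday with no rentals is now counted. The naive version overstates the average by a factor of two.</p>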
</section>
<section id="so-how-should-students-use-llms" class="level2">
<h2 class="anchored" data-anchor-id="so-how-should-students-use-llms">So, how should students use LLMs?</h2>
<p>After one of these coding exchanges with Claude, I expressed my concern about how students are likely to use these tools and asked for Claude’s reaction and suggestions.</p>
<div class="callout callout-style-default callout-tip no-icon callout-titled" title="Prompt">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Prompt
</div>
</div>
<div class="callout-body-container callout-body">
<p>Thank you. I have one final question for now. What would you tell a student who is just beginning to learn Python to do analytics work regarding the use of tools like Claude?</p>
</div>
</div>
<div class="callout callout-style-default callout-tip no-icon callout-titled" title="Claude response">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Claude response
</div>
</div>
<div class="callout-body-container callout-body">
<p>That’s such an important question, and I think about this a lot given how AI tools are changing the learning landscape. Here’s what I’d tell a student just starting their Python analytics journey:</p>
<section id="embrace-ai-as-a-learning-accelerator-not-a-replacement" class="level3">
<h3 class="anchored" data-anchor-id="embrace-ai-as-a-learning-accelerator-not-a-replacement">Embrace AI as a Learning Accelerator, Not a Replacement</h3>
<p>Think of me like a really good tutor who’s available 24/7. I can help you understand concepts, debug code, and see examples - but the learning still needs to happen in your brain. The goal isn’t to have me write all your code, but to help you become a better programmer faster.</p>
</section>
<section id="use-ai-to-break-through-stuck-points" class="level3">
<h3 class="anchored" data-anchor-id="use-ai-to-break-through-stuck-points">Use AI to Break Through Stuck Points</h3>
<p>When you’re learning, getting stuck is incredibly frustrating and can kill motivation. Use AI tools to:</p>
<ul>
<li>Understand error messages that seem cryptic</li>
<li>Get unstuck when you know what you want to do but can’t figure out the syntax</li>
<li>See multiple approaches to the same problem</li>
<li>Get explanations of code you’re trying to understand</li>
</ul>
<p>This keeps you moving forward instead of giving up.</p>
</section>
<section id="always-try-first-then-ask" class="level3">
<h3 class="anchored" data-anchor-id="always-try-first-then-ask">Always Try First, Then Ask</h3>
<p>Before asking me to solve something, attempt it yourself. This might mean:</p>
<ul>
<li>Writing pseudocode for what you think should happen</li>
<li>Trying some code even if you’re not sure it’s right</li>
<li>Looking up basic documentation first</li>
</ul>
<p>Then when you ask me for help, you’ll understand my explanation much better because you’ve already engaged with the problem.</p>
</section>
<section id="ask-for-explanations-not-just-solutions" class="level3">
<h3 class="anchored" data-anchor-id="ask-for-explanations-not-just-solutions">Ask for Explanations, Not Just Solutions</h3>
<p>Instead of “Write code to analyze this data,” try:</p>
<ul>
<li>“I’m trying to group this data by category and calculate averages. Here’s what I tried… can you explain where I went wrong?”</li>
<li>“Can you show me two different ways to solve this problem and explain the trade-offs?”</li>
<li>“I don’t understand why this pandas function works the way it does. Can you break it down?”</li>
</ul>
</section>
<section id="build-your-foundation-systematically" class="level3">
<h3 class="anchored" data-anchor-id="build-your-foundation-systematically">Build Your Foundation Systematically</h3>
<p>AI tools are fantastic for specific problems, but they can’t replace systematic learning. Make sure you’re still:</p>
<ul>
<li>Working through structured courses or books</li>
<li>Practicing basic concepts until they’re automatic</li>
<li>Understanding core data structures and control flow</li>
<li>Learning to read documentation</li>
</ul>
</section>
<section id="the-real-skill-is-knowing-what-to-ask" class="level3">
<h3 class="anchored" data-anchor-id="the-real-skill-is-knowing-what-to-ask">The Real Skill is Knowing What to Ask</h3>
<p>As you progress, you’ll realize that the most valuable skill is knowing how to break down complex problems and ask the right questions. AI tools actually help you develop this because you quickly learn that vague questions get mediocre answers.</p>
</section>
<section id="dont-skip-the-struggle-but-dont-suffer-needlessly" class="level3">
<h3 class="anchored" data-anchor-id="dont-skip-the-struggle-but-dont-suffer-needlessly">Don’t Skip the Struggle (But Don’t Suffer Needlessly)</h3>
<p>Some struggle is essential for learning - it’s how your brain builds neural pathways. But there’s a difference between productive struggle (working through logical steps) and unproductive suffering (being stuck on syntax for hours). Use AI to minimize the latter while preserving the former.</p>
<p>The students I see succeeding with AI tools are those who use them to learn faster and explore more, not to avoid learning altogether. You’re entering an exciting time to learn programming - you have access to incredibly powerful learning tools that can accelerate your growth if you use them thoughtfully.</p>
<p>What specific aspects of learning Python analytics are you most excited or nervous about?</p>
</section>
</div>
</div>
<div class="callout callout-style-default callout-tip no-icon callout-titled" title="Prompt">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Prompt
</div>
</div>
<div class="callout-body-container callout-body">
<p>I’m particularly worried about my students using Claude in suboptimal ways such as simply asking it to write the code for a homework problem. I worry that if this becomes a habit, they will struggle mightily when they are trying to solve complex proprietary problems in industry. How will they find subtle bugs? How will they explain their code to their coworkers or clients? How will they develop the skills to “ask the right questions” if they take the easy route in their university classes and just ask Claude for the answers?</p>
</div>
</div>
<div class="callout callout-style-default callout-tip no-icon callout-titled" title="Claude response">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Claude response
</div>
</div>
<div class="callout-body-container callout-body">
<p>Your concerns are absolutely valid and reflect a critical challenge in education today. You’re right to worry - there’s a real risk that students who rely on AI as a “magic answer box” will hit a wall when they encounter complex, novel problems in industry.</p>
<section id="the-real-dangers-youve-identified" class="level2">
<h2 class="anchored" data-anchor-id="the-real-dangers-youve-identified">The Real Dangers You’ve Identified</h2>
<p><strong>The Debugging Crisis</strong>: Students who haven’t learned to trace through code line-by-line, understand variable states, and systematically isolate problems will be helpless when Claude gives them code that’s 90% right but has a subtle logical error. Industry problems often involve edge cases and domain-specific constraints that generic AI solutions miss.</p>
<p><strong>The Communication Gap</strong>: If students can’t explain their approach because they didn’t develop it themselves, they’ll struggle in code reviews, client presentations, and collaborative debugging sessions. “Claude wrote it” isn’t an acceptable explanation to a stakeholder.</p>
<p><strong>The Problem Decomposition Deficit</strong>: The most valuable skill in analytics isn’t coding syntax - it’s breaking down messy, ambiguous business problems into tractable technical steps. Students who skip this mental muscle-building will struggle when faced with problems that don’t fit standard templates.</p>
</section>
<section id="pedagogical-strategies-to-consider" class="level2">
<h2 class="anchored" data-anchor-id="pedagogical-strategies-to-consider">Pedagogical Strategies to Consider</h2>
<p><strong>Require Process Documentation</strong>: Ask students to submit not just code, but their problem-solving process. “What did you try first? What didn’t work? How did you modify your approach?” This makes copy-pasting much harder.</p>
<p><strong>Focus on Code Reading and Debugging</strong>: Give students broken code to fix, or code to extend. These skills transfer directly to industry and are harder to outsource to AI.</p>
<p><strong>Emphasize Data Understanding</strong>: Require students to explore and describe their datasets before any analysis. AI can’t replace domain knowledge and data intuition.</p>
<p><strong>Two-Stage Assignments</strong>: First, require a written analysis plan without code. Then allow coding with AI assistance. This separates problem-solving from implementation.</p>
</section>
<section id="the-bigger-picture-challenge" class="level2">
<h2 class="anchored" data-anchor-id="the-bigger-picture-challenge">The Bigger Picture Challenge</h2>
<p>You’re grappling with a fundamental shift: in industry, the most valuable analysts aren’t necessarily the fastest coders, but those who can:</p>
<ul>
<li>Ask the right business questions</li>
<li>Spot when results don’t make sense</li>
<li>Communicate insights to non-technical stakeholders</li>
<li>Design robust, maintainable analyses</li>
</ul>
<p>Students who use AI as a shortcut miss developing these meta-skills.</p>
</section>
<section id="a-possible-middle-path" class="level2">
<h2 class="anchored" data-anchor-id="a-possible-middle-path">A Possible Middle Path</h2>
<p>Perhaps the goal isn’t to prevent AI use, but to structure it productively. For example:</p>
<ul>
<li>“Use Claude to help debug, but first explain what you think the bug is”</li>
<li>“Get AI help with syntax, but write the analysis plan yourself”</li>
<li>“Use AI to generate multiple approaches, then justify which you chose and why”</li>
</ul>
<p>Your instincts are spot-on. Students who develop deep problem-solving skills alongside AI fluency will be incredibly powerful. Those who become dependent on AI for thinking will struggle when they encounter the messy, ambiguous problems that define real analytics work.</p>
</section>
</div>
</div>
<p>There’s a lot of sensible advice in there. I’ll certainly share this with my students. Then it’s up to them.</p>
</section>
<section id="thoughts" class="level2">
<h2 class="anchored" data-anchor-id="thoughts">Thoughts</h2>
<p>Why does this one particular example bother me so much and compel me to periodically revisit it? It’s a very common business analysis problem and not a very difficult one, yet after two years of continued LLM development and untold billions spent, the top LLMs still get it wrong. As I write this on August 8, I just tried it again with GPT-5. Still wrong. These LLMs are ever confident and I can easily see a novice analyst uncritically accepting their answer to this problem. But what really bothers me is the nagging suspicion that many students are going to use these tools to blast through homework assignments while doing little of the work needed to actually learn much of anything. The call of the homework machine will prove much too tempting for many to resist. Use will be easy to rationalize by appealing to the argument that use of LLMs in industry will be required.</p>
<p>It will be extremely difficult for educators to convince students that it is worth their time to forgo LLM use until they have developed significant competency and familiarity with the tools and concepts being learned. I will definitely be sharing Claude’s advice to students from the previous section. Will many follow the advice? Who knows. In the past, I’ve made similar pleas to my students. For example:</p>
<div class="callout callout-style-default callout-tip no-icon callout-titled" title="Message to my students">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Message to my students
</div>
</div>
<div class="callout-body-container callout-body">
<p>Tools like ChatGPT can be extremely powerful in the hands of a skilled analyst or developer. However, used by a novice, they can be a recipe for disaster. Not only will you not learn R and Python, you may submit work in a professional environment that is of low quality, uses outdated methods or libraries, is buggy, is blatantly wrong, or even leads to security concerns. And of course, as a novice, you won’t even know it or know what to fix or how to fix it. That sounds fun.</p>
<p>Furthermore, while you may use ChatGPT to get through your courses with reasonable grades, employers don’t care about your grades from some mid-tier university B-school. Everyone knows that grades are inflated and you’ll be competing against applicants from better schools and more technical degree programs. You will be forced to show you actually know the material through difficult technical interviews. And of course, faculty will have no interest in serving as a professional reference for a student who ChatGPT’d their way through the course.</p>
<p>ChatGPT has its place, but code novices using it as a substitute for learning to code isn’t a great use.</p>
</div>
</div>
<p>and this:</p>
<div class="callout callout-style-default callout-tip no-icon callout-titled" title="Message to my students">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Message to my students
</div>
</div>
<div class="callout-body-container callout-body">
<p>Unintended consequences of Gen AI in tech hiring</p>
<p>The Pragmatic Engineer is probably the top software engineering newsletter on Substack. It is extremely good. Recently he released a newsletter discussing the impact of GenAI on tech hiring. You can read a good chunk of it for free from this link - https://newsletter.pragmaticengineer.com/p/how-genai-changes-tech-hiring.</p>
<p>A big takeaway is that use of GenAI to “cheat” during job interviews is rampant. It’s causing such a problem that many companies have to totally rethink how they do tech interviewing and hiring. People being people, one of the easiest things for hiring firms to do is put way less weight on the types of coding questions that can be gamed during interviews and fall back on the old “pedigree” criteria - top university for degrees and high quality professional experience.</p>
<p>So, if you come to a place like Oakland U for your degree and then proceed to ChatGPT your way through your classes, not only will you learn a lot less than you could have, you’ll also have a degree from a place that people are potentially biased against - that UM or MSU grad gets the nod because of where they went to school. Awesome, eh? So, now you have to work doubly hard to convince people you know your stuff. That’s much easier if you didn’t ChatGPT your way through your courses.</p>
<p>On a similar note, I’d like to share a little professional advice. I’ve gotten wayyyyyyy too many emails this year starting with the dreaded “I hope this email finds you well”. That opening reached meme status quite a while ago. ChatGPT was/is notorious for starting its generated emails with that greeting. People everywhere are cringing every time they get such an email. Even if your email wasn’t written with ChatGPT, the recipient has a reason to believe it was or that you haven’t figured out that that greeting has worn out its welcome. I encourage you to find a more authentic way to begin your emails. Again, unintended consequences of GenAI in the tech world.</p>
</div>
</div>
<p>Brian Klaas, in his essay <a href="https://substack.com/home/post/p-165929984">The Death of the Student Essay—and the Future of Cognition</a> makes the case far better than I could ever hope to that uncritical use of LLMs will likely fundamentally affect humanity’s ability to think clearly. He pleads with his students.</p>
<blockquote class="blockquote">
<p>Aware of my inability to detect AI use with certainty, I implored the students: please don’t use AI. It’s terrible for you. It’s terrible for me, a dystopian experience of spending weeks giving detailed constructive feedback to a machine. I strongly suspect—but can’t prove—that many didn’t listen.</p>
</blockquote>
<p>In another <a href="https://www.jonbergmann.com/blog/OpenLetterToMyStudents">letter to his students</a>, the author likens expecting to learn while using LLMs for coursework to expecting to get stronger by taking a forklift into the weight room. Research is starting to suggest that cognitive offloading to LLMs is eroding students’ critical thinking abilities (<span class="citation" data-cites="gerlichAIToolsSociety2025a">Gerlich (2025)</span>, <span class="citation" data-cites="kosmynaYourBrainChatGPT2025">Kosmyna et al. (2025)</span>, <span class="citation" data-cites="oakleyMemoryParadoxWhy2025">Oakley et al. (2025)</span>). No surprise there. Ask young adults to navigate a city or suburb without Google Maps and prepare yourself for a wide-eyed plea to “be serious”.</p>
<section id="more-concerns" class="level3">
<h3 class="anchored" data-anchor-id="more-concerns">More concerns</h3>
<p>As companies begin to reduce entry level analyst hiring thinking that AI is a substitute, who exactly will be the one to catch subtle errors such as these? How will senior analysts be developed from a shrinking pipeline of entry level people? When decreased entry level hiring leads to drops in enrollment in the degree programs that fed these job pipelines, what will longer term talent prospects look like? Senior analysts and devs will move into management and eventually retire. Who will take their place?</p>
<p>Much like our modern society riding the cresting wave of cheap energy due to a one-time exploitation of fossil fuels, companies are cutting labor costs by tapping the energy jolt that comes from exploiting the creative work of countless people, accumulated over many years. The AI giants are already desperate for new “high quality” data to train their models. The once rich reservoirs of the internet are now being filled with streams of AI generated content of questionable quality and utility. The <a href="https://en.wikipedia.org/wiki/Enshittification">enshittification</a> of the internet is real and AI slop is everywhere.</p>
<p>Relying on an LLM can make your business processes less resilient. What happens when the model goes down, or degrades, or goes out of business, or a new version behaves significantly differently? Yesterday, GPT-5 was released and Reddit is full of people complaining about it and especially its “router” that automatically determines which GPT submodel to use. Redditors mention carefully crafted workflows that are now broken due to the change in the underlying LLM powering parts of them. As agents or automated pipelines call LLMs and send the output to dashboards, will subtle errors such as the zero days problem sneak through? Using LLMs is suboptimizing for efficiency and comes at the expense of resilience. We need more resilience and less optimization. Optimization to the point of fragility is asking for trouble. Agentic AI will just magnify this lack of resilience.</p>
<p>I enjoy reading StackOverflow posts as well as RingTFM. The back and forth in a good SO post was more valuable to me than just getting “the answer” from some LLM. Sure, sometimes people were a bit snarky, but it’s a pretty information dense place full of actual humans who donate their time to help others solve problems. Learning about why one method is better than another or just seeing how other humans are struggling with the same problem is something I value. It made me feel part of something larger. Perusing software documentation was a bit like browsing the stacks at a library, opening my eyes to related topics and approaches that might prove useful. It helped me grasp the big picture of how a certain software tool fit into the problem solving landscape. And, if I revisited an SO post or a documentation page, it might have some minor edits or changes, but it was essentially the same information. It’s not stochastic.</p>
<p>Sadly, SO traffic has decreased considerably and its future is questionable. As fewer humans contribute to SO, LLMs lose one of their key training assets and we humans lose the power and potential of a huge networked knowledge community. I don’t want one opinion on how to do something and I certainly don’t want to blindly just accept an LLM’s prescribed solution. There’s rarely one right way and the process of choosing among several approaches involves critical thinking that I feel is integral to becoming a better analyst and software developer.</p>
<p>Ultimately, LLMs profoundly change the problem solving process by obliterating large parts of it. It is as if society has decided that it’s perfectly fine to ask one so-called expert, who is known to make mistakes and will give you different answers every time you ask it a question, for the answers to problems we used to grapple with ourselves. Sure, we used search engines, computers, YouTube and talked to other people, but we didn’t ask some computerized oracle for the answer. There’s no process nor learning journey when one simply gets the answer from an LLM. It’s efficient, for sure. Unless, of course, it’s wrong and fixing the mess turns out to be a whole lot of work.</p>
<p>I still haven’t learned to play the ukulele that was gifted to me. The simple reason is that I don’t practice enough. When I do practice and make some progress, I feel a sense of accomplishment and of hope. If I just do the work, I can be better at something I care about and I’ll be honoring the gift. If I don’t do the work, I won’t get any better. If I’m content to just listen to others play the uke, then that will be ok. But, will it, really?</p>



</section>
</section>
</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-bibliography"><h2 class="anchored quarto-appendix-heading">References</h2><div id="refs" class="references csl-bib-body hanging-indent" data-entry-spacing="0">
<div id="ref-freundMinimizingMultimodularFunctions2022" class="csl-entry">
Freund, Daniel, Shane G. Henderson, and David B. Shmoys. 2022. <span>“Minimizing <span>Multimodular Functions</span> and <span>Allocating Capacity</span> in <span>Bike-Sharing Systems</span>.”</span> <em>Operations Research</em> 70 (5): 2715–31. <a href="https://doi.org/10.1287/opre.2022.2320">https://doi.org/10.1287/opre.2022.2320</a>.
</div>
<div id="ref-gerlichAIToolsSociety2025a" class="csl-entry">
Gerlich, Michael. 2025. <span>“<span>AI Tools</span> in <span>Society</span>: <span>Impacts</span> on <span>Cognitive Offloading</span> and the <span>Future</span> of <span>Critical Thinking</span>.”</span> SSRN Scholarly Paper. Rochester, NY: Social Science Research Network. <a href="https://doi.org/10.2139/ssrn.5082524">https://doi.org/10.2139/ssrn.5082524</a>.
</div>
<div id="ref-haider2018inventory" class="csl-entry">
Haider, Zulqarnain, Alexander Nikolaev, Jee Eun Kang, and Changhyun Kwon. 2018. <span>“Inventory Rebalancing Through Pricing in Public Bike Sharing Systems.”</span> <em>European Journal of Operational Research</em> 270 (1): 103–17.
</div>
<div id="ref-jianSimulationOptimizationLargescale2016" class="csl-entry">
Jian, Nanjing, Daniel Freund, Holly M. Wiberg, and Shane G. Henderson. 2016. <span>“Simulation Optimization for a Large-Scale Bike-Sharing System.”</span> In <em>2016 Winter Simulation Conference (<span>WSC</span>)</em>, 602–13. IEEE.
</div>
<div id="ref-kosmynaYourBrainChatGPT2025" class="csl-entry">
Kosmyna, Nataliya, Eugene Hauptmann, Ye Tong Yuan, Jessica Situ, Xian-Hao Liao, Ashly Vivian Beresnitzky, Iris Braunstein, and Pattie Maes. 2025. <span>“Your <span>Brain</span> on <span>ChatGPT</span>: <span>Accumulation</span> of <span>Cognitive Debt</span> When <span>Using</span> an <span>AI Assistant</span> for <span>Essay Writing Task</span>.”</span> arXiv. <a href="https://doi.org/10.48550/arXiv.2506.08872">https://doi.org/10.48550/arXiv.2506.08872</a>.
</div>
<div id="ref-oakleyMemoryParadoxWhy2025" class="csl-entry">
Oakley, Barbara, Michael Johnston, Kenzen Chen, Eulho Jung, and Terrence Sejnowski. 2025. <span>“The <span>Memory Paradox</span>: <span>Why Our Brains Need Knowledge</span> in an <span>Age</span> of <span>AI</span>.”</span> <em>Available at SSRN 5250447</em>.
</div>
<div id="ref-schuijbroek2017inventory" class="csl-entry">
Schuijbroek, Jasper, Robert C Hampshire, and W-J Van Hoeve. 2017. <span>“Inventory Rebalancing and Vehicle Routing in Bike Sharing Systems.”</span> <em>European Journal of Operational Research</em> 257 (3): 992–1004.
</div>
</div></section><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{isken2025,
  author = {Isken, Mark},
  title = {A Cautionary Tale for Using {LLMs} for Business Analytics},
  date = {2025-08-05},
  url = {https://bitsofanalytics.org//posts/caution_llm/caution_llm.html},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-isken2025" class="csl-entry quarto-appendix-citeas">
Isken, Mark. 2025. <span>“A Cautionary Tale for Using LLMs for Business
Analytics.”</span> August 5, 2025. <a href="https://bitsofanalytics.org//posts/caution_llm/caution_llm.html">https://bitsofanalytics.org//posts/caution_llm/caution_llm.html</a>.
</div></div></section></div> ]]></description>
  <category>python</category>
  <category>pandas</category>
  <category>bikeshare</category>
  <category>llm</category>
  <category>ai</category>
  <guid>https://bitsofanalytics.org/posts/caution_llm/caution_llm.html</guid>
  <pubDate>Tue, 05 Aug 2025 04:00:00 GMT</pubDate>
</item>
<item>
  <title>Analyzing NestWatch data with R</title>
  <dc:creator>Mark Isken</dc:creator>
  <link>https://bitsofanalytics.org/posts/nestwatch-data-analysis/analyzing_nestwatch_data_r.html</link>
  <description><![CDATA[ 




<section id="about-nestwatch" class="level2">
<h2 class="anchored" data-anchor-id="about-nestwatch">About NestWatch</h2>
<p>One of the programs run by the <a href="https://www.birds.cornell.edu/home/">Cornell Lab of Ornithology</a> is known as <a href="https://nestwatch.org/">NestWatch</a>.</p>
<blockquote class="blockquote">
<p>NestWatch is a nationwide nest-monitoring program designed to track status and trends in the reproductive biology of birds.</p>
</blockquote>
<p>Our local birding group has participated in this program for a number of years. One member has built numerous <a href="https://nestwatch.org/learn/all-about-birdhouses/birds/eastern-bluebird/">bluebird boxes</a> and our group has installed and monitored these nest boxes.</p>
<p>If you’re reading this, it’s pretty likely you are familiar with the NestWatch program and I’m not going to explain it in detail here. Some key resources for learning more include:</p>
<ul>
<li><a href="https://nestwatch.org/learn/how-to-nestwatch/">How to NestWatch</a></li>
<li><a href="https://nestwatch.org/learn/how-to-nestwatch/understanding-nestwatch-data/">Understanding NestWatch Data</a></li>
<li><a href="https://nestwatch.org/learn/how-to-nestwatch/nest-monitoring-protocol/">Nest Monitoring Protocol</a></li>
<li><a href="https://nestwatch.org/wp-content/uploads/2020/01/NestWatch_manual_20191106.pdf">Nest Monitoring Manual</a></li>
</ul>
</section>
<section id="downloading-nestwatch-data" class="level2">
<h2 class="anchored" data-anchor-id="downloading-nestwatch-data">Downloading NestWatch data</h2>
<p>Log in to the <a href="https://nestwatch.org/" class="uri">https://nestwatch.org/</a> site containing your NestWatch data. For this post, I’ll pretend our username is Sialia.</p>
<p>At the bottom of the main page, after logging in, you’ll find three sections for downloading data:</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/nestwatch-data-analysis/images/data_download_main.png" class="img-fluid figure-img"></p>
<figcaption>Download data main</figcaption>
</figure>
</div>
<section id="download-nest-site-descriptions" class="level3">
<h3 class="anchored" data-anchor-id="download-nest-site-descriptions">Download Nest Site Descriptions</h3>
<p>This is a table containing information about each individual nest box. You should re-download this table whenever new bird boxes are added. Since I’m using R for the data analysis, I download the CSV version of this table.</p>
<p>The downloaded file will be named <code>siteDescriptions-[username]-[yyyymmdd].csv</code>. The <code>[yyyymmdd]</code> part is the date you download the file. For all of these examples, pretend you are downloading the data on 2024-02-21.</p>
<p><strong>Example</strong>: <code>siteDescriptions-Sialia-20240221.csv</code></p>
<p>We will take a brief look at this file now just to get a sense of what’s in there.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(readr)</span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(janitor, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">warn.conflicts =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>)</span>
<span id="cb1-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(dplyr, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">warn.conflicts =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>)</span>
<span id="cb1-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(ggplot2)</span>
<span id="cb1-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(lubridate, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">warn.conflicts =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>)</span>
<span id="cb1-6"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyr)</span>
<span id="cb1-7"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(stringr)</span>
<span id="cb1-8"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(sf)</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 8.2.1; sf_use_s2() is TRUE</code></pre>
</div>
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(gt)</span></code></pre></div>
</div>
<div class="cell">
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1">raw_data_path <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"./data/raw"</span></span>
<span id="cb4-2">site_stub <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"siteDescriptions-Sialia-20240221"</span></span>
<span id="cb4-3"></span>
<span id="cb4-4">site_file <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">file.path</span>(raw_data_path, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(site_stub, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">".csv"</span>))</span>
<span id="cb4-5">site_df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_csv</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">file =</span> site_file, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">col_names =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>, </span>
<span id="cb4-6">                    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">comment =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'#'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">show_col_types =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>)</span>
<span id="cb4-7"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str</span>(site_df)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>spc_tbl_ [58 × 16] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ Site Name                : chr [1:58] "Kestrel Box" "BCNP07" "BCNP08" "BCNP09" ...
 $ Latitude                 : num [1:58] 42.8 42.7 42.7 42.7 42.7 ...
 $ Longitude                : num [1:58] -83.1 -83.2 -83.2 -83.2 -83.2 ...
 $ Substrate                : chr [1:58] "nest box / birdhouse" "nest box / birdhouse" "nest box / birdhouse" "nest box / birdhouse" ...
 $ Height Above Ground      : num [1:58] NA 5 5 5 5 5 5 5 12 12 ...
 $ Height Above Ground Units: logi [1:58] NA FALSE FALSE FALSE FALSE FALSE ...
 $ Entrance Diameter        : num [1:58] NA NA NA NA NA NA NA NA 3 3 ...
 $ Entrance Diameter Units  : chr [1:58] NA NA NA NA ...
 $ Entrance Orientation     : chr [1:58] NA "se" "se" "se" ...
 $ Site Elevation           : num [1:58] NA 948 948 948 948 950 950 957 908 934 ...
 $ Site Elevation Units     : logi [1:58] NA FALSE FALSE FALSE FALSE FALSE ...
 $ Habitat Info 1           : chr [1:58] NA "natural grassland and prairie" "natural grassland and prairie" "natural grassland and prairie" ...
 $ Habitat Info 2           : chr [1:58] NA "woodland/forest" "woodland/forest" "woodland/forest" ...
 $ Habitat Info 3           : logi [1:58] NA NA NA NA NA NA ...
 $ Nest Attempts            : num [1:58] NA 2 2 2 2 1 2 NA NA NA ...
 $ Comments                 : chr [1:58] "Kestrel box installed in 2021 by Tom K" NA NA NA ...
 - attr(*, "spec")=
  .. cols(
  ..   `Site Name` = col_character(),
  ..   Latitude = col_double(),
  ..   Longitude = col_double(),
  ..   Substrate = col_character(),
  ..   `Height Above Ground` = col_double(),
  ..   `Height Above Ground Units` = col_logical(),
  ..   `Entrance Diameter` = col_double(),
  ..   `Entrance Diameter Units` = col_character(),
  ..   `Entrance Orientation` = col_character(),
  ..   `Site Elevation` = col_double(),
  ..   `Site Elevation Units` = col_logical(),
  ..   `Habitat Info 1` = col_character(),
  ..   `Habitat Info 2` = col_character(),
  ..   `Habitat Info 3` = col_logical(),
  ..   `Nest Attempts` = col_double(),
  ..   Comments = col_character()
  .. )
 - attr(*, "problems")=&lt;externalptr&gt; </code></pre>
</div>
</div>
<p>Some of the column names include spaces, but otherwise this file is easy to read in. Well, it’s easy as long as you specify the <code>comment = '#'</code> argument. Rows 2-4 of the CSV file are comments.</p>
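<p>To see what the <code>comment = '#'</code> argument is doing, here is a minimal sketch using a made-up, three column miniature of the site descriptions file. The rows and comment text are hypothetical; only the layout (a header row followed by three comment rows) mirrors what is described above.</p>

```r
library(readr)

# Hypothetical miniature of a site descriptions download: a header row,
# three comment rows (rows 2-4), then the data rows.
csv_text <- paste(
  "Site Name,Latitude,Longitude",
  "# comment row one",
  "# comment row two",
  "# comment row three",
  "BCNP07,42.7,-83.2",
  "BCNP08,42.7,-83.2",
  sep = "\n"
)

# I() tells read_csv() to treat the string as literal data, not a file path
with_comment <- read_csv(I(csv_text), comment = "#", show_col_types = FALSE)

nrow(with_comment)   # only the two data rows remain
names(with_comment)  # header row is still used for the column names
```

<p>If you leave out the <code>comment</code> argument, the comment rows get treated as (badly formed) data rows instead of being skipped.</p>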
</section>
<section id="download-breeding-data" class="level3">
<h3 class="anchored" data-anchor-id="download-breeding-data">Download Breeding Data</h3>
<p>This is where we get the detailed breeding data. However, the download process is a bit confusing. First of all, there are two types of downloads:</p>
<ul>
<li>Summaries by site</li>
<li>Individual site visits</li>
</ul>
<p>You can also choose individual years or all years. Let’s look at each combination of choices.</p>
<section id="summaries-by-site---all-years" class="level4">
<h4 class="anchored" data-anchor-id="summaries-by-site---all-years">Summaries by site - All years</h4>
<p>The downloaded file is named <code>breedingSummary-Sialia-20240221.csv</code>.</p>
<p>That’s unfortunate as there is no indication that this is a summary by site nor which years are included.</p>
<p>That’s <strong>not</strong> the worst problem. There is no field indicating the year. For records containing at least one filled out date, we can infer the year. However, there might be records without any dates - a nesting attempt that resulted in no eggs. That means we need to download each year individually.</p>
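<p>As a rough sketch of the "infer the year from a date" idea, and of where it breaks down, consider the following. I'm assuming here that the all years file has the same three date columns as the single year file; the rows themselves are made up.</p>

```r
library(dplyr, warn.conflicts = FALSE)
library(lubridate, warn.conflicts = FALSE)

# Hypothetical rows; the last one is a nesting attempt with no dates at all
breeding_all_years <- tibble(
  `Site Name` = c("BCNP12", "BCNP11", "BCNP08"),
  `1st Egg Date` = as.Date(c("2023-05-06", NA, NA)),
  `1st Hatch Date` = as.Date(c("2023-05-26", "2024-05-24", NA)),
  `1st Fledge Date` = as.Date(c("2023-06-11", "2024-06-12", NA))
)

breeding_all_years <- breeding_all_years %>%
  mutate(
    # coalesce() returns the first non-missing date in each row
    first_known_date = coalesce(`1st Egg Date`, `1st Hatch Date`, `1st Fledge Date`),
    inferred_year = year(first_known_date)
  )

breeding_all_years$inferred_year
```

<p>The first two rows get a year, but the row with no dates ends up with a missing year, and there is nothing else in the record to recover it from.</p>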
</section>
<section id="summaries-by-site---a-single-year" class="level4">
<h4 class="anchored" data-anchor-id="summaries-by-site---a-single-year">Summaries by site - a single year</h4>
<p>You can pick a year from the drop down menu and click the Download Now button. The downloaded file is named <code>breedingSummary-Sialia-20240221.csv</code>. Yep, it’s named exactly the same no matter which year we choose or if we choose all years.</p>
<p>After downloading, you need to rename the file to contain the year.</p>
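<p>You can certainly rename the file by hand, but it can also be done from R. This is just a sketch: it works in a temporary folder with a stand-in file so that it doesn't touch any real data, and the year used is an arbitrary example.</p>

```r
# Work in a temporary folder so this sketch does not touch real data
download_dir <- tempdir()
downloaded_file <- file.path(download_dir, "breedingSummary-Sialia-20240221.csv")
writeLines("Site Name,Species", downloaded_file)  # stand-in for a real download

yr <- 2023  # the year that was picked in the drop down menu
renamed_file <- file.path(download_dir,
                          paste0("breedingSummary-Sialia-", yr, ".csv"))

# file.rename() returns TRUE if the rename succeeded
renamed_ok <- file.rename(from = downloaded_file, to = renamed_file)
renamed_ok
```

<p>Doing the rename right after each download avoids having the next download overwrite (or get confused with) the previous one.</p>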
<p><strong>This table contains one row per nest box and summarizes the results for the year chosen. It is the primary source we use in our R based data prep machinery for the breeding details.</strong></p>
<p>Let’s take a quick look at this file.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1">breeding_stub <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"breedingSummary-Sialia"</span></span>
<span id="cb6-2">yr <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2024</span></span>
<span id="cb6-3"></span>
<span id="cb6-4">year_str <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.character</span>(yr)</span>
<span id="cb6-5">breeding_file <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">file.path</span>(raw_data_path, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(breeding_stub, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'-'</span>, year_str,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">".csv"</span>))</span>
<span id="cb6-6">breeding_summary_df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_csv</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">file =</span> breeding_file, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">col_names =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>, </span>
<span id="cb6-7">                                <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">comment =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'#'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">show_col_types =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>)</span>
<span id="cb6-8"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str</span>(breeding_summary_df)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>spc_tbl_ [71 × 12] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ Site Name       : chr [1:71] "BCNP12" "BCNP11" "BCNP08" "Watershed Barn" ...
 $ Species         : chr [1:71] "Tree Swallow" "Eastern Bluebird" "Tree Swallow" "American Robin" ...
 $ Outcome         : chr [1:71] "At least one host young fledged" "At least one host young fledged" "All young found dead in or nearby nest" NA ...
 $ 1st Egg Date    : Date[1:71], format: "2024-05-06" "2024-05-08" ...
 $ 1st Hatch Date  : Date[1:71], format: "2024-05-26" "2024-05-24" ...
 $ 1st Fledge Date : Date[1:71], format: "2024-06-11" "2024-06-12" ...
 $ No. of fledged  : num [1:71] 5 4 0 NA NA 4 NA NA NA NA ...
 $ Max clutch size : num [1:71] 5 4 4 NA NA 5 NA NA NA NA ...
 $ Total live young: num [1:71] 5 4 4 NA NA 4 NA NA NA NA ...
 $ Unhatched eggs  : num [1:71] 0 0 NA NA NA 1 NA NA NA NA ...
 $ Hatch Rate      : chr [1:71] "100.0%" "100.0%" "100.0%" NA ...
 $ Fledge Rate     : chr [1:71] "100.0%" "100.0%" "0.0%" NA ...
 - attr(*, "spec")=
  .. cols(
  ..   `Site Name` = col_character(),
  ..   Species = col_character(),
  ..   Outcome = col_character(),
  ..   `1st Egg Date` = col_date(format = ""),
  ..   `1st Hatch Date` = col_date(format = ""),
  ..   `1st Fledge Date` = col_date(format = ""),
  ..   `No. of fledged` = col_double(),
  ..   `Max clutch size` = col_double(),
  ..   `Total live young` = col_double(),
  ..   `Unhatched eggs` = col_double(),
  ..   `Hatch Rate` = col_character(),
  ..   `Fledge Rate` = col_character()
  .. )
 - attr(*, "problems")=&lt;externalptr&gt; </code></pre>
</div>
</div>
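<p>Notice in the output above that <code>Hatch Rate</code> and <code>Fledge Rate</code> are read in as character strings such as "100.0%". If you want to do arithmetic with them, one option is <code>parse_number()</code> from readr. A minimal sketch with made-up values in the same format:</p>

```r
library(readr)

# Hypothetical values in the same format as the Hatch Rate column
hatch_rate_chr <- c("100.0%", "80.0%", NA)

# parse_number() drops the non-numeric characters around the number
hatch_rate_pct <- parse_number(hatch_rate_chr)
hatch_rate_pct
```

<p>The result is on a 0-100 scale, so divide by 100 if you'd rather work with proportions.</p>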
</section>
<section id="individual-site-visits---all-years" class="level4">
<h4 class="anchored" data-anchor-id="individual-site-visits---all-years">Individual site visits - All years</h4>
<p>Yep, you guessed it, the file is still named <code>breedingSummary-Sialia-20240221.csv</code>. At least each row has a date or datetime associated with it. I haven’t used this table yet for anything. This file is the only place to get dates related to nesting attempts that never resulted in any eggs. However, we aren’t going to address this issue in this post.</p>
</section>
</section>
<section id="download-species-summaries" class="level3">
<h3 class="anchored" data-anchor-id="download-species-summaries">Download Species Summaries</h3>
<p>Again, these can be done for an individual year or for all years. The all years version just aggregates things over all the years - it does not give you individual year summaries. So, if you want to use these, you need to download each year separately and rename the files as you go so you know what year it is.</p>
<section id="species-summary---a-single-year" class="level4">
<h4 class="anchored" data-anchor-id="species-summary---a-single-year">Species summary - a single year</h4>
<p>The filename is <code>speciesSummary-Sialia-20240221.csv</code>. Again, no indication of the year chosen in the filename. Download each year separately and rename the files as you go so you know what year it is.</p>
<p>Several of the column names have the <code>'#'</code> character in them. That’s a problem. I raised this issue with the NestWatch folks a few years ago:</p>
<blockquote class="blockquote">
<p>In the species summary, some of the column names include the ‘#’ character. This character is also used to denote comment lines and this causes problems when reading the csv using R or Python libraries where one can specify a comment characters so that readr() or read_csv() can skip these lines. With readr(), the embedded ‘#’ characters in the column headings was causing problems with proper file reading. Specifically, this:</p>
<p><code>speciesSummary_OT_NestWatch_20230221 &lt;- read_csv("data/raw/speciesSummary-OT_NestWatch-20230221.csv", comment = "#")</code></p>
<p>results in columns 2:N being collapsed into a single column named ‘Total’.</p>
</blockquote>
<p>As of the time of this post, nothing has changed.</p>
<p>Here are the columns in this file:</p>
<pre><code>Total # nesting attempts    
Earliest Egg Date   
Earliest Hatch Date 
Earliest Fledge Date    
Total # eggs    
Total # nestlings   
Total # fledglings  
Nest attempts with at least one fledgling   
Nesting success rate    
Mean clutch size    
Mean nestlings  
Mean fledglings</code></pre>
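<p>One possible workaround, sketched below, is to strip out the comment lines yourself and then read what's left with comment handling turned off. This is an illustration under assumptions, not necessarily the cleanest approach: the miniature file contents are made up, and the first column name (<code>Species</code>) is my guess since it isn't in the list above. With a real file, <code>raw_lines</code> would come from <code>readLines()</code> on the downloaded file.</p>

```r
library(readr)

# Hypothetical miniature of a species summary download
raw_lines <- c(
  "Species,Total # nesting attempts,Total # eggs",
  "# comment row",
  "Eastern Bluebird,12,50",
  "Tree Swallow,9,41"
)

# Drop only the lines that start with '#', keeping '#' inside the header
data_lines <- raw_lines[!startsWith(raw_lines, "#")]

# comment = "" turns off comment handling, so embedded '#' is left alone
species_df <- read_csv(I(paste(data_lines, collapse = "\n")),
                       comment = "", show_col_types = FALSE)
names(species_df)
```

<p>The column names survive intact, <code>'#'</code> and all, and can then be cleaned up like any other messy column names.</p>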
</section>
</section>
</section>
<section id="data-prep-for-analysis" class="level2">
<h2 class="anchored" data-anchor-id="data-prep-for-analysis">Data prep for analysis</h2>
<p>We have three different file types we need to read and do some pre-processing on before we can proceed to analysis of the data. The scenario that we’ll use is that we have downloaded the site file as well as all of the breeding summary and species summary files for 2019-2024. The breeding and species summary files have been renamed as described above so that they have the year appended to the filename - e.g.&nbsp;<code>breedingSummary-Sialia-2019.csv</code> and <code>speciesSummary-Sialia-2019.csv</code>. Our goal is to combine the annual files into combined breeding summary and species summary files so that we can do analysis over time. Let’s look at each of the three file types and the data pre-processing done for each. Some of these things are going to be specific to our interests and our way of doing things in our NestWatch project.</p>
<p>In practice, I have a single R script that does all of this, but for the purposes of this post, I’ll break it up into pieces.</p>
<section id="data-prep-for-the-site-descriptions-file" class="level3">
<h3 class="anchored" data-anchor-id="data-prep-for-the-site-descriptions-file">Data prep for the site descriptions file</h3>
<p>The first step after reading the file is to clean up the column names.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1">site_file <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">file.path</span>(raw_data_path, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(site_stub, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">".csv"</span>))</span>
<span id="cb9-2">site_df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_csv</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">file =</span> site_file, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">col_names =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>, </span>
<span id="cb9-3">                    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">comment =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'#'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">show_col_types =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>)</span>
<span id="cb9-4">site_df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> site_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> janitor<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">clean_names</span>()</span></code></pre></div>
</div>
<p>What happens in the <code>clean_names()</code> function? We rely on the <a href="https://cran.r-project.org/web/packages/janitor/vignettes/janitor.html">janitor package</a> and its <a href="https://cran.r-project.org/web/packages/janitor/vignettes/janitor.html#clean-dataframe-names-with-clean_names">clean_names()</a> function. Some of the key things it does are:</p>
<ul>
<li>Parses letter cases and separators to a consistent format (snake case is the default)</li>
<li>Handles special characters and spaces, including transliterating characters like œ to oe.</li>
<li>Appends numbers to duplicated names</li>
<li>Converts “%” to “percent” and “#” to “number” to retain meaning</li>
<li>Spacing (or lack thereof) around numbers is preserved</li>
</ul>
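<p>As a rough illustration of these rules, <code>make_clean_names()</code> (the character vector counterpart of <code>clean_names()</code>) can be tried on a few names. The input vector below is made up for this sketch, though the names are in the style of the NestWatch exports.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(janitor)

# Illustrative column names in the style of the NestWatch exports
raw_names &lt;- c("Site Name", "Total # eggs", "Habitat Info 1")

# Spaces become underscores, case is lowered and '#' becomes 'number'
cleaned_names &lt;- make_clean_names(raw_names)
print(cleaned_names)
# "site_name" "total_number_eggs" "habitat_info_1"</code></pre></div>
</div>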
<p>Let’s redo the reading of the file and see the changes made.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1">site_df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_csv</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">file =</span> site_file, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">col_names =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>, </span>
<span id="cb10-2">                    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">comment =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'#'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">show_col_types =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>)</span>
<span id="cb10-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">names</span>(site_df)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code> [1] "Site Name"                 "Latitude"                 
 [3] "Longitude"                 "Substrate"                
 [5] "Height Above Ground"       "Height Above Ground Units"
 [7] "Entrance Diameter"         "Entrance Diameter Units"  
 [9] "Entrance Orientation"      "Site Elevation"           
[11] "Site Elevation Units"      "Habitat Info 1"           
[13] "Habitat Info 2"            "Habitat Info 3"           
[15] "Nest Attempts"             "Comments"                 </code></pre>
</div>
</div>
<p>Now, let’s clean them.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1">site_df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> site_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> janitor<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">clean_names</span>()</span>
<span id="cb12-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">names</span>(site_df)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code> [1] "site_name"                 "latitude"                 
 [3] "longitude"                 "substrate"                
 [5] "height_above_ground"       "height_above_ground_units"
 [7] "entrance_diameter"         "entrance_diameter_units"  
 [9] "entrance_orientation"      "site_elevation"           
[11] "site_elevation_units"      "habitat_info_1"           
[13] "habitat_info_2"            "habitat_info_3"           
[15] "nest_attempts"             "comments"                 </code></pre>
</div>
</div>
<p>Now we are going to add a <code>box_type</code> field which takes on values of either ‘Traditional’, ‘Peterson’, or ‘Kestrel’. For more about the Peterson box, see <a href="https://www.sialis.org/bluebirding-giant-dick-peterson/">this post</a> about its creator. We use the <code>comments</code> field in our site description table to store information about the box type. Using R’s <code>grep()</code> function, we can find the rows whose comments mention a box type and populate the <code>box_type</code> field accordingly.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Initialize all as Traditional</span></span>
<span id="cb14-2">site_df<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>box_type <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Traditional'</span></span>
<span id="cb14-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Find and set the Peterson boxes</span></span>
<span id="cb14-4">site_df<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>box_type[<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">grep</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Peterson'</span>, site_df<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>comments, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ignore.case =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)] <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Peterson'</span></span>
<span id="cb14-5"></span>
<span id="cb14-6">site_df<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>site_name_type <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_c</span>(site_df<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>site_name, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'-'</span>, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_sub</span>(site_df<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>box_type, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>))</span>
<span id="cb14-7"></span>
<span id="cb14-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Find and set the Kestrel boxes</span></span>
<span id="cb14-9">site_df<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>box_type[<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">grep</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Kestrel'</span>, site_df<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>comments, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ignore.case =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)] <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Kestrel'</span></span>
<span id="cb14-10">site_df<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>box_type <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.factor</span>(site_df<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>box_type)</span></code></pre></div>
</div>
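<p>The same classification could also be written as a single vectorized expression. What follows is only a sketch on a small made-up table, using <code>dplyr::case_when()</code> and <code>stringr::str_detect()</code>; the site names and comments are hypothetical. Kestrel is tested first so that, as in the code above, it wins if both words appear in a comment.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(dplyr)
library(stringr)

# Hypothetical site table; only the comments column matters here
toy_sites &lt;- tibble(
  site_name = c("BCNP07", "BCNP08", "K1", "PCT01"),
  comments = c("Peterson box", NA, "kestrel box near field", "standard box")
)

toy_sites &lt;- toy_sites |&gt;
  mutate(
    box_type = case_when(
      str_detect(comments, regex("kestrel", ignore_case = TRUE)) ~ "Kestrel",
      str_detect(comments, regex("peterson", ignore_case = TRUE)) ~ "Peterson",
      TRUE ~ "Traditional"  # also catches rows with missing comments
    ),
    box_type = as.factor(box_type)
  )

toy_sites</code></pre></div>
</div>
<p>Whether this reads better than the <code>grep()</code> version is a matter of taste; the main difference is that all of the rules sit in one place.</p>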
<p>How many do we have of each?</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb15-1">site_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb15-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(box_type) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb15-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">count</span>()</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 3 × 2
# Groups:   box_type [3]
  box_type        n
  &lt;fct&gt;       &lt;int&gt;
1 Kestrel         2
2 Peterson        9
3 Traditional    47</code></pre>
</div>
</div>
<p>We use a box naming convention that starts with an abbreviation for the park and then a box number.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb17-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unique</span>(site_df<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>site_name)[<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">11</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">14</span>)]</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] "BCNP07" "BCNP08" "BCNP09" "BCNP10" "PCT01"  "PCT02"  "PCT04"  "PCT03" </code></pre>
</div>
</div>
<p>We’ll use the park abbreviation for a new field called <code>area</code>.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb19-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add area</span></span>
<span id="cb19-2">site_df<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>area <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_extract</span>(site_df<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>site_name, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'^[A-Z]+'</span>)</span>
<span id="cb19-3"></span>
<span id="cb19-4">site_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb19-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(area) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb19-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">count</span>()</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 7 × 2
# Groups:   area [7]
  area      n
  &lt;chr&gt; &lt;int&gt;
1 BCNP     12
2 CIP      24
3 DTLP      9
4 G         1
5 K         1
6 P         1
7 PCT      10</code></pre>
</div>
</div>
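<p>To see what the pattern <code>'^[A-Z]+'</code> is doing, here is a small sketch. The first two names come from the output shown earlier; <code>"box12"</code> is a made-up name included only to show what happens when a name does not start with a capital letter.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(stringr)

# Two real site names plus one hypothetical name that breaks the convention
example_names &lt;- c("BCNP07", "PCT01", "box12")

# '^' anchors the match at the start and '[A-Z]+' takes one or more capital letters
example_areas &lt;- str_extract(example_names, "^[A-Z]+")
print(example_areas)
# "BCNP" "PCT" NA</code></pre></div>
</div>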
<p>Finally, we convert <code>site_df</code> to a simple features object using the <code>sf</code> package. Although <code>site_df</code> has longitude and latitude fields, it is not really a spatially aware object. To make it easy to do spatial queries, computations and visualizations, we need to fortify it with some spatial powers. The <code>sf</code> package does just that.</p>
<blockquote class="blockquote">
<p>A package that provides <a href="https://en.wikipedia.org/wiki/Simple_Features">simple features access</a> for R.</p>
</blockquote>
<p>With <code>sf</code> we can represent simple planar features such as points, lines and polygons (and multi versions of these things) and manipulate them in dataframes or tibbles. In the Python world we could use the <a href="https://geopandas.org/en/stable/">GeoPandas</a> package (which relies on the <a href="https://shapely.readthedocs.io/en/stable/">shapely</a> package). Simple features is a general concept supported in many geocomputational tools including things like <a href="https://qgis.org/">QGIS</a> and <a href="https://postgis.net/">PostGIS</a>. See my <a href="https://bitsofanalytics.org/posts/hello_world_mapping_r/hello_world_map_r">earlier post on creating maps of the nest box locations with R</a> for more details on geocomputation in R.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb21" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb21-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(sf)</span>
<span id="cb21-2"></span>
<span id="cb21-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Convert site_df to a simple features object</span></span>
<span id="cb21-4">site_sf <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">st_as_sf</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> site_df, </span>
<span id="cb21-5">                    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">coords =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"longitude"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"latitude"</span>),</span>
<span id="cb21-6">                    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">crs =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"EPSG:4326"</span>)</span></code></pre></div>
</div>
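<p>To get a feel for what the conversion does, here is a minimal sketch on a tiny data frame. The site names and coordinates are made up and are not from our NestWatch data.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(sf)

# Hypothetical boxes with made-up coordinates (decimal degrees)
toy_df &lt;- data.frame(
  site_name = c("BOX01", "BOX02"),
  longitude = c(-83.20, -83.21),
  latitude = c(42.67, 42.68)
)

toy_sf &lt;- st_as_sf(toy_df, coords = c("longitude", "latitude"), crs = "EPSG:4326")

# By default the coordinate columns are replaced by a geometry column
names(toy_sf)

# The coordinate reference system now travels with the object
st_crs(toy_sf)$epsg

# The coordinates can still be recovered as a matrix with X and Y columns
st_coordinates(toy_sf)</code></pre></div>
</div>
<p>Note that the order in <code>coords</code> is x then y, which is why longitude comes first.</p>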
<p>We will need this spatial awareness later as part of the analysis.</p>
</section>
<section id="data-prep-for-the-species-summary-file" class="level3">
<h3 class="anchored" data-anchor-id="data-prep-for-the-species-summary-file">Data prep for the species summary file</h3>
<p>This file is a little trickier due to the use of the <code>'#'</code> character within several column names. We need to fix these first before attempting to use <code>read_csv()</code> to ingest the file. I created a function that pulls out the first line from the species summary file (the header line) and uses <code>janitor::make_clean_names()</code> to return a cleaned up version. Then we can use the cleaned up names when reading in the species summary file. We saw that the <code>clean_names()</code> function takes a dataframe as an input and modifies the column names. The <code>make_clean_names()</code> function takes a character vector of names as an input and returns a cleaned up character vector of names. As you might guess, <code>clean_names()</code> uses <code>make_clean_names()</code> under the hood.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb22" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb22-1">fix_species_colnames <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">function</span>(src_path){</span>
<span id="cb22-2">  </span>
<span id="cb22-3">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Read the header line and split it into a character vector of original column names</span></span>
<span id="cb22-4">  original_col_names <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> stringr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_split_1</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">readLines</span>(src_path, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>), <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">","</span>)</span>
<span id="cb22-5">  </span>
<span id="cb22-6">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Now clean names (janitor handles the '#' replacement with '_number_')</span></span>
<span id="cb22-7">  clean_col_names <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> janitor<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">make_clean_names</span>(original_col_names)</span>
<span id="cb22-8">  clean_col_names</span>
<span id="cb22-9">}</span></code></pre></div>
</div>
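<p>Here is a small self-contained sketch of the same idea that can be run without the NestWatch files. It writes a tiny made-up species summary to a temporary file, cleans the header line, and then reads the data rows. The file contents are hypothetical, and passing the cleaned names directly through <code>col_names</code> is just one way to do it.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(readr)
library(janitor)

# Write a tiny, made-up species summary file to a temporary location
tmp_file &lt;- tempfile(fileext = ".csv")
writeLines(c("Species,Total # nesting attempts,Total # eggs",
             "Tree Swallow,22,77"), tmp_file)

# Read just the header line and clean it, as fix_species_colnames() does
header_names &lt;- strsplit(readLines(tmp_file, n = 1), ",")[[1]]
cleaned &lt;- make_clean_names(header_names)

# Skip the raw header line and supply the cleaned names instead
toy_species &lt;- read_csv(tmp_file, skip = 1, col_names = cleaned,
                        comment = "#", show_col_types = FALSE)
names(toy_species)
# "species" "total_number_nesting_attempts" "total_number_eggs"</code></pre></div>
</div>
<p>Because the header line is skipped, the <code>'#'</code> characters in it never reach the parser, so keeping <code>comment = '#'</code> for the data rows does no harm.</p>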
<p>The main part of the code loops over the years that are to be read in. At each iteration, the species summary CSV file is read in and stored in a list. This list of dataframes is then combined rowwise to create the final multi-year species summary dataframe.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb23" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb23-1">species_stub <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"speciesSummary-Sialia"</span></span>
<span id="cb23-2"></span>
<span id="cb23-3">min_year <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2019</span></span>
<span id="cb23-4">max_year <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2024</span></span>
<span id="cb23-5">year_range <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> min_year<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>max_year</span>
<span id="cb23-6"></span>
<span id="cb23-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Clean the species file column names and read into dataframes</span></span>
<span id="cb23-8">species_summary_dfs <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>()</span>
<span id="cb23-9"><span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> (yr <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> year_range) {</span>
<span id="cb23-10">  </span>
<span id="cb23-11">  year_str <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.character</span>(yr)</span>
<span id="cb23-12">  species_file <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">file.path</span>(raw_data_path, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(species_stub, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'-'</span>, year_str,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">".csv"</span>))</span>
<span id="cb23-13">  </span>
<span id="cb23-14">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Get new column names</span></span>
<span id="cb23-15">  cleaned_col_names <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fix_species_colnames</span>(species_file)</span>
<span id="cb23-16">  </span>
<span id="cb23-17">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Read in file, skipping first line</span></span>
<span id="cb23-18">  species_summary_dfs[[year_str]] <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_csv</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">file =</span> species_file, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">skip =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, </span>
<span id="cb23-19">                                              <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">col_names =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">comment =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'#'</span>,</span>
<span id="cb23-20">                                              <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">show_col_types =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>)</span>
<span id="cb23-21">  </span>
<span id="cb23-22">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Reset the column names</span></span>
<span id="cb23-23">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">names</span>(species_summary_dfs[[year_str]]) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> cleaned_col_names</span>
<span id="cb23-24">  species_summary_dfs[[year_str]]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>year <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> yr</span>
<span id="cb23-25">  </span>
<span id="cb23-26">}</span>
<span id="cb23-27"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Combine list of dataframes into one dataframe.</span></span>
<span id="cb23-28">species_summary_df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">bind_rows</span>(species_summary_dfs)</span>
<span id="cb23-29"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rm</span>(species_summary_dfs)</span></code></pre></div>
</div>
<div class="cell">
<div class="sourceCode cell-code" id="cb24" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb24-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str</span>(species_summary_df)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>spc_tbl_ [24 × 14] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ species                                  : chr [1:24] "Black-capped Chickadee" "Tree Swallow" "House Wren" "Eastern Bluebird" ...
 $ total_number_nesting_attempts            : num [1:24] 2 22 4 27 1 19 2 32 3 1 ...
 $ earliest_egg_date                        : Date[1:24], format: "2019-05-02" "2019-05-06" ...
 $ earliest_hatch_date                      : Date[1:24], format: "2019-05-22" "2019-05-24" ...
 $ earliest_fledge_date                     : Date[1:24], format: "2019-06-05" "2019-06-16" ...
 $ total_number_eggs                        : num [1:24] 8 77 10 89 4 75 3 118 0 6 ...
 $ total_number_nestlings                   : num [1:24] 8 42 9 79 4 58 3 104 0 0 ...
 $ total_number_fledglings                  : num [1:24] NA 38 9 78 4 57 3 103 0 0 ...
 $ nest_attempts_with_at_least_one_fledgling: num [1:24] 0 10 2 20 1 11 1 27 0 0 ...
 $ nesting_success_rate                     : chr [1:24] "0.0%" "45.5%" "50.0%" "74.1%" ...
 $ mean_clutch_size                         : num [1:24] 8 4.81 5 4.05 4 ...
 $ mean_nestlings                           : num [1:24] 8 3.23 4.5 3.59 4 ...
 $ mean_fledglings                          : num [1:24] NA 2.92 4.5 3.55 4 ...
 $ year                                     : int [1:24] 2019 2019 2019 2019 2019 2020 2020 2020 2020 2021 ...
 - attr(*, "spec")=
  .. cols(
  ..   X1 = col_character(),
  ..   X2 = col_double(),
  ..   X3 = col_date(format = ""),
  ..   X4 = col_date(format = ""),
  ..   X5 = col_date(format = ""),
  ..   X6 = col_double(),
  ..   X7 = col_double(),
  ..   X8 = col_double(),
  ..   X9 = col_double(),
  ..   X10 = col_character(),
  ..   X11 = col_double(),
  ..   X12 = col_double(),
  ..   X13 = col_double()
  .. )
 - attr(*, "problems")=&lt;externalptr&gt; </code></pre>
</div>
</div>
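<p>Notice that <code>nesting_success_rate</code> was read in as a character column because of the <code>%</code> sign. If a numeric version is needed, one option (not part of the script above) is <code>readr::parse_number()</code>. A minimal sketch, using values copied from the <code>str()</code> output:</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(readr)

# Values copied from the str() output above
rate_chr &lt;- c("0.0%", "45.5%", "50.0%", "74.1%")

# parse_number() drops the non-numeric characters around the number
rate_pct &lt;- parse_number(rate_chr)

# As a proportion between 0 and 1, if that is more convenient
rate_prop &lt;- rate_pct / 100
print(rate_pct)</code></pre></div>
</div>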
</section>
<section id="data-prep-for-the-breeding-summary-file" class="level3">
<h3 class="anchored" data-anchor-id="data-prep-for-the-breeding-summary-file">Data prep for the breeding summary file</h3>
<p>There’s a little more work to do with this file than the other two. It’s the main source of detailed data needed for our analysis. We start, much like we just did with the species summaries, by reading in each year-specific file and combining them into one breeding summary dataframe.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb26" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb26-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create list of dataframes - one df for each year</span></span>
<span id="cb26-2">breeding_summary_dfs <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>()</span>
<span id="cb26-3"><span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> (yr <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> year_range){</span>
<span id="cb26-4">  year_str <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.character</span>(yr)</span>
<span id="cb26-5">  breeding_file <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">file.path</span>(raw_data_path, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(breeding_stub, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'-'</span>, year_str,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">".csv"</span>))</span>
<span id="cb26-6">  breeding_summary_dfs[[year_str]] <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_csv</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">file =</span> breeding_file, </span>
<span id="cb26-7">                                               <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">col_names =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">comment =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'#'</span>,</span>
<span id="cb26-8">                                               <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">show_col_types =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>)</span>
<span id="cb26-9">}</span>
<span id="cb26-10"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Combine list of dataframes into one dataframe.</span></span>
<span id="cb26-11">breeding_summary_df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">bind_rows</span>(breeding_summary_dfs, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.id =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"site_year"</span>)</span></code></pre></div>
</div>
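<p>The <code>.id</code> argument is what carries the year along into the combined dataframe. Here is a minimal sketch with made-up toy data (the <code>toy_dfs</code> list, its site names and its values are hypothetical, not NestWatch data) showing how the names of the list end up in the <code>site_year</code> column:</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(dplyr)

# Hypothetical stand-in for the list of per-year dataframes
toy_dfs &lt;- list(
  "2019" = tibble(site_name = c("CIP01", "CIP02"), no_of_fledged = c(4, 0)),
  "2020" = tibble(site_name = "CIP01", no_of_fledged = 5)
)

# The list names become the values of the site_year column (as character)
toy_combined &lt;- bind_rows(toy_dfs, .id = "site_year")
toy_combined</code></pre></div>
</div>
<p>Because list names are strings, <code>site_year</code> arrives as a character column, which is why it gets converted to numeric in the next step.</p>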
<p>Next we convert <code>site_year</code> to numeric, clean up the column names, add an <code>area</code> column, and replace <code>NA</code> values with 0 in several of the numeric columns.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb27" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb27-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Change site_year to numeric</span></span>
<span id="cb27-2">breeding_summary_df<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>site_year <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.numeric</span>(breeding_summary_df<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>site_year)</span>
<span id="cb27-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Clean up the column names</span></span>
<span id="cb27-4">breeding_summary_df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> breeding_summary_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> janitor<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">clean_names</span>()</span>
<span id="cb27-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add area</span></span>
<span id="cb27-6">breeding_summary_df<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>area <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_extract</span>(breeding_summary_df<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>site_name, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'^[A-Z]+'</span>)</span>
<span id="cb27-7"></span>
<span id="cb27-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Replace NA with 0 in some of the numeric columns</span></span>
<span id="cb27-9">breeding_summary_df<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>no_of_fledged <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> </span>
<span id="cb27-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">replace_na</span>(breeding_summary_df<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>no_of_fledged, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)</span>
<span id="cb27-11">breeding_summary_df<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>max_clutch_size <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> </span>
<span id="cb27-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">replace_na</span>(breeding_summary_df<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>max_clutch_size, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)</span>
<span id="cb27-13">breeding_summary_df<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>total_live_young <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> </span>
<span id="cb27-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">replace_na</span>(breeding_summary_df<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>total_live_young, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)</span>
<span id="cb27-15">breeding_summary_df<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>unhatched_eggs <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> </span>
<span id="cb27-16">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">replace_na</span>(breeding_summary_df<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>unhatched_eggs, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)</span></code></pre></div>
</div>
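<p>The four <code>replace_na()</code> assignments could also be written as a single <code>mutate()</code> with <code>across()</code>. This is just an alternative sketch on hypothetical toy data that reuses the column names above. The <code>count_cols</code> vector is a name introduced here for illustration.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(dplyr)
library(tidyr)

# Hypothetical toy data with the same column names as the breeding summary
toy &lt;- tibble(
  no_of_fledged    = c(3, NA),
  max_clutch_size  = c(5, NA),
  total_live_young = c(4, NA),
  unhatched_eggs   = c(NA, 2)
)

# Columns in which a missing value should be treated as zero
count_cols &lt;- c("no_of_fledged", "max_clutch_size",
                "total_live_young", "unhatched_eggs")

# Apply replace_na() to each listed column in one step
toy_filled &lt;- toy |&gt;
  mutate(across(all_of(count_cols), ~ replace_na(.x, 0)))
toy_filled</code></pre></div>
</div>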
<p>Next we compute the hatch and fledge rates.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb28" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb28-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Compute hatch rate and fledge rate (can compare to eBird's reported rates which are strings with %)</span></span>
<span id="cb28-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># ISSUE: max_clutch_size seems like it should equal total_live_young + unhatched_eggs. While it usually does,</span></span>
<span id="cb28-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># there are cases where this is not true. In such cases, which should be used for the denominator when computing</span></span>
<span id="cb28-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># hatch_rate? It seems that max_clutch_size is the correct denominator.</span></span>
<span id="cb28-5"></span>
<span id="cb28-6">breeding_summary_df<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>hatch_rate_computed <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> </span>
<span id="cb28-7">  breeding_summary_df<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>total_live_young <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> breeding_summary_df<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>max_clutch_size</span>
<span id="cb28-8">breeding_summary_df<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>fledge_rate_computed <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> </span>
<span id="cb28-9">  breeding_summary_df<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>no_of_fledged <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> breeding_summary_df<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>total_live_young </span></code></pre></div>
</div>
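<p>One thing to keep in mind with these ratios: because <code>NA</code> values were replaced with 0 earlier, a row with a zero denominator gives <code>NaN</code> (for 0 / 0) or <code>Inf</code> rather than <code>NA</code>. Whether such rows should be treated as missing is a judgment call. The sketch below, on hypothetical toy data, shows one possible guard using <code>if_else()</code>. It is not what the code above does.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(dplyr)

# Hypothetical toy data: a successful nest, an empty record, and a clutch that never hatched
toy &lt;- tibble(
  max_clutch_size  = c(5, 0, 3),
  total_live_young = c(4, 0, 0),
  no_of_fledged    = c(4, 0, 0)
)

# Plain division: the second element is 0 / 0, which R reports as NaN
plain_hatch_rate &lt;- toy$total_live_young / toy$max_clutch_size

# Guarded division: return NA whenever the denominator is zero
toy_rates &lt;- toy |&gt;
  mutate(
    hatch_rate  = if_else(max_clutch_size &gt; 0,
                          total_live_young / max_clutch_size, NA_real_),
    fledge_rate = if_else(total_live_young &gt; 0,
                          no_of_fledged / total_live_young, NA_real_)
  )
toy_rates</code></pre></div>
</div>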
<p>Since many boxes see more than one nesting event per season, I thought it would be helpful to create a nesting event sequence number within each year for each nest box. As a first step, I wanted a datetime field that could serve as a proxy for when the nesting event occurred. Taking the minimum of the first egg date, the first hatch date and the first fledge date gives us such a datetime for nesting attempts that resulted in at least one egg. However, this misses the failed nesting attempts, and it seems that the only way to date those is through the Site Visits table. Even without looking at that table, we can still tell there was such a nesting event because all three first date fields have <code>NA</code> values. We just can’t sequence it. That will be a problem for another day. For now, we are mostly focusing on true nesting attempts, defined by at least one egg being documented.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb29" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb29-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Start by computing min date of the three date fields so that we have</span></span>
<span id="cb29-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># a single date field to use for sequencing if needed. </span></span>
<span id="cb29-3">breeding_summary_df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> breeding_summary_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb29-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rowwise</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb29-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">seq_date =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">min</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(x1st_egg_date, x1st_hatch_date, x1st_fledge_date), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
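font-style: inherit;">TRUE</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb29-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ungroup</span>() <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Drop the rowwise grouping so later verbs act on whole columns</span></span></code></pre></div>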
</div>
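<p>For rows where all three dates are <code>NA</code>, <code>min(..., na.rm = TRUE)</code> has nothing left to take the minimum of, and base R emits a warning in that case. A possible alternative is the vectorized <code>pmin()</code>, which needs no <code>rowwise()</code> and returns <code>NA</code> when every input is <code>NA</code>. The sketch below uses hypothetical toy data and assumes the three columns are <code>Date</code> values:</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(dplyr)

# Hypothetical toy data: a complete record, a partial record, and a failed attempt
toy &lt;- tibble(
  x1st_egg_date    = as.Date(c("2024-04-20", NA, NA)),
  x1st_hatch_date  = as.Date(c("2024-05-04", "2024-06-01", NA)),
  x1st_fledge_date = as.Date(c("2024-05-22", NA, NA))
)

# pmin() takes the element-wise minimum across the three columns
toy_seq &lt;- toy |&gt;
  mutate(seq_date = pmin(x1st_egg_date, x1st_hatch_date, x1st_fledge_date,
                         na.rm = TRUE))
toy_seq</code></pre></div>
</div>
<p>In this sketch the failed attempt in the last row simply gets an <code>NA</code> sequencing date, which matches the idea that we can see it but can’t sequence it.</p>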
<p>Finally, we can clean up and save an <code>.RData</code> file containing the three main dataframes we just created.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb30" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb30-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Remove the list</span></span>
<span id="cb30-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rm</span>(breeding_summary_dfs)</span>
<span id="cb30-3"></span>
<span id="cb30-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Save all data for research</span></span>
<span id="cb30-5">filename <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_interp</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'data/nestwatch_sialia_${min_year}_${max_year}.RData'</span>)</span>
<span id="cb30-6"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">save</span>(breeding_summary_df, site_sf, species_summary_df, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">file=</span>filename)</span></code></pre></div>
</div>
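<p>The file name comes from <code>str_interp()</code>, which substitutes the values of <code>min_year</code> and <code>max_year</code> into the template string. A small sketch, assuming those two variables hold 2019 and 2024 (values assumed here to match the file name loaded in the next section):</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(stringr)

# Assumed values for illustration
min_year &lt;- 2019
max_year &lt;- 2024

# ${...} placeholders are replaced by the values of the named variables
filename &lt;- str_interp('data/nestwatch_sialia_${min_year}_${max_year}.RData')
filename</code></pre></div>
</div>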
</section>
</section>
<section id="basic-summary-analysis" class="level2">
<h2 class="anchored" data-anchor-id="basic-summary-analysis">Basic summary analysis</h2>
<p>We are finally ready to do some basic analysis.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb31" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb31-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">load</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'data/nestwatch_sialia_2019_2024.RData'</span>)</span></code></pre></div>
</div>
</section>
<section id="how-many-boxes-were-monitored-each-year" class="level2">
<h2 class="anchored" data-anchor-id="how-many-boxes-were-monitored-each-year">How many boxes were monitored each year?</h2>
<p>Let’s start with how many physical boxes of each type we have in the four main parks monitored. Not all of these boxes are active; they are simply present in the data table. For any query involving <code>site_sf</code>, we use the <code>st_drop_geometry()</code> function to prevent the geometry from showing up in our nice output tables.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb32" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb32-1">site_sf <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb32-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">st_drop_geometry</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb32-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(area <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%in%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'BCNP'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'CIP'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'DTLP'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'PCT'</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb32-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(area, box_type) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb32-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">num_boxes =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">n</span>(), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.groups=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"rowwise"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb32-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_wider</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_from =</span> box_type, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_from =</span> num_boxes) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb32-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">replace</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is.na</span>(.), <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb32-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">gt</span>()</span></code></pre></div>
<div class="cell-output-display">
<div>
<div id="ovckksgqhu" style="padding-left:0px;padding-right:0px;padding-top:10px;padding-bottom:10px;overflow-x:auto;overflow-y:auto;width:auto;height:auto;">
<style>#ovckksgqhu table {
  font-family: system-ui, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif, 'Apple Color Emoji', 'Segoe UI Emoji', 'Segoe UI Symbol', 'Noto Color Emoji';
  -webkit-font-smoothing: antialiased;
  -moz-osx-font-smoothing: grayscale;
}

#ovckksgqhu thead, #ovckksgqhu tbody, #ovckksgqhu tfoot, #ovckksgqhu tr, #ovckksgqhu td, #ovckksgqhu th {
  border-style: none;
}

#ovckksgqhu p {
  margin: 0;
  padding: 0;
}

#ovckksgqhu .gt_table {
  display: table;
  border-collapse: collapse;
  line-height: normal;
  margin-left: auto;
  margin-right: auto;
  color: #333333;
  font-size: 16px;
  font-weight: normal;
  font-style: normal;
  background-color: #FFFFFF;
  width: auto;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #A8A8A8;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #A8A8A8;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
}

#ovckksgqhu .gt_caption {
  padding-top: 4px;
  padding-bottom: 4px;
}

#ovckksgqhu .gt_title {
  color: #333333;
  font-size: 125%;
  font-weight: initial;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-color: #FFFFFF;
  border-bottom-width: 0;
}

#ovckksgqhu .gt_subtitle {
  color: #333333;
  font-size: 85%;
  font-weight: initial;
  padding-top: 3px;
  padding-bottom: 5px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-color: #FFFFFF;
  border-top-width: 0;
}

#ovckksgqhu .gt_heading {
  background-color: #FFFFFF;
  text-align: center;
  border-bottom-color: #FFFFFF;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#ovckksgqhu .gt_bottom_border {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#ovckksgqhu .gt_col_headings {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#ovckksgqhu .gt_col_heading {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 6px;
  padding-left: 5px;
  padding-right: 5px;
  overflow-x: hidden;
}

#ovckksgqhu .gt_column_spanner_outer {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  padding-top: 0;
  padding-bottom: 0;
  padding-left: 4px;
  padding-right: 4px;
}

#ovckksgqhu .gt_column_spanner_outer:first-child {
  padding-left: 0;
}

#ovckksgqhu .gt_column_spanner_outer:last-child {
  padding-right: 0;
}

#ovckksgqhu .gt_column_spanner {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 5px;
  overflow-x: hidden;
  display: inline-block;
  width: 100%;
}

#ovckksgqhu .gt_spanner_row {
  border-bottom-style: hidden;
}

#ovckksgqhu .gt_group_heading {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  text-align: left;
}

#ovckksgqhu .gt_empty_group_heading {
  padding: 0.5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: middle;
}

#ovckksgqhu .gt_from_md > :first-child {
  margin-top: 0;
}

#ovckksgqhu .gt_from_md > :last-child {
  margin-bottom: 0;
}

#ovckksgqhu .gt_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  margin: 10px;
  border-top-style: solid;
  border-top-width: 1px;
  border-top-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  overflow-x: hidden;
}

#ovckksgqhu .gt_stub {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
}

#ovckksgqhu .gt_stub_row_group {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
  vertical-align: top;
}

#ovckksgqhu .gt_row_group_first td {
  border-top-width: 2px;
}

#ovckksgqhu .gt_row_group_first th {
  border-top-width: 2px;
}

#ovckksgqhu .gt_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#ovckksgqhu .gt_first_summary_row {
  border-top-style: solid;
  border-top-color: #D3D3D3;
}

#ovckksgqhu .gt_first_summary_row.thick {
  border-top-width: 2px;
}

#ovckksgqhu .gt_last_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#ovckksgqhu .gt_grand_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#ovckksgqhu .gt_first_grand_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-style: double;
  border-top-width: 6px;
  border-top-color: #D3D3D3;
}

#ovckksgqhu .gt_last_grand_summary_row_top {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: double;
  border-bottom-width: 6px;
  border-bottom-color: #D3D3D3;
}

#ovckksgqhu .gt_striped {
  background-color: rgba(128, 128, 128, 0.05);
}

#ovckksgqhu .gt_table_body {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#ovckksgqhu .gt_footnotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#ovckksgqhu .gt_footnote {
  margin: 0px;
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#ovckksgqhu .gt_sourcenotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#ovckksgqhu .gt_sourcenote {
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#ovckksgqhu .gt_left {
  text-align: left;
}

#ovckksgqhu .gt_center {
  text-align: center;
}

#ovckksgqhu .gt_right {
  text-align: right;
  font-variant-numeric: tabular-nums;
}

#ovckksgqhu .gt_font_normal {
  font-weight: normal;
}

#ovckksgqhu .gt_font_bold {
  font-weight: bold;
}

#ovckksgqhu .gt_font_italic {
  font-style: italic;
}

#ovckksgqhu .gt_super {
  font-size: 65%;
}

#ovckksgqhu .gt_footnote_marks {
  font-size: 75%;
  vertical-align: 0.4em;
  position: initial;
}

#ovckksgqhu .gt_asterisk {
  font-size: 100%;
  vertical-align: 0;
}

#ovckksgqhu .gt_indent_1 {
  text-indent: 5px;
}

#ovckksgqhu .gt_indent_2 {
  text-indent: 10px;
}

#ovckksgqhu .gt_indent_3 {
  text-indent: 15px;
}

#ovckksgqhu .gt_indent_4 {
  text-indent: 20px;
}

#ovckksgqhu .gt_indent_5 {
  text-indent: 25px;
}

#ovckksgqhu .katex-display {
  display: inline-flex !important;
  margin-bottom: 0.75em !important;
}

#ovckksgqhu div.Reactable > div.rt-table > div.rt-thead > div.rt-tr.rt-tr-group-header > div.rt-th-group:after {
  height: 0px !important;
}
</style>

<table class="gt_table table table-sm table-striped small" data-quarto-postprocess="true" data-quarto-disable-processing="false" data-quarto-bootstrap="false">
<thead>
<tr class="header gt_col_headings">
<th id="area" class="gt_col_heading gt_columns_bottom_border gt_left" data-quarto-table-cell-role="th" scope="col">area</th>
<th id="Peterson" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">Peterson</th>
<th id="Traditional" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">Traditional</th>
<th id="Kestrel" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">Kestrel</th>
</tr>
</thead>
<tbody class="gt_table_body">
<tr class="odd">
<td class="gt_row gt_left" headers="area">BCNP</td>
<td class="gt_row gt_right" headers="Peterson">3</td>
<td class="gt_row gt_right" headers="Traditional">9</td>
<td class="gt_row gt_right" headers="Kestrel">0</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="area">CIP</td>
<td class="gt_row gt_right" headers="Peterson">6</td>
<td class="gt_row gt_right" headers="Traditional">17</td>
<td class="gt_row gt_right" headers="Kestrel">1</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="area">DTLP</td>
<td class="gt_row gt_right" headers="Peterson">0</td>
<td class="gt_row gt_right" headers="Traditional">9</td>
<td class="gt_row gt_right" headers="Kestrel">0</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="area">PCT</td>
<td class="gt_row gt_right" headers="Peterson">0</td>
<td class="gt_row gt_right" headers="Traditional">10</td>
<td class="gt_row gt_right" headers="Kestrel">0</td>
</tr>
</tbody>
</table>

</div>
</div>
</div>
</div>
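<p>The zeros in the table come from replacing the <code>NA</code> values that <code>pivot_wider()</code> creates for area and box type combinations that don’t exist. As an alternative sketch on hypothetical toy data (the <code>toy_sites</code> rows are made up), the same kind of table can be built with <code>count()</code> and the <code>values_fill</code> argument of <code>pivot_wider()</code>:</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(dplyr)
library(tidyr)

# Hypothetical toy site table: one row per physical box
toy_sites &lt;- tibble(
  area     = c("BCNP", "BCNP", "CIP", "CIP", "CIP"),
  box_type = c("Peterson", "Traditional", "Traditional", "Traditional", "Kestrel")
)

# Count boxes per area and type, then spread the types into columns,
# filling combinations that do not occur with 0
toy_counts &lt;- toy_sites |&gt;
  count(area, box_type, name = "num_boxes") |&gt;
  pivot_wider(names_from = box_type, values_from = num_boxes, values_fill = 0)
toy_counts</code></pre></div>
</div>
<p>This avoids mixing the native pipe with <code>%&gt;%</code>, which the <code>replace(is.na(.), 0)</code> step needs for its <code>.</code> placeholder.</p>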
<p>What about the number of monitored box events each year? A single box can have more than one nesting event in a season.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb33" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb33-1">breeding_summary_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb33-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(area <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%in%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'BCNP'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'CIP'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'DTLP'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'PCT'</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb33-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">inner_join</span>(site_sf, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">by=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"site_name"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb33-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">st_drop_geometry</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb33-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(site_year, box_type) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb33-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">num_boxes =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">n</span>(), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.groups=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"rowwise"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb33-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rename</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Box type</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> box_type) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb33-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.factor</span>(site_year), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y=</span>num_boxes, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Box type</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">label=</span>num_boxes)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb33-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_bar</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">stat=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"identity"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb33-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">position =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">position_stack</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">vjust =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb33-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">xlab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ylab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Number of nest box events"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb33-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggtitle</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Number of nest box events by year"</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/nestwatch-data-analysis/analyzing_nestwatch_data_r_files/figure-html/box_events_yearly-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
</section>
<section id="how-many-eggs-live-births-and-fledging-events" class="level2">
<h2 class="anchored" data-anchor-id="how-many-eggs-live-births-and-fledging-events">How many eggs, live young, and fledging events?</h2>
<p>We will compute these totals, along with overall hatch and fledge rates, by year and by year and species.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb34" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb34-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># By year</span></span>
<span id="cb34-2">annual_counts <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> breeding_summary_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb34-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(species <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"House Sparrow"</span>, species <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"American Goldfinch"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb34-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(area <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%in%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'BCNP'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'CIP'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'DTLP'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'PCT'</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb34-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(site_year) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb34-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.groups=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"rowwise"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">num_boxes =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">n</span>(),</span>
<span id="cb34-7">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">tot_fledged =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(no_of_fledged, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>),</span>
<span id="cb34-8">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">tot_clutch_size =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(max_clutch_size, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>),</span>
<span id="cb34-9">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">tot_live_young =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(total_live_young, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>),</span>
<span id="cb34-10">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">tot_unhatched_eggs =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(unhatched_eggs, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>),</span>
<span id="cb34-11">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">overall_hatch_rate_1 =</span> tot_live_young <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> (tot_live_young <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> tot_unhatched_eggs),</span>
<span id="cb34-12">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">overall_hatch_rate_2 =</span> tot_live_young <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> tot_clutch_size,</span>
<span id="cb34-13">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">overall_fledge_rate =</span> tot_fledged <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> tot_live_young)</span>
<span id="cb34-14"></span>
<span id="cb34-15"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># By year by species</span></span>
<span id="cb34-16">annual_species_counts <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> breeding_summary_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb34-17">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(species <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"House Sparrow"</span>, species <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"American Goldfinch"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb34-18">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(site_year, species) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb34-19">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.groups=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"rowwise"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">num_boxes =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">n</span>(),</span>
<span id="cb34-20">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">tot_fledged =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(no_of_fledged, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>),</span>
<span id="cb34-21">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">tot_clutch_size =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(max_clutch_size, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>),</span>
<span id="cb34-22">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">tot_live_young =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(total_live_young, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>),</span>
<span id="cb34-23">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">tot_unhatched_eggs =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(unhatched_eggs, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>),</span>
<span id="cb34-24">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">overall_hatch_rate_1 =</span> tot_live_young <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> (tot_live_young <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> tot_unhatched_eggs),</span>
<span id="cb34-25">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">overall_hatch_rate_2 =</span> tot_live_young <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> tot_clutch_size,</span>
<span id="cb34-26">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">overall_fledge_rate =</span> tot_fledged <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> tot_live_young)</span>
<span id="cb34-27"></span>
<span id="cb34-28"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># </span><span class="al" style="color: #AD0000;
background-color: null;
font-style: inherit;">NOTE</span><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: Using total clutch size as the hatch rate denominator until I can</span></span>
<span id="cb34-29"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># confirm with Cornell exactly how to compute hatch rate.</span></span>
<span id="cb34-30">annual_counts_nice <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> breeding_summary_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb34-31">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(species <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"House Sparrow"</span>, species <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"American Goldfinch"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb34-32">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(site_year) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb34-33">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.groups=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"rowwise"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Nesting events</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">n</span>(),</span>
<span id="cb34-34">            <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Total Fledged</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(no_of_fledged, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>),</span>
<span id="cb34-35">            <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Total Clutch</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(max_clutch_size, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>),</span>
<span id="cb34-36">            <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Total Live Young</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(total_live_young, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>),</span>
<span id="cb34-37">            <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Total Unhatched Eggs</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(unhatched_eggs, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>),</span>
<span id="cb34-38">            <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Overall Hatch Rate</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Total Live Young</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> (<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Total Clutch</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span>),</span>
<span id="cb34-39">            <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Overall Fledge Rate</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Total Fledged</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Total Live Young</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span>)</span>
<span id="cb34-40"></span>
<span id="cb34-41"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Long version</span></span>
<span id="cb34-42">annual_counts_long <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> </span>
<span id="cb34-43">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(annual_counts_nice, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>site_year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"indicator"</span>)</span></code></pre></div>
</div>
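<p>To see why the choice of hatch rate denominator matters, here is a minimal base R sketch on a tiny, made-up data frame. The column names mirror those used above, but the <code>nests</code> data frame and its values are hypothetical and are not taken from the NestWatch data. It also shows that the "overall" rates are ratios of totals, which need not equal the mean of the per-nest rates.</p>

```r
# Hypothetical nest attempts; the numbers are invented for illustration only.
nests <- data.frame(
  max_clutch_size  = c(5, 4, 6),
  total_live_young = c(4, 4, 3),
  unhatched_eggs   = c(1, 0, 1),
  no_of_fledged    = c(4, 3, 3)
)

# Totals across all nests, as in the summarize() calls above.
tot_clutch_size    <- sum(nests$max_clutch_size, na.rm = TRUE)
tot_live_young     <- sum(nests$total_live_young, na.rm = TRUE)
tot_unhatched_eggs <- sum(nests$unhatched_eggs, na.rm = TRUE)
tot_fledged        <- sum(nests$no_of_fledged, na.rm = TRUE)

# Version 1: the denominator counts only eggs with a known outcome.
overall_hatch_rate_1 <- tot_live_young / (tot_live_young + tot_unhatched_eggs)

# Version 2: the denominator counts every egg laid, so eggs that
# disappeared (the third nest has two unaccounted for) lower the rate.
overall_hatch_rate_2 <- tot_live_young / tot_clutch_size

# Fledge rate: fledged young as a share of live young.
overall_fledge_rate <- tot_fledged / tot_live_young

# For comparison: averaging per-nest rates weights each nest equally,
# whereas the ratio of totals weights each egg equally.
mean_nest_hatch_rate_2 <- mean(nests$total_live_young / nests$max_clutch_size)

print(c(hatch_1 = overall_hatch_rate_1,
        hatch_2 = overall_hatch_rate_2,
        fledge = overall_fledge_rate,
        mean_nest_hatch_2 = mean_nest_hatch_rate_2))
```

<p>In this toy example the two hatch rate versions differ only because one nest has eggs that are neither live young nor recorded as unhatched. Whether such eggs belong in the denominator is exactly the open question flagged in the code comment above.</p>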
<p>Now we can use these computed dataframes to drive plots and tables.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb35" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb35-1">annual_counts_long <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb35-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_wider</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">id_cols =</span> indicator, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_from =</span> site_year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_from =</span> value) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb35-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">gt</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb35-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tab_header</span>(</span>
<span id="cb35-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Overall annual statistics"</span>,</span>
<span id="cb35-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">subtitle =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"All species included"</span></span>
<span id="cb35-7">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb35-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fmt_number</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">columns =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">starts_with</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"2"</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">decimals =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>,</span>
<span id="cb35-9">             <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">drop_trailing_zeros =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">drop_trailing_dec_mark =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>) </span></code></pre></div>
<div class="cell-output-display">
<div>
<div id="flhcutntyd" style="padding-left:0px;padding-right:0px;padding-top:10px;padding-bottom:10px;overflow-x:auto;overflow-y:auto;width:auto;height:auto;">
<style>#flhcutntyd table {
  font-family: system-ui, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif, 'Apple Color Emoji', 'Segoe UI Emoji', 'Segoe UI Symbol', 'Noto Color Emoji';
  -webkit-font-smoothing: antialiased;
  -moz-osx-font-smoothing: grayscale;
}

#flhcutntyd thead, #flhcutntyd tbody, #flhcutntyd tfoot, #flhcutntyd tr, #flhcutntyd td, #flhcutntyd th {
  border-style: none;
}

#flhcutntyd p {
  margin: 0;
  padding: 0;
}

#flhcutntyd .gt_table {
  display: table;
  border-collapse: collapse;
  line-height: normal;
  margin-left: auto;
  margin-right: auto;
  color: #333333;
  font-size: 16px;
  font-weight: normal;
  font-style: normal;
  background-color: #FFFFFF;
  width: auto;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #A8A8A8;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #A8A8A8;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
}

#flhcutntyd .gt_caption {
  padding-top: 4px;
  padding-bottom: 4px;
}

#flhcutntyd .gt_title {
  color: #333333;
  font-size: 125%;
  font-weight: initial;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-color: #FFFFFF;
  border-bottom-width: 0;
}

#flhcutntyd .gt_subtitle {
  color: #333333;
  font-size: 85%;
  font-weight: initial;
  padding-top: 3px;
  padding-bottom: 5px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-color: #FFFFFF;
  border-top-width: 0;
}

#flhcutntyd .gt_heading {
  background-color: #FFFFFF;
  text-align: center;
  border-bottom-color: #FFFFFF;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#flhcutntyd .gt_bottom_border {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#flhcutntyd .gt_col_headings {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#flhcutntyd .gt_col_heading {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 6px;
  padding-left: 5px;
  padding-right: 5px;
  overflow-x: hidden;
}

#flhcutntyd .gt_column_spanner_outer {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  padding-top: 0;
  padding-bottom: 0;
  padding-left: 4px;
  padding-right: 4px;
}

#flhcutntyd .gt_column_spanner_outer:first-child {
  padding-left: 0;
}

#flhcutntyd .gt_column_spanner_outer:last-child {
  padding-right: 0;
}

#flhcutntyd .gt_column_spanner {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 5px;
  overflow-x: hidden;
  display: inline-block;
  width: 100%;
}

#flhcutntyd .gt_spanner_row {
  border-bottom-style: hidden;
}

#flhcutntyd .gt_group_heading {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  text-align: left;
}

#flhcutntyd .gt_empty_group_heading {
  padding: 0.5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: middle;
}

#flhcutntyd .gt_from_md > :first-child {
  margin-top: 0;
}

#flhcutntyd .gt_from_md > :last-child {
  margin-bottom: 0;
}

#flhcutntyd .gt_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  margin: 10px;
  border-top-style: solid;
  border-top-width: 1px;
  border-top-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  overflow-x: hidden;
}

#flhcutntyd .gt_stub {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
}

#flhcutntyd .gt_stub_row_group {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
  vertical-align: top;
}

#flhcutntyd .gt_row_group_first td {
  border-top-width: 2px;
}

#flhcutntyd .gt_row_group_first th {
  border-top-width: 2px;
}

#flhcutntyd .gt_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#flhcutntyd .gt_first_summary_row {
  border-top-style: solid;
  border-top-color: #D3D3D3;
}

#flhcutntyd .gt_first_summary_row.thick {
  border-top-width: 2px;
}

#flhcutntyd .gt_last_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#flhcutntyd .gt_grand_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#flhcutntyd .gt_first_grand_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-style: double;
  border-top-width: 6px;
  border-top-color: #D3D3D3;
}

#flhcutntyd .gt_last_grand_summary_row_top {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: double;
  border-bottom-width: 6px;
  border-bottom-color: #D3D3D3;
}

#flhcutntyd .gt_striped {
  background-color: rgba(128, 128, 128, 0.05);
}

#flhcutntyd .gt_table_body {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#flhcutntyd .gt_footnotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#flhcutntyd .gt_footnote {
  margin: 0px;
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#flhcutntyd .gt_sourcenotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#flhcutntyd .gt_sourcenote {
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#flhcutntyd .gt_left {
  text-align: left;
}

#flhcutntyd .gt_center {
  text-align: center;
}

#flhcutntyd .gt_right {
  text-align: right;
  font-variant-numeric: tabular-nums;
}

#flhcutntyd .gt_font_normal {
  font-weight: normal;
}

#flhcutntyd .gt_font_bold {
  font-weight: bold;
}

#flhcutntyd .gt_font_italic {
  font-style: italic;
}

#flhcutntyd .gt_super {
  font-size: 65%;
}

#flhcutntyd .gt_footnote_marks {
  font-size: 75%;
  vertical-align: 0.4em;
  position: initial;
}

#flhcutntyd .gt_asterisk {
  font-size: 100%;
  vertical-align: 0;
}

#flhcutntyd .gt_indent_1 {
  text-indent: 5px;
}

#flhcutntyd .gt_indent_2 {
  text-indent: 10px;
}

#flhcutntyd .gt_indent_3 {
  text-indent: 15px;
}

#flhcutntyd .gt_indent_4 {
  text-indent: 20px;
}

#flhcutntyd .gt_indent_5 {
  text-indent: 25px;
}

#flhcutntyd .katex-display {
  display: inline-flex !important;
  margin-bottom: 0.75em !important;
}

#flhcutntyd div.Reactable > div.rt-table > div.rt-thead > div.rt-tr.rt-tr-group-header > div.rt-th-group:after {
  height: 0px !important;
}
</style>

<table class="gt_table table table-sm table-striped small" data-quarto-postprocess="true" data-quarto-disable-processing="false" data-quarto-bootstrap="false">
<thead>
<tr class="header gt_heading">
<th colspan="7" class="gt_heading gt_title gt_font_normal">Overall annual statistics</th>
</tr>
<tr class="odd gt_heading">
<th colspan="7" class="gt_heading gt_subtitle gt_font_normal gt_bottom_border">All species included</th>
</tr>
<tr class="header gt_col_headings">
<th id="indicator" class="gt_col_heading gt_columns_bottom_border gt_left" data-quarto-table-cell-role="th" scope="col">indicator</th>
<th id="2019" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">2019</th>
<th id="2020" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">2020</th>
<th id="2021" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">2021</th>
<th id="2022" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">2022</th>
<th id="2023" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">2023</th>
<th id="2024" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">2024</th>
</tr>
</thead>
<tbody class="gt_table_body">
<tr class="odd">
<td class="gt_row gt_left" headers="indicator">Nesting events</td>
<td class="gt_row gt_right" headers="2019">55</td>
<td class="gt_row gt_right" headers="2020">53</td>
<td class="gt_row gt_right" headers="2021">55</td>
<td class="gt_row gt_right" headers="2022">52</td>
<td class="gt_row gt_right" headers="2023">52</td>
<td class="gt_row gt_right" headers="2024">51</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="indicator">Total Fledged</td>
<td class="gt_row gt_right" headers="2019">125</td>
<td class="gt_row gt_right" headers="2020">163</td>
<td class="gt_row gt_right" headers="2021">147</td>
<td class="gt_row gt_right" headers="2022">159</td>
<td class="gt_row gt_right" headers="2023">140</td>
<td class="gt_row gt_right" headers="2024">155</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="indicator">Total Clutch</td>
<td class="gt_row gt_right" headers="2019">184</td>
<td class="gt_row gt_right" headers="2020">196</td>
<td class="gt_row gt_right" headers="2021">190</td>
<td class="gt_row gt_right" headers="2022">177</td>
<td class="gt_row gt_right" headers="2023">192</td>
<td class="gt_row gt_right" headers="2024">205</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="indicator">Total Live Young</td>
<td class="gt_row gt_right" headers="2019">138</td>
<td class="gt_row gt_right" headers="2020">165</td>
<td class="gt_row gt_right" headers="2021">153</td>
<td class="gt_row gt_right" headers="2022">166</td>
<td class="gt_row gt_right" headers="2023">168</td>
<td class="gt_row gt_right" headers="2024">173</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="indicator">Total Unhatched Eggs</td>
<td class="gt_row gt_right" headers="2019">18</td>
<td class="gt_row gt_right" headers="2020">31</td>
<td class="gt_row gt_right" headers="2021">36</td>
<td class="gt_row gt_right" headers="2022">17</td>
<td class="gt_row gt_right" headers="2023">24</td>
<td class="gt_row gt_right" headers="2024">26</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="indicator">Overall Hatch Rate</td>
<td class="gt_row gt_right" headers="2019">0.75</td>
<td class="gt_row gt_right" headers="2020">0.84</td>
<td class="gt_row gt_right" headers="2021">0.81</td>
<td class="gt_row gt_right" headers="2022">0.94</td>
<td class="gt_row gt_right" headers="2023">0.88</td>
<td class="gt_row gt_right" headers="2024">0.84</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="indicator">Overall Fledge Rate</td>
<td class="gt_row gt_right" headers="2019">0.91</td>
<td class="gt_row gt_right" headers="2020">0.99</td>
<td class="gt_row gt_right" headers="2021">0.96</td>
<td class="gt_row gt_right" headers="2022">0.96</td>
<td class="gt_row gt_right" headers="2023">0.83</td>
<td class="gt_row gt_right" headers="2024">0.9</td>
</tr>
</tbody>
</table>

</div>
</div>
</div>
</div>
<div class="cell">
<div class="sourceCode cell-code" id="cb36" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb36-1">annual_counts_long <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb36-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(indicator <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%in%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Total Fledged"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Total Clutch"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Total Live Young"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb36-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">indicator =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">factor</span>(indicator, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">levels =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Total Clutch'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Total Live Young'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Total Fledged'</span>))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb36-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb36-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> site_year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> value, </span>
<span id="cb36-6">                <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">colour =</span> indicator)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb36-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Year"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Total"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Total Clutch, Live Young, and Fledged"</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/nestwatch-data-analysis/analyzing_nestwatch_data_r_files/figure-html/unnamed-chunk-20-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>Let’s look at the number fledged and the number of live young by species.</p>
<section id="fledged-by-species" class="level3">
<h3 class="anchored" data-anchor-id="fledged-by-species">Fledged by species</h3>
<div class="cell">
<div class="sourceCode cell-code" id="cb37" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb37-1">annual_species_counts <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb37-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(species <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"House Sparrow"</span>, species <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"American Goldfinch"</span>, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is.na</span>(species)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb37-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(site_year, species) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb37-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.groups=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"rowwise"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Fledged =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(tot_fledged)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb37-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_wider</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">id_cols =</span> species, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_from =</span> site_year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_from =</span> Fledged) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb37-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">replace</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is.na</span>(.), <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb37-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">gt</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb37-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tab_header</span>(</span>
<span id="cb37-9">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Total Fledged"</span>,</span>
<span id="cb37-10">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#subtitle = str_glue("{min_year} to {max_year}")</span></span>
<span id="cb37-11">  )</span></code></pre></div>
<div class="cell-output-display">
<div>
<div id="xetubsnnmv" style="padding-left:0px;padding-right:0px;padding-top:10px;padding-bottom:10px;overflow-x:auto;overflow-y:auto;width:auto;height:auto;">
<style>#xetubsnnmv table {
  font-family: system-ui, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif, 'Apple Color Emoji', 'Segoe UI Emoji', 'Segoe UI Symbol', 'Noto Color Emoji';
  -webkit-font-smoothing: antialiased;
  -moz-osx-font-smoothing: grayscale;
}

#xetubsnnmv thead, #xetubsnnmv tbody, #xetubsnnmv tfoot, #xetubsnnmv tr, #xetubsnnmv td, #xetubsnnmv th {
  border-style: none;
}

#xetubsnnmv p {
  margin: 0;
  padding: 0;
}

#xetubsnnmv .gt_table {
  display: table;
  border-collapse: collapse;
  line-height: normal;
  margin-left: auto;
  margin-right: auto;
  color: #333333;
  font-size: 16px;
  font-weight: normal;
  font-style: normal;
  background-color: #FFFFFF;
  width: auto;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #A8A8A8;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #A8A8A8;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
}

#xetubsnnmv .gt_caption {
  padding-top: 4px;
  padding-bottom: 4px;
}

#xetubsnnmv .gt_title {
  color: #333333;
  font-size: 125%;
  font-weight: initial;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-color: #FFFFFF;
  border-bottom-width: 0;
}

#xetubsnnmv .gt_subtitle {
  color: #333333;
  font-size: 85%;
  font-weight: initial;
  padding-top: 3px;
  padding-bottom: 5px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-color: #FFFFFF;
  border-top-width: 0;
}

#xetubsnnmv .gt_heading {
  background-color: #FFFFFF;
  text-align: center;
  border-bottom-color: #FFFFFF;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#xetubsnnmv .gt_bottom_border {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#xetubsnnmv .gt_col_headings {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#xetubsnnmv .gt_col_heading {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 6px;
  padding-left: 5px;
  padding-right: 5px;
  overflow-x: hidden;
}

#xetubsnnmv .gt_column_spanner_outer {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  padding-top: 0;
  padding-bottom: 0;
  padding-left: 4px;
  padding-right: 4px;
}

#xetubsnnmv .gt_column_spanner_outer:first-child {
  padding-left: 0;
}

#xetubsnnmv .gt_column_spanner_outer:last-child {
  padding-right: 0;
}

#xetubsnnmv .gt_column_spanner {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 5px;
  overflow-x: hidden;
  display: inline-block;
  width: 100%;
}

#xetubsnnmv .gt_spanner_row {
  border-bottom-style: hidden;
}

#xetubsnnmv .gt_group_heading {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  text-align: left;
}

#xetubsnnmv .gt_empty_group_heading {
  padding: 0.5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: middle;
}

#xetubsnnmv .gt_from_md > :first-child {
  margin-top: 0;
}

#xetubsnnmv .gt_from_md > :last-child {
  margin-bottom: 0;
}

#xetubsnnmv .gt_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  margin: 10px;
  border-top-style: solid;
  border-top-width: 1px;
  border-top-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  overflow-x: hidden;
}

#xetubsnnmv .gt_stub {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
}

#xetubsnnmv .gt_stub_row_group {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
  vertical-align: top;
}

#xetubsnnmv .gt_row_group_first td {
  border-top-width: 2px;
}

#xetubsnnmv .gt_row_group_first th {
  border-top-width: 2px;
}

#xetubsnnmv .gt_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#xetubsnnmv .gt_first_summary_row {
  border-top-style: solid;
  border-top-color: #D3D3D3;
}

#xetubsnnmv .gt_first_summary_row.thick {
  border-top-width: 2px;
}

#xetubsnnmv .gt_last_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#xetubsnnmv .gt_grand_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#xetubsnnmv .gt_first_grand_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-style: double;
  border-top-width: 6px;
  border-top-color: #D3D3D3;
}

#xetubsnnmv .gt_last_grand_summary_row_top {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: double;
  border-bottom-width: 6px;
  border-bottom-color: #D3D3D3;
}

#xetubsnnmv .gt_striped {
  background-color: rgba(128, 128, 128, 0.05);
}

#xetubsnnmv .gt_table_body {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#xetubsnnmv .gt_footnotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#xetubsnnmv .gt_footnote {
  margin: 0px;
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#xetubsnnmv .gt_sourcenotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#xetubsnnmv .gt_sourcenote {
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#xetubsnnmv .gt_left {
  text-align: left;
}

#xetubsnnmv .gt_center {
  text-align: center;
}

#xetubsnnmv .gt_right {
  text-align: right;
  font-variant-numeric: tabular-nums;
}

#xetubsnnmv .gt_font_normal {
  font-weight: normal;
}

#xetubsnnmv .gt_font_bold {
  font-weight: bold;
}

#xetubsnnmv .gt_font_italic {
  font-style: italic;
}

#xetubsnnmv .gt_super {
  font-size: 65%;
}

#xetubsnnmv .gt_footnote_marks {
  font-size: 75%;
  vertical-align: 0.4em;
  position: initial;
}

#xetubsnnmv .gt_asterisk {
  font-size: 100%;
  vertical-align: 0;
}

#xetubsnnmv .gt_indent_1 {
  text-indent: 5px;
}

#xetubsnnmv .gt_indent_2 {
  text-indent: 10px;
}

#xetubsnnmv .gt_indent_3 {
  text-indent: 15px;
}

#xetubsnnmv .gt_indent_4 {
  text-indent: 20px;
}

#xetubsnnmv .gt_indent_5 {
  text-indent: 25px;
}

#xetubsnnmv .katex-display {
  display: inline-flex !important;
  margin-bottom: 0.75em !important;
}

#xetubsnnmv div.Reactable > div.rt-table > div.rt-thead > div.rt-tr.rt-tr-group-header > div.rt-th-group:after {
  height: 0px !important;
}
</style>

<table class="gt_table table table-sm table-striped small" data-quarto-postprocess="true" data-quarto-disable-processing="false" data-quarto-bootstrap="false">
<thead>
<tr class="header gt_heading">
<th colspan="7" class="gt_heading gt_title gt_font_normal gt_bottom_border">Total Fledged</th>
</tr>
<tr class="odd gt_col_headings">
<th id="species" class="gt_col_heading gt_columns_bottom_border gt_left" data-quarto-table-cell-role="th" scope="col">species</th>
<th id="2019" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">2019</th>
<th id="2020" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">2020</th>
<th id="2021" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">2021</th>
<th id="2022" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">2022</th>
<th id="2023" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">2023</th>
<th id="2024" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">2024</th>
</tr>
</thead>
<tbody class="gt_table_body">
<tr class="odd">
<td class="gt_row gt_left" headers="species">Black-capped Chickadee</td>
<td class="gt_row gt_right" headers="2019">0</td>
<td class="gt_row gt_right" headers="2020">0</td>
<td class="gt_row gt_right" headers="2021">0</td>
<td class="gt_row gt_right" headers="2022">0</td>
<td class="gt_row gt_right" headers="2023">0</td>
<td class="gt_row gt_right" headers="2024">0</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="species">Eastern Bluebird</td>
<td class="gt_row gt_right" headers="2019">78</td>
<td class="gt_row gt_right" headers="2020">103</td>
<td class="gt_row gt_right" headers="2021">90</td>
<td class="gt_row gt_right" headers="2022">104</td>
<td class="gt_row gt_right" headers="2023">90</td>
<td class="gt_row gt_right" headers="2024">76</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="species">House Wren</td>
<td class="gt_row gt_right" headers="2019">9</td>
<td class="gt_row gt_right" headers="2020">3</td>
<td class="gt_row gt_right" headers="2021">7</td>
<td class="gt_row gt_right" headers="2022">4</td>
<td class="gt_row gt_right" headers="2023">0</td>
<td class="gt_row gt_right" headers="2024">0</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="species">Tree Swallow</td>
<td class="gt_row gt_right" headers="2019">38</td>
<td class="gt_row gt_right" headers="2020">57</td>
<td class="gt_row gt_right" headers="2021">50</td>
<td class="gt_row gt_right" headers="2022">51</td>
<td class="gt_row gt_right" headers="2023">44</td>
<td class="gt_row gt_right" headers="2024">79</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="species">Northern House Wren</td>
<td class="gt_row gt_right" headers="2019">0</td>
<td class="gt_row gt_right" headers="2020">0</td>
<td class="gt_row gt_right" headers="2021">0</td>
<td class="gt_row gt_right" headers="2022">0</td>
<td class="gt_row gt_right" headers="2023">6</td>
<td class="gt_row gt_right" headers="2024">0</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="species">American Robin</td>
<td class="gt_row gt_right" headers="2019">0</td>
<td class="gt_row gt_right" headers="2020">0</td>
<td class="gt_row gt_right" headers="2021">0</td>
<td class="gt_row gt_right" headers="2022">0</td>
<td class="gt_row gt_right" headers="2023">0</td>
<td class="gt_row gt_right" headers="2024">0</td>
</tr>
</tbody>
</table>

</div>
</div>
</div>
</div>
<div class="cell">
<div class="sourceCode cell-code" id="cb38" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb38-1">annual_species_counts <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb38-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(species <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%in%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Eastern Bluebird"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Tree Swallow"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb38-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb38-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> site_year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> tot_fledged, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">colour =</span> species)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb38-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Year"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Total fledged"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Total fledged by species"</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/nestwatch-data-analysis/analyzing_nestwatch_data_r_files/figure-html/unnamed-chunk-21-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
</section>
<section id="live-young-by-species" class="level3">
<h3 class="anchored" data-anchor-id="live-young-by-species">Live young by species</h3>
<div class="cell">
<div class="sourceCode cell-code" id="cb39" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb39-1">annual_species_counts <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb39-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(species <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"House Sparrow"</span>, species <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"American Goldfinch"</span>, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is.na</span>(species)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb39-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(site_year, species) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb39-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.groups=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"rowwise"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">live_young =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(tot_live_young)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb39-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_wider</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">id_cols =</span> species, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_from =</span> site_year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_from =</span> live_young) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb39-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">replace</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is.na</span>(.), <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb39-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">gt</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb39-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tab_header</span>(</span>
<span id="cb39-9">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Total Live Young"</span>,</span>
<span id="cb39-10">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#subtitle = str_glue("{min_year} to {max_year}")</span></span>
<span id="cb39-11">  )</span></code></pre></div>
<div class="cell-output-display">
<div>
<div id="hllpepnjeb" style="padding-left:0px;padding-right:0px;padding-top:10px;padding-bottom:10px;overflow-x:auto;overflow-y:auto;width:auto;height:auto;">
<style>#hllpepnjeb table {
  font-family: system-ui, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif, 'Apple Color Emoji', 'Segoe UI Emoji', 'Segoe UI Symbol', 'Noto Color Emoji';
  -webkit-font-smoothing: antialiased;
  -moz-osx-font-smoothing: grayscale;
}

#hllpepnjeb thead, #hllpepnjeb tbody, #hllpepnjeb tfoot, #hllpepnjeb tr, #hllpepnjeb td, #hllpepnjeb th {
  border-style: none;
}

#hllpepnjeb p {
  margin: 0;
  padding: 0;
}

#hllpepnjeb .gt_table {
  display: table;
  border-collapse: collapse;
  line-height: normal;
  margin-left: auto;
  margin-right: auto;
  color: #333333;
  font-size: 16px;
  font-weight: normal;
  font-style: normal;
  background-color: #FFFFFF;
  width: auto;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #A8A8A8;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #A8A8A8;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
}

#hllpepnjeb .gt_caption {
  padding-top: 4px;
  padding-bottom: 4px;
}

#hllpepnjeb .gt_title {
  color: #333333;
  font-size: 125%;
  font-weight: initial;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-color: #FFFFFF;
  border-bottom-width: 0;
}

#hllpepnjeb .gt_subtitle {
  color: #333333;
  font-size: 85%;
  font-weight: initial;
  padding-top: 3px;
  padding-bottom: 5px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-color: #FFFFFF;
  border-top-width: 0;
}

#hllpepnjeb .gt_heading {
  background-color: #FFFFFF;
  text-align: center;
  border-bottom-color: #FFFFFF;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#hllpepnjeb .gt_bottom_border {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#hllpepnjeb .gt_col_headings {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#hllpepnjeb .gt_col_heading {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 6px;
  padding-left: 5px;
  padding-right: 5px;
  overflow-x: hidden;
}

#hllpepnjeb .gt_column_spanner_outer {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  padding-top: 0;
  padding-bottom: 0;
  padding-left: 4px;
  padding-right: 4px;
}

#hllpepnjeb .gt_column_spanner_outer:first-child {
  padding-left: 0;
}

#hllpepnjeb .gt_column_spanner_outer:last-child {
  padding-right: 0;
}

#hllpepnjeb .gt_column_spanner {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 5px;
  overflow-x: hidden;
  display: inline-block;
  width: 100%;
}

#hllpepnjeb .gt_spanner_row {
  border-bottom-style: hidden;
}

#hllpepnjeb .gt_group_heading {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  text-align: left;
}

#hllpepnjeb .gt_empty_group_heading {
  padding: 0.5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: middle;
}

#hllpepnjeb .gt_from_md > :first-child {
  margin-top: 0;
}

#hllpepnjeb .gt_from_md > :last-child {
  margin-bottom: 0;
}

#hllpepnjeb .gt_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  margin: 10px;
  border-top-style: solid;
  border-top-width: 1px;
  border-top-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  overflow-x: hidden;
}

#hllpepnjeb .gt_stub {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
}

#hllpepnjeb .gt_stub_row_group {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
  vertical-align: top;
}

#hllpepnjeb .gt_row_group_first td {
  border-top-width: 2px;
}

#hllpepnjeb .gt_row_group_first th {
  border-top-width: 2px;
}

#hllpepnjeb .gt_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#hllpepnjeb .gt_first_summary_row {
  border-top-style: solid;
  border-top-color: #D3D3D3;
}

#hllpepnjeb .gt_first_summary_row.thick {
  border-top-width: 2px;
}

#hllpepnjeb .gt_last_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#hllpepnjeb .gt_grand_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#hllpepnjeb .gt_first_grand_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-style: double;
  border-top-width: 6px;
  border-top-color: #D3D3D3;
}

#hllpepnjeb .gt_last_grand_summary_row_top {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: double;
  border-bottom-width: 6px;
  border-bottom-color: #D3D3D3;
}

#hllpepnjeb .gt_striped {
  background-color: rgba(128, 128, 128, 0.05);
}

#hllpepnjeb .gt_table_body {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#hllpepnjeb .gt_footnotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#hllpepnjeb .gt_footnote {
  margin: 0px;
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#hllpepnjeb .gt_sourcenotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#hllpepnjeb .gt_sourcenote {
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#hllpepnjeb .gt_left {
  text-align: left;
}

#hllpepnjeb .gt_center {
  text-align: center;
}

#hllpepnjeb .gt_right {
  text-align: right;
  font-variant-numeric: tabular-nums;
}

#hllpepnjeb .gt_font_normal {
  font-weight: normal;
}

#hllpepnjeb .gt_font_bold {
  font-weight: bold;
}

#hllpepnjeb .gt_font_italic {
  font-style: italic;
}

#hllpepnjeb .gt_super {
  font-size: 65%;
}

#hllpepnjeb .gt_footnote_marks {
  font-size: 75%;
  vertical-align: 0.4em;
  position: initial;
}

#hllpepnjeb .gt_asterisk {
  font-size: 100%;
  vertical-align: 0;
}

#hllpepnjeb .gt_indent_1 {
  text-indent: 5px;
}

#hllpepnjeb .gt_indent_2 {
  text-indent: 10px;
}

#hllpepnjeb .gt_indent_3 {
  text-indent: 15px;
}

#hllpepnjeb .gt_indent_4 {
  text-indent: 20px;
}

#hllpepnjeb .gt_indent_5 {
  text-indent: 25px;
}

#hllpepnjeb .katex-display {
  display: inline-flex !important;
  margin-bottom: 0.75em !important;
}

#hllpepnjeb div.Reactable > div.rt-table > div.rt-thead > div.rt-tr.rt-tr-group-header > div.rt-th-group:after {
  height: 0px !important;
}
</style>

<table class="gt_table table table-sm table-striped small" data-quarto-postprocess="true" data-quarto-disable-processing="false" data-quarto-bootstrap="false">
<thead>
<tr class="header gt_heading">
<th colspan="7" class="gt_heading gt_title gt_font_normal gt_bottom_border">Total Live Young</th>
</tr>
<tr class="odd gt_col_headings">
<th id="species" class="gt_col_heading gt_columns_bottom_border gt_left" data-quarto-table-cell-role="th" scope="col">species</th>
<th id="2019" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">2019</th>
<th id="2020" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">2020</th>
<th id="2021" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">2021</th>
<th id="2022" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">2022</th>
<th id="2023" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">2023</th>
<th id="2024" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">2024</th>
</tr>
</thead>
<tbody class="gt_table_body">
<tr class="odd">
<td class="gt_row gt_left" headers="species">Black-capped Chickadee</td>
<td class="gt_row gt_right" headers="2019">8</td>
<td class="gt_row gt_right" headers="2020">0</td>
<td class="gt_row gt_right" headers="2021">0</td>
<td class="gt_row gt_right" headers="2022">0</td>
<td class="gt_row gt_right" headers="2023">0</td>
<td class="gt_row gt_right" headers="2024">0</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="species">Eastern Bluebird</td>
<td class="gt_row gt_right" headers="2019">79</td>
<td class="gt_row gt_right" headers="2020">104</td>
<td class="gt_row gt_right" headers="2021">91</td>
<td class="gt_row gt_right" headers="2022">110</td>
<td class="gt_row gt_right" headers="2023">112</td>
<td class="gt_row gt_right" headers="2024">86</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="species">House Wren</td>
<td class="gt_row gt_right" headers="2019">9</td>
<td class="gt_row gt_right" headers="2020">3</td>
<td class="gt_row gt_right" headers="2021">10</td>
<td class="gt_row gt_right" headers="2022">4</td>
<td class="gt_row gt_right" headers="2023">0</td>
<td class="gt_row gt_right" headers="2024">0</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="species">Tree Swallow</td>
<td class="gt_row gt_right" headers="2019">42</td>
<td class="gt_row gt_right" headers="2020">58</td>
<td class="gt_row gt_right" headers="2021">52</td>
<td class="gt_row gt_right" headers="2022">52</td>
<td class="gt_row gt_right" headers="2023">50</td>
<td class="gt_row gt_right" headers="2024">87</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="species">Northern House Wren</td>
<td class="gt_row gt_right" headers="2019">0</td>
<td class="gt_row gt_right" headers="2020">0</td>
<td class="gt_row gt_right" headers="2021">0</td>
<td class="gt_row gt_right" headers="2022">0</td>
<td class="gt_row gt_right" headers="2023">6</td>
<td class="gt_row gt_right" headers="2024">0</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="species">American Robin</td>
<td class="gt_row gt_right" headers="2019">0</td>
<td class="gt_row gt_right" headers="2020">0</td>
<td class="gt_row gt_right" headers="2021">0</td>
<td class="gt_row gt_right" headers="2022">0</td>
<td class="gt_row gt_right" headers="2023">0</td>
<td class="gt_row gt_right" headers="2024">0</td>
</tr>
</tbody>
</table>

</div>
</div>
</div>
</div>
<div class="cell">
<div class="sourceCode cell-code" id="cb40" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb40-1">annual_species_counts <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb40-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(species <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%in%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Eastern Bluebird"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Tree Swallow"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb40-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb40-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> site_year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> tot_live_young, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">colour =</span> species)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb40-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Year"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Total live young"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Total live young by species"</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/nestwatch-data-analysis/analyzing_nestwatch_data_r_files/figure-html/unnamed-chunk-22-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
</section>
<section id="hatch-rate" class="level3">
<h3 class="anchored" data-anchor-id="hatch-rate">Hatch rate</h3>
<p>The hatch rate is the total number of live young divided by the maximum clutch size observed during the nesting event.</p>
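<p>As a rough illustration of that definition (not the code used to build <code>annual_species_counts</code>), the sketch below computes a hatch rate for a few made-up nesting events and then a pooled rate per species. The data frame <code>nest_attempts</code> and its columns <code>max_clutch_size</code> and <code>live_young</code> are hypothetical names chosen for this example, and pooling by dividing summed live young by summed clutch sizes is just one reasonable way to aggregate.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(dplyr)

# Hypothetical nesting events; values and column names are made up for illustration
nest_attempts &lt;- tibble(
  species = c("Eastern Bluebird", "Eastern Bluebird", "Tree Swallow"),
  max_clutch_size = c(5, 4, 6),
  live_young = c(4, 4, 3)
)

# Hatch rate for each nesting event: live young / max clutch size
nest_attempts &lt;- nest_attempts |&gt;
  mutate(hatch_rate = live_young / max_clutch_size)

# Pooled hatch rate per species: total live young / total of max clutch sizes
nest_attempts |&gt;
  group_by(species) |&gt;
  summarize(pooled_hatch_rate = sum(live_young) / sum(max_clutch_size))</code></pre></div>
</div>
<p>With these made-up numbers, the pooled rate is 8/9 (about 0.89) for Eastern Bluebird and 3/6 = 0.5 for Tree Swallow.</p>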
<div class="cell">
<div class="sourceCode cell-code" id="cb41" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb41-1">annual_species_counts <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb41-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(species <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Eastern Bluebird"</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|</span> species <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Tree Swallow"</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|</span> species <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"House Wren"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb41-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(site_year, species) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb41-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.groups=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"rowwise"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">hatch_rate =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(overall_hatch_rate_2)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb41-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_wider</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">id_cols =</span> species, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_from =</span> site_year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_from =</span> hatch_rate) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb41-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">replace</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is.na</span>(.), <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb41-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">gt</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb41-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tab_header</span>(</span>
<span id="cb41-9">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Total Hatch Rate"</span>,</span>
<span id="cb41-10">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#subtitle = str_glue("{min_year} to {max_year}")</span></span>
<span id="cb41-11">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb41-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fmt_number</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">columns =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">starts_with</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"2"</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">decimals =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<div id="eaoacfixnn" style="padding-left:0px;padding-right:0px;padding-top:10px;padding-bottom:10px;overflow-x:auto;overflow-y:auto;width:auto;height:auto;">
<style>#eaoacfixnn table {
  font-family: system-ui, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif, 'Apple Color Emoji', 'Segoe UI Emoji', 'Segoe UI Symbol', 'Noto Color Emoji';
  -webkit-font-smoothing: antialiased;
  -moz-osx-font-smoothing: grayscale;
}

#eaoacfixnn thead, #eaoacfixnn tbody, #eaoacfixnn tfoot, #eaoacfixnn tr, #eaoacfixnn td, #eaoacfixnn th {
  border-style: none;
}

#eaoacfixnn p {
  margin: 0;
  padding: 0;
}

#eaoacfixnn .gt_table {
  display: table;
  border-collapse: collapse;
  line-height: normal;
  margin-left: auto;
  margin-right: auto;
  color: #333333;
  font-size: 16px;
  font-weight: normal;
  font-style: normal;
  background-color: #FFFFFF;
  width: auto;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #A8A8A8;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #A8A8A8;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
}

#eaoacfixnn .gt_caption {
  padding-top: 4px;
  padding-bottom: 4px;
}

#eaoacfixnn .gt_title {
  color: #333333;
  font-size: 125%;
  font-weight: initial;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-color: #FFFFFF;
  border-bottom-width: 0;
}

#eaoacfixnn .gt_subtitle {
  color: #333333;
  font-size: 85%;
  font-weight: initial;
  padding-top: 3px;
  padding-bottom: 5px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-color: #FFFFFF;
  border-top-width: 0;
}

#eaoacfixnn .gt_heading {
  background-color: #FFFFFF;
  text-align: center;
  border-bottom-color: #FFFFFF;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#eaoacfixnn .gt_bottom_border {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#eaoacfixnn .gt_col_headings {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#eaoacfixnn .gt_col_heading {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 6px;
  padding-left: 5px;
  padding-right: 5px;
  overflow-x: hidden;
}

#eaoacfixnn .gt_column_spanner_outer {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  padding-top: 0;
  padding-bottom: 0;
  padding-left: 4px;
  padding-right: 4px;
}

#eaoacfixnn .gt_column_spanner_outer:first-child {
  padding-left: 0;
}

#eaoacfixnn .gt_column_spanner_outer:last-child {
  padding-right: 0;
}

#eaoacfixnn .gt_column_spanner {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 5px;
  overflow-x: hidden;
  display: inline-block;
  width: 100%;
}

#eaoacfixnn .gt_spanner_row {
  border-bottom-style: hidden;
}

#eaoacfixnn .gt_group_heading {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  text-align: left;
}

#eaoacfixnn .gt_empty_group_heading {
  padding: 0.5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: middle;
}

#eaoacfixnn .gt_from_md > :first-child {
  margin-top: 0;
}

#eaoacfixnn .gt_from_md > :last-child {
  margin-bottom: 0;
}

#eaoacfixnn .gt_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  margin: 10px;
  border-top-style: solid;
  border-top-width: 1px;
  border-top-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  overflow-x: hidden;
}

#eaoacfixnn .gt_stub {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
}

#eaoacfixnn .gt_stub_row_group {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
  vertical-align: top;
}

#eaoacfixnn .gt_row_group_first td {
  border-top-width: 2px;
}

#eaoacfixnn .gt_row_group_first th {
  border-top-width: 2px;
}

#eaoacfixnn .gt_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#eaoacfixnn .gt_first_summary_row {
  border-top-style: solid;
  border-top-color: #D3D3D3;
}

#eaoacfixnn .gt_first_summary_row.thick {
  border-top-width: 2px;
}

#eaoacfixnn .gt_last_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#eaoacfixnn .gt_grand_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#eaoacfixnn .gt_first_grand_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-style: double;
  border-top-width: 6px;
  border-top-color: #D3D3D3;
}

#eaoacfixnn .gt_last_grand_summary_row_top {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: double;
  border-bottom-width: 6px;
  border-bottom-color: #D3D3D3;
}

#eaoacfixnn .gt_striped {
  background-color: rgba(128, 128, 128, 0.05);
}

#eaoacfixnn .gt_table_body {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#eaoacfixnn .gt_footnotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#eaoacfixnn .gt_footnote {
  margin: 0px;
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#eaoacfixnn .gt_sourcenotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#eaoacfixnn .gt_sourcenote {
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#eaoacfixnn .gt_left {
  text-align: left;
}

#eaoacfixnn .gt_center {
  text-align: center;
}

#eaoacfixnn .gt_right {
  text-align: right;
  font-variant-numeric: tabular-nums;
}

#eaoacfixnn .gt_font_normal {
  font-weight: normal;
}

#eaoacfixnn .gt_font_bold {
  font-weight: bold;
}

#eaoacfixnn .gt_font_italic {
  font-style: italic;
}

#eaoacfixnn .gt_super {
  font-size: 65%;
}

#eaoacfixnn .gt_footnote_marks {
  font-size: 75%;
  vertical-align: 0.4em;
  position: initial;
}

#eaoacfixnn .gt_asterisk {
  font-size: 100%;
  vertical-align: 0;
}

#eaoacfixnn .gt_indent_1 {
  text-indent: 5px;
}

#eaoacfixnn .gt_indent_2 {
  text-indent: 10px;
}

#eaoacfixnn .gt_indent_3 {
  text-indent: 15px;
}

#eaoacfixnn .gt_indent_4 {
  text-indent: 20px;
}

#eaoacfixnn .gt_indent_5 {
  text-indent: 25px;
}

#eaoacfixnn .katex-display {
  display: inline-flex !important;
  margin-bottom: 0.75em !important;
}

#eaoacfixnn div.Reactable > div.rt-table > div.rt-thead > div.rt-tr.rt-tr-group-header > div.rt-th-group:after {
  height: 0px !important;
}
</style>

<table class="gt_table table table-sm table-striped small" data-quarto-postprocess="true" data-quarto-disable-processing="false" data-quarto-bootstrap="false">
<thead>
<tr class="header gt_heading">
<th colspan="7" class="gt_heading gt_title gt_font_normal gt_bottom_border">Total Hatch Rate</th>
</tr>
<tr class="odd gt_col_headings">
<th id="species" class="gt_col_heading gt_columns_bottom_border gt_left" data-quarto-table-cell-role="th" scope="col">species</th>
<th id="2019" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">2019</th>
<th id="2020" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">2020</th>
<th id="2021" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">2021</th>
<th id="2022" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">2022</th>
<th id="2023" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">2023</th>
<th id="2024" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">2024</th>
</tr>
</thead>
<tbody class="gt_table_body">
<tr class="odd">
<td class="gt_row gt_left" headers="species">Eastern Bluebird</td>
<td class="gt_row gt_right" headers="2019">0.89</td>
<td class="gt_row gt_right" headers="2020">0.88</td>
<td class="gt_row gt_right" headers="2021">0.88</td>
<td class="gt_row gt_right" headers="2022">0.99</td>
<td class="gt_row gt_right" headers="2023">0.85</td>
<td class="gt_row gt_right" headers="2024">0.84</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="species">House Wren</td>
<td class="gt_row gt_right" headers="2019">0.90</td>
<td class="gt_row gt_right" headers="2020">1.00</td>
<td class="gt_row gt_right" headers="2021">0.71</td>
<td class="gt_row gt_right" headers="2022">Inf</td>
<td class="gt_row gt_right" headers="2023">0.00</td>
<td class="gt_row gt_right" headers="2024">0.00</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="species">Tree Swallow</td>
<td class="gt_row gt_right" headers="2019">0.55</td>
<td class="gt_row gt_right" headers="2020">0.77</td>
<td class="gt_row gt_right" headers="2021">0.79</td>
<td class="gt_row gt_right" headers="2022">0.79</td>
<td class="gt_row gt_right" headers="2023">0.91</td>
<td class="gt_row gt_right" headers="2024">0.93</td>
</tr>
</tbody>
</table>

</div>
</div>
</div>
</div>
<div class="cell">
<div class="sourceCode cell-code" id="cb42" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb42-1">annual_species_counts <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb42-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(species <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%in%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Eastern Bluebird"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Tree Swallow"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb42-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb42-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> site_year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> overall_hatch_rate_2, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">colour =</span> species)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb42-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Year"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Hatch rate"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Hatch rate by species"</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/nestwatch-data-analysis/analyzing_nestwatch_data_r_files/figure-html/unnamed-chunk-23-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
</section>
<section id="fledge-rate" class="level3">
<h3 class="anchored" data-anchor-id="fledge-rate">Fledge rate</h3>
<p>The fledge rate is the total number of birds fledged divided by the total number of live young.</p>
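<p>As a minimal sketch of that definition (not the code used in this post), the rate can be computed as a ratio of totals rather than as an average of per-nest rates. The <code>nests</code> data frame and its column names are made up for illustration, and returning <code>NA</code> when a species has no live young is just one possible convention.</p>

```r
library(dplyr)

# Hypothetical nest-level counts; names and values are illustrative only
nests <- data.frame(
  species = c("Eastern Bluebird", "Eastern Bluebird",
              "Tree Swallow", "House Wren"),
  young_total = c(4, 2, 5, 0),
  young_fledged = c(4, 1, 5, 0)
)

fledge_rates <- nests |>
  group_by(species) |>
  summarize(
    tot_young = sum(young_total),
    tot_fledged = sum(young_fledged),
    # Ratio of totals; NA (rather than NaN) when there were no live young
    fledge_rate = if_else(tot_young > 0, tot_fledged / tot_young, NA_real_)
  )

print(fledge_rates)
```

<p>In this toy data the bluebird rate is 5/6, not the 0.75 that averaging the two per-nest rates (1.0 and 0.5) would give.</p>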
<div class="cell">
<div class="sourceCode cell-code" id="cb43" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb43-1">annual_species_counts <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb43-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(species <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Eastern Bluebird"</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|</span> species <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Tree Swallow"</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|</span> species <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"House Wren"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb43-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(site_year, species) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb43-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.groups=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"rowwise"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fledge_rate =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(overall_fledge_rate)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb43-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_wider</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">id_cols =</span> species, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_from =</span> site_year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_from =</span> fledge_rate) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb43-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">gt</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb43-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tab_header</span>(</span>
<span id="cb43-8">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Total Fledge Rate"</span>,</span>
<span id="cb43-9">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#subtitle = str_glue("{min_year} to {max_year}")</span></span>
<span id="cb43-10">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb43-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fmt_number</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">columns =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">starts_with</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"2"</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">decimals =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<div id="xwjdrpyhhf" style="padding-left:0px;padding-right:0px;padding-top:10px;padding-bottom:10px;overflow-x:auto;overflow-y:auto;width:auto;height:auto;">
<style>#xwjdrpyhhf table {
  font-family: system-ui, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif, 'Apple Color Emoji', 'Segoe UI Emoji', 'Segoe UI Symbol', 'Noto Color Emoji';
  -webkit-font-smoothing: antialiased;
  -moz-osx-font-smoothing: grayscale;
}

#xwjdrpyhhf thead, #xwjdrpyhhf tbody, #xwjdrpyhhf tfoot, #xwjdrpyhhf tr, #xwjdrpyhhf td, #xwjdrpyhhf th {
  border-style: none;
}

#xwjdrpyhhf p {
  margin: 0;
  padding: 0;
}

#xwjdrpyhhf .gt_table {
  display: table;
  border-collapse: collapse;
  line-height: normal;
  margin-left: auto;
  margin-right: auto;
  color: #333333;
  font-size: 16px;
  font-weight: normal;
  font-style: normal;
  background-color: #FFFFFF;
  width: auto;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #A8A8A8;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #A8A8A8;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
}

#xwjdrpyhhf .gt_caption {
  padding-top: 4px;
  padding-bottom: 4px;
}

#xwjdrpyhhf .gt_title {
  color: #333333;
  font-size: 125%;
  font-weight: initial;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-color: #FFFFFF;
  border-bottom-width: 0;
}

#xwjdrpyhhf .gt_subtitle {
  color: #333333;
  font-size: 85%;
  font-weight: initial;
  padding-top: 3px;
  padding-bottom: 5px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-color: #FFFFFF;
  border-top-width: 0;
}

#xwjdrpyhhf .gt_heading {
  background-color: #FFFFFF;
  text-align: center;
  border-bottom-color: #FFFFFF;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#xwjdrpyhhf .gt_bottom_border {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#xwjdrpyhhf .gt_col_headings {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#xwjdrpyhhf .gt_col_heading {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 6px;
  padding-left: 5px;
  padding-right: 5px;
  overflow-x: hidden;
}

#xwjdrpyhhf .gt_column_spanner_outer {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  padding-top: 0;
  padding-bottom: 0;
  padding-left: 4px;
  padding-right: 4px;
}

#xwjdrpyhhf .gt_column_spanner_outer:first-child {
  padding-left: 0;
}

#xwjdrpyhhf .gt_column_spanner_outer:last-child {
  padding-right: 0;
}

#xwjdrpyhhf .gt_column_spanner {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 5px;
  overflow-x: hidden;
  display: inline-block;
  width: 100%;
}

#xwjdrpyhhf .gt_spanner_row {
  border-bottom-style: hidden;
}

#xwjdrpyhhf .gt_group_heading {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  text-align: left;
}

#xwjdrpyhhf .gt_empty_group_heading {
  padding: 0.5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: middle;
}

#xwjdrpyhhf .gt_from_md > :first-child {
  margin-top: 0;
}

#xwjdrpyhhf .gt_from_md > :last-child {
  margin-bottom: 0;
}

#xwjdrpyhhf .gt_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  margin: 10px;
  border-top-style: solid;
  border-top-width: 1px;
  border-top-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  overflow-x: hidden;
}

#xwjdrpyhhf .gt_stub {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
}

#xwjdrpyhhf .gt_stub_row_group {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
  vertical-align: top;
}

#xwjdrpyhhf .gt_row_group_first td {
  border-top-width: 2px;
}

#xwjdrpyhhf .gt_row_group_first th {
  border-top-width: 2px;
}

#xwjdrpyhhf .gt_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#xwjdrpyhhf .gt_first_summary_row {
  border-top-style: solid;
  border-top-color: #D3D3D3;
}

#xwjdrpyhhf .gt_first_summary_row.thick {
  border-top-width: 2px;
}

#xwjdrpyhhf .gt_last_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#xwjdrpyhhf .gt_grand_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#xwjdrpyhhf .gt_first_grand_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-style: double;
  border-top-width: 6px;
  border-top-color: #D3D3D3;
}

#xwjdrpyhhf .gt_last_grand_summary_row_top {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: double;
  border-bottom-width: 6px;
  border-bottom-color: #D3D3D3;
}

#xwjdrpyhhf .gt_striped {
  background-color: rgba(128, 128, 128, 0.05);
}

#xwjdrpyhhf .gt_table_body {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#xwjdrpyhhf .gt_footnotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#xwjdrpyhhf .gt_footnote {
  margin: 0px;
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#xwjdrpyhhf .gt_sourcenotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#xwjdrpyhhf .gt_sourcenote {
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#xwjdrpyhhf .gt_left {
  text-align: left;
}

#xwjdrpyhhf .gt_center {
  text-align: center;
}

#xwjdrpyhhf .gt_right {
  text-align: right;
  font-variant-numeric: tabular-nums;
}

#xwjdrpyhhf .gt_font_normal {
  font-weight: normal;
}

#xwjdrpyhhf .gt_font_bold {
  font-weight: bold;
}

#xwjdrpyhhf .gt_font_italic {
  font-style: italic;
}

#xwjdrpyhhf .gt_super {
  font-size: 65%;
}

#xwjdrpyhhf .gt_footnote_marks {
  font-size: 75%;
  vertical-align: 0.4em;
  position: initial;
}

#xwjdrpyhhf .gt_asterisk {
  font-size: 100%;
  vertical-align: 0;
}

#xwjdrpyhhf .gt_indent_1 {
  text-indent: 5px;
}

#xwjdrpyhhf .gt_indent_2 {
  text-indent: 10px;
}

#xwjdrpyhhf .gt_indent_3 {
  text-indent: 15px;
}

#xwjdrpyhhf .gt_indent_4 {
  text-indent: 20px;
}

#xwjdrpyhhf .gt_indent_5 {
  text-indent: 25px;
}

#xwjdrpyhhf .katex-display {
  display: inline-flex !important;
  margin-bottom: 0.75em !important;
}

#xwjdrpyhhf div.Reactable > div.rt-table > div.rt-thead > div.rt-tr.rt-tr-group-header > div.rt-th-group:after {
  height: 0px !important;
}
</style>

<table class="gt_table table table-sm table-striped small" data-quarto-postprocess="true" data-quarto-disable-processing="false" data-quarto-bootstrap="false">
<thead>
<tr class="header gt_heading">
<th colspan="7" class="gt_heading gt_title gt_font_normal gt_bottom_border">Total Fledge Rate</th>
</tr>
<tr class="odd gt_col_headings">
<th id="species" class="gt_col_heading gt_columns_bottom_border gt_left" data-quarto-table-cell-role="th" scope="col">species</th>
<th id="2019" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">2019</th>
<th id="2020" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">2020</th>
<th id="2021" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">2021</th>
<th id="2022" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">2022</th>
<th id="2023" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">2023</th>
<th id="2024" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">2024</th>
</tr>
</thead>
<tbody class="gt_table_body">
<tr class="odd">
<td class="gt_row gt_left" headers="species">Eastern Bluebird</td>
<td class="gt_row gt_right" headers="2019">0.99</td>
<td class="gt_row gt_right" headers="2020">0.99</td>
<td class="gt_row gt_right" headers="2021">0.99</td>
<td class="gt_row gt_right" headers="2022">0.95</td>
<td class="gt_row gt_right" headers="2023">0.80</td>
<td class="gt_row gt_right" headers="2024">0.88</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="species">House Wren</td>
<td class="gt_row gt_right" headers="2019">1.00</td>
<td class="gt_row gt_right" headers="2020">1.00</td>
<td class="gt_row gt_right" headers="2021">0.70</td>
<td class="gt_row gt_right" headers="2022">1.00</td>
<td class="gt_row gt_right" headers="2023">NA</td>
<td class="gt_row gt_right" headers="2024">NA</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="species">Tree Swallow</td>
<td class="gt_row gt_right" headers="2019">0.90</td>
<td class="gt_row gt_right" headers="2020">0.98</td>
<td class="gt_row gt_right" headers="2021">0.96</td>
<td class="gt_row gt_right" headers="2022">0.98</td>
<td class="gt_row gt_right" headers="2023">0.88</td>
<td class="gt_row gt_right" headers="2024">0.91</td>
</tr>
</tbody>
</table>

</div>
</div>
</div>
</div>
<div class="cell">
<div class="sourceCode cell-code" id="cb44" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb44-1">annual_species_counts <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb44-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(species <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%in%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Eastern Bluebird"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Tree Swallow"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb44-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb44-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> site_year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> overall_fledge_rate, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">colour =</span> species)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb44-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Year"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Fledge rate"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Fledge rate by species"</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/nestwatch-data-analysis/analyzing_nestwatch_data_r_files/figure-html/unnamed-chunk-24-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>We’ll stop here for now. In the next installment we’ll look at whether nesting success differs between Peterson boxes and traditional boxes.</p>


</section>
</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{isken2025,
  author = {Isken, Mark},
  title = {Analyzing {NestWatch} Data with {R}},
  date = {2025-03-01},
  url = {https://bitsofanalytics.org//posts/nestwatch-data-analysis/analyzing_nestwatch_data_r.html},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-isken2025" class="csl-entry quarto-appendix-citeas">
Isken, Mark. 2025. <span>“Analyzing NestWatch Data with R.”</span> March
1, 2025. <a href="https://bitsofanalytics.org//posts/nestwatch-data-analysis/analyzing_nestwatch_data_r.html">https://bitsofanalytics.org//posts/nestwatch-data-analysis/analyzing_nestwatch_data_r.html</a>.
</div></div></section></div> ]]></description>
  <category>R</category>
  <category>ecology</category>
  <category>birding</category>
  <category>nestwatch</category>
  <guid>https://bitsofanalytics.org/posts/nestwatch-data-analysis/analyzing_nestwatch_data_r.html</guid>
  <pubDate>Sat, 01 Mar 2025 05:00:00 GMT</pubDate>
</item>
<item>
  <title>Using R to explore and visualize decoupling</title>
  <dc:creator>Mark Isken</dc:creator>
  <link>https://bitsofanalytics.org/posts/searching-for-decoupling/searching_for_decoupling.html</link>
  <description><![CDATA[ 




<p>Decoupling is the notion that economic growth can continue without accompanying increases in energy use, greenhouse gas emissions, resource use, biodiversity loss, pollution and other adverse environmental and sociological impacts. This phenomenon is highlighted and relied upon as a critical linchpin in the <a href="https://www.lombardodier.com/contents/corporate-news/responsible-capital/2019/november/can-we-be-green-and-grow.html">green growth movement</a>. Without decoupling, it is hard to see how economic growth can continue indefinitely on a planet with finite resources and a finite capacity to absorb the byproducts of perpetual growth.</p>
<p>In this post we are going to use R to explore decoupling and hopefully learn a few ggplot things along the way. The data we’ll use is from the well-known <a href="https://ourworldindata.org/">Our World in Data</a> (OWID) site, which has done a great job of making well-curated and carefully processed data about our world available for all of us. Their lead researcher is data scientist Hannah Ritchie, who recently wrote a book entitled <a href="https://www.abebooks.com/servlet/BookDetailsPL?bi=31942233688&amp;dest=usa&amp;ref_=ps_ggl_18382194370&amp;cm_mmc=ggl-_-US_Shopp_Trade0to10-_-product_id=COM9780316536752NEW-_-keyword=&amp;gad_source=1&amp;gclid=EAIaIQobChMI3vnJy7vOiwMVJDIIBR0qwzEDEAQYASABEgLRI_D_BwE">“Not the End of the World: How We Can Be the First Generation to Build a Sustainable Planet”</a> (<span class="citation" data-cites="ritchie2024not">Ritchie (2024)</span>), which features numerous visualizations based on OWID data.</p>
<p>We’ll need a few libraries.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(dplyr)</span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyr)</span>
<span id="cb1-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(ggplot2)</span></code></pre></div>
</div>
<section id="creating-the-dataset" class="level2">
<h2 class="anchored" data-anchor-id="creating-the-dataset">Creating the dataset</h2>
<p>In the decoupling conversation, GDP is generally used as the measure of economic growth - one of the things that is coupled. It is the <em>driving variable</em>. GDP is tricky to measure and not necessarily a great measure of <em>prosperity</em>. For the <em>driven variable</em>, many possibilities exist but essentially it boils down to resource use or its impact (<span class="citation" data-cites="vaden2020decoupling">Vadén et al. (2020)</span>). So, decoupling could occur, or not occur, for any subset of potential driven variables and might involve resource use or environmental impact. It also may be temporary or permanent. Its magnitude may be sufficient to avert some pretty bad outcomes or it may not be. Decoupling might be local, regional or global. Of course, the earth probably isn’t interested in local decoupling unless the ideas underpinning it lead to global decoupling.</p>
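<p>To make the driving/driven idea concrete, here is a small sketch that indexes both variables to 100 in a base year so their growth can be compared directly. The numbers are invented toy values (not OWID data), <code>ghg_per_capita</code> is a hypothetical shorthand column name, and the comparison flag is only one simple way to express “the driven variable has grown less than GDP”.</p>

```r
library(dplyr)

# Toy values invented for illustration; not taken from OWID or FRED
toy <- data.frame(
  year = 2000:2003,
  gdp_per_capita = c(100, 110, 120, 130),
  ghg_per_capita = c(10, 10.5, 10, 9)
)

indexed <- toy |>
  arrange(year) |>
  mutate(
    # Index each series to 100 in the first (base) year
    gdp_index = 100 * gdp_per_capita / first(gdp_per_capita),
    ghg_index = 100 * ghg_per_capita / first(ghg_per_capita),
    # TRUE when the driven variable has grown less than GDP since the base year
    ghg_lags_gdp = ghg_index < gdp_index
  )

print(indexed)
```

<p>Indexing puts variables with very different units on a common scale, which is what makes this kind of side-by-side comparison possible.</p>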
<p>I created two dataframes, one at the country level and one at the global level, containing a range of variables that would seem to be relevant and for which data was generally available from 1980 until the near present. Data acquisition and processing scripts were written using R and can be found in the <code>data_prep.qmd</code> file in the <a href="https://github.com/misken/decoupling">GitHub repository for this post</a>. The variables in the global dataframe include:</p>
<ul>
<li><code>year</code> - four digit year,</li>
<li><code>population</code> - see <a href="https://ourworldindata.org/grapher/population" class="uri">https://ourworldindata.org/grapher/population</a>,</li>
<li><code>gdp_per_capita</code> - measured in international $ pegged to year 2011 and adjusted for inflation and differences in cost of living - see <a href="https://ourworldindata.org/grapher/gdp-per-capita-maddison-project-database" class="uri">https://ourworldindata.org/grapher/gdp-per-capita-maddison-project-database</a>. For the global metrics file, the Maddison source for GDP has many missing values at the World level. So, we are populating the global dataframe from FRED for this metric - see <a href="https://fred.stlouisfed.org/series/NYGDPPCAPKDWLD" class="uri">https://fred.stlouisfed.org/series/NYGDPPCAPKDWLD</a>. The units are 2010 dollars.</li>
<li><code>primary_energy_consumption_per_capita__kwh</code> - see <a href="https://ourworldindata.org/grapher/per-capita-energy-use" class="uri">https://ourworldindata.org/grapher/per-capita-energy-use</a>. Units are kilowatt-hours per person.</li>
<li><code>annual_emissions_ghg_total_co2eq_per_capita</code> - a measure of greenhouse gas emissions per person which includes land use and forestry in addition to fossil fuel use. Greenhouse gas emissions are measured in tonnes per person of carbon dioxide-equivalents over a 100-year timescale. See <a href="https://ourworldindata.org/grapher/total-greenhouse-gas-emissions-per-capita" class="uri">https://ourworldindata.org/grapher/total-greenhouse-gas-emissions-per-capita</a>.</li>
<li><code>annual_emissions_ghg_fossil_co2eq_per_capita</code> - same as previous but emissions due to land use changes and forestry are excluded. See <a href="https://ourworldindata.org/grapher/per-capita-ghg-excl-land-use" class="uri">https://ourworldindata.org/grapher/per-capita-ghg-excl-land-use</a>.</li>
<li><code>*_consumption_twh</code> - consumption of various fossil fuels (gas, oil and coal and their total) in terawatt-hours. See <a href="https://ourworldindata.org/grapher/fossil-fuel-consumption-by-fuel-type" class="uri">https://ourworldindata.org/grapher/fossil-fuel-consumption-by-fuel-type</a>.</li>
<li><code>production.*.Mine.tonnes</code> - cobalt, copper, lithium, nickel and steel production - see <a href="https://ourworldindata.org/metals-minerals" class="uri">https://ourworldindata.org/metals-minerals</a>.</li>
<li><code>freshwater_withdrawals_m3</code> - total water withdrawals, not counting evaporation and in cubic meters. See <a href="https://ourworldindata.org/grapher/annual-freshwater-withdrawals" class="uri">https://ourworldindata.org/grapher/annual-freshwater-withdrawals</a>.</li>
<li><code>mean_surface_temp_2m</code> - mean temperature of the earth in Celsius at 2m above the surface. See <a href="https://ourworldindata.org/grapher/average-monthly-surface-temperature" class="uri">https://ourworldindata.org/grapher/average-monthly-surface-temperature</a>.</li>
<li><code>ocean_heat_content_*_2000m</code> - amount of heat in the top 2000m of the oceans. See <a href="https://ourworldindata.org/grapher/ocean-heat-top-2000m" class="uri">https://ourworldindata.org/grapher/ocean-heat-top-2000m</a>. The units are relative to 1971 and based on a large multiple of joules.</li>
<li><code>under_five_mortality</code> - number of deaths before the age of 5 per 100 live births. See <a href="https://ourworldindata.org/grapher/child-mortality" class="uri">https://ourworldindata.org/grapher/child-mortality</a>.</li>
<li><code>children_per_woman</code> - See <a href="https://ourworldindata.org/grapher/children-per-woman-un" class="uri">https://ourworldindata.org/grapher/children-per-woman-un</a>.</li>
<li><code>gini_coefficient</code> - a measure of societal inequality on a [0, 1] scale. Values closer to 1.0 mean more inequality. See <a href="https://ourworldindata.org/what-is-the-gini-coefficient" class="uri">https://ourworldindata.org/what-is-the-gini-coefficient</a>.</li>
<li><code>life_expectancy</code> - in years at birth. See <a href="https://ourworldindata.org/grapher/life-expectancy" class="uri">https://ourworldindata.org/grapher/life-expectancy</a>.</li>
</ul>
<p>The country level dataframe includes a <code>country</code> identifier. The following aggregate country related fields were populated using the <a href="https://cran.r-project.org/web/packages/countrycode/index.html">countrycode</a> package.</p>
<ul>
<li><code>continent</code> - each country belongs to a continent.</li>
<li><code>region23</code> - each country is classified as being part of one of twenty-three regions</li>
<li><code>iso3c</code> - a unique 3 character code for each country</li>
</ul>
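<p>As a rough illustration of how such fields can be derived, here is a minimal sketch using the <code>countrycode()</code> function. The country names are hypothetical stand-ins for the OWID <code>Entity</code> column, and the actual calls in <code>data_prep.qmd</code> may well differ.</p>

```r
library(countrycode)

# Hypothetical country names, standing in for the OWID `Entity` column
entities <- c("France", "Brazil", "Japan")

# Look up each aggregate field from the English country name
codes <- data.frame(
  country   = entities,
  continent = countrycode(entities, origin = "country.name", destination = "continent"),
  region23  = countrycode(entities, origin = "country.name", destination = "region23"),
  iso3c     = countrycode(entities, origin = "country.name", destination = "iso3c")
)

codes
```

<p>Names that <code>countrycode()</code> cannot match come back as <code>NA</code> with a warning, which is a handy way to spot rows that are not really countries.</p>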
<p>There are a few computed fields:</p>
<ul>
<li><code>primary_energy_consumption_Twh</code> - product of <code>population</code> and <code>primary_energy_consumption_per_capita__kwh</code>, rescaled to terawatt-hours.</li>
<li><code>annual_emissions_ghg_total_co2eq</code> - product of <code>population</code> and <code>annual_emissions_ghg_total_co2eq_per_capita</code>.</li>
<li><code>annual_emissions_ghg_fossil_co2eq</code> - product of <code>population</code> and <code>annual_emissions_ghg_fossil_co2eq_per_capita</code>.</li>
<li><code>gdp_total</code> - product of <code>population</code> and <code>gdp_per_capita</code>.</li>
</ul>
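<p>The computed fields are simple products, so they can be sketched with a single <code>mutate()</code>. The toy values below are made up for illustration and the dataframe name <code>toy_df</code> is hypothetical; the only real assumption is the unit conversion of 1 TWh = 10<sup>9</sup> kWh.</p>

```r
library(dplyr)

# Toy rows with made-up values (not the OWID numbers)
toy_df <- data.frame(
  year = c(1980, 1981),
  population = c(4.0e9, 4.1e9),
  gdp_per_capita = c(6000, 6100),
  primary_energy_consumption_per_capita__kwh = c(17000, 17100),
  annual_emissions_ghg_total_co2eq_per_capita = c(7.0, 6.9)
)

toy_df <- toy_df |>
  mutate(
    # 1 TWh = 1e9 kWh, so divide the kWh total by 1e9
    primary_energy_consumption_Twh =
      population * primary_energy_consumption_per_capita__kwh / 1e9,
    # Per capita tonnes times people gives total tonnes
    annual_emissions_ghg_total_co2eq =
      population * annual_emissions_ghg_total_co2eq_per_capita,
    gdp_total = population * gdp_per_capita
  )

toy_df
```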
<p>The basic data processing steps were:</p>
<ul>
<li>download a number of different datasets using the <a href="https://docs.owid.io/projects/etl/api/">Our World in Data API</a> into individual dataframes,</li>
<li>combine all the dataframes by left joining on <code>Entity</code> and <code>Year</code>,</li>
<li>use <code>dplyr::filter()</code> to only keep data from 1980 on,</li>
<li>use <code>dplyr::filter()</code> along with the <code>grepl()</code> function to get rid of rows that are not associated with countries (e.g.&nbsp;continents),</li>
<li>create a separate global level dataframe (filtering on <code>Entity == 'World'</code>),</li>
<li>add variables to the global dataframe that aren’t relevant at the country level,</li>
<li>create a few computed features,</li>
<li>add codes such as <code>region23</code> and <code>iso3c</code>,</li>
<li>export the dataframes to CSV files.</li>
</ul>
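<p>To make the join and filter steps a little more concrete, here is a minimal sketch on toy data. The dataframes, their values, and the regular expression used to drop non-country rows are all hypothetical; the real processing lives in <code>data_prep.qmd</code> and almost certainly handles more cases.</p>

```r
library(dplyr)

# Toy stand-ins for two downloaded OWID datasets (values are made up)
pop_df <- data.frame(
  Entity = c("World", "Europe", "France", "France", "Japan"),
  Year = c(1980, 1980, 1979, 1980, 1980),
  population = c(4.4e9, 6.9e8, 5.3e7, 5.4e7, 1.2e8)
)
gdp_df <- data.frame(
  Entity = c("World", "France", "France", "Japan"),
  Year = c(1980, 1979, 1980, 1980),
  gdp_per_capita = c(6000, 19000, 19500, 18000)
)

# Combine on Entity and Year, then keep 1980 onward
combined_df <- pop_df |>
  left_join(gdp_df, by = c("Entity", "Year")) |>
  filter(Year >= 1980)

# Hypothetical pattern for rows that are not countries
non_country_pattern <- "^(World|Africa|Asia|Europe|North America|South America|Oceania)$"

# Country level rows: drop anything matching the pattern
country_df <- combined_df |>
  filter(!grepl(non_country_pattern, Entity))

# Global level rows
world_df <- combined_df |>
  filter(Entity == "World")
```

<p>Because this is a left join, entities missing from the second dataset (such as <code>Europe</code> here) are kept with <code>NA</code> for the joined metric rather than being dropped.</p>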
<p>The final result is two CSV files, <code>country_metrics.csv</code> and <code>world_metrics.csv</code>. Let’s read these in and take a look.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1">country_metrics_df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read.csv</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'data/owid/country_metrics.csv'</span>)</span>
<span id="cb2-2">world_metrics_df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read.csv</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'data/owid/world_metrics.csv'</span>)</span></code></pre></div>
</div>
<div class="cell">
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summary</span>(world_metrics_df)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>    Entity               year        population       
 Length:44          Min.   :1980   Min.   :4.448e+09  
 Class :character   1st Qu.:1991   1st Qu.:5.396e+09  
 Mode  :character   Median :2002   Median :6.296e+09  
                    Mean   :2002   Mean   :6.298e+09  
                    3rd Qu.:2012   3rd Qu.:7.224e+09  
                    Max.   :2023   Max.   :8.092e+09  
                                                      
 primary_energy_consumption_per_capita__kwh
 Min.   :16753                             
 1st Qu.:17641                             
 Median :18067                             
 Mean   :18891                             
 3rd Qu.:20448                             
 Max.   :21394                             
                                           
 annual_emissions_ghg_total_co2eq_per_capita
 Min.   :6.440                              
 1st Qu.:6.753                              
 Median :6.907                              
 Mean   :6.904                              
 3rd Qu.:7.093                              
 Max.   :7.498                              
                                            
 annual_emissions_ghg_fossil_co2eq_per_capita gas_consumption_twh
 Min.   :4.817                                Min.   :14237      
 1st Qu.:4.961                                1st Qu.:19850      
 Median :5.160                                Median :24673      
 Mean   :5.205                                Mean   :26128      
 3rd Qu.:5.454                                3rd Qu.:33333      
 Max.   :5.652                                Max.   :40239      
                                                                 
 coal_consumption_twh oil_consumption_twh tot_fossil_fuel_consumption_twh
 Min.   :20878        Min.   :33100       Min.   : 69136                 
 1st Qu.:25766        1st Qu.:37668       1st Qu.: 83248                 
 Median :28416        Median :43508       Median : 96597                 
 Mean   :33040        Mean   :43571       Mean   :102739                 
 3rd Qu.:42851        3rd Qu.:48850       3rd Qu.:126842                 
 Max.   :45565        Max.   :54564       Max.   :140231                 
                                                                         
 production.Cobalt.Mine.tonnes production.Copper.Mine.tonnes
 Min.   : 18000                Min.   : 7200000             
 1st Qu.: 35700                1st Qu.: 9297500             
 Median : 51800                Median :13650000             
 Mean   : 73541                Mean   :13704773             
 3rd Qu.:104750                3rd Qu.:17125000             
 Max.   :230000                Max.   :22000000             
                                                            
 production.Lithium.Mine.tonnes production.Nickel.Mine.tonnes
 Min.   : 13100                 Min.   : 621000              
 1st Qu.: 19650                 1st Qu.: 968500              
 Median : 30650                 Median :1350000              
 Mean   : 48612                 Mean   :1540250              
 3rd Qu.: 77925                 3rd Qu.:2115000              
 Max.   :180000                 Max.   :3600000              
 NA's   :20                                                  
 production.Steel.Processing..crude.tonnes freshwater_withdrawals_m3
 Min.   :6.440e+08                         Min.   :3.856e+12        
 1st Qu.:7.362e+08                         1st Qu.:3.875e+12        
 Median :8.795e+08                         Median :3.900e+12        
 Mean   :1.127e+09                         Mean   :3.898e+12        
 3rd Qu.:1.542e+09                         3rd Qu.:3.920e+12        
 Max.   :1.950e+09                         Max.   :3.949e+12        
                                           NA's   :36               
 mean_surface_temp_2m under_five_mortality children_per_woman life_expectancy
 Min.   :13.85        Min.   :3.712        Min.   :2.251      Min.   :60.50  
 1st Qu.:14.10        1st Qu.:4.425        1st Qu.:2.584      1st Qu.:64.04  
 Median :14.29        Median :6.032        Median :2.699      Median :66.91  
 Mean   :14.31        Mean   :6.287        Mean   :2.865      Mean   :67.11  
 3rd Qu.:14.47        3rd Qu.:8.143        3rd Qu.:3.176      3rd Qu.:70.83  
 Max.   :14.97        Max.   :9.303        Max.   :3.744      Max.   :73.17  
                      NA's   :11                                             
 gdp_per_capita  ocean_heat_content_noaa_2000m ocean_heat_content_mri_2000m
 Min.   : 5897   Min.   :-2.249                Min.   :-2.8327             
 1st Qu.: 6791   1st Qu.: 1.666                1st Qu.: 0.9673             
 Median : 7953   Median : 8.016                Median : 7.5673             
 Mean   : 8263   Mean   : 9.976                Mean   :10.2786             
 3rd Qu.: 9651   3rd Qu.:16.803                3rd Qu.:17.0373             
 Max.   :11579   Max.   :29.105                Max.   :29.2773             
                                               NA's   :3                   
 ocean_heat_content_iap_2000m primary_energy_consumption_Twh
 Min.   :-4.85103             Min.   : 77357                
 1st Qu.: 0.02597             1st Qu.: 96337                
 Median : 8.30797             Median :113072                
 Mean   :10.20616             Mean   :120461                
 3rd Qu.:18.53997             3rd Qu.:148203                
 Max.   :29.42097             Max.   :173112                
 NA's   :3                                                  
 annual_emissions_ghg_total_co2eq annual_emissions_ghg_fossil_co2eq
 Min.   :3.232e+10                Min.   :2.302e+10                
 1st Qu.:3.733e+10                1st Qu.:2.722e+10                
 Median :4.150e+10                Median :3.070e+10                
 Mean   :4.337e+10                Mean   :3.298e+10                
 3rd Qu.:5.116e+10                3rd Qu.:4.078e+10                
 Max.   :5.382e+10                Max.   :4.411e+10                
                                                                   
   gdp_total        
 Min.   :2.657e+13  
 1st Qu.:3.658e+13  
 Median :5.007e+13  
 Mean   :5.389e+13  
 3rd Qu.:6.972e+13  
 Max.   :9.369e+13  
                    </code></pre>
</div>
</div>
<p>The <code>country_metrics_df</code> dataframe has most of the same columns but is by country by year.</p>
</section>
<section id="defining-decoupling-is-tricky" class="level2">
<h2 class="anchored" data-anchor-id="defining-decoupling-is-tricky">Defining decoupling is tricky</h2>
<p>Decoupling is a complex topic. I highly recommend <span class="citation" data-cites="parrique2019decoupling">Parrique et al. (2019)</span> to get a concise overview of what decoupling is and analysis of whether or not or to what degree it is happening and is likely to happen in the future.</p>
<ul>
<li>Parrique, Timothée, Jonathan Barth, François Briens, Christian Kerschner, Alejo Kraus-Polk, Anna Kuokkanen, and Joachim H Spangenberg. <a href="https://eeb.org/library/decoupling-debunked/">“Decoupling Debunked.” Evidence and Arguments against Green Growth as a Sole Strategy for Sustainability</a>. A Study Edited by the European Environmental Bureau EEB 3 (2019).</li>
</ul>
<p>For this post, a couple of concepts are important. <em>Relative decoupling</em> is when the driven variable is growing more slowly than the driving variable (i.e.&nbsp;GDP). <em>Absolute decoupling</em> is when the driven variable is not growing (hopefully it is decreasing) even though the driving variable continues to grow. In the EEB study, it is argued that:</p>
<blockquote class="blockquote">
<p>The validity of the green growth discourse relies on the assumption of an absolute, permanent, global, large and fast enough decoupling of economic growth from all critical environmental pressures.</p>
</blockquote>
<p>There’s a lot to chew on there and we certainly aren’t going to get into all of it in this post. Instead, this is just a glimpse at a handful of metrics to see if there’s evidence of decoupling and if it’s relative or absolute. We’ll use R to do it and hopefully learn a few dplyr, tidyr, and ggplot2 tricks along the way.</p>
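<p>One way to make the relative versus absolute distinction operational is to compare average growth rates of the driving and driven variables. The helper functions below (<code>cagr()</code> and <code>classify_decoupling()</code>) are hypothetical names of my own and the inputs are made-up index values, so treat this as a sketch of the definitions rather than a rigorous test for decoupling.</p>

```r
# Compound annual growth rate between a first and a last value
cagr <- function(first_value, last_value, n_years) {
  (last_value / first_value)^(1 / n_years) - 1
}

# Classify decoupling from average annual growth rates (fractions, e.g. 0.03 = 3%)
classify_decoupling <- function(driving_growth, driven_growth) {
  if (driving_growth <= 0) {
    return("driving variable not growing")
  }
  if (driven_growth <= 0) {
    # Driven variable flat or shrinking while the driving variable grows
    return("absolute decoupling")
  }
  if (driven_growth < driving_growth) {
    # Driven variable still growing, but more slowly
    return("relative decoupling")
  }
  "no decoupling"
}

# Made-up example: the driving variable triples and the driven one doubles over 40 years
gdp_growth <- cagr(1, 3, 40)
pec_growth <- cagr(1, 2, 40)

classify_decoupling(gdp_growth, pec_growth)
classify_decoupling(gdp_growth, -0.01)
```

<p>Average growth rates hide a lot, of course. They say nothing about whether the decoupling is permanent, global, or fast enough, which are exactly the qualifiers in the quote above.</p>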
</section>
<section id="basic-time-series-plots-of-gdp-vs-primary-energy-consumption" class="level2">
<h2 class="anchored" data-anchor-id="basic-time-series-plots-of-gdp-vs-primary-energy-consumption">Basic time series plots of GDP vs primary energy consumption</h2>
<p>What if we want to compare two time series such as <code>gdp_total</code> and <code>primary_energy_consumption_Twh</code>? The units are different so we really shouldn’t plot them together with two y-axes. We could facet them using a long version of the data.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1">world_metrics_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb5-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(year, gdp_total, primary_energy_consumption_Twh) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb5-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(gdp_total, primary_energy_consumption_Twh), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'metric'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'value'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb5-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> value)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb5-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb5-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">facet_grid</span>(metric <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> ., <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">scales =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'free_y'</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/searching-for-decoupling/searching_for_decoupling_files/figure-html/unnamed-chunk-4-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>The slopes look similar but that’s just due to the axis limits on the graphs. We could also scatter them against each other.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1">world_metrics_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb6-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb6-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> gdp_total, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> primary_energy_consumption_Twh, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">colour =</span> year)) </span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/searching-for-decoupling/searching_for_decoupling_files/figure-html/unnamed-chunk-5-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>Looks like GDP and PEC (primary energy consumption) are not decoupled.</p>
<p>Another option is to index the different series to year 1. Since <code>world_metrics_df</code> is ordered by year, this is relatively straightforward to do for that dataframe. However, for <code>country_metrics_df</code> this will require some careful dplyr use.</p>
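<p>For the country level case, one possible approach is to group by country, sort by year within each group, and divide by the first value in the group. Here is a minimal sketch on a toy dataframe; the name <code>toy_country_df</code> and its values are made up, and it assumes the first year has a non-missing value for every country.</p>

```r
library(dplyr)

# Toy country-by-year rows (made-up values), deliberately out of year order
toy_country_df <- data.frame(
  country = c("A", "A", "A", "B", "B", "B"),
  year = c(1981, 1980, 1982, 1980, 1982, 1981),
  gdp_total = c(110, 100, 121, 50, 60, 55)
)

indexed_df <- toy_country_df |>
  group_by(country) |>
  # Sort by year within each country so the first row is that country's year 1
  arrange(year, .by_group = TRUE) |>
  # first() is evaluated per group because the data is still grouped
  mutate(gdp_total_idx = gdp_total / first(gdp_total)) |>
  ungroup()

indexed_df
```

<p>If a country is missing its first year value, the whole indexed series for that country becomes <code>NA</code>, so in practice you might index to the first non-missing year instead.</p>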
<p>For <code>world_metrics_df</code> we can use <code>mutate()</code> to compute the indexed series. For this first plot, I’ll use separate <code>geom_line()</code> objects. The downside of doing plots in this way is that it’s tricky to create a legend. You have to map color in <code>aes()</code> to a string representing the variable as you want it to read in the legend and then use <code>scale_color_manual()</code> with a named vector of colors. See <a href="https://forum.posit.co/t/adding-manual-legend-to-ggplot2/41651" class="uri">https://forum.posit.co/t/adding-manual-legend-to-ggplot2/41651</a>. That is not straightforward and definitely not the “tidy” way to do it.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1">colors <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gdp_total_idx"</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"blue"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"primary_energy_consumption_Twh_idx"</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"black"</span>)</span>
<span id="cb7-2"></span>
<span id="cb7-3">world_metrics_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb7-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gdp_total_idx =</span> gdp_total <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> gdp_total[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>],</span>
<span id="cb7-5">         <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">primary_energy_consumption_Twh_idx =</span> primary_energy_consumption_Twh <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> primary_energy_consumption_Twh[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb7-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb7-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x=</span>year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> gdp_total_idx, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gdp_total_idx"</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">linetype =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'solid'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb7-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x=</span>year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> primary_energy_consumption_Twh_idx, </span>
<span id="cb7-9">                <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"primary_energy_consumption_Twh_idx"</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">linetype =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'dashed'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb7-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Year (1980 is year 1)"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Multiple of year 1"</span>, </span>
<span id="cb7-11">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Legend"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Primary energy consumption"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb7-12">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_manual</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> colors)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/searching-for-decoupling/searching_for_decoupling_files/figure-html/unnamed-chunk-6-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>From this plot, we can see that the two series exhibit relative decoupling. The overall rate of increase for PEC is less than that of GDP, but it’s still increasing.</p>
<p>Instead of two geoms, we could pivot to long data and map <code>color</code> to <code>metric</code> in a single <code>geom_line()</code>. This gives us automatic legend creation and is the tidy approach.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1">world_metrics_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb8-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gdp_total_idx =</span> gdp_total <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> gdp_total[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>],</span>
<span id="cb8-3">         <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">primary_energy_consumption_Twh_idx =</span> primary_energy_consumption_Twh <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> primary_energy_consumption_Twh[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb8-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(year, gdp_total_idx, primary_energy_consumption_Twh_idx) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb8-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(gdp_total_idx, primary_energy_consumption_Twh_idx), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'metric'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'value'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb8-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb8-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x=</span>year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> value, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> metric)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb8-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_manual</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#6C5043FF"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#A3BAC2FF"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb8-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Year (1980 is year 1)"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Multiple of year 1"</span>, </span>
<span id="cb8-10">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Primary Energy Consumption"</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/searching-for-decoupling/searching_for_decoupling_files/figure-html/unnamed-chunk-7-1.png" class="img-fluid figure-img" width="768"></p>
</figure>
</div>
</div>
</div>
<p>Let’s take a closer look at the most recent few years.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1">world_metrics_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb9-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(year <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2020</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb9-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gdp_total_idx =</span> gdp_total <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> gdp_total[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>],</span>
<span id="cb9-4">         <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">primary_energy_consumption_Twh_idx =</span> primary_energy_consumption_Twh <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> primary_energy_consumption_Twh[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb9-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(year, gdp_total_idx, primary_energy_consumption_Twh_idx) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb9-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(gdp_total_idx, primary_energy_consumption_Twh_idx), </span>
<span id="cb9-7">               <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'metric'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'value'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb9-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb9-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x=</span>year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> value, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> metric)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb9-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_manual</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#6C5043FF"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#A3BAC2FF"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb9-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Year (2020 is year 1)"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Multiple of year 1"</span>, </span>
<span id="cb9-12">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Primary Energy Consumption"</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/searching-for-decoupling/searching_for_decoupling_files/figure-html/unnamed-chunk-8-1.png" class="img-fluid figure-img" width="768"></p>
</figure>
</div>
</div>
</div>
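<p>The index columns in the plots above divide each series by its first row (e.g. <code>gdp_total[1]</code>), which is the base-year value only if the rows happen to be sorted by year. As a minimal sketch, a helper along the following lines could make that assumption explicit and cut down on the repeated <code>mutate()</code> code. The function name <code>index_to_base_year()</code> is hypothetical (it is not part of dplyr or of the analysis above), and the toy data frame contains made-up numbers, not Our World in Data values.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(dplyr)

# Hypothetical helper: express the chosen columns as multiples of their
# value in the earliest year, regardless of the incoming row order.
index_to_base_year &lt;- function(df, cols) {
  df |&gt;
    arrange(year) |&gt;
    mutate(across(all_of(cols), ~ .x / first(.x), .names = "{.col}_idx"))
}

# Made-up numbers purely for illustration (deliberately unsorted)
toy_unsorted_df &lt;- data.frame(
  year = c(2022, 2020, 2021),
  gdp_total = c(110, 100, 104)
)

indexed_df &lt;- index_to_base_year(toy_unsorted_df, "gdp_total")
indexed_df</code></pre></div>
<p>Sorting inside the helper means the base year is always the earliest year in whatever window was filtered beforehand, so the same call works for both the full series and the 2020 onward subset.</p>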
<p>What if we use per-capita values for these same two metrics?</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1">world_metrics_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb10-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gdp_per_capita_idx =</span> gdp_per_capita <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> gdp_per_capita[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>],</span>
<span id="cb10-3">         <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">primary_energy_consumption_per_capita__kwh_idx =</span> primary_energy_consumption_per_capita__kwh <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> primary_energy_consumption_per_capita__kwh[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb10-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(year, gdp_per_capita_idx, primary_energy_consumption_per_capita__kwh_idx) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb10-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(gdp_per_capita_idx, primary_energy_consumption_per_capita__kwh_idx), </span>
<span id="cb10-6">               <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'metric'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'value'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb10-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb10-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x=</span>year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> value, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> metric)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb10-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_manual</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#6C5043FF"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#A3BAC2FF"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb10-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Year (1980 is year 1)"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Multiple of year 1"</span>, </span>
<span id="cb10-11">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Primary Energy Consumption (per capita)"</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/searching-for-decoupling/searching_for_decoupling_files/figure-html/unnamed-chunk-9-1.png" class="img-fluid figure-img" width="768"></p>
</figure>
</div>
</div>
</div>
<p>So, the per capita versions of these two metrics show a greater degree of relative decoupling. The earth, however, doesn’t care about per capita statistics; it’s the overall amount that really matters.</p>
<p>Over the last few years, the slopes of the two series are more similar than they are in the plot starting at 1980. When looking for decoupling, the date range matters. One might see relative (or even absolute) decoupling for a period of time, only to see it revert to more coupled behavior.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1">world_metrics_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb11-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(year <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2020</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb11-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gdp_per_capita_idx =</span> gdp_per_capita <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> gdp_per_capita[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>],</span>
<span id="cb11-4">         <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">primary_energy_consumption_per_capita__kwh_idx =</span> primary_energy_consumption_per_capita__kwh <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> primary_energy_consumption_per_capita__kwh[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb11-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(year, gdp_per_capita_idx, primary_energy_consumption_per_capita__kwh_idx) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb11-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(gdp_per_capita_idx, primary_energy_consumption_per_capita__kwh_idx), </span>
<span id="cb11-7">               <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'metric'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'value'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb11-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb11-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x=</span>year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> value, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> metric)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb11-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_manual</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#6C5043FF"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#A3BAC2FF"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb11-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Year (2020 is year 1)"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Multiple of year 1"</span>,</span>
<span id="cb11-12">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Primary Energy Consumption (per capita)"</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/searching-for-decoupling/searching_for_decoupling_files/figure-html/unnamed-chunk-10-1.png" class="img-fluid figure-img" width="768"></p>
</figure>
</div>
</div>
</div>
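<p>To make the earlier point about date ranges concrete, here is a rough sketch of how one might label a window of years. It assumes the usual definitions: relative decoupling means the metric grows more slowly than GDP, while absolute decoupling means the metric falls as GDP grows. The function <code>classify_decoupling()</code> is a hypothetical helper, and the toy numbers are made up for illustration; they are not Our World in Data values.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(dplyr)

# Hypothetical helper: compare growth in GDP and another metric between
# the first and last year of a window and label the result.
classify_decoupling &lt;- function(df, metric_col, start_year, end_year) {
  window_df &lt;- df |&gt;
    filter(year &gt;= start_year, year &lt;= end_year) |&gt;
    arrange(year)

  # Ratio of last to first value in the window (same idea as the *_idx columns)
  gdp_ratio &lt;- last(window_df$gdp_total) / first(window_df$gdp_total)
  metric_ratio &lt;- last(window_df[[metric_col]]) / first(window_df[[metric_col]])

  if (gdp_ratio &lt;= 1) {
    "no GDP growth"
  } else if (metric_ratio &lt; 1) {
    "absolute decoupling"
  } else if (metric_ratio &lt; gdp_ratio) {
    "relative decoupling"
  } else {
    "no decoupling"
  }
}

# Made-up numbers purely for illustration
toy_window_df &lt;- data.frame(
  year = 2018:2023,
  gdp_total = c(100, 103, 100, 106, 109, 112),
  primary_energy_consumption_Twh = c(50, 50.5, 48, 52, 54, 55)
)

long_window &lt;- classify_decoupling(toy_window_df, "primary_energy_consumption_Twh", 2018, 2023)
short_window &lt;- classify_decoupling(toy_window_df, "primary_energy_consumption_Twh", 2020, 2023)</code></pre></div>
<p>With these made-up numbers, the 2018 to 2023 window is labeled as relative decoupling while the 2020 to 2023 window shows no decoupling, simply because the starting year changed. Comparing only the endpoints is a simplification; fitting a trend over the window would be less sensitive to a single unusual year.</p>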
</section>
<section id="greenhouse-gas-emissions" class="level2">
<h2 class="anchored" data-anchor-id="greenhouse-gas-emissions">Greenhouse gas emissions</h2>
<p>In 2016, the World Resources Institute released a high-profile study entitled <a href="https://www.wri.org/insights/roads-decoupling-21-countries-are-reducing-carbon-emissions-while-growing-gdp">The Roads to Decoupling: 21 Countries Are Reducing Carbon Emissions While Growing GDP</a>. Reading the report, the analysis appears to be production based rather than consumption based, as a comment near the end suggests:</p>
<blockquote class="blockquote">
<p>Beyond the aggregate trends described here, more information is needed on the potential leakage of carbon emissions to other countries as nations move their industries overseas, factors that enable sustained and absolute decoupling, and what’s needed to support larger-scale emissions mitigation.</p>
</blockquote>
<p>Greenhouse gas emissions are measured in tonnes per person of carbon dioxide-equivalents over a 100-year timescale. See <a href="https://ourworldindata.org/grapher/total-greenhouse-gas-emissions-per-capita" class="uri">https://ourworldindata.org/grapher/total-greenhouse-gas-emissions-per-capita</a> and <a href="https://ourworldindata.org/grapher/per-capita-ghg-excl-land-use" class="uri">https://ourworldindata.org/grapher/per-capita-ghg-excl-land-use</a>.</p>
<p>Let’s start by looking at the whole world and just at fossil fuel emissions.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1">world_metrics_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb12-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gdp_total_idx =</span> gdp_total <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> gdp_total[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>],</span>
<span id="cb12-3">         <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">annual_emissions_ghg_fossil_co2eq_idx =</span> annual_emissions_ghg_fossil_co2eq <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> annual_emissions_ghg_fossil_co2eq[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb12-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(year, gdp_total_idx, annual_emissions_ghg_fossil_co2eq_idx) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb12-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(gdp_total_idx, annual_emissions_ghg_fossil_co2eq_idx), </span>
<span id="cb12-6">               <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'metric'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'value'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb12-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb12-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x=</span>year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> value, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> metric)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb12-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_manual</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#A3BAC2FF"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#6C5043FF"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb12-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Year (1980 is year 1)"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Multiple of year 1"</span>,</span>
<span id="cb12-11">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Annual GHG emissions (excluding land use)"</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/searching-for-decoupling/searching_for_decoupling_files/figure-html/unnamed-chunk-11-1.png" class="img-fluid figure-img" width="768"></p>
</figure>
</div>
</div>
</div>
<p>This plot reveals relative decoupling at the global level, but emissions are still increasing. That’s not terribly surprising to anyone who has kept up with the various attempts at forging international agreements for reducing greenhouse gas emissions. There have been decades of talk without much in the way of results. Meanwhile, <a href="https://www.nature.com/articles/s41558-025-02246-9">the world keeps warming</a> and the <a href="https://www.researchgate.net/profile/Cassio-Pereira-6/publication/384759743_The_2024_state_of_the_climate_report_Perilous_times_on_planet_Earth/links/670686b9cc8c9a65638fc3ef/The-2024-state-of-the-climate-report-Perilous-times-on-planet-Earth.pdf">outlook isn’t great for our climate</a>.</p>
<p>The last few years look like this.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1">world_metrics_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb13-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(year <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2020</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb13-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gdp_total_idx =</span> gdp_total <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> gdp_total[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>],</span>
<span id="cb13-4">         <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">annual_emissions_ghg_fossil_co2eq_idx =</span> annual_emissions_ghg_fossil_co2eq <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> annual_emissions_ghg_fossil_co2eq[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb13-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(year, gdp_total_idx, annual_emissions_ghg_fossil_co2eq_idx) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb13-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(gdp_total_idx, annual_emissions_ghg_fossil_co2eq_idx), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'metric'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'value'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb13-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb13-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x=</span>year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> value, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> metric)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb13-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_manual</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#A3BAC2FF"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#6C5043FF"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb13-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Year (2020 is year 1)"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Multiple of year 1"</span>,</span>
<span id="cb13-11">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Annual GHG emissions (excluding land use)"</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/searching-for-decoupling/searching_for_decoupling_files/figure-html/unnamed-chunk-12-1.png" class="img-fluid figure-img" width="768"></p>
</figure>
</div>
</div>
</div>
<p>Now let’s consider greenhouse gas emissions that also factor in land use changes and forestry.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1">world_metrics_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb14-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gdp_total_idx =</span> gdp_total <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> gdp_total[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>],</span>
<span id="cb14-3">         <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">annual_emissions_ghg_total_co2eq_idx =</span> annual_emissions_ghg_total_co2eq <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> annual_emissions_ghg_total_co2eq[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb14-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(year, gdp_total_idx, annual_emissions_ghg_total_co2eq_idx) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb14-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(gdp_total_idx, annual_emissions_ghg_total_co2eq_idx), </span>
<span id="cb14-6">               <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'metric'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'value'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb14-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb14-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x=</span>year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> value, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> metric)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb14-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_manual</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#A3BAC2FF"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#6C5043FF"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb14-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Year (1980 is year 1)"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Multiple of year 1"</span>,</span>
<span id="cb14-11">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Annual GHG emissions (including land use)"</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/searching-for-decoupling/searching_for_decoupling_files/figure-html/unnamed-chunk-13-1.png" class="img-fluid figure-img" width="768"></p>
</figure>
</div>
</div>
</div>
<div class="cell">
<div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb15-1">world_metrics_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb15-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(year <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2020</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb15-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gdp_total_idx =</span> gdp_total <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> gdp_total[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>],</span>
<span id="cb15-4">         <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">annual_emissions_ghg_total_co2eq_idx =</span> annual_emissions_ghg_total_co2eq <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> annual_emissions_ghg_total_co2eq[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb15-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(year, gdp_total_idx, annual_emissions_ghg_total_co2eq_idx) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb15-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(gdp_total_idx, annual_emissions_ghg_total_co2eq_idx), </span>
<span id="cb15-7">               <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'metric'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'value'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb15-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb15-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x=</span>year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> value, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> metric)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb15-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_manual</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#A3BAC2FF"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#6C5043FF"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb15-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Year (2020 is year 1)"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Multiple of year 1"</span>,</span>
<span id="cb15-12">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Annual GHG emissions (including land use)"</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/searching-for-decoupling/searching_for_decoupling_files/figure-html/unnamed-chunk-14-1.png" class="img-fluid figure-img" width="768"></p>
</figure>
</div>
</div>
</div>
<p>We still see relative decoupling.</p>
<p>What about those 21 countries in the 2016 report mentioned above?</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb16-1">wri_21 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Austria'</span>,</span>
<span id="cb16-2">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Belgium'</span>,</span>
<span id="cb16-3">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Bulgaria'</span>,</span>
<span id="cb16-4">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Czechia'</span>,</span>
<span id="cb16-5">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Denmark'</span>,</span>
<span id="cb16-6">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Finland'</span>,</span>
<span id="cb16-7">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'France'</span>,</span>
<span id="cb16-8">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Germany'</span>,</span>
<span id="cb16-9">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Hungary'</span>,</span>
<span id="cb16-10">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Ireland'</span>,</span>
<span id="cb16-11">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Netherlands'</span>,</span>
<span id="cb16-12">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Portugal'</span>,</span>
<span id="cb16-13">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Romania'</span>,</span>
<span id="cb16-14">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Slovakia'</span>,</span>
<span id="cb16-15">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Spain'</span>,</span>
<span id="cb16-16">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Sweden'</span>,</span>
<span id="cb16-17">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Switzerland'</span>,</span>
<span id="cb16-18">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Ukraine'</span>,</span>
<span id="cb16-19">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'United Kingdom'</span>,</span>
<span id="cb16-20">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'United States'</span>,</span>
<span id="cb16-21">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Uzbekistan'</span></span>
<span id="cb16-22">  )</span></code></pre></div>
</div>
<p>Here we need to be careful when computing the indexed values of our metrics. Notice we use a pattern that looks like this:</p>
<pre><code>  arrange(country, year) |&gt; 
  group_by(country) |&gt; 
  mutate(indexed_variable = value / value[1]) |&gt; 
  ungroup()</code></pre>
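<p>The <code>arrange()</code> puts each country’s rows in year order, so that <code>value[1]</code> is that country’s first-year value. We need the <code>group_by()</code> on country so that we divide by each country’s own base value rather than by the first row of the whole data frame. Then we <code>ungroup()</code> so that the later steps work on the full, row-level data for plotting.</p>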
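<p>To see why the choice of grouping variable matters, here is a minimal sketch on a made-up two-country tibble. The <code>toy_df</code> name, its columns, and its values are invented for illustration and are not part of the Our World in Data dataset. Grouping by country alone divides each row by that country’s first-year value. Grouping by both country and year leaves one row per group, so every index collapses to 1.</p>
<pre><code>library(dplyr)

# Hypothetical toy data: two countries, three years each
toy_df &lt;- tibble(
  country = rep(c("A", "B"), each = 3),
  year    = rep(c(1980, 1981, 1982), times = 2),
  value   = c(10, 12, 15, 200, 190, 180)
)

# Index each country to its own first year
by_country &lt;- toy_df |&gt;
  arrange(country, year) |&gt;
  group_by(country) |&gt;
  mutate(value_idx = value / value[1]) |&gt;
  ungroup()

# Grouping by country AND year gives one-row groups, so every index is 1
by_country_year &lt;- toy_df |&gt;
  arrange(country, year) |&gt;
  group_by(country, year) |&gt;
  mutate(value_idx = value / value[1]) |&gt;
  ungroup()

by_country$value_idx       # 1, 1.2, 1.5 for A and 1, 0.95, 0.9 for B
by_country_year$value_idx  # 1 for every row</code></pre>
<p>Note that <code>value[1]</code> is only the base-year value if the rows are sorted by year within each country, which is why the <code>arrange()</code> comes first.</p>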
<div class="cell">
<div class="sourceCode cell-code" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb18-1">country_metrics_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb18-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(country <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%in%</span> wri_21, year <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2023</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb18-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">arrange</span>(country, year) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb18-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(country) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb18-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gdp_total_idx =</span> gdp_total <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> gdp_total[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>],</span>
<span id="cb18-6">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">annual_emissions_ghg_total_co2eq_idx =</span> annual_emissions_ghg_total_co2eq <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> annual_emissions_ghg_total_co2eq[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb18-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ungroup</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb18-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(year, country, gdp_total_idx, annual_emissions_ghg_total_co2eq_idx) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb18-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(gdp_total_idx, annual_emissions_ghg_total_co2eq_idx), </span>
<span id="cb18-10">               <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'metric'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'value'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb18-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb18-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x=</span>year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> value, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> metric)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb18-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">facet_wrap</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>country, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">scales =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'free_y'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb18-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_manual</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#A3BAC2FF"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#6C5043FF"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb18-15">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Year (1980 is year 1)"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Multiple of year 1"</span>,</span>
<span id="cb18-16">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Annual GHG emissions (including land use)"</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/searching-for-decoupling/searching_for_decoupling_files/figure-html/unnamed-chunk-16-1.png" class="img-fluid figure-img" width="1920"></p>
</figure>
</div>
</div>
</div>
<p>For these 21 countries, absolute decoupling of overall greenhouse gas emissions (production) from GDP seems to be ongoing for the most part. However, as mentioned earlier, one has to wonder about consumption-based measures. <span class="citation" data-cites="davisConsumptionbasedAccountingCO22010">Davis and Caldeira (2010)</span> showed that in developed countries the difference between production-based and consumption-based greenhouse gas emissions is around 30%. Developed countries can essentially offload their emissions to less developed countries, which might also have less efficient production technology, and then import the goods produced by those less developed countries. And of course, since globally we definitely do not see absolute decoupling, there must be several populous countries for which absolute decoupling is not happening. For example:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb19-1">country_metrics_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb19-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(country <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%in%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'China'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'India'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Brazil'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Mexico'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'South Africa'</span>,</span>
<span id="cb19-3">                        <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Vietnam'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Venezuela'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Iran'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Saudi Arabia'</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb19-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">arrange</span>(country, year) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb19-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(country) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb19-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gdp_total_idx =</span> gdp_total <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> gdp_total[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>],</span>
<span id="cb19-7">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">annual_emissions_ghg_total_co2eq_idx =</span> annual_emissions_ghg_total_co2eq <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> annual_emissions_ghg_total_co2eq[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb19-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ungroup</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb19-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(year, country, gdp_total_idx, annual_emissions_ghg_total_co2eq_idx) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb19-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(gdp_total_idx, annual_emissions_ghg_total_co2eq_idx), </span>
<span id="cb19-11">               <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'metric'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'value'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb19-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb19-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x=</span>year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> value, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> metric)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb19-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">facet_wrap</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>country, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">scales =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'free_y'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb19-15">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_manual</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#A3BAC2FF"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#6C5043FF"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb19-16">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Year (1980 is year 1)"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Multiple of year 1"</span>,</span>
<span id="cb19-17">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Annual GHG emissions (including land use)"</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/searching-for-decoupling/searching_for_decoupling_files/figure-html/unnamed-chunk-17-1.png" class="img-fluid figure-img" width="1920"></p>
</figure>
</div>
</div>
</div>
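<p>Reading decoupling off of many small facets can be tedious, so a rough tabular check might help. The following is only a sketch under stated assumptions: the <code>toy_idx</code> tibble, its column names, and its values are made up, and it compares just the last year to the base year, which ignores everything that happens in between. It labels a country as showing absolute decoupling if GDP rose while emissions fell, and relative decoupling if emissions rose but by less than GDP.</p>
<pre><code>library(dplyr)

# Hypothetical indexed values (base year = 1) for two made-up countries
toy_idx &lt;- tibble(
  country       = c("A", "A", "B", "B"),
  year          = c(1980, 2022, 1980, 2022),
  gdp_idx       = c(1, 2.5, 1, 4.0),
  emissions_idx = c(1, 0.8, 1, 2.0)
)

decoupling &lt;- toy_idx |&gt;
  group_by(country) |&gt;
  filter(year == max(year)) |&gt;   # keep each country's final year
  ungroup() |&gt;
  mutate(status = case_when(
    gdp_idx &gt; 1 &amp; emissions_idx &lt; 1       ~ "absolute",
    gdp_idx &gt; 1 &amp; emissions_idx &lt; gdp_idx ~ "relative",
    TRUE                                  ~ "none"
  ))

decoupling$status  # "absolute" for A, "relative" for B</code></pre>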
</section>
<section id="fossil-fuel-consumption" class="level2">
<h2 class="anchored" data-anchor-id="fossil-fuel-consumption">Fossil fuel consumption</h2>
<p>In countries such as the US, coal plants have been closing in recent years. What does coal consumption look like on a global scale?</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb20" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb20-1">world_metrics_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb20-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gdp_total_idx =</span> gdp_total <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> gdp_total[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>],</span>
<span id="cb20-3">         <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">coal_consumption_twh_idx =</span> coal_consumption_twh <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> coal_consumption_twh[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb20-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(year, gdp_total_idx, coal_consumption_twh_idx) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb20-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(gdp_total_idx, coal_consumption_twh_idx), </span>
<span id="cb20-6">               <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'metric'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'value'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb20-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb20-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> value, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> metric)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb20-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_manual</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#A3BAC2FF"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#6C5043FF"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb20-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Year (1980 is year 1)"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Multiple of year 1"</span>,</span>
<span id="cb20-11">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Coal consumption"</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/searching-for-decoupling/searching_for_decoupling_files/figure-html/unnamed-chunk-18-1.png" class="img-fluid figure-img" width="768"></p>
</figure>
</div>
</div>
</div>
<p>It looks like coal use is back on the rise. Here is the same comparison restricted to 2020 onward, with 2020 as the base year.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb21" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb21-1">world_metrics_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb21-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(year <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2020</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb21-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gdp_total_idx =</span> gdp_total <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> gdp_total[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>],</span>
<span id="cb21-4">         <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">coal_consumption_twh_idx =</span> coal_consumption_twh <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> coal_consumption_twh[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb21-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(year, gdp_total_idx, coal_consumption_twh_idx) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb21-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(gdp_total_idx, coal_consumption_twh_idx), </span>
<span id="cb21-7">               <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'metric'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'value'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb21-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb21-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> value, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> metric)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb21-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_manual</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#A3BAC2FF"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#6C5043FF"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb21-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Year (2020 is year 1)"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Multiple of year 1"</span>,</span>
<span id="cb21-12">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Coal consumption"</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/searching-for-decoupling/searching_for_decoupling_files/figure-html/unnamed-chunk-19-1.png" class="img-fluid figure-img" width="768"></p>
</figure>
</div>
</div>
</div>
<p>What about oil and gas? The plots below repeat both views, from 1980 and from 2020 onward, first for oil and then for gas.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb22" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb22-1">world_metrics_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb22-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gdp_total_idx =</span> gdp_total <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> gdp_total[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>],</span>
<span id="cb22-3">         <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">oil_consumption_twh_idx =</span> oil_consumption_twh <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> oil_consumption_twh[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb22-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(year, gdp_total_idx, oil_consumption_twh_idx) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb22-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(gdp_total_idx, oil_consumption_twh_idx), </span>
<span id="cb22-6">               <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'metric'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'value'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb22-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb22-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> value, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> metric)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb22-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_manual</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#6C5043FF"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#A3BAC2FF"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb22-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Year (1980 is year 1)"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Multiple of year 1"</span>,</span>
<span id="cb22-11">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Oil consumption"</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/searching-for-decoupling/searching_for_decoupling_files/figure-html/unnamed-chunk-20-1.png" class="img-fluid figure-img" width="768"></p>
</figure>
</div>
</div>
</div>
<div class="cell">
<div class="sourceCode cell-code" id="cb23" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb23-1">world_metrics_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb23-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(year <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2020</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb23-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gdp_total_idx =</span> gdp_total <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> gdp_total[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>],</span>
<span id="cb23-4">         <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">oil_consumption_twh_idx =</span> oil_consumption_twh <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> oil_consumption_twh[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb23-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(year, gdp_total_idx, oil_consumption_twh_idx) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb23-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(gdp_total_idx, oil_consumption_twh_idx), </span>
<span id="cb23-7">               <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'metric'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'value'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb23-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb23-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> value, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> metric)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb23-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_manual</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#6C5043FF"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#A3BAC2FF"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb23-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Year (2020 is year 1)"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Multiple of year 1"</span>,</span>
<span id="cb23-12">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Oil consumption"</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/searching-for-decoupling/searching_for_decoupling_files/figure-html/unnamed-chunk-21-1.png" class="img-fluid figure-img" width="768"></p>
</figure>
</div>
</div>
</div>
<div class="cell">
<div class="sourceCode cell-code" id="cb24" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb24-1">world_metrics_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb24-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gdp_total_idx =</span> gdp_total <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> gdp_total[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>],</span>
<span id="cb24-3">         <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gas_consumption_twh_idx =</span> gas_consumption_twh <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> gas_consumption_twh[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb24-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(year, gdp_total_idx, gas_consumption_twh_idx) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb24-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(gdp_total_idx, gas_consumption_twh_idx), </span>
<span id="cb24-6">               <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'metric'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'value'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb24-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb24-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> value, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> metric)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb24-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_manual</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#A3BAC2FF"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#6C5043FF"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb24-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Year (1980 is year 1)"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Multiple of year 1"</span>,</span>
<span id="cb24-11">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Gas consumption"</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/searching-for-decoupling/searching_for_decoupling_files/figure-html/unnamed-chunk-22-1.png" class="img-fluid figure-img" width="768"></p>
</figure>
</div>
</div>
</div>
<div class="cell">
<div class="sourceCode cell-code" id="cb25" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb25-1">world_metrics_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb25-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(year <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2020</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb25-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gdp_total_idx =</span> gdp_total <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> gdp_total[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>],</span>
<span id="cb25-4">         <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gas_consumption_twh_idx =</span> gas_consumption_twh <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> gas_consumption_twh[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb25-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(year, gdp_total_idx, gas_consumption_twh_idx) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb25-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(gdp_total_idx, gas_consumption_twh_idx), </span>
<span id="cb25-7">               <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'metric'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'value'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb25-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb25-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> value, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> metric)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb25-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_manual</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#A3BAC2FF"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#6C5043FF"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb25-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Year (2020 is year 1)"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Multiple of year 1"</span>,</span>
<span id="cb25-12">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Gas consumption"</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/searching-for-decoupling/searching_for_decoupling_files/figure-html/unnamed-chunk-23-1.png" class="img-fluid figure-img" width="768"></p>
</figure>
</div>
</div>
</div>
<div class="cell">
<div class="sourceCode cell-code" id="cb26" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb26-1">world_metrics_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb26-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gdp_total_idx =</span> gdp_total <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> gdp_total[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>],</span>
<span id="cb26-3">         <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">tot_fossil_fuel_consumption_idx =</span> tot_fossil_fuel_consumption_twh <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> tot_fossil_fuel_consumption_twh[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb26-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(year, gdp_total_idx, tot_fossil_fuel_consumption_idx) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb26-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(gdp_total_idx, tot_fossil_fuel_consumption_idx), </span>
<span id="cb26-6">               <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'metric'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'value'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb26-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb26-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x=</span>year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> value, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> metric)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb26-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_manual</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#6C5043FF"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#A3BAC2FF"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb26-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Year (1980 is year 1)"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Multiple of year 1"</span>,</span>
<span id="cb26-11">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Total fossil fuel consumption"</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/searching-for-decoupling/searching_for_decoupling_files/figure-html/unnamed-chunk-24-1.png" class="img-fluid figure-img" width="768"></p>
</figure>
</div>
</div>
</div>
<div class="cell">
<div class="sourceCode cell-code" id="cb27" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb27-1">world_metrics_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb27-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(year <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2020</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb27-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gdp_total_idx =</span> gdp_total <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> gdp_total[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>],</span>
<span id="cb27-4">         <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">tot_fossil_fuel_consumption_idx =</span> tot_fossil_fuel_consumption_twh <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> tot_fossil_fuel_consumption_twh[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb27-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(year, gdp_total_idx, tot_fossil_fuel_consumption_idx) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb27-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(gdp_total_idx, tot_fossil_fuel_consumption_idx), </span>
<span id="cb27-7">               <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'metric'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'value'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb27-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb27-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x=</span>year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> value, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> metric)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb27-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_manual</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#6C5043FF"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#A3BAC2FF"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb27-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Year (2020 is year 1)"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Multiple of year 1"</span>,</span>
<span id="cb27-12">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Total fossil fuel consumption"</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/searching-for-decoupling/searching_for_decoupling_files/figure-html/unnamed-chunk-25-1.png" class="img-fluid figure-img" width="768"></p>
</figure>
</div>
</div>
</div>
<p>Safe to say there’s no absolute decoupling of fossil fuel use and GDP.</p>
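<p>To make the terminology concrete, here is a minimal sketch that classifies decoupling from two indexed series. The numbers in <code>toy_df</code> are invented and <code>classify_decoupling()</code> is a hypothetical helper, not part of the analysis above. It only compares the final index values, which is a simplification.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Toy data: values are invented purely for illustration
toy_df &lt;- data.frame(
  year = 2000:2004,
  gdp_total = c(100, 110, 121, 133, 146),
  tot_fossil_fuel_consumption_twh = c(50, 52, 54, 55, 57)
)

# Index each series to its first row, as in the plots above.
# This assumes the rows are sorted by year, so row 1 is the base year.
toy_df$gdp_total_idx &lt;- toy_df$gdp_total / toy_df$gdp_total[1]
toy_df$fossil_idx &lt;- toy_df$tot_fossil_fuel_consumption_twh /
  toy_df$tot_fossil_fuel_consumption_twh[1]

# Hypothetical helper: compare the final values of the two indexes
classify_decoupling &lt;- function(gdp_idx, resource_idx) {
  g &lt;- tail(gdp_idx, 1)
  r &lt;- tail(resource_idx, 1)
  if (g &gt; 1 &amp;&amp; r &lt; 1) {
    "absolute"   # GDP up, resource use down
  } else if (g &gt; 1 &amp;&amp; r &lt; g) {
    "relative"   # GDP up, resource use growing more slowly than GDP
  } else {
    "none"
  }
}

classify_decoupling(toy_df$gdp_total_idx, toy_df$fossil_idx)</code></pre></div>
</div>
<p>With these invented numbers the helper returns <code>"relative"</code>: both series grow, but fossil fuel use grows more slowly than GDP. Absolute decoupling would require the fossil fuel index to end below 1.</p>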
</section>
<section id="minerals-and-metal-production" class="level2">
<h2 class="anchored" data-anchor-id="minerals-and-metal-production">Minerals and metal production</h2>
<p>We will look at cobalt, copper, lithium, nickel and steel production.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb28" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb28-1">world_metrics_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb28-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gdp_total_idx =</span> gdp_total <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> gdp_total[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>],</span>
<span id="cb28-3">         <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">production.Steel.Processing..crude.tonnes_idx =</span> production.Steel.Processing..crude.tonnes <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> production.Steel.Processing..crude.tonnes[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb28-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(year, gdp_total_idx, production.Steel.Processing..crude.tonnes_idx) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb28-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(gdp_total_idx, production.Steel.Processing..crude.tonnes_idx), </span>
<span id="cb28-6">               <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'metric'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'value'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb28-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb28-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x=</span>year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> value, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> metric)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb28-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_manual</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#6C5043FF"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#A3BAC2FF"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb28-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Year (1980 is year 1)"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Multiple of year 1"</span>,</span>
<span id="cb28-11">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Steel production"</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/searching-for-decoupling/searching_for_decoupling_files/figure-html/unnamed-chunk-26-1.png" class="img-fluid figure-img" width="768"></p>
</figure>
</div>
</div>
</div>
<p>Let’s look at the sum of cobalt, copper, lithium and nickel production.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb29" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb29-1">world_metrics_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb29-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">CCLN_production =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rowSums</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cbind</span>(production.Cobalt.Mine.tonnes,</span>
<span id="cb29-3">           production.Copper.Mine.tonnes,</span>
<span id="cb29-4">           production.Lithium.Mine.tonnes,</span>
<span id="cb29-5">           production.Nickel.Mine.tonnes), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> TRUE)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb29-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gdp_total_idx =</span> gdp_total <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> gdp_total[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>],</span>
<span id="cb29-7">         <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">CCLN_production_idx =</span> CCLN_production <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> CCLN_production[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb29-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(year, gdp_total_idx, CCLN_production_idx) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb29-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(gdp_total_idx, CCLN_production_idx), </span>
<span id="cb29-10">               <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'metric'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'value'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb29-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb29-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x=</span>year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> value, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> metric)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb29-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_manual</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#A3BAC2FF"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#6C5043FF"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb29-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Year (1980 is year 1)"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Multiple of year 1"</span>, </span>
<span id="cb29-15">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Total cobalt, copper, lithium, and nickel production"</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/searching-for-decoupling/searching_for_decoupling_files/figure-html/unnamed-chunk-27-1.png" class="img-fluid figure-img" width="768"></p>
</figure>
</div>
</div>
</div>
<p>No decoupling here.</p>
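<p>One thing worth checking with the <code>rowSums()</code> approach used above: with <code>na.rm = TRUE</code>, a row in which every input is missing sums to 0 rather than <code>NA</code>. The sketch below uses invented values (the <code>cobalt</code> and <code>lithium</code> vectors are hypothetical) to show what that does to an index when the base year is such a row. Whether the actual dataset has any all-missing years is not something this example establishes.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Invented values; NA marks a year with no reported production
cobalt  &lt;- c(NA, 10, 12)
lithium &lt;- c(NA, NA, 3)

# With na.rm = TRUE, the first row (all NA) sums to 0, not NA
ccln &lt;- rowSums(cbind(cobalt, lithium), na.rm = TRUE)
ccln

# Dividing by a zero base year gives NaN and Inf instead of a usable index
ccln_idx &lt;- ccln / ccln[1]
ccln_idx</code></pre></div>
</div>
<p>If the base year did turn out to be all-missing, one option would be to filter to the first year with reported production before computing the index.</p>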
</section>
<section id="water" class="level2">
<h2 class="anchored" data-anchor-id="water">Water</h2>
<p>Water use is another important resource to consider. The available freshwater withdrawal data only covers 2014-2021.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb30" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb30-1">world_metrics_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb30-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">between</span>(year, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2014</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2021</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb30-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gdp_total_idx =</span> gdp_total <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> gdp_total[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>],</span>
<span id="cb30-4">         <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">freshwater_withdrawals_m3_idx =</span> freshwater_withdrawals_m3 <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> freshwater_withdrawals_m3[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb30-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(year, gdp_total_idx, freshwater_withdrawals_m3_idx) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb30-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(gdp_total_idx, freshwater_withdrawals_m3_idx), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'metric'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'value'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb30-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb30-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x=</span>year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> value, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> metric)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb30-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_manual</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#A3BAC2FF"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#6C5043FF"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb30-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Year (2014 is year 1)"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Multiple of year 1"</span>, </span>
<span id="cb30-11">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Freshwater withdrawals"</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/searching-for-decoupling/searching_for_decoupling_files/figure-html/unnamed-chunk-28-1.png" class="img-fluid figure-img" width="768"></p>
</figure>
</div>
</div>
</div>
<p>This will be one to watch as temperatures rise and the boom in data center construction continues in order to meet the demands of recent advances in AI models. Data centers need plenty of water for cooling.</p>
</section>
<section id="us-canada-mexico" class="level2">
<h2 class="anchored" data-anchor-id="us-canada-mexico">US, Canada, Mexico</h2>
<p>Let’s look at these three countries in comparison to each other and as a group.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb31" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb31-1">north_america_metrics_df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> country_metrics_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb31-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(country <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'United States'</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|</span> country <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Canada'</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|</span> country <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Mexico'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb31-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(year) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb31-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarise</span>(</span>
<span id="cb31-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">tot_fossil_fuel_consumption_twh =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(tot_fossil_fuel_consumption_twh),</span>
<span id="cb31-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">primary_energy_consumption_Twh =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(primary_energy_consumption_Twh),</span>
<span id="cb31-7">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">annual_emissions_ghg_total_co2eq =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(annual_emissions_ghg_total_co2eq),</span>
<span id="cb31-8">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gdp_total =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(gdp_total)</span>
<span id="cb31-9">  )</span></code></pre></div>
</div>
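<p>The group-and-sum step above can be tried on a toy table. This is a sketch with invented values: <code>toy_countries_df</code>, <code>gdp_strict</code> and <code>gdp_available</code> are hypothetical names. It uses <code>%in%</code> in place of the chained <code>|</code> comparisons, and shows that <code>sum()</code> returns <code>NA</code> for a year in which any one country is missing a value.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(dplyr)

# Invented values; Canada is missing GDP in 1981
toy_countries_df &lt;- data.frame(
  country = c("United States", "Canada", "Mexico",
              "United States", "Canada", "Mexico"),
  year = c(1980, 1980, 1980, 1981, 1981, 1981),
  gdp_total = c(10, 2, 1, 11, NA, 1)
)

toy_totals_df &lt;- toy_countries_df |&gt;
  # %in% is an equivalent, shorter way to match several country names
  filter(country %in% c("United States", "Canada", "Mexico")) |&gt;
  group_by(year) |&gt;
  summarise(
    gdp_strict = sum(gdp_total),                   # NA if any country is NA
    gdp_available = sum(gdp_total, na.rm = TRUE)   # sums what is reported
  )

toy_totals_df</code></pre></div>
</div>
<p>Neither choice is automatically right. The strict sum leaves a gap in the plot for incomplete years, while the <code>na.rm = TRUE</code> version silently undercounts the group total.</p>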
<section id="primary-energy-consumption-in-us-ca-mx" class="level3">
<h3 class="anchored" data-anchor-id="primary-energy-consumption-in-us-ca-mx">Primary energy consumption in US, CA, MX</h3>
<div class="cell">
<div class="sourceCode cell-code" id="cb32" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb32-1">north_america_metrics_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb32-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gdp_total_idx =</span> gdp_total <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> gdp_total[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>],</span>
<span id="cb32-3">         <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">primary_energy_consumption_Twh_idx =</span> primary_energy_consumption_Twh <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> primary_energy_consumption_Twh[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb32-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(year, gdp_total_idx, primary_energy_consumption_Twh_idx) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb32-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(gdp_total_idx, primary_energy_consumption_Twh_idx), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'metric'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'value'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb32-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb32-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> value, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> metric)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb32-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_manual</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#6C5043FF"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#A3BAC2FF"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb32-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Year (1980 is year 1)"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Multiple of year 1"</span>, </span>
<span id="cb32-10">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Primary energy consumption - US, CA, MX"</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/searching-for-decoupling/searching_for_decoupling_files/figure-html/unnamed-chunk-30-1.png" class="img-fluid figure-img" width="768"></p>
</figure>
</div>
</div>
</div>
<div class="cell">
<div class="sourceCode cell-code" id="cb33" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb33-1">country_metrics_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb33-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(country <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'United States'</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|</span> country <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Canada'</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|</span> country <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Mexico'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb33-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(country) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb33-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gdp_total_idx =</span> gdp_total <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> gdp_total[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>],</span>
<span id="cb33-5">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">primary_energy_consumption_Twh_idx =</span> primary_energy_consumption_Twh <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> primary_energy_consumption_Twh[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb33-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ungroup</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb33-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(country, year, gdp_total_idx, primary_energy_consumption_Twh_idx) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb33-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(gdp_total_idx, primary_energy_consumption_Twh_idx), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'metric'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'value'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb33-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb33-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> value, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> metric)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb33-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">facet_grid</span>(country <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> ., <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">scales =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'free_y'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb33-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_manual</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#6C5043FF"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#A3BAC2FF"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb33-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Year (1980 is year 1)"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Multiple of year 1"</span>, </span>
<span id="cb33-14">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Primary energy consumption - US, CA, MX"</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/searching-for-decoupling/searching_for_decoupling_files/figure-html/unnamed-chunk-31-1.png" class="img-fluid figure-img" width="768"></p>
</figure>
</div>
</div>
</div>
</section>
<section id="total-fossil-fuel-consumption-in-us-ca-mx" class="level3">
<h3 class="anchored" data-anchor-id="total-fossil-fuel-consumption-in-us-ca-mx">Total fossil fuel consumption in US, CA, MX</h3>
<div class="cell">
<div class="sourceCode cell-code" id="cb34" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb34-1">north_america_metrics_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb34-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gdp_total_idx =</span> gdp_total <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> gdp_total[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>],</span>
<span id="cb34-3">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">tot_fossil_fuel_consumption_twh_idx =</span> tot_fossil_fuel_consumption_twh <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> tot_fossil_fuel_consumption_twh[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb34-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(year, gdp_total_idx, tot_fossil_fuel_consumption_twh_idx) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb34-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(gdp_total_idx, tot_fossil_fuel_consumption_twh_idx), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'metric'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'value'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb34-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb34-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> value, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> metric)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb34-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_manual</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#6C5043FF"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#A3BAC2FF"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb34-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Year (1980 is year 1)"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Multiple of year 1"</span>, </span>
<span id="cb34-10">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Total fossil fuel consumption - US, CA, MX"</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/searching-for-decoupling/searching_for_decoupling_files/figure-html/unnamed-chunk-32-1.png" class="img-fluid figure-img" width="768"></p>
</figure>
</div>
</div>
</div>
<div class="cell">
<div class="sourceCode cell-code" id="cb35" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb35-1">country_metrics_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb35-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(country <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'United States'</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|</span> country <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Canada'</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|</span> country <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Mexico'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb35-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(country) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb35-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gdp_total_idx =</span> gdp_total <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> gdp_total[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>],</span>
<span id="cb35-5">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">tot_fossil_fuel_consumption_twh_idx =</span> tot_fossil_fuel_consumption_twh <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> tot_fossil_fuel_consumption_twh[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb35-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ungroup</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb35-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(country, year, gdp_total_idx, tot_fossil_fuel_consumption_twh_idx) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb35-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(gdp_total_idx, tot_fossil_fuel_consumption_twh_idx), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'metric'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'value'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb35-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb35-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> value, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> metric)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb35-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">facet_grid</span>(country <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> ., <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">scales =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'free_y'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb35-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_manual</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#6C5043FF"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#A3BAC2FF"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb35-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Year (1980 is year 1)"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Multiple of year 1"</span>, </span>
<span id="cb35-14">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Total fossil fuel consumption - US, CA, MX"</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/searching-for-decoupling/searching_for_decoupling_files/figure-html/unnamed-chunk-33-1.png" class="img-fluid figure-img" width="768"></p>
</figure>
</div>
</div>
</div>
</section>
<section id="total-greenhouse-gas-emissions-in-us-ca-mx" class="level3">
<h3 class="anchored" data-anchor-id="total-greenhouse-gas-emissions-in-us-ca-mx">Total greenhouse gas emissions in US, CA, MX</h3>
<div class="cell">
<div class="sourceCode cell-code" id="cb36" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb36-1">north_america_metrics_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb36-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gdp_total_idx =</span> gdp_total <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> gdp_total[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>],</span>
<span id="cb36-3">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">annual_emissions_ghg_total_co2eq_idx =</span> annual_emissions_ghg_total_co2eq <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> annual_emissions_ghg_total_co2eq[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb36-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(year, gdp_total_idx, annual_emissions_ghg_total_co2eq_idx) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb36-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(gdp_total_idx, annual_emissions_ghg_total_co2eq_idx), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'metric'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'value'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb36-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb36-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> value, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> metric)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb36-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_manual</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#A3BAC2FF"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#6C5043FF"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb36-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Year (1980 is year 1)"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Multiple of year 1"</span>, </span>
<span id="cb36-10">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Total greenhouse gas emissions - US, CA, MX"</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/searching-for-decoupling/searching_for_decoupling_files/figure-html/unnamed-chunk-34-1.png" class="img-fluid figure-img" width="768"></p>
</figure>
</div>
</div>
</div>
<p>This looks promising. Let’s see whether the same pattern holds for each country individually.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb37" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb37-1">country_metrics_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb37-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(country <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'United States'</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|</span> country <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Canada'</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|</span> country <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Mexico'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb37-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(country) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb37-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gdp_total_idx =</span> gdp_total <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> gdp_total[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>],</span>
<span id="cb37-5">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">annual_emissions_ghg_total_co2eq_idx =</span> annual_emissions_ghg_total_co2eq <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> annual_emissions_ghg_total_co2eq[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb37-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ungroup</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb37-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(country, year, gdp_total_idx, annual_emissions_ghg_total_co2eq_idx) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb37-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(gdp_total_idx, annual_emissions_ghg_total_co2eq_idx), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'metric'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'value'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb37-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb37-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> value, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> metric)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb37-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">facet_grid</span>(country <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> ., <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">scales =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'free_y'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb37-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_manual</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#A3BAC2FF"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#6C5043FF"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb37-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Year (1980 is year 1)"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Multiple of year 1"</span>, </span>
<span id="cb37-14">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Total greenhouse gas emissions - US, CA, MX"</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/searching-for-decoupling/searching_for_decoupling_files/figure-html/unnamed-chunk-35-1.png" class="img-fluid figure-img" width="768"></p>
</figure>
</div>
</div>
</div>
<p>Recent increases in wind and solar are likely the cause of the stabilization of GHG emissions in the US and Canada. Mexico is still seeing increases.</p>
<p>What about coal consumption in these three countries?</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb38" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb38-1">country_metrics_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb38-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(country <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'United States'</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|</span> country <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Canada'</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|</span> country <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Mexico'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb38-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(country) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb38-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gdp_total_idx =</span> gdp_total <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> gdp_total[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>],</span>
<span id="cb38-5">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">coal_consumption_twh_idx =</span> coal_consumption_twh <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> coal_consumption_twh[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb38-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ungroup</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb38-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(country, year, gdp_total_idx, coal_consumption_twh_idx) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb38-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(gdp_total_idx, coal_consumption_twh_idx), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'metric'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'value'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb38-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb38-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x=</span>year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> value, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> metric)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">facet_grid</span>(country <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> ., <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">scales =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'free_y'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb38-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_manual</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#A3BAC2FF"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#6C5043FF"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb38-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Year (1980 is year 1)"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Multiple of year 1"</span>, </span>
<span id="cb38-13">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Coal consumption - US, CA, MX"</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/searching-for-decoupling/searching_for_decoupling_files/figure-html/unnamed-chunk-36-1.png" class="img-fluid figure-img" width="768"></p>
</figure>
</div>
</div>
</div>
</section>
</section>
<section id="regional-summaries" class="level2">
<h2 class="anchored" data-anchor-id="regional-summaries">Regional summaries</h2>
<p><strong>NOTE:</strong> Data series missing from some of the following plots are due to a lack of data in the index year, 1980.</p>
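<p>To see how a gap in the index year can knock out a whole series, here is a minimal sketch on made-up data. The tibble <code>toy_df</code>, its <code>region</code> column, and all of the values are hypothetical. Only the pattern of summing with <code>na.rm = TRUE</code> and then dividing by the first value mirrors the code in this section. It also assumes the region has a 1980 row whose value is <code>NA</code>, which may not be the only way data can be missing.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r">library(dplyr)

# Hypothetical toy data: region B has no GDP value in the index year 1980
toy_df &lt;- tibble(
  region    = c("A", "A", "B", "B"),
  year      = c(1980, 1981, 1980, 1981),
  gdp_total = c(100, 110, NA, 50)
)

toy_idx &lt;- toy_df |&gt;
  group_by(year, region) |&gt;
  # sum() over an all-NA group with na.rm = TRUE returns 0, not NA
  summarise(gdp_total = sum(gdp_total, na.rm = TRUE), .groups = "drop") |&gt;
  arrange(region, year) |&gt;
  group_by(region) |&gt;
  # Index each year against the first (1980) value within the region
  mutate(gdp_total_idx = gdp_total / gdp_total[1]) |&gt;
  ungroup()

toy_idx</code></pre></div>
<p>In this sketch, region B's 1980 total collapses to 0, so dividing by it yields <code>NaN</code> for 1980 and <code>Inf</code> for 1981 instead of usable multiples of year 1. Region A, which has a 1980 value, is indexed as expected.</p>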
<div class="cell">
<div class="sourceCode cell-code" id="cb39" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb39-1">country_metrics_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb39-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(year <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2022</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb39-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(year, region23) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb39-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarise</span>(</span>
<span id="cb39-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">tot_fossil_fuel_consumption_twh =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(tot_fossil_fuel_consumption_twh, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>),</span>
<span id="cb39-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">primary_energy_consumption_Twh =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(primary_energy_consumption_Twh, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>),</span>
<span id="cb39-7">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">annual_emissions_ghg_total_co2eq =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(annual_emissions_ghg_total_co2eq, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>),</span>
<span id="cb39-8">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gdp_total =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(gdp_total, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span>
<span id="cb39-9">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb39-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">arrange</span>(region23, year) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb39-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(region23) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb39-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gdp_total_idx =</span> gdp_total <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> gdp_total[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>],</span>
<span id="cb39-13">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">primary_energy_consumption_Twh_idx =</span> primary_energy_consumption_Twh <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> primary_energy_consumption_Twh[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb39-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ungroup</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb39-15">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(year, region23, gdp_total_idx, primary_energy_consumption_Twh_idx) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb39-16">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(gdp_total_idx, primary_energy_consumption_Twh_idx), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'metric'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'value'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb39-17">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb39-18">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x=</span>year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> value, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> metric)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb39-19">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">facet_wrap</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>region23, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">scales =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'free_y'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb39-20">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_manual</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#6C5043FF"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#A3BAC2FF"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb39-21">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Year (1980 is year 1)"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Multiple of year 1"</span>, </span>
<span id="cb39-22">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Primary energy consumption"</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/searching-for-decoupling/searching_for_decoupling_files/figure-html/unnamed-chunk-37-1.png" class="img-fluid figure-img" width="1920"></p>
</figure>
</div>
</div>
</div>
<div class="cell">
<div class="sourceCode cell-code" id="cb40" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb40-1">country_metrics_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb40-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(year <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2022</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb40-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(year, region23) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb40-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarise</span>(</span>
<span id="cb40-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">tot_fossil_fuel_consumption_twh =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(tot_fossil_fuel_consumption_twh, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>),</span>
<span id="cb40-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">primary_energy_consumption_Twh =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(primary_energy_consumption_Twh, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>),</span>
<span id="cb40-7">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">annual_emissions_ghg_total_co2eq =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(annual_emissions_ghg_total_co2eq, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>),</span>
<span id="cb40-8">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gdp_total =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(gdp_total, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span>
<span id="cb40-9">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb40-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">arrange</span>(region23, year) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb40-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(region23) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb40-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gdp_total_idx =</span> gdp_total <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> gdp_total[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>],</span>
<span id="cb40-13">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">annual_emissions_ghg_total_co2eq_idx =</span> annual_emissions_ghg_total_co2eq <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> annual_emissions_ghg_total_co2eq[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb40-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ungroup</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb40-15">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(year, region23, gdp_total_idx, annual_emissions_ghg_total_co2eq_idx) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb40-16">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(gdp_total_idx, annual_emissions_ghg_total_co2eq_idx), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'metric'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'value'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb40-17">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb40-18">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x=</span>year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> value, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> metric)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">facet_wrap</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>region23, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">scales =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'free_y'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb40-19">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_manual</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#A3BAC2FF"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#6C5043FF"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb40-20">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Year (1980 is year 1)"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Multiple of year 1"</span>, </span>
<span id="cb40-21">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Total greenhouse gas emissions"</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/searching-for-decoupling/searching_for_decoupling_files/figure-html/unnamed-chunk-38-1.png" class="img-fluid figure-img" width="1920"></p>
</figure>
</div>
</div>
</div>
</section>
<section id="population" class="level2">
<h2 class="anchored" data-anchor-id="population">Population</h2>
<p>Population marches steadily on.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb41" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb41-1">world_metrics_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb41-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gdp_total_idx =</span> gdp_total <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> gdp_total[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>],</span>
<span id="cb41-3">         <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">population_idx =</span> population <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> population[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb41-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(year, gdp_total_idx, population_idx) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb41-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(gdp_total_idx, population_idx), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'metric'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'value'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb41-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb41-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x=</span>year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> value, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> metric)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb41-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_manual</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#6C5043FF"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#A3BAC2FF"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb41-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Year (1980 is year 1)"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Multiple of year 1"</span>, </span>
<span id="cb41-10">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Population"</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/searching-for-decoupling/searching_for_decoupling_files/figure-html/unnamed-chunk-39-1.png" class="img-fluid figure-img" width="768"></p>
</figure>
</div>
</div>
</div>
<div class="cell">
<div class="sourceCode cell-code" id="cb42" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb42-1">world_metrics_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb42-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(year <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2020</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb42-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gdp_total_idx =</span> gdp_total <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> gdp_total[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>],</span>
<span id="cb42-4">         <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">population_idx =</span> population <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> population[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb42-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(year, gdp_total_idx, population_idx) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb42-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(gdp_total_idx, population_idx), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'metric'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'value'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb42-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb42-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x=</span>year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> value, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> metric)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb42-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_manual</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#6C5043FF"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#A3BAC2FF"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb42-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Year (2020 is year 1)"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Multiple of year 1"</span>, </span>
<span id="cb42-11">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Population"</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/searching-for-decoupling/searching_for_decoupling_files/figure-html/unnamed-chunk-40-1.png" class="img-fluid figure-img" width="768"></p>
</figure>
</div>
</div>
</div>
<div class="cell">
<div class="sourceCode cell-code" id="cb43" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb43-1">country_metrics_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb43-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">between</span>(year, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1990</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2022</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb43-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(year, region23) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb43-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarise</span>(</span>
<span id="cb43-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">population =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(population, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>),</span>
<span id="cb43-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gdp_total =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(gdp_total, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span>
<span id="cb43-7">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb43-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">arrange</span>(region23, year) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb43-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(region23) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb43-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gdp_total_idx =</span> gdp_total <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> gdp_total[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>],</span>
<span id="cb43-11">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">population_idx =</span> population <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> population[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb43-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ungroup</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb43-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(year, region23, gdp_total_idx, population_idx) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb43-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(gdp_total_idx, population_idx), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'metric'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'value'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb43-15">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb43-16">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x=</span>year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> value, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> metric)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">facet_wrap</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>region23, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">scales =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'free_y'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb43-17">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_manual</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#6C5043FF"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#A3BAC2FF"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb43-18">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Year (1990 is year 1)"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Multiple of year 1"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Population"</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/searching-for-decoupling/searching_for_decoupling_files/figure-html/unnamed-chunk-41-1.png" class="img-fluid figure-img" width="1920"></p>
</figure>
</div>
</div>
</div>
</section>
<section id="parting-words" class="level2">
<h2 class="anchored" data-anchor-id="parting-words">Parting words</h2>
<p>From page 3 of the 2019 EEB study:</p>
<blockquote class="blockquote">
<p>The fact that decoupling on its own, i.e.&nbsp;without addressing the issue of economic growth, has not been and will not be sufficient to reduce environmental pressures to the required extent is not a reason to oppose decoupling (in the literal sense of separating the environmental pressures curve from the GDP curve) or the measures that achieve decoupling - on the contrary, without many such measures the situation would be far worse. It is a reason to have major concerns about the predominant focus of policymakers on green growth, this focus being based on the flawed assumption that sufficient decoupling can be achieved through increased efficiency without limiting economic production and consumption.</p>
</blockquote>
</section>
<section id="data-source-citations" class="level2">
<h2 class="anchored" data-anchor-id="data-source-citations">Data source citations</h2>
<p>Here are the citations from the various OWID data sources used as part of this post.</p>
<ul>
<li>HYDE (2023); Gapminder (2022); UN WPP (2024) – with major processing by Our World in Data. “Population” [dataset]. PBL Netherlands Environmental Assessment Agency, “History Database of the Global Environment 3.3”; Gapminder, “Population v7”; United Nations, “World Population Prospects”; Gapminder, “Systema Globalis” [original data]. Retrieved February 13, 2025 from https://ourworldindata.org/grapher/population</li>
<li>Bolt and van Zanden - Maddison Project Database 2023 – with minor processing by Our World in Data. “GDP per capita – Maddison Project Database – In constant international-$. Historical data” [dataset]. Bolt and van Zanden, “Maddison Project Database 2023” [original data]. Retrieved February 13, 2025 from https://ourworldindata.org/grapher/gdp-per-capita-maddison-project-database</li>
<li>U.S. Energy Information Administration (2023); Energy Institute - Statistical Review of World Energy (2024); Population based on various sources (2023) – with major processing by Our World in Data. “Primary energy consumption per capita” [dataset]. U.S. Energy Information Administration, “International Energy Data”; Energy Institute, “Statistical Review of World Energy”; Various sources, “Population” [original data]. Retrieved February 17, 2025 from https://ourworldindata.org/grapher/per-capita-energy-use</li>
<li>Jones et al.&nbsp;(2024); Population based on various sources (2024) – with major processing by Our World in Data. “Per capita greenhouse gas emissions including land use” [dataset]. Jones et al., “National contributions to climate change 2024.2”; Various sources, “Population” [original data]. Retrieved February 17, 2025 from https://ourworldindata.org/grapher/total-greenhouse-gas-emissions-per-capita</li>
<li>Jones et al.&nbsp;(2024); Population based on various sources (2024) – with major processing by Our World in Data. “Per capita greenhouse gas emissions from fossil fuels and industry” [dataset]. Jones et al., “National contributions to climate change 2024.2”; Various sources, “Population” [original data]. Retrieved February 13, 2025 from https://ourworldindata.org/grapher/per-capita-ghg-excl-land-use</li>
<li>Energy Institute - Statistical Review of World Energy (2024) – with major processing by Our World in Data. “Coal consumption” [dataset]. Energy Institute, “Statistical Review of World Energy” [original data].</li>
<li>USGS - Mineral Commodity Summaries (2024); USGS - Historical Statistics for Mineral and Material Commodities (2023); BGS - World Mineral Statistics (2023) – with major processing by Our World in Data. “Copper production” [dataset]. United States Geological Survey, “Mineral Commodity Summaries”; United States Geological Survey, “Historical Statistics for Mineral and Material Commodities”; British Geological Survey, “World Mineral Statistics” [original data].</li>
<li>Food and Agriculture Organization of the United Nations (via World Bank) (2025) – processed by Our World in Data. “Annual freshwater withdrawals” [dataset]. Food and Agriculture Organization of the United Nations (via World Bank), “World Development Indicators” [original data]. Retrieved February 13, 2025 from https://ourworldindata.org/grapher/annual-freshwater-withdrawals</li>
<li>Contains modified Copernicus Climate Change Service information (2025) – with major processing by Our World in Data. “Annual average” [dataset]. Contains modified Copernicus Climate Change Service information, “ERA5 monthly averaged data on single levels from 1940 to present 2” [original data].</li>
<li>Gapminder (2020); UN Inter-agency Group for Child Mortality Estimation (2024) – with major processing by Our World in Data. “Under-five mortality rate – UN IGME; Gapminder – Long-run data” [dataset]. United Nations Inter-agency Group for Child Mortality Estimation, “United Nations Inter-agency Group for Child Mortality Estimation”; Gapminder, “Child mortality rate under age five v7”; Various sources, “Population” [original data]. Retrieved February 20, 2025 from https://ourworldindata.org/grapher/child-mortality</li>
<li>UN, World Population Prospects (2024) – processed by Our World in Data. “Fertility rate, total – UN WPP” [dataset]. United Nations, “World Population Prospects” [original data]. Retrieved February 20, 2025 from https://ourworldindata.org/grapher/children-per-woman-un</li>
<li>World Bank Poverty and Inequality Platform (2024) – with major processing by Our World in Data. “Gini Coefficient – World Bank” [dataset]. World Bank Poverty and Inequality Platform, “World Bank Poverty and Inequality Platform (PIP) 20240627_2017, 20240627_2011” [original data].</li>
<li>UN WPP (2024); HMD (2024); Zijdeman et al.&nbsp;(2015); Riley (2005) – with minor processing by Our World in Data. “Life expectancy at birth – Various sources – period tables” [dataset]. Human Mortality Database, “Human Mortality Database”; United Nations, “World Population Prospects”; Zijdeman et al., “Life Expectancy at birth 2”; James C. Riley, “Estimates of Regional and Global Life Expectancy, 1800-2001” [original data]. Retrieved February 20, 2025 from https://ourworldindata.org/grapher/life-expectancy</li>
<li>U.S. Bureau of Economic Analysis via FRED®</li>
</ul>



</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-bibliography"><h2 class="anchored quarto-appendix-heading">References</h2><div id="refs" class="references csl-bib-body hanging-indent" data-entry-spacing="0">
<div id="ref-davisConsumptionbasedAccountingCO22010" class="csl-entry">
Davis, Steven J., and Ken Caldeira. 2010. <span>“Consumption-Based Accounting of <span>CO</span><span><sub>2</sub></span> Emissions.”</span> <em>Proceedings of the National Academy of Sciences</em> 107 (12): 5687–92. <a href="https://doi.org/10.1073/pnas.0906974107">https://doi.org/10.1073/pnas.0906974107</a>.
</div>
<div id="ref-parrique2019decoupling" class="csl-entry">
Parrique, Timothée, Jonathan Barth, François Briens, Christian Kerschner, Alejo Kraus-Polk, Anna Kuokkanen, and Joachim H Spangenberg. 2019. <span>“Decoupling Debunked.”</span> <em>Evidence and Arguments Against Green Growth as a Sole Strategy for Sustainability. A Study Edited by the European Environment Bureau EEB</em> 3.
</div>
<div id="ref-ritchie2024not" class="csl-entry">
Ritchie, Hannah. 2024. <em>Not the End of the World: <span>How</span> We Can Be the First Generation to Build a Sustainable Planet</em>. Random House.
</div>
<div id="ref-vaden2020decoupling" class="csl-entry">
Vadén, Tere, Ville Lähde, Antti Majava, Paavo Järvensivu, Tero Toivanen, Emma Hakala, and Jussi T Eronen. 2020. <span>“Decoupling for Ecological Sustainability: <span>A</span> Categorisation and Review of Research Literature.”</span> <em>Environmental Science &amp; Policy</em> 112: 236–44.
</div>
</div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{isken2025,
  author = {Isken, Mark},
  title = {Using {R} to Explore and Visualize Decoupling},
  date = {2025-02-20},
  url = {https://bitsofanalytics.org//posts/searching-for-decoupling/searching_for_decoupling.html},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-isken2025" class="csl-entry quarto-appendix-citeas">
Isken, Mark. 2025. <span>“Using R to Explore and Visualize
Decoupling.”</span> February 20, 2025. <a href="https://bitsofanalytics.org//posts/searching-for-decoupling/searching_for_decoupling.html">https://bitsofanalytics.org//posts/searching-for-decoupling/searching_for_decoupling.html</a>.
</div></div></section></div> ]]></description>
  <category>R</category>
  <category>ecology</category>
  <category>economics</category>
  <guid>https://bitsofanalytics.org/posts/searching-for-decoupling/searching_for_decoupling.html</guid>
  <pubDate>Thu, 20 Feb 2025 05:00:00 GMT</pubDate>
</item>
<item>
  <title>Plotting Great Lakes water level data using R - a 2025 update</title>
  <dc:creator>Mark Isken</dc:creator>
  <link>https://bitsofanalytics.org/posts/great-lakes-water-levels-2024/get_plot_gl_water_levels.html</link>
  <description><![CDATA[ 




<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Note
</div>
</div>
<div class="callout-body-container callout-body">
<p>This post is an update of a post originally done back in 2018. The location and format of the downloadable data have changed.</p>
</div>
</div>
<p>Over the past few years, we have seen record or near-record high water levels in all of the Great Lakes. Recently, the levels have dropped. While plots of historical water levels are available on the web, I thought it would be interesting to create my own plots using R. In this post I will summarize the steps for using R to:</p>
<ul>
<li>download raw historical water level data for the Great Lakes using a few different methods,</li>
<li>process the raw data to get it into shape for time series plots,</li>
<li>create time series plots of monthly water levels using <code>ggplot</code>.</li>
</ul>
<section id="web-resources-related-to-great-lakes-water-levels" class="level2">
<h2 class="anchored" data-anchor-id="web-resources-related-to-great-lakes-water-levels">Web resources related to Great Lakes water levels</h2>
<p>There are a number of good resources for learning about Great Lakes water level monitoring. A few that I’ve found useful as I’ve started to explore this topic include the following. Many of these links have changed since the original 2018 post.</p>
<section id="noaa---great-lakes-environmental-research-laboratory" class="level3">
<h3 class="anchored" data-anchor-id="noaa---great-lakes-environmental-research-laboratory">NOAA - Great Lakes Environmental Research Laboratory</h3>
<ul>
<li>Main - <a href="https://www.glerl.noaa.gov/" class="uri">https://www.glerl.noaa.gov/</a></li>
<li>Water levels - <a href="https://www.glerl.noaa.gov/data/wlevels/" class="uri">https://www.glerl.noaa.gov/data/wlevels/</a>
<ul>
<li>tabs for monitoring network, observations and forecasts</li>
</ul></li>
<li>Dashboard - <a href="https://www.glerl.noaa.gov/data/wlevels/dashboard/" class="uri">https://www.glerl.noaa.gov/data/wlevels/dashboard/</a></li>
<li>Data - <a href="https://www.glerl.noaa.gov/data/" class="uri">https://www.glerl.noaa.gov/data/</a></li>
<li>APIs - <a href="https://tidesandcurrents.noaa.gov/web_services_info.html" class="uri">https://tidesandcurrents.noaa.gov/web_services_info.html</a></li>
<li>Water levels and hydrology - <a href="https://www.glerl.noaa.gov/data/#hydro" class="uri">https://www.glerl.noaa.gov/data/#hydro</a></li>
<li>Tides and Currents - Water levels - <a href="https://tidesandcurrents.noaa.gov/water_level_info.html" class="uri">https://tidesandcurrents.noaa.gov/water_level_info.html</a></li>
<li>Monthly hydrological data - <a href="https://www.glerl.noaa.gov/ahps/mnth-hydro.html" class="uri">https://www.glerl.noaa.gov/ahps/mnth-hydro.html</a></li>
<li>Large Lake Statistical Water Balance Model (L2SWBM) - <a href="https://github.com/NOAA-GLERL/L2SWBM" class="uri">https://github.com/NOAA-GLERL/L2SWBM</a></li>
<li>Great Lakes Seasonal Hydrological Forecasting Products (Experimental) <a href="https://www.glerl.noaa.gov/ahps/curfcst.html" class="uri">https://www.glerl.noaa.gov/ahps/curfcst.html</a></li>
<li>Great Lakes water budget - <a href="https://www.glerl.noaa.gov/pubs/brochures/GreatLakesWaterBudgets.png" class="uri">https://www.glerl.noaa.gov/pubs/brochures/GreatLakesWaterBudgets.png</a></li>
</ul>
</section>
<section id="other-resources" class="level3">
<h3 class="anchored" data-anchor-id="other-resources">Other resources</h3>
<ul>
<li><a href="https://www.lrd.usace.army.mil/detroit/">US Army Corps of Engineers - Great Lakes Information</a></li>
<li><a href="http://www.greatlakescc.org/wp36/">Great Lakes Coordination Committee (US and CA)</a></li>
<li><a href="https://www.tides.gc.ca/en/tides-and-water-levels-data-archive">Fisheries and Oceans CA</a> - observational data and forecasts</li>
</ul>
</section>
</section>
<section id="getting-monthly-water-level-data" class="level2">
<h2 class="anchored" data-anchor-id="getting-monthly-water-level-data">Getting monthly water level data</h2>
<p>We will need a few libraries.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(readr)</span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(lubridate)</span>
<span id="cb1-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(stringr)</span>
<span id="cb1-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyr)</span>
<span id="cb1-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(dplyr)</span>
<span id="cb1-6"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(ggplot2)</span></code></pre></div>
</div>
<p>You can download CSV files from the US Army Corps of Engineers containing monthly water levels for all of the Great Lakes at the following location:</p>
<ul>
<li><a href="https://www.lrd.usace.army.mil/Water-Information/Water-Management/Great-Lakes-and-Harbors/Water-Level-Data/" class="uri">https://www.lrd.usace.army.mil/Water-Information/Water-Management/Great-Lakes-and-Harbors/Water-Level-Data/</a></li>
</ul>
<p>That page has buttons for downloading long-term monthly mean water levels in either English or metric units. The direct links are:</p>
<ul>
<li><a href="https://lre-wm.usace.army.mil/ForecastData/WaterLevelData/GLHYD_data_english.csv" class="uri">https://lre-wm.usace.army.mil/ForecastData/WaterLevelData/GLHYD_data_english.csv</a></li>
<li><a href="https://lre-wm.usace.army.mil/ForecastData/WaterLevelData/GLHYD_data_metric.csv" class="uri">https://lre-wm.usace.army.mil/ForecastData/WaterLevelData/GLHYD_data_metric.csv</a></li>
</ul>
<p>I’ve already downloaded the metric file to a local folder to avoid repeated downloads while writing this post.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#data_raw_loc &lt;- "https://lre-wm.usace.army.mil/ForecastData/WaterLevelData/GLHYD_data_metric.csv"</span></span>
<span id="cb2-2">data_raw_loc <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"data_water_level/raw/GLHYD_data_metric.csv"</span></span>
<span id="cb2-3">data_loc <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"data_water_level"</span></span></code></pre></div>
</div>
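<p>If you would rather fetch the file on demand, one option is a small helper that downloads only when no local copy exists. This is just a sketch: <code>fetch_if_missing</code> is a hypothetical helper name of my own, not a function from any package, and it assumes the metric CSV link listed above is still valid.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hypothetical helper: download `url` to `dest` only if `dest` is not already on disk
fetch_if_missing &lt;- function(url, dest) {
  if (!file.exists(dest)) {
    # Create the target folder (and any parents) if needed
    dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
    download.file(url, destfile = dest, mode = "wb")
  }
  # Return the local path so it can be passed straight to a reader function
  dest
}

# Example call (commented out so that nothing is downloaded by accident):
# data_raw_loc &lt;- fetch_if_missing(
#   "https://lre-wm.usace.army.mil/ForecastData/WaterLevelData/GLHYD_data_metric.csv",
#   "data_water_level/raw/GLHYD_data_metric.csv"
# )</code></pre></div>
</div>
<p>Because the helper returns the local path, the rest of the post works the same whether the file was just downloaded or was already there.</p>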
<p>Before reading it into an R dataframe, let’s look at the file format.</p>
<pre><code># Coordinated Monthly Mean Lakewide Average Water Levels,,,,,,
# Period of record: 1918-2023,,,,,,
"# Units: meters, IGLD 1985",,,,,,
"# Calculated using the coordinated gage network, consisting of:",,,,,,
"# Superior: Marquette and Point Iroquois, MI; Duluth, MN; Michipicoten and Thunder Bay, Ontario",,,,,,
"# Michigan-Huron: Harbor Beach, Mackinaw City and Ludington, MI; Milwaukee, WI; Thessalon and Tobermory, Ontario",,,,,,
"# St. Clair: St. Clair Shores, MI and Belle River, Ontario",,,,,,
"# Erie: Toledo and Cleveland, OH; Port Stanley and Port Colborne, Ontario",,,,,,
"# Ontario: Oswego and Rochester, NY; Cobourg, Port Weller, Toronto, and Kingston, Ontario",,,,,,
#,,,,,,
# Last modified March 2024 Contact: Deanna.C.Fielder@usace.army.mil,,,,,,
#,,,,,,
month,year,Superior,Michigan-Huron,St. Clair,Erie,Ontario
jan,1918,183.25,176.71,174.59,173.9,74.74
feb,1918,183.2,176.73,174.74,173.82,74.72
mar,1918,183.17,176.8,174.74,174.01,74.92
apr,1918,183.14,176.89,174.84,174.02,75.1
may,1918,183.22,176.99,175,173.98,75.09
jun,1918,183.34,177.07,175.14,174.1,75.06
jul,1918,183.4,177.07,175.17,174.12,74.99</code></pre>
<p>A few things to note:</p>
<ul>
<li>The top twelve lines are metadata,</li>
<li>Data goes back to 1918,</li>
<li>Data is in <em>wide format</em> in that each lake is in its own column.</li>
</ul>
<p>I use the <code>readr</code> package to read in this CSV file, skipping the twelve metadata lines. Notice that there is a single column for Lakes Michigan and Huron, as they are actually just one big lake that we divide at the Straits of Mackinac; see <a href="https://www.glerl.noaa.gov/res/straits/" class="uri">https://www.glerl.noaa.gov/res/straits/</a>.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1">mean_lake_level_raw <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_csv</span>(data_raw_loc, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">skip =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>)</span></code></pre></div>
</div>
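<p>The hard-coded <code>skip = 12</code> depends on the number of metadata lines staying fixed. As a rough sketch, one could locate the header row instead of counting by hand. The file contents below are a shortened stand-in for the real download, and <code>sample_lines</code>, <code>header_row</code> and <code>sample_raw</code> are names introduced only for this illustration.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(readr)

# Shortened stand-in for the downloaded file (not the full data)
sample_lines &lt;- c(
  "# Coordinated Monthly Mean Lakewide Average Water Levels,,,,,,",
  "\"# Units: meters, IGLD 1985\",,,,,,",
  "#,,,,,,",
  "month,year,Superior,Michigan-Huron,St. Clair,Erie,Ontario",
  "jan,1918,183.25,176.71,174.59,173.9,74.74",
  "feb,1918,183.2,176.73,174.74,173.82,74.72"
)
tmp &lt;- tempfile(fileext = ".csv")
write_lines(sample_lines, tmp)

# Find the first line that starts with the known header text
header_row &lt;- which(startsWith(read_lines(tmp), "month,year"))[1]

# Skip everything before the header row
sample_raw &lt;- read_csv(tmp, skip = header_row - 1, show_col_types = FALSE)</code></pre></div>
</div>
<p>This assumes the header always begins with <code>month,year</code>; if the provider renames those columns, the search string would need to change too.</p>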
</section>
<section id="data-reshaping" class="level2">
<h2 class="anchored" data-anchor-id="data-reshaping">Data reshaping</h2>
<p>To facilitate plotting with <code>ggplot2</code>, we need to reshape this data into <em>long format</em>. Each row will be a single monthly reading for one lake, identified by the existing <code>month</code> and <code>year</code> columns plus a new <code>lake</code> column. Let’s use the <code>tidyr::pivot_longer</code> function.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1">mean_lake_level_long <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(mean_lake_level_raw, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lake"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"level_m"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span>)</span>
<span id="cb5-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rm</span>(mean_lake_level_raw)</span></code></pre></div>
</div>
<p>Take a peek at the data in long format.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">head</span>(mean_lake_level_long)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 6 × 4
  month  year lake           level_m
  &lt;chr&gt; &lt;dbl&gt; &lt;chr&gt;            &lt;dbl&gt;
1 jan    1918 Superior         183. 
2 jan    1918 Michigan-Huron   177. 
3 jan    1918 St. Clair        175. 
4 jan    1918 Erie             174. 
5 jan    1918 Ontario           74.7
6 feb    1918 Superior         183. </code></pre>
</div>
</div>
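<p>The reshaping step above selects the lake columns by position (<code>3:7</code>). A possible alternative, sketched here on a tiny made-up dataframe, is to name the identifier columns and pivot everything else, so the call keeps working if a lake column is added or reordered. The <code>wide</code> and <code>long</code> objects are for illustration only.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(tidyr)

# Tiny stand-in for the wide table (two lakes only)
wide &lt;- data.frame(
  month = c("jan", "feb"),
  year = c(1918, 1918),
  Superior = c(183.25, 183.2),
  Erie = c(173.9, 173.82)
)

# Pivot every column except the identifiers, whatever their positions
long &lt;- pivot_longer(wide, cols = !c(month, year),
                     names_to = "lake", values_to = "level_m")</code></pre></div>
</div>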
<p>The months are three-character strings. Let’s create a date column that we can use for joining the individual tables as well as for proper sorting.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1">mean_lake_level_long<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>date <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.POSIXct</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_c</span>(mean_lake_level_long<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>month, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">" 1, "</span>,</span>
<span id="cb8-2">                                                mean_lake_level_long<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>year),</span>
<span id="cb8-3">                                          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">format=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"%b %d, %Y"</span>)</span>
<span id="cb8-4"></span>
<span id="cb8-5">mean_lake_level_long <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> mean_lake_level_long <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb8-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">arrange</span>(lake, date)</span></code></pre></div>
</div>
<p>Check out our work.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">head</span>(mean_lake_level_long)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 6 × 5
  month  year lake  level_m date               
  &lt;chr&gt; &lt;dbl&gt; &lt;chr&gt;   &lt;dbl&gt; &lt;dttm&gt;             
1 jan    1918 Erie     174. 1918-01-01 00:00:00
2 feb    1918 Erie     174. 1918-02-01 00:00:00
3 mar    1918 Erie     174. 1918-03-01 00:00:00
4 apr    1918 Erie     174. 1918-04-01 00:00:00
5 may    1918 Erie     174. 1918-05-01 00:00:00
6 jun    1918 Erie     174. 1918-06-01 00:00:00</code></pre>
</div>
</div>
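<p>The <code>as.POSIXct()</code> call above produces date-times with a midnight time component and a time zone. If only the calendar date matters, a plain <code>Date</code> column may be enough. The sketch below is one way to build it without relying on locale-dependent <code>%b</code> parsing; the <code>months</code>, <code>years</code> and <code>dates</code> vectors are toy values, not the real data.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Toy month abbreviations and years
months &lt;- c("jan", "Feb", "dec")
years &lt;- c(1918, 1918, 2024)

# Map abbreviations to month numbers using the built-in English month.abb
month_num &lt;- match(tolower(months), tolower(month.abb))

# Build first-of-month Date values (no time of day, no time zone)
dates &lt;- as.Date(sprintf("%d-%02d-01", as.integer(years), month_num))</code></pre></div>
</div>
<p>Because the match is case-insensitive, the same approach would handle both <code>jan</code> and <code>Jan</code> style abbreviations.</p>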
<section id="alternative-data-source-from-glcc" class="level3">
<h3 class="anchored" data-anchor-id="alternative-data-source-from-glcc">Alternative data source from GLCC</h3>
<p>The <a href="http://www.greatlakescc.org/wp36/">Great Lakes Coordination Committee (US and CA)</a> does what it sounds like it does.</p>
<blockquote class="blockquote">
<p>The Coordinating Committee on Great Lakes Basic Hydraulic and Hydrologic Data (Coordinating Committee) is a collaboration of the Governments of the United States and Canada for the purpose of agreeing upon the basic hydraulic, hydrologic and vertical control data that is required to manage the Great Lakes and St.&nbsp;Lawrence River.</p>
</blockquote>
<p>They also maintain a repository of Great Lakes water level data at <a href="https://www.greatlakescc.org/en/coordinating-committee-products-and-datasets/" class="uri">https://www.greatlakescc.org/en/coordinating-committee-products-and-datasets/</a>. There are CSV files stored on Google Drive for each lake containing mean monthly water levels since 1918. These are in a different wide format than the data above. Let’s look at the data for Lake Michigan-Huron and compare to the data we just downloaded. Here is what the raw data looks like:</p>
<pre><code># NAME: LAKE MICHIGAN-HURON MONTHLY MEAN LEVELS 
# UNITS: metres (m) (DATUM: International Great Lakes Datum of 1985) 
# PERIOD OF RECORD: 1918 TO 2024
# DESCRIPTION: Water levels are measured by the National Oceanic and Atmospheric Administration and the  
# Canadian Department of Fisheries and Oceans.  Mean levels are computed as an average
# of a network of gauges and coordinated under the auspices of the Coordinating Committee on 
# Great Lakes Basic Hydraulic and Hydrologic Data      www.greatlakescc.org 
# # All data for the previous 12 months should be considered provisional. 
# note: -9999 means that no data is available for that date 
Year, Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec 
1918,176.71,176.73,176.8,176.89,176.99,177.07,177.07,177.01,176.94,176.83,176.82,176.78 
1919,176.74,176.68,176.68,176.77,176.89,176.92,176.88,176.83,176.72,176.66,176.62,176.55 
1920,176.5,176.47,176.49,176.63,176.68,176.73,176.79,176.76,176.73,176.65,176.57,176.5 
1921,176.44,176.42,176.42,176.54,176.63,176.64,176.6,176.54,176.5,176.44,176.35,176.34 
1922,176.27,176.24,176.28,176.43,176.57,176.63,176.67,176.63,176.56,176.46,176.35,176.25 </code></pre>
<p>A few things to note:</p>
<ul>
<li>The top nine lines are metadata,</li>
<li>Data goes back to 1918,</li>
<li>Data is in <em>wide format</em> in that each <strong>month</strong> (instead of lake) is in its own column.</li>
</ul>
<p>We need to combine the separate lake files and do some reshaping and column manipulation. Again, I’ve already downloaded the five data files to a local folder.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1">data_cc_raw_loc <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"data_water_level/raw/"</span></span>
<span id="cb12-2"></span>
<span id="cb12-3">gl_data_raw <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>()</span>
<span id="cb12-4">gl_data_long <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>()</span>
<span id="cb12-5"></span>
<span id="cb12-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># regex pattern for filename matching</span></span>
<span id="cb12-7">file_pattern <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\\</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">w+_MonthlyMeanWaterLevels_1918to2024</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\\</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">.csv$"</span></span>
<span id="cb12-8">gl_files <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list.files</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">path =</span> data_cc_raw_loc, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">pattern =</span> file_pattern)</span>
<span id="cb12-9"></span>
<span id="cb12-10"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Loop over the list of matched filenames</span></span>
<span id="cb12-11"><span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> (i <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">seq_along</span>(gl_files)) {</span>
<span id="cb12-12">    filename <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(data_cc_raw_loc, gl_files[i])</span>
<span id="cb12-13">    gl_data_raw[[i]] <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_csv</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">file =</span> filename, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">skip =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">9</span>)</span>
<span id="cb12-14">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Which lake?</span></span>
<span id="cb12-15">    lake <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_match</span>(filename, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"(Lake</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\\</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">w+)_Month"</span>)[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>]</span>
<span id="cb12-16">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Pivot to long format</span></span>
<span id="cb12-17">    gl_data_long[[i]] <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(gl_data_raw[[i]], <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"month"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"level_m"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">13</span>)</span>
<span id="cb12-18">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add a lake identifier column</span></span>
<span id="cb12-19">    gl_data_long[[i]]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>lake <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> lake</span>
<span id="cb12-20">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add date based on month and year</span></span>
<span id="cb12-21">    gl_data_long[[i]]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>date <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.POSIXct</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_c</span>(gl_data_long[[i]]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>month, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">" 1, "</span>,</span>
<span id="cb12-22">                                               gl_data_long[[i]]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>Year),</span>
<span id="cb12-23">                                         <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">format=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"%b %d, %Y"</span>)</span>
<span id="cb12-24"></span>
<span id="cb12-25">}</span>
<span id="cb12-26"></span>
<span id="cb12-27"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Bind all the rows together from the list of dataframes</span></span>
<span id="cb12-28">mean_lake_level_cc_long <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">bind_rows</span>(gl_data_long)</span>
<span id="cb12-29"></span>
<span id="cb12-30"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Column renaming and factor setting</span></span>
<span id="cb12-31">mean_lake_level_cc_long <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> mean_lake_level_cc_long <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb12-32">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rename</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">year =</span> Year)</span>
<span id="cb12-33">mean_lake_level_cc_long<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>lake <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">factor</span>(mean_lake_level_cc_long<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>lake, </span>
<span id="cb12-34">                                       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">levels =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LakeSuperior"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LakeMichiganHuron"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LakeStClair"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LakeErie"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LakeOntario"</span>))</span></code></pre></div>
</div>
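<p>The file header notes that <code>-9999</code> marks months with no data. If such sentinel values do occur in a file, <code>read_csv()</code> would read them as ordinary numbers unless told otherwise. A minimal sketch, using a simplified made-up sample rather than the real files, of passing the sentinel through the <code>na</code> argument:</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(readr)

# Simplified stand-in for one lake file (two metadata lines, three months)
sample_cc &lt;- I("# NAME: SAMPLE MONTHLY MEAN LEVELS
# note: -9999 means that no data is available for that date
Year,Jan,Feb,Mar
1918,176.71,176.73,176.8
2024,176.2,-9999,-9999
")

# Treat -9999 as missing while reading
cc_raw &lt;- read_csv(sample_cc, skip = 2, na = c("", "NA", "-9999"),
                   show_col_types = FALSE)</code></pre></div>
</div>
<p>With the sentinel converted to <code>NA</code> at read time, later steps that use <code>na.rm = TRUE</code> or <code>is.na()</code> would ignore those months rather than treating them as very low water levels.</p>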
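<p>This data goes through 2024, whereas the data from the NOAA site hasn’t been updated yet for 2024. Let’s check whether the two sources agree through 2023. The comparison is positional, so it relies on the rows of both tables being in the same order (by lake, then by date).</p>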
<div class="cell">
<div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1">level_m_noaa <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> mean_lake_level_long <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb13-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(level_m) </span>
<span id="cb13-3"></span>
<span id="cb13-4">level_m_cc <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> mean_lake_level_cc_long <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb13-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(year <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2024</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb13-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(level_m)</span>
<span id="cb13-7"></span>
<span id="cb13-8">max_abs_diff <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">max</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">abs</span>(level_m_noaa<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>level_m <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> level_m_cc<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>level_m), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span>
<span id="cb13-9">assertthat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">assert_that</span>(max_abs_diff <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] TRUE</code></pre>
</div>
</div>
<p>This confirms that the data from the two sources is identical through 2023.</p>
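<p>The check above compares the two <code>level_m</code> columns position by position. A more defensive variant, sketched here on toy data, joins the tables on lake and date first so that row order no longer matters. The <code>norm_lake()</code> helper and the toy dataframes <code>a</code> and <code>b</code> are assumptions made for illustration; they are not part of the original analysis.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(dplyr)

# Toy stand-ins for the two long tables, deliberately in different row order
a &lt;- data.frame(
  lake = c("Erie", "Erie", "Michigan-Huron"),
  date = as.Date(c("1918-01-01", "1918-02-01", "1918-01-01")),
  level_m = c(173.90, 173.82, 176.71)
)
b &lt;- data.frame(
  lake = c("LakeMichiganHuron", "LakeErie", "LakeErie"),
  date = as.Date(c("1918-01-01", "1918-02-01", "1918-01-01")),
  level_m = c(176.71, 173.82, 173.90)
)

# Hypothetical helper: strip a leading "Lake" and any non-letters
# so that "Michigan-Huron" and "LakeMichiganHuron" share one key
norm_lake &lt;- function(x) gsub("^Lake|[^A-Za-z]", "", x)

# Join on the normalized lake key and the date, then compare levels
cmp &lt;- inner_join(
  mutate(a, lake_key = norm_lake(lake)),
  mutate(b, lake_key = norm_lake(lake)),
  by = c("lake_key", "date"),
  suffix = c("_a", "_b")
)
max_abs_diff_joined &lt;- max(abs(cmp$level_m_a - cmp$level_m_b), na.rm = TRUE)</code></pre></div>
</div>
<p>Checking that <code>nrow(cmp)</code> equals the number of rows in each input would also guard against silently dropped, unmatched rows.</p>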
</section>
</section>
<section id="lake-level-plotting" class="level2">
<h2 class="anchored" data-anchor-id="lake-level-plotting">Lake level plotting</h2>
<p>If we plot all the lake level time series on one plot, we can see the differences in levels between lakes but the intra-lake variation is hidden by the scale.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb15-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(mean_lake_level_cc_long) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x=</span>date, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> level_m, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">colour =</span> lake)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb15-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ylab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Water level (m)"</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/great-lakes-water-levels-2024/get_plot_gl_water_levels_files/figure-html/unnamed-chunk-10-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>Let’s find the latest month for which we have data, then save a version of this dataframe tagged with that month. Note the use of <code>pull()</code> to convert the resulting 1x1 dataframe into a single value.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb16-1">last_month_d <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> mean_lake_level_cc_long <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb16-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is.na</span>(level_m)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb16-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(date) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb16-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">whichmonth =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">max</span>(date)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb16-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pull</span>()</span>
<span id="cb16-6"></span>
<span id="cb16-7">last_month_d</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] "2024-12-01 EST"</code></pre>
</div>
</div>
<p>Save as an rds file.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb18-1">rdsname <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">file.path</span>(data_loc, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lake_level_"</span>, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">year</span>(last_month_d), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">strftime</span>(last_month_d,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"%m"</span>), <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">".rds"</span>))</span>
<span id="cb18-2"></span>
<span id="cb18-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">saveRDS</span>(mean_lake_level_cc_long, rdsname)</span></code></pre></div>
</div>
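<p>The year-and-month tag in the file name can also be built with a single base R <code>format()</code> call. A small sketch, with a hard-coded date standing in for <code>last_month_d</code> and with <code>last_month_example</code> and <code>rdsname_example</code> as illustrative names:</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Stand-in for last_month_d
last_month_example &lt;- as.POSIXct("2024-12-01", tz = "UTC")

# "%Y%m" gives a four-digit year followed by a two-digit month
rdsname_example &lt;- file.path(
  "data_water_level",
  paste0("lake_level_", format(last_month_example, "%Y%m"), ".rds")
)</code></pre></div>
</div>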
</section>
<section id="plotting-monthly-water-level-data" class="level2">
<h2 class="anchored" data-anchor-id="plotting-monthly-water-level-data">Plotting monthly water level data</h2>
<p>Let’s compute a bunch of overall historical statistics and include a few as reference lines on the plots. I will use <code>dplyr</code> for the stats.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb19-1">lake_level_stats <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> mean_lake_level_cc_long <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb19-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(lake) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb19-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(</span>
<span id="cb19-4">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mean_level =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(level_m, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>),</span>
<span id="cb19-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">min_level =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">min</span>(level_m, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>),</span>
<span id="cb19-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">max_level =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">max</span>(level_m, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>),</span>
<span id="cb19-7">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">p05_level =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">quantile</span>(level_m, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.05</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>),</span>
<span id="cb19-8">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">p95_level =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">quantile</span>(level_m, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.95</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>),</span>
<span id="cb19-9">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sd_level =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sd</span>(level_m, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>),</span>
<span id="cb19-10">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cv_level =</span> sd_level <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> mean_level</span>
<span id="cb19-11">  )</span>
<span id="cb19-12"></span>
<span id="cb19-13">lake_level_stats</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 5 × 8
  lake      mean_level min_level max_level p05_level p95_level sd_level cv_level
  &lt;fct&gt;          &lt;dbl&gt;     &lt;dbl&gt;     &lt;dbl&gt;     &lt;dbl&gt;     &lt;dbl&gt;    &lt;dbl&gt;    &lt;dbl&gt;
1 LakeSupe…      183.      183.      184.      183.      184.     0.203  0.00110
2 LakeMich…      176.      176.      178.      176.      177.     0.406  0.00230
3 LakeStCl…      175.      174.      176.      174.      176.     0.395  0.00226
4 LakeErie       174.      173.      175.      174.      175.     0.369  0.00212
5 LakeOnta…       74.8      73.7      75.9      74.2      75.3    0.343  0.00459</code></pre>
</div>
</div>
<p>Let’s try to combine the time series with reference lines from the stats dataframe just created. There are a few approaches to doing this. One way is to add <code>geom_hline()</code> layers to the plot. See the following posts for some good ideas on using <code>geom_hline()</code>, adding the reference lines to the legend, controlling their style and color, and controlling the order of plots in a faceted grid.</p>
<ul>
<li><a href="https://stackoverflow.com/questions/11846295/how-to-add-different-lines-for-facets" class="uri">https://stackoverflow.com/questions/11846295/how-to-add-different-lines-for-facets</a></li>
<li><a href="https://forum.posit.co/t/scale-linetype-manual-not-working/70225" class="uri">https://forum.posit.co/t/scale-linetype-manual-not-working/70225</a></li>
<li><a href="https://stackoverflow.com/questions/75882636/how-to-specify-geom-hline-color-and-linetype-in-ggplot-legend" class="uri">https://stackoverflow.com/questions/75882636/how-to-specify-geom-hline-color-and-linetype-in-ggplot-legend</a></li>
<li><a href="https://stackoverflow.com/questions/75450315/r-facet-grid-order-not-following-levels-with-plotmath" class="uri">https://stackoverflow.com/questions/75450315/r-facet-grid-order-not-following-levels-with-plotmath</a></li>
</ul>
<div class="cell">
<div class="sourceCode cell-code" id="cb21" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb21-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create a vector of line style assignments</span></span>
<span id="cb21-2">LINES <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"mean"</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"solid"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"5%"</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"dotted"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"95%"</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"dotted"</span>)</span>
<span id="cb21-3"></span>
<span id="cb21-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Need to explicitly specify order in facet_grid even though levels already set in dataframe</span></span>
<span id="cb21-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(mean_lake_level_cc_long, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> date, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> level_m, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">colour =</span> lake)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb21-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">facet_grid</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">factor</span>(lake, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">levels =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LakeSuperior"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LakeMichiganHuron"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LakeStClair"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LakeErie"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LakeOntario"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> ., <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">scales =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"free"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb21-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_hline</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> lake_level_stats, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">linetype =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"mean"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">yintercept=</span>mean_level)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb21-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_hline</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> lake_level_stats, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">linetype =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"5%"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">yintercept=</span>p05_level)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb21-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_hline</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> lake_level_stats, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">linetype =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"95%"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">yintercept=</span>p95_level)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb21-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb21-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_linetype_manual</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> LINES) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb21-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ylab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Lake Level (m)"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb21-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">strip.text.y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">16</span>),</span>
<span id="cb21-14">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.text.x =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">14</span>),</span>
<span id="cb21-15">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.text.y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>),</span>
<span id="cb21-16">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.title.x =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">18</span>),</span>
<span id="cb21-17">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.title.y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">18</span>))</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/great-lakes-water-levels-2024/get_plot_gl_water_levels_files/figure-html/unnamed-chunk-14-1.png" class="img-fluid figure-img" width="960"></p>
</figure>
</div>
</div>
</div>
<p>Wow, the lakes are back below their historic (2018 on) mean level. What are the driving factors behind the decrease from the recent peak period?</p>
<ul>
<li>less precipitation?</li>
<li>more evaporation?</li>
<li>less basin runoff?</li>
<li>human controls?</li>
<li>thermal contraction?</li>
<li>some combination?</li>
</ul>


</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{isken2025,
  author = {Isken, Mark},
  title = {Plotting {Great} {Lakes} Water Level Data Using {R} - a 2025
    Update},
  date = {2025-02-09},
  url = {https://bitsofanalytics.org//posts/great-lakes-water-levels-2024/get_plot_gl_water_levels.html},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-isken2025" class="csl-entry quarto-appendix-citeas">
Isken, Mark. 2025. <span>“Plotting Great Lakes Water Level Data Using R
- a 2025 Update.”</span> February 9, 2025. <a href="https://bitsofanalytics.org//posts/great-lakes-water-levels-2024/get_plot_gl_water_levels.html">https://bitsofanalytics.org//posts/great-lakes-water-levels-2024/get_plot_gl_water_levels.html</a>.
</div></div></section></div> ]]></description>
  <category>R</category>
  <category>ecology</category>
  <guid>https://bitsofanalytics.org/posts/great-lakes-water-levels-2024/get_plot_gl_water_levels.html</guid>
  <pubDate>Sun, 09 Feb 2025 05:00:00 GMT</pubDate>
  <media:content url="https://bitsofanalytics.org/posts/great-lakes-water-levels-2024/R_logo.svg.png" medium="image" type="image/png"/>
</item>
<item>
  <title>Computing daily averages from transaction data using LLMs can be tricky - Part 1: Claude</title>
  <dc:creator>Mark Isken</dc:creator>
  <link>https://bitsofanalytics.org/posts/llms-cycleshare-part1/daily_averages_cycleshare_part1.html</link>
  <description><![CDATA[ 




<section id="background" class="level2">
<h2 class="anchored" data-anchor-id="background">Background</h2>
<p>A while back, I did a few blog posts on some challenges in properly computing statistics from bike share transaction data, such as the mean and 95th percentile of the number of bikes rented by day of week and hour of day (or any other time bin).</p>
<ul>
<li><a href="https://bitsofanalytics.org/posts/daily-averages-cycleshare-part1/daily_averages_cycleshare_part1">https://bitsofanalytics.org/posts/daily-averages-cycleshare-part1/daily_averages_cycleshare_part1</a></li>
<li><a href="https://bitsofanalytics.org/posts/daily-averages-cycleshare-part2/daily_averages_cycleshare_part2">https://bitsofanalytics.org/posts/daily-averages-cycleshare-part2/daily_averages_cycleshare_part2</a></li>
</ul>
<p>When ChatGPT was launched a few years ago, this was one of the first things I tried and the results were horrifyingly wrong. Sometimes I’ll assign a similar problem in my classes using either cycle share or airline flight data. For example, I’ll ask students to compute the average number of departing flights from DTW by day of week. Several students used an LLM and confidently reported that there were fewer than 10 flights per day on most days of the week out of DTW. Umm…</p>
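<p>To make the kind of computation at stake concrete, here is a minimal sketch of one way to compute such averages: count events per calendar date first, add dates with no events as explicit zeros, and only then summarize by day of week. The toy dataframe and the names <code>rentals</code>, <code>daily_counts</code>, and <code>stats_by_dow</code> are made up for illustration; this is not the Pronto data and not necessarily the exact approach from the earlier posts.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python">import pandas as pd

# Hypothetical toy data: one row per rental, not the Pronto dataset
rentals = pd.DataFrame({
    "start_rental": pd.to_datetime([
        "2024-01-01 08:00", "2024-01-01 09:30", "2024-01-01 17:15",  # Monday
        "2024-01-03 12:00",                                          # Wednesday
        "2024-01-08 08:10", "2024-01-08 18:45",                      # Monday
    ])
})

# Step 1: count rentals on each calendar date
daily_counts = rentals.groupby(rentals["start_rental"].dt.normalize()).size()

# Step 2: add the dates with no rentals as explicit zeros
all_dates = pd.date_range(daily_counts.index.min(), daily_counts.index.max(), freq="D")
daily_counts = daily_counts.reindex(all_dates, fill_value=0)

# Step 3: summarize the daily counts by day of week
stats_by_dow = daily_counts.groupby(daily_counts.index.day_name()).agg(
    mean_rentals="mean",
    p95_rentals=lambda s: s.quantile(0.95),
)
print(stats_by_dow)
</code></pre></div>
<p>In this toy example the two Mondays have 3 and 2 rentals, so the Monday mean is 2.5, and the days with no rentals at all show up with a mean of 0 instead of silently disappearing from the result.</p>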
<p>I fully expected LLMs to improve and figured now was a good time to see how a few of them might do on this type of problem.</p>
<p>We’ll use the <code>trip.csv</code> data file from the <a href="https://www.kaggle.com/pronto/cycle-share-dataset">Pronto Cycleshare Dataset</a>. Let’s explore the file a bit.</p>
<div id="cell-2" class="cell" data-execution_count="1">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pandas <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> pd</span>
<span id="cb1-2"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(pd.__version__)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>2.2.3</code></pre>
</div>
</div>
<div id="cell-3" class="cell" data-execution_count="2">
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1">trip <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.read_csv(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'data/trip.csv'</span>, parse_dates <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'starttime'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'stoptime'</span>])</span></code></pre></div>
</div>
<div id="cell-4" class="cell" data-execution_count="3">
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1">trip.head()</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="3">
<div>
<div>


<table class="dataframe table table-sm table-striped small" data-quarto-postprocess="true" data-border="1">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th"></th>
<th data-quarto-table-cell-role="th">trip_id</th>
<th data-quarto-table-cell-role="th">starttime</th>
<th data-quarto-table-cell-role="th">stoptime</th>
<th data-quarto-table-cell-role="th">bikeid</th>
<th data-quarto-table-cell-role="th">tripduration</th>
<th data-quarto-table-cell-role="th">from_station_name</th>
<th data-quarto-table-cell-role="th">to_station_name</th>
<th data-quarto-table-cell-role="th">from_station_id</th>
<th data-quarto-table-cell-role="th">to_station_id</th>
<th data-quarto-table-cell-role="th">usertype</th>
<th data-quarto-table-cell-role="th">gender</th>
<th data-quarto-table-cell-role="th">birthyear</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td data-quarto-table-cell-role="th">0</td>
<td>431</td>
<td>2014-10-13 10:31:00</td>
<td>2014-10-13 10:48:00</td>
<td>SEA00298</td>
<td>985.935</td>
<td>2nd Ave &amp; Spring St</td>
<td>Occidental Park / Occidental Ave S &amp; S Washing...</td>
<td>CBD-06</td>
<td>PS-04</td>
<td>Member</td>
<td>Male</td>
<td>1960.0</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">1</td>
<td>432</td>
<td>2014-10-13 10:32:00</td>
<td>2014-10-13 10:48:00</td>
<td>SEA00195</td>
<td>926.375</td>
<td>2nd Ave &amp; Spring St</td>
<td>Occidental Park / Occidental Ave S &amp; S Washing...</td>
<td>CBD-06</td>
<td>PS-04</td>
<td>Member</td>
<td>Male</td>
<td>1970.0</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">2</td>
<td>433</td>
<td>2014-10-13 10:33:00</td>
<td>2014-10-13 10:48:00</td>
<td>SEA00486</td>
<td>883.831</td>
<td>2nd Ave &amp; Spring St</td>
<td>Occidental Park / Occidental Ave S &amp; S Washing...</td>
<td>CBD-06</td>
<td>PS-04</td>
<td>Member</td>
<td>Female</td>
<td>1988.0</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">3</td>
<td>434</td>
<td>2014-10-13 10:34:00</td>
<td>2014-10-13 10:48:00</td>
<td>SEA00333</td>
<td>865.937</td>
<td>2nd Ave &amp; Spring St</td>
<td>Occidental Park / Occidental Ave S &amp; S Washing...</td>
<td>CBD-06</td>
<td>PS-04</td>
<td>Member</td>
<td>Female</td>
<td>1977.0</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">4</td>
<td>435</td>
<td>2014-10-13 10:34:00</td>
<td>2014-10-13 10:49:00</td>
<td>SEA00202</td>
<td>923.923</td>
<td>2nd Ave &amp; Spring St</td>
<td>Occidental Park / Occidental Ave S &amp; S Washing...</td>
<td>CBD-06</td>
<td>PS-04</td>
<td>Member</td>
<td>Male</td>
<td>1971.0</td>
</tr>
</tbody>
</table>

</div>
</div>
</div>
</div>
<p>I’m going to rename a few columns to match the column names I told Claude were in the file.</p>
<div id="cell-6" class="cell" data-execution_count="4">
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1">trip <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> trip.rename(mapper <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> {<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'starttime'</span>: <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'start_rental'</span>,</span>
<span id="cb5-2">                      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'stoptime'</span>: <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'stop_rental'</span>,</span>
<span id="cb5-3">                      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'from_station_name'</span>: <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'from_station'</span>,</span>
<span id="cb5-4">                      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'to_station_name'</span>: <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'to_station'</span>}, axis <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb5-5"></span>
<span id="cb5-6">trip.info()</span>
<span id="cb5-7"></span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>&lt;class 'pandas.core.frame.DataFrame'&gt;
RangeIndex: 286857 entries, 0 to 286856
Data columns (total 12 columns):
 #   Column           Non-Null Count   Dtype         
---  ------           --------------   -----         
 0   trip_id          286857 non-null  int64         
 1   start_rental     286857 non-null  datetime64[ns]
 2   stop_rental      286857 non-null  datetime64[ns]
 3   bikeid           286857 non-null  object        
 4   tripduration     286857 non-null  float64       
 5   from_station     286857 non-null  object        
 6   to_station       286857 non-null  object        
 7   from_station_id  286857 non-null  object        
 8   to_station_id    286857 non-null  object        
 9   usertype         286857 non-null  object        
 10  gender           181557 non-null  object        
 11  birthyear        181553 non-null  float64       
dtypes: datetime64[ns](2), float64(2), int64(1), object(7)
memory usage: 26.3+ MB</code></pre>
</div>
</div>
<p>The dataframe is pretty big. Let’s create a smaller version to speed up testing.</p>
<div id="cell-8" class="cell" data-execution_count="5">
<div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1">trip_small <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> trip.sample(n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50000</span>)</span>
<span id="cb7-2">trip_small.to_csv(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'data/bike_trips.csv'</span>, index<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>)</span></code></pre></div>
</div>
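<p>One thing to keep in mind is that the sample above is drawn without a seed, so rerunning the cell writes a different <code>bike_trips.csv</code> each time. If repeatable results matter, a seed can be passed through the <code>random_state</code> argument of <code>sample()</code>. The sketch below uses a made-up stand-in dataframe named <code>trips</code>, and the seed value 42 is arbitrary.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python">import pandas as pd

# Hypothetical stand-in for the trip dataframe
trips = pd.DataFrame({"trip_id": range(1000)})

# Fixing random_state makes the sample repeatable across runs
sample_a = trips.sample(n=100, random_state=42)
sample_b = trips.sample(n=100, random_state=42)

# Both samples contain the same rows in the same order
print(sample_a.equals(sample_b))
</code></pre></div>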
<p>Ok, let’s give Claude a whirl.</p>
</section>
<section id="claude-3.5" class="level2">
<h2 class="anchored" data-anchor-id="claude-3.5">Claude 3.5</h2>
<p>I thought I’d work up to the most complex computation by starting easy.</p>
<section id="claude-prompt-1" class="level3">
<h3 class="anchored" data-anchor-id="claude-prompt-1">Claude prompt 1</h3>
<blockquote class="blockquote">
<p>You have a csv file named bike_trips.csv. Each row is a bike rental from a bike share system. The fields start_rental and stop_rental are datetime values specifying when the bike is rented and returned. The fields from_station and to_station are strings indicating the bike rental stations that the bike was rented from and returned to. Using Python with pandas, generate the code to compute the total number of rentals from each station.</p>
</blockquote>
<p>It responded with the following.</p>
</section>
<section id="claude-response-1" class="level3">
<h3 class="anchored" data-anchor-id="claude-response-1">Claude response 1</h3>
<p>I’ll help you write Python code to analyze the bike rental data and count rentals by station.</p>
<div class="sourceCode" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pandas <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> pd</span>
<span id="cb8-2"></span>
<span id="cb8-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Read the CSV file</span></span>
<span id="cb8-4">df <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.read_csv(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'bike_trips.csv'</span>)</span>
<span id="cb8-5"></span>
<span id="cb8-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Count rentals from each station</span></span>
<span id="cb8-7">station_rentals <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'from_station'</span>].value_counts()</span>
<span id="cb8-8"></span>
<span id="cb8-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Convert to a DataFrame for better formatting</span></span>
<span id="cb8-10">rental_summary <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.DataFrame({</span>
<span id="cb8-11">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'station'</span>: station_rentals.index,</span>
<span id="cb8-12">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'total_rentals'</span>: station_rentals.values</span>
<span id="cb8-13">})</span>
<span id="cb8-14"></span>
<span id="cb8-15"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Sort by number of rentals in descending order</span></span>
<span id="cb8-16">rental_summary <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rental_summary.sort_values(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'total_rentals'</span>, ascending<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>)</span>
<span id="cb8-17"></span>
<span id="cb8-18"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Reset index for cleaner display</span></span>
<span id="cb8-19">rental_summary <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rental_summary.reset_index(drop<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb8-20"></span>
<span id="cb8-21"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Display the results</span></span>
<span id="cb8-22"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">Total Rentals by Station:"</span>)</span>
<span id="cb8-23"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(rental_summary.to_string(index<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>))</span></code></pre></div>
<p>This code will:</p>
<ul>
<li>Read your bike_trips.csv file into a pandas DataFrame</li>
<li>Use value_counts() to count how many times each station appears in the from_station column</li>
<li>Convert the results to a clean DataFrame with station names and rental counts</li>
<li>Sort the results by number of rentals in descending order</li>
<li>Display a formatted table showing each station and its total number of rentals</li>
</ul>
<p>Would you like me to modify the analysis in any way? For example, I could:</p>
<ul>
<li>Add percentage of total rentals for each station</li>
<li>Filter for a specific time period</li>
<li>Group by month or day of week</li>
<li>Include additional statistics about the rentals</li>
</ul>
</section>
<section id="my-reaction-to-response-1" class="level3">
<h3 class="anchored" data-anchor-id="my-reaction-to-response-1">My reaction to response 1</h3>
<p>Nicely done. The explanation and suggestions for further analysis are definitely reasonable. There are a few issues with the code, but they are mostly easy fixes such as correcting the path to the data file. Let’s test it.</p>
<div id="cell-13" class="cell" data-execution_count="6">
<div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pandas <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> pd</span>
<span id="cb9-2"></span>
<span id="cb9-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Read the CSV file</span></span>
<span id="cb9-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">## MY FIX: modified the path</span></span>
<span id="cb9-5">df <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.read_csv(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'data/bike_trips.csv'</span>)</span>
<span id="cb9-6"></span>
<span id="cb9-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Count rentals from each station</span></span>
<span id="cb9-8">station_rentals <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'from_station'</span>].value_counts()</span>
<span id="cb9-9"></span>
<span id="cb9-10"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Convert to a DataFrame for better formatting</span></span>
<span id="cb9-11">rental_summary <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.DataFrame({</span>
<span id="cb9-12">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'station'</span>: station_rentals.index,</span>
<span id="cb9-13">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'total_rentals'</span>: station_rentals.values</span>
<span id="cb9-14">})</span>
<span id="cb9-15"></span>
<span id="cb9-16"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Sort by number of rentals in descending order</span></span>
<span id="cb9-17">rental_summary <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rental_summary.sort_values(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'total_rentals'</span>, ascending<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>)</span>
<span id="cb9-18"></span>
<span id="cb9-19"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Reset index for cleaner display</span></span>
<span id="cb9-20">rental_summary <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rental_summary.reset_index(drop<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb9-21"></span>
<span id="cb9-22"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Display the results</span></span>
<span id="cb9-23"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">Total Rentals by Station:"</span>)</span>
<span id="cb9-24"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(rental_summary.to_string(index<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>))</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>
Total Rentals by Station:
                                                               station  total_rentals
                                       Pier 69 / Alaskan Way &amp; Clay St           2276
                                                  E Pine St &amp; 16th Ave           2052
                                                    3rd Ave &amp; Broad St           1915
                                                     2nd Ave &amp; Pine St           1823
                                                Westlake Ave &amp; 6th Ave           1711
                                        E Harrison St &amp; Broadway Ave E           1629
                                Cal Anderson Park / 11th Ave &amp; Pine St           1607
                                                     2nd Ave &amp; Vine St           1498
                                            REI / Yale Ave N &amp; John St           1480
                                   Key Arena / 1st Ave N &amp; Harrison St           1326
                                         PATH / 9th Ave &amp; Westlake Ave           1315
                                              15th Ave E &amp; E Thomas St           1302
                                              Summit Ave &amp; E Denny Way           1187
                                               Dexter Ave N &amp; Aloha St           1184
                                        Summit Ave E &amp; E Republican St           1150
                                                12th Ave &amp; E Mercer St           1110
                  Seattle Aquarium / Alaskan Way S &amp; Elliott Bay Trail           1086
                                                     Pine St &amp; 9th Ave           1077
                                                2nd Ave &amp; Blanchard St           1041
                                          E Blaine St &amp; Fairview Ave E           1000
                  Occidental Park / Occidental Ave S &amp; S Washington St            983
                                        Republican St &amp; Westlake Ave N            976
                             Lake Union Park / Valley St &amp; Boren Ave N            964
                                                   1st Ave &amp; Marion St            954
                                                    7th Ave &amp; Union St            950
                                                   2nd Ave &amp; Spring St            947
                                               Harvard Ave &amp; E Pine St            942
                                                 9th Ave N &amp; Mercer St            928
                                              Bellevue Ave &amp; E Pine St            916
                                                6th Ave &amp; Blanchard St            865
                                                Dexter Ave &amp; Denny Way            817
                         Seattle University / E Columbia St &amp; 12th Ave            752
                                         Eastlake Ave E &amp; E Allison St            681
                                        City Hall / 4th Ave &amp; James St            667
                     Burke-Gilman Trail / NE Blakeley St &amp; 24th Ave NE            664
        King Street Station Plaza / 2nd Ave Extension S &amp; S Jackson St            661
     Fred Hutchinson Cancer Research Center / Fairview Ave N &amp; Ward St            646
                                                12th Ave &amp; E Denny Way            644
                                             12th Ave &amp; NE Campus Pkwy            601
                             Frye Art Museum / Terry Ave &amp; Columbia St            566
                                              15th Ave NE &amp; NE 40th St            489
                                                 6th Ave S &amp; S King St            480
                                                    Union St &amp; 4th Ave            475
              UW Engineering Library / E Stevens Way NE &amp; Jefferson Rd            438
                                              NE 47th St &amp; 12th Ave NE            393
                     Burke Museum / E Stevens Way NE &amp; Memorial Way NE            392
                                                Terry Ave &amp; Stewart St            361
                                     UW Intramural Activities Building            303
                  Children's Hospital / Sandpoint Way NE &amp; 40th Ave NE            278
                                        NE 42nd St &amp; University Way NE            271
UW Magnuson Health Sciences Center Rotunda / Columbia Rd &amp; San Juan Rd            238
                                              Broadway and E Denny Way            224
                                       NE Pacific St/UW Medical Center            171
                                                 Mercer St &amp; 9th Ave N            165
                                               12th Ave &amp; E Yesler Way            153
                                          UW McCarty Hall / Whitman Ct            124
                                                 Thomas St &amp; 5th Ave N            123
                                       Pier 66 / Alaskan Way &amp; Bell St             27
                                                             8D OPS 02              2</code></pre>
</div>
</div>
<p>Works correctly.</p>
</section>
<section id="claude-prompt-2" class="level3">
<h3 class="anchored" data-anchor-id="claude-prompt-2">Claude prompt 2</h3>
<blockquote class="blockquote">
<p>Great. Now using Python and pandas, compute the average length of time in hours that bikes are rented.</p>
</blockquote>
<p>It responded with the following code. Again, there were a few easily fixable issues:</p>
<ul>
<li>I modified the path to point to the data subfolder (I should have told Claude the path in the first place).</li>
<li>It could have used the <code>parse_dates</code> parameter of <code>read_csv</code> to do the datetime conversion on the file read, but no harm.</li>
<li>It forgot the import for matplotlib.</li>
</ul>
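<p>As a minimal sketch of the <code>parse_dates</code> alternative mentioned above: the two-row sample below is made up for illustration (only the column names come from the bike trips data), and it is read from a string rather than from the actual file.</p>

```python
import io

import pandas as pd

# Hypothetical two-row sample; only the column names come from the real data
csv_text = (
    "trip_id,start_rental,stop_rental\n"
    "1,2014-10-13 10:31:00,2014-10-13 10:48:00\n"
    "2,2014-10-13 10:32:00,2014-10-13 11:02:00\n"
)

# parse_dates converts the listed columns to datetimes during the read,
# so no separate pd.to_datetime() calls are needed afterwards
df = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["start_rental", "stop_rental"],
)

# Same duration calculation as in the response, in hours
df["rental_duration"] = (
    df["stop_rental"] - df["start_rental"]
).dt.total_seconds() / 3600

print(df.dtypes)
print(df["rental_duration"])
```

<p>Either approach should give the same datetime columns; doing the conversion at read time just saves a couple of lines.</p>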
</section>
<section id="claude-response-2" class="level3">
<h3 class="anchored" data-anchor-id="claude-response-2">Claude response 2</h3>
<div id="cell-16" class="cell" data-execution_count="7">
<div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pandas <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> pd</span>
<span id="cb11-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">## MY FIX: added necessary import</span></span>
<span id="cb11-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> matplotlib.pyplot <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> plt</span>
<span id="cb11-4"></span>
<span id="cb11-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Read the CSV file</span></span>
<span id="cb11-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">## MY FIX: modified the path</span></span>
<span id="cb11-7">df <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.read_csv(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'data/bike_trips.csv'</span>)</span>
<span id="cb11-8"></span>
<span id="cb11-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Convert datetime strings to datetime objects</span></span>
<span id="cb11-10">df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'start_rental'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.to_datetime(df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'start_rental'</span>])</span>
<span id="cb11-11">df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'stop_rental'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.to_datetime(df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'stop_rental'</span>])</span>
<span id="cb11-12"></span>
<span id="cb11-13"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Calculate duration in hours for each rental</span></span>
<span id="cb11-14">df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'rental_duration'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'stop_rental'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'start_rental'</span>]).dt.total_seconds() <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3600</span></span>
<span id="cb11-15"></span>
<span id="cb11-16"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Calculate average duration</span></span>
<span id="cb11-17">average_duration <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'rental_duration'</span>].mean()</span>
<span id="cb11-18"></span>
<span id="cb11-19"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Calculate additional statistics</span></span>
<span id="cb11-20">duration_stats <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> {</span>
<span id="cb11-21">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'average_hours'</span>: average_duration,</span>
<span id="cb11-22">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'median_hours'</span>: df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'rental_duration'</span>].median(),</span>
<span id="cb11-23">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'min_hours'</span>: df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'rental_duration'</span>].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">min</span>(),</span>
<span id="cb11-24">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'max_hours'</span>: df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'rental_duration'</span>].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">max</span>(),</span>
<span id="cb11-25">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'std_hours'</span>: df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'rental_duration'</span>].std()</span>
<span id="cb11-26">}</span>
<span id="cb11-27"></span>
<span id="cb11-28"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Print results</span></span>
<span id="cb11-29"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">Rental Duration Statistics (in hours):"</span>)</span>
<span id="cb11-30"><span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> stat, value <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> duration_stats.items():</span>
<span id="cb11-31">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>stat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>replace(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'_'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">' '</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>title()<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>value<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:.2f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb11-32"></span>
<span id="cb11-33"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create a histogram of rental durations for rentals under 24 hours</span></span>
<span id="cb11-34">short_rentals <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df[df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'rental_duration'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">24</span>]</span>
<span id="cb11-35">plt.figure(figsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>))</span>
<span id="cb11-36">plt.hist(short_rentals[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'rental_duration'</span>], bins<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">48</span>, edgecolor<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'black'</span>)</span>
<span id="cb11-37">plt.title(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Distribution of Rental Durations (≤ 24 hours)'</span>)</span>
<span id="cb11-38">plt.xlabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Duration (hours)'</span>)</span>
<span id="cb11-39">plt.ylabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Number of Rentals'</span>)</span>
<span id="cb11-40">plt.grid(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>, alpha<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.3</span>)</span>
<span id="cb11-41">plt.show()</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>
Rental Duration Statistics (in hours):
Average Hours: 0.32
Median Hours: 0.17
Min Hours: -0.28
Max Hours: 7.98
Std Hours: 0.56</code></pre>
</div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/llms-cycleshare-part1/daily_averages_cycleshare_part1_files/figure-html/cell-8-output-2.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
</section>
<section id="my-reaction-to-response-2" class="level3">
<h3 class="anchored" data-anchor-id="my-reaction-to-response-2">My reaction to response 2</h3>
<p>Let’s confirm that the negative minimum is correct. Nice touch with the histogram.</p>
<div id="cell-18" class="cell" data-execution_count="8">
<div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb13-1">trip_small[trip_small[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'stop_rental'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span> trip_small[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'start_rental'</span>]]</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="8">
<div>
<div>


<table class="dataframe table table-sm table-striped small" data-quarto-postprocess="true" data-border="1">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th"></th>
<th data-quarto-table-cell-role="th">trip_id</th>
<th data-quarto-table-cell-role="th">start_rental</th>
<th data-quarto-table-cell-role="th">stop_rental</th>
<th data-quarto-table-cell-role="th">bikeid</th>
<th data-quarto-table-cell-role="th">tripduration</th>
<th data-quarto-table-cell-role="th">from_station</th>
<th data-quarto-table-cell-role="th">to_station</th>
<th data-quarto-table-cell-role="th">from_station_id</th>
<th data-quarto-table-cell-role="th">to_station_id</th>
<th data-quarto-table-cell-role="th">usertype</th>
<th data-quarto-table-cell-role="th">gender</th>
<th data-quarto-table-cell-role="th">birthyear</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td data-quarto-table-cell-role="th">57832</td>
<td>8660</td>
<td>2014-11-02 01:29:00</td>
<td>2014-11-02 01:12:00</td>
<td>SEA00384</td>
<td>2571.017</td>
<td>Pine St &amp; 9th Ave</td>
<td>Westlake Ave &amp; 6th Ave</td>
<td>SLU-16</td>
<td>SLU-15</td>
<td>Short-Term Pass Holder</td>
<td>NaN</td>
<td>NaN</td>
</tr>
</tbody>
</table>

</div>
</div>
</div>
</div>
<p>Great. Nice job, Claude.</p>
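<p>One plausible explanation for that row, offered as an assumption rather than something confirmed by the data source: daylight saving time ended in the US at 2 AM on November 2, 2014, so the 1 AM hour occurred twice that night. If the timestamps are naive local times (the <code>America/Los_Angeles</code> zone below is assumed, not stated in the data), a trip that started in the first 1 AM hour and ended in the repeated one would appear to end before it began. A small sketch using the two timestamps from the row above:</p>

```python
import pandas as pd

tz = "America/Los_Angeles"  # assumption: timestamps are Seattle local time

naive_start = pd.Timestamp("2014-11-02 01:29:00")
naive_stop = pd.Timestamp("2014-11-02 01:12:00")

# Naive subtraction reproduces the negative duration
naive_seconds = (naive_stop - naive_start).total_seconds()

# ambiguous=True selects the first 1 AM hour (daylight time),
# ambiguous=False selects the repeated 1 AM hour (standard time)
aware_start = naive_start.tz_localize(tz, ambiguous=True)
aware_stop = naive_stop.tz_localize(tz, ambiguous=False)
aware_seconds = (aware_stop - aware_start).total_seconds()

print(f"Naive difference: {naive_seconds / 3600:.2f} hours")
print(f"Timezone-aware difference: {aware_seconds:.0f} seconds")
```

<p>Under that assumption the elapsed time comes out to about 43 minutes, which is close to the recorded <code>tripduration</code> of 2571.017 seconds. The timestamps are only given to the minute, so an exact match is not expected.</p>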
</section>
<section id="claude-prompt-3" class="level3">
<h3 class="anchored" data-anchor-id="claude-prompt-3">Claude prompt 3</h3>
<blockquote class="blockquote">
<p>Great. Now using Python and pandas, compute the average number of bike rentals by day of week.</p>
</blockquote>
</section>
<section id="claude-response-3" class="level3">
<h3 class="anchored" data-anchor-id="claude-response-3">Claude response 3</h3>
<p>I’ll help you analyze the rental patterns by day of week.</p>
<div class="sourceCode" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb14-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pandas <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> pd</span>
<span id="cb14-2"></span>
<span id="cb14-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Read the CSV file</span></span>
<span id="cb14-4">df <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.read_csv(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'bike_trips.csv'</span>)</span>
<span id="cb14-5"></span>
<span id="cb14-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Convert start_rental to datetime</span></span>
<span id="cb14-7">df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'start_rental'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.to_datetime(df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'start_rental'</span>])</span>
<span id="cb14-8"></span>
<span id="cb14-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Extract day of week (0 = Monday, 6 = Sunday)</span></span>
<span id="cb14-10">df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'start_rental'</span>].dt.dayofweek</span>
<span id="cb14-11"></span>
<span id="cb14-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create day name column for better readability</span></span>
<span id="cb14-13">df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_name'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'start_rental'</span>].dt.day_name()</span>
<span id="cb14-14"></span>
<span id="cb14-15"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Calculate average rentals by day of week</span></span>
<span id="cb14-16">daily_rentals <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df.groupby(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_name'</span>)[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'start_rental'</span>].count().reset_index()</span>
<span id="cb14-17">daily_rentals.columns <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'total_rentals'</span>]</span>
<span id="cb14-18"></span>
<span id="cb14-19"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Calculate the number of unique dates for each day of week</span></span>
<span id="cb14-20">unique_days <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df.groupby(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_name'</span>)[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'start_rental'</span>].dt.date.nunique().reset_index()</span>
<span id="cb14-21">unique_days.columns <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'number_of_days'</span>]</span>
<span id="cb14-22"></span>
<span id="cb14-23"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Merge and calculate average</span></span>
<span id="cb14-24">daily_averages <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.merge(daily_rentals, unique_days, on<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>)</span>
<span id="cb14-25">daily_averages[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'average_rentals'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> daily_averages[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'total_rentals'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> daily_averages[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'number_of_days'</span>]</span>
<span id="cb14-26"></span>
<span id="cb14-27"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Sort by day of week (Monday first)</span></span>
<span id="cb14-28">day_order <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Monday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Tuesday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Wednesday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Thursday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Friday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Saturday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Sunday'</span>]</span>
<span id="cb14-29">daily_averages[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.Categorical(daily_averages[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>], categories<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>day_order, ordered<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb14-30">daily_averages <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> daily_averages.sort_values(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>)</span>
<span id="cb14-31"></span>
<span id="cb14-32"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Round averages to 1 decimal place</span></span>
<span id="cb14-33">daily_averages[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'average_rentals'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> daily_averages[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'average_rentals'</span>].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb14-34"></span>
<span id="cb14-35"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Display results</span></span>
<span id="cb14-36"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">Average Daily Rentals by Day of Week:"</span>)</span>
<span id="cb14-37"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(daily_averages[[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'average_rentals'</span>]].to_string(index<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>))</span>
<span id="cb14-38"></span>
<span id="cb14-39"><span class="co" style="color: #5E5E5E;
background-color: null;
# Create a bar plotPython">
font-style: inherit;"># Create a bar plot</span></span>
<span id="cb14-40">plt.figure(figsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>))</span>
<span id="cb14-41">plt.bar(daily_averages[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>], daily_averages[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'average_rentals'</span>])</span>
<span id="cb14-42">plt.title(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Average Number of Bike Rentals by Day of Week'</span>)</span>
<span id="cb14-43">plt.xlabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Day of Week'</span>)</span>
<span id="cb14-44">plt.ylabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Average Number of Rentals'</span>)</span>
<span id="cb14-45">plt.xticks(rotation<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">45</span>)</span>
<span id="cb14-46">plt.grid(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>, alpha<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.3</span>)</span>
<span id="cb14-47">plt.tight_layout()</span>
<span id="cb14-48">plt.show()</span></code></pre></div>
<p>This code will:</p>
<ul>
<li>Convert the start_rental timestamps to datetime objects</li>
<li>Extract the day of week from each rental</li>
<li>Count the total rentals for each day</li>
<li>Calculate the number of unique dates for each day of week (to account for the total number of each day in the dataset)</li>
<li>Compute the average daily rentals</li>
<li>Create a bar plot visualizing the daily patterns</li>
</ul>
<p>I’ve included:</p>
<ul>
<li>Proper handling of the day names (sorted from Monday to Sunday)</li>
<li>Both total counts and averages to account for any differences in the number of each day in your dataset</li>
<li>A bar plot to visualize the weekly pattern</li>
</ul>
<p>Would you like me to modify the analysis? For example, I could:</p>
<ul>
<li>Break down the daily patterns by station</li>
<li>Add hour-of-day analysis within each day</li>
<li>Compare weekday vs weekend patterns</li>
<li>Add seasonal trends by day of week</li>
</ul>
<p>Let’s try out the code.</p>
<div id="cell-23" class="cell" data-execution_count="9">
<div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb15-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pandas <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> pd</span>
<span id="cb15-2"></span>
<span id="cb15-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">## MY FIX: added necessary import</span></span>
<span id="cb15-4"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> matplotlib.pyplot <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> plt</span>
<span id="cb15-5"></span>
<span id="cb15-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Read the CSV file</span></span>
<span id="cb15-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">## MY FIX: modified the path</span></span>
<span id="cb15-8">df <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.read_csv(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'data/bike_trips.csv'</span>)</span>
<span id="cb15-9"></span>
<span id="cb15-10"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Convert start_rental to datetime</span></span>
<span id="cb15-11">df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'start_rental'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.to_datetime(df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'start_rental'</span>])</span>
<span id="cb15-12"></span>
<span id="cb15-13"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Extract day of week (0 = Monday, 6 = Sunday)</span></span>
<span id="cb15-14">df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'start_rental'</span>].dt.dayofweek</span>
<span id="cb15-15"></span>
<span id="cb15-16"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create day name column for better readability</span></span>
<span id="cb15-17">df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_name'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'start_rental'</span>].dt.day_name()</span>
<span id="cb15-18"></span>
<span id="cb15-19"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Calculate average rentals by day of week</span></span>
<span id="cb15-20">daily_rentals <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df.groupby(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_name'</span>)[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'start_rental'</span>].count().reset_index()</span>
<span id="cb15-21">daily_rentals.columns <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'total_rentals'</span>]</span>
<span id="cb15-22"></span>
<span id="cb15-23"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Calculate the number of unique dates for each day of week</span></span>
<span id="cb15-24">unique_days <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df.groupby(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_name'</span>)[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'start_rental'</span>].dt.date.nunique().reset_index()</span>
<span id="cb15-25">unique_days.columns <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'number_of_days'</span>]</span>
<span id="cb15-26"></span>
<span id="cb15-27"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Merge and calculate average</span></span>
<span id="cb15-28">daily_averages <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.merge(daily_rentals, unique_days, on<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>)</span>
<span id="cb15-29">daily_averages[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'average_rentals'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> daily_averages[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'total_rentals'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> daily_averages[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'number_of_days'</span>]</span>
<span id="cb15-30"></span>
<span id="cb15-31"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Sort by day of week (Monday first)</span></span>
<span id="cb15-32">day_order <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Monday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Tuesday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Wednesday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Thursday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Friday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Saturday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Sunday'</span>]</span>
<span id="cb15-33">daily_averages[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.Categorical(daily_averages[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>], categories<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>day_order, ordered<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb15-34">daily_averages <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> daily_averages.sort_values(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>)</span>
<span id="cb15-35"></span>
<span id="cb15-36"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Round averages to 1 decimal place</span></span>
<span id="cb15-37">daily_averages[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'average_rentals'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> daily_averages[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'average_rentals'</span>].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb15-38"></span>
<span id="cb15-39"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Display results</span></span>
<span id="cb15-40"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">Average Daily Rentals by Day of Week:"</span>)</span>
<span id="cb15-41"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(daily_averages[[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'average_rentals'</span>]].to_string(index<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>))</span>
<span id="cb15-42"></span>
<span id="cb15-43"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create a bar plot</span></span>
<span id="cb15-44">plt.figure(figsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>))</span>
<span id="cb15-45">plt.bar(daily_averages[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>], daily_averages[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'average_rentals'</span>])</span>
<span id="cb15-46">plt.title(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Average Number of Bike Rentals by Day of Week'</span>)</span>
<span id="cb15-47">plt.xlabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Day of Week'</span>)</span>
<span id="cb15-48">plt.ylabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Average Number of Rentals'</span>)</span>
<span id="cb15-49">plt.xticks(rotation<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">45</span>)</span>
<span id="cb15-50">plt.grid(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>, alpha<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.3</span>)</span>
<span id="cb15-51">plt.tight_layout()</span>
<span id="cb15-52">plt.show()</span></code></pre></div>
<div class="cell-output cell-output-error">
<pre><code>AttributeError: 'SeriesGroupBy' object has no attribute 'dt'</code></pre>
</div>
</div>
</section>
<section id="my-reaction-to-response-3" class="level3">
<h3 class="anchored" data-anchor-id="my-reaction-to-response-3">My reaction to response 3</h3>
<p>Well, we have a problem. Reading through the code, Claude has the right idea, at least in part. He first creates a dataframe with the total number of rides by day of week.</p>
<div id="cell-25" class="cell" data-execution_count="10">
<div class="sourceCode cell-code" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb17-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Calculate average rentals by day of week</span></span>
<span id="cb17-2">daily_rentals <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df.groupby(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_name'</span>)[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'start_rental'</span>].count().reset_index()</span>
<span id="cb17-3">daily_rentals.columns <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'total_rentals'</span>]</span></code></pre></div>
</div>
<div id="cell-26" class="cell" data-execution_count="11">
<div class="sourceCode cell-code" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb18-1">daily_rentals</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="11">
<div>
<div>


<table class="dataframe table table-sm table-striped small" data-quarto-postprocess="true" data-border="1">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th"></th>
<th data-quarto-table-cell-role="th">day_of_week</th>
<th data-quarto-table-cell-role="th">total_rentals</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td data-quarto-table-cell-role="th">0</td>
<td>Friday</td>
<td>7403</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">1</td>
<td>Monday</td>
<td>7480</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">2</td>
<td>Saturday</td>
<td>6848</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">3</td>
<td>Sunday</td>
<td>5706</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">4</td>
<td>Thursday</td>
<td>7659</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">5</td>
<td>Tuesday</td>
<td>7451</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">6</td>
<td>Wednesday</td>
<td>7453</td>
</tr>
</tbody>
</table>

</div>
</div>
</div>
</div>
<p>Now Claude attempts to figure out how many of each weekday appear in the range of dates represented in the dataframe.</p>
<div id="cell-28" class="cell" data-execution_count="12">
<div class="sourceCode cell-code" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb19-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Calculate the number of unique dates for each day of week</span></span>
<span id="cb19-2">unique_days <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df.groupby(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_name'</span>)[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'start_rental'</span>].dt.date.nunique().reset_index()</span></code></pre></div>
<div class="cell-output cell-output-error">
<pre><code>AttributeError: 'SeriesGroupBy' object has no attribute 'dt'</code></pre>
</div>
</div>
<p>While the <code>dt</code> accessor is available on a datetime-like <code>Series</code>, it’s not available on a <code>SeriesGroupBy</code> object. Here’s how we can do this while still using the same approach as Claude.</p>
<div id="cell-30" class="cell" data-execution_count="13">
<div class="sourceCode cell-code" id="cb21" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb21-1">unique_dates_df <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.DataFrame(df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'start_rental'</span>].dt.date.unique(), columns <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'date'</span>])</span>
<span id="cb21-2">unique_dates_df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'date'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.to_datetime(unique_dates_df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'date'</span>])</span>
<span id="cb21-3">unique_dates_df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> unique_dates_df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'date'</span>].dt.dayofweek</span>
<span id="cb21-4">unique_dates_df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_name'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> unique_dates_df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'date'</span>].dt.day_name()</span>
<span id="cb21-5"></span>
<span id="cb21-6">unique_days <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> unique_dates_df.groupby(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_name'</span>)[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'date'</span>].size().reset_index()</span>
<span id="cb21-7">unique_days.columns <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'number_of_days'</span>]</span></code></pre></div>
</div>
<div id="cell-31" class="cell" data-execution_count="14">
<div class="sourceCode cell-code" id="cb22" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb22-1">unique_days</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="14">
<div>
<div>


<table class="dataframe table table-sm table-striped small" data-quarto-postprocess="true" data-border="1">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th"></th>
<th data-quarto-table-cell-role="th">day_of_week</th>
<th data-quarto-table-cell-role="th">number_of_days</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td data-quarto-table-cell-role="th">0</td>
<td>Friday</td>
<td>98</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">1</td>
<td>Monday</td>
<td>99</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">2</td>
<td>Saturday</td>
<td>98</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">3</td>
<td>Sunday</td>
<td>98</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">4</td>
<td>Thursday</td>
<td>98</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">5</td>
<td>Tuesday</td>
<td>99</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">6</td>
<td>Wednesday</td>
<td>99</td>
</tr>
</tbody>
</table>

</div>
</div>
</div>
</div>
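<p>As an aside, here is a minimal sketch of another way around the <code>SeriesGroupBy</code> limitation: apply <code>dt</code> to the <code>Series</code> first, store the result as a column, and only then group. This is not what Claude generated nor what I ran above. The tiny dataframe and the <code>rental_date</code> column name are made up purely for illustration.</p>

```python
import pandas as pd

# Hypothetical toy data standing in for the bike trips file
df = pd.DataFrame({
    'start_rental': pd.to_datetime([
        '2024-01-01 08:00',  # Monday
        '2024-01-01 17:30',  # Monday (same date as above)
        '2024-01-08 09:15',  # Monday (one week later)
        '2024-01-02 12:00',  # Tuesday
    ])
})
df['day_name'] = df['start_rental'].dt.day_name()

# Use the dt accessor on the Series itself, before any grouping
df['rental_date'] = df['start_rental'].dt.date

# Count distinct calendar dates within each day of week
unique_days = df.groupby('day_name')['rental_date'].nunique().reset_index()
unique_days.columns = ['day_of_week', 'number_of_days']
print(unique_days)
```

<p>Like the version above, this only counts dates on which at least one rental occurred.</p>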
<p>Now, the rest of the code should work (hopefully).</p>
<div id="cell-33" class="cell" data-execution_count="15">
<div class="sourceCode cell-code" id="cb23" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb23-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Merge and calculate average</span></span>
<span id="cb23-2">daily_averages <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.merge(daily_rentals, unique_days, on<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>)</span>
<span id="cb23-3">daily_averages[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'average_rentals'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> daily_averages[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'total_rentals'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> daily_averages[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'number_of_days'</span>]</span>
<span id="cb23-4"></span>
<span id="cb23-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Sort by day of week (Monday first)</span></span>
<span id="cb23-6">day_order <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Monday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Tuesday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Wednesday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Thursday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Friday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Saturday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Sunday'</span>]</span>
<span id="cb23-7">daily_averages[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.Categorical(daily_averages[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>], categories<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>day_order, ordered<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb23-8">daily_averages <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> daily_averages.sort_values(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>)</span>
<span id="cb23-9"></span>
<span id="cb23-10"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Round averages to 1 decimal place</span></span>
<span id="cb23-11">daily_averages[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'average_rentals'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> daily_averages[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'average_rentals'</span>].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb23-12"></span>
<span id="cb23-13"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Display results</span></span>
<span id="cb23-14"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">Average Daily Rentals by Day of Week:"</span>)</span>
<span id="cb23-15"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(daily_averages[[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'average_rentals'</span>]].to_string(index<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>))</span>
<span id="cb23-16"></span>
<span id="cb23-17"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create a bar plot</span></span>
<span id="cb23-18">plt.figure(figsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>))</span>
<span id="cb23-19">plt.bar(daily_averages[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>], daily_averages[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'average_rentals'</span>])</span>
<span id="cb23-20">plt.title(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Average Number of Bike Rentals by Day of Week'</span>)</span>
<span id="cb23-21">plt.xlabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Day of Week'</span>)</span>
<span id="cb23-22">plt.ylabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Average Number of Rentals'</span>)</span>
<span id="cb23-23">plt.xticks(rotation<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">45</span>)</span>
<span id="cb23-24">plt.grid(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>, alpha<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.3</span>)</span>
<span id="cb23-25">plt.tight_layout()</span>
<span id="cb23-26">plt.show()</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>
Average Daily Rentals by Day of Week:
day_of_week  average_rentals
     Monday             75.6
    Tuesday             75.3
  Wednesday             75.3
   Thursday             78.2
     Friday             75.5
   Saturday             69.9
     Sunday             58.2</code></pre>
</div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/llms-cycleshare-part1/daily_averages_cycleshare_part1_files/figure-html/cell-16-output-2.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>Well, only one major error. I think a pandas novice might have a tough time getting this code working.</p>
<p>But, there’s a more subtle issue here. This approach works fine for computing means under the assumption that we have at least one rental on every date in the range spanned by the dataset (more on this shortly). It’s not going to work if we ask for something like a percentile, which needs the rental count for each individual date rather than just a total and a number of days. In the case of means, we can get away with sums and counts as long as we don’t have to worry about dates with zero rentals. If we are also interested in such statistics at the station level, then we are likely going to get dates with zero rentals. You can’t ignore these zero periods when computing ride volume statistics. If you do, you will overestimate the statistic (because the 0’s aren’t being included).</p>
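<p>To make the zero-rental issue concrete, here is a minimal sketch using made-up data in which one Monday has no rentals at all. It assumes the system was operating on every date between the first and last rental, and the variable names are just for illustration. The idea is to count rentals per date, reindex over the full date range so that missing dates become explicit zeros, and only then compute statistics by day of week.</p>

```python
import pandas as pd

# Hypothetical toy data: no rentals at all on Monday 2024-01-08
df = pd.DataFrame({
    'start_rental': pd.to_datetime([
        '2024-01-01 08:00', '2024-01-01 17:30',   # Monday: 2 rentals
        '2024-01-15 09:15', '2024-01-15 10:00',
        '2024-01-15 11:00', '2024-01-15 18:45',   # Monday: 4 rentals
    ])
})

# Count rentals per calendar date (only dates with rentals show up)
observed = df.groupby(df['start_rental'].dt.normalize()).size()

# Naive mean over observed dates only; the zero-rental Monday is ignored
naive_mean = observed.mean()

# Reindex over every date in the span so missing dates become zeros
all_dates = pd.date_range(observed.index.min(), observed.index.max(), freq='D')
rentals_by_date = observed.reindex(all_dates, fill_value=0)

# With one row per date, any statistic by day of week is straightforward
stats = rentals_by_date.groupby(rentals_by_date.index.day_name()).agg(['mean', 'median'])

print(f"Naive mean rentals per observed date: {naive_mean}")
print(stats.loc['Monday'])
```

<p>With one value per date, including the zeros, percentiles and other statistics can be computed the same way as the mean.</p>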
</section>
<section id="claude-prompt-4" class="level3">
<h3 class="anchored" data-anchor-id="claude-prompt-4">Claude prompt 4</h3>
<blockquote class="blockquote">
<p>Great. Now using Python and pandas, compute the average number of bike rentals by day of week and hour of day.</p>
</blockquote>
</section>
<section id="claude-response-4" class="level3">
<h3 class="anchored" data-anchor-id="claude-response-4">Claude response 4</h3>
<p>I’ll help you analyze the rental patterns by both day of week and hour of day.</p>
<p><strong>I moved the code below</strong></p>
<p>This code creates a comprehensive analysis of rental patterns by day and hour:</p>
<ol type="1">
<li>Creates a heatmap showing:
<ul>
<li>Days of week on the y-axis</li>
<li>Hours of day on the x-axis</li>
<li>Color intensity indicating average number of rentals</li>
<li>Actual values annotated in each cell</li>
</ul></li>
<li>Provides summary statistics including:
<ul>
<li>Top 5 peak usage times (day and hour combinations)</li>
<li>Busiest hours averaged across all days</li>
</ul></li>
<li>Uses proper averaging by:
<ul>
<li>Counting total rentals for each day-hour combination</li>
<li>Dividing by the number of each unique day in the dataset</li>
<li>Handling day sorting correctly (Monday through Sunday)</li>
</ul></li>
</ol>
<p>The heatmap visualization makes it easy to:</p>
<ul>
<li>Identify rush hour patterns</li>
<li>Compare weekday vs weekend patterns</li>
<li>Spot peak usage times</li>
<li>See the daily cycle of rentals</li>
</ul>
<p>Would you like me to modify the analysis? I could:</p>
<ul>
<li>Add seasonal patterns to see how these patterns change throughout the year</li>
<li>Break down patterns by station</li>
<li>Add weather data correlation if available</li>
<li>Create separate visualizations for weekdays vs weekends</li>
<li>Calculate additional statistics about peak vs off-peak usage</li>
</ul>
</section>
<section id="my-reaction-to-response-4" class="level3">
<h3 class="anchored" data-anchor-id="my-reaction-to-response-4">My reaction to response 4</h3>
<p>Just looking through the code, we’re going to hit the same error we saw earlier regarding using the <code>dt</code> accessor on a <code>SeriesGroupBy</code> object. Let’s explore the code a bit. I’ll do it in pieces.</p>
<div id="cell-38" class="cell" data-execution_count="16">
<div class="sourceCode cell-code" id="cb25" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb25-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pandas <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> pd</span>
<span id="cb25-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> seaborn <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> sns</span>
<span id="cb25-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> matplotlib.pyplot <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> plt</span>
<span id="cb25-4"></span>
<span id="cb25-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Read the CSV file</span></span>
<span id="cb25-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">## MY FIX: modified the path</span></span>
<span id="cb25-7">df <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.read_csv(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'data/bike_trips.csv'</span>)</span>
<span id="cb25-8"></span>
<span id="cb25-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Convert start_rental to datetime</span></span>
<span id="cb25-10">df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'start_rental'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.to_datetime(df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'start_rental'</span>])</span>
<span id="cb25-11"></span>
<span id="cb25-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Extract day of week and hour</span></span>
<span id="cb25-13">df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'start_rental'</span>].dt.day_name()</span>
<span id="cb25-14">df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'hour'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'start_rental'</span>].dt.hour</span></code></pre></div>
</div>
<div id="cell-39" class="cell" data-execution_count="17">
<div class="sourceCode cell-code" id="cb26" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb26-1">df.info()</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>&lt;class 'pandas.core.frame.DataFrame'&gt;
RangeIndex: 50000 entries, 0 to 49999
Data columns (total 14 columns):
 #   Column           Non-Null Count  Dtype         
---  ------           --------------  -----         
 0   trip_id          50000 non-null  int64         
 1   start_rental     50000 non-null  datetime64[ns]
 2   stop_rental      50000 non-null  object        
 3   bikeid           50000 non-null  object        
 4   tripduration     50000 non-null  float64       
 5   from_station     50000 non-null  object        
 6   to_station       50000 non-null  object        
 7   from_station_id  50000 non-null  object        
 8   to_station_id    50000 non-null  object        
 9   usertype         50000 non-null  object        
 10  gender           31726 non-null  object        
 11  birthyear        31726 non-null  float64       
 12  day_of_week      50000 non-null  object        
 13  hour             50000 non-null  int32         
dtypes: datetime64[ns](1), float64(2), int32(1), int64(1), object(9)
memory usage: 5.1+ MB</code></pre>
</div>
</div>
<p>So far, so good.</p>
<div id="cell-41" class="cell" data-execution_count="18">
<div class="sourceCode cell-code" id="cb28" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb28-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Count rentals by day and hour</span></span>
<span id="cb28-2">rentals_by_day_hour <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df.groupby([<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'hour'</span>]).size().reset_index()</span>
<span id="cb28-3">rentals_by_day_hour.columns <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'hour'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'count'</span>]</span></code></pre></div>
</div>
<div id="cell-42" class="cell" data-execution_count="19">
<div class="sourceCode cell-code" id="cb29" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb29-1">rentals_by_day_hour</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="19">
<div>
<div>


<table class="dataframe table table-sm table-striped small" data-quarto-postprocess="true" data-border="1">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th"></th>
<th data-quarto-table-cell-role="th">day_of_week</th>
<th data-quarto-table-cell-role="th">hour</th>
<th data-quarto-table-cell-role="th">count</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td data-quarto-table-cell-role="th">0</td>
<td>Friday</td>
<td>0</td>
<td>38</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">1</td>
<td>Friday</td>
<td>1</td>
<td>22</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">2</td>
<td>Friday</td>
<td>2</td>
<td>19</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">3</td>
<td>Friday</td>
<td>3</td>
<td>13</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">4</td>
<td>Friday</td>
<td>4</td>
<td>18</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">163</td>
<td>Wednesday</td>
<td>19</td>
<td>355</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">164</td>
<td>Wednesday</td>
<td>20</td>
<td>260</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">165</td>
<td>Wednesday</td>
<td>21</td>
<td>202</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">166</td>
<td>Wednesday</td>
<td>22</td>
<td>127</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">167</td>
<td>Wednesday</td>
<td>23</td>
<td>62</td>
</tr>
</tbody>
</table>

<p>168 rows × 3 columns</p>
</div>
</div>
</div>
</div>
<p>OK, here comes the anticipated problem.</p>
<div id="cell-44" class="cell" data-execution_count="20">
<div class="sourceCode cell-code" id="cb30" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb30-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Calculate unique days for each day of week</span></span>
<span id="cb30-2">unique_days <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df.groupby(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>)[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'start_rental'</span>].dt.date.nunique()</span></code></pre></div>
<div class="cell-output cell-output-error">
<pre><code>AttributeError: 'SeriesGroupBy' object has no attribute 'dt'</code></pre>
</div>
</div>
<p>Yep, same error. Let’s adapt the code we wrote to do what Claude is trying to do.</p>
<div id="cell-46" class="cell" data-execution_count="21">
<div class="sourceCode cell-code" id="cb32" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb32-1">unique_dates_df <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.DataFrame(df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'start_rental'</span>].dt.date.unique(), columns <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'date'</span>])</span>
<span id="cb32-2">unique_dates_df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'date'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.to_datetime(unique_dates_df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'date'</span>])</span>
<span id="cb32-3">unique_dates_df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> unique_dates_df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'date'</span>].dt.dayofweek</span>
<span id="cb32-4">unique_dates_df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_name'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> unique_dates_df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'date'</span>].dt.day_name()</span>
<span id="cb32-5"></span>
<span id="cb32-6">unique_days <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> unique_dates_df.groupby(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_name'</span>)[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'date'</span>].size()</span>
<span id="cb32-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># unique_days is a Series indexed by day name, so there are no columns to rename</span></span></code></pre></div>
</div>
<div id="cell-47" class="cell" data-execution_count="22">
<div class="sourceCode cell-code" id="cb33" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb33-1">unique_days</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="22">
<pre><code>day_name
Friday       98
Monday       99
Saturday     98
Sunday       98
Thursday     98
Tuesday      99
Wednesday    99
Name: date, dtype: int64</code></pre>
</div>
</div>
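<p>For comparison, the same per-weekday date counts can probably be obtained in a single grouped step. The sketch below runs on hypothetical toy data, and the <code>date</code> column name is my own choice for illustration. The idea is to apply the <code>dt</code> accessor to the ungrouped column first and only then group.</p>

```python
import pandas as pd

# Hypothetical stand-in for the trip data: one row per rental.
df = pd.DataFrame({
    "start_rental": pd.to_datetime([
        "2024-01-01 08:00", "2024-01-01 17:30",   # a Monday
        "2024-01-02 09:15",                        # a Tuesday
        "2024-01-08 07:45",                        # the following Monday
    ])
})
df["day_of_week"] = df["start_rental"].dt.day_name()

# The .dt accessor lives on a Series, not on a SeriesGroupBy, so derive the
# calendar date as an ordinary column first and only then group.
df["date"] = df["start_rental"].dt.normalize()
unique_days = df.groupby("day_of_week")["date"].nunique()

print(unique_days)
```

<p>Like the version above, this counts only dates that actually appear in the data, so a date with zero rentals would still be missed.</p>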
<p>Okay, here’s Claude’s final step in computing the average number of bike rentals by day of week and hour of day.</p>
<div id="cell-49" class="cell" data-execution_count="23">
<div class="sourceCode cell-code" id="cb35" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb35-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Calculate averages</span></span>
<span id="cb35-2">rentals_by_day_hour[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'average_rentals'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rentals_by_day_hour.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">apply</span>(</span>
<span id="cb35-3">    <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span> x: x[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'count'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> unique_days[x[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>]], </span>
<span id="cb35-4">    axis<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb35-5">)</span></code></pre></div>
</div>
<div id="cell-50" class="cell" data-execution_count="24">
<div class="sourceCode cell-code" id="cb36" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb36-1">rentals_by_day_hour</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="24">
<div>
<div>


<table class="dataframe table table-sm table-striped small" data-quarto-postprocess="true" data-border="1">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th"></th>
<th data-quarto-table-cell-role="th">day_of_week</th>
<th data-quarto-table-cell-role="th">hour</th>
<th data-quarto-table-cell-role="th">count</th>
<th data-quarto-table-cell-role="th">average_rentals</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td data-quarto-table-cell-role="th">0</td>
<td>Friday</td>
<td>0</td>
<td>38</td>
<td>0.387755</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">1</td>
<td>Friday</td>
<td>1</td>
<td>22</td>
<td>0.224490</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">2</td>
<td>Friday</td>
<td>2</td>
<td>19</td>
<td>0.193878</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">3</td>
<td>Friday</td>
<td>3</td>
<td>13</td>
<td>0.132653</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">4</td>
<td>Friday</td>
<td>4</td>
<td>18</td>
<td>0.183673</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">163</td>
<td>Wednesday</td>
<td>19</td>
<td>355</td>
<td>3.585859</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">164</td>
<td>Wednesday</td>
<td>20</td>
<td>260</td>
<td>2.626263</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">165</td>
<td>Wednesday</td>
<td>21</td>
<td>202</td>
<td>2.040404</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">166</td>
<td>Wednesday</td>
<td>22</td>
<td>127</td>
<td>1.282828</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">167</td>
<td>Wednesday</td>
<td>23</td>
<td>62</td>
<td>0.626263</td>
</tr>
</tbody>
</table>

<p>168 rows × 4 columns</p>
</div>
</div>
</div>
</div>
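<p>The row-wise <code>apply</code> works, but the same division can be sketched without a lambda by looking up each row’s denominator with <code>Series.map</code>. The miniature objects below are hypothetical stand-ins that reuse the names from the cells above. The Friday counts are copied from the table, while the Monday count of 30 is made up.</p>

```python
import pandas as pd

# Hypothetical miniature versions of the two objects built above.
rentals_by_day_hour = pd.DataFrame({
    "day_of_week": ["Friday", "Friday", "Monday"],
    "hour": [0, 1, 0],
    "count": [38, 22, 30],
})
unique_days = pd.Series({"Friday": 98, "Monday": 99})

# Series.map with a Series argument looks each day name up in the index of
# unique_days, giving the per-row denominator without a Python-level lambda.
days_in_data = rentals_by_day_hour["day_of_week"].map(unique_days)
rentals_by_day_hour["average_rentals"] = rentals_by_day_hour["count"] / days_in_data

print(rentals_by_day_hour)
```

<p>A side benefit of <code>map</code> is that a day name missing from <code>unique_days</code> yields <code>NaN</code> rather than raising a <code>KeyError</code> partway through the frame, which can make such gaps easier to spot.</p>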
<div id="cell-51" class="cell" data-execution_count="25">
<div class="sourceCode cell-code" id="cb37" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb37-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Pivot the data for heatmap</span></span>
<span id="cb37-2">heatmap_data <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rentals_by_day_hour.pivot(</span>
<span id="cb37-3">    index<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>,</span>
<span id="cb37-4">    columns<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'hour'</span>,</span>
<span id="cb37-5">    values<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'average_rentals'</span></span>
<span id="cb37-6">)</span>
<span id="cb37-7"></span>
<span id="cb37-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Sort days in correct order</span></span>
<span id="cb37-9">day_order <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Monday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Tuesday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Wednesday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Thursday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Friday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Saturday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Sunday'</span>]</span>
<span id="cb37-10">heatmap_data <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> heatmap_data.reindex(day_order)</span>
<span id="cb37-11"></span>
<span id="cb37-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create heatmap</span></span>
<span id="cb37-13">plt.figure(figsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">15</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span>))</span>
<span id="cb37-14">sns.heatmap(</span>
<span id="cb37-15">    heatmap_data,</span>
<span id="cb37-16">    cmap<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'YlOrRd'</span>,</span>
<span id="cb37-17">    cbar_kws<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>{<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'label'</span>: <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Average Number of Rentals'</span>},</span>
<span id="cb37-18">    fmt<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'.1f'</span>,</span>
<span id="cb37-19">    annot<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span></span>
<span id="cb37-20">)</span>
<span id="cb37-21">plt.title(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Average Bike Rentals by Day of Week and Hour'</span>)</span>
<span id="cb37-22">plt.xlabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Hour of Day'</span>)</span>
<span id="cb37-23">plt.ylabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Day of Week'</span>)</span>
<span id="cb37-24">plt.tight_layout()</span>
<span id="cb37-25">plt.show()</span>
<span id="cb37-26"></span>
<span id="cb37-27"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Print summary statistics</span></span>
<span id="cb37-28"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">Peak Usage Times:"</span>)</span>
<span id="cb37-29"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Find top 5 busiest day-hour combinations</span></span>
<span id="cb37-30">peak_times <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rentals_by_day_hour.nlargest(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'average_rentals'</span>)</span>
<span id="cb37-31"><span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> _, row <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> peak_times.iterrows():</span>
<span id="cb37-32">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>row[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> at </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>row[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'hour'</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:02d}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">:00 - </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>row[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'average_rentals'</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:.1f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> average rentals"</span>)</span>
<span id="cb37-33"></span>
<span id="cb37-34"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Calculate overall hourly patterns</span></span>
<span id="cb37-35">hourly_avg <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rentals_by_day_hour.groupby(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'hour'</span>)[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'average_rentals'</span>].mean()</span>
<span id="cb37-36"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">Busiest Hours (averaged across all days):"</span>)</span>
<span id="cb37-37"><span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> hour <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> hourly_avg.nlargest(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>).index:</span>
<span id="cb37-38">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>hour<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:02d}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">:00 - </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>hourly_avg[hour]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:.1f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> average rentals"</span>)</span></code></pre></div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/llms-cycleshare-part1/daily_averages_cycleshare_part1_files/figure-html/cell-26-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>
Peak Usage Times:
Tuesday at 08:00 - 8.9 average rentals
Wednesday at 17:00 - 8.6 average rentals
Monday at 17:00 - 8.6 average rentals
Thursday at 08:00 - 8.5 average rentals
Tuesday at 17:00 - 8.3 average rentals

Busiest Hours (averaged across all days):
17:00 - 7.3 average rentals
16:00 - 6.0 average rentals
08:00 - 6.0 average rentals</code></pre>
</div>
</div>
<p>Well, that seems impressive. Again, this approach only works for means, and only as long as there are no dates with zero volume.</p>
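<p>Whether that condition holds is easy to check before trusting the shortcut. Here is a minimal sketch, again on hypothetical toy timestamps standing in for the <code>start_rental</code> column, that lists the calendar dates in the span that have no rentals at all.</p>

```python
import pandas as pd

# Hypothetical toy trips; in the post this would be df['start_rental'].
start_rental = pd.Series(pd.to_datetime([
    "2024-01-01 08:00", "2024-01-02 09:00", "2024-01-04 17:00",
]))

# Dates that actually appear in the data (midnight-normalized timestamps).
observed_dates = pd.DatetimeIndex(start_rental.dt.normalize().unique())

# Every calendar date between the first and last rental.
all_dates = pd.date_range(observed_dates.min(), observed_dates.max(), freq="D")

# Dates in the span with no rentals; an empty result means the shortcut is safe.
missing_dates = all_dates.difference(observed_dates)
print(len(missing_dates), "dates with zero rentals")
```

<p>The same check could be run per station by filtering the trips first, which is where zero-rental dates are most likely to show up.</p>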
</section>
<section id="claude-prompt-5" class="level3">
<h3 class="anchored" data-anchor-id="claude-prompt-5">Claude prompt 5</h3>
<p>Now let’s also compute the 95th percentile of rental volume.</p>
<blockquote class="blockquote">
<p>Using Python and pandas, can you compute the mean and 95th percentile of the number of bikes rented by day of week?</p>
</blockquote>
</section>
<section id="claude-response-5" class="level3">
<h3 class="anchored" data-anchor-id="claude-response-5">Claude response 5</h3>
<p><strong>I moved the code below</strong></p>
<p>This code:</p>
<ol type="1">
<li>Groups rentals by date and day of week to get daily counts</li>
<li>Calculates the mean and 95th percentile for each day of the week</li>
<li>Presents the results in a clear tabular format</li>
<li>Provides summary statistics including overall means and peak days</li>
</ol>
<p>Would you like me to add any additional statistics or modify the analysis in any way?</p>
</section>
<section id="my-reaction-to-response-5" class="level3">
<h3 class="anchored" data-anchor-id="my-reaction-to-response-5">My reaction to response 5</h3>
<p>Let’s work our way through the code.</p>
<div id="cell-56" class="cell" data-execution_count="26">
<div class="sourceCode cell-code" id="cb39" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb39-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pandas <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> pd</span>
<span id="cb39-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> np</span>
<span id="cb39-3"></span>
<span id="cb39-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Read the CSV file</span></span>
<span id="cb39-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">## MY FIX: modified the path</span></span>
<span id="cb39-6">df <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.read_csv(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'data/bike_trips.csv'</span>)</span>
<span id="cb39-7"></span>
<span id="cb39-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Convert start_rental to datetime</span></span>
<span id="cb39-9">df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'start_rental'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.to_datetime(df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'start_rental'</span>])</span>
<span id="cb39-10"></span>
<span id="cb39-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Extract date and day of week</span></span>
<span id="cb39-12">df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'date'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'start_rental'</span>].dt.date</span>
<span id="cb39-13">df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'start_rental'</span>].dt.day_name()</span>
<span id="cb39-14"></span>
<span id="cb39-15"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Get daily counts</span></span>
<span id="cb39-16">daily_counts <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df.groupby([<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'date'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>]).size().reset_index(name<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'count'</span>)</span></code></pre></div>
</div>
<div id="cell-57" class="cell" data-execution_count="27">
<div class="sourceCode cell-code" id="cb40" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb40-1">daily_counts</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="27">
<div>
<div>


<table class="dataframe table table-sm table-striped small" data-quarto-postprocess="true" data-border="1">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th"></th>
<th data-quarto-table-cell-role="th">date</th>
<th data-quarto-table-cell-role="th">day_of_week</th>
<th data-quarto-table-cell-role="th">count</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td data-quarto-table-cell-role="th">0</td>
<td>2014-10-13</td>
<td>Monday</td>
<td>146</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">1</td>
<td>2014-10-14</td>
<td>Tuesday</td>
<td>158</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">2</td>
<td>2014-10-15</td>
<td>Wednesday</td>
<td>111</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">3</td>
<td>2014-10-16</td>
<td>Thursday</td>
<td>150</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">4</td>
<td>2014-10-17</td>
<td>Friday</td>
<td>92</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">684</td>
<td>2016-08-27</td>
<td>Saturday</td>
<td>65</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">685</td>
<td>2016-08-28</td>
<td>Sunday</td>
<td>80</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">686</td>
<td>2016-08-29</td>
<td>Monday</td>
<td>64</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">687</td>
<td>2016-08-30</td>
<td>Tuesday</td>
<td>59</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">688</td>
<td>2016-08-31</td>
<td>Wednesday</td>
<td>54</td>
</tr>
</tbody>
</table>

<p>689 rows × 3 columns</p>
</div>
</div>
</div>
</div>
<p>Well, this is already much better than what earlier versions of LLMs did on this problem. Claude computed the number of rentals by date, a step that was not done when just computing the mean number of rentals. To compute percentiles, we need the full series of daily rental counts, whereas for a mean we could get by with the total number of rides for each day of the week and a count of how many times each day of the week occurs in the date range of the dataset.</p>
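<p>To make the shortcut for the mean concrete, here is a minimal sketch on a tiny made-up set of rentals (the <code>trips</code> data below is hypothetical, not the bike share file; only the <code>start_rental</code> column name is borrowed from the code above). The mean for each day of the week is the total number of rentals on that weekday divided by the number of times that weekday occurs in the date range.</p>

```python
import pandas as pd

# Hypothetical toy data: one row per rental (not the real bike share file)
trips = pd.DataFrame({
    'start_rental': pd.to_datetime([
        '2014-10-13 08:00', '2014-10-13 09:30',                      # Monday: 2 rentals
        '2014-10-14 10:00',                                          # Tuesday: 1 rental
        '2014-10-20 07:45', '2014-10-20 12:00', '2014-10-20 18:15',  # Monday: 3 rentals
        '2014-10-21 17:15',                                          # Tuesday: 1 rental
    ])
})

# Numerator: total rentals for each day of the week
total_by_dow = trips['start_rental'].dt.day_name().value_counts()

# Denominator: how many times each weekday occurs in the full date range
all_dates = pd.date_range(
    trips['start_rental'].min().normalize(),
    trips['start_rental'].max().normalize(),
    freq='D',
)
n_days_by_dow = pd.Series(all_dates.day_name()).value_counts()

# Weekdays with no rentals at all get a total of 0 rather than dropping out
mean_by_dow = total_by_dow.reindex(n_days_by_dow.index, fill_value=0) / n_days_by_dow
print(mean_by_dow)
```

<p>This works for a mean because a mean only needs a sum and a count. A percentile depends on the whole distribution of daily counts, so no comparable shortcut exists for the 95th percentile.</p>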
<div id="cell-59" class="cell" data-execution_count="28">
<div class="sourceCode cell-code" id="cb41" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb41-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Calculate statistics by day of week</span></span>
<span id="cb41-2">stats <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> daily_counts.groupby(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>).agg({</span>
<span id="cb41-3">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'count'</span>: [</span>
<span id="cb41-4">        (<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'mean'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'mean'</span>),</span>
<span id="cb41-5">        (<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'95th_percentile'</span>, <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span> x: np.percentile(x, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">95</span>))</span>
<span id="cb41-6">    ]</span>
<span id="cb41-7">}).<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb41-8"></span>
<span id="cb41-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Flatten column names and reset index</span></span>
<span id="cb41-10">stats.columns <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'mean'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'95th_percentile'</span>]</span>
<span id="cb41-11">stats <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> stats.reset_index()</span>
<span id="cb41-12"></span>
<span id="cb41-13"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Sort days in correct order</span></span>
<span id="cb41-14">day_order <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Monday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Tuesday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Wednesday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Thursday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Friday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Saturday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Sunday'</span>]</span>
<span id="cb41-15">stats[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.Categorical(stats[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>], categories<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>day_order, ordered<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb41-16">stats <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> stats.sort_values(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>)</span>
<span id="cb41-17"></span>
<span id="cb41-18"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Display results</span></span>
<span id="cb41-19"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">Bike Rental Statistics by Day of Week:"</span>)</span>
<span id="cb41-20"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"(All values rounded to 1 decimal place)"</span>)</span>
<span id="cb41-21"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">Day of Week | Mean Rentals | 95th Percentile"</span>)</span>
<span id="cb41-22"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"-"</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">45</span>)</span>
<span id="cb41-23"><span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> _, row <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> stats.iterrows():</span>
<span id="cb41-24">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>row[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:&lt;10}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> | </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>row[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'mean'</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:&gt;11.1f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> | </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>row[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'95th_percentile'</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:&gt;15.1f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb41-25"></span>
<span id="cb41-26"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Calculate overall summary statistics</span></span>
<span id="cb41-27"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">Summary Statistics:"</span>)</span>
<span id="cb41-28"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Overall Mean Daily Rentals: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>stats[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'mean'</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>mean()<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:.1f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb41-29"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Highest Mean Daily Rentals: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>stats[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'mean'</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">max</span>()<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:.1f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> (</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>stats<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>loc[stats[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'mean'</span>].idxmax(), <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">)"</span>)</span>
<span id="cb41-30"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Highest 95th Percentile: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>stats[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'95th_percentile'</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">max</span>()<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:.1f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> (</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>stats<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>loc[stats[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'95th_percentile'</span>].idxmax(), <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">)"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>
Bike Rental Statistics by Day of Week:
(All values rounded to 1 decimal place)

Day of Week | Mean Rentals | 95th Percentile
---------------------------------------------
Monday     |        75.6 |           135.1
Tuesday    |        75.3 |           123.0
Wednesday  |        75.3 |           127.2
Thursday   |        78.2 |           133.4
Friday     |        75.5 |           123.7
Saturday   |        69.9 |           144.0
Sunday     |        58.2 |           103.2

Summary Statistics:
Overall Mean Daily Rentals: 72.6
Highest Mean Daily Rentals: 78.2 (Thursday)
Highest 95th Percentile: 144.0 (Saturday)</code></pre>
</div>
</div>
<p>Impressive. Again, this approach assumes we have at least one rental on each date in the range of dates spanned by our dataset. It’s a totally reasonable assumption for the entire bike share system. But what happens if we want to do this analysis at the station level, where not every station has rentals on every date? Will Claude handle this correctly?</p>
</section>
<section id="claude-prompt-6" class="level3">
<h3 class="anchored" data-anchor-id="claude-prompt-6">Claude prompt 6</h3>
<blockquote class="blockquote">
<p>Using Python and pandas, can you compute the mean and 95th percentile of the number of bikes rented by day of week for station CH-06?</p>
</blockquote>
<h3 id="my-reaction-to-response-6" class="anchored">My reaction to response 6</h3>
<p>Let’s jump right to the generated code.</p>
<div id="cell-62" class="cell" data-execution_count="29">
<div class="sourceCode cell-code" id="cb43" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb43-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pandas <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> pd</span>
<span id="cb43-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> np</span>
<span id="cb43-3"></span>
<span id="cb43-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Read the CSV file</span></span>
<span id="cb43-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">## MY FIX: modified the path</span></span>
<span id="cb43-6">df <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.read_csv(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'data/bike_trips.csv'</span>)</span>
<span id="cb43-7"></span>
<span id="cb43-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Convert start_rental to datetime</span></span>
<span id="cb43-9">df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'start_rental'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.to_datetime(df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'start_rental'</span>])</span>
<span id="cb43-10"></span>
<span id="cb43-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Filter for station CH-06 (12th Ave &amp; E Denny Way)</span></span>
<span id="cb43-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">## MY FIX: Using the station name instead of the ID because that's the field I told Claude about</span></span>
<span id="cb43-13">station_df <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df[df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'from_station'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'12th Ave &amp; E Denny Way'</span>]</span>
<span id="cb43-14"></span>
<span id="cb43-15"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Extract date and day of week</span></span>
<span id="cb43-16">station_df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'date'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> station_df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'start_rental'</span>].dt.date</span>
<span id="cb43-17">station_df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> station_df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'start_rental'</span>].dt.day_name()</span>
<span id="cb43-18"></span>
<span id="cb43-19"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Get daily counts</span></span>
<span id="cb43-20">daily_counts <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> station_df.groupby([<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'date'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>]).size().reset_index(name<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'count'</span>)</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>/tmp/ipykernel_22505/1411289504.py:16: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  station_df['date'] = station_df['start_rental'].dt.date
/tmp/ipykernel_22505/1411289504.py:17: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  station_df['day_of_week'] = station_df['start_rental'].dt.day_name()</code></pre>
</div>
</div>
<div id="cell-63" class="cell" data-execution_count="30">
<div class="sourceCode cell-code" id="cb45" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb45-1">daily_counts</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="30">
<div>
<div>


<table class="dataframe table table-sm table-striped small" data-quarto-postprocess="true" data-border="1">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th"></th>
<th data-quarto-table-cell-role="th">date</th>
<th data-quarto-table-cell-role="th">day_of_week</th>
<th data-quarto-table-cell-role="th">count</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td data-quarto-table-cell-role="th">0</td>
<td>2014-10-13</td>
<td>Monday</td>
<td>1</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">1</td>
<td>2014-10-14</td>
<td>Tuesday</td>
<td>3</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">2</td>
<td>2014-10-15</td>
<td>Wednesday</td>
<td>3</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">3</td>
<td>2014-10-17</td>
<td>Friday</td>
<td>1</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">4</td>
<td>2014-10-19</td>
<td>Sunday</td>
<td>2</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">399</td>
<td>2016-08-20</td>
<td>Saturday</td>
<td>1</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">400</td>
<td>2016-08-24</td>
<td>Wednesday</td>
<td>1</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">401</td>
<td>2016-08-25</td>
<td>Thursday</td>
<td>1</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">402</td>
<td>2016-08-26</td>
<td>Friday</td>
<td>1</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">403</td>
<td>2016-08-27</td>
<td>Saturday</td>
<td>1</td>
</tr>
</tbody>
</table>

<p>404 rows × 3 columns</p>
</div>
</div>
</div>
</div>
<p>Uh oh. There are only 404 dates on which this station had one or more rentals, out of the 689 dates in the dataset. On the other <img src="https://latex.codecogs.com/png.latex?689-404"> days, there were zero rentals. We need these 285 zero values to compute the correct mean and 95th percentile of the number of rentals by day of week at this station. Ignoring the zero values will inflate the statistics, because the days with no rentals drop out of the calculation entirely.</p>
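<p>One way to bring the missing zeros back is to reindex the daily counts over the full range of dates before grouping by day of week. The sketch below is my own illustration on hypothetical data, not Claude's output. It assumes the <code>date</code> column is a pandas datetime (the generated code above uses <code>.dt.date</code>, which would first need <code>pd.to_datetime</code>) and that the analysis window is known. The names <code>full_range</code>, <code>filled</code> and <code>p95</code> are made up for the example.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical daily counts for one station: dates with no rentals are simply absent
daily_counts = pd.DataFrame({
    'date': pd.to_datetime(['2014-10-13', '2014-10-14', '2014-10-20']),
    'count': [1, 3, 2],
})

# Assumed analysis window: use the date range of the whole dataset,
# not just the dates on which this station happened to have rentals
full_range = pd.date_range('2014-10-13', '2014-10-26', freq='D')

# Reindex over every date in the window, filling missing dates with zero rentals
filled = (
    daily_counts.set_index('date')['count']
    .reindex(full_range, fill_value=0)
    .rename_axis('date')
    .reset_index()
)
filled['day_of_week'] = filled['date'].dt.day_name()

# The statistics now include the zero-rental days
stats = filled.groupby('day_of_week')['count'].agg(
    mean='mean',
    p95=lambda x: np.percentile(x, 95),
)
print(stats)
```

<p>In this toy example the only Tuesday with rentals has 3 of them, so the mean without the zeros would be 3.0. With the second Tuesday counted as a zero, the mean drops to 1.5.</p>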
<div id="cell-65" class="cell" data-execution_count="31">
<div class="sourceCode cell-code" id="cb46" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb46-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Calculate statistics by day of week</span></span>
<span id="cb46-2">stats <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> daily_counts.groupby(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>).agg({</span>
<span id="cb46-3">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'count'</span>: [</span>
<span id="cb46-4">        (<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'mean'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'mean'</span>),</span>
<span id="cb46-5">        (<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'95th_percentile'</span>, <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span> x: np.percentile(x, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">95</span>))</span>
<span id="cb46-6">    ]</span>
<span id="cb46-7">}).<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb46-8"></span>
<span id="cb46-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Flatten column names and reset index</span></span>
<span id="cb46-10">stats.columns <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'mean'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'95th_percentile'</span>]</span>
<span id="cb46-11">stats <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> stats.reset_index()</span>
<span id="cb46-12"></span>
<span id="cb46-13"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Sort days in correct order</span></span>
<span id="cb46-14">day_order <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Monday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Tuesday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Wednesday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Thursday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Friday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Saturday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Sunday'</span>]</span>
<span id="cb46-15">stats[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.Categorical(stats[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>], categories<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>day_order, ordered<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb46-16">stats <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> stats.sort_values(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>)</span>
<span id="cb46-17"></span>
<span id="cb46-18"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Display results</span></span>
<span id="cb46-19"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">Bike Rental Statistics for Station CH-06 by Day of Week:"</span>)</span>
<span id="cb46-20"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"(All values rounded to 1 decimal place)"</span>)</span>
<span id="cb46-21"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">Day of Week | Mean Rentals | 95th Percentile"</span>)</span>
<span id="cb46-22"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"-"</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">45</span>)</span>
<span id="cb46-23"><span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> _, row <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> stats.iterrows():</span>
<span id="cb46-24">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>row[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:&lt;10}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> | </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>row[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'mean'</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:&gt;11.1f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> | </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>row[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'95th_percentile'</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:&gt;15.1f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb46-25"></span>
<span id="cb46-26"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Calculate overall summary statistics</span></span>
<span id="cb46-27"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">Summary Statistics for Station CH-06:"</span>)</span>
<span id="cb46-28"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Overall Mean Daily Rentals: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>stats[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'mean'</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>mean()<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:.1f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb46-29"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Highest Mean Daily Rentals: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>stats[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'mean'</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">max</span>()<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:.1f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> (</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>stats<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>loc[stats[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'mean'</span>].idxmax(), <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">)"</span>)</span>
<span id="cb46-30"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Highest 95th Percentile: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>stats[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'95th_percentile'</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">max</span>()<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:.1f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> (</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>stats<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>loc[stats[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'95th_percentile'</span>].idxmax(), <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">)"</span>)</span>
<span id="cb46-31"></span>
<span id="cb46-32"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add total number of rentals info</span></span>
<span id="cb46-33">total_rentals <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(station_df)</span>
<span id="cb46-34">total_days <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(daily_counts[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'date'</span>].unique())</span>
<span id="cb46-35"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">Total number of rentals from CH-06: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>total_rentals<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb46-36"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Number of days in dataset: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>total_days<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>
Bike Rental Statistics for Station CH-06 by Day of Week:
(All values rounded to 1 decimal place)

Day of Week | Mean Rentals | 95th Percentile
---------------------------------------------
Monday     |         1.7 |             3.0
Tuesday    |         1.6 |             3.0
Wednesday  |         1.5 |             3.0
Thursday   |         1.7 |             3.0
Friday     |         1.6 |             3.0
Saturday   |         1.5 |             3.0
Sunday     |         1.5 |             2.0

Summary Statistics for Station CH-06:
Overall Mean Daily Rentals: 1.6
Highest Mean Daily Rentals: 1.7 (Monday)
Highest 95th Percentile: 3.0 (Monday)

Total number of rentals from CH-06: 644
Number of days in dataset: 404</code></pre>
</div>
</div>
<p>On the surface this looks reasonable, but these values are all inflated because the days with zero rentals were ignored.</p>
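<p>To see why the zero days go missing, here is a minimal sketch on made-up data (the four rentals and the five-day window are hypothetical, not taken from the bike share dataset). Counting rentals per date only produces rows for dates that have at least one rental, so an average over those rows silently skips the empty days.</p>

```python
import pandas as pd

# Hypothetical toy data: four rentals spread over a five-day window.
trips = pd.DataFrame({
    "start_rental": pd.to_datetime([
        "2014-10-13 08:00", "2014-10-13 17:30",
        "2014-10-15 09:15", "2014-10-17 12:00",
    ])
})

# Counting rentals per calendar day only yields rows for days with rentals.
daily = trips.groupby(trips["start_rental"].dt.date).size()

days_in_window = 5  # 2014-10-13 through 2014-10-17 inclusive
naive_mean = daily.mean()                # averages over the 3 observed days
true_mean = len(trips) / days_in_window  # averages over all 5 calendar days

print(len(daily), round(naive_mean, 2), round(true_mean, 2))
```

<p>The two days without rentals never appear in <code>daily</code>, so the naive mean divides by 3 instead of 5.</p>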
<p>Let’s compute the correct statistics.</p>
<p>We’ll start by getting the date range of our dataset.</p>
<div id="cell-68" class="cell" data-execution_count="32">
<div class="sourceCode cell-code" id="cb48" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb48-1">df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'tripdate'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'start_rental'</span>].dt.date</span>
<span id="cb48-2">min_date <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'tripdate'</span>].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">min</span>()</span>
<span id="cb48-3">max_date <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'tripdate'</span>].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">max</span>()</span>
<span id="cb48-4">num_days <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> max_date <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> min_date <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> pd.Timedelta(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'d'</span>)</span>
<span id="cb48-5"></span>
<span id="cb48-6"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>num_days<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>days<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> days - </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>min_date<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> to </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>max_date<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>689 days - 2014-10-13 to 2016-08-31</code></pre>
</div>
</div>
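<p>The gap between the 404 days that had rentals and the 689 calendar days in the range already hints at the size of the distortion. As a rough check, and assuming station CH-06 was in service for the entire date range (an assumption that the output shown here does not confirm), the overall mean can be recomputed from the printed totals:</p>

```python
# Figures taken from the printed output above.
total_rentals = 644      # rentals from station CH-06
days_with_rentals = 404  # days that appear in daily_counts
days_in_range = 689      # calendar days from 2014-10-13 to 2016-08-31

mean_observed_days = total_rentals / days_with_rentals
mean_all_days = total_rentals / days_in_range
zero_days = days_in_range - days_with_rentals

print(f"{mean_observed_days:.2f} {mean_all_days:.2f} {zero_days}")
# prints: 1.59 0.93 285
```

<p>Under that assumption, 285 days had no rentals at all, and the overall mean drops from about 1.6 to about 0.9 rentals per day.</p>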
<p>Now we’ll create a dataframe containing all the dates.</p>
<div id="cell-70" class="cell" data-execution_count="33">
<div class="sourceCode cell-code" id="cb50" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb50-1">dates_seeded <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.DataFrame(pd.date_range(min_date, max_date), columns <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'tripdate'</span>])</span>
<span id="cb50-2">dates_seeded[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> dates_seeded[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'tripdate'</span>].dt.day_name()</span>
<span id="cb50-3">dates_seeded</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="33">
<div>
<div>


<table class="dataframe table table-sm table-striped small" data-quarto-postprocess="true" data-border="1">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th"></th>
<th data-quarto-table-cell-role="th">tripdate</th>
<th data-quarto-table-cell-role="th">day_of_week</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td data-quarto-table-cell-role="th">0</td>
<td>2014-10-13</td>
<td>Monday</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">1</td>
<td>2014-10-14</td>
<td>Tuesday</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">2</td>
<td>2014-10-15</td>
<td>Wednesday</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">3</td>
<td>2014-10-16</td>
<td>Thursday</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">4</td>
<td>2014-10-17</td>
<td>Friday</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">...</td>
<td>...</td>
<td>...</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">684</td>
<td>2016-08-27</td>
<td>Saturday</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">685</td>
<td>2016-08-28</td>
<td>Sunday</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">686</td>
<td>2016-08-29</td>
<td>Monday</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">687</td>
<td>2016-08-30</td>
<td>Tuesday</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">688</td>
<td>2016-08-31</td>
<td>Wednesday</td>
</tr>
</tbody>
</table>

<p>689 rows × 2 columns</p>
</div>
</div>
</div>
</div>
<div id="cell-71" class="cell" data-execution_count="34">
<div class="sourceCode cell-code" id="cb51" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb51-1">dates_seeded.info()</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>&lt;class 'pandas.core.frame.DataFrame'&gt;
RangeIndex: 689 entries, 0 to 688
Data columns (total 2 columns):
 #   Column       Non-Null Count  Dtype         
---  ------       --------------  -----         
 0   tripdate     689 non-null    datetime64[ns]
 1   day_of_week  689 non-null    object        
dtypes: datetime64[ns](1), object(1)
memory usage: 10.9+ KB</code></pre>
</div>
</div>
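<p>Note that <code>tripdate</code> in <code>dates_seeded</code> is a <code>datetime64</code> column. Dates created with <code>.dt.date</code> are plain Python <code>date</code> objects stored with <code>object</code> dtype, so a join key built that way should be converted before merging. The sketch below uses a hypothetical two-row series as a stand-in for such a column:</p>

```python
import datetime as dt

import pandas as pd

# Hypothetical stand-in for a date column that was built with .dt.date.
dates_as_objects = pd.Series([dt.date(2014, 10, 13), dt.date(2014, 10, 14)])
print(dates_as_objects.dtype)  # generic object dtype, not a datetime type

# pd.to_datetime converts the column to a proper datetime64 dtype so that
# it can be matched against the output of pd.date_range.
converted = pd.to_datetime(dates_as_objects)
print(pd.api.types.is_datetime64_any_dtype(converted))
```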
<p>Now do a left join of <code>dates_seeded</code> (the left table) with <code>daily_counts</code>, so that every date in the range is kept even if it had no rentals.</p>
<div id="cell-73" class="cell" data-execution_count="35">
<div class="sourceCode cell-code" id="cb53" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb53-1">daily_counts[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'tripdate'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.to_datetime(daily_counts[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'date'</span>])</span>
<span id="cb53-2">trips_by_date_merged <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.merge(dates_seeded, daily_counts, how<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'left'</span>, </span>
<span id="cb53-3">      left_on<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'tripdate'</span>, right_on<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'tripdate'</span>, sort<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>, suffixes<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'_y'</span>))</span>
<span id="cb53-4"></span>
<span id="cb53-5">trips_by_date_merged</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="35">
<div>
<div>


<table class="dataframe table table-sm table-striped small" data-quarto-postprocess="true" data-border="1">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th"></th>
<th data-quarto-table-cell-role="th">tripdate</th>
<th data-quarto-table-cell-role="th">day_of_week</th>
<th data-quarto-table-cell-role="th">date</th>
<th data-quarto-table-cell-role="th">day_of_week_y</th>
<th data-quarto-table-cell-role="th">count</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td data-quarto-table-cell-role="th">0</td>
<td>2014-10-13</td>
<td>Monday</td>
<td>2014-10-13</td>
<td>Monday</td>
<td>1.0</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">1</td>
<td>2014-10-14</td>
<td>Tuesday</td>
<td>2014-10-14</td>
<td>Tuesday</td>
<td>3.0</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">2</td>
<td>2014-10-15</td>
<td>Wednesday</td>
<td>2014-10-15</td>
<td>Wednesday</td>
<td>3.0</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">3</td>
<td>2014-10-16</td>
<td>Thursday</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">4</td>
<td>2014-10-17</td>
<td>Friday</td>
<td>2014-10-17</td>
<td>Friday</td>
<td>1.0</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">684</td>
<td>2016-08-27</td>
<td>Saturday</td>
<td>2016-08-27</td>
<td>Saturday</td>
<td>1.0</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">685</td>
<td>2016-08-28</td>
<td>Sunday</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">686</td>
<td>2016-08-29</td>
<td>Monday</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">687</td>
<td>2016-08-30</td>
<td>Tuesday</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">688</td>
<td>2016-08-31</td>
<td>Wednesday</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
</tr>
</tbody>
</table>

<p>689 rows × 5 columns</p>
</div>
</div>
</div>
</div>
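<p>One way to check how many dates had no match is the <code>indicator</code> argument of <code>pd.merge</code>, which adds a <code>_merge</code> column describing where each row came from. This is a small sketch on hypothetical data that mirrors the first five rows of the table above, not the full dataset:</p>

```python
import pandas as pd

# Hypothetical five-day calendar and the four days that had rentals.
all_days = pd.DataFrame({"tripdate": pd.date_range("2014-10-13", "2014-10-17")})
observed = pd.DataFrame({
    "tripdate": pd.to_datetime(
        ["2014-10-13", "2014-10-14", "2014-10-15", "2014-10-17"]
    ),
    "count": [1, 3, 3, 1],
})

# indicator=True labels each row as 'both' or 'left_only' in a '_merge' column.
merged = pd.merge(all_days, observed, how="left", on="tripdate", indicator=True)

# Rows marked 'left_only' are calendar days with no rentals.
n_zero_days = int((merged["_merge"] == "left_only").sum())
print(n_zero_days)
```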
<p>You can see the <code>NaN</code> values for the dates that had no rentals. We need to set those counts to 0.</p>
<div id="cell-75" class="cell" data-execution_count="36">
<div class="sourceCode cell-code" id="cb54" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb54-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Fill in any missing values with 0.</span></span>
<span id="cb54-2">trips_by_date_merged[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'count'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> trips_by_date_merged[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'count'</span>].fillna(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)</span>
<span id="cb54-3">trips_by_date_merged</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="36">
<div>
<div>


<table class="dataframe table table-sm table-striped small" data-quarto-postprocess="true" data-border="1">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th"></th>
<th data-quarto-table-cell-role="th">tripdate</th>
<th data-quarto-table-cell-role="th">day_of_week</th>
<th data-quarto-table-cell-role="th">date</th>
<th data-quarto-table-cell-role="th">day_of_week_y</th>
<th data-quarto-table-cell-role="th">count</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td data-quarto-table-cell-role="th">0</td>
<td>2014-10-13</td>
<td>Monday</td>
<td>2014-10-13</td>
<td>Monday</td>
<td>1.0</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">1</td>
<td>2014-10-14</td>
<td>Tuesday</td>
<td>2014-10-14</td>
<td>Tuesday</td>
<td>3.0</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">2</td>
<td>2014-10-15</td>
<td>Wednesday</td>
<td>2014-10-15</td>
<td>Wednesday</td>
<td>3.0</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">3</td>
<td>2014-10-16</td>
<td>Thursday</td>
<td>NaN</td>
<td>NaN</td>
<td>0.0</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">4</td>
<td>2014-10-17</td>
<td>Friday</td>
<td>2014-10-17</td>
<td>Friday</td>
<td>1.0</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">684</td>
<td>2016-08-27</td>
<td>Saturday</td>
<td>2016-08-27</td>
<td>Saturday</td>
<td>1.0</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">685</td>
<td>2016-08-28</td>
<td>Sunday</td>
<td>NaN</td>
<td>NaN</td>
<td>0.0</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">686</td>
<td>2016-08-29</td>
<td>Monday</td>
<td>NaN</td>
<td>NaN</td>
<td>0.0</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">687</td>
<td>2016-08-30</td>
<td>Tuesday</td>
<td>NaN</td>
<td>NaN</td>
<td>0.0</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">688</td>
<td>2016-08-31</td>
<td>Wednesday</td>
<td>NaN</td>
<td>NaN</td>
<td>0.0</td>
</tr>
</tbody>
</table>

<p>689 rows × 5 columns</p>
</div>
</div>
</div>
</div>
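<p>As an aside, the merge followed by <code>fillna(0)</code> is not the only way to seed the zero days. A different technique, sketched here on hypothetical data, is to index the counts by date and call <code>reindex</code> with <code>fill_value=0</code> over the full date range:</p>

```python
import pandas as pd

# Hypothetical five-day calendar; day 4 (2014-10-16) has no rentals.
full_index = pd.date_range("2014-10-13", "2014-10-17", name="tripdate")
counts = pd.Series([1, 3, 3, 1], index=full_index[[0, 1, 2, 4]], name="count")

# reindex adds the missing date and fills its count with 0 in one step.
filled = counts.reindex(full_index, fill_value=0)
print(filled.tolist())
```

<p>Because no <code>NaN</code> is ever introduced, the counts stay as integers rather than becoming floats like the <code>1.0</code> and <code>0.0</code> values in the merged table.</p>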
<p>Now we can use the rest of Claude’s code (with the appropriate dataframe):</p>
<div id="cell-77" class="cell" data-execution_count="37">
<div class="sourceCode cell-code" id="cb55" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb55-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Calculate statistics by day of week</span></span>
<span id="cb55-2">stats <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> trips_by_date_merged.groupby(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>).agg({</span>
<span id="cb55-3">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'count'</span>: [</span>
<span id="cb55-4">        (<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'mean'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'mean'</span>),</span>
<span id="cb55-5">        (<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'95th_percentile'</span>, <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span> x: np.percentile(x, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">95</span>))</span>
<span id="cb55-6">    ]</span>
<span id="cb55-7">}).<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb55-8"></span>
<span id="cb55-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Flatten column names and reset index</span></span>
<span id="cb55-10">stats.columns <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'mean'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'95th_percentile'</span>]</span>
<span id="cb55-11">stats <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> stats.reset_index()</span>
<span id="cb55-12"></span>
<span id="cb55-13"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Sort days in correct order</span></span>
<span id="cb55-14">day_order <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Monday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Tuesday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Wednesday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Thursday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Friday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Saturday'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Sunday'</span>]</span>
<span id="cb55-15">stats[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.Categorical(stats[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>], categories<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>day_order, ordered<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb55-16">stats <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> stats.sort_values(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>)</span>
<span id="cb55-17"></span>
<span id="cb55-18"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Display results</span></span>
<span id="cb55-19"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">Bike Rental Statistics for Station CH-06 by Day of Week:"</span>)</span>
<span id="cb55-20"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"(All values rounded to 1 decimal place)"</span>)</span>
<span id="cb55-21"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">Day of Week | Mean Rentals | 95th Percentile"</span>)</span>
<span id="cb55-22"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"-"</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">45</span>)</span>
<span id="cb55-23"><span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> _, row <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> stats.iterrows():</span>
<span id="cb55-24">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>row[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:&lt;10}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> | </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>row[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'mean'</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:&gt;11.1f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> | </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>row[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'95th_percentile'</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:&gt;15.1f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb55-25"></span>
<span id="cb55-26"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Calculate overall summary statistics</span></span>
<span id="cb55-27"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">Summary Statistics for Station CH-06:"</span>)</span>
<span id="cb55-28"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Overall Mean Daily Rentals: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>stats[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'mean'</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>mean()<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:.1f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb55-29"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Highest Mean Daily Rentals: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>stats[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'mean'</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">max</span>()<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:.1f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> (</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>stats<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>loc[stats[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'mean'</span>].idxmax(), <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">)"</span>)</span>
<span id="cb55-30"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Highest 95th Percentile: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>stats[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'95th_percentile'</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">max</span>()<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:.1f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> (</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>stats<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>loc[stats[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'95th_percentile'</span>].idxmax(), <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">)"</span>)</span>
<span id="cb55-31"></span>
<span id="cb55-32"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add total number of rentals info</span></span>
<span id="cb55-33">total_rentals <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(station_df)</span>
<span id="cb55-34">total_days <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(trips_by_date_merged[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'tripdate'</span>].unique())</span>
<span id="cb55-35"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">Total number of rentals from CH-06: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>total_rentals<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb55-36"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Number of days in dataset: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>total_days<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>
Bike Rental Statistics for Station CH-06 by Day of Week:
(All values rounded to 1 decimal place)

Day of Week | Mean Rentals | 95th Percentile
---------------------------------------------
Monday     |         1.0 |             3.0
Tuesday    |         0.9 |             3.0
Wednesday  |         1.1 |             3.0
Thursday   |         1.2 |             3.0
Friday     |         0.9 |             2.1
Saturday   |         0.7 |             2.0
Sunday     |         0.7 |             2.0

Summary Statistics for Station CH-06:
Overall Mean Daily Rentals: 0.9
Highest Mean Daily Rentals: 1.2 (Thursday)
Highest 95th Percentile: 3.0 (Monday)

Total number of rentals from CH-06: 644
Number of days in dataset: 689</code></pre>
</div>
</div>
<p>Now all the values are lower than the inflated values from Claude’s attempt.</p>
<p>Just for fun, let’s plot the number of trips by date so we can actually see the zero days and see that the station appears to have been open for the entire analysis period. Such a plot would be part of the process of determining the analysis timeframe.</p>
<div id="cell-80" class="cell" data-execution_count="38">
<div class="sourceCode cell-code" id="cb57" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb57-1">trips_by_date_merged.plot(y<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'count'</span>)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span></code></pre></div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/llms-cycleshare-part1/daily_averages_cycleshare_part1_files/figure-html/cell-39-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>In this particular case, differences in computed stats, while not huge, are certainly present. Taking zero days into account is important in computing temporal statistics such as averages and percentiles of number of trips per day.</p>
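<p>To see why zero days matter, here is a minimal sketch using made-up rental dates. The data, the 14 day window, and all variable names are hypothetical and are not taken from the cycle share dataset. It uses only the Python standard library; the step that fills in missing dates plays the same role as merging daily counts onto a complete date range.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">from collections import Counter
from datetime import date, timedelta
from statistics import mean

# Hypothetical toy data: one entry per rental; most dates have no rentals
rental_dates = [date(2024, 1, 1), date(2024, 1, 1), date(2024, 1, 3), date(2024, 1, 8)]
rentals_per_day = Counter(rental_dates)

# Naive: average over only the dates that appear in the transaction data
naive_mean = mean(rentals_per_day.values())

# Zero-aware: build every date in the analysis window, then look up counts
start, n_days = date(2024, 1, 1), 14  # assumed analysis window
all_dates = [start + timedelta(days=i) for i in range(n_days)]
full_counts = [rentals_per_day.get(d, 0) for d in all_dates]
full_mean = mean(full_counts)

# Same comparison for a single day of week (Monday is 0, so Wednesday is 2)
wed_naive = mean(c for d, c in rentals_per_day.items() if d.weekday() == 2)
wed_full = mean(c for d, c in zip(all_dates, full_counts) if d.weekday() == 2)

print(f"Overall mean, naive: {naive_mean:.2f}, zero-aware: {full_mean:.2f}")
print(f"Wednesday mean, naive: {wed_naive:.2f}, zero-aware: {wed_full:.2f}")</code></pre></div>
<p>With these toy numbers the naive overall mean is about 1.33 rentals per day, while the zero-aware mean is about 0.29. The Wednesday mean drops from 1.0 to 0.5 once the Wednesday with no rentals is counted. The gap is exaggerated here because the toy data is so sparse, but the direction of the bias is the same as in the station analysis above.</p>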
</section>
</section>
<section id="wrap-up" class="level2">
<h2 class="anchored" data-anchor-id="wrap-up">Wrap up</h2>
<p>While there have been improvements, Claude 3.5 Sonnet is not up to the task of correctly computing statistics such as the mean and 95th percentile of rental volume by day of week for cases in which there are dates with no rentals. It takes a naive approach that inflates the statistics in this case. The differences might not be readily apparent to the analyst that does not carefully check the code and output generated by Claude.</p>
<p>In the next part of this series, I’ll see how MS Copilot does on this same problem.</p>
<p>There are additional subtleties involved in analyzing bike share data. See my previous posts for more on this.</p>
<ul>
<li><a href="https://bitsofanalytics.org/posts/daily-averages-cycleshare-part1/daily_averages_cycleshare_part1">https://bitsofanalytics.org/posts/daily-averages-cycleshare-part1/daily_averages_cycleshare_part1</a></li>
<li><a href="https://bitsofanalytics.org/posts/daily-averages-cycleshare-part2/daily_averages_cycleshare_part2">https://bitsofanalytics.org/posts/daily-averages-cycleshare-part2/daily_averages_cycleshare_part2</a></li>
</ul>


</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{isken2025,
  author = {Isken, Mark},
  title = {Computing Daily Averages from Transaction Data Using {LLMs}
    Can Be Tricky - {Part} 1: {Claude}},
  date = {2025-02-07},
  url = {https://bitsofanalytics.org//posts/llms-cycleshare-part1/daily_averages_cycleshare_part1.html},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-isken2025" class="csl-entry quarto-appendix-citeas">
Isken, Mark. 2025. <span>“Computing Daily Averages from Transaction Data
Using LLMs Can Be Tricky - Part 1: Claude.”</span> February 7, 2025. <a href="https://bitsofanalytics.org//posts/llms-cycleshare-part1/daily_averages_cycleshare_part1.html">https://bitsofanalytics.org//posts/llms-cycleshare-part1/daily_averages_cycleshare_part1.html</a>.
</div></div></section></div> ]]></description>
  <category>python</category>
  <category>pandas</category>
  <category>bikeshare</category>
  <category>llm</category>
  <category>ai</category>
  <guid>https://bitsofanalytics.org/posts/llms-cycleshare-part1/daily_averages_cycleshare_part1.html</guid>
  <pubDate>Fri, 07 Feb 2025 05:00:00 GMT</pubDate>
</item>
<item>
  <title>Allen Downey’s keynote at posit::conf(2024)</title>
  <dc:creator>Mark Isken</dc:creator>
  <link>https://bitsofanalytics.org/posts/dow24pos/dow24pos.html</link>
  <description><![CDATA[ 




<p><a href="https://www.allendowney.com/wp/">Allen Downey</a> gave a keynote address at the posit::conf(2024) conference this year. I’ve followed his writings (e.g., <a href="https://greenteapress.com/wp/think-python-3rd-edition/">Think Python</a> and <a href="https://greenteapress.com/wp/modsimpy/">Modeling and Simulation in Python</a>) for quite a while and have been inspired by his clear explanations and talent for making tricky quantitative concepts more understandable.</p>
<p>His blog, <a href="https://www.allendowney.com/blog/">Probably Overthinking It</a>, is packed with thought provoking explorations of a wide range of probability and stats problems. No wonder the blog led to a <a href="https://greenteapress.com/wp/probably-overthinking-it/">book</a> of the same name.</p>
<p>Recently, he released yet another book, <a href="https://allendowney.github.io/ElementsOfDataScience/">Elements of Data Science</a>, which introduces data science to those with little to no programming background. It has its roots in work he did while visiting at Harvard which led to <a href="https://hdsr.mitpress.mit.edu/pub/kk63wpuo/release/2">this paper</a>.</p>
<p>So, when I saw he was giving a keynote at this year’s Posit conference, I knew it would be good. Then I forgot about it. Thankfully, he posted a link on LinkedIn to the <a href="http://tinyurl.com/dow24pos">slides from his talk</a>. The title is “A future of data science” and I can tell that the slides were created with <a href="https://quarto.org/">Quarto</a> - it’s my new go-to for slidemaking. And of course, the keynote delivers. It hits home on many levels, but none more so than the topic of doomscrolling and the question of how the world is really doing. As an amateur student of nature, it seems to me that quite a bit of actual doom is warranted. For now, I’ll leave it at that and try to follow his lead in doing more careful analysis of things.</p>
<p>I’m mostly making this post to remind myself to read more of AD’s work and do more analysis of things that I find interesting and important. That’s what the great teachers do - inspire us to keep learning. Thanks AD.</p>



<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{isken2024,
  author = {Isken, Mark},
  title = {Allen {Downey’s} Keynote at Posit::conf(2024)},
  date = {2024-08-15},
  url = {https://bitsofanalytics.org//posts/dow24pos/dow24pos.html},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-isken2024" class="csl-entry quarto-appendix-citeas">
Isken, Mark. 2024. <span>“Allen Downey’s Keynote at
Posit::conf(2024).”</span> August 15, 2024. <a href="https://bitsofanalytics.org//posts/dow24pos/dow24pos.html">https://bitsofanalytics.org//posts/dow24pos/dow24pos.html</a>.
</div></div></section></div> ]]></description>
  <category>data science</category>
  <guid>https://bitsofanalytics.org/posts/dow24pos/dow24pos.html</guid>
  <pubDate>Thu, 15 Aug 2024 04:00:00 GMT</pubDate>
</item>
<item>
  <title>Create an eBird DuckDB database for use with R</title>
  <dc:creator>Mark Isken</dc:creator>
  <link>https://bitsofanalytics.org/posts/ebird_duckdb/duckdb_create_database.html</link>
  <description><![CDATA[ 




<section id="the-challenge" class="level2">
<h2 class="anchored" data-anchor-id="the-challenge">The challenge</h2>
<p>There is a large amount of data available in the <a href="https://science.ebird.org/en/use-ebird-data">eBird database</a> based on user submitted eBird checklists. Any eBird user can apply for access and download all or a subset of the data as tab delimited text files. One file contains what is known as the eBird Basic Data (EBD) and the other contains Sampling Event Data (SED). For the EBD, each row is an individual species observation on an individual eBird checklist. The SED file contains one row per checklist. There is well over 100Gb of data available representing hundreds of millions of checklists submitted from around the world. A Custom Download page makes it easy to select a subset of the data for download based on a given species, region, or time period.</p>
<p>Before attempting to load eBird data into R dataframes for analysis, I worked through the extremely informative <a href="https://ebird.github.io/ebird-best-practices/">Best Practices for Using eBird Data</a> guide. If you haven’t already, you should do this before attempting the kind of thing described in this blog post.</p>
<p>It’s not terribly difficult to bump up against memory (RAM) limits if you get a little overeager and try to load a large eBird dataset into an R dataframe. Thankfully, there are various ways to use R for analysis on data that resides in an external database. That’s what this post is all about. I decided to try out the newish <a href="https://duckdb.org/">DuckDB database</a> which can be used with R and which bills itself as a “fast, in-process, analytical database”. Seems perfect for analyzing large eBird datasets.</p>
<p>However, creating and populating a DuckDB database for a large eBird data download can be challenging. The main tool for working with downloaded eBird data is the R library, <code>auk</code>. In addition to facilitating filtering eBird data, the <code>auk</code> package is used for several important data preparation steps. Since R wants to load data entirely into memory, large datasets can be problematic. In this post we will show how you can load eBird data into a DuckDB database to facilitate analysis of large eBird data files. Along the way we will learn about various other challenges associated with eBird data.</p>
</section>
<section id="the-first-analysis-task" class="level2">
<h2 class="anchored" data-anchor-id="the-first-analysis-task">The first analysis task</h2>
<p>Let’s start by analyzing a small amount of eBird data using R. We will:</p>
<ul>
<li>use <code>auk</code> to create small text files of observations and checklists,</li>
<li>use <code>auk::read_ebd()</code> and <code>auk::read_sampling()</code> to read the text files into R dataframes,</li>
<li>create an empty database and load the R dataframes into database tables using the <code>duckdb</code>, <code>dbplyr</code> and <code>DBI</code> packages,</li>
<li>run some basic <code>dplyr</code> queries on the observation and sampling event data.</li>
</ul>
<p>After that, we’ll explore the challenges that arise when the eBird text files are not small and show how we can load large datasets into a DuckDB database and query it with <code>dplyr</code> or <code>duckplyr</code> (a “drop-in” replacement for <code>dplyr</code>).</p>
<p>For this part we will need the following libraries.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(auk)</span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(dplyr)</span>
<span id="cb1-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(ggplot2)</span>
<span id="cb1-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(lubridate)</span></code></pre></div>
</div>
</section>
<section id="filtering-ebird-data-with-auk" class="level2">
<h2 class="anchored" data-anchor-id="filtering-ebird-data-with-auk">Filtering eBird data with auk</h2>
<p>The <a href="https://cornelllabofornithology.github.io/auk/index.html">auk: eBird Data Extraction and Processing in R</a> package makes it easy to create and apply text filters to eBird data to create smaller datasets that can comfortably be analyzed in R. Under the hood, <code>auk</code> uses <a href="eBird Data Extraction and Processing in R">AWK</a>, the widely available Linux/Mac utility (available for Windows via <a href="https://cygwin.com/">Cygwin</a>) for text file processing. The outputs of the filtering step are just text files - one for observations and one for sampling events. These text files are formatted exactly like the original text files <a href="https://science.ebird.org/en/use-ebird-data">downloaded from eBird</a>.</p>
<p><code>auk</code> makes it easy to filter by various criteria:</p>
<ul>
<li>species,</li>
<li>observation date,</li>
<li>region (country, state, BCR, bounding box),</li>
<li>time of day,</li>
<li>effort (duration, distance),</li>
<li>specific projects (e.g.&nbsp;breeding bird survey),</li>
<li>protocol (e.g.&nbsp;traveling),</li>
<li>complete checklist status.</li>
</ul>
<p>On June 19, 2024 I submitted a request for all of the eBird data for the state of Michigan. On the 15th of each month, the previous month’s data becomes available. So, I would get data through May of 2024. The next day I received an email from eBird with a link to a compressed archive containing the raw data as well as various metadata files.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb2-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ls</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-l</span> data/ebd_US-MI_smp_relMay-2024/</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>total 12673100
-rw-rw-r-- 1 mark mark        1960 Jun 18 13:09 BCRCodes.txt
-rw-rw-r-- 1 mark mark      632528 Jun 18 13:09 BirdLifeKBACodes.txt
-rw-rw-r-- 1 mark mark   654510629 Jun 18 13:06 ebd_US-MI_smp_relMay-2024_sampling.txt
-rw-rw-r-- 1 mark mark 12321474745 Jun 18 10:25 ebd_US-MI_smp_relMay-2024.txt
-rw-rw-r-- 1 mark mark      431439 Jun 18 13:09 eBird_Basic_Dataset_Metadata_v1.15.pdf
-rw-rw-r-- 1 mark mark      126957 Jun 18 13:09 IBACodes.txt
-rw-rw-r-- 1 mark mark         103 Jun 18 13:09 recommended_citation.txt
-rw-rw-r-- 1 mark mark        6703 Jun 18 13:09 terms_of_use.txt
-rw-rw-r-- 1 mark mark       39670 Jun 18 13:09 USFWSCodes.txt</code></pre>
</div>
</div>
<p>The <code>ebd_US-MI_smp_relMay-2024.txt</code> file is a little over 12.3Gb and contains the EBD data. We’ll refer to these as the <em>observations</em>. The <code>ebd_US-MI_smp_relMay-2024_sampling.txt</code> file contains the SED data - these are the <em>sampling events</em>. The sampling event file is smaller as it contains one row per checklist whereas the observations include, for each checklist, a row with information relating to every species observed. Clearly, we are not going to be attempting to load a 12.3Gb file into an R dataframe.</p>
<p>We’ll start by using <code>auk</code> to filter the data to create a small sample dataset for exploration.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Set the filenames for the checklist and sample data</span></span>
<span id="cb4-2">f_ebd <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'data/ebd_US-MI_smp_relMay-2024/ebd_US-MI_smp_relMay-2024.txt'</span></span>
<span id="cb4-3">f_smp <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'data/ebd_US-MI_smp_relMay-2024/ebd_US-MI_smp_relMay-2024_sampling.txt'</span></span></code></pre></div>
</div>
<p>Next we create an <code>auk_ebd</code> object. This object does not contain the raw data. It’s just an object for storing pointers to our data files and various filters we might add later.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1">ebdsmp <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">auk_ebd</span>(f_ebd, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">file_sampling =</span> f_smp)</span></code></pre></div>
</div>
<p>Let’s start by looking at one day of data to avoid problems related to giant dataframes. If you’ve used <code>auk</code> before, the following two items will be familiar to you. If not, don’t worry about it as we’ll discuss them later.</p>
<ul>
<li>we will not collapse the observation and sampling event data into a single dataframe as this will greatly reduce the data storage requirements,</li>
<li>we will not use <code>auk_zerofill()</code> either as we aren’t interested right now in absence data and this will keep the file size smaller.</li>
</ul>
<div class="cell">
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1">rerun <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span></span>
<span id="cb6-2"></span>
<span id="cb6-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Specify output filenames for the filtered data</span></span>
<span id="cb6-4">file_ebd <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"data/ebd_20240519_MI.txt"</span></span>
<span id="cb6-5">file_smp <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"data/smp_20240519_MI.txt"</span></span>
<span id="cb6-6">  </span>
<span id="cb6-7"><span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> (rerun) {</span>
<span id="cb6-8">  list_year <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2024</span></span>
<span id="cb6-9">  ebdsmp <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb6-10">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># date: use standard ISO date format `"YYYY-MM-DD"` with * for wildcard</span></span>
<span id="cb6-11">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">auk_date</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">date =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(list_year,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"-05-19"</span>), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(list_year,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"-05-19"</span>))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb6-12">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># complete: all species seen or heard are recorded (this field is in ebd)</span></span>
<span id="cb6-13">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">auk_complete</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb6-14">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">auk_filter</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">file =</span> file_ebd, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">file_sampling =</span> file_smp, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">overwrite =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>) </span>
<span id="cb6-15">}</span></code></pre></div>
</div>
<p>In our case, we have downloaded all of the eBird data for the state of Michigan. Our filtering step just ensures we are only working with complete checklists and the filtered text files are still over 11Gb. Specifically:</p>
<ul>
<li>this data is <strong>NOT</strong> zero filled,</li>
<li>this data has <strong>NOT</strong> had shared checklists collapsed,</li>
<li>this data has <strong>NOT</strong> had taxonomy rollup done.</li>
</ul>
<section id="reading-filtered-text-files-into-r-dataframes" class="level3">
<h3 class="anchored" data-anchor-id="reading-filtered-text-files-into-r-dataframes">Reading filtered text files into R dataframes</h3>
<p>Now we can use <code>auk::read_ebd()</code> and <code>auk::read_sampling()</code> to read the filtered text files into R dataframes.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1">observations_df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_ebd</span>(file_ebd, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">unique =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span>
<span id="cb7-2">sampling_events_df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_sampling</span>(file_smp, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">unique =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span></code></pre></div>
</div>
<p>How big are the two dataframes?</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">nrow</span>(observations_df)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 31100</code></pre>
</div>
<div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">nrow</span>(sampling_events_df)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 1451</code></pre>
</div>
</div>
<p>The column names have been cleaned up and some data types changed.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str</span>(observations_df)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>tibble [31,100 × 48] (S3: tbl_df/tbl/data.frame)
 $ checklist_id             : chr [1:31100] "G12405482" "G12405614" "G12405614" "G12405614" ...
 $ global_unique_identifier : chr [1:31100] "URN:CornellLabOfOrnithology:EBIRD:OBS2171548530" "URN:CornellLabOfOrnithology:EBIRD:OBS2171639660" "URN:CornellLabOfOrnithology:EBIRD:OBS2171639653" "URN:CornellLabOfOrnithology:EBIRD:OBS2171639654" ...
 $ last_edited_date         : chr [1:31100] "2024-05-20 12:33:05.670199" "2024-05-19 06:00:38.809563" "2024-05-19 06:00:38.809563" "2024-05-19 06:00:38.809563" ...
 $ taxonomic_order          : num [1:31100] 9011 33252 324 21168 6479 ...
 $ category                 : chr [1:31100] "species" "species" "species" "species" ...
 $ taxon_concept_id         : chr [1:31100] "avibase-BDBB04D5" "avibase-730AC20E" "avibase-B59E1863" "avibase-69544B59" ...
 $ common_name              : chr [1:31100] "Barred Owl" "Red-winged Blackbird" "Canada Goose" "American Crow" ...
 $ scientific_name          : chr [1:31100] "Strix varia" "Agelaius phoeniceus" "Branta canadensis" "Corvus brachyrhynchos" ...
 $ exotic_code              : chr [1:31100] NA NA NA NA ...
 $ observation_count        : chr [1:31100] "3" "1" "13" "1" ...
 $ breeding_code            : chr [1:31100] NA NA NA NA ...
 $ breeding_category        : chr [1:31100] NA NA NA NA ...
 $ behavior_code            : chr [1:31100] NA NA NA NA ...
 $ age_sex                  : chr [1:31100] NA NA NA NA ...
 $ country                  : chr [1:31100] "United States" "United States" "United States" "United States" ...
 $ country_code             : chr [1:31100] "US" "US" "US" "US" ...
 $ state                    : chr [1:31100] "Michigan" "Michigan" "Michigan" "Michigan" ...
 $ state_code               : chr [1:31100] "US-MI" "US-MI" "US-MI" "US-MI" ...
 $ county                   : chr [1:31100] "Gladwin" "Iosco" "Iosco" "Iosco" ...
 $ county_code              : chr [1:31100] "US-MI-051" "US-MI-069" "US-MI-069" "US-MI-069" ...
 $ iba_code                 : chr [1:31100] "US-MI_3775" "US-MI_3811" "US-MI_3811" "US-MI_3811" ...
 $ bcr_code                 : int [1:31100] 12 NA NA NA NA NA NA NA NA NA ...
 $ usfws_code               : chr [1:31100] NA NA NA NA ...
 $ atlas_block              : chr [1:31100] NA NA NA NA ...
 $ locality                 : chr [1:31100] "Molasses River Retreat" "Tawas Bay" "Tawas Bay" "Tawas Bay" ...
 $ locality_id              : chr [1:31100] "L26321465" "L1843221" "L1843221" "L1843221" ...
 $ locality_type            : chr [1:31100] "P" "H" "H" "H" ...
 $ latitude                 : num [1:31100] 43.9 44.3 44.3 44.3 44.3 ...
 $ longitude                : num [1:31100] -84.3 -83.5 -83.5 -83.5 -83.5 ...
 $ observation_date         : Date[1:31100], format: "2024-05-19" "2024-05-19" ...
 $ time_observations_started: chr [1:31100] "04:47:00" "05:54:00" "05:54:00" "05:54:00" ...
 $ observer_id              : chr [1:31100] "obsr1077102,obsr502473" "obsr167561,obsr273509" "obsr167561,obsr273509" "obsr167561,obsr273509" ...
 $ sampling_event_identifier: chr [1:31100] "S175525573,S175525574" "S175533693,S175533694" "S175533693,S175533694" "S175533693,S175533694" ...
 $ protocol_type            : chr [1:31100] "Stationary" "Traveling" "Traveling" "Traveling" ...
 $ protocol_code            : chr [1:31100] "P21" "P22" "P22" "P22" ...
 $ project_code             : chr [1:31100] "EBIRD" "EBIRD" "EBIRD" "EBIRD" ...
 $ duration_minutes         : int [1:31100] 6 4 4 4 4 4 4 4 4 4 ...
 $ effort_distance_km       : num [1:31100] NA 2.72 2.72 2.72 2.72 2.72 2.72 2.72 2.72 2.72 ...
 $ effort_area_ha           : num [1:31100] NA NA NA NA NA NA NA NA NA NA ...
 $ number_observers         : int [1:31100] 2 2 2 2 2 2 2 2 2 2 ...
 $ all_species_reported     : logi [1:31100] TRUE TRUE TRUE TRUE TRUE TRUE ...
 $ group_identifier         : chr [1:31100] "G12405482" "G12405614" "G12405614" "G12405614" ...
 $ has_media                : logi [1:31100] TRUE FALSE FALSE FALSE FALSE FALSE ...
 $ approved                 : logi [1:31100] TRUE TRUE TRUE TRUE TRUE TRUE ...
 $ reviewed                 : logi [1:31100] FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ reason                   : chr [1:31100] NA NA NA NA ...
 $ trip_comments            : chr [1:31100] NA NA NA NA ...
 $ species_comments         : chr [1:31100] NA NA NA NA ...
 - attr(*, "rollup")= logi TRUE</code></pre>
</div>
</div>
<p>The <code>observation_count</code> field has a character data type, likely because ‘X’ (species present but not counted) is a valid user input.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1">observations_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb14-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(observation_count <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'X'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb14-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(observation_count) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb14-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">count</span>()</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 1 × 2
# Groups:   observation_count [1]
  observation_count     n
  &lt;chr&gt;             &lt;int&gt;
1 X                   820</code></pre>
</div>
</div>
<p>Let’s change it to integer so that we can do math with it. The ‘X’ values will become <code>NA</code>, so we’ll have to be careful whenever we do math on <code>observation_count</code>.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb16-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Change observation_count to integer</span></span>
<span id="cb16-2">observations_df<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>observation_count <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.integer</span>(observations_df<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>observation_count)</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>Warning: NAs introduced by coercion</code></pre>
</div>
</div>
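<p>As a minimal sketch of why those <code>NA</code> values matter, the following uses a made-up character vector (<code>counts_chr</code> is hypothetical and not part of the eBird data) to show how a single coerced ‘X’ propagates through a sum unless <code>na.rm = TRUE</code> is used.</p>

```r
# Hypothetical counts as they might look before conversion
counts_chr <- c("3", "X", "12", "1")

# "X" cannot be parsed as an integer, so it becomes NA (with a warning)
counts_int <- suppressWarnings(as.integer(counts_chr))

# Without na.rm, a single NA makes the whole sum NA
total_with_na <- sum(counts_int)

# Dropping the NAs gives a total over the records that reported a count
total_counted <- sum(counts_int, na.rm = TRUE)

# Keep track of how many records were present but not counted
n_uncounted <- sum(is.na(counts_int))
```

<p>Since an ‘X’ means at least one bird was present, a total computed with <code>na.rm = TRUE</code> is best read as a lower bound.</p>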
<div class="cell">
<div class="sourceCode cell-code" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb18-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str</span>(sampling_events_df)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>tibble [1,451 × 31] (S3: tbl_df/tbl/data.frame)
 $ checklist_id             : chr [1:1451] "S176512162" "S175546547" "S179823549" "S179822252" ...
 $ last_edited_date         : chr [1:1451] "2024-05-23 08:03:00.886549" "2024-05-19 07:24:04.965131" "2024-06-06 21:01:29.107477" "2024-06-06 20:49:48.256139" ...
 $ country                  : chr [1:1451] "United States" "United States" "United States" "United States" ...
 $ country_code             : chr [1:1451] "US" "US" "US" "US" ...
 $ state                    : chr [1:1451] "Michigan" "Michigan" "Michigan" "Michigan" ...
 $ state_code               : chr [1:1451] "US-MI" "US-MI" "US-MI" "US-MI" ...
 $ county                   : chr [1:1451] "Manistee" "Iosco" "Huron" "Huron" ...
 $ county_code              : chr [1:1451] "US-MI-101" "US-MI-069" "US-MI-063" "US-MI-063" ...
 $ iba_code                 : chr [1:1451] NA NA NA NA ...
 $ bcr_code                 : int [1:1451] 12 12 NA NA 12 23 23 12 12 22 ...
 $ usfws_code               : chr [1:1451] NA NA NA NA ...
 $ atlas_block              : chr [1:1451] NA NA NA NA ...
 $ locality                 : chr [1:1451] "Arcadia Dunes Grasslands" "Sanctuary, Wilber Township, Iosco County" "Harbor Beach (general)" "Harbor Beach (general)" ...
 $ locality_id              : chr [1:1451] "L1000010" "L1000261" "L1002678" "L1002678" ...
 $ locality_type            : chr [1:1451] "H" "P" "H" "H" ...
 $ latitude                 : num [1:1451] 44.5 44.4 43.8 43.8 47.4 ...
 $ longitude                : num [1:1451] -86.2 -83.5 -82.6 -82.6 -88 ...
 $ observation_date         : Date[1:1451], format: "2024-05-19" "2024-05-19" ...
 $ time_observations_started: chr [1:1451] "11:00:00" "06:06:00" "16:15:00" "14:00:00" ...
 $ observer_id              : chr [1:1451] "obsr2398384" "obsr114260" "obsr201518" "obsr201518" ...
 $ sampling_event_identifier: chr [1:1451] "S176512162" "S175546547" "S179823549" "S179822252" ...
 $ protocol_type            : chr [1:1451] "Traveling" "Traveling" "Traveling" "Traveling" ...
 $ protocol_code            : chr [1:1451] "P22" "P22" "P22" "P22" ...
 $ project_code             : chr [1:1451] "EBIRD" "EBIRD" "EBIRD" "EBIRD" ...
 $ duration_minutes         : int [1:1451] 120 59 20 30 27 30 140 240 5 20 ...
 $ effort_distance_km       : num [1:1451] 19.312 0.647 4.828 6.437 0.833 ...
 $ effort_area_ha           : num [1:1451] NA NA NA NA NA NA NA NA NA NA ...
 $ number_observers         : int [1:1451] 2 1 1 1 1 1 1 2 1 1 ...
 $ all_species_reported     : logi [1:1451] TRUE TRUE TRUE TRUE TRUE TRUE ...
 $ group_identifier         : chr [1:1451] NA NA NA NA ...
 $ trip_comments            : chr [1:1451] NA "Sunny, 69-61 F" NA NA ...
 - attr(*, "unique")= logi TRUE</code></pre>
</div>
</div>
<p>In addition, some important data preprocessing has been done. Shared checklists have been collapsed to avoid duplicate data (see <a href="https://ebird.github.io/ebird-best-practices/ebird.html#sec-ebird-import-shared" class="uri">https://ebird.github.io/ebird-best-practices/ebird.html#sec-ebird-import-shared</a>) and taxonomic rollups have been done to get observations to the species level (see <a href="https://ebird.github.io/ebird-best-practices/ebird.html#sec-ebird-import-rollup" class="uri">https://ebird.github.io/ebird-best-practices/ebird.html#sec-ebird-import-rollup</a>).</p>
<p>An important thing to remember here is that the <code>observations</code> dataframe contains only <em>presence data</em>. Often this is sufficient for analysis, but if our analysis involves knowing whether a certain species was detected or not on a given checklist, we need to <em>zero-fill</em> our data, creating what is known as <em>presence/absence</em> data.</p>
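<p>To make the idea of zero-filling concrete, here is a rough sketch on toy data using <code>dplyr</code> and <code>tidyr</code>. The tibbles <code>toy_checklists</code> and <code>toy_obs</code> and the <code>detected</code> column are made up for illustration. For real eBird data, the <code>auk</code> package provides <code>auk_zerofill()</code>; this sketch only shows why the result gets bigger.</p>

```r
library(dplyr)
library(tidyr)

# Hypothetical data: three checklists, presence-only observations
toy_checklists <- tibble(checklist_id = c("S1", "S2", "S3"))
toy_obs <- tibble(
  checklist_id = c("S1", "S1", "S3"),
  common_name = c("Blue Jay", "American Robin", "Blue Jay"),
  observation_count = c(2L, 1L, 4L)
)

zero_filled <- toy_checklists %>%
  # Every checklist paired with every species seen anywhere
  crossing(common_name = unique(toy_obs$common_name)) %>%
  # Attach the real observations, flagged as detections
  left_join(mutate(toy_obs, detected = TRUE),
            by = c("checklist_id", "common_name")) %>%
  mutate(
    # Combinations with no matching observation are absences
    detected = coalesce(detected, FALSE),
    # Absences get a count of zero; detections keep their count
    observation_count = if_else(detected, observation_count, 0L)
  )

zero_filled
```

<p>Three presence rows turn into six presence/absence rows here (3 checklists by 2 species). With thousands of checklists and hundreds of species, that growth adds up quickly.</p>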
<p>The observations dataframe is denormalized in the sense of containing duplicate information from the sampling events dataframe such as trip duration and distance covered. This just makes it easier for analysis since it avoids having to join the two dataframes but comes at the expense of larger dataframe size.</p>
</section>
<section id="some-basic-queries" class="level3">
<h3 class="anchored" data-anchor-id="some-basic-queries">Some basic queries</h3>
<p>Let’s do a few basic queries with <code>dplyr</code>.</p>
<p><strong>How many checklists were submitted from each county?</strong></p>
<div class="cell">
<div class="sourceCode cell-code" id="cb20" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb20-1">sampling_events_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb20-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(county) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb20-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">count</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb20-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">arrange</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">desc</span>(n)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb20-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">head</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 10 × 2
# Groups:   county [10]
   county        n
   &lt;chr&gt;     &lt;int&gt;
 1 Iosco       153
 2 Washtenaw   101
 3 Oakland      65
 4 Kent         58
 5 Wayne        57
 6 Ottawa       50
 7 Chippewa     46
 8 Crawford     41
 9 Muskegon     38
10 Berrien      37</code></pre>
</div>
</div>
<p>If you know anything about Michigan, but aren’t a birder, you might be surprised that Iosco County had the most checklists submitted even though it’s far from the Metro Detroit area and Michigan’s most populous counties. Of course, this data is for May 19, 2024, which falls on the weekend of the famous Tawas Spring Migration birding festival. Looks like almost every birder in Macomb County was in Tawas that day. :)</p>
<p><strong>Which species appear on the most checklists?</strong></p>
<div class="cell">
<div class="sourceCode cell-code" id="cb22" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb22-1">observations_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb22-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(common_name) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb22-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">count</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb22-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">arrange</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">desc</span>(n)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb22-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">head</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 10 × 2
# Groups:   common_name [10]
   common_name              n
   &lt;chr&gt;                &lt;int&gt;
 1 American Robin        1010
 2 Red-winged Blackbird   869
 3 Blue Jay               749
 4 Song Sparrow           724
 5 Mourning Dove          678
 6 Northern Cardinal      666
 7 Common Grackle         617
 8 Canada Goose           598
 9 Yellow Warbler         598
10 Red-eyed Vireo         588</code></pre>
</div>
</div>
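<p>As an aside, the <code>group_by()</code>, <code>count()</code>, <code>arrange()</code> pattern used in the last two queries can be shortened with the <code>sort</code> argument of <code>count()</code>. Here is a small sketch on a made-up tibble (<code>toy_events</code> is hypothetical) comparing the two forms.</p>

```r
library(dplyr)

# Hypothetical checklist-level data
toy_events <- tibble(
  county = c("Iosco", "Iosco", "Kent", "Iosco", "Wayne", "Kent")
)

# The longer pattern used above; the result is still a grouped tibble
top_long <- toy_events %>%
  group_by(county) %>%
  count() %>%
  arrange(desc(n)) %>%
  head(10)

# Shorthand: count() groups, tallies, and sorts in one step,
# and returns an ungrouped tibble
top_short <- toy_events %>%
  count(county, sort = TRUE) %>%
  head(10)
```

<p>Both give the same counties and counts. The only difference is that the shorthand result does not carry the <code>Groups:</code> attribute seen in the output above.</p>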
<p><strong>How about the distribution of trip duration?</strong></p>
<div class="cell">
<div class="sourceCode cell-code" id="cb24" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb24-1">observations_df  <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb24-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(protocol_type <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Traveling'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb24-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(duration_minutes) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb24-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x=</span>duration_minutes)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb24-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_histogram</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">binwidth =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">15</span>, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">after_stat</span>(count <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(count)))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb24-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_x_continuous</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">limits =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb24-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_y_continuous</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">limits =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NA</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">labels =</span> scales<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">label_percent</span>()) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb24-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Duration [min]"</span>,</span>
<span id="cb24-9">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"% of eBird checklists"</span>,</span>
<span id="cb24-10">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Distribution of duration on eBird checklists"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>Warning: Removed 73 rows containing non-finite outside the scale range
(`stat_bin()`).</code></pre>
</div>
<div class="cell-output cell-output-stderr">
<pre><code>Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_bar()`).</code></pre>
</div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/ebird_duckdb/figure-html/unnamed-chunk-14-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
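<p>One thing to keep in mind with the plot above is that <code>observations_df</code> has one row per species per checklist, so long checklists with many species contribute many rows. If the goal is a distribution over checklists, one option is to reduce to one row per checklist first (or to use <code>sampling_events_df</code>). Here is a small sketch of the difference on made-up data, where <code>toy_obs</code> is hypothetical.</p>

```r
library(dplyr)

# Hypothetical observation-level data: S1 reported three species,
# S2 one species, and S3 two species
toy_obs <- tibble(
  checklist_id = c("S1", "S1", "S1", "S2", "S3", "S3"),
  duration_minutes = c(120L, 120L, 120L, 10L, 30L, 30L)
)

# One row per checklist
toy_checklists <- toy_obs %>%
  distinct(checklist_id, duration_minutes)

# Mean duration weighted by number of species rows
mean_per_row <- mean(toy_obs$duration_minutes)

# Mean duration with each checklist counted once
mean_per_checklist <- mean(toy_checklists$duration_minutes)
```

<p>In this toy example the row-level mean is larger because the longest checklist also has the most species, which is a plausible pattern in real checklists too.</p>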
<p>Obviously, these are small dataframes and we can just use <code>dplyr</code> directly on them. However, our use case may involve much larger dataframes that cannot be held in memory, especially after zero-filling. See <a href="https://ebird.github.io/ebird-best-practices/ebird.html#sec-ebird-zf">this section of the eBird Best Practices guide</a> for a discussion of zero-filling. It’s not really relevant for this post other than to point out that zero-filling leads to even bigger data tables.</p>
</section>
</section>
<section id="what-about-larger-datasets" class="level2">
<h2 class="anchored" data-anchor-id="what-about-larger-datasets">What about larger datasets?</h2>
<p>Depending on how much RAM you have and what kinds of questions you are exploring, it’s not too difficult to find yourself running out of memory when using <code>read_ebd()</code>. Often this manifests itself as an inglorious crash of RStudio. You might be able to decompose your analysis into smaller chunks based on smaller dataframes and then put the pieces together. But, sometimes, it’s just really nice to be able to run queries against big datasets, especially when looking at trends over time. Thankfully, there are options.</p>
<section id="dbplyr-and-dbi" class="level3">
<h3 class="anchored" data-anchor-id="dbplyr-and-dbi">dbplyr and DBI</h3>
<p>The <a href="https://dbplyr.tidyverse.org/"><code>dbplyr</code> package</a> has been around for a long time and allows you to use <code>dplyr</code> with a database back end. You can treat remote tables as if they were local to R and write <code>dplyr</code> statements that are converted to SQL and sent off to the database for processing. Calls to <code>dplyr</code> are lazy in the sense that nothing really happens beyond SQL generation until you ask for results to be brought back into R from the database. Interestingly, you don’t even have to call <code>library(dbplyr)</code> (though you must install it) as <code>dplyr</code> will use <code>dbplyr</code> automatically when it recognizes you are communicating with a database. This <a href="https://dbplyr.tidyverse.org/articles/dbplyr.html">getting started vignette</a> will get you going. The <code>dbplyr</code> package relies on the <code>DBI</code> (Database Interface) package and on specific DBI backends for the database you are using. The vignette has all the details. Whenever <code>dbplyr</code> is loaded, <code>DBI</code> gets loaded.</p>
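<p>To get a feel for the SQL generation without connecting to anything, here is a small sketch using <code>dbplyr</code>’s <code>lazy_frame()</code> with a simulated SQLite connection. The table contents and the name <code>toy_events</code> are made up, and the simulated back end is only a stand-in for a real database.</p>

```r
library(dplyr)
library(dbplyr)

# A lazy table backed by a simulated connection; no database is involved
toy_events <- lazy_frame(
  county = "Iosco",
  duration_minutes = 30L,
  con = simulate_sqlite()
)

# Building the pipeline only records the operations
county_summary <- toy_events %>%
  group_by(county) %>%
  summarise(
    n_checklists = n(),
    mean_duration = mean(duration_minutes, na.rm = TRUE)
  )

# Print the SQL that would be sent to the database
show_query(county_summary)

# The same SQL as a character string
sql_text <- as.character(sql_render(county_summary))
```

<p>The exact SQL text depends on the back end, but the <code>group_by()</code> and <code>summarise()</code> calls should show up as a <code>GROUP BY</code> clause with aggregate functions.</p>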
<p>My original plan was to use <code>dbplyr</code> with a SQLite database as I wanted a simple, no-server, database. While SQLite is terrific, especially as an embedded database for transaction oriented applications, there’s another database that is even better suited for analytical applications such as ours.</p>
</section>
<section id="duckdb" class="level3">
<h3 class="anchored" data-anchor-id="duckdb">DuckDB</h3>
<p><a href="https://duckdb.org/">DuckDB</a> is a newish, analysis-focused database engine. It:</p>
<ul>
<li>is relational,</li>
<li>supports SQL,</li>
<li>does not require a server,</li>
<li>runs embedded in a host process,</li>
<li>has no external dependencies,</li>
<li>is portable,</li>
<li>is fast,</li>
<li>is designed for analytical query workloads.</li>
</ul>
<p>The DuckDB developers have really committed to working closely with R by creating a drop-in replacement for <code>dplyr</code> called <code>duckplyr</code> (see <a href="https://duckdb.org/2024/04/02/duckplyr.html">their blog post</a>).</p>
<p>It positions itself as a replacement for things like <code>dbplyr</code> and using SQLite from R but is compatible with <code>dbplyr</code>. It uses the <code>DBI</code> API, which should make it easy to try out and adopt.</p>
<p>The main <code>duckdb</code> package takes a long time to install.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb27" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb27-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#install.packages("duckdb")</span></span></code></pre></div>
</div>
<p>Then install <code>duckplyr</code>.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb28" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb28-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#install.packages("duckplyr")</span></span></code></pre></div>
</div>
<p>Start out by loading the <code>duckdb</code> library.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb29" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb29-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(duckdb)</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>Loading required package: DBI</code></pre>
</div>
</div>
<p>The <code>DBI</code> package is loaded automatically when <code>duckdb</code> is loaded.</p>
<p>Create an empty, writable, database.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb31" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb31-1">con <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dbConnect</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">duckdb</span>(), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">dbdir =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"data/ebdsmp_2024519.duckdb"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">read_only =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>)</span></code></pre></div>
</div>
<p>Now, we want to get the observation and sampling event data into the database. Since these are small dataframes, we could simply write them into the database via DBI functions.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb32" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb32-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dbWriteTable</span>(con, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"observations"</span>, observations_df, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">overwrite =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span>
<span id="cb32-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dbWriteTable</span>(con, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sampling_events"</span>, sampling_events_df, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">overwrite =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span></code></pre></div>
</div>
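<p>If you want to try the write step without creating a database file, here is a minimal sketch using an in-memory DuckDB database. The connection name <code>con_demo</code>, the table name <code>toy_observations</code>, and the data are all made up for illustration.</p>

```r
library(DBI)

# In-memory database: nothing is written to disk
con_demo <- dbConnect(duckdb::duckdb(), dbdir = ":memory:")

# Hypothetical observation data
toy_df <- data.frame(
  checklist_id = c("S1", "S1", "S2"),
  common_name = c("Blue Jay", "American Robin", "Blue Jay"),
  observation_count = c(2L, 1L, 4L)
)

# Write the dataframe as a table, replacing it if it already exists
dbWriteTable(con_demo, "toy_observations", toy_df, overwrite = TRUE)

# Confirm the table is there and check its row count with plain SQL
tables_found <- dbListTables(con_demo)
row_count <- dbGetQuery(con_demo, "SELECT COUNT(*) AS n FROM toy_observations")

# Close the connection and shut down the embedded database instance
dbDisconnect(con_demo, shutdown = TRUE)
```

<p>For dataframes that are too large to build in R first, writing from a dataframe is not an option, which is why this approach only suits small data like ours.</p>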
<p>Let’s disconnect. Before disconnecting, notice there’s a <code>.wal</code> (write-ahead log) file with the same base name as the <code>.duckdb</code> file. It’s gone after disconnecting.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb33" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb33-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dbDisconnect</span>(con)</span>
<span id="cb33-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rm</span>(con)</span></code></pre></div>
</div>
</section>
</section>
<section id="querying-the-duckdb-database" class="level2">
<h2 class="anchored" data-anchor-id="querying-the-duckdb-database">Querying the DuckDB database</h2>
<p>Let’s reconnect to the database and then try to do some basic R things with the tables.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb34" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb34-1">con <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dbConnect</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">duckdb</span>(), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">dbdir =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"data/ebdsmp_2024519.duckdb"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">read_only =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>)</span></code></pre></div>
</div>
<p>Since DuckDB plays nice with <code>dbplyr</code>, we can use the <code>tbl()</code> function to get a dataframe-like object to use with <code>dplyr</code>.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb35" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb35-1">observations <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tbl</span>(con, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'observations'</span>)</span>
<span id="cb35-2">sampling_events <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tbl</span>(con, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'sampling_events'</span>)</span></code></pre></div>
</div>
<p>Here is a simple select query.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb36" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb36-1">observations <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb36-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(checklist_id, common_name, observation_count) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb36-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">head</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># Source:   SQL [10 x 3]
# Database: DuckDB v0.10.2 [mark@Linux 6.5.0-41-generic:R 4.4.1//home/mark/Documents/projects/merlin/data/ebdsmp_2024519.duckdb]
   checklist_id common_name            observation_count
   &lt;chr&gt;        &lt;chr&gt;                              &lt;int&gt;
 1 G12405482    Barred Owl                             3
 2 G12405614    Red-winged Blackbird                   1
 3 G12405614    Canada Goose                          13
 4 G12405614    American Crow                          1
 5 G12405614    Ring-billed Gull                       9
 6 G12405614    Black-capped Chickadee                 1
 7 G12405614    Purple Martin                          5
 8 G12405614    Common Grackle                         9
 9 G12405614    Yellow Warbler                         1
10 G12405614    European Starling                      1</code></pre>
</div>
</div>
<p>The behavior of <code>dplyr</code> can vary depending on whether you are working with a local dataframe or a remote database table. In this query, we are using the local dataframe.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb38" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb38-1">observations_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb38-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(observation_count) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb38-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summary</span>()</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code> observation_count 
 Min.   :   1.000  
 1st Qu.:   1.000  
 Median :   2.000  
 Mean   :   4.295  
 3rd Qu.:   3.000  
 Max.   :2909.000  
 NA's   :820       </code></pre>
</div>
</div>
<p>In this one, we are using the remote database table.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb40" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb40-1">observations <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb40-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(observation_count) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb40-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summary</span>()</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>           Length Class                 Mode
src         2     src_duckdb_connection list
lazy_query 12     lazy_select_query     list</code></pre>
</div>
</div>
<p>We need to use <code>collect()</code> to actually force the query to run and bring back results.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb42" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb42-1">observations <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb42-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(observation_count) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb42-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">collect</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb42-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summary</span>()</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code> observation_count 
 Min.   :   1.000  
 1st Qu.:   1.000  
 Median :   2.000  
 Mean   :   4.295  
 3rd Qu.:   3.000  
 Max.   :2909.000  
 NA's   :820       </code></pre>
</div>
</div>
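<p>Collecting every row just to summarize one column gives up some of the benefit of keeping the data in DuckDB. Here is a minimal sketch of pushing the aggregation into the database instead, so that only one summary row comes back to R. It uses a tiny in-memory table; the names <code>con_demo</code> and <code>obs_demo</code> are made up for illustration, and it assumes the <code>dbplyr</code> package is installed alongside <code>dplyr</code>.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(DBI)
library(dplyr)

# Hypothetical in-memory database with a tiny stand-in for the observations table
con_demo &lt;- dbConnect(duckdb::duckdb())
dbWriteTable(con_demo, "obs_demo",
             data.frame(observation_count = c(1L, 2L, NA, 10L)))

# The summarize() is translated to SQL and runs inside DuckDB;
# collect() then brings back a single row
obs_summary &lt;- tbl(con_demo, "obs_demo") %&gt;%
  summarize(
    n_rows = n(),
    n_missing = sum(ifelse(is.na(observation_count), 1L, 0L), na.rm = TRUE),
    min_count = min(observation_count, na.rm = TRUE),
    mean_count = mean(observation_count, na.rm = TRUE),
    max_count = max(observation_count, na.rm = TRUE)
  ) %&gt;%
  collect()

dbDisconnect(con_demo)
obs_summary</code></pre></div>
</div>
<p>This gives up the quartiles that <code>summary()</code> reports, but it avoids pulling the whole column into memory, which matters once the tables get large.</p>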
<p>We can list the fields in a table like this.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb44" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb44-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dbListFields</span>(con, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'observations'</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code> [1] "checklist_id"              "global_unique_identifier" 
 [3] "last_edited_date"          "taxonomic_order"          
 [5] "category"                  "taxon_concept_id"         
 [7] "common_name"               "scientific_name"          
 [9] "exotic_code"               "observation_count"        
[11] "breeding_code"             "breeding_category"        
[13] "behavior_code"             "age_sex"                  
[15] "country"                   "country_code"             
[17] "state"                     "state_code"               
[19] "county"                    "county_code"              
[21] "iba_code"                  "bcr_code"                 
[23] "usfws_code"                "atlas_block"              
[25] "locality"                  "locality_id"              
[27] "locality_type"             "latitude"                 
[29] "longitude"                 "observation_date"         
[31] "time_observations_started" "observer_id"              
[33] "sampling_event_identifier" "protocol_type"            
[35] "protocol_code"             "project_code"             
[37] "duration_minutes"          "effort_distance_km"       
[39] "effort_area_ha"            "number_observers"         
[41] "all_species_reported"      "group_identifier"         
[43] "has_media"                 "approved"                 
[45] "reviewed"                  "reason"                   
[47] "trip_comments"             "species_comments"         </code></pre>
</div>
</div>
<p>All right, let’s do a simple aggregate query and see which bird is most commonly listed.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb46" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb46-1">observations <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb46-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(common_name) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb46-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(</span>
<span id="cb46-4">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">n</span>()</span>
<span id="cb46-5">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb46-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">arrange</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">desc</span>(n))</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># Source:     SQL [?? x 2]
# Database:   DuckDB v0.10.2 [mark@Linux 6.5.0-41-generic:R 4.4.1//home/mark/Documents/projects/merlin/data/ebdsmp_2024519.duckdb]
# Ordered by: desc(n)
   common_name              n
   &lt;chr&gt;                &lt;dbl&gt;
 1 American Robin        1010
 2 Red-winged Blackbird   869
 3 Blue Jay               749
 4 Song Sparrow           724
 5 Mourning Dove          678
 6 Northern Cardinal      666
 7 Common Grackle         617
 8 Canada Goose           598
 9 Yellow Warbler         598
10 Red-eyed Vireo         588
# ℹ more rows</code></pre>
</div>
</div>
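<p>Since the pipeline above is lazy, it can be handy to look at the SQL that <code>dbplyr</code> generates before anything runs. The following is a small sketch on a made-up in-memory table (<code>con_demo</code> and <code>obs_demo</code> are illustrative names, not objects from this post) that uses <code>dbplyr::sql_render()</code> to capture the query text.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(DBI)
library(dplyr)

# Hypothetical in-memory database with three fake observations
con_demo &lt;- dbConnect(duckdb::duckdb())
dbWriteTable(con_demo, "obs_demo",
             data.frame(common_name = c("American Robin", "Blue Jay", "American Robin")))

# Build the lazy query; nothing is executed yet
species_counts &lt;- tbl(con_demo, "obs_demo") %&gt;%
  group_by(common_name) %&gt;%
  summarize(n = n()) %&gt;%
  arrange(desc(n))

# Capture the SQL that would be sent to DuckDB
sql_text &lt;- as.character(dbplyr::sql_render(species_counts))
cat(sql_text)

# Now run the query and bring the result into R
top_species &lt;- collect(species_counts)
dbDisconnect(con_demo)</code></pre></div>
</div>
<p>For interactive use, <code>show_query()</code> prints the same SQL directly.</p>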
<p>If you want to use the <code>duckplyr</code> engine instead of <code>dplyr</code>, just load the library. You’ll get a message about <code>duckplyr</code> methods overwriting the standard <code>dplyr</code> methods and what you can do to restore the <code>dplyr</code> versions.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb48" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb48-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(duckplyr)</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>✔ Overwriting dplyr methods with duckplyr methods.
ℹ Turn off with `duckplyr::methods_restore()`.</code></pre>
</div>
<div class="cell-output cell-output-stderr">
<pre><code>
Attaching package: 'duckplyr'</code></pre>
</div>
<div class="cell-output cell-output-stderr">
<pre><code>The following objects are masked from 'package:stats':

    filter, lag</code></pre>
</div>
<div class="cell-output cell-output-stderr">
<pre><code>The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union</code></pre>
</div>
</div>
<div class="cell">
<div class="sourceCode cell-code" id="cb53" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb53-1">observations <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb53-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(common_name) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb53-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(</span>
<span id="cb53-4">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">n</span>()</span>
<span id="cb53-5">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb53-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">arrange</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">desc</span>(n))</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># Source:     SQL [?? x 2]
# Database:   DuckDB v0.10.2 [mark@Linux 6.5.0-41-generic:R 4.4.1//home/mark/Documents/projects/merlin/data/ebdsmp_2024519.duckdb]
# Ordered by: desc(n)
   common_name              n
   &lt;chr&gt;                &lt;dbl&gt;
 1 American Robin        1010
 2 Red-winged Blackbird   869
 3 Blue Jay               749
 4 Song Sparrow           724
 5 Mourning Dove          678
 6 Northern Cardinal      666
 7 Common Grackle         617
 8 Canada Goose           598
 9 Yellow Warbler         598
10 Red-eyed Vireo         588
# ℹ more rows</code></pre>
</div>
</div>
<p>Hard to tell if there’s a performance difference since we only have ~30K records. Later we’ll do this with larger data tables.</p>
<p>To “turn off” <code>duckplyr</code>:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb55" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb55-1">duckplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">methods_restore</span>()</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>ℹ Restoring dplyr methods.</code></pre>
</div>
</div>
<p>Ok, so <code>dplyr</code> is fully capable of interacting with DuckDB database tables via <code>dbplyr</code> and its <code>tbl()</code> function. But, to use DuckDB’s query engine, we load <code>duckplyr</code> to replace <code>dplyr</code> and it creates the query recipe for DuckDB databases. Even without loading the <code>duckplyr</code> library, we can use its DuckDB engine by passing a regular dataframe to <code>duckplyr::as_duckplyr_df()</code>.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb57" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb57-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dbDisconnect</span>(con)</span>
<span id="cb57-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rm</span>(observations, sampling_events, ebdsmp, con)</span></code></pre></div>
</div>
<p>Time to try loading a whole lot of eBird data into a DuckDB database.</p>
</section>
<section id="reading-large-ebird-csv-files-into-duckdb-using-r" class="level2">
<h2 class="anchored" data-anchor-id="reading-large-ebird-csv-files-into-duckdb-using-r">Reading large eBird CSV files into DuckDB using R</h2>
<p>If we have giant CSV files, we don’t want to have to load them into an R dataframe and then use <code>DBI</code> to write them to the DuckDB database - the whole reason we might be using DuckDB is because the data won’t fit into memory.</p>
<p>So, let’s assume we have already done an <code>auk_filter</code> which has generated the observations and sampling events text files. These files are:</p>
<ul>
<li>tab delimited,</li>
<li>include all of the columns,</li>
<li>have capitalized column names with some including spaces.</li>
</ul>
<p>We can’t bypass R and directly read these into DuckDB using SQL (optionally with DuckDB’s <code>read_csv</code> function) for several reasons. If we need zero-filled observation data, we need to run <code>auk_zerofill</code>. However, the output of this will be an <code>auk_zerofill</code> object containing the two modified dataframes (or a single dataframe if <code>collapse = TRUE</code> is used). These dataframes will likely be too large for R to handle.</p>
<p>Similarly, it’s the <code>read_ebd()</code> and <code>read_sampling()</code> steps (both are <code>auk</code> functions) that do the:</p>
<ul>
<li>column name and data type cleanup</li>
<li>shared checklist collapsing</li>
<li>taxonomic rollup</li>
</ul>
<p>We do <strong>NOT</strong> want to store the raw (or filtered raw) data in a DuckDB database since we want to query data that has already had the shared checklists collapsed and the taxonomic rollup done.</p>
<p>I guess we could bypass R if we rewrote the logic in SQL for zero filling, shared checklist collapsing, taxonomic rollup and column name and datatype changes. That doesn’t sound super fun to do or maintain but is certainly doable.</p>
<p>So, it seems like a viable approach is to break up the eBird downloaded text files into a series of smaller text files and use <code>DBI</code> to append the data to the appropriate tables in the DuckDB database. This will work but there are some challenges.</p>
<section id="creating-and-populating-a-duckdb-database" class="level3">
<h3 class="anchored" data-anchor-id="creating-and-populating-a-duckdb-database">Creating and populating a DuckDB database</h3>
<p>After downloading data from eBird, we have two text files. The first file contains the observation data (one record per species per checklist) and is about 12.3GB. It includes all checklist observations from the state of Michigan. The second file contains the sampling event data (one record per checklist) and is about 650MB. We are not going to be able to do a <code>read_ebd()</code> on a 12GB file (at least I’m not able to on my laptop).</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb58" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb58-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Set the filenames for the checklist and sample data</span></span>
<span id="cb58-2">f_ebd <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'data/ebd_US-MI_smp_relMay-2024/ebd_US-MI_smp_relMay-2024.txt'</span></span>
<span id="cb58-3">f_smp <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'data/ebd_US-MI_smp_relMay-2024/ebd_US-MI_smp_relMay-2024_sampling.txt'</span></span>
<span id="cb58-4"></span>
<span id="cb58-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create ebd object</span></span>
<span id="cb58-6">ebdsmp <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">auk_ebd</span>(f_ebd, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">file_sampling =</span> f_smp)</span></code></pre></div>
</div>
<p>Our basic strategy is to grab one month of data at a time using <code>auk</code> and output it as separate monthly text files. As part of our <code>auk</code> filtering we will only include complete checklists. Other than that, the resulting text files have the same format as the downloaded EBD data. We are using the EBD dataset that was released in June of 2024 and contains data up through May 2024. The files are being written to the <code>data/ebd_US-MI_chunked/</code> directory.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb59" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb59-1">rerun <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span> <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Set to TRUE to rerun this code</span></span>
<span id="cb59-2">start_year <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2015</span></span>
<span id="cb59-3">end_year <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2024</span></span>
<span id="cb59-4">end_month <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span></span>
<span id="cb59-5"></span>
<span id="cb59-6"><span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> (rerun) {</span>
<span id="cb59-7">  list_years <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">seq</span>(start_year, end_year)</span>
<span id="cb59-8">  </span>
<span id="cb59-9">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> (list_year <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> list_years) {</span>
<span id="cb59-10">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> (list_month <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">seq</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>)){</span>
<span id="cb59-11">      <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> (list_year <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;=</span> end_year <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|</span> list_month <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;=</span> end_month) {</span>
<span id="cb59-12">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create data range to use in auk filter</span></span>
<span id="cb59-13">        start_date <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">make_date</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">year=</span>list_year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">month=</span>list_month, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">day=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>L)</span>
<span id="cb59-14">        end_date <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ceiling_date</span>(start_date, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">unit=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"month"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb59-15">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">print</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(start_date, end_date))</span>
<span id="cb59-16"></span>
<span id="cb59-17">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create output filenames based on month and year</span></span>
<span id="cb59-18">        file_ebd <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">file.path</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'data/ebd_US-MI_chunked'</span>, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ebd_filtered_MI_'</span>, list_year, list_month, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'.txt'</span>))</span>
<span id="cb59-19">        file_sampling <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">file.path</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'data/ebd_US-MI_chunked'</span>, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'sampling_filtered_MI_'</span>, list_year, list_month, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'.txt'</span>))</span>
<span id="cb59-20">        </span>
<span id="cb59-21">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Do the auk filtering</span></span>
<span id="cb59-22">        ebdsmp <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb59-23">          <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># state: restrict to Michigan using its eBird state code</span></span>
<span id="cb59-24">          <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">auk_state</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">state =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"US-MI"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb59-25">          <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># date: use standard ISO date format `"YYYY-MM-DD"` with * for wildcard</span></span>
<span id="cb59-26">          <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">auk_date</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">date =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(start_date, end_date)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb59-27">          <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># complete: all species seen or heard are recorded (this field is in ebd)</span></span>
<span id="cb59-28">          <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">auk_complete</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb59-29">          <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># This final step triggers the actual file processing output writing</span></span>
<span id="cb59-30">          <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">auk_filter</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">file =</span> file_ebd, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">file_sampling =</span> file_sampling, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">overwrite =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span>
<span id="cb59-31">      }</span>
<span id="cb59-32">    }</span>
<span id="cb59-33">  }  </span>
<span id="cb59-34">}</span></code></pre></div>
</div>
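<p>The nested loops above walk through every month from January 2015 through May 2024. As an aside, here is a base R sketch of generating the same monthly date ranges as vectors, which is useful for checking how many chunks to expect. The zero-padded <code>YYYYMM</code> file labels are a hypothetical naming variant, not the names produced by the code above; padding simply makes the files sort chronologically.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># First day of every month from January 2015 through May 2024
month_starts &lt;- seq(as.Date("2015-01-01"), as.Date("2024-05-01"), by = "month")

# Last day of each month is the day before the following month starts
next_starts &lt;- seq(month_starts[1], by = "month",
                   length.out = length(month_starts) + 1)[-1]
month_ends &lt;- next_starts - 1

# Zero-padded YYYYMM labels (hypothetical naming variant)
month_labels &lt;- format(month_starts, "%Y%m")
chunk_files &lt;- file.path("data/ebd_US-MI_chunked",
                         paste0("ebd_filtered_MI_", month_labels, ".txt"))

length(month_starts)   # number of monthly chunks
head(data.frame(start = month_starts, end = month_ends, file = chunk_files), 3)</code></pre></div>
</div>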
<p>Now we want to load each monthly text file into a DuckDB database. The observation files will get appended to an <code>observations</code> table and the sampling event files to a <code>sampling_events</code> table. As of now, these tables don’t exist - actually the entire database doesn’t exist. An easy way to create the tables is to pass small dataframes to <code>dbCreateTable()</code> in a newly created empty database. In this way, the dataframe acts as a template for the column names and data types. This works as long as you use a data sample file that does not trigger any import parsing related warnings when read with <code>read_ebd()</code> or <code>read_sampling()</code>. So, we’ll rebuild the <code>observations_df</code> and <code>sampling_events_df</code> dataframes from the same 2024-05-19 checklist data we used earlier.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb60" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb60-1">rerun <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span> <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># </span><span class="al" style="color: #AD0000;
background-color: null;
font-style: inherit;">WARNING</span><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: If you rerun this chunk, you will delete any data already in the database</span></span>
<span id="cb60-2"></span>
<span id="cb60-3">con <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dbConnect</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">duckdb</span>(), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">dbdir =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"data/ebdsmp_MI_relMay2024.duckdb"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">read_only =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>)</span>
<span id="cb60-4"></span>
<span id="cb60-5"><span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> (rerun) {</span>
<span id="cb60-6"></span>
<span id="cb60-7">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Filenames of text files to use as table templates</span></span>
<span id="cb60-8">  file_ebd <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"data/ebd_20240519_MI.txt"</span></span>
<span id="cb60-9">  file_sampling <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"data/smp_20240519_MI.txt"</span></span>
<span id="cb60-10">  </span>
<span id="cb60-11">  sampling_events_df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_sampling</span>(file_sampling, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">unique =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span>
<span id="cb60-12">  DBI<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dbCreateTable</span>(con, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'sampling_events'</span>, sampling_events_df)</span>
<span id="cb60-13">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Remove the local sampling events dataframe</span></span>
<span id="cb60-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rm</span>(sampling_events_df)</span>
<span id="cb60-15">  </span>
<span id="cb60-16">  observations_df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_ebd</span>(file_ebd, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">unique =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span>
<span id="cb60-17">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Change observation_count to integer</span></span>
<span id="cb60-18">  observations_df<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>observation_count <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.integer</span>(observations_df<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>observation_count)</span>
<span id="cb60-19">  DBI<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dbCreateTable</span>(con, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'observations'</span>, observations_df)</span>
<span id="cb60-20">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Remove the local observations dataframe</span></span>
<span id="cb60-21">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rm</span>(observations_df)</span>
<span id="cb60-22">  </span>
<span id="cb60-23">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Delete all records from the two tables</span></span>
<span id="cb60-24">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dbExecute</span>(con, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"DELETE FROM observations"</span>)</span>
<span id="cb60-25">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dbExecute</span>(con, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"DELETE FROM sampling_events"</span>)</span>
<span id="cb60-26">}</span></code></pre></div>
</div>
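<p>To make the template idea concrete, here is a small self-contained sketch using an in-memory database and a made-up three-column dataframe (<code>con_demo</code>, <code>template_df</code> and <code>observations_demo</code> are illustrative names only). It shows that <code>dbCreateTable()</code> copies the column names and types but no rows, and that <code>dbAppendTable()</code> can then add rows to the empty table.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(DBI)

# Hypothetical in-memory database and template dataframe
con_demo &lt;- dbConnect(duckdb::duckdb())
template_df &lt;- data.frame(
  checklist_id = "S1",
  observation_count = 3L,
  observation_date = as.Date("2024-05-19")
)

# Create the table structure from the template; no rows are inserted
dbCreateTable(con_demo, "observations_demo", template_df)
n_before &lt;- dbGetQuery(con_demo, "SELECT COUNT(*) AS n FROM observations_demo")$n

# Append rows with the same columns to the now-existing table
dbAppendTable(con_demo, "observations_demo", template_df)
n_after &lt;- dbGetQuery(con_demo, "SELECT COUNT(*) AS n FROM observations_demo")$n

demo_fields &lt;- dbListFields(con_demo, "observations_demo")
dbDisconnect(con_demo)

c(before = n_before, after = n_after)</code></pre></div>
</div>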
<p>At this point we have a new DuckDB database with two empty tables that are ready to receive data. Use <code>DBI::dbColumnInfo()</code> to see table structures. Make sure you include the <code>LIMIT</code> clause if you attempt to do this after loading all of your data. Otherwise, you are inviting an RStudio crash.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb61" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb61-1">rs <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dbSendQuery</span>(con, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"SELECT * FROM sampling_events LIMIT 10"</span>)</span>
<span id="cb61-2">DBI<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dbColumnInfo</span>(rs)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>                        name      type
1               checklist_id character
2           last_edited_date character
3                    country character
4               country_code character
5                      state character
6                 state_code character
7                     county character
8                county_code character
9                   iba_code character
10                  bcr_code   integer
11                usfws_code character
12               atlas_block character
13                  locality character
14               locality_id character
15             locality_type character
16                  latitude   numeric
17                 longitude   numeric
18          observation_date      Date
19 time_observations_started character
20               observer_id character
21 sampling_event_identifier character
22             protocol_type character
23             protocol_code character
24              project_code character
25          duration_minutes   integer
26        effort_distance_km   numeric
27            effort_area_ha   numeric
28          number_observers   integer
29      all_species_reported   logical
30          group_identifier character
31             trip_comments character</code></pre>
</div>
</div>
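<p>One housekeeping detail: <code>dbSendQuery()</code> leaves a result set open until it is cleared. Here is a minimal sketch of the same inspection followed by <code>dbClearResult()</code>. It runs against a throwaway in-memory DuckDB database, and <code>demo_con</code> and <code>toy_events</code> are made-up names used only for illustration.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(DBI)

# Throwaway in-memory database; demo_con and toy_events are illustrative names
demo_con &lt;- dbConnect(duckdb::duckdb())
dbWriteTable(demo_con, "toy_events",
             data.frame(checklist_id = c("S1", "S2"),
                        duration_minutes = c(30L, 45L)))

# LIMIT keeps the result small no matter how large the table is
rs &lt;- dbSendQuery(demo_con, "SELECT * FROM toy_events LIMIT 10")
col_info &lt;- dbColumnInfo(rs)
dbClearResult(rs)  # release the open result set

print(col_info)
dbDisconnect(demo_con, shutdown = TRUE)</code></pre></div>
</div>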
<p>Alright, let’s load all of our data! This next step is quite time consuming.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb63" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb63-1">rerun <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span> <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Set to TRUE to rerun this code</span></span>
<span id="cb63-2">start_year <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2015</span></span>
<span id="cb63-3">end_year <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2024</span></span>
<span id="cb63-4">end_month <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span></span>
<span id="cb63-5"></span>
<span id="cb63-6"><span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> (rerun) {</span>
<span id="cb63-7">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Make sure tables are empty before we begin</span></span>
<span id="cb63-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dbExecute</span>(con, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"DELETE FROM observations"</span>)</span>
<span id="cb63-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dbExecute</span>(con, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"DELETE FROM sampling_events"</span>)</span>
<span id="cb63-10">  </span>
<span id="cb63-11">  list_years <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">seq</span>(start_year, end_year)</span>
<span id="cb63-12">  list_months <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">seq</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>)</span>
<span id="cb63-13">  </span>
<span id="cb63-14">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> (list_year <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> list_years) {</span>
<span id="cb63-15">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> (list_month <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> list_months) {</span>
<span id="cb63-16">      </span>
<span id="cb63-17">      <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create filenames of source text files</span></span>
<span id="cb63-18">      file_ebd <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">file.path</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'data/ebd_US-MI_chunked'</span>, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ebd_filtered_MI_'</span>, list_year, list_month, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'.txt'</span>))</span>
<span id="cb63-19">      file_sampling <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">file.path</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'data/ebd_US-MI_chunked'</span>, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'sampling_filtered_MI_'</span>, list_year, list_month, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'.txt'</span>))</span>
<span id="cb63-20">      </span>
<span id="cb63-21">      <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> (list_year <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;=</span> end_year <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">||</span> list_month <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;=</span> end_month) {</span>
<span id="cb63-22">        start_date <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">make_date</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">year=</span>list_year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">month=</span>list_month, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">day=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>L)</span>
<span id="cb63-23">        end_date <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ceiling_date</span>(start_date, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">unit=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"month"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb63-24">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">print</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(start_date, end_date))</span>
<span id="cb63-25">        </span>
<span id="cb63-26">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create and append dataframe to the sampling_events table</span></span>
<span id="cb63-27">        sampling_events_df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_sampling</span>(file_sampling, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">unique =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span>
<span id="cb63-28">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dbAppendTable</span>(con, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sampling_events"</span>, sampling_events_df)</span>
<span id="cb63-29">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rm</span>(sampling_events_df)</span>
<span id="cb63-30">        </span>
<span id="cb63-31">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create and append dataframe to the observations table</span></span>
<span id="cb63-32">        observations_df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_ebd</span>(file_ebd, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">unique =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span>
<span id="cb63-33">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Change observation_count to integer</span></span>
<span id="cb63-34">        observations_df<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>observation_count <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.integer</span>(observations_df<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>observation_count)</span>
<span id="cb63-35">        </span>
<span id="cb63-36">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dbAppendTable</span>(con, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"observations"</span>, observations_df)</span>
<span id="cb63-37">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rm</span>(observations_df)</span>
<span id="cb63-38">      }</span>
<span id="cb63-39">    }</span>
<span id="cb63-40">  }</span>
<span id="cb63-41">  </span>
<span id="cb63-42"></span>
<span id="cb63-43">}</span></code></pre></div>
</div>
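<p>The filter inside the loop keeps every month of the years before <code>end_year</code> and only the months up to <code>end_month</code> in the final year. As a rough sketch, the same set of chunks can be computed up front as a data frame, which makes it easy to check how many files will be read. The name <code>chunks</code> is mine and is not part of the loading code above.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">start_year &lt;- 2015
end_year &lt;- 2024
end_month &lt;- 5

# All (year, month) combinations; month varies fastest
chunks &lt;- expand.grid(month = 1:12, year = start_year:end_year)

# Drop the months after the cutoff in the final year
chunks &lt;- chunks[chunks$year &lt; end_year | chunks$month &lt;= end_month, ]

nrow(chunks)     # 9 full years * 12 months + 5 months = 113
head(chunks, 3)
tail(chunks, 3)</code></pre></div>
</div>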
<p>The populated database is about 1.3 GB.</p>
</section>
</section>
<section id="analyzing-ebird-data-in-the-duckdb-database" class="level2">
<h2 class="anchored" data-anchor-id="analyzing-ebird-data-in-the-duckdb-database">Analyzing eBird data in the DuckDB database</h2>
<p>Let’s see how many total records we have.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb64" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb64-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dbGetQuery</span>(con, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"SELECT COUNT(*) FROM sampling_events"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>  count_star()
1      1529700</code></pre>
</div>
</div>
<div class="cell">
<div class="sourceCode cell-code" id="cb66" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb66-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dbGetQuery</span>(con, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"SELECT COUNT(*) FROM observations"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>  count_star()
1     21664757</code></pre>
</div>
</div>
<section id="a-subtle-problem-with-duplicate-checklist_ids" class="level3">
<h3 class="anchored" data-anchor-id="a-subtle-problem-with-duplicate-checklist_ids">A subtle problem with duplicate <code>checklist_id</code> values</h3>
<p>By using <code>auk::read_ebd()</code> and <code>auk::read_sampling()</code>, we got important preprocessing done, such as collapsing shared checklists and rolling up observations to the species level. One might think that we wouldn’t have to worry about duplicate <code>checklist_id</code> values. Well, we do.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb68" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb68-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dbGetQuery</span>(con, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"SELECT checklist_id </span></span>
<span id="cb68-2"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">           FROM sampling_events </span></span>
<span id="cb68-3"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">           GROUP BY checklist_id</span></span>
<span id="cb68-4"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">           HAVING COUNT(checklist_id) &gt; 1"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>   checklist_id
1      G1475849
2      G7107571
3     G10713331
4      G4962093
5      G5729899
6      G2040552
7      G4385780
8      G8521612
9     G10019302
10     G7107956
11     G7107583</code></pre>
</div>
</div>
<p>Turns out we have eleven duplicate <code>checklist_id</code> values and they are all shared checklists. What? I was surprised as well when I found this, but it wasn’t too hard to figure out what’s going on. Let’s look at one of the duplicate checklists.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb70" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb70-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dbGetQuery</span>(con,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"SELECT checklist_id, observation_date, locality_id, sampling_event_identifier, observer_ID FROM sampling_events WHERE checklist_id = 'G7107571';"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>  checklist_id observation_date locality_id sampling_event_identifier
1     G7107571       2021-06-27    L2123031                 S93447285
2     G7107571       2021-07-05    L2123031                 S93447235
  observer_id
1  obsr123074
2  obsr351872</code></pre>
</div>
</div>
<p>A few things to note:</p>
<ul>
<li>the <code>observation_date</code> values are different, <strong>AND</strong> they span a month boundary,</li>
<li>the <code>observer_id</code> values are different,</li>
<li>neither <code>sampling_event_identifier</code> contains the comma-concatenated list of identifiers.</li>
</ul>
<p>We might infer that these two individuals were birding together and one of them entered an incorrect date into eBird. That actually seems like it could be a pretty common occurrence, yet we only got eleven such duplicates out of millions of records for Michigan. The issue is caused by our “chunking” of the giant download file into monthly pieces so that we could load it via R into the DuckDB database. Had we chunked the data by year, instead of by month and year, this duplicate <code>checklist_id</code> would not have occurred. The <code>auk::read_sampling()</code> function would have collapsed the two rows into a single record, using some sort of logic to decide which record to keep. Of course, even chunking by year doesn’t guarantee no duplicate <code>checklist_id</code> values.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb72" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb72-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dbGetQuery</span>(con,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"SELECT checklist_id, group_identifier, observation_date, sampling_event_identifier, observer_ID, last_edited_date FROM sampling_events WHERE checklist_id = 'G1475849';"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>  checklist_id group_identifier observation_date sampling_event_identifier
1     G1475849         G1475849       2015-11-20       S25919906,S25968002
2     G1475849         G1475849       2016-02-25                 S27899045
            observer_id           last_edited_date
1 obsr653314,obsr428042 2021-03-26 03:17:06.584504
2            obsr653323 2021-03-26 03:17:06.584504</code></pre>
</div>
</div>
<p>Again, I’m guessing we had a group of three birders, one of whom put in the wrong date (by a lot), yet somehow the <code>last_edited_date</code> values are identical (and several years later). Maybe.</p>
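<p>To make the chunking effect concrete, here is a toy sketch in base R. The data are made up, and <code>keep_first()</code> is a crude stand-in for the de-duplication step, not the logic that <code>auk</code> actually uses. The point is only that de-duplicating within each monthly chunk cannot catch two rows that land in different chunks.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Toy data: one shared checklist (G1) whose two rows have dates in different months
toy &lt;- data.frame(
  checklist_id = c("G1", "G1", "S5"),
  sampling_event_identifier = c("S10", "S11", "S5"),
  observation_date = as.Date(c("2021-06-27", "2021-07-05", "2021-07-05"))
)

# Hypothetical stand-in for de-duplication: keep the first row per checklist_id
keep_first &lt;- function(df) df[!duplicated(df$checklist_id), ]

# De-duplicating the whole file at once collapses G1 to a single row
whole &lt;- keep_first(toy)

# De-duplicating month by month never sees both G1 rows in the same chunk
by_month &lt;- split(toy, format(toy$observation_date, "%Y-%m"))
chunked &lt;- do.call(rbind, lapply(by_month, keep_first))

sum(whole$checklist_id == "G1")    # 1
sum(chunked$checklist_id == "G1")  # 2</code></pre></div>
</div>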
<p>I guess we need to dig into the actual R code in <code>auk</code> to see how such cases are handled. The source file we need to explore is <a href="https://github.com/CornellLabofOrnithology/auk/blob/main/R/auk-unique.r">auk-unique.r</a>. It appears that the “tiebreaker” is whichever record has the lowest <code>sampling_event_identifier</code> value. Regardless, we are in a bit of a pickle as we can’t simply use <code>read_sampling()</code> and <code>read_ebd()</code> on our 12 GB file. But, if we chunk the data, we can’t ensure no duplicate <code>checklist_id</code> values. Furthermore, these are just the records we caught due to our chunking. I have no idea how many such duplicate records with inconsistent observation dates there are.</p>
<p>We will just do something consistent with the current <code>auk_unique()</code> function and keep the record with the lowest <code>sampling_event_identifier</code> value.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb74" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb74-1">dup_checklist_ids <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dbGetQuery</span>(con, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"SELECT checklist_id, min(sampling_event_identifier) as sei</span></span>
<span id="cb74-2"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">           FROM sampling_events </span></span>
<span id="cb74-3"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">           GROUP BY checklist_id</span></span>
<span id="cb74-4"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">           HAVING COUNT(checklist_id) &gt; 1"</span>)</span>
<span id="cb74-5"></span>
<span id="cb74-6">dup_checklist_ids</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>   checklist_id                 sei
1      G4962093           S64297187
2      G5729899 S73905868,S73907623
3      G4385780           S58370794
4      G7107583 S93447555,S93459182
5      G2040552           S32581769
6      G8521612          S112297874
7      G7107571           S93447235
8      G1475849 S25919906,S25968002
9     G10713331          S147457082
10     G7107956 S93453425,S93459174
11    G10019302          S134105819</code></pre>
</div>
</div>
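<p>One caveat about “lowest”: <code>min()</code> applied to a character column, in SQL as in R, compares strings rather than numbers. The small sketch below uses made-up identifiers to show the difference. It matters only when the candidates have different numbers of digits, and comma-concatenated identifiers would need separate handling.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Made-up identifiers with different digit counts
ids &lt;- c("S9", "S10")

# String comparison: "S10" sorts before "S9" because "1" comes before "9"
lowest_string &lt;- min(ids)
lowest_string    # "S10"

# Numeric comparison on the digits after the leading "S"
lowest_numeric &lt;- ids[which.min(as.numeric(sub("^S", "", ids)))]
lowest_numeric   # "S9"</code></pre></div>
</div>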
<p>Now we want to delete the rows that match each <code>checklist_id</code> but do not match the <code>sampling_event_identifier</code> we are keeping.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb76" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb76-1"><span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> (i <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">seq_len</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">nrow</span>(dup_checklist_ids))) {</span>
<span id="cb76-2">    checklist_id <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> dup_checklist_ids[i,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]</span>
<span id="cb76-3">    sei <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> dup_checklist_ids[i,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>]</span>
<span id="cb76-4"></span>
<span id="cb76-5">    sql_del_sampling <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"DELETE FROM sampling_events WHERE checklist_id == '"</span>, checklist_id, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"' and sampling_event_identifier != '"</span>, sei, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"';"</span>)</span>
<span id="cb76-6">    </span>
<span id="cb76-7">    sql_del_obs <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"DELETE FROM observations WHERE checklist_id == '"</span>, checklist_id, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"' and sampling_event_identifier != '"</span>, sei, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"';"</span>)</span>
<span id="cb76-8">    </span>
<span id="cb76-9">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">print</span>(sql_del_sampling)</span>
<span id="cb76-10">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dbExecute</span>(con, sql_del_sampling)</span>
<span id="cb76-11">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">print</span>(sql_del_obs)</span>
<span id="cb76-12">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dbExecute</span>(con, sql_del_obs)</span>
<span id="cb76-13">}</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] "DELETE FROM sampling_events WHERE checklist_id == 'G4962093' and sampling_event_identifier != 'S64297187';"
[1] "DELETE FROM observations WHERE checklist_id == 'G4962093' and sampling_event_identifier != 'S64297187';"
[1] "DELETE FROM sampling_events WHERE checklist_id == 'G5729899' and sampling_event_identifier != 'S73905868,S73907623';"
[1] "DELETE FROM observations WHERE checklist_id == 'G5729899' and sampling_event_identifier != 'S73905868,S73907623';"
[1] "DELETE FROM sampling_events WHERE checklist_id == 'G4385780' and sampling_event_identifier != 'S58370794';"
[1] "DELETE FROM observations WHERE checklist_id == 'G4385780' and sampling_event_identifier != 'S58370794';"
[1] "DELETE FROM sampling_events WHERE checklist_id == 'G7107583' and sampling_event_identifier != 'S93447555,S93459182';"
[1] "DELETE FROM observations WHERE checklist_id == 'G7107583' and sampling_event_identifier != 'S93447555,S93459182';"
[1] "DELETE FROM sampling_events WHERE checklist_id == 'G2040552' and sampling_event_identifier != 'S32581769';"
[1] "DELETE FROM observations WHERE checklist_id == 'G2040552' and sampling_event_identifier != 'S32581769';"
[1] "DELETE FROM sampling_events WHERE checklist_id == 'G8521612' and sampling_event_identifier != 'S112297874';"
[1] "DELETE FROM observations WHERE checklist_id == 'G8521612' and sampling_event_identifier != 'S112297874';"
[1] "DELETE FROM sampling_events WHERE checklist_id == 'G7107571' and sampling_event_identifier != 'S93447235';"
[1] "DELETE FROM observations WHERE checklist_id == 'G7107571' and sampling_event_identifier != 'S93447235';"
[1] "DELETE FROM sampling_events WHERE checklist_id == 'G1475849' and sampling_event_identifier != 'S25919906,S25968002';"
[1] "DELETE FROM observations WHERE checklist_id == 'G1475849' and sampling_event_identifier != 'S25919906,S25968002';"
[1] "DELETE FROM sampling_events WHERE checklist_id == 'G10713331' and sampling_event_identifier != 'S147457082';"
[1] "DELETE FROM observations WHERE checklist_id == 'G10713331' and sampling_event_identifier != 'S147457082';"
[1] "DELETE FROM sampling_events WHERE checklist_id == 'G7107956' and sampling_event_identifier != 'S93453425,S93459174';"
[1] "DELETE FROM observations WHERE checklist_id == 'G7107956' and sampling_event_identifier != 'S93453425,S93459174';"
[1] "DELETE FROM sampling_events WHERE checklist_id == 'G10019302' and sampling_event_identifier != 'S134105819';"
[1] "DELETE FROM observations WHERE checklist_id == 'G10019302' and sampling_event_identifier != 'S134105819';"</code></pre>
</div>
</div>
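<p>Building SQL strings with <code>paste0()</code> works fine here because the values come from our own query. As an alternative, here is a sketch of the same delete using bound parameters via the <code>params</code> argument of <code>dbExecute()</code>. It runs on a throwaway in-memory DuckDB database, and <code>demo_con</code>, <code>toy_events</code> and <code>toy_dups</code> are illustrative names, not objects from this post.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(DBI)

# Throwaway in-memory database with a toy table; all names are illustrative
demo_con &lt;- dbConnect(duckdb::duckdb())
dbWriteTable(demo_con, "toy_events", data.frame(
  checklist_id = c("G1", "G1", "G2"),
  sampling_event_identifier = c("S10", "S11", "S20")
))

# One row per duplicated checklist, with the identifier we want to keep
toy_dups &lt;- data.frame(checklist_id = "G1", sei = "S10")

# seq_len() gives an empty loop when there are no duplicates to process
for (i in seq_len(nrow(toy_dups))) {
  # The ? placeholders are filled, in order, from params
  n_deleted &lt;- dbExecute(
    demo_con,
    "DELETE FROM toy_events
     WHERE checklist_id = ? AND sampling_event_identifier != ?",
    params = list(toy_dups$checklist_id[i], toy_dups$sei[i])
  )
  print(n_deleted)  # number of rows removed for this checklist
}

remaining &lt;- dbGetQuery(demo_con, "SELECT * FROM toy_events ORDER BY checklist_id")
dbDisconnect(demo_con, shutdown = TRUE)</code></pre></div>
</div>
<p>A side benefit is that <code>dbExecute()</code> returns the number of affected rows, which gives a quick check that each delete removed what we expected.</p>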
<div class="cell">
<div class="sourceCode cell-code" id="cb78" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb78-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dbGetQuery</span>(con, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"SELECT COUNT(*) FROM sampling_events"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>  count_star()
1      1529700</code></pre>
</div>
</div>
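<p>We removed one row for each of the 11 duplicated <code>checklist_id</code> values, so this count should be 11 fewer than before the deletes. Note that the earlier count in this post shows the same value, so it appears to have been rendered after the cleanup. This next query should return no rows.</p>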
<div class="cell">
<div class="sourceCode cell-code" id="cb80" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb80-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dbGetQuery</span>(con, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"SELECT checklist_id </span></span>
<span id="cb80-2"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">           FROM sampling_events </span></span>
<span id="cb80-3"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">           GROUP BY checklist_id</span></span>
<span id="cb80-4"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">           HAVING COUNT(checklist_id) &gt; 1"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] checklist_id
&lt;0 rows&gt; (or 0-length row.names)</code></pre>
</div>
</div>
</section>
<section id="using-dbplyr-and-dplyr" class="level3">
<h3 class="anchored" data-anchor-id="using-dbplyr-and-dplyr">Using dbplyr and dplyr</h3>
<p>Now let’s do some querying. We’ll start by using <code>dplyr</code>.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb82" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb82-1">observations <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tbl</span>(con, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'observations'</span>)</span>
<span id="cb82-2">sampling_events <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tbl</span>(con, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'sampling_events'</span>)</span></code></pre></div>
</div>
<p>Printing <code>observations</code> should return just the first few rows, since the table reference is evaluated lazily.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb83" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb83-1">observations</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># Source:   table&lt;observations&gt; [?? x 48]
# Database: DuckDB v0.10.2 [mark@Linux 6.5.0-41-generic:R 4.4.1//home/mark/Documents/projects/merlin/data/ebdsmp_MI_relMay2024.duckdb]
   checklist_id global_unique_identi…¹ last_edited_date taxonomic_order category
   &lt;chr&gt;        &lt;chr&gt;                  &lt;chr&gt;                      &lt;dbl&gt; &lt;chr&gt;   
 1 G1088065     URN:CornellLabOfOrnit… 2024-01-11 12:1…             324 species 
 2 G1088065     URN:CornellLabOfOrnit… 2024-01-11 12:1…            8312 species 
 3 G1088065     URN:CornellLabOfOrnit… 2024-01-11 12:1…           33967 species 
 4 G1088065     URN:CornellLabOfOrnit… 2024-01-11 12:1…           21168 species 
 5 G1088065     URN:CornellLabOfOrnit… 2024-01-11 12:1…           11175 species 
 6 G1088065     URN:CornellLabOfOrnit… 2024-01-11 12:1…           11527 species 
 7 G1088065     URN:CornellLabOfOrnit… 2024-01-11 12:1…           31790 species 
 8 G1088065     URN:CornellLabOfOrnit… 2024-01-11 12:1…           32595 species 
 9 G1088065     URN:CornellLabOfOrnit… 2024-01-11 12:1…            6479 species 
10 G1088065     URN:CornellLabOfOrnit… 2024-01-11 12:1…           10979 species 
# ℹ more rows
# ℹ abbreviated name: ¹​global_unique_identifier
# ℹ 43 more variables: taxon_concept_id &lt;chr&gt;, common_name &lt;chr&gt;,
#   scientific_name &lt;chr&gt;, exotic_code &lt;chr&gt;, observation_count &lt;int&gt;,
#   breeding_code &lt;chr&gt;, breeding_category &lt;chr&gt;, behavior_code &lt;chr&gt;,
#   age_sex &lt;chr&gt;, country &lt;chr&gt;, country_code &lt;chr&gt;, state &lt;chr&gt;,
#   state_code &lt;chr&gt;, county &lt;chr&gt;, county_code &lt;chr&gt;, iba_code &lt;chr&gt;, …</code></pre>
</div>
</div>
<p>Let’s redo our common species query.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb85" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb85-1">common_species <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> observations <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb85-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(common_name) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb85-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(</span>
<span id="cb85-4">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">n</span>()</span>
<span id="cb85-5">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb85-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">arrange</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">desc</span>(n))</span></code></pre></div>
</div>
<p>We can see the SQL generated by <code>dbplyr</code>:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb86" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb86-1">common_species <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">show_query</span>()</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>&lt;SQL&gt;
SELECT common_name, COUNT(*) AS n
FROM observations
GROUP BY common_name
ORDER BY n DESC</code></pre>
</div>
</div>
<p>Use <code>collect()</code> to actually run the query and bring back results.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb88" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb88-1">common_species <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">collect</span>() <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Making sure not using duckplyr::collect()</span></span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 431 × 2
   common_name                 n
   &lt;chr&gt;                   &lt;dbl&gt;
 1 Blue Jay               723625
 2 Black-capped Chickadee 714290
 3 American Robin         701454
 4 Northern Cardinal      673327
 5 Mourning Dove          607148
 6 American Crow          587172
 7 American Goldfinch     568616
 8 Red-winged Blackbird   560814
 9 Canada Goose           543752
10 Downy Woodpecker       527761
# ℹ 421 more rows</code></pre>
</div>
</div>
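<p>Until <code>collect()</code> is called, <code>common_species</code> is only a lazy description of a query; no rows have been pulled into R. Here is a minimal sketch of that behavior, assuming the <code>duckdb</code>, <code>DBI</code>, <code>dplyr</code> and <code>dbplyr</code> packages are installed. The in-memory connection <code>con_demo</code> and the table <code>toy_obs</code> are made up for illustration and are not part of the eBird database.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(DBI)
library(dplyr)

# Hypothetical in-memory DuckDB database with a tiny stand-in table
con_demo &lt;- dbConnect(duckdb::duckdb())
dbWriteTable(con_demo, "toy_obs",
             data.frame(common_name = c("Blue Jay", "Blue Jay", "Mallard")))

# A lazy table: the dplyr verbs only build up SQL at this point
lazy_counts &lt;- tbl(con_demo, "toy_obs") %&gt;%
  group_by(common_name) %&gt;%
  summarize(n = n()) %&gt;%
  arrange(desc(n))

# Still a lazy query object, not a local data frame
inherits(lazy_counts, "tbl_lazy")

# collect() runs the query in DuckDB and returns a local tibble
local_counts &lt;- collect(lazy_counts)
local_counts

dbDisconnect(con_demo, shutdown = TRUE)</code></pre></div>
</div>
<p>Keeping the query lazy for as long as possible lets DuckDB do the heavy lifting, so only the small summarized result crosses into R.</p>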
<p>What about common date functions? There are multiple ways to work with dates and times:</p>
<ul>
<li>many <code>lubridate</code> functions are supported by <code>dbplyr</code> and <code>duckplyr</code>,</li>
<li>we can use the <code>sql()</code> function to wrap SQL fragments that get passed through to DuckDB (which supports many date and time functions).</li>
</ul>
<p>For example, here’s what <code>dbplyr</code> does with the <code>lubridate::year()</code> function.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb90" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb90-1">dbplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">translate_sql</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">year</span>(observation_date), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">con =</span> con)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>&lt;SQL&gt; EXTRACT(year FROM observation_date)</code></pre>
</div>
</div>
<p>Let’s extend our common species query to group by year. A good approach seems to be to use <code>mutate()</code> to compute the column you want to group on. Not only does this provide an easy way to name the computed column, but I’ve also had trouble using <code>lubridate</code> functions directly inside <code>group_by()</code> calls.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb92" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb92-1">common_species_yearly <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> observations <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb92-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ebdyear =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">year</span>(observation_date)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb92-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(ebdyear, common_name) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb92-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(</span>
<span id="cb92-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">n</span>()</span>
<span id="cb92-6">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb92-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">arrange</span>(ebdyear, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">desc</span>(n))</span>
<span id="cb92-8"></span>
<span id="cb92-9">common_species_yearly <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">show_query</span>()</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>`summarise()` has grouped output by "ebdyear". You can override using the
`.groups` argument.</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>&lt;SQL&gt;
SELECT ebdyear, common_name, COUNT(*) AS n
FROM (
  SELECT observations.*, EXTRACT(year FROM observation_date) AS ebdyear
  FROM observations
) q01
GROUP BY ebdyear, common_name
ORDER BY ebdyear, n DESC</code></pre>
</div>
</div>
<div class="cell">
<div class="sourceCode cell-code" id="cb95" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb95-1">common_species_yearly <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">collect</span>()</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>`summarise()` has grouped output by "ebdyear". You can override using the
`.groups` argument.</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 3,512 × 3
# Groups:   ebdyear [10]
   ebdyear common_name                n
     &lt;dbl&gt; &lt;chr&gt;                  &lt;dbl&gt;
 1    2015 Black-capped Chickadee 34584
 2    2015 Blue Jay               33924
 3    2015 American Robin         32854
 4    2015 American Crow          31872
 5    2015 Northern Cardinal      30726
 6    2015 American Goldfinch     28319
 7    2015 Mourning Dove          27910
 8    2015 Canada Goose           27614
 9    2015 Mallard                27196
10    2015 Red-winged Blackbird   26268
# ℹ 3,502 more rows</code></pre>
</div>
</div>
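<p>The message on stderr appears because <code>summarize()</code> drops only the last grouping variable, so the result is still grouped by <code>ebdyear</code>. If an ungrouped result is what you want, one option is to pass <code>.groups = "drop"</code>. Here is a small sketch using a made-up local tibble, <code>toy</code>, rather than the eBird table; I would expect the same argument to work on the lazy database table, but treat that as an assumption to check.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(dplyr)

# Hypothetical stand-in for the observations table
toy &lt;- tibble(
  ebdyear = c(2015, 2015, 2016),
  common_name = c("Blue Jay", "Blue Jay", "Mallard")
)

toy_counts &lt;- toy %&gt;%
  group_by(ebdyear, common_name) %&gt;%
  # .groups = "drop" returns an ungrouped result and silences the message
  summarize(n = n(), .groups = "drop") %&gt;%
  arrange(ebdyear, desc(n))

is_grouped_df(toy_counts)</code></pre></div>
</div>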
<p>Let’s try a similar approach but use <a href="https://duckdb.org/docs/sql/functions/date.html">DuckDB’s <code>datepart</code> function</a>.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb98" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb98-1">common_species_yearly <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> observations <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb98-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ebdyear =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sql</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"datepart('year', observation_date)"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb98-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(ebdyear, common_name) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb98-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(</span>
<span id="cb98-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">n</span>()</span>
<span id="cb98-6">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb98-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">arrange</span>(ebdyear, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">desc</span>(n))</span>
<span id="cb98-8"></span>
<span id="cb98-9">common_species_yearly <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">show_query</span>()</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>`summarise()` has grouped output by "ebdyear". You can override using the
`.groups` argument.</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>&lt;SQL&gt;
SELECT ebdyear, common_name, COUNT(*) AS n
FROM (
  SELECT observations.*, datepart('year', observation_date) AS ebdyear
  FROM observations
) q01
GROUP BY ebdyear, common_name
ORDER BY ebdyear, n DESC</code></pre>
</div>
</div>
<div class="cell">
<div class="sourceCode cell-code" id="cb101" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb101-1">common_species_yearly <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">collect</span>()</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>`summarise()` has grouped output by "ebdyear". You can override using the
`.groups` argument.</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 3,512 × 3
# Groups:   ebdyear [10]
   ebdyear common_name                n
     &lt;dbl&gt; &lt;chr&gt;                  &lt;dbl&gt;
 1    2015 Black-capped Chickadee 34584
 2    2015 Blue Jay               33924
 3    2015 American Robin         32854
 4    2015 American Crow          31872
 5    2015 Northern Cardinal      30726
 6    2015 American Goldfinch     28319
 7    2015 Mourning Dove          27910
 8    2015 Canada Goose           27614
 9    2015 Mallard                27196
10    2015 Red-winged Blackbird   26268
# ℹ 3,502 more rows</code></pre>
</div>
</div>
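<p>The same <code>sql()</code> escape hatch works for other date parts. Here is a self-contained sketch, assuming the <code>duckdb</code>, <code>DBI</code>, <code>dplyr</code> and <code>dbplyr</code> packages are installed, that uses DuckDB’s <code>date_part</code> function to pull out the month. The connection <code>con_demo</code> and the table <code>toy_obs</code> are hypothetical stand-ins, not part of the eBird database.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(DBI)
library(dplyr)

# Hypothetical in-memory DuckDB database with a few dates
con_demo &lt;- dbConnect(duckdb::duckdb())
dbWriteTable(con_demo, "toy_obs",
             data.frame(observation_date = as.Date(c("2024-05-04",
                                                     "2024-05-18",
                                                     "2023-06-01"))))

toy_months &lt;- tbl(con_demo, "toy_obs") %&gt;%
  # sql() passes this text to DuckDB untouched; dbplyr does not translate it
  mutate(ebdmonth = sql("date_part('month', observation_date)")) %&gt;%
  count(ebdmonth) %&gt;%
  arrange(ebdmonth) %&gt;%
  collect()

toy_months

dbDisconnect(con_demo, shutdown = TRUE)</code></pre></div>
</div>
<p>Because the text inside <code>sql()</code> is not checked by R, a typo only shows up when DuckDB runs the query, so it’s worth looking at <code>show_query()</code> first.</p>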
<p>Does <code>duckplyr</code> support <code>lubridate</code> directly?</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb104" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb104-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(duckplyr)</span></code></pre></div>
</div>
<div class="cell">
<div class="sourceCode cell-code" id="cb105" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb105-1">common_species_yearly_duckplyr <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> observations <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb105-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ebdyear =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">year</span>(observation_date)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb105-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(ebdyear, common_name) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb105-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(</span>
<span id="cb105-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">n</span>()</span>
<span id="cb105-6">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb105-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">arrange</span>(ebdyear, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">desc</span>(n))</span>
<span id="cb105-8"></span>
<span id="cb105-9">common_species_yearly_duckplyr <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">show_query</span>()</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>`summarise()` has grouped output by "ebdyear". You can override using the
`.groups` argument.</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>&lt;SQL&gt;
SELECT ebdyear, common_name, COUNT(*) AS n
FROM (
  SELECT observations.*, EXTRACT(year FROM observation_date) AS ebdyear
  FROM observations
) q01
GROUP BY ebdyear, common_name
ORDER BY ebdyear, n DESC</code></pre>
</div>
</div>
<div class="cell">
<div class="sourceCode cell-code" id="cb108" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb108-1">common_species_yearly_duckplyr <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">collect</span>()</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>`summarise()` has grouped output by "ebdyear". You can override using the
`.groups` argument.</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 3,512 × 3
# Groups:   ebdyear [10]
   ebdyear common_name                n
     &lt;dbl&gt; &lt;chr&gt;                  &lt;dbl&gt;
 1    2015 Black-capped Chickadee 34584
 2    2015 Blue Jay               33924
 3    2015 American Robin         32854
 4    2015 American Crow          31872
 5    2015 Northern Cardinal      30726
 6    2015 American Goldfinch     28319
 7    2015 Mourning Dove          27910
 8    2015 Canada Goose           27614
 9    2015 Mallard                27196
10    2015 Red-winged Blackbird   26268
# ℹ 3,502 more rows</code></pre>
</div>
</div>
<p>Yes, it does!</p>
<p>Let’s finish with a summary of birding effort by year for the month of May in Iosco County.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb111" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb111-1">sampling_events <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb111-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ebdyear =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">year</span>(observation_date), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ebdmonth =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">month</span>(observation_date)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb111-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(ebdmonth <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>, county <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Iosco'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb111-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(ebdyear) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb111-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(</span>
<span id="cb111-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mean_distance =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(effort_distance_km),</span>
<span id="cb111-7">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mean_duration =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(duration_minutes),</span>
<span id="cb111-8">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">num_checklists =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">n</span>()</span>
<span id="cb111-9">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb111-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">collect</span>()</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>Warning: Missing values are always removed in SQL aggregation functions.
Use `na.rm = TRUE` to silence this warning
This warning is displayed once every 8 hours.</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 10 × 4
   ebdyear mean_distance mean_duration num_checklists
     &lt;dbl&gt;         &lt;dbl&gt;         &lt;dbl&gt;          &lt;dbl&gt;
 1    2015          3.60          135.            672
 2    2016          3.36          130.            952
 3    2017          2.93          133.            969
 4    2018          3.65          127.           1201
 5    2019          3.46          131.           1365
 6    2020          4.01          125.            701
 7    2021          3.52          125.           1673
 8    2022          3.23          123.           1667
 9    2023          3.28          121.           1772
10    2024          3.42          122.           1960</code></pre>
</div>
</div>
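<p>The warning is a reminder that SQL aggregates always skip missing values, which differs from R’s default for <code>mean()</code>. Saying so explicitly with <code>na.rm = TRUE</code> silences it. Here is a minimal sketch with a made-up in-memory table, <code>toy_events</code>, assuming the <code>duckdb</code>, <code>DBI</code>, <code>dplyr</code> and <code>dbplyr</code> packages are installed.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(DBI)
library(dplyr)

# Hypothetical stand-in for sampling_events, with one missing distance
con_demo &lt;- dbConnect(duckdb::duckdb())
dbWriteTable(con_demo, "toy_events",
             data.frame(ebdyear = c(2023, 2023, 2024),
                        effort_distance_km = c(2.0, NA, 4.0)))

toy_effort &lt;- tbl(con_demo, "toy_events") %&gt;%
  group_by(ebdyear) %&gt;%
  summarize(
    # na.rm = TRUE matches what the database does anyway
    mean_distance = mean(effort_distance_km, na.rm = TRUE),
    # n() counts every row, including the one with a missing distance
    num_checklists = n()
  ) %&gt;%
  arrange(ebdyear) %&gt;%
  collect()

toy_effort

dbDisconnect(con_demo, shutdown = TRUE)</code></pre></div>
</div>
<p>Note that the row count and the mean can be based on different numbers of rows when some values are missing.</p>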
<p>Now that we’ve got our Michigan data in a DuckDB database, it should be easier to do more analysis. Future blog posts are planned on this topic.</p>
</section>
<section id="some-duckdb-resources" class="level3">
<h3 class="anchored" data-anchor-id="some-duckdb-resources">Some DuckDB Resources</h3>
<p>Here are a few resources for learning more about DuckDB and <code>duckplyr</code>.</p>
<ul>
<li><a href="https://github.com/davidgasquez/awesome-duckdb?tab=readme-ov-file">Awesome DuckDB</a></li>
<li><a href="https://bwlewis.github.io/duckdb_and_r/thoughts_on_duckdb.html">DuckDB and R: Five Easy Pieces</a></li>
<li><a href="https://duckdb.org/2024/04/02/duckplyr.html">duckplyr: dplyr Powered by DuckDB</a></li>
</ul>


</section>
</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{isken2024,
  author = {Isken, Mark},
  title = {Create an {eBird} {DuckDB} Database for Use with {R}},
  date = {2024-06-24},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-isken2024" class="csl-entry quarto-appendix-citeas">
Isken, Mark. 2024. <span>“Create an eBird DuckDB Database for Use with
R.”</span> June 24, 2024.
</div></div></section></div> ]]></description>
  <category>r</category>
  <category>ebird</category>
  <category>birding</category>
  <category>database</category>
  <category>dplyr</category>
  <guid>https://bitsofanalytics.org/posts/ebird_duckdb/duckdb_create_database.html</guid>
  <pubDate>Mon, 24 Jun 2024 04:00:00 GMT</pubDate>
</item>
<item>
  <title>A vector-raster river map, three ways - Part 3: R</title>
  <dc:creator>Mark Isken</dc:creator>
  <link>https://bitsofanalytics.org/posts/river_map_r/river_map_r.html</link>
  <description><![CDATA[ 



<section id="background" class="level2">
<h2 class="anchored" data-anchor-id="background">Background</h2>
<p>In <a href="https://bitsofanalytics.org/posts/river_map_python/">part 1</a> of this series we created a river map with QGIS and in <a href="https://bitsofanalytics.org/posts/river_map_python/">part 2</a> with Python. We’ll finish it off with R. For the R-based map, I decided to create a separate post written with Quarto to avoid messing around with the <a href="https://rpy2.github.io/">rpy2 package</a> within Jupyter notebooks.</p>
<p>These finished products still need a lot of work. We are really just focusing on getting started with creating maps that combine vector and raster data and that require some combination of clipping, resampling and reprojecting. Along the way we’ll learn about some of the key R packages for working with geospatial data such as <a href="https://r-spatial.github.io/sf/index.html">sf</a>, <a href="https://rspatial.github.io/terra/">terra</a> and <a href="https://dieghernan.github.io/tidyterra/">tidyterra</a>. Most of the mapping will be done with <strong>ggplot2</strong>.</p>
<p>A few key online books I used include:</p>
<ul>
<li><a href="https://r.geocompx.org/">Geocomputation with R</a> - extremely informative online book</li>
<li><a href="https://ggplot2-book.org/maps">The maps chapter in the ggplot2 book</a> - ggplot2 has a wealth of built in mapping capabilities</li>
<li><a href="https://tmieno2.github.io/R-as-GIS-for-Economists/">R as GIS for Economists</a> - another good online book</li>
</ul>
</section>
<section id="data-sources" class="level2">
<h2 class="anchored" data-anchor-id="data-sources">Data sources</h2>
<p>The data, all freely available, is described in more detail in the first two posts in this series. It’s a combination of vector and raster data.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Set the WSFR_DATA_ROOT environment variable in $HOME/.Renviron</span></span>
<span id="cb1-2">WSFR_DATA_ROOT <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Sys.getenv</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"WSFR_DATA_ROOT"</span>))</span></code></pre></div>
</div>
<section id="sites-and-basins-vector" class="level3">
<h3 class="anchored" data-anchor-id="sites-and-basins-vector">Sites and basins (vector)</h3>
<p>The streamflow gauge sites and associated watershed basins are defined in the <code>geospatial.gpkg</code> <a href="https://mapscaping.com/reading-and-writing-geopackage-in-python/">GeoPackage</a> file available from the <a href="https://www.drivendata.org/competitions/254/reclamation-water-supply-forecast-dev/data/">WSFR data downloads page</a>. GeoPackage files can contain multiple layers of both vector and raster data.</p>
<p>Under the hood, a GeoPackage is a SQLite database that conforms to the <a href="https://www.geopackage.org/spec/">GeoPackage standard</a> developed by the <a href="https://www.ogc.org/">Open Geospatial Consortium</a>. Both <code>basins</code> and <code>sites</code> are tables in the GeoPackage (SQLite database) and can be accessed directly with a SQLite database browser or any tool for working with SQLite databases.</p>
<p>In order to open a GeoPackage in R, we can use the <a href="https://r-spatial.github.io/sf/articles/sf1.html">simple features</a> package.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(sf)</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 8.2.1; sf_use_s2() is TRUE</code></pre>
</div>
</div>
<p><a href="https://libgeos.org/">GEOS</a>, <a href="https://www.osgeo.org/projects/gdal/">GDAL</a> and <a href="https://proj.org/en/9.3/">PROJ</a> are geospatial system libraries used by the various R (and Python) geospatial packages. The fact that <code>sf_use_s2()</code> is <code>TRUE</code> means that the <a href="https://cran.r-project.org/web/packages/s2/index.html">s2 spherical geometry package</a> is used for geometry operations on geographic (longitude/latitude) coordinates. The R <strong>s2</strong> package is a set of bindings to <a href="http://s2geometry.io/">Google’s S2 library</a>.</p>
<p>To read a GeoPackage file into R, we use the <code>sf::st_read()</code> function. Many functions in <strong>sf</strong> begin with <code>st_</code>. We need to pass in the layer name since the file contains multiple layers.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1">basins_sf <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">st_read</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">file.path</span>(WSFR_DATA_ROOT, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"geospatial.gpkg"</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">layer =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"basins"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Reading layer `basins' from data source 
  `/home/mark/Documents/projects/driven_data/wsfrodeo/data/geospatial.gpkg' 
  using driver `GPKG'
Simple feature collection with 26 features and 3 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -122.3289 ymin: 35.69785 xmax: -104.7031 ymax: 51.33442
Geodetic CRS:  WGS 84</code></pre>
</div>
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1">sites_sf <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">st_read</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">file.path</span>(WSFR_DATA_ROOT, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"geospatial.gpkg"</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">layer =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sites"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Reading layer `sites' from data source 
  `/home/mark/Documents/projects/driven_data/wsfrodeo/data/geospatial.gpkg' 
  using driver `GPKG'
Simple feature collection with 26 features and 2 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -122.2974 ymin: 35.70835 xmax: -104.718 ymax: 48.73217
Geodetic CRS:  WGS 84</code></pre>
</div>
</div>
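<p>If you don’t know the layer names ahead of time, <code>sf::st_layers()</code> lists them, and because a GeoPackage is a SQLite database you can also peek at its tables directly. The sketch below writes a tiny made-up GeoPackage to a temporary file so that it is self-contained; the object names (<code>toy_sites</code>, <code>gpkg_path</code>) are hypothetical, and the second half assumes the <code>DBI</code> and <code>RSQLite</code> packages are installed.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(sf)

# Hypothetical point layer written to a temporary GeoPackage
toy_sites &lt;- st_as_sf(
  data.frame(site_id = c("site_a", "site_b"),
             lon = c(-110, -105),
             lat = c(45, 40)),
  coords = c("lon", "lat"),
  crs = 4326
)
gpkg_path &lt;- tempfile(fileext = ".gpkg")
st_write(toy_sites, gpkg_path, layer = "sites", quiet = TRUE)

# List the layers stored in the file
layer_info &lt;- st_layers(gpkg_path)
layer_info$name

# The same file viewed as a plain SQLite database
sqlite_con &lt;- DBI::dbConnect(RSQLite::SQLite(), gpkg_path)
gpkg_tables &lt;- DBI::dbListTables(sqlite_con)
gpkg_tables
DBI::dbDisconnect(sqlite_con)</code></pre></div>
</div>
<p>The SQLite view shows the layer table alongside the GeoPackage metadata tables such as <code>gpkg_contents</code>.</p>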
<p>In Python we used GeoPandas to create <code>GeoDataFrame</code> objects. An <code>sf</code> object is the R counterpart of a <code>GeoDataFrame</code>: it’s a geospatially aware <code>data.frame</code>.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1">basins_sf</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Simple feature collection with 26 features and 3 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -122.3289 ymin: 35.69785 xmax: -104.7031 ymax: 51.33442
Geodetic CRS:  WGS 84
First 10 features:
                          site_id                                name      area
1   hungry_horse_reservoir_inflow       Hungry Horse Reservoir Inflow  1681.780
2                snake_r_nr_heise              Snake River near Heise  5719.410
3         pueblo_reservoir_inflow             Pueblo Reservoir Inflow  4615.460
4          sweetwater_r_nr_alcova        Sweetwater River near Alcova  2377.280
5            missouri_r_at_toston            Missouri River at Toston 14676.200
6             animas_r_at_durango             Animas River at Durango   700.901
7              yampa_r_nr_maybell            Yampa River near Maybell  3381.680
8          libby_reservoir_inflow              Libby Reservoir Inflow  9030.450
9                boise_r_nr_boise              Boise River near Boise  2687.340
10 green_r_bl_howard_a_hanson_dam Green River below Howard Hanson Dam   221.234
                             geom
1  MULTIPOLYGON (((-113.097 47...
2  MULTIPOLYGON (((-110.792 44...
3  MULTIPOLYGON (((-105.6734 3...
4  MULTIPOLYGON (((-107.3282 4...
5  MULTIPOLYGON (((-110.63 46....
6  MULTIPOLYGON (((-107.8758 3...
7  MULTIPOLYGON (((-107.028 40...
8  MULTIPOLYGON (((-114.8582 4...
9  MULTIPOLYGON (((-115.2353 4...
10 MULTIPOLYGON (((-121.3147 4...</code></pre>
</div>
</div>
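<p>To make the “geospatially aware data frame” idea concrete, here is a minimal sketch that builds a tiny <code>sf</code> object from scratch. The three points and the <code>toy_df</code> / <code>toy_sf</code> names are made up for illustration and are not part of the forecasting dataset.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(sf)

# Hypothetical toy data: three made-up points in lon/lat (WGS 84)
toy_df &lt;- data.frame(
  site_id = c("a", "b", "c"),
  lon = c(-120.5, -111.2, -105.9),
  lat = c(47.1, 43.6, 38.3)
)

# Promote the plain data.frame to an sf object with a POINT geometry column
toy_sf &lt;- st_as_sf(toy_df, coords = c("lon", "lat"), crs = 4326)

class(toy_sf)             # includes both "sf" and "data.frame"
st_geometry(toy_sf)       # just the geometry column
st_drop_geometry(toy_sf)  # attributes only, as a plain data.frame</code></pre></div>
</div>
<p>Because an <code>sf</code> object is still a <code>data.frame</code>, the usual data frame tools work on it, with the geometry column along for the ride.</p>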
<p>Let’s plot these using <strong>ggplot2</strong> just to get a sense of them. The <code>geom_sf()</code> function makes it easy to plot simple features.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(ggplot2)</span></code></pre></div>
</div>
<div class="cell">
<div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb11-2">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_sf</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> sites_sf, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">shape =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">23</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#3399FF"</span>)</span></code></pre></div>
<div class="cell-output-display">
<p><img src="https://bitsofanalytics.org/posts/river_map_r/river_map_r_files/figure-html/unnamed-chunk-6-1.png" class="img-fluid" width="672"></p>
</div>
</div>
<p>Notice we automatically get lat/lon values on the axes.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb12-2">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_sf</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> sites_sf, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">shape =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">23</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#3399FF"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb12-3">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_sf</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> basins_sf, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#a2d572"</span>)</span></code></pre></div>
<div class="cell-output-display">
<p><img src="https://bitsofanalytics.org/posts/river_map_r/river_map_r_files/figure-html/unnamed-chunk-7-1.png" class="img-fluid" width="672"></p>
</div>
</div>
<p>If we don’t want the grey background, we can use a different theme.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb13-2">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_sf</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> sites_sf, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">shape =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">23</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#3399FF"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb13-3">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_sf</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> basins_sf, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#a2d572"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb13-4">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_classic</span>()</span></code></pre></div>
<div class="cell-output-display">
<p><img src="https://bitsofanalytics.org/posts/river_map_r/river_map_r_files/figure-html/unnamed-chunk-8-1.png" class="img-fluid" width="672"></p>
</div>
</div>
</section>
<section id="state-and-provincial-boundaries-vector" class="level3">
<h3 class="anchored" data-anchor-id="state-and-provincial-boundaries-vector">State and provincial boundaries (vector)</h3>
<p>For this we can use the <a href="https://cran.r-project.org/web/packages/rnaturalearth/vignettes/rnaturalearth.html">rnaturalearth</a> package. It provides easy access to a whole lot of free vector and raster data focusing on our earth.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(rnaturalearth)</span></code></pre></div>
</div>
<blockquote class="blockquote">
<p><strong>NOTE</strong> After installing sf and rnaturalearth, both gave errors when I tried to load them, with a message about a corrupted database. Quitting RStudio and restarting it resolved the problem. I think it’s something related to communication with the system PROJ library.</p>
</blockquote>
<p>In order to plot more detailed maps, we need to install the <a href="https://github.com/ropensci/rnaturalearth">rnaturalearthhires</a> package from GitHub. To do this I needed the <code>devtools</code> package, but when installing it the usual way I ran into errors due to missing system libraries - see https://github.com/r-lib/devtools/issues/2472. Not a big deal, but I noticed that in Hadley Wickham’s answer to this issue he suggested using <code>writeLines(pak::pkg_system_requirements("devtools", "ubuntu", "22.04"))</code> to see the exact dependencies. Hmm, that seems useful but I had not heard of the <a href="https://pak.r-lib.org/reference/get-started.html">pak</a> package before. Looks like it’s a new take on package management in R. So, I did an <code>install.packages("pak")</code> (yep, old package manager to install the new package manager) and then <code>pak::pkg_install("devtools")</code>, which led to really nice error messages telling me exactly which Ubuntu system libraries I needed to install manually.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb15-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sudo</span> apt install libfribidi-dev libgit2-dev libharfbuzz-dev</span></code></pre></div>
</div>
<p>Then I reran <code>pak::pkg_install("devtools")</code> followed by <code>pak::pkg_install("ropensci/rnaturalearthhires")</code>. Nice.</p>
<p>Let’s combine maps of states and provinces in the US and Canada into a single simple feature. The <code>st_union()</code> function makes this easy.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb16-1">states_provinces_sf <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">st_union</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ne_states</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">country =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"united states of america"</span>),</span>
<span id="cb16-2">                                <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ne_states</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">country =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"canada"</span>))</span></code></pre></div>
</div>
<div class="cell">
<div class="sourceCode cell-code" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb17-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb17-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_sf</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> states_provinces_sf[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"geometry"</span>]) </span></code></pre></div>
<div class="cell-output-display">
<p><img src="https://bitsofanalytics.org/posts/river_map_r/river_map_r_files/figure-html/unnamed-chunk-12-1.png" class="img-fluid" width="672"></p>
</div>
</div>
<p>To set the map extent for the entire presentation layer, we can use the <code>xlim=</code> and <code>ylim=</code> arguments to <code>coord_sf</code>. Make sure that this is done after adding the map layers.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb18-1">minx <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">126</span></span>
<span id="cb18-2">miny <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">35</span></span>
<span id="cb18-3">maxx <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">103</span></span>
<span id="cb18-4">maxy <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">52</span></span></code></pre></div>
</div>
<div class="cell">
<div class="sourceCode cell-code" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb19-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb19-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_sf</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> states_provinces_sf[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"geometry"</span>]) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb19-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_sf</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> sites_sf, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">shape =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">23</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#3399FF"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb19-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_sf</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> basins_sf, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#a2d572"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb19-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coord_sf</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xlim=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(minx, maxx), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ylim=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(miny, maxy)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb19-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_classic</span>()</span></code></pre></div>
<div class="cell-output-display">
<p><img src="https://bitsofanalytics.org/posts/river_map_r/river_map_r_files/figure-html/unnamed-chunk-14-1.png" class="img-fluid" width="672"></p>
</div>
</div>
<p>Now let’s change the CRS for the map to Albers Equal Area using the ESRI:102008 code. We can use <code>coord_sf</code> to set the CRS. Under the hood, I imagine <code>coord_sf</code> is using the PROJ library that is installed on your system and that was linked to when we loaded the <strong>sf</strong> library. I was hoping I could first set the map extent using EPSG:4326 limits and then change the CRS. But this doesn’t work: as the message in the output says, the second <code>coord_sf</code> replaces the first, so the limits are lost. Later we’ll convert all of the simple features to ESRI:102008 and do some other manipulations to plot just the area we are interested in.</p>
<p><strong>This is not the right approach.</strong></p>
<div class="cell">
<div class="sourceCode cell-code" id="cb20" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb20-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb20-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_sf</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> states_provinces_sf[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"geometry"</span>]) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb20-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_sf</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> sites_sf, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">shape =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">23</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#3399FF"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb20-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_sf</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> basins_sf, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#a2d572"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb20-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coord_sf</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xlim=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(minx, maxx), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ylim=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(miny, maxy)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb20-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coord_sf</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">crs =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ESRI:102008'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb20-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_classic</span>()</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>Coordinate system already present. Adding new coordinate system, which will
replace the existing one.</code></pre>
</div>
<div class="cell-output-display">
<p><img src="https://bitsofanalytics.org/posts/river_map_r/river_map_r_files/figure-html/unnamed-chunk-15-1.png" class="img-fluid" width="864"></p>
</div>
</div>
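<p>One possible way around this, offered as a sketch and not as the approach used later in this post, is to put the limits and the CRS in a single <code>coord_sf()</code> call and use its <code>default_crs</code> argument so that <code>xlim</code> and <code>ylim</code> are interpreted as lon/lat. This assumes ggplot2 3.3.3 or later, where <code>default_crs</code> was introduced. The three points in <code>pts_sf</code> are hypothetical stand-ins for the real layers.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(sf)
library(ggplot2)

# Hypothetical stand-in for the site layer: three made-up lon/lat points
pts_sf &lt;- st_as_sf(
  data.frame(name = c("a", "b", "c"),
             lon = c(-120, -112, -106),
             lat = c(47, 43, 38)),
  coords = c("lon", "lat"), crs = 4326
)

p &lt;- ggplot() +
  geom_sf(data = pts_sf, size = 2, shape = 23, fill = "#3399FF") +
  # A single coord_sf(): draw in Albers Equal Area, but read the
  # limits as WGS 84 longitude/latitude via default_crs
  coord_sf(xlim = c(-126, -103), ylim = c(35, 52),
           crs = "ESRI:102008",
           default_crs = st_crs(4326)) +
  theme_classic()

p</code></pre></div>
</div>
<p>Since the lon/lat box is not a rectangle once projected, the visible extent is only an approximation of the requested limits.</p>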
</section>
<section id="river-layer-vector" class="level3">
<h3 class="anchored" data-anchor-id="river-layer-vector">River layer (vector)</h3>
<p>For this layer we can use <a href="https://www.hydrosheds.org/products/hydrorivers">HydroRIVERS</a> data, a part of the <a href="https://www.hydrosheds.org/">HydroSHEDS</a> project. Both geodatabase and shapefile formats are downloadable from <a href="https://www.hydrosheds.org/products/hydrorivers" class="uri">https://www.hydrosheds.org/products/hydrorivers</a>. There are different files for different regions of the world. The North and Central America data as a zipped geodatabase is available from <a href="https://data.hydrosheds.org/file/HydroRIVERS/HydroRIVERS_v10_na.gdb.zip">here</a> and is ~72 MB in size (compressed). Unzip it after downloading. It’s a lot of rivers.</p>
<p>A few tutorials I found on mapping river basins in R include:</p>
<ul>
<li><a href="https://github.com/milos-agathon/mapping-river-basins-with-r">Mapping Rivers Basins with R</a> with <a href="https://www.youtube.com/watch?v=HugGwjogPv0">video</a></li>
<li><a href="https://waterprogramming.wordpress.com/2021/10/18/tips-for-creating-watershed-maps-in-r/">Tips for Creating Watershed Maps in R</a></li>
<li><a href="https://waterprogramming.wordpress.com/2019/04/05/geospatial-mapping-in-r/">Geospatial Mapping in R</a></li>
</ul>
<div class="cell">
<div class="sourceCode cell-code" id="cb22" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb22-1">rivers_sf <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">st_read</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">file.path</span>(WSFR_DATA_ROOT, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"hydrosheds/HydroRIVERS_v10_na.gdb"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"HydroRIVERS_v10_na.gdb"</span>))</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Reading layer `HydroRIVERS_v10_na' from data source 
  `/home/mark/Documents/projects/driven_data/wsfrodeo/data/hydrosheds/HydroRIVERS_v10_na.gdb/HydroRIVERS_v10_na.gdb' 
  using driver `OpenFileGDB'
Simple feature collection with 986463 features and 15 fields
Geometry type: MULTILINESTRING
Dimension:     XY
Bounding box:  xmin: -137.9354 ymin: 5.510417 xmax: -52.66458 ymax: 62.67292
Geodetic CRS:  WGS 84</code></pre>
</div>
</div>
<p>Now let’s filter by flow order and clip to a bounding box. The <strong>sf</strong> package includes the <code>st_bbox()</code> function for creating bounding boxes in a specific CRS. It also includes the <code>st_crop()</code> function for cropping simple features to a bounding box.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb24" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb24-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(dplyr, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">warn.conflicts =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>)</span>
<span id="cb24-2"></span>
<span id="cb24-3">rivers_sf <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> rivers_sf <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb24-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(ORD_FLOW <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>)</span>
<span id="cb24-5"></span>
<span id="cb24-6">bbox <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">st_bbox</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xmin =</span> minx, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xmax =</span> maxx, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ymin =</span> miny, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ymax =</span> maxy),</span>
<span id="cb24-7">                <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">crs =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">st_crs</span>(basins_sf))</span>
<span id="cb24-8"></span>
<span id="cb24-9">rivers_sf <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">st_crop</span>(rivers_sf, bbox)</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>Warning: attribute variables are assumed to be spatially constant throughout
all geometries</code></pre>
</div>
</div>
<p>It seems that the placement of the <code>coord_sf</code> command is important. If I put it before the <code>geom_sf</code> layers, the extent of those layers seems to override the <code>xlim</code> and <code>ylim</code> values set in the <code>coord_sf</code> statement. It also seems important to express the limits so that they are consistent with the CRS used for the map.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb26" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb26-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb26-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_sf</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> states_provinces_sf) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb26-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_sf</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> sites_sf, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">shape =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">23</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#3399FF"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb26-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_sf</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> basins_sf, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#a2d572"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb26-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_sf</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> rivers_sf, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#487bb6"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb26-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coord_sf</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xlim=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(minx, maxx), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ylim=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(miny, maxy)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb26-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_classic</span>()</span></code></pre></div>
<div class="cell-output-display">
<p><img src="https://bitsofanalytics.org/posts/river_map_r/river_map_r_files/figure-html/unnamed-chunk-18-1.png" class="img-fluid" width="864"></p>
</div>
</div>
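<p>The point about keeping the limits consistent with the map’s CRS can be sketched as follows: take the lon/lat bounding box, turn it into a polygon, reproject it, and read off the limits in projected units. This is a rough illustration that assumes the same corner values used above; the names <code>bbox_ll</code> and <code>bbox_albers</code> are made up for this example.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(sf)

# Bounding box in lon/lat (WGS 84), same corner values as minx, maxx, miny, maxy
bbox_ll &lt;- st_bbox(c(xmin = -126, xmax = -103, ymin = 35, ymax = 52),
                   crs = st_crs(4326))

# Convert the box to a polygon, reproject it, and take the new bounding box
bbox_poly &lt;- st_as_sfc(bbox_ll)
bbox_albers &lt;- st_bbox(st_transform(bbox_poly, "ESRI:102008"))

# Limits are now in the projected CRS units rather than degrees
bbox_albers</code></pre></div>
</div>
<p>Only the four corners are reprojected here, so the result is approximate; densifying the polygon edges before transforming (for example with <code>st_segmentize()</code>) would give a tighter fit.</p>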
</section>
<section id="digital-elevation-model-or-dem-raster" class="level3">
<h3 class="anchored" data-anchor-id="digital-elevation-model-or-dem-raster">Digital Elevation Model or DEM (raster)</h3>
<p>For working with raster data in R, the <strong>terra</strong> package is replacing the <strong>raster</strong> package. But, there’s also the <strong>stars</strong> package (written by the authors of <strong>sf</strong>). We will use <strong>terra</strong> in this post. For information on transitioning from <strong>raster</strong> to <strong>terra</strong> or <strong>stars</strong>, see <a href="https://psfaculty.plantsciences.ucdavis.edu/plant/AdditionalTopics_Transition.pdf">this pdf</a>.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb27" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb27-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(terra)</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>terra 1.7.65</code></pre>
</div>
</div>
<p>Documentation for <a href="https://cran.r-project.org/web/packages/terra/terra.pdf">terra can be found here</a>.</p>
<p>We can read in the HydroSHEDS DEM using the <strong>terra</strong> function <code>rast()</code>.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb29" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb29-1">dem_file <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">file.path</span>(WSFR_DATA_ROOT, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'hydrosheds/na_con_3s/na_con_3s.tif'</span>)</span>
<span id="cb29-2">dem_sr <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> terra<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rast</span>(dem_file)</span></code></pre></div>
</div>
<p>The <strong>terra</strong> package has a bunch of functions for working with <code>SpatRaster</code>, <code>SpatVector</code> and <code>SpatExtent</code> objects.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb30" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb30-1">terra<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">nrow</span>(dem_sr) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Number of rows</span></span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 72000</code></pre>
</div>
<div class="sourceCode cell-code" id="cb32" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb32-1">terra<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ncol</span>(dem_sr) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Number of columns</span></span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 120000</code></pre>
</div>
<div class="sourceCode cell-code" id="cb34" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb34-1">terra<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">res</span>(dem_sr) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># resolution</span></span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 0.0008333333 0.0008333333</code></pre>
</div>
</div>
<p>The units of resolution are degrees. Each cell of the DEM is 3 arc seconds (3/3600 = 0.000833 degrees) on a side, which is roughly 90 meters at the equator.</p>
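<p>As a quick check on that arithmetic, here is a minimal base R sketch. It assumes the WGS 84 semi-major axis of 6378137 metres and treats the equator as a circle of that radius. The variable names are just for illustration, and the latitudes are picked to span roughly the area mapped in this post.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># 3 arc seconds expressed in degrees
res_deg &lt;- 3 / 3600

# Metres per degree of longitude at the equator (assumed WGS 84 semi-major axis)
m_per_deg &lt;- 2 * pi * 6378137 / 360

# Cell width at the equator, about 92.8 metres
cell_m_equator &lt;- res_deg * m_per_deg
cell_m_equator

# East-west cell width shrinks with the cosine of latitude
lat &lt;- c(30, 45, 57)
cell_m_ew &lt;- cell_m_equator * cos(lat * pi / 180)
round(cell_m_ew, 1)</code></pre></div>
</div>
<p>The north-south height of a cell stays close to the equatorial value, so away from the equator the cells are taller than they are wide on the ground.</p>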
<div class="cell">
<div class="sourceCode cell-code" id="cb36" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb36-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">writeLines</span>(terra<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">crs</span>(dem_sr))</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4326]]</code></pre>
</div>
</div>
<p>What about the total number of cells and the number of layers?</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb38" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb38-1">terra<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ncell</span>(dem_sr)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 8.64e+09</code></pre>
</div>
<div class="sourceCode cell-code" id="cb40" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb40-1">terra<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">nlyr</span>(dem_sr)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 1</code></pre>
</div>
</div>
<p>The <code>ext</code> function creates a <code>SpatExtent</code> object from a vector specifying xmin, xmax, ymin, and ymax. Instead of using the extent used above, we are going to add on a buffer of 5 degrees in each direction (except west). Hopefully this will help avoid the problem of reprojecting and ending up with a raster that doesn’t cover the entire area of interest.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb42" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb42-1">buffer <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span></span>
<span id="cb42-2">map_extent <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> terra<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ext</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(bbox<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>xmin, bbox<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>xmax, bbox<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>ymin, bbox<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>ymax))</span>
<span id="cb42-3">raster_extent <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> terra<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ext</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(bbox<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>xmin, bbox<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>xmax <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> buffer, bbox<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>ymin <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> buffer, bbox<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>ymax <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> buffer))</span></code></pre></div>
</div>
<p>Now we can use the <code>raster_extent</code> to crop our DEM.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb43" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb43-1">dem_sr <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> terra<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">crop</span>(dem_sr, raster_extent)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>
|---------|---------|---------|---------|
=========================================
                                          </code></pre>
</div>
<div class="sourceCode cell-code" id="cb45" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb45-1">dem_sr</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>class       : SpatRaster 
dimensions  : 32400, 33600, 1  (nrow, ncol, nlyr)
resolution  : 0.0008333333, 0.0008333333  (x, y)
extent      : -126, -98, 30, 57  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326) 
source      : spat_nthHTgVe8dNXZZB_10907.tif 
varname     : na_con_3s 
name        : Band_1 
min value   :  -1000 
max value   :   4363 </code></pre>
</div>
</div>
<div class="cell">
<div class="sourceCode cell-code" id="cb47" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb47-1">terra<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">nrow</span>(dem_sr) </span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 32400</code></pre>
</div>
<div class="sourceCode cell-code" id="cb49" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb49-1">terra<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ncol</span>(dem_sr) </span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 33600</code></pre>
</div>
<div class="sourceCode cell-code" id="cb51" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb51-1">terra<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ncell</span>(dem_sr) </span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 1088640000</code></pre>
</div>
</div>
</section>
</section>
<section id="adding-the-dem-raster-to-our-plot" class="level2">
<h2 class="anchored" data-anchor-id="adding-the-dem-raster-to-our-plot">Adding the DEM raster to our plot</h2>
<p>The <strong>ggplot2</strong> package directly supports plotting vector data stored in <strong>sf</strong> objects through its <code>geom_sf()</code> and <code>coord_sf()</code> functions. The <a href="https://paleolimbot.github.io/ggspatial/articles/ggspatial.html">ggspatial library</a> fills in some gaps to make plotting with <strong>ggplot2</strong> easier and more powerful. We’ll also make use of <a href="https://dieghernan.github.io/tidyterra/">tidyterra</a>, a library that extends <strong>terra</strong> with <strong>tidyverse</strong> functions.</p>
<section id="tidyterra" class="level3">
<h3 class="anchored" data-anchor-id="tidyterra">tidyterra</h3>
<div class="cell">
<div class="sourceCode cell-code" id="cb53" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb53-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyterra, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">warn.conflicts =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>)</span></code></pre></div>
</div>
<p>The two main stated goals of <strong>tidyterra</strong> are to:</p>
<ul>
<li>add common <strong>tidyverse</strong> methods to <code>SpatVector</code> and <code>SpatRaster</code> objects (<strong>terra</strong>),</li>
<li>provide geoms to plot <code>SpatVector</code> and <code>SpatRaster</code> objects with <strong>ggplot2</strong>.</li>
</ul>
<p>There is an <a href="https://dieghernan.github.io/tidyterra/articles/tidyterra.html">article on <strong>tidyterra</strong></a> in the Journal of Open Source Software.</p>
<p>According to the docs for <code>geom_spatraster()</code>, it’s based on the <code>layer_spatial()</code> implementation in <strong>ggspatial</strong>. And, <code>geom_spatvector()</code> is a wrapper for <code>ggplot2::geom_sf()</code>.</p>
<p>When you plot a <code>SpatRaster</code> with <strong>ggplot2</strong> using <code>geom_spatraster()</code>, resampling is done automatically based on the number of cells in the raster.</p>
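<p>To get a feel for what that resampling amounts to, here is a rough sketch using a small synthetic raster (hypothetical data, not the DEM) and <code>terra::spatSample()</code>. That <strong>tidyterra</strong> does something along these lines internally is my assumption. The target of 5000 cells is arbitrary.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(terra)

# Synthetic 200 x 300 raster standing in for the DEM (hypothetical data)
r &lt;- terra::rast(nrows = 200, ncols = 300, xmin = -126, xmax = -98,
                 ymin = 30, ymax = 57, crs = "EPSG:4326")
terra::values(r) &lt;- runif(terra::ncell(r))

# Regular sampling down to roughly 5000 cells, returned as a SpatRaster
r_small &lt;- terra::spatSample(r, size = 5000, method = "regular",
                             as.raster = TRUE)

terra::ncell(r)        # 60000 cells in the original
terra::ncell(r_small)  # roughly 5000 cells after resampling</code></pre></div>
</div>
<p>In a plot, the <code>maxcell</code> argument of <code>geom_spatraster()</code> sets the target number of cells, so lowering it should trade detail for speed.</p>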
<p>This <a href="https://dieghernan.github.io/202210_tidyterra-hillshade/">post by the author of tidyterra</a> has a nice discussion of issues surrounding <a href="https://en.wikipedia.org/wiki/Hypsometric_tints">choice of color gradients for elevation plots</a>. For example, green connotes forest, so for the west we don’t want a linear scale with green for low values because too much of the map will be green. A non-linear scale is more appropriate. The <strong>tidyterra</strong> package includes <a href="https://dieghernan.github.io/tidyterra/reference/scale_hypso.html">a number of such color gradients</a>. The gradients are from <a href="http://soliton.vm.bytemark.co.uk/pub/cpt-city/">cpt-city</a>, which is a huge collection of color gradient scales. Another good post on hillshade effects is <a href="https://dominicroye.github.io/en/2022/hillshade-effects/">this one by Dominic Roye</a>.</p>
<p>From the <a href="https://dieghernan.github.io/tidyterra/articles/palettes.html">palettes page</a>, I chose the “etopo1_hypso” palette, part of the <code>hypso.colors2</code> collection of palettes.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb54" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb54-1">grad_hypso <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hypso.colors2</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"etopo1_hypso"</span>)</span></code></pre></div>
</div>
<p>Now we are ready to add our DEM to the plot using <code>geom_spatraster</code>.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb55" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb55-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb55-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_spatraster</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> dem_sr) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_fill_gradientn</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">colours =</span> grad_hypso, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.value =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NA</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb55-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_sf</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> states_provinces_sf[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"geometry"</span>], <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill=</span><span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NA</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb55-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_sf</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> sites_sf, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">shape =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">23</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"red"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb55-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_sf</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> basins_sf, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"green"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.20</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb55-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_sf</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> rivers_sf, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#487bb6"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb55-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coord_sf</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">crs =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'EPSG:4326'</span>, </span>
<span id="cb55-8">           <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xlim =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(map_extent<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>xmin, map_extent<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>xmax), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ylim =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(map_extent<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>ymin, map_extent<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>ymax)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb55-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_classic</span>()</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>&lt;SpatRaster&gt; resampled to 501095 cells for plotting</code></pre>
</div>
<div class="cell-output-display">
<p><img src="https://bitsofanalytics.org/posts/river_map_r/river_map_r_files/figure-html/unnamed-chunk-29-1.png" class="img-fluid" width="864"></p>
</div>
</div>
</section>
</section>
<section id="reprojection" class="level2">
<h2 class="anchored" data-anchor-id="reprojection">Reprojection</h2>
<p>Our map is of a pretty big chunk of North America. Plotting directly as lat-lon values in the X-Y plane is probably not great in terms of distorting things like area and distance. Earlier we reprojected the state and province layer using Albers Equal Area (ESRI:102008). Let’s do that for all of the layers.</p>
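<p>Here are some good sources of background on coordinate reference systems and reprojection. The chapter from Geocomputation with R (GCwR) is very good and has many links to relevant background sources:</p>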
<ul>
<li>Chapter 7 from the <a href="https://r.geocompx.org/reproj-geo-data">Geocomputation with R</a> book,</li>
<li><a href="https://inbo.github.io/tutorials/tutorials/spatial_crs_coding/">How to specify a coordinate reference system in R</a></li>
<li><a href="https://rsbivand.github.io/ECS530_h19/ECS530_III.html#coordinate_reference_systems:_background">ECS530: (III) Coordinate reference systems</a> - detailed history of PROJ, GDAL and their use in R and other systems.</li>
</ul>
<p>Currently all of our data is in a geographic (non-projected) CRS - EPSG:4326. This CRS uses the WGS 84 datum as its ellipsoid model of the earth. The ESRI:102008 projection uses NAD83 as its datum. Good thing we’ve got tools to make it easy to convert between two such CRSs.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb57" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb57-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">st_crs</span>(rivers_sf)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Coordinate Reference System:
  User input: WGS 84 
  wkt:
GEOGCRS["WGS 84",
    ENSEMBLE["World Geodetic System 1984 ensemble",
        MEMBER["World Geodetic System 1984 (Transit)"],
        MEMBER["World Geodetic System 1984 (G730)"],
        MEMBER["World Geodetic System 1984 (G873)"],
        MEMBER["World Geodetic System 1984 (G1150)"],
        MEMBER["World Geodetic System 1984 (G1674)"],
        MEMBER["World Geodetic System 1984 (G1762)"],
        MEMBER["World Geodetic System 1984 (G2139)"],
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]],
        ENSEMBLEACCURACY[2.0]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    USAGE[
        SCOPE["Horizontal component of 3D system."],
        AREA["World."],
        BBOX[-90,-180,90,180]],
    ID["EPSG",4326]]</code></pre>
</div>
</div>
<p>The object returned by <code>st_crs()</code> has a bunch of named items, including:</p>
<ul>
<li><code>st_crs(new_vector)$IsGeographic</code> to check whether the CRS is geographic or not</li>
<li><code>st_crs(new_vector)$units_gdal</code> to find out the CRS units</li>
<li><code>st_crs(new_vector)$srid</code> to extract its ‘SRID’ identifier (when available)</li>
<li><code>st_crs(new_vector)$proj4string</code> to extract the proj-string representation</li>
</ul>
<div class="cell">
<div class="sourceCode cell-code" id="cb59" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb59-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">st_crs</span>(basins_sf)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>IsGeographic</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] TRUE</code></pre>
</div>
<div class="sourceCode cell-code" id="cb61" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb61-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">st_crs</span>(basins_sf)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>units_gdal</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] "degree"</code></pre>
</div>
<div class="sourceCode cell-code" id="cb63" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb63-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">st_crs</span>(basins_sf)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>srid</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] "EPSG:4326"</code></pre>
</div>
<div class="sourceCode cell-code" id="cb65" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb65-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">st_crs</span>(basins_sf)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>proj4string</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] "+proj=longlat +datum=WGS84 +no_defs"</code></pre>
</div>
</div>
<div class="cell">
<div class="sourceCode cell-code" id="cb67" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb67-1">basins_102008_sf <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">st_transform</span>(basins_sf, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ESRI:102008"</span>)</span>
<span id="cb67-2">sites_102008_sf <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">st_transform</span>(sites_sf, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ESRI:102008"</span>)</span>
<span id="cb67-3">rivers_102008_sf <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">st_transform</span>(rivers_sf, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ESRI:102008"</span>)</span>
<span id="cb67-4">states_provinces_102008_sf <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">st_transform</span>(states_provinces_sf, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ESRI:102008"</span>)</span></code></pre></div>
</div>
<p>Notice how much more information is contained in the WKT (well-known text) associated with ESRI:102008.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb68" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb68-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">st_crs</span>(basins_102008_sf)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Coordinate Reference System:
  User input: ESRI:102008 
  wkt:
PROJCRS["North_America_Albers_Equal_Area_Conic",
    BASEGEOGCRS["NAD83",
        DATUM["North American Datum 1983",
            ELLIPSOID["GRS 1980",6378137,298.257222101,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["Degree",0.0174532925199433]]],
    CONVERSION["North_America_Albers_Equal_Area_Conic",
        METHOD["Albers Equal Area",
            ID["EPSG",9822]],
        PARAMETER["Latitude of false origin",40,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8821]],
        PARAMETER["Longitude of false origin",-96,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8822]],
        PARAMETER["Latitude of 1st standard parallel",20,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8823]],
        PARAMETER["Latitude of 2nd standard parallel",60,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8824]],
        PARAMETER["Easting at false origin",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8826]],
        PARAMETER["Northing at false origin",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8827]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Not known."],
        AREA["North America - onshore and offshore: Canada - Alberta; British Columbia; Manitoba; New Brunswick; Newfoundland and Labrador; Northwest Territories; Nova Scotia; Nunavut; Ontario; Prince Edward Island; Quebec; Saskatchewan; Yukon. United States (USA) - Alabama; Alaska (mainland); Arizona; Arkansas; California; Colorado; Connecticut; Delaware; Florida; Georgia; Idaho; Illinois; Indiana; Iowa; Kansas; Kentucky; Louisiana; Maine; Maryland; Massachusetts; Michigan; Minnesota; Mississippi; Missouri; Montana; Nebraska; Nevada; New Hampshire; New Jersey; New Mexico; New York; North Carolina; North Dakota; Ohio; Oklahoma; Oregon; Pennsylvania; Rhode Island; South Carolina; South Dakota; Tennessee; Texas; Utah; Vermont; Virginia; Washington; West Virginia; Wisconsin; Wyoming."],
        BBOX[23.81,-172.54,86.46,-47.74]],
    ID["ESRI",102008]]</code></pre>
</div>
</div>
<div class="cell">
<div class="sourceCode cell-code" id="cb70" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb70-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">st_crs</span>(basins_102008_sf)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>IsGeographic</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] FALSE</code></pre>
</div>
<div class="sourceCode cell-code" id="cb72" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb72-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">st_crs</span>(basins_102008_sf)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>units_gdal</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] "metre"</code></pre>
</div>
<div class="sourceCode cell-code" id="cb74" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb74-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">st_crs</span>(basins_102008_sf)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>srid</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] "ESRI:102008"</code></pre>
</div>
<div class="sourceCode cell-code" id="cb76" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb76-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">st_crs</span>(basins_102008_sf)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>proj4string</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] "+proj=aea +lat_0=40 +lon_0=-96 +lat_1=20 +lat_2=60 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs"</code></pre>
</div>
</div>
<p>To reproject the raster DEM, we can use <code>terra::project()</code>. However, when I tried that, it took forever and R died when I tried to kill the command. For reprojecting large rasters, using <a href="https://gdal.org/programs/gdalwarp.html">gdalwarp</a> is often a good way to go. According to the documentation for the R package <a href="https://cran.r-project.org/web/packages/gdalUtilities/index.html">gdalUtilities</a>, the <strong>sf</strong> package contains the GDAL executables along with R wrappers for them. The <strong>gdalUtilities</strong> package provides different wrappers whose arguments more closely match those of the GDAL executables.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb78" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb78-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># takes forever</span></span>
<span id="cb78-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#dem_102008_sr = terra::project(dem_sr, 'ESRI:102008')</span></span></code></pre></div>
</div>
<div class="cell">
<div class="sourceCode cell-code" id="cb79" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb79-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(gdalUtilities)</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>
Attaching package: 'gdalUtilities'</code></pre>
</div>
<div class="cell-output cell-output-stderr">
<pre><code>The following object is masked from 'package:sf':

    gdal_rasterize</code></pre>
</div>
</div>
<p>Since we had issues in the <a href="https://bitsofanalytics.org/posts/river_map_python/">Python river map post</a> related to the raster being clipped in unexpected ways, let’s take an alternative approach. Let’s try this:</p>
<ul>
<li>use <strong>gdalUtilities</strong> to resample and reproject the clipped DEM raster to ESRI:102008 and write it to a file,</li>
<li>read the new file in using <strong>terra</strong>,</li>
<li>(maybe) clip the new raster by matching the extent of one of the existing vector layers.</li>
</ul>
<p>The argument names used match those used by the <code>gdalwarp</code> command line tool.</p>
<ul>
<li><code>r = 'bilinear'</code> - the resampling algorithm (nearest neighbor is the default),</li>
<li><code>tr = c(1000, 1000)</code> - width and height of each pixel in units of the target SRS (spatial reference system),</li>
<li><code>te = ...</code> - xmin, ymin, xmax, ymax of the target extent,</li>
<li><code>te_srs = 'EPSG:4326'</code> - the SRS in which the <code>te</code> values are expressed (this avoids having to know the desired extent in the target SRS),</li>
<li><code>t_srs = 'ESRI:102008'</code> - the target SRS.</li>
</ul>
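<p>One detail that is easy to trip over is the order of the <code>te</code> values. <code>gdalwarp</code> wants xmin, ymin, xmax, ymax, whereas <strong>terra</strong> prints extents in xmin, xmax, ymin, ymax order. Here is a minimal base R sketch of reordering by name. The extent values and the variable names are made up for illustration and are not the ones used in this post.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hypothetical extent in degrees (EPSG:4326), stored in terra's print order.
# These numbers are illustrative only.
ext_terra_order &lt;- c(xmin = -125, xmax = -100, ymin = 32, ymax = 50)

# Reorder by name into the order gdalwarp's te argument expects:
# xmin, ymin, xmax, ymax
te &lt;- unname(ext_terra_order[c("xmin", "ymin", "xmax", "ymax")])
te</code></pre></div>
</div>
<p>Indexing by name rather than by position means the result is the same no matter which order the source object happens to use.</p>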
<div class="cell">
<div class="sourceCode cell-code" id="cb82" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb82-1">dem_file <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">file.path</span>(WSFR_DATA_ROOT, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'hydrosheds/na_con_3s/na_con_3s.tif'</span>)</span>
<span id="cb82-2">dem_102008_file <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">file.path</span>(WSFR_DATA_ROOT, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'hydrosheds/na_con_3s/na_con_3s_102008_r.tif'</span>)</span>
<span id="cb82-3"></span>
<span id="cb82-4">gdalUtilities<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">gdalwarp</span>(dem_file, dem_102008_file, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">r =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'bilinear'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">tr =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span>),</span>
<span id="cb82-5">                        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">te =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(raster_extent<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>xmin, raster_extent<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>ymin, raster_extent<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>xmax, raster_extent<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>ymax),</span>
<span id="cb82-6">                        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">te_srs =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'EPSG:4326'</span>,</span>
<span id="cb82-7">                        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">t_srs =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ESRI:102008'</span>,</span>
<span id="cb82-8">                        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">overwrite =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span></code></pre></div>
</div>
<p>Let’s explore the new raster.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb83" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb83-1">dem_102008_file <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">file.path</span>(WSFR_DATA_ROOT, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'hydrosheds/na_con_3s/na_con_3s_102008_r.tif'</span>)</span>
<span id="cb83-2">dem_102008_sr <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> terra<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rast</span>(dem_102008_file)</span>
<span id="cb83-3"></span>
<span id="cb83-4">terra<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">nrow</span>(dem_102008_sr) </span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 2725</code></pre>
</div>
<div class="sourceCode cell-code" id="cb85" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb85-1">terra<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ncol</span>(dem_102008_sr) </span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 2616</code></pre>
</div>
<div class="sourceCode cell-code" id="cb87" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb87-1">terra<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ncell</span>(dem_102008_sr) </span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 7128600</code></pre>
</div>
</div>
<p>Our DEM is much smaller now. We can use this new raster to extract a new map extent in meters for ESRI:102008.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb89" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb89-1">map_extent_102008 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ext</span>(dem_102008_sr)</span>
<span id="cb89-2">map_extent_102008</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>SpatExtent : -2735048.78302539, -119048.783025391, -731955.995255143, 1993044.00474486 (xmin, xmax, ymin, ymax)</code></pre>
</div>
</div>
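<p>As a quick sanity check, the extent printed above together with the 1000 meter pixel size should reproduce the row, column and cell counts reported earlier. Here is a small base R sketch, with variable names chosen just for illustration.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Values copied from the SpatExtent output above (meters, ESRI:102008)
xmin &lt;- -2735048.78302539
xmax &lt;- -119048.783025391
ymin &lt;- -731955.995255143
ymax &lt;- 1993044.00474486
res_m &lt;- 1000  # matches tr = c(1000, 1000) passed to gdalwarp

# Extent width and height divided by the pixel size give the grid dimensions
n_col &lt;- round((xmax - xmin) / res_m)
n_row &lt;- round((ymax - ymin) / res_m)
c(rows = n_row, cols = n_col, cells = n_row * n_col)</code></pre></div>
</div>
<p>This gives 2725 rows, 2616 columns and 7128600 cells, matching the <code>terra::nrow()</code>, <code>terra::ncol()</code> and <code>terra::ncell()</code> results. In other words, the reprojected DEM covers an area roughly 2616 km wide by 2725 km tall.</p>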
<p>Now use this <code>SpatExtent</code> to clip the state/province boundaries layer to the same size.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb91" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb91-1">states_provinces_102008_sf <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">st_crop</span>(states_provinces_102008_sf, map_extent_102008)</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>Warning: attribute variables are assumed to be spatially constant throughout
all geometries</code></pre>
</div>
</div>
<p>Okay, we’re ready to plot.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb93" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb93-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb93-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_spatraster</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> dem_102008_sr) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb93-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_fill_gradientn</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">colours =</span> grad_hypso, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.value =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NA</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb93-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_sf</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> states_provinces_102008_sf[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"geometry"</span>], <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill=</span><span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NA</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb93-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_sf</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> basins_102008_sf, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">colour=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Basins"</span>),  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.20</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"green"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">show.legend =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"polygon"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb93-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_sf</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> sites_102008_sf, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">colour=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Sites"</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"red"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">shape =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">23</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">show.legend =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"point"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb93-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_manual</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'green'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'red'</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb93-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_sf</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> rivers_102008_sf, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#487bb6"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb93-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coord_sf</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">crs =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ESRI:102008'</span>, </span>
<span id="cb93-10">           <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xlim =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(map_extent_102008<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>xmin, map_extent_102008<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>xmax), </span>
<span id="cb93-11">           <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ylim =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(map_extent_102008<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>ymin, map_extent_102008<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>ymax)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb93-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_classic</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb93-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Water Supply Rodeo Forecast Sites"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>&lt;SpatRaster&gt; resampled to 500346 cells for plotting</code></pre>
</div>
<div class="cell-output-display">
<p><img src="https://bitsofanalytics.org/posts/river_map_r/river_map_r_files/figure-html/unnamed-chunk-41-1.png" class="img-fluid" width="864"></p>
</div>
</div>
<p>A few things to note:</p>
<ul>
<li>the axis labels are in longitude and latitude as one would hope,</li>
<li>our raster has a colorbar legend entry,</li>
<li>the legend for the sites and basins is not great. Not sure why both legend entries have red diamonds. Hmmm.</li>
</ul>
<p>Seems there’s some trickiness with map legends in <strong>ggplot2</strong>. We’ll punt on this for now but leave some relevant links for future exploration.</p>
<ul>
<li><a href="https://stackoverflow.com/questions/66770005/why-doesnt-my-legend-appear-when-drawing-geom-sf-with-ggplot">Why doesn’t my legend appear when drawing geom_sf with ggplot?</a></li>
<li><a href="https://stackoverflow.com/questions/59249385/draw-legend-with-geom-sf-when-no-aesthetic-is-specified">Draw legend with geom_sf when no aesthetic is specified</a></li>
<li><a href="https://stackoverflow.com/questions/51371480/format-multiple-geom-sf-legends">Format multiple geom_sf legends</a></li>
<li><a href="https://aosmith.rbind.io/2020/07/09/ggplot2-override-aes/">Controlling legend appearance in ggplot2 with override.aes</a></li>
</ul>
<section id="scale" class="level3">
<h3 class="anchored" data-anchor-id="scale">Scale</h3>
<p>Let’s end this post by adding a distance scale using the <strong>ggspatial</strong> package.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb95" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb95-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(ggspatial)</span></code></pre></div>
</div>
<div class="cell">
<div class="sourceCode cell-code" id="cb96" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb96-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb96-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_spatraster</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> dem_102008_sr) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb96-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_fill_gradientn</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">colours =</span> grad_hypso, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.value =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NA</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb96-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_sf</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> states_provinces_102008_sf[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"geometry"</span>], <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill=</span><span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NA</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb96-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_sf</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> basins_102008_sf, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">colour=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Basins"</span>),  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.20</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"green"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">show.legend =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"polygon"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb96-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_sf</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> sites_102008_sf, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">colour=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Sites"</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"red"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">shape =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">23</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">show.legend =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"point"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb96-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_manual</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'green'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'red'</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb96-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_sf</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> rivers_102008_sf, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#487bb6"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb96-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coord_sf</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">crs =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ESRI:102008'</span>, </span>
<span id="cb96-10">           <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xlim =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(map_extent_102008<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>xmin, map_extent_102008<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>xmax), </span>
<span id="cb96-11">           <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ylim =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(map_extent_102008<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>ymin, map_extent_102008<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>ymax)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb96-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_classic</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb96-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Water Supply Rodeo Forecast Sites"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb96-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">annotation_scale</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">location =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"tl"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>&lt;SpatRaster&gt; resampled to 500346 cells for plotting</code></pre>
</div>
<div class="cell-output-display">
<p><img src="https://bitsofanalytics.org/posts/river_map_r/river_map_r_files/figure-html/unnamed-chunk-43-1.png" class="img-fluid" width="864"></p>
</div>
</div>
<p>Seems like a good place to stop. Obviously, there’s much left to do in terms of getting all the details right to make really nice maps. But, we’ve at least figured out the basic workflow and relevant R packages for getting a first draft done.</p>


</section>
</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div id="quarto-reuse" class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</a></div></div></section><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{isken2024,
  author = {Isken, Mark},
  title = {A Vector-Raster River Map, Three Ways - {Part} 3: {R}},
  date = {2024-02-20},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-isken2024" class="csl-entry quarto-appendix-citeas">
Isken, Mark. 2024. <span>“A Vector-Raster River Map, Three Ways - Part
3: R.”</span> February 20, 2024.
</div></div></section></div> ]]></description>
  <category>R</category>
  <category>geonewb</category>
  <category>mapping</category>
  <category>geocomputation</category>
  <guid>https://bitsofanalytics.org/posts/river_map_r/river_map_r.html</guid>
  <pubDate>Tue, 20 Feb 2024 05:00:00 GMT</pubDate>
</item>
<item>
  <title>A vector-raster river map, three ways - Part 2: Python</title>
  <dc:creator>Mark Isken</dc:creator>
  <link>https://bitsofanalytics.org/posts/river_map_python/</link>
  <description><![CDATA[ 




<p>Welcome to another post in my <a href="https://bitsofanalytics.org/#category=geonewb">geonewb</a> series.</p>
<p>In <a href="https://bitsofanalytics.org/posts/river_map_qgis/">Part 1 of this river map series</a> we created a map using QGIS and now we’ll try to reproduce that map using Python. There are numerous packages in Python for plotting maps but we’ll just focus on using Cartopy (which is based on matplotlib) for a simple static map.</p>
<p>As you’ll see, we end up in a bit of a hornet’s nest of issues related to reprojections. This post is just a start to Python-based mapping.</p>
<div id="d2e21e9f-7d40-4be0-987b-528297d95029" class="cell" data-execution_count="1">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> os</span>
<span id="cb1-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> pathlib <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Path</span>
<span id="cb1-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> IPython.display <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Image</span></code></pre></div>
</div>
<div id="47758c0a-8367-4d17-8a1e-1712c5326b83" class="cell" data-execution_count="2">
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pandas <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> pd</span>
<span id="cb2-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> matplotlib.pyplot <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> plt</span>
<span id="cb2-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> np</span></code></pre></div>
</div>
<div id="0c35af04-3a3e-4b30-b9e2-b01b40ed2c0e" class="cell" data-execution_count="3">
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> pint <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> UnitRegistry</span>
<span id="cb3-2">ureg <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> UnitRegistry()</span></code></pre></div>
</div>
<div id="0ba4296f-e163-467f-8fd9-3994065652b7" class="cell" data-execution_count="4">
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>matplotlib inline</span></code></pre></div>
</div>
<div id="d842e82a-8451-4425-933d-3065a0f0a546" class="cell" data-execution_count="5">
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1">WSFR_DATA_ROOT <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> os.environ[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'WSFR_DATA_ROOT'</span>]</span></code></pre></div>
</div>
<p>Here’s what one of the final maps looks like, created with QGIS. This one uses an equirectangular “projection”.</p>
<div id="26de49a7-2c60-408a-9baa-eafcc67a32c1" class="cell" data-execution_count="6">
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1">Image(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'./images/WaterSupplyForecastRodeo.png'</span>)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="6">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/river_map_python/index_files/figure-html/cell-7-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>Here is another version using an Albers Equal-Area projection and just showing the map view in QGIS (not a final map layout as above).</p>
<div id="2d1b067d-5929-43cd-ae18-d6ea0e287daf" class="cell" data-execution_count="7">
<div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1">Image(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'./images/WaterSupplyForecastRodeo_Albers.png'</span>)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="7">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/river_map_python/index_files/figure-html/cell-8-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<section id="data-sources-and-python-libraries" class="level2">
<h2 class="anchored" data-anchor-id="data-sources-and-python-libraries">Data sources and Python libraries</h2>
<p>(The following is repeated, for convenience, from <a href="https://bitsofanalytics.org/posts/river_map_qgis/">Part 1</a>)</p>
<p>All of the data we need is freely available. We will need several Python packages for working with the raster and vector data for these maps. In particular, we will use the <a href="https://geopandas.org/en/stable/index.html">GeoPandas</a> package to read vector data into <code>GeoDataFrame</code> objects. GeoPandas adds geospatial functionality to pandas.</p>
<blockquote class="blockquote">
<p>GeoPandas is an open source project to make working with geospatial data in python easier. GeoPandas extends the datatypes used by pandas to allow spatial operations on geometric types. Geometric operations are performed by <a href="https://shapely.readthedocs.io/en/stable/index.html">shapely</a>. GeoPandas further depends on <a href="https://fiona.readthedocs.io/">fiona</a> for file access and matplotlib for plotting.</p>
</blockquote>
<p>For raster data, we’ll use the <a href="https://docs.xarray.dev/en/stable/">xarray</a> and <a href="https://corteva.github.io/rioxarray/html/readme.html">rioxarray</a> packages.</p>
<p>The xarray package builds on top of NumPy N-d arrays and adds the ability to create and work with labels for the dimensions.</p>
<p>The two main data structures are <code>DataArray</code> (an N-d generalization of a <code>pandas.Series</code>) and <code>Dataset</code> (an N-d generalization of a <code>pandas.DataFrame</code>). The <a href="https://docs.xarray.dev/en/stable/getting-started-guide/why-xarray.html">Overview: Why xarray?</a> page has a nice level of detail on the case for xarray and its link to geospatial analysis.</p>
<p>The rioxarray package extends the xarray package to facilitate reading raster data into xarray objects. The actual reading of the raster file is done using another Python package known as <a href="https://rasterio.readthedocs.io/en/latest/">rasterio</a>. From the rasterio docs:</p>
<blockquote class="blockquote">
<p>Geographic information systems use GeoTIFF and other formats to organize and store gridded raster datasets such as satellite imagery and terrain models. Rasterio reads and writes these formats and provides a Python API based on Numpy N-dimensional arrays and GeoJSON.</p>
</blockquote>
<section id="sites-and-basins-vector" class="level3">
<h3 class="anchored" data-anchor-id="sites-and-basins-vector">Sites and basins (vector)</h3>
<p>The streamflow gauge sites and associated watershed basins are defined in the <code>geospatial.gpkg</code> <a href="https://mapscaping.com/reading-and-writing-geopackage-in-python/">GeoPackage</a> file available from the <a href="https://www.drivendata.org/competitions/254/reclamation-water-supply-forecast-dev/data/">WSFR data downloads page</a>. GeoPackage files can contain multiple layers of both vector and raster data. Individual layers can be read using GeoPandas’ <code>read_file</code> function by passing the layer name. The <code>layer=</code> argument is actually passed along to <a href="https://fiona.readthedocs.io/en/stable/index.html">Fiona</a>, a Python wrapper for accessing vector data via the <a href="https://gdal.org/">GDAL/OGR library</a>.</p>
<p>Under the hood, a GeoPackage is a SQLite database that conforms to the <a href="https://www.geopackage.org/spec/">GeoPackage standard</a> developed by the <a href="https://www.ogc.org/">Open Geospatial Consortium</a>. Both <code>basins</code> and <code>sites</code> are tables in the GeoPackage (SQLite database) and can be accessed directly with a SQLite database browser or any tool for working with SQLite databases. If you omit the <code>layer=</code> argument, GeoPandas returns the <code>basins</code> layer, likely because it’s the first layer in whatever internal ordering is used for the collection of layers.</p>
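<p>As a minimal sketch of that direct access, the standard library’s <code>sqlite3</code> module can list the layers registered in a GeoPackage. The GeoPackage standard requires a <code>gpkg_contents</code> table with <code>table_name</code> and <code>data_type</code> columns. The helper name <code>list_gpkg_layers</code> and the in-memory stand-in database are assumptions for illustration only; they are not part of the WSFR data.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">import sqlite3


def list_gpkg_layers(conn):
    """Return (table_name, data_type) pairs from the gpkg_contents table."""
    cur = conn.execute(
        "SELECT table_name, data_type FROM gpkg_contents ORDER BY table_name"
    )
    return cur.fetchall()


# Hypothetical stand-in for a real GeoPackage so the sketch runs without the data file.
# With the real file, something like sqlite3.connect(geospatial_input_file) should work.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gpkg_contents (table_name TEXT, data_type TEXT)")
conn.executemany(
    "INSERT INTO gpkg_contents VALUES (?, ?)",
    [("sites", "features"), ("basins", "features")],
)

layers = list_gpkg_layers(conn)
print(layers)</code></pre></div>
<p>Vector layers are registered with a <code>data_type</code> of <code>features</code>, so this query is a quick way to check which layer names are valid values for the <code>layer=</code> argument.</p>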
<div id="88b806f5-ac42-45c9-827c-80398de69ada" class="cell" data-execution_count="8">
<div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> geopandas <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> gpd</span>
<span id="cb8-2"></span>
<span id="cb8-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Reading GeoPackage files</span></span>
<span id="cb8-4">geospatial_input_file <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Path(WSFR_DATA_ROOT, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'geospatial.gpkg'</span>)</span>
<span id="cb8-5">basins_gdf <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> gpd.read_file(geospatial_input_file, layer<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'basins'</span>)</span>
<span id="cb8-6">sites_gdf <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> gpd.read_file(geospatial_input_file, layer<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'sites'</span>)</span></code></pre></div>
</div>
<div id="0044e8ef-407b-453f-976b-dd2a493310fc" class="cell" data-execution_count="9">
<div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1">basins_gdf</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="9">
<div>
<div>


<table class="dataframe table table-sm table-striped small" data-quarto-postprocess="true" data-border="1">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th"></th>
<th data-quarto-table-cell-role="th">site_id</th>
<th data-quarto-table-cell-role="th">name</th>
<th data-quarto-table-cell-role="th">area</th>
<th data-quarto-table-cell-role="th">geometry</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td data-quarto-table-cell-role="th">0</td>
<td>hungry_horse_reservoir_inflow</td>
<td>Hungry Horse Reservoir Inflow</td>
<td>1681.780</td>
<td>POLYGON ((-113.09701 47.24399, -113.09730 47.2...</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">1</td>
<td>snake_r_nr_heise</td>
<td>Snake River near Heise</td>
<td>5719.410</td>
<td>MULTIPOLYGON (((-110.79196 44.40127, -110.7922...</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">2</td>
<td>pueblo_reservoir_inflow</td>
<td>Pueblo Reservoir Inflow</td>
<td>4615.460</td>
<td>POLYGON ((-105.67340 38.15883, -105.67315 38.1...</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">3</td>
<td>sweetwater_r_nr_alcova</td>
<td>Sweetwater River near Alcova</td>
<td>2377.280</td>
<td>POLYGON ((-107.32822 42.21621, -107.32903 42.2...</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">4</td>
<td>missouri_r_at_toston</td>
<td>Missouri River at Toston</td>
<td>14676.200</td>
<td>POLYGON ((-110.63001 46.30856, -110.63006 46.3...</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">5</td>
<td>animas_r_at_durango</td>
<td>Animas River at Durango</td>
<td>700.901</td>
<td>POLYGON ((-107.87584 37.27614, -107.87759 37.2...</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">6</td>
<td>yampa_r_nr_maybell</td>
<td>Yampa River near Maybell</td>
<td>3381.680</td>
<td>MULTIPOLYGON (((-107.02802 40.02615, -107.0277...</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">7</td>
<td>libby_reservoir_inflow</td>
<td>Libby Reservoir Inflow</td>
<td>9030.450</td>
<td>MULTIPOLYGON (((-114.85817 48.50961, -114.8582...</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">8</td>
<td>boise_r_nr_boise</td>
<td>Boise River near Boise</td>
<td>2687.340</td>
<td>MULTIPOLYGON (((-115.23528 44.09596, -115.2355...</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">9</td>
<td>green_r_bl_howard_a_hanson_dam</td>
<td>Green River below Howard Hanson Dam</td>
<td>221.234</td>
<td>POLYGON ((-121.31474 47.13373, -121.31530 47.1...</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">10</td>
<td>taylor_park_reservoir_inflow</td>
<td>Taylor Park Reservoir Inflow</td>
<td>254.415</td>
<td>MULTIPOLYGON (((-106.74944 39.04268, -106.7499...</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">11</td>
<td>dillon_reservoir_inflow</td>
<td>Dillon Reservoir Inflow</td>
<td>328.429</td>
<td>POLYGON ((-106.04383 39.35748, -106.04564 39.3...</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">12</td>
<td>ruedi_reservoir_inflow</td>
<td>Ruedi Reservoir Inflow</td>
<td>223.740</td>
<td>POLYGON ((-106.52087 39.15795, -106.52216 39.1...</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">13</td>
<td>fontenelle_reservoir_inflow</td>
<td>Fontenelle Reservoir Inflow</td>
<td>4199.150</td>
<td>POLYGON ((-110.06865 42.02568, -110.06944 42.0...</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">14</td>
<td>weber_r_nr_oakley</td>
<td>Weber River near Oakley</td>
<td>162.429</td>
<td>POLYGON ((-111.06529 40.67623, -111.06633 40.6...</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">15</td>
<td>san_joaquin_river_millerton_reservoir</td>
<td>San Joaquin River - Millerton Reservoir</td>
<td>1886.720</td>
<td>MULTIPOLYGON (((-120.44227 39.29433, -120.4423...</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">16</td>
<td>merced_river_yosemite_at_pohono_bridge</td>
<td>Merced River - Yosemite at Pohono Bridge</td>
<td>321.858</td>
<td>POLYGON ((-119.44368 37.86102, -119.44401 37.8...</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">17</td>
<td>american_river_folsom_lake</td>
<td>American River - Folsom Reservoir</td>
<td>1677.530</td>
<td>MULTIPOLYGON (((-119.52661 37.04015, -119.5265...</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">18</td>
<td>colville_r_at_kettle_falls</td>
<td>Colville River at Kettle Falls</td>
<td>1086.250</td>
<td>POLYGON ((-117.88747 48.70666, -117.88776 48.7...</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">19</td>
<td>stehekin_r_at_stehekin</td>
<td>Stehekin River at Stehekin</td>
<td>319.664</td>
<td>POLYGON ((-120.93255 48.19748, -120.93284 48.1...</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">20</td>
<td>detroit_lake_inflow</td>
<td>Detroit Lake Inflow</td>
<td>452.383</td>
<td>POLYGON ((-121.85271 44.46766, -121.85379 44.4...</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">21</td>
<td>virgin_r_at_virtin</td>
<td>Virgin River at Virgin</td>
<td>948.039</td>
<td>MULTIPOLYGON (((-112.73274 37.06034, -112.7327...</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">22</td>
<td>skagit_ross_reservoir</td>
<td>Ross Reservoir Inflow</td>
<td>800.749</td>
<td>MULTIPOLYGON (((-120.98078 48.74358, -120.9810...</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">23</td>
<td>boysen_reservoir_inflow</td>
<td>Boysen Reservoir Inflow</td>
<td>7706.580</td>
<td>POLYGON ((-108.50590 42.46772, -108.50643 42.4...</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">24</td>
<td>pecos_r_nr_pecos</td>
<td>Pecos River near Pecos</td>
<td>171.839</td>
<td>MULTIPOLYGON (((-105.50099 35.90638, -105.5007...</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">25</td>
<td>owyhee_r_bl_owyhee_dam</td>
<td>Owyhee River below Owyhee Dam</td>
<td>11659.800</td>
<td>POLYGON ((-118.19040 43.09658, -118.19034 43.0...</td>
</tr>
</tbody>
</table>

</div>
</div>
</div>
</div>
<div id="8b9b0069-f5af-4021-b027-fdaee91965bb" class="cell" data-execution_count="10">
<div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1">sites_gdf</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="10">
<div>
<div>


<table class="dataframe table table-sm table-striped small" data-quarto-postprocess="true" data-border="1">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th"></th>
<th data-quarto-table-cell-role="th">site_id</th>
<th data-quarto-table-cell-role="th">name</th>
<th data-quarto-table-cell-role="th">geometry</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td data-quarto-table-cell-role="th">0</td>
<td>hungry_horse_reservoir_inflow</td>
<td>Hungry Horse Reservoir Inflow</td>
<td>POINT (-114.03786 48.35658)</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">1</td>
<td>snake_r_nr_heise</td>
<td>Snake River near Heise</td>
<td>POINT (-111.66000 43.61250)</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">2</td>
<td>pueblo_reservoir_inflow</td>
<td>Pueblo Reservoir Inflow</td>
<td>POINT (-104.71803 38.27167)</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">3</td>
<td>sweetwater_r_nr_alcova</td>
<td>Sweetwater River near Alcova</td>
<td>POINT (-107.13394 42.48996)</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">4</td>
<td>missouri_r_at_toston</td>
<td>Missouri River at Toston</td>
<td>POINT (-111.42028 46.14657)</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">5</td>
<td>animas_r_at_durango</td>
<td>Animas River at Durango</td>
<td>POINT (-107.88035 37.27917)</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">6</td>
<td>yampa_r_nr_maybell</td>
<td>Yampa River near Maybell</td>
<td>POINT (-108.03341 40.50275)</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">7</td>
<td>libby_reservoir_inflow</td>
<td>Libby Reservoir Inflow</td>
<td>POINT (-115.31872 48.40066)</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">8</td>
<td>boise_r_nr_boise</td>
<td>Boise River near Boise</td>
<td>POINT (-116.05955 43.52767)</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">9</td>
<td>green_r_bl_howard_a_hanson_dam</td>
<td>Green River below Howard Hanson Dam</td>
<td>POINT (-121.79789 47.28371)</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">10</td>
<td>taylor_park_reservoir_inflow</td>
<td>Taylor Park Reservoir Inflow</td>
<td>POINT (-106.60920 38.81833)</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">11</td>
<td>dillon_reservoir_inflow</td>
<td>Dillon Reservoir Inflow</td>
<td>POINT (-106.06641 39.62554)</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">12</td>
<td>ruedi_reservoir_inflow</td>
<td>Ruedi Reservoir Inflow</td>
<td>POINT (-106.81865 39.36387)</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">13</td>
<td>fontenelle_reservoir_inflow</td>
<td>Fontenelle Reservoir Inflow</td>
<td>POINT (-110.06667 42.02778)</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">14</td>
<td>weber_r_nr_oakley</td>
<td>Weber River near Oakley</td>
<td>POINT (-111.24796 40.73717)</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">15</td>
<td>san_joaquin_river_millerton_reservoir</td>
<td>San Joaquin River - Millerton Reservoir</td>
<td>POINT (-119.72431 36.98439)</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">16</td>
<td>merced_river_yosemite_at_pohono_bridge</td>
<td>Merced River - Yosemite at Pohono Bridge</td>
<td>POINT (-119.66567 37.71628)</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">17</td>
<td>american_river_folsom_lake</td>
<td>American River - Folsom Reservoir</td>
<td>POINT (-121.16436 38.70453)</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">18</td>
<td>colville_r_at_kettle_falls</td>
<td>Colville River at Kettle Falls</td>
<td>POINT (-118.06249 48.59435)</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">19</td>
<td>stehekin_r_at_stehekin</td>
<td>Stehekin River at Stehekin</td>
<td>POINT (-120.69177 48.32958)</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">20</td>
<td>detroit_lake_inflow</td>
<td>Detroit Lake Inflow</td>
<td>POINT (-122.29744 44.75378)</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">21</td>
<td>virgin_r_at_virtin</td>
<td>Virgin River at Virgin</td>
<td>POINT (-113.18078 37.20415)</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">22</td>
<td>skagit_ross_reservoir</td>
<td>Ross Reservoir Inflow</td>
<td>POINT (-121.06761 48.73217)</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">23</td>
<td>boysen_reservoir_inflow</td>
<td>Boysen Reservoir Inflow</td>
<td>POINT (-108.17899 43.42496)</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">24</td>
<td>pecos_r_nr_pecos</td>
<td>Pecos River near Pecos</td>
<td>POINT (-105.68270 35.70835)</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">25</td>
<td>owyhee_r_bl_owyhee_dam</td>
<td>Owyhee River below Owyhee Dam</td>
<td>POINT (-117.25583 43.65444)</td>
</tr>
</tbody>
</table>

</div>
</div>
</div>
</div>
<p>The <code>total_bounds</code> property gives us a bounding box containing all of the features in that <code>GeoDataFrame</code>. Notice that the values are in longitude and latitude, in the order <code>minx</code> (longitude), <code>miny</code> (latitude), <code>maxx</code>, <code>maxy</code>.</p>
<div id="4c69956b-0687-46d8-a6ed-6efd4ca89d5b" class="cell" data-execution_count="11">
<div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1">basins_gdf.total_bounds</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="11">
<pre><code>array([-122.32886555,   35.69784548, -104.7031235 ,   51.33441978])</code></pre>
</div>
</div>
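<p>The ordering matters once we start plotting. As far as I know, Cartopy’s <code>set_extent</code> and Matplotlib’s <code>extent=</code> argument expect <code>(x0, x1, y0, y1)</code> rather than <code>(minx, miny, maxx, maxy)</code>. Here is a small sketch of the reordering, with an optional padding in degrees. The helper name <code>bounds_to_extent</code> and the padding value are my own assumptions for illustration.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">def bounds_to_extent(bounds, pad=0.0):
    """Reorder (minx, miny, maxx, maxy) into (minx, maxx, miny, maxy), padded on all sides."""
    minx, miny, maxx, maxy = bounds
    return (minx - pad, maxx + pad, miny - pad, maxy + pad)


# Values copied from the basins_gdf.total_bounds output shown above.
basin_bounds = (-122.32886555, 35.69784548, -104.7031235, 51.33441978)

# Pad by one degree so basins do not touch the edge of the map.
extent = bounds_to_extent(basin_bounds, pad=1.0)
print(extent)</code></pre></div>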
</section>
<section id="state-and-provincial-boundaries-vector" class="level3">
<h3 class="anchored" data-anchor-id="state-and-provincial-boundaries-vector">State and provincial boundaries (vector)</h3>
<p>The US government provides <a href="https://www.census.gov/geographies/mapping-files/time-series/geo/cartographic-boundary.html">Cartographic Boundary Files</a> in both geodatabase and shapefile formats. They are available at different levels of resolution and also by geographic region.</p>
<blockquote class="blockquote">
<p>The cartographic boundary files are simplified representations of selected geographic areas from the Census Bureau’s Master Address File/Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) System. These boundary files are specifically designed for small scale thematic mapping. As of 2019, cartographic boundary files are available in shapefile, geodatabase, and Keyhole Markup Language (KML) format. For more details about these files, including their appropriate usage, please see our <a href="https://www.census.gov/programs-surveys/geography/technical-documentation/naming-convention/cartographic-boundary-file.html">Cartographic Boundary File Description</a> page.</p>
</blockquote>
<p>I downloaded the <a href="https://www2.census.gov/geo/tiger/GENZ2022/gdb/cb_2022_us_all_20m.zip">lowest resolution (1:20,000,000) zipped geodatabase file</a> of the entire US. Uncompressing it results in a folder that is the geodatabase.</p>
<p>Since some of the basins are in Canada, I found <a href="https://open.canada.ca/data/en/dataset/a883eb14-0c0e-45c4-b8c4-b54c4a819edb">similar maps for Canada in shapefile format</a>.</p>
<p>Later we’ll also use the public domain Natural Earth data.</p>
<div id="401c7f60-9857-422f-b664-7b0cc951e600" class="cell" data-execution_count="12">
<div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb13-1">state_file <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Path(WSFR_DATA_ROOT, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'boundaries/cb_2022_us_all_20m.gdb'</span>)</span>
<span id="cb13-2">state_layer_name <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'cb_2022_us_state_20m'</span></span>
<span id="cb13-3">state_gdf <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> gpd.read_file(state_file, layer<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>state_layer_name)</span>
<span id="cb13-4"></span>
<span id="cb13-5">province_file <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Path(WSFR_DATA_ROOT, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'boundaries/lpr_000b16a_e/lpr_000b16a_e.shp'</span>)</span>
<span id="cb13-6">province_gdf <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> gpd.read_file(province_file)</span></code></pre></div>
</div>
</section>
<section id="river-layer-vector" class="level3">
<h3 class="anchored" data-anchor-id="river-layer-vector">River layer (vector)</h3>
<p>For this layer we can use <a href="https://www.hydrosheds.org/products/hydrorivers">HydroRIVERS</a> data, a part of the <a href="https://www.hydrosheds.org/">HydroSHEDS</a> project. Both geodatabase and shapefile formats are downloadable from <a href="https://www.hydrosheds.org/products/hydrorivers">https://www.hydrosheds.org/products/hydrorivers</a>. There are different files for different regions of the world. The North and Central America data is available as a zipped geodatabase from <a href="https://data.hydrosheds.org/file/HydroRIVERS/HydroRIVERS_v10_na.gdb.zip">here</a> and is ~72MB in size (compressed). Unzip it after downloading. It’s a lot of rivers.</p>
<p>By default, GeoPandas uses <a href="https://fiona.readthedocs.io/en/stable/index.html">fiona</a> as the engine for file reading via the <code>read_file</code> function. <a href="https://geopandas.org/en/stable/docs/user_guide/io.html">According to their docs</a>, they will switch to <a href="https://pyogrio.readthedocs.io/en/latest/">pyogrio</a> as the default in GeoPandas 1.0 and recommend installing and switching now by using <code>engine='pyogrio'</code>.</p>
<pre><code>conda install -c conda-forge pyogrio</code></pre>
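<p>If you want a notebook that runs whether or not pyogrio is installed, one option is to pick the engine at run time. This is only a sketch: the helper name <code>pick_read_engine</code> is my own, and it simply checks whether the <code>pyogrio</code> package can be found before falling back to <code>'fiona'</code>.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">from importlib.util import find_spec


def pick_read_engine():
    """Return 'pyogrio' if that package is importable, otherwise 'fiona'."""
    return "pyogrio" if find_spec("pyogrio") is not None else "fiona"


engine = pick_read_engine()
print(engine)
# Intended use: gpd.read_file(rivers_file, engine=engine)</code></pre></div>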
<div id="c0fda58b-f7cc-44be-896f-8197e698a2d6" class="cell" data-execution_count="13">
<div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb15-1">rivers_file <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'../data/hydrosheds/HydroRIVERS_v10_na.gdb/HydroRIVERS_v10_na.gdb'</span></span>
<span id="cb15-2">rivers_gdf <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> gpd.read_file(rivers_file, engine<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'pyogrio'</span>)</span>
<span id="cb15-3">rivers_gdf.info()</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>&lt;class 'geopandas.geodataframe.GeoDataFrame'&gt;
RangeIndex: 986463 entries, 0 to 986462
Data columns (total 16 columns):
 #   Column        Non-Null Count   Dtype   
---  ------        --------------   -----   
 0   HYRIV_ID      986463 non-null  int32   
 1   NEXT_DOWN     986463 non-null  int32   
 2   MAIN_RIV      986463 non-null  int32   
 3   LENGTH_KM     986463 non-null  float32 
 4   DIST_DN_KM    986463 non-null  float32 
 5   DIST_UP_KM    986463 non-null  float32 
 6   CATCH_SKM     986463 non-null  float32 
 7   UPLAND_SKM    986463 non-null  float32 
 8   ENDORHEIC     986463 non-null  int16   
 9   DIS_AV_CMS    986463 non-null  float32 
 10  ORD_STRA      986463 non-null  int16   
 11  ORD_CLAS      986463 non-null  int16   
 12  ORD_FLOW      986463 non-null  int16   
 13  HYBAS_L12     986463 non-null  float64 
 14  Shape_Length  986463 non-null  float64 
 15  geometry      986463 non-null  geometry
dtypes: float32(6), float64(2), geometry(1), int16(4), int32(3)
memory usage: 64.0 MB</code></pre>
</div>
</div>
<p>Let’s filter by <code>ORD_FLOW</code> to reduce the number of features to plot.</p>
<div id="04ea3157-0bfd-466a-a733-1e854dbe4d34" class="cell" data-execution_count="14">
<div class="sourceCode cell-code" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb17-1">rivers_gdf <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rivers_gdf.query(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ORD_FLOW &lt;= 6'</span>)</span>
<span id="cb17-2">rivers_gdf.info()</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>&lt;class 'geopandas.geodataframe.GeoDataFrame'&gt;
Index: 299372 entries, 6 to 986458
Data columns (total 16 columns):
 #   Column        Non-Null Count   Dtype   
---  ------        --------------   -----   
 0   HYRIV_ID      299372 non-null  int32   
 1   NEXT_DOWN     299372 non-null  int32   
 2   MAIN_RIV      299372 non-null  int32   
 3   LENGTH_KM     299372 non-null  float32 
 4   DIST_DN_KM    299372 non-null  float32 
 5   DIST_UP_KM    299372 non-null  float32 
 6   CATCH_SKM     299372 non-null  float32 
 7   UPLAND_SKM    299372 non-null  float32 
 8   ENDORHEIC     299372 non-null  int16   
 9   DIS_AV_CMS    299372 non-null  float32 
 10  ORD_STRA      299372 non-null  int16   
 11  ORD_CLAS      299372 non-null  int16   
 12  ORD_FLOW      299372 non-null  int16   
 13  HYBAS_L12     299372 non-null  float64 
 14  Shape_Length  299372 non-null  float64 
 15  geometry      299372 non-null  geometry
dtypes: float32(6), float64(2), geometry(1), int16(4), int32(3)
memory usage: 21.7 MB</code></pre>
</div>
</div>
</section>
<section id="dem-layer-raster" class="level3">
<h3 class="anchored" data-anchor-id="dem-layer-raster">DEM layer (raster)</h3>
<p>Since this project is all about hydrology, we will use a “hydrologically conditioned DEM” available as part of the <a href="https://www.hydrosheds.org/products/hydrosheds">HydroSHEDS core layer</a>.</p>
<ul>
<li>go to <a href="https://www.hydrosheds.org/hydrosheds-core-downloads">https://www.hydrosheds.org/hydrosheds-core-downloads</a>,</li>
<li>select the Conditioned DEM tab near the bottom of the page,</li>
<li>download the <a href="https://data.hydrosheds.org/file/hydrosheds-v1-con/na_con_3s.zip">compressed file for North and Central America</a>; it’s ~2.7GB,</li>
<li>uncompress the downloaded file and you’ll get a folder containing documentation and the GeoTIFF file, <code>na_con_3s.tif</code>, with the DEM raster.</li>
</ul>
<p>Since this is raster data in a GeoTIFF file, we can use the <a href="https://corteva.github.io/rioxarray/stable/">rioxarray package</a> to read this data into an <a href="https://docs.xarray.dev/en/stable/">xarray</a> <code>DataArray</code>.</p>
<div id="84393ec6-7288-4356-8516-bb1e6acaaeac" class="cell" data-execution_count="15">
<div class="sourceCode cell-code" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb19-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rioxarray</span>
<span id="cb19-2"></span>
<span id="cb19-3">dem_file <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Path(WSFR_DATA_ROOT, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'hydrosheds/na_con_3s/na_con_3s.tif'</span>)</span>
<span id="cb19-4">dem_xr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rioxarray.open_rasterio(dem_file, masked<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb19-5">dem_xr</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="15">
<div><svg style="position: absolute; width: 0; height: 0; overflow: hidden">
<defs>
<symbol id="icon-database" viewbox="0 0 32 32">
<path d="M16 0c-8.837 0-16 2.239-16 5v4c0 2.761 7.163 5 16 5s16-2.239 16-5v-4c0-2.761-7.163-5-16-5z"></path>
<path d="M16 17c-8.837 0-16-2.239-16-5v6c0 2.761 7.163 5 16 5s16-2.239 16-5v-6c0 2.761-7.163 5-16 5z"></path>
<path d="M16 26c-8.837 0-16-2.239-16-5v6c0 2.761 7.163 5 16 5s16-2.239 16-5v-6c0 2.761-7.163 5-16 5z"></path>
</symbol>
<symbol id="icon-file-text2" viewbox="0 0 32 32">
<path d="M28.681 7.159c-0.694-0.947-1.662-2.053-2.724-3.116s-2.169-2.030-3.116-2.724c-1.612-1.182-2.393-1.319-2.841-1.319h-15.5c-1.378 0-2.5 1.121-2.5 2.5v27c0 1.378 1.122 2.5 2.5 2.5h23c1.378 0 2.5-1.122 2.5-2.5v-19.5c0-0.448-0.137-1.23-1.319-2.841zM24.543 5.457c0.959 0.959 1.712 1.825 2.268 2.543h-4.811v-4.811c0.718 0.556 1.584 1.309 2.543 2.268zM28 29.5c0 0.271-0.229 0.5-0.5 0.5h-23c-0.271 0-0.5-0.229-0.5-0.5v-27c0-0.271 0.229-0.5 0.5-0.5 0 0 15.499-0 15.5 0v7c0 0.552 0.448 1 1 1h7v19.5z"></path>
<path d="M23 26h-14c-0.552 0-1-0.448-1-1s0.448-1 1-1h14c0.552 0 1 0.448 1 1s-0.448 1-1 1z"></path>
<path d="M23 22h-14c-0.552 0-1-0.448-1-1s0.448-1 1-1h14c0.552 0 1 0.448 1 1s-0.448 1-1 1z"></path>
<path d="M23 18h-14c-0.552 0-1-0.448-1-1s0.448-1 1-1h14c0.552 0 1 0.448 1 1s-0.448 1-1 1z"></path>
</symbol>
</defs>
</svg>
<style>/* CSS stylesheet for displaying xarray objects in jupyterlab.
 *
 */

:root {
  --xr-font-color0: var(--jp-content-font-color0, rgba(0, 0, 0, 1));
  --xr-font-color2: var(--jp-content-font-color2, rgba(0, 0, 0, 0.54));
  --xr-font-color3: var(--jp-content-font-color3, rgba(0, 0, 0, 0.38));
  --xr-border-color: var(--jp-border-color2, #e0e0e0);
  --xr-disabled-color: var(--jp-layout-color3, #bdbdbd);
  --xr-background-color: var(--jp-layout-color0, white);
  --xr-background-color-row-even: var(--jp-layout-color1, white);
  --xr-background-color-row-odd: var(--jp-layout-color2, #eeeeee);
}

html[theme=dark],
body[data-theme=dark],
body.vscode-dark {
  --xr-font-color0: rgba(255, 255, 255, 1);
  --xr-font-color2: rgba(255, 255, 255, 0.54);
  --xr-font-color3: rgba(255, 255, 255, 0.38);
  --xr-border-color: #1F1F1F;
  --xr-disabled-color: #515151;
  --xr-background-color: #111111;
  --xr-background-color-row-even: #111111;
  --xr-background-color-row-odd: #313131;
}

.xr-wrap {
  display: block !important;
  min-width: 300px;
  max-width: 700px;
}

.xr-text-repr-fallback {
  /* fallback to plain text repr when CSS is not injected (untrusted notebook) */
  display: none;
}

.xr-header {
  padding-top: 6px;
  padding-bottom: 6px;
  margin-bottom: 4px;
  border-bottom: solid 1px var(--xr-border-color);
}

.xr-header > div,
.xr-header > ul {
  display: inline;
  margin-top: 0;
  margin-bottom: 0;
}

.xr-obj-type,
.xr-array-name {
  margin-left: 2px;
  margin-right: 10px;
}

.xr-obj-type {
  color: var(--xr-font-color2);
}

.xr-sections {
  padding-left: 0 !important;
  display: grid;
  grid-template-columns: 150px auto auto 1fr 20px 20px;
}

.xr-section-item {
  display: contents;
}

.xr-section-item input {
  display: none;
}

.xr-section-item input + label {
  color: var(--xr-disabled-color);
}

.xr-section-item input:enabled + label {
  cursor: pointer;
  color: var(--xr-font-color2);
}

.xr-section-item input:enabled + label:hover {
  color: var(--xr-font-color0);
}

.xr-section-summary {
  grid-column: 1;
  color: var(--xr-font-color2);
  font-weight: 500;
}

.xr-section-summary > span {
  display: inline-block;
  padding-left: 0.5em;
}

.xr-section-summary-in:disabled + label {
  color: var(--xr-font-color2);
}

.xr-section-summary-in + label:before {
  display: inline-block;
  content: '►';
  font-size: 11px;
  width: 15px;
  text-align: center;
}

.xr-section-summary-in:disabled + label:before {
  color: var(--xr-disabled-color);
}

.xr-section-summary-in:checked + label:before {
  content: '▼';
}

.xr-section-summary-in:checked + label > span {
  display: none;
}

.xr-section-summary,
.xr-section-inline-details {
  padding-top: 4px;
  padding-bottom: 4px;
}

.xr-section-inline-details {
  grid-column: 2 / -1;
}

.xr-section-details {
  display: none;
  grid-column: 1 / -1;
  margin-bottom: 5px;
}

.xr-section-summary-in:checked ~ .xr-section-details {
  display: contents;
}

.xr-array-wrap {
  grid-column: 1 / -1;
  display: grid;
  grid-template-columns: 20px auto;
}

.xr-array-wrap > label {
  grid-column: 1;
  vertical-align: top;
}

.xr-preview {
  color: var(--xr-font-color3);
}

.xr-array-preview,
.xr-array-data {
  padding: 0 5px !important;
  grid-column: 2;
}

.xr-array-data,
.xr-array-in:checked ~ .xr-array-preview {
  display: none;
}

.xr-array-in:checked ~ .xr-array-data,
.xr-array-preview {
  display: inline-block;
}

.xr-dim-list {
  display: inline-block !important;
  list-style: none;
  padding: 0 !important;
  margin: 0;
}

.xr-dim-list li {
  display: inline-block;
  padding: 0;
  margin: 0;
}

.xr-dim-list:before {
  content: '(';
}

.xr-dim-list:after {
  content: ')';
}

.xr-dim-list li:not(:last-child):after {
  content: ',';
  padding-right: 5px;
}

.xr-has-index {
  font-weight: bold;
}

.xr-var-list,
.xr-var-item {
  display: contents;
}

.xr-var-item > div,
.xr-var-item label,
.xr-var-item > .xr-var-name span {
  background-color: var(--xr-background-color-row-even);
  margin-bottom: 0;
}

.xr-var-item > .xr-var-name:hover span {
  padding-right: 5px;
}

.xr-var-list > li:nth-child(odd) > div,
.xr-var-list > li:nth-child(odd) > label,
.xr-var-list > li:nth-child(odd) > .xr-var-name span {
  background-color: var(--xr-background-color-row-odd);
}

.xr-var-name {
  grid-column: 1;
}

.xr-var-dims {
  grid-column: 2;
}

.xr-var-dtype {
  grid-column: 3;
  text-align: right;
  color: var(--xr-font-color2);
}

.xr-var-preview {
  grid-column: 4;
}

.xr-index-preview {
  grid-column: 2 / 5;
  color: var(--xr-font-color2);
}

.xr-var-name,
.xr-var-dims,
.xr-var-dtype,
.xr-preview,
.xr-attrs dt {
  white-space: nowrap;
  overflow: hidden;
  text-overflow: ellipsis;
  padding-right: 10px;
}

.xr-var-name:hover,
.xr-var-dims:hover,
.xr-var-dtype:hover,
.xr-attrs dt:hover {
  overflow: visible;
  width: auto;
  z-index: 1;
}

.xr-var-attrs,
.xr-var-data,
.xr-index-data {
  display: none;
  background-color: var(--xr-background-color) !important;
  padding-bottom: 5px !important;
}

.xr-var-attrs-in:checked ~ .xr-var-attrs,
.xr-var-data-in:checked ~ .xr-var-data,
.xr-index-data-in:checked ~ .xr-index-data {
  display: block;
}

.xr-var-data > table {
  float: right;
}

.xr-var-name span,
.xr-var-data,
.xr-index-name div,
.xr-index-data,
.xr-attrs {
  padding-left: 25px !important;
}

.xr-attrs,
.xr-var-attrs,
.xr-var-data,
.xr-index-data {
  grid-column: 1 / -1;
}

dl.xr-attrs {
  padding: 0;
  margin: 0;
  display: grid;
  grid-template-columns: 125px auto;
}

.xr-attrs dt,
.xr-attrs dd {
  padding: 0;
  margin: 0;
  float: left;
  padding-right: 10px;
  width: auto;
}

.xr-attrs dt {
  font-weight: normal;
  grid-column: 1;
}

.xr-attrs dt:hover span {
  display: inline-block;
  background: var(--xr-background-color);
  padding-right: 10px;
}

.xr-attrs dd {
  grid-column: 2;
  white-space: pre-wrap;
  word-break: break-all;
}

.xr-icon-database,
.xr-icon-file-text2,
.xr-no-icon {
  display: inline-block;
  vertical-align: middle;
  width: 1em;
  height: 1.5em !important;
  stroke-width: 0;
  stroke: currentColor;
  fill: currentColor;
}
</style><pre class="xr-text-repr-fallback">&lt;xarray.DataArray (band: 1, y: 72000, x: 120000)&gt;
[8640000000 values with dtype=float32]
Coordinates:
  * band         (band) int64 1
  * x            (x) float64 -150.0 -150.0 -150.0 -150.0 ... -50.0 -50.0 -50.0
  * y            (y) float64 60.0 60.0 60.0 60.0 ... 0.002083 0.00125 0.0004167
    spatial_ref  int64 0
Attributes:
    AREA_OR_POINT:       Area
    RepresentationType:  THEMATIC
    scale_factor:        1.0
    add_offset:          0.0
    long_name:           Band_1</pre><div class="xr-wrap" style="display:none"><div class="xr-header"><div class="xr-obj-type">xarray.DataArray</div><div class="xr-array-name"></div><ul class="xr-dim-list"><li><span class="xr-has-index">band</span>: 1</li><li><span class="xr-has-index">y</span>: 72000</li><li><span class="xr-has-index">x</span>: 120000</li></ul></div><ul class="xr-sections"><li class="xr-section-item"><div class="xr-array-wrap"><input id="section-66e1a2e2-50b9-4e10-b6d6-89c5867b88ca" class="xr-array-in" type="checkbox" checked=""><label for="section-66e1a2e2-50b9-4e10-b6d6-89c5867b88ca" title="Show/hide data repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-array-preview xr-preview"><span>...</span></div><div class="xr-array-data"><pre>[8640000000 values with dtype=float32]</pre></div></div></li><li class="xr-section-item"><input id="section-76b87c52-ebc1-4d02-9d4e-20d9ccc6f45d" class="xr-section-summary-in" type="checkbox" checked=""><label for="section-76b87c52-ebc1-4d02-9d4e-20d9ccc6f45d" class="xr-section-summary">Coordinates: <span>(4)</span></label><div class="xr-section-inline-details"></div><div class="xr-section-details"><ul class="xr-var-list"><li class="xr-var-item"><div class="xr-var-name"><span class="xr-has-index">band</span></div><div class="xr-var-dims">(band)</div><div class="xr-var-dtype">int64</div><div class="xr-var-preview xr-preview">1</div><input id="attrs-c588f944-0e1f-4d32-89f0-070f10a3414b" class="xr-var-attrs-in" type="checkbox" disabled=""><label for="attrs-c588f944-0e1f-4d32-89f0-070f10a3414b" title="Show/Hide attributes"><svg class="icon xr-icon-file-text2"><use href="#icon-file-text2"></use></svg></label><input id="data-319ff1a5-2197-4456-ab01-55dfc3b0113c" class="xr-var-data-in" type="checkbox"><label for="data-319ff1a5-2197-4456-ab01-55dfc3b0113c" title="Show/Hide data repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div 
class="xr-var-attrs"><dl class="xr-attrs"></dl></div><div class="xr-var-data"><pre>array([1])</pre></div></li><li class="xr-var-item"><div class="xr-var-name"><span class="xr-has-index">x</span></div><div class="xr-var-dims">(x)</div><div class="xr-var-dtype">float64</div><div class="xr-var-preview xr-preview">-150.0 -150.0 ... -50.0 -50.0</div><input id="attrs-10cfd3ce-5c49-4c24-9ec9-12d114d94e60" class="xr-var-attrs-in" type="checkbox" disabled=""><label for="attrs-10cfd3ce-5c49-4c24-9ec9-12d114d94e60" title="Show/Hide attributes"><svg class="icon xr-icon-file-text2"><use href="#icon-file-text2"></use></svg></label><input id="data-1c3592a9-8793-43ec-b68b-81529fe9039b" class="xr-var-data-in" type="checkbox"><label for="data-1c3592a9-8793-43ec-b68b-81529fe9039b" title="Show/Hide data repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-var-attrs"><dl class="xr-attrs"></dl></div><div class="xr-var-data"><pre>array([-149.999583, -149.99875 , -149.997917, ...,  -50.002083,  -50.00125 ,
        -50.000417])</pre></div></li><li class="xr-var-item"><div class="xr-var-name"><span class="xr-has-index">y</span></div><div class="xr-var-dims">(y)</div><div class="xr-var-dtype">float64</div><div class="xr-var-preview xr-preview">60.0 60.0 ... 0.00125 0.0004167</div><input id="attrs-40306b28-5e75-470a-8f99-357ed6f08c00" class="xr-var-attrs-in" type="checkbox" disabled=""><label for="attrs-40306b28-5e75-470a-8f99-357ed6f08c00" title="Show/Hide attributes"><svg class="icon xr-icon-file-text2"><use href="#icon-file-text2"></use></svg></label><input id="data-90761d39-3010-4c1e-b03c-4088eb3d79ea" class="xr-var-data-in" type="checkbox"><label for="data-90761d39-3010-4c1e-b03c-4088eb3d79ea" title="Show/Hide data repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-var-attrs"><dl class="xr-attrs"></dl></div><div class="xr-var-data"><pre>array([5.999958e+01, 5.999875e+01, 5.999792e+01, ..., 2.083333e-03,
       1.250000e-03, 4.166667e-04])</pre></div></li><li class="xr-var-item"><div class="xr-var-name"><span>spatial_ref</span></div><div class="xr-var-dims">()</div><div class="xr-var-dtype">int64</div><div class="xr-var-preview xr-preview">0</div><input id="attrs-d6dbf0bb-0ff2-4585-b8ec-e69af4f85123" class="xr-var-attrs-in" type="checkbox"><label for="attrs-d6dbf0bb-0ff2-4585-b8ec-e69af4f85123" title="Show/Hide attributes"><svg class="icon xr-icon-file-text2"><use href="#icon-file-text2"></use></svg></label><input id="data-f536ce56-1ad8-4dc0-8f57-093303dd871e" class="xr-var-data-in" type="checkbox"><label for="data-f536ce56-1ad8-4dc0-8f57-093303dd871e" title="Show/Hide data repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-var-attrs"><dl class="xr-attrs"><dt><span>crs_wkt :</span></dt><dd>GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AXIS["Latitude",NORTH],AXIS["Longitude",EAST],AUTHORITY["EPSG","4326"]]</dd><dt><span>semi_major_axis :</span></dt><dd>6378137.0</dd><dt><span>semi_minor_axis :</span></dt><dd>6356752.314245179</dd><dt><span>inverse_flattening :</span></dt><dd>298.257223563</dd><dt><span>reference_ellipsoid_name :</span></dt><dd>WGS 84</dd><dt><span>longitude_of_prime_meridian :</span></dt><dd>0.0</dd><dt><span>prime_meridian_name :</span></dt><dd>Greenwich</dd><dt><span>geographic_crs_name :</span></dt><dd>WGS 84</dd><dt><span>horizontal_datum_name :</span></dt><dd>World Geodetic System 1984</dd><dt><span>grid_mapping_name :</span></dt><dd>latitude_longitude</dd><dt><span>spatial_ref :</span></dt><dd>GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 
84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AXIS["Latitude",NORTH],AXIS["Longitude",EAST],AUTHORITY["EPSG","4326"]]</dd><dt><span>GeoTransform :</span></dt><dd>-150.0 0.000833333333333333 0.0 59.99999999999999 0.0 -0.000833333333333333</dd></dl></div><div class="xr-var-data"><pre>array(0)</pre></div></li></ul></div></li><li class="xr-section-item"><input id="section-9d61ae70-1b6c-47cf-8580-3dd3a910453d" class="xr-section-summary-in" type="checkbox"><label for="section-9d61ae70-1b6c-47cf-8580-3dd3a910453d" class="xr-section-summary">Indexes: <span>(3)</span></label><div class="xr-section-inline-details"></div><div class="xr-section-details"><ul class="xr-var-list"><li class="xr-var-item"><div class="xr-index-name"><div>band</div></div><div class="xr-index-preview">PandasIndex</div><div></div><input id="index-90610e3f-81c0-49f8-a775-0a4c9481417f" class="xr-index-data-in" type="checkbox"><label for="index-90610e3f-81c0-49f8-a775-0a4c9481417f" title="Show/Hide index repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-index-data"><pre>PandasIndex(Index([1], dtype='int64', name='band'))</pre></div></li><li class="xr-var-item"><div class="xr-index-name"><div>x</div></div><div class="xr-index-preview">PandasIndex</div><div></div><input id="index-1d0898d7-c0b6-4cfc-8755-e2ff4b75b897" class="xr-index-data-in" type="checkbox"><label for="index-1d0898d7-c0b6-4cfc-8755-e2ff4b75b897" title="Show/Hide index repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-index-data"><pre>PandasIndex(Index([-149.99958333333333,          -149.99875, -149.99791666666667,
       -149.99708333333334,          -149.99625, -149.99541666666667,
       -149.99458333333334,          -149.99375, -149.99291666666667,
       -149.99208333333334,
       ...
       -50.007916666666716, -50.007083333333384,  -50.00625000000005,
        -50.00541666666672, -50.004583333333386,  -50.00375000000004,
        -50.00291666666671, -50.002083333333374,  -50.00125000000004,
        -50.00041666666671],
      dtype='float64', name='x', length=120000))</pre></div></li><li class="xr-var-item"><div class="xr-index-name"><div>y</div></div><div class="xr-index-preview">PandasIndex</div><div></div><input id="index-26693a84-26bc-4e0c-805a-65ccd6b5863c" class="xr-index-data-in" type="checkbox"><label for="index-26693a84-26bc-4e0c-805a-65ccd6b5863c" title="Show/Hide index repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-index-data"><pre>PandasIndex(Index([    59.99958333333333,    59.998749999999994,     59.99791666666666,
           59.99708333333333,    59.996249999999996,     59.99541666666666,
          59.994583333333324,     59.99374999999999,     59.99291666666666,
          59.992083333333326,
       ...
        0.007916666666687888,  0.007083333333355313,  0.006250000000022737,
        0.005416666666690162,  0.004583333333350481, 0.0037500000000179057,
       0.0029166666666853303,  0.002083333333352755, 0.0012500000000201794,
        0.000416666666687604],
      dtype='float64', name='y', length=72000))</pre></div></li></ul></div></li><li class="xr-section-item"><input id="section-91d5a268-b77d-4faa-b7a4-8788bfffba52" class="xr-section-summary-in" type="checkbox" checked=""><label for="section-91d5a268-b77d-4faa-b7a4-8788bfffba52" class="xr-section-summary">Attributes: <span>(5)</span></label><div class="xr-section-inline-details"></div><div class="xr-section-details"><dl class="xr-attrs"><dt><span>AREA_OR_POINT :</span></dt><dd>Area</dd><dt><span>RepresentationType :</span></dt><dd>THEMATIC</dd><dt><span>scale_factor :</span></dt><dd>1.0</dd><dt><span>add_offset :</span></dt><dd>0.0</dd><dt><span>long_name :</span></dt><dd>Band_1</dd></dl></div></li></ul></div></div>
</div>
</div>
<div id="baf6e565-3313-48af-88d2-e2b3eb399406" class="cell" data-tags="[]" data-execution_count="16">
<div class="sourceCode cell-code" id="cb20" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb20-1"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'Number of cells: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>dem_xr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> dem_xr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:,}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Number of cells: 8,640,000,000</code></pre>
</div>
</div>
<p>Yikes! That is one big array. This is a DEM for all of North and Central America and we only need a portion. I used QGIS to get an approximate extent for the map we want to make. Then we can use rioxarray to clip the raster.</p>
<div id="8e2724ca-0047-4ef7-aab1-18f20756aef2" class="cell" data-execution_count="17">
<div class="sourceCode cell-code" id="cb22" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb22-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Extent is minx, miny, maxx, maxy</span></span>
<span id="cb22-2">extent_epsg_4326 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">126</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">35</span>, <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">103</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">52</span>] </span>
<span id="cb22-3">minx_4326, miny_4326, maxx_4326, maxy_4326 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> extent_epsg_4326</span>
<span id="cb22-4"></span>
<span id="cb22-5">dem_xr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> dem_xr.rio.clip_box(</span>
<span id="cb22-6">            minx<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>minx_4326,</span>
<span id="cb22-7">            miny<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>miny_4326,</span>
<span id="cb22-8">            maxx<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>maxx_4326,</span>
<span id="cb22-9">            maxy<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>maxy_4326,</span>
<span id="cb22-10">            crs<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"EPSG:4326"</span>,</span>
<span id="cb22-11">        )</span>
<span id="cb22-12"></span>
<span id="cb22-13"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'bounds: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>dem_xr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>rio<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>bounds()<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb22-14"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'shape: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>dem_xr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>shape<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb22-15"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'number of cells: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>dem_xr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> dem_xr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:,}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>bounds: (-126.00000000000001, 35.0, -102.9991666666667, 52.00083333333333)
shape: (1, 20401, 27601)
number of cells: 563,088,001</code></pre>
</div>
</div>
<p>That’s still a huge array and I’m thinking this is going to be problematic. I’m far from clear on how different Python-based plotting libraries handle large raster files. We’ll deal with this when we get to it. First, we have another issue to deal with.</p>
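<p>To put a rough number on “huge”, here is a back-of-the-envelope sketch that uses only values printed above: the array shapes, the clip extent, the 3 arc-second cell size from the raster’s <code>GeoTransform</code>, and the 4 bytes per cell implied by <code>float32</code>. The function names and the downsampling factor of 10 are illustrative assumptions and are not used elsewhere in this post.</p>

```python
import math

BYTES_PER_CELL = 4          # float32, per the DataArray repr
CELL_SIZE_DEG = 1 / 1200    # 3 arc-seconds, matches the GeoTransform (0.000833...)


def approx_grid(minx, miny, maxx, maxy, cell_size=CELL_SIZE_DEG):
    """Approximate (rows, cols) of a raster covering the given lon/lat extent."""
    cols = round((maxx - minx) / cell_size)
    rows = round((maxy - miny) / cell_size)
    return rows, cols


def gigabytes(n_cells, bytes_per_cell=BYTES_PER_CELL):
    """Memory needed to hold n_cells values, in GB (10**9 bytes)."""
    return n_cells * bytes_per_cell / 1e9


# Full North and Central America raster, shape reported by xarray
full_cells = 72_000 * 120_000
print(f'full raster:    {full_cells:,} cells, about {gigabytes(full_cells):.1f} GB')

# Clipped raster, shape reported after clip_box
clip_cells = 20_401 * 27_601
print(f'clipped raster: {clip_cells:,} cells, about {gigabytes(clip_cells):.1f} GB')

# Same estimate from the extent alone (the clipped array has one more row and column)
rows, cols = approx_grid(-126, 35, -103, 52)
print(f'from extent:    {rows:,} rows x {cols:,} cols')

# Hypothetical: keep every 10th cell in each direction
factor = 10
coarse_cells = math.ceil(20_401 / factor) * math.ceil(27_601 / factor)
print(f'1-in-{factor} sample:  {coarse_cells:,} cells, about {gigabytes(coarse_cells) * 1000:.0f} MB')
```

<p>Even the clipped raster needs a bit over 2 GB as <code>float32</code>, before accounting for any intermediate copies made along the way.</p>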
</section>
</section>
<section id="preparation-for-map-making" class="level2">
<h2 class="anchored" data-anchor-id="preparation-for-map-making">Preparation for map making</h2>
<p>In the previous section we read all of the data into appropriate data structures: the vector layers (sites, basins, rivers, and state/province boundaries) were stored in <code>GeoDataFrame</code> objects and the DEM raster layer was stored in an xarray <code>DataArray</code>.</p>
<p>But, before we can try to plot all of this data together on a map, we need to make sure that all of the data uses the same <em>projected coordinate reference system</em> so that everything shows up where it should on the map. We won’t have QGIS to magically reproject data as needed in order to create a map.</p>
<p>What is the coordinate reference system (CRS) for each of our data items? The <code>GeoDataFrame</code> objects have a <code>crs</code> property that can be checked. For the raster, we can use the <code>rio</code> accessor from rioxarray to access the <code>crs</code> property of the <code>DataArray</code>.</p>
<div id="4bade68c-cb1f-4bc5-8230-00501b134e84" class="cell" data-execution_count="18">
<div class="sourceCode cell-code" id="cb24" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb24-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Vector GeoDataFrames</span></span>
<span id="cb24-2"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'sites CRS: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>sites_gdf<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>crs<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb24-3"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'basins CRS: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>basins_gdf<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>crs<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb24-4"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'state CRS: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>state_gdf<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>crs<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb24-5"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'province CRS: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>province_gdf<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>crs<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb24-6"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'rivers CRS: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>rivers_gdf<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>crs<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb24-7"></span>
<span id="cb24-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Raster DataArray</span></span>
<span id="cb24-9"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'DEM CRS: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>dem_xr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>rio<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>crs<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>sites CRS: EPSG:4326
basins CRS: EPSG:4326
state CRS: EPSG:4269
province CRS: PROJCS["PCS_Lambert_Conformal_Conic",GEOGCS["NAD83",DATUM["North_American_Datum_1983",SPHEROID["GRS 1980",6378137,298.257222101,AUTHORITY["EPSG","7019"]],AUTHORITY["EPSG","6269"]],PRIMEM["Greenwich",0],UNIT["Degree",0.0174532925199433]],PROJECTION["Lambert_Conformal_Conic_2SP"],PARAMETER["latitude_of_origin",63.390675],PARAMETER["central_meridian",-91.8666666666667],PARAMETER["standard_parallel_1",49],PARAMETER["standard_parallel_2",77],PARAMETER["false_easting",6200000],PARAMETER["false_northing",3000000],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH]]
rivers CRS: EPSG:4326
DEM CRS: EPSG:4326</code></pre>
</div>
</div>
<p>Four of the data items (sites, basins, rivers and the DEM) use the <a href="https://en.wikipedia.org/wiki/EPSG_Geodetic_Parameter_Dataset">EPSG:4326</a> CRS, the state data uses <a href="https://spatialreference.org/ref/epsg/4269/">EPSG:4269</a> and the province data uses a <a href="https://en.wikipedia.org/wiki/Lambert_conformal_conic_projection">Lambert Conformal Conic projection</a> that is defined by a WKT string rather than an EPSG code. Now what?</p>
<section id="a-bit-on-coordinate-reference-systems-and-projections" class="level3">
<h3 class="anchored" data-anchor-id="a-bit-on-coordinate-reference-systems-and-projections">A bit on coordinate reference systems and projections</h3>
<p>This <a href="http://downloads2.esri.com/support/documentation/ao_/710Understanding_Map_Projections.pdf">booklet written by ESRI</a> is a nice primer on coordinate reference systems and projections. <a href="https://pygis.io/docs/d_crs_what_is_it.html">Chapter 2 of the online PyGIS book</a> is another source of introductory information on these topics.</p>
<p>EPSG:4326, a very commonly used CRS, is a <em>geographic coordinate system</em> and not a <em>projected coordinate system</em>.</p>
<p>A good explanation of the difference between these two things is <a href="https://www.esri.com/arcgis-blog/products/arcgis-pro/mapping/gcs_vs_pcs/">provided by ESRI</a>.</p>
<blockquote class="blockquote">
<ul>
<li>A GCS defines where the data is located on the earth’s surface.</li>
<li>A PCS tells the data how to draw on a flat surface, like on a paper map or a computer screen.</li>
</ul>
</blockquote>
<p>and</p>
<blockquote class="blockquote">
<p>A projected coordinate system (PCS) is a GCS that has been flattened using a map projection.</p>
</blockquote>
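<p>A quick way to check which kind of CRS you are dealing with is to ask the pyproj package (a Python interface to PROJ). The following is a minimal sketch, assuming pyproj is installed; EPSG:5070 is used here only as an example of a projected CRS. The <code>is_geographic</code> and <code>is_projected</code> properties report the type and <code>axis_info</code> gives the units of each axis.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode python"><code class="sourceCode python">from pyproj import CRS

# EPSG:4326 is a geographic CRS (angles in degrees);
# EPSG:5070 is a projected CRS (distances in metres)
for code in (4326, 5070):
    crs = CRS.from_epsg(code)
    units = [axis.unit_name for axis in crs.axis_info]
    print(code, crs.name, crs.is_geographic, crs.is_projected, units)</code></pre></div>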
<p>The EPSG:4326 CRS uses longitude and latitude to locate points on an underlying model of the earth (<a href="https://en.wikipedia.org/wiki/World_Geodetic_System">WGS84</a>). But, this is a 3D model and any attempt to represent it in 2D (via one of countless possible <a href="https://en.wikipedia.org/wiki/Map_projection">projections</a>) will lead to different degrees of distortion in area, distance and angle conformance. Before plotting geographic data on some 2D surface (e.g.&nbsp;screen or paper) we need to pick an appropriate projected coordinate reference system. BTW, sometimes it seems like EPSG:4326 and WGS84 are used synonymously. The folks at the <a href="https://mapscaping.com/podcasts/">MapScaping podcast</a> have done a nice blog post on the <a href="https://mapscaping.com/difference-between-wgs84-and-epsg4326/">relationship between EPSG:4326 and WGS84</a>.</p>
<p>We could simply plot the EPSG:4326 data on an xy-grid with Matplotlib. Doing so implicitly uses a <em>pseudo Plate Carrée</em> or <a href="https://en.wikipedia.org/wiki/Equirectangular_projection">equirectangular projection</a> (“pseudo” because it’s measured in degrees instead of meters). This might be acceptable for very small areas or if you’re just interested in a rough map and don’t need to do distance or area calculations. This “non-projection” is referenced in the <a href="https://proj.org/en/9.3/operations/conversions/latlon.html">PROJ documentation</a> as a projection that leaves the underlying CRS unchanged; it’s just a pass-through.</p>
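<p>To get a feel for why treating degrees as if they were planar units distorts things, here is a rough back-of-the-envelope sketch. It assumes a spherical earth with a mean radius of 6371 km (an approximation, not the WGS84 ellipsoid), and the function name is made up for this illustration. It computes the ground distance covered by one degree of longitude at a few latitudes.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode python"><code class="sourceCode python">import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical earth


def km_per_degree_longitude(lat_deg):
    """Approximate ground length (km) of one degree of longitude at a latitude."""
    return math.radians(1) * EARTH_RADIUS_KM * math.cos(math.radians(lat_deg))


for lat in (0, 30, 45, 60):
    km = km_per_degree_longitude(lat)
    print(f"{lat:2d} deg latitude: {km:6.1f} km per degree of longitude")</code></pre></div>
<p>At 60 degrees latitude, a degree of longitude covers only about half the ground distance that it does at the equator, while a degree of latitude stays roughly constant. So, shapes plotted directly in longitude and latitude get stretched east-west as you move away from the equator.</p>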
<p>But, didn’t we just create a map in QGIS in which all layers used the EPSG:4326 CRS? When QGIS displays the map on the screen, it must do some sort of projection. How does QGIS handle projections and CRS for individual layers and the map display? According to <a href="https://docs.qgis.org/3.28/en/docs/user_manual/working_with_projections/working_with_projections.html">this docs page</a> each layer can have its own CRS and the overall project has a CRS. The project CRS controls how the map is displayed and QGIS does “on the fly” reprojections of individual layers as needed. The individual layer CRSs define the locations (XY) in each layer - often EPSG:4326 which is NOT a projected CRS. By default, the project CRS is also EPSG:4326. This <a href="https://gis.stackexchange.com/questions/73576/what-is-projection-of-wgs84-in-gis-software">GIS StackExchange post</a> as <a href="https://gis.stackexchange.com/questions/387517/what-is-meant-with-epsg4326-projection">well as this one</a> were also quite helpful in answering this question.</p>
<p>The interplay (both generally and within a GIS) of datums, coordinate reference systems and projections is a complex topic. Here are a few general resources that I have found to be helpful.</p>
<ul>
<li><a href="https://www.paulbolstad.net/gisbook.html">GIS Fundamentals: A First Text on Geographic Information Systems</a> - textbook by Paul Bolstad</li>
<li><a href="https://proj.org/en/9.3/index.html">PROJ</a> - THE software tool for transformations between CRS</li>
<li><a href="http://downloads2.esri.com/support/documentation/ao_/710Understanding_Map_Projections.pdf">Understanding Map Projections</a> - booklet produced by ESRI</li>
<li><a href="https://www.sharpgis.net/post/Spatial-references2c-coordinate-systems2c-projections2c-datums2c-ellipsoids-e28093-confusing">Spatial references, coordinate systems, projections, datums, ellipsoids – confusing?</a> - blog post that tries to cut through the confusion</li>
<li><a href="https://cales.arizona.edu/art/kb/reproj/huber.html">Reprojecting geographic features</a> - web page by Bill Huber</li>
<li><a href="https://pygis.io/docs/d_crs_what_is_it.html">What is a CRS?</a> - a chapter in the online PyGIS book</li>
<li><a href="https://docs.qgis.org/3.28/en/docs/training_manual/vector_analysis/reproject_transform.html">Reprojecting and transforming data</a> - part of the QGIS Training Manual in the chapter on vector analysis</li>
<li><a href="https://gis.stackexchange.com/questions/48949/epsg-3857-or-4326-for-web-mapping?noredirect=1&amp;lq=1">EPSG 3857 or 4326 for Web Mapping</a> - GIS StackExchange question discusses GCS vs PCS for two common EPSG codes</li>
<li><a href="https://gis.stackexchange.com/questions/21352/projecting-epsg4326-data-in-2d-map?rq=1">Projecting EPSG:4326 data in 2D map</a> - GIS StackExchange question that gets into some interesting projection related subtleties and anecdotes.</li>
<li><a href="https://rsbivand.github.io/ECS530_h19/ECS530_III.html#coordinate_reference_systems:_background">Coordinate reference systems: background</a> - very detailed history of PROJ, GDAL and their use in R and other systems.</li>
</ul>
</section>
<section id="which-projection-to-use" class="level3">
<h3 class="anchored" data-anchor-id="which-projection-to-use">Which projection to use?</h3>
<p>For the continental US, the <a href="https://www.usgs.gov/publications/map-projections-used-us-geological-survey">USGS uses a number of different projections</a>. We will use the <a href="https://en.wikipedia.org/wiki/Albers_projection">Albers Equal-Area Conic projection</a> (AEA) - it’s the one used by the USGS for state and regional maps. The <a href="https://proj.org/en/9.3/operations/projections/aea.html">PROJ page for AEA</a> includes the official <a href="https://proj.org/en/9.3/operations/projections/aea.html#id1">proj-string</a>.</p>
<p>It’s a bit more complicated as there are multiple versions of AEA developed by the USGS (e.g.&nbsp;CONUS and Alaska have <a href="https://github.com/pauldzy/USGS_Albers_Equal_Area_Projections">separate versions</a>). Instead of an EPSG code for AEA, there appears to be an ESRI code, <a href="https://epsg.io/102008">ESRI:102008</a> for the North American AEA. For other versions, there are different EPSG codes:</p>
<ul>
<li>CONUS (USGS version) - <a href="https://epsg.io/5070">EPSG:5070</a></li>
<li>Alaska - <a href="https://epsg.io/3338">EPSG:3338</a></li>
<li>Hawaii - no registered EPSG code</li>
</ul>
<p>Since our map is focused on the western United States and a bit of Canada, we’ll use ESRI:102008.</p>
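<p>To get a quick look at these candidates, pyproj can report the name and area of use for each code. This is a minimal sketch, assuming pyproj is installed and that its PROJ database includes all three codes.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode python"><code class="sourceCode python">from pyproj import CRS

# Candidate Albers Equal-Area definitions mentioned above
candidate_codes = ("ESRI:102008", "EPSG:5070", "EPSG:3338")

for code in candidate_codes:
    crs = CRS.from_user_input(code)
    # area_of_use.bounds is (west, south, east, north) in degrees
    print(code, "|", crs.name, "|", crs.area_of_use.bounds)</code></pre></div>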
<p>What exactly are the differences between EPSG:5070 and the AEA projection ESRI:102008? To answer this question we need to look at the detailed specifications of each projection. We can do this by examining what are known as <a href="https://en.wikipedia.org/wiki/Well-known_text_representation_of_coordinate_reference_systems">well known text (WKT) representations of CRSs</a>. How do we get the WKT representation of a given CRS? Turns out that the <code>crs</code> property in a <code>GeoDataFrame</code> isn’t just a string, but is a <code>pyproj.CRS</code> object and this object knows about WKTs.</p>
<p>The <a href="https://pyproj4.github.io/pyproj/stable/">pyproj package</a> is a Python interface to <a href="https://proj.org/en/9.3/index.html">PROJ</a>. One of the things we can do with a <code>pyproj.CRS</code> object is get a WKT representation of it using the <code>to_wkt()</code> method. The WKT representation contains a wealth of detailed data that define a projection. Our goal here is just to see if we can use the WKTs to figure out the difference between the USGS version of AEA and the ESRI:102008 version.</p>
<div id="49472586-3990-47ed-9232-0f58a6068c05" class="cell" data-execution_count="19">
<div class="sourceCode cell-code" id="cb26" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb26-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> pyproj <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> CRS</span></code></pre></div>
</div>
<p>Here’s the North American Albers Equal Area.</p>
<div id="8b942412-684c-4b66-9279-ead7cfb297d6" class="cell" data-tags="[]" data-execution_count="20">
<div class="sourceCode cell-code" id="cb27" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb27-1">crs_102008 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> CRS.from_string(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ESRI:102008"</span>)</span>
<span id="cb27-2">wkt_102008 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> crs_102008.to_wkt()</span>
<span id="cb27-3"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(crs_102008.to_wkt(pretty<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)) </span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>PROJCRS["North_America_Albers_Equal_Area_Conic",
    BASEGEOGCRS["NAD83",
        DATUM["North American Datum 1983",
            ELLIPSOID["GRS 1980",6378137,298.257222101,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4269]],
    CONVERSION["North_America_Albers_Equal_Area_Conic",
        METHOD["Albers Equal Area",
            ID["EPSG",9822]],
        PARAMETER["Latitude of false origin",40,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8821]],
        PARAMETER["Longitude of false origin",-96,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8822]],
        PARAMETER["Latitude of 1st standard parallel",20,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8823]],
        PARAMETER["Latitude of 2nd standard parallel",60,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8824]],
        PARAMETER["Easting at false origin",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8826]],
        PARAMETER["Northing at false origin",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8827]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Not known."],
        AREA["North America - onshore and offshore: Canada - Alberta; British Columbia; Manitoba; New Brunswick; Newfoundland and Labrador; Northwest Territories; Nova Scotia; Nunavut; Ontario; Prince Edward Island; Quebec; Saskatchewan; Yukon. United States (USA) - Alabama; Alaska (mainland); Arizona; Arkansas; California; Colorado; Connecticut; Delaware; Florida; Georgia; Idaho; Illinois; Indiana; Iowa; Kansas; Kentucky; Louisiana; Maine; Maryland; Massachusetts; Michigan; Minnesota; Mississippi; Missouri; Montana; Nebraska; Nevada; New Hampshire; New Jersey; New Mexico; New York; North Carolina; North Dakota; Ohio; Oklahoma; Oregon; Pennsylvania; Rhode Island; South Carolina; South Dakota; Tennessee; Texas; Utah; Vermont; Virginia; Washington; West Virginia; Wisconsin; Wyoming."],
        BBOX[23.81,-172.54,86.46,-47.74]],
    ID["ESRI",102008]]</code></pre>
</div>
</div>
<p>… and here’s the EPSG:5070 version:</p>
<div id="f4692368-bdbb-4bbf-b63a-fbe91bc408dc" class="cell" data-tags="[]" data-execution_count="21">
<div class="sourceCode cell-code" id="cb29" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb29-1">crs_5070 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> CRS.from_string(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'EPSG:5070'</span>)</span>
<span id="cb29-2"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(crs_5070.to_wkt(pretty<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>))</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>PROJCRS["NAD83 / Conus Albers",
    BASEGEOGCRS["NAD83",
        DATUM["North American Datum 1983",
            ELLIPSOID["GRS 1980",6378137,298.257222101,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4269]],
    CONVERSION["Conus Albers",
        METHOD["Albers Equal Area",
            ID["EPSG",9822]],
        PARAMETER["Latitude of false origin",23,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8821]],
        PARAMETER["Longitude of false origin",-96,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8822]],
        PARAMETER["Latitude of 1st standard parallel",29.5,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8823]],
        PARAMETER["Latitude of 2nd standard parallel",45.5,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8824]],
        PARAMETER["Easting at false origin",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8826]],
        PARAMETER["Northing at false origin",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8827]]],
    CS[Cartesian,2],
        AXIS["easting (X)",east,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["northing (Y)",north,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Data analysis and small scale data presentation for contiguous lower 48 states."],
        AREA["United States (USA) - CONUS onshore - Alabama; Arizona; Arkansas; California; Colorado; Connecticut; Delaware; Florida; Georgia; Idaho; Illinois; Indiana; Iowa; Kansas; Kentucky; Louisiana; Maine; Maryland; Massachusetts; Michigan; Minnesota; Mississippi; Missouri; Montana; Nebraska; Nevada; New Hampshire; New Jersey; New Mexico; New York; North Carolina; North Dakota; Ohio; Oklahoma; Oregon; Pennsylvania; Rhode Island; South Carolina; South Dakota; Tennessee; Texas; Utah; Vermont; Virginia; Washington; West Virginia; Wisconsin; Wyoming."],
        BBOX[24.41,-124.79,49.38,-66.91]],
    ID["EPSG",5070]]</code></pre>
</div>
</div>
<p>To highlight the differences between these two strings, we can use Python’s <a href="https://docs.python.org/3/library/difflib.html">difflib</a> module. It’s kind of like the Linux <code>diff</code> command but in Python.</p>
<div id="5f0d7a35-df0d-476f-870b-1a4e9efb2902" class="cell" data-execution_count="22">
<div class="sourceCode cell-code" id="cb31" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb31-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> difflib <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> ndiff</span></code></pre></div>
</div>
<div id="4a7d447f-ad7c-4efd-ba56-9796576130b4" class="cell" data-tags="[]" data-execution_count="23">
<div class="sourceCode cell-code" id="cb32" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb32-1">diff <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ndiff(crs_102008.to_wkt(pretty<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>).splitlines(), crs_5070.to_wkt(pretty<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>).splitlines())</span>
<span id="cb32-2"><span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> line <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> diff:</span>
<span id="cb32-3">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(line)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>- PROJCRS["North_America_Albers_Equal_Area_Conic",
+ PROJCRS["NAD83 / Conus Albers",
      BASEGEOGCRS["NAD83",
          DATUM["North American Datum 1983",
              ELLIPSOID["GRS 1980",6378137,298.257222101,
                  LENGTHUNIT["metre",1]]],
          PRIMEM["Greenwich",0,
              ANGLEUNIT["degree",0.0174532925199433]],
          ID["EPSG",4269]],
-     CONVERSION["North_America_Albers_Equal_Area_Conic",
+     CONVERSION["Conus Albers",
          METHOD["Albers Equal Area",
              ID["EPSG",9822]],
-         PARAMETER["Latitude of false origin",40,
?                                              ^^

+         PARAMETER["Latitude of false origin",23,
?                                              ^^

              ANGLEUNIT["degree",0.0174532925199433],
              ID["EPSG",8821]],
          PARAMETER["Longitude of false origin",-96,
              ANGLEUNIT["degree",0.0174532925199433],
              ID["EPSG",8822]],
-         PARAMETER["Latitude of 1st standard parallel",20,
?                                                        ^

+         PARAMETER["Latitude of 1st standard parallel",29.5,
?                                                        ^^^

              ANGLEUNIT["degree",0.0174532925199433],
              ID["EPSG",8823]],
-         PARAMETER["Latitude of 2nd standard parallel",60,
?                                                       ^^

+         PARAMETER["Latitude of 2nd standard parallel",45.5,
?                                                       ^^^^

              ANGLEUNIT["degree",0.0174532925199433],
              ID["EPSG",8824]],
          PARAMETER["Easting at false origin",0,
              LENGTHUNIT["metre",1],
              ID["EPSG",8826]],
          PARAMETER["Northing at false origin",0,
              LENGTHUNIT["metre",1],
              ID["EPSG",8827]]],
      CS[Cartesian,2],
-         AXIS["(E)",east,
?                ^

+         AXIS["easting (X)",east,
?               ++++++++ ^

              ORDER[1],
              LENGTHUNIT["metre",1]],
-         AXIS["(N)",north,
?                ^

+         AXIS["northing (Y)",north,
?               +++++++++ ^

              ORDER[2],
              LENGTHUNIT["metre",1]],
      USAGE[
-         SCOPE["Not known."],
+         SCOPE["Data analysis and small scale data presentation for contiguous lower 48 states."],
-         AREA["North America - onshore and offshore: Canada - Alberta; British Columbia; Manitoba; New Brunswick; Newfoundland and Labrador; Northwest Territories; Nova Scotia; Nunavut; Ontario; Prince Edward Island; Quebec; Saskatchewan; Yukon. United States (USA) - Alabama; Alaska (mainland); Arizona; Arkansas; California; Colorado; Connecticut; Delaware; Florida; Georgia; Idaho; Illinois; Indiana; Iowa; Kansas; Kentucky; Louisiana; Maine; Maryland; Massachusetts; Michigan; Minnesota; Mississippi; Missouri; Montana; Nebraska; Nevada; New Hampshire; New Jersey; New Mexico; New York; North Carolina; North Dakota; Ohio; Oklahoma; Oregon; Pennsylvania; Rhode Island; South Carolina; South Dakota; Tennessee; Texas; Utah; Vermont; Virginia; Washington; West Virginia; Wisconsin; Wyoming."],
?               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                   ^^^^^^^^^^^^^^^^^   ^^^^^^^^^^^^^^^^

+         AREA["United States (USA) - CONUS onshore - Alabama; Arizona; Arkansas; California; Colorado; Connecticut; Delaware; Florida; Georgia; Idaho; Illinois; Indiana; Iowa; Kansas; Kentucky; Louisiana; Maine; Maryland; Massachusetts; Michigan; Minnesota; Mississippi; Missouri; Montana; Nebraska; Nevada; New Hampshire; New Jersey; New Mexico; New York; North Carolina; North Dakota; Ohio; Oklahoma; Oregon; Pennsylvania; Rhode Island; South Carolina; South Dakota; Tennessee; Texas; Utah; Vermont; Virginia; Washington; West Virginia; Wisconsin; Wyoming."],
?               ^^^^^^  ^^^^^         ++++++++++++++++          ^^^^^^^   ^^^^^^^

-         BBOX[23.81,-172.54,86.46,-47.74]],
+         BBOX[24.41,-124.79,49.38,-66.91]],
-     ID["ESRI",102008]]
?           ^^  ^ ^ --

+     ID["EPSG",5070]]
?          + ^  ^ ^
</code></pre>
</div>
</div>
<p>So, the significant differences are:</p>
<ul>
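<li>different latitudes of false origin (40° vs. 23°); the longitude of false origin is -96° in both</li>
<li>different standard parallels (20° and 60° vs. 29.5° and 45.5°)</li>
<li>different areas of applicability (see <code>SCOPE</code>, <code>AREA</code> and <code>BBOX</code>)</li>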
</ul>
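<p>If the <code>ndiff</code> output feels a bit noisy, the numeric parameters can also be pulled straight out of the WKT text. Here’s a rough sketch using only the standard library. The two strings are abbreviated copies of the <code>PARAMETER</code> lines from the WKT output above and the helper name <code>wkt_parameters</code> is made up for this illustration. The full strings from <code>to_wkt(pretty=True)</code> could be passed in instead.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode python"><code class="sourceCode python">import re

# Abbreviated PARAMETER lines copied from the WKT output shown earlier
WKT_102008 = '''
PARAMETER["Latitude of false origin",40,
PARAMETER["Longitude of false origin",-96,
PARAMETER["Latitude of 1st standard parallel",20,
PARAMETER["Latitude of 2nd standard parallel",60,
'''
WKT_5070 = '''
PARAMETER["Latitude of false origin",23,
PARAMETER["Longitude of false origin",-96,
PARAMETER["Latitude of 1st standard parallel",29.5,
PARAMETER["Latitude of 2nd standard parallel",45.5,
'''


def wkt_parameters(wkt):
    """Return {parameter name: value} for every PARAMETER entry in a WKT string."""
    pairs = re.findall(r'PARAMETER\["([^"]+)",(-?[0-9.]+)', wkt)
    return {name: float(value) for name, value in pairs}


params_102008 = wkt_parameters(WKT_102008)
params_5070 = wkt_parameters(WKT_5070)

# Print only the parameters whose values differ between the two CRSs
for name, value in params_102008.items():
    if value != params_5070[name]:
        print(f"{name}: {value} (ESRI:102008) vs {params_5070[name]} (EPSG:5070)")</code></pre></div>
<p>Only the latitude of false origin and the two standard parallels get printed. The longitude of false origin is -96 in both.</p>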
<p>Notice that the WKT for EPSG:4326 contains no conversion-related parameters as it is <strong>NOT</strong> a projected coordinate reference system.</p>
<div id="6dd64b8e-dace-4d5c-9ca0-2a0d195b1740" class="cell" data-tags="[]" data-execution_count="24">
<div class="sourceCode cell-code" id="cb34" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb34-1"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(sites_gdf.crs.to_wkt(pretty<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>))</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>GEOGCRS["WGS 84",
    ENSEMBLE["World Geodetic System 1984 ensemble",
        MEMBER["World Geodetic System 1984 (Transit)"],
        MEMBER["World Geodetic System 1984 (G730)"],
        MEMBER["World Geodetic System 1984 (G873)"],
        MEMBER["World Geodetic System 1984 (G1150)"],
        MEMBER["World Geodetic System 1984 (G1674)"],
        MEMBER["World Geodetic System 1984 (G1762)"],
        MEMBER["World Geodetic System 1984 (G2139)"],
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]],
        ENSEMBLEACCURACY[2.0]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    USAGE[
        SCOPE["Horizontal component of 3D system."],
        AREA["World."],
        BBOX[-90,-180,90,180]],
    ID["EPSG",4326]]</code></pre>
</div>
</div>
<p>Ok, we are going to use ESRI:102008 as our common projected CRS for all of our data items.</p>
<p>Some additional resources on choosing projections include:</p>
<ul>
<li><a href="https://learn.arcgis.com/en/projects/choose-the-right-projection/">Choose the right projection (ESRI)</a></li>
<li><a href="https://gis.stackexchange.com/questions/141580/which-projection-is-best-for-mapping-the-contiguous-united-states">Which projection for CONUS? (GIS StackExchange)</a></li>
<li><a href="https://gis.stackexchange.com/questions/48949/epsg-3857-or-4326-for-web-mapping">epsg:3857 or epsg:4326 for web mapping? (GIS StackExchange)</a></li>
</ul>
</section>
<section id="reprojecting-to-esri102008" class="level3">
<h3 class="anchored" data-anchor-id="reprojecting-to-esri102008">Reprojecting to ESRI:102008</h3>
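<p>Let’s change the CRS for all of our data items to the North America Albers Equal-Area Conic CRS (ESRI:102008). For the <code>GeoDataFrame</code> objects, the <code>to_crs</code> method accepts a <code>pyproj.CRS</code> object such as <code>crs_102008</code>, as well as authority strings like <code>"ESRI:102008"</code> or EPSG codes.</p>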
<div id="423e4dd6-daae-4f1e-a784-40c55b3771ee" class="cell" data-execution_count="25">
<div class="sourceCode cell-code" id="cb36" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb36-1">sites_102008_gdf <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sites_gdf.to_crs(crs_102008)</span>
<span id="cb36-2"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'sites CRS: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>sites_102008_gdf<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>crs<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb36-3">basins_102008_gdf <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> basins_gdf.to_crs(crs_102008)</span>
<span id="cb36-4"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'basins CRS: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>basins_102008_gdf<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>crs<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb36-5">state_102008_gdf <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> state_gdf.to_crs(crs_102008)</span>
<span id="cb36-6"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'state CRS: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>state_102008_gdf<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>crs<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb36-7">province_102008_gdf <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> province_gdf.to_crs(crs_102008)</span>
<span id="cb36-8"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'province CRS: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>province_102008_gdf<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>crs<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb36-9">rivers_102008_gdf <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rivers_gdf.to_crs(crs_102008)</span>
<span id="cb36-10"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'rivers CRS: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>rivers_102008_gdf<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>crs<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>sites CRS: ESRI:102008
basins CRS: ESRI:102008
state CRS: ESRI:102008
province CRS: ESRI:102008
rivers CRS: ESRI:102008</code></pre>
</div>
</div>
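<p>To see what a reprojection does to individual coordinates, we can transform a couple of points by hand with a pyproj <code>Transformer</code>. This is just an illustrative sketch, assuming pyproj is installed; the two test points are arbitrary choices and not part of our dataset.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode python"><code class="sourceCode python">from pyproj import Transformer

# always_xy=True means inputs are (longitude, latitude)
# and outputs are (easting, northing)
to_albers = Transformer.from_crs("EPSG:4326", "ESRI:102008", always_xy=True)

# The false origin of ESRI:102008 is at longitude -96, latitude 40,
# so this point should land very close to (0, 0) in metres
x, y = to_albers.transform(-96.0, 40.0)
print(round(x, 1), round(y, 1))

# A point west of the -96 meridian ends up with a negative easting
x_west, y_west = to_albers.transform(-120.0, 45.0)
print(round(x_west), round(y_west))</code></pre></div>
<p>The coordinates are now in metres relative to the false origin instead of degrees, which is what makes distance and area calculations sensible.</p>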
<p>Now we need to do the same for the DEM.</p>
<p>To do this, looks like we need <code>rio.reproject</code>. Reprojecting rasters can be computationally intensive, but we need to do it if we want our data to align properly on our map. We clipped the DEM raster earlier and this will make the reprojection easier. A few helpful links on doing reprojection with rioxarray include:</p>
<ul>
<li><a href="https://rasterio.readthedocs.io/en/stable/topics/reproject.html">Reprojection section in rasterio docs</a></li>
<li><a href="https://corteva.github.io/rioxarray/html/rioxarray.html#rioxarray.raster_array.RasterArray.reproject">API for rio.reproject</a></li>
<li><a href="https://gis.stackexchange.com/questions/278274/reprojecting-a-large-raster-using-python-and-rasterio">Reprojecting a large raster using Python and rasterio</a></li>
</ul>
<p>The <code>rio.reproject()</code> method allows you to both resample and reproject a raster with a single call. It’s just a wrapper for <a href="https://rasterio.readthedocs.io/en/stable/api/rasterio.warp.html#rasterio.warp.reproject()">rasterio.warp.reproject</a>.</p>
<p>I followed the lead of this blog post - <a href="https://spatial-dev.guru/2022/09/24/upsample-and-downsample-raster-in-python-using-rioxarray/">Upsample and Downsample raster in python using rioxarray</a>.</p>
<div id="a8643a1e-8d31-476d-93d9-8f4a6e0e6454" class="cell" data-execution_count="26">
<div class="sourceCode cell-code" id="cb38" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb38-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rasterio</span></code></pre></div>
</div>
<div id="74ad7736-be5f-4968-af4a-ec39d60ae17e" class="cell" data-execution_count="27">
<div class="sourceCode cell-code" id="cb39" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb39-1">downscale_factor <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span></span>
<span id="cb39-2"> </span>
<span id="cb39-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Calculate new height and width using downscale_factor</span></span>
<span id="cb39-4">new_width <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> dem_xr.rio.width <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> downscale_factor</span>
<span id="cb39-5">new_height <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> dem_xr.rio.height <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> downscale_factor</span>
<span id="cb39-6"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>(new_width), <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>(new_height))</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>2760 2040</code></pre>
</div>
</div>
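<p>One small caveat about the shape calculation above: <code>int()</code> truncates, so if a product such as <code>width * downscale_factor</code> lands a hair below a whole number because of floating point rounding, a pixel is silently dropped. A minimal sketch of a helper that rounds instead is below. The function name <code>scaled_shape</code> is hypothetical, and the source dimensions of 20400 rows by 27600 columns are an assumption inferred from the printed output and the 1/10 factor.</p>

```python
def scaled_shape(height, width, factor):
    """Return (rows, cols) after scaling, rounded to the nearest whole pixel."""
    # round() avoids the truncation that int() applies to values like 2039.9999999
    # max(1, ...) keeps at least one pixel along each axis
    new_height = max(1, round(height * factor))
    new_width = max(1, round(width * factor))
    return new_height, new_width


# Assumed source dimensions (inferred from the printed 2760 x 2040 and the 1/10 factor)
print(scaled_shape(20400, 27600, 1 / 10))
```

<p>The tuple comes back in (height, width) order, which is the order the <code>shape</code> argument of <code>rio.reproject()</code> expects.</p>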
<p>We can resample and reproject from EPSG:4326 to ESRI:102008. I would assume that the resampling would take place first followed by the projection.</p>
<p>I’m also not sure what makes the most sense for the no data value. Our original data (in EPSG:4326) is all integer and its no data value is 32767. For ESRI:102008, the no data value is -9999. However, after resampling, the array values could be floats depending on how the resampling is done.</p>
<p>The <code>rasterio.warp.reproject()</code> function will pick up the <code>src_nodata</code> value from the <code>dem_xr</code> rioxarray <code>DataArray</code>, so these pixel values won’t get used during resampling.</p>
<p>A few helpful resources on no data values in raster files are:</p>
<ul>
<li><a href="https://corteva.github.io/rioxarray/stable/getting_started/nodata_management.html">Nodata management from rioxarray docs</a></li>
<li><a href="https://gis.stackexchange.com/questions/367342/downsides-to-saving-rasters-with-nan-as-no-data-values">Downsides to saving rasters with nan as no data values</a></li>
</ul>
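<p>To make the idea that no data pixels aren’t used during resampling concrete, here is a minimal pure Python sketch. It is not the algorithm GDAL or rasterio actually uses; it just averages each block of source cells while skipping the sentinel value, and writes <code>nan</code> when a block holds no valid cells. The function name <code>block_mean_ignoring_nodata</code> and the tiny 4 by 4 grid are made up for illustration; only the 32767 sentinel comes from our DEM.</p>

```python
import math

NODATA = 32767  # sentinel used by the original integer DEM


def block_mean_ignoring_nodata(grid, block, nodata=NODATA):
    """Downsample a 2D list by averaging block x block windows, skipping nodata."""
    n_rows = len(grid) // block
    n_cols = len(grid[0]) // block
    out = []
    for r in range(n_rows):
        out_row = []
        for c in range(n_cols):
            # Collect only the valid cells in this window
            window = [
                grid[r * block + i][c * block + j]
                for i in range(block)
                for j in range(block)
                if grid[r * block + i][c * block + j] != nodata
            ]
            # A window with no valid cells becomes nan in the float output
            out_row.append(sum(window) / len(window) if window else math.nan)
        out.append(out_row)
    return out


# Hypothetical elevations; 32767 marks cells outside the study area
dem = [
    [100, 102, 32767, 32767],
    [104, 32767, 32767, 32767],
    [110, 112, 120, 122],
    [114, 116, 124, 32767],
]
coarse = block_mean_ignoring_nodata(dem, block=2)
print(coarse)
```

<p>Had the sentinel been averaged in, the top-left cell would come out at roughly 8268 instead of 102, which is why the no data value needs to be known before resampling. It also hints at why a float output with <code>nan</code> as the no data value is a convenient choice.</p>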
<p>Now we are ready to resample and reproject by passing in our target projection, a resampling scheme, and the new shape of our target raster.</p>
<div id="02a56082-d76e-4b83-86d3-7e6239f8624e" class="cell" data-execution_count="28">
<div class="sourceCode cell-code" id="cb41" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb41-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Resample and reproject</span></span>
<span id="cb41-2">dem_102008_xr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> dem_xr.rio.reproject(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ESRI:102008'</span>, resampling<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>rasterio.enums.Resampling.bilinear,</span>
<span id="cb41-3">                                     shape<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>(new_height), <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>(new_width))) </span>
<span id="cb41-4">dem_102008_xr</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="28">
<div><svg style="position: absolute; width: 0; height: 0; overflow: hidden">
<defs>
<symbol id="icon-database" viewbox="0 0 32 32">
<path d="M16 0c-8.837 0-16 2.239-16 5v4c0 2.761 7.163 5 16 5s16-2.239 16-5v-4c0-2.761-7.163-5-16-5z"></path>
<path d="M16 17c-8.837 0-16-2.239-16-5v6c0 2.761 7.163 5 16 5s16-2.239 16-5v-6c0 2.761-7.163 5-16 5z"></path>
<path d="M16 26c-8.837 0-16-2.239-16-5v6c0 2.761 7.163 5 16 5s16-2.239 16-5v-6c0 2.761-7.163 5-16 5z"></path>
</symbol>
<symbol id="icon-file-text2" viewbox="0 0 32 32">
<path d="M28.681 7.159c-0.694-0.947-1.662-2.053-2.724-3.116s-2.169-2.030-3.116-2.724c-1.612-1.182-2.393-1.319-2.841-1.319h-15.5c-1.378 0-2.5 1.121-2.5 2.5v27c0 1.378 1.122 2.5 2.5 2.5h23c1.378 0 2.5-1.122 2.5-2.5v-19.5c0-0.448-0.137-1.23-1.319-2.841zM24.543 5.457c0.959 0.959 1.712 1.825 2.268 2.543h-4.811v-4.811c0.718 0.556 1.584 1.309 2.543 2.268zM28 29.5c0 0.271-0.229 0.5-0.5 0.5h-23c-0.271 0-0.5-0.229-0.5-0.5v-27c0-0.271 0.229-0.5 0.5-0.5 0 0 15.499-0 15.5 0v7c0 0.552 0.448 1 1 1h7v19.5z"></path>
<path d="M23 26h-14c-0.552 0-1-0.448-1-1s0.448-1 1-1h14c0.552 0 1 0.448 1 1s-0.448 1-1 1z"></path>
<path d="M23 22h-14c-0.552 0-1-0.448-1-1s0.448-1 1-1h14c0.552 0 1 0.448 1 1s-0.448 1-1 1z"></path>
<path d="M23 18h-14c-0.552 0-1-0.448-1-1s0.448-1 1-1h14c0.552 0 1 0.448 1 1s-0.448 1-1 1z"></path>
</symbol>
</defs>
</svg>
<style>/* CSS stylesheet for displaying xarray objects in jupyterlab.
 *
 */

:root {
  --xr-font-color0: var(--jp-content-font-color0, rgba(0, 0, 0, 1));
  --xr-font-color2: var(--jp-content-font-color2, rgba(0, 0, 0, 0.54));
  --xr-font-color3: var(--jp-content-font-color3, rgba(0, 0, 0, 0.38));
  --xr-border-color: var(--jp-border-color2, #e0e0e0);
  --xr-disabled-color: var(--jp-layout-color3, #bdbdbd);
  --xr-background-color: var(--jp-layout-color0, white);
  --xr-background-color-row-even: var(--jp-layout-color1, white);
  --xr-background-color-row-odd: var(--jp-layout-color2, #eeeeee);
}

html[theme=dark],
body[data-theme=dark],
body.vscode-dark {
  --xr-font-color0: rgba(255, 255, 255, 1);
  --xr-font-color2: rgba(255, 255, 255, 0.54);
  --xr-font-color3: rgba(255, 255, 255, 0.38);
  --xr-border-color: #1F1F1F;
  --xr-disabled-color: #515151;
  --xr-background-color: #111111;
  --xr-background-color-row-even: #111111;
  --xr-background-color-row-odd: #313131;
}

.xr-wrap {
  display: block !important;
  min-width: 300px;
  max-width: 700px;
}

.xr-text-repr-fallback {
  /* fallback to plain text repr when CSS is not injected (untrusted notebook) */
  display: none;
}

.xr-header {
  padding-top: 6px;
  padding-bottom: 6px;
  margin-bottom: 4px;
  border-bottom: solid 1px var(--xr-border-color);
}

.xr-header > div,
.xr-header > ul {
  display: inline;
  margin-top: 0;
  margin-bottom: 0;
}

.xr-obj-type,
.xr-array-name {
  margin-left: 2px;
  margin-right: 10px;
}

.xr-obj-type {
  color: var(--xr-font-color2);
}

.xr-sections {
  padding-left: 0 !important;
  display: grid;
  grid-template-columns: 150px auto auto 1fr 20px 20px;
}

.xr-section-item {
  display: contents;
}

.xr-section-item input {
  display: none;
}

.xr-section-item input + label {
  color: var(--xr-disabled-color);
}

.xr-section-item input:enabled + label {
  cursor: pointer;
  color: var(--xr-font-color2);
}

.xr-section-item input:enabled + label:hover {
  color: var(--xr-font-color0);
}

.xr-section-summary {
  grid-column: 1;
  color: var(--xr-font-color2);
  font-weight: 500;
}

.xr-section-summary > span {
  display: inline-block;
  padding-left: 0.5em;
}

.xr-section-summary-in:disabled + label {
  color: var(--xr-font-color2);
}

.xr-section-summary-in + label:before {
  display: inline-block;
  content: '►';
  font-size: 11px;
  width: 15px;
  text-align: center;
}

.xr-section-summary-in:disabled + label:before {
  color: var(--xr-disabled-color);
}

.xr-section-summary-in:checked + label:before {
  content: '▼';
}

.xr-section-summary-in:checked + label > span {
  display: none;
}

.xr-section-summary,
.xr-section-inline-details {
  padding-top: 4px;
  padding-bottom: 4px;
}

.xr-section-inline-details {
  grid-column: 2 / -1;
}

.xr-section-details {
  display: none;
  grid-column: 1 / -1;
  margin-bottom: 5px;
}

.xr-section-summary-in:checked ~ .xr-section-details {
  display: contents;
}

.xr-array-wrap {
  grid-column: 1 / -1;
  display: grid;
  grid-template-columns: 20px auto;
}

.xr-array-wrap > label {
  grid-column: 1;
  vertical-align: top;
}

.xr-preview {
  color: var(--xr-font-color3);
}

.xr-array-preview,
.xr-array-data {
  padding: 0 5px !important;
  grid-column: 2;
}

.xr-array-data,
.xr-array-in:checked ~ .xr-array-preview {
  display: none;
}

.xr-array-in:checked ~ .xr-array-data,
.xr-array-preview {
  display: inline-block;
}

.xr-dim-list {
  display: inline-block !important;
  list-style: none;
  padding: 0 !important;
  margin: 0;
}

.xr-dim-list li {
  display: inline-block;
  padding: 0;
  margin: 0;
}

.xr-dim-list:before {
  content: '(';
}

.xr-dim-list:after {
  content: ')';
}

.xr-dim-list li:not(:last-child):after {
  content: ',';
  padding-right: 5px;
}

.xr-has-index {
  font-weight: bold;
}

.xr-var-list,
.xr-var-item {
  display: contents;
}

.xr-var-item > div,
.xr-var-item label,
.xr-var-item > .xr-var-name span {
  background-color: var(--xr-background-color-row-even);
  margin-bottom: 0;
}

.xr-var-item > .xr-var-name:hover span {
  padding-right: 5px;
}

.xr-var-list > li:nth-child(odd) > div,
.xr-var-list > li:nth-child(odd) > label,
.xr-var-list > li:nth-child(odd) > .xr-var-name span {
  background-color: var(--xr-background-color-row-odd);
}

.xr-var-name {
  grid-column: 1;
}

.xr-var-dims {
  grid-column: 2;
}

.xr-var-dtype {
  grid-column: 3;
  text-align: right;
  color: var(--xr-font-color2);
}

.xr-var-preview {
  grid-column: 4;
}

.xr-index-preview {
  grid-column: 2 / 5;
  color: var(--xr-font-color2);
}

.xr-var-name,
.xr-var-dims,
.xr-var-dtype,
.xr-preview,
.xr-attrs dt {
  white-space: nowrap;
  overflow: hidden;
  text-overflow: ellipsis;
  padding-right: 10px;
}

.xr-var-name:hover,
.xr-var-dims:hover,
.xr-var-dtype:hover,
.xr-attrs dt:hover {
  overflow: visible;
  width: auto;
  z-index: 1;
}

.xr-var-attrs,
.xr-var-data,
.xr-index-data {
  display: none;
  background-color: var(--xr-background-color) !important;
  padding-bottom: 5px !important;
}

.xr-var-attrs-in:checked ~ .xr-var-attrs,
.xr-var-data-in:checked ~ .xr-var-data,
.xr-index-data-in:checked ~ .xr-index-data {
  display: block;
}

.xr-var-data > table {
  float: right;
}

.xr-var-name span,
.xr-var-data,
.xr-index-name div,
.xr-index-data,
.xr-attrs {
  padding-left: 25px !important;
}

.xr-attrs,
.xr-var-attrs,
.xr-var-data,
.xr-index-data {
  grid-column: 1 / -1;
}

dl.xr-attrs {
  padding: 0;
  margin: 0;
  display: grid;
  grid-template-columns: 125px auto;
}

.xr-attrs dt,
.xr-attrs dd {
  padding: 0;
  margin: 0;
  float: left;
  padding-right: 10px;
  width: auto;
}

.xr-attrs dt {
  font-weight: normal;
  grid-column: 1;
}

.xr-attrs dt:hover span {
  display: inline-block;
  background: var(--xr-background-color);
  padding-right: 10px;
}

.xr-attrs dd {
  grid-column: 2;
  white-space: pre-wrap;
  word-break: break-all;
}

.xr-icon-database,
.xr-icon-file-text2,
.xr-no-icon {
  display: inline-block;
  vertical-align: middle;
  width: 1em;
  height: 1.5em !important;
  stroke-width: 0;
  stroke: currentColor;
  fill: currentColor;
}
</style><pre class="xr-text-repr-fallback">&lt;xarray.DataArray (band: 1, y: 2040, x: 2760)&gt;
array([[[nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        ...,
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan]]], dtype=float32)
Coordinates:
  * x            (x) float64 -2.554e+06 -2.553e+06 ... -4.589e+05 -4.581e+05
  * y            (y) float64 1.722e+06 1.721e+06 ... -5.636e+05 -5.647e+05
  * band         (band) int64 1
    spatial_ref  int64 0
Attributes:
    AREA_OR_POINT:       Area
    RepresentationType:  THEMATIC
    scale_factor:        1.0
    add_offset:          0.0
    long_name:           Band_1</pre><div class="xr-wrap" style="display:none"><div class="xr-header"><div class="xr-obj-type">xarray.DataArray</div><div class="xr-array-name"></div><ul class="xr-dim-list"><li><span class="xr-has-index">band</span>: 1</li><li><span class="xr-has-index">y</span>: 2040</li><li><span class="xr-has-index">x</span>: 2760</li></ul></div><ul class="xr-sections"><li class="xr-section-item"><div class="xr-array-wrap"><input id="section-dff84547-8c18-4bd4-b5a7-aa214b4170d1" class="xr-array-in" type="checkbox" checked=""><label for="section-dff84547-8c18-4bd4-b5a7-aa214b4170d1" title="Show/hide data repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-array-preview xr-preview"><span>nan nan nan nan nan nan nan nan ... nan nan nan nan nan nan nan nan</span></div><div class="xr-array-data"><pre>array([[[nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        ...,
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan]]], dtype=float32)</pre></div></div></li><li class="xr-section-item"><input id="section-679d7bf2-60df-4bed-a875-f9e5899110ed" class="xr-section-summary-in" type="checkbox" checked=""><label for="section-679d7bf2-60df-4bed-a875-f9e5899110ed" class="xr-section-summary">Coordinates: <span>(4)</span></label><div class="xr-section-inline-details"></div><div class="xr-section-details"><ul class="xr-var-list"><li class="xr-var-item"><div class="xr-var-name"><span class="xr-has-index">x</span></div><div class="xr-var-dims">(x)</div><div class="xr-var-dtype">float64</div><div class="xr-var-preview xr-preview">-2.554e+06 ... -4.581e+05</div><input id="attrs-f3007e7d-6efa-4139-8b5f-9dbfb4150eb9" class="xr-var-attrs-in" type="checkbox"><label for="attrs-f3007e7d-6efa-4139-8b5f-9dbfb4150eb9" title="Show/Hide attributes"><svg class="icon xr-icon-file-text2"><use href="#icon-file-text2"></use></svg></label><input id="data-8f4b8eb7-90a8-4bf7-b9b5-e2125560aa8c" class="xr-var-data-in" type="checkbox"><label for="data-8f4b8eb7-90a8-4bf7-b9b5-e2125560aa8c" title="Show/Hide data repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-var-attrs"><dl class="xr-attrs"><dt><span>axis :</span></dt><dd>X</dd><dt><span>long_name :</span></dt><dd>x coordinate of projection</dd><dt><span>standard_name :</span></dt><dd>projection_x_coordinate</dd><dt><span>units :</span></dt><dd>metre</dd></dl></div><div class="xr-var-data"><pre>array([-2553828.668114, -2553069.081085, -2552309.494056, ...,  -459647.229615,
        -458887.642587,  -458128.055558])</pre></div></li><li class="xr-var-item"><div class="xr-var-name"><span class="xr-has-index">y</span></div><div class="xr-var-dims">(y)</div><div class="xr-var-dtype">float64</div><div class="xr-var-preview xr-preview">1.722e+06 1.721e+06 ... -5.647e+05</div><input id="attrs-ada93466-2ad4-4a9e-b90b-8b8d8d41f621" class="xr-var-attrs-in" type="checkbox"><label for="attrs-ada93466-2ad4-4a9e-b90b-8b8d8d41f621" title="Show/Hide attributes"><svg class="icon xr-icon-file-text2"><use href="#icon-file-text2"></use></svg></label><input id="data-bbd9a149-e2d2-4ba3-bc19-643c6cb3f30d" class="xr-var-data-in" type="checkbox"><label for="data-bbd9a149-e2d2-4ba3-bc19-643c6cb3f30d" title="Show/Hide data repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-var-attrs"><dl class="xr-attrs"><dt><span>axis :</span></dt><dd>Y</dd><dt><span>long_name :</span></dt><dd>y coordinate of projection</dd><dt><span>standard_name :</span></dt><dd>projection_y_coordinate</dd><dt><span>units :</span></dt><dd>metre</dd></dl></div><div class="xr-var-data"><pre>array([1721937.725242, 1720816.28022 , 1719694.835198, ..., -562445.783991,
       -563567.229013, -564688.674035])</pre></div></li><li class="xr-var-item"><div class="xr-var-name"><span class="xr-has-index">band</span></div><div class="xr-var-dims">(band)</div><div class="xr-var-dtype">int64</div><div class="xr-var-preview xr-preview">1</div><input id="attrs-b449e252-e1c8-43a4-ac32-0217dedcc61e" class="xr-var-attrs-in" type="checkbox" disabled=""><label for="attrs-b449e252-e1c8-43a4-ac32-0217dedcc61e" title="Show/Hide attributes"><svg class="icon xr-icon-file-text2"><use href="#icon-file-text2"></use></svg></label><input id="data-8fa288a4-5721-405b-93f5-ad98505d84ee" class="xr-var-data-in" type="checkbox"><label for="data-8fa288a4-5721-405b-93f5-ad98505d84ee" title="Show/Hide data repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-var-attrs"><dl class="xr-attrs"></dl></div><div class="xr-var-data"><pre>array([1])</pre></div></li><li class="xr-var-item"><div class="xr-var-name"><span>spatial_ref</span></div><div class="xr-var-dims">()</div><div class="xr-var-dtype">int64</div><div class="xr-var-preview xr-preview">0</div><input id="attrs-84797ab9-7f19-4537-b61d-a11c5a42e6a2" class="xr-var-attrs-in" type="checkbox"><label for="attrs-84797ab9-7f19-4537-b61d-a11c5a42e6a2" title="Show/Hide attributes"><svg class="icon xr-icon-file-text2"><use href="#icon-file-text2"></use></svg></label><input id="data-e3da646b-9ebe-4a77-bb29-abd8ab611503" class="xr-var-data-in" type="checkbox"><label for="data-e3da646b-9ebe-4a77-bb29-abd8ab611503" title="Show/Hide data repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-var-attrs"><dl class="xr-attrs"><dt><span>crs_wkt :</span></dt><dd>PROJCS["North_America_Albers_Equal_Area_Conic",GEOGCS["NAD83",DATUM["North_American_Datum_1983",SPHEROID["GRS 
1980",6378137,298.257222101,AUTHORITY["EPSG","7019"]],AUTHORITY["EPSG","6269"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4269"]],PROJECTION["Albers_Conic_Equal_Area"],PARAMETER["latitude_of_center",40],PARAMETER["longitude_of_center",-96],PARAMETER["standard_parallel_1",20],PARAMETER["standard_parallel_2",60],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["ESRI","102008"]]</dd><dt><span>semi_major_axis :</span></dt><dd>6378137.0</dd><dt><span>semi_minor_axis :</span></dt><dd>6356752.314140356</dd><dt><span>inverse_flattening :</span></dt><dd>298.257222101</dd><dt><span>reference_ellipsoid_name :</span></dt><dd>GRS 1980</dd><dt><span>longitude_of_prime_meridian :</span></dt><dd>0.0</dd><dt><span>prime_meridian_name :</span></dt><dd>Greenwich</dd><dt><span>geographic_crs_name :</span></dt><dd>NAD83</dd><dt><span>horizontal_datum_name :</span></dt><dd>North American Datum 1983</dd><dt><span>projected_crs_name :</span></dt><dd>North_America_Albers_Equal_Area_Conic</dd><dt><span>grid_mapping_name :</span></dt><dd>albers_conical_equal_area</dd><dt><span>standard_parallel :</span></dt><dd>(20.0, 60.0)</dd><dt><span>latitude_of_projection_origin :</span></dt><dd>40.0</dd><dt><span>longitude_of_central_meridian :</span></dt><dd>-96.0</dd><dt><span>false_easting :</span></dt><dd>0.0</dd><dt><span>false_northing :</span></dt><dd>0.0</dd><dt><span>spatial_ref :</span></dt><dd>PROJCS["North_America_Albers_Equal_Area_Conic",GEOGCS["NAD83",DATUM["North_American_Datum_1983",SPHEROID["GRS 
1980",6378137,298.257222101,AUTHORITY["EPSG","7019"]],AUTHORITY["EPSG","6269"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4269"]],PROJECTION["Albers_Conic_Equal_Area"],PARAMETER["latitude_of_center",40],PARAMETER["longitude_of_center",-96],PARAMETER["standard_parallel_1",20],PARAMETER["standard_parallel_2",60],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["ESRI","102008"]]</dd><dt><span>GeoTransform :</span></dt><dd>-2554208.4616283686 759.5870288351857 0.0 1722498.447752379 0.0 -1121.4450217146702</dd></dl></div><div class="xr-var-data"><pre>array(0)</pre></div></li></ul></div></li><li class="xr-section-item"><input id="section-3e56292b-a892-4062-b26b-5f5ac0773910" class="xr-section-summary-in" type="checkbox"><label for="section-3e56292b-a892-4062-b26b-5f5ac0773910" class="xr-section-summary">Indexes: <span>(3)</span></label><div class="xr-section-inline-details"></div><div class="xr-section-details"><ul class="xr-var-list"><li class="xr-var-item"><div class="xr-index-name"><div>x</div></div><div class="xr-index-preview">PandasIndex</div><div></div><input id="index-b2b5191e-21a8-4a37-9574-a5fa9dbd8298" class="xr-index-data-in" type="checkbox"><label for="index-b2b5191e-21a8-4a37-9574-a5fa9dbd8298" title="Show/Hide index repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-index-data"><pre>PandasIndex(Index([ -2553828.668113951,  -2553069.081085116, -2552309.4940562807,
       -2551549.9070274457, -2550790.3199986103, -2550030.7329697753,
         -2549271.14594094,  -2548511.558912105, -2547751.9718832695,
       -2546992.3848544345,
       ...
        -464964.3388171904,  -464204.7517883552,    -463445.16475952,
        -462685.5777306848,  -461925.9907018496, -461166.40367301437,
       -460406.81664417917, -459647.22961534397,   -458887.642586509,
        -458128.0555576738],
      dtype='float64', name='x', length=2760))</pre></div></li><li class="xr-var-item"><div class="xr-index-name"><div>y</div></div><div class="xr-index-preview">PandasIndex</div><div></div><input id="index-7eb07671-853a-4c5f-8a18-a30f8c3eb548" class="xr-index-data-in" type="checkbox"><label for="index-7eb07671-853a-4c5f-8a18-a30f8c3eb548" title="Show/Hide index repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-index-data"><pre>PandasIndex(Index([1721937.7252415218,  1720816.280219807, 1719694.8351980925,
       1718573.3901763777, 1717451.9451546632, 1716330.5001329484,
       1715209.0551112338,  1714087.610089519, 1712966.1650678045,
       1711844.7200460897,
       ...
       -554595.6688392586, -555717.1138609736, -556838.5588826882,
       -557960.0039044027, -559081.4489261173, -560202.8939478323,
       -561324.3389695468, -562445.7839912614, -563567.2290129759,
        -564688.674034691],
      dtype='float64', name='y', length=2040))</pre></div></li><li class="xr-var-item"><div class="xr-index-name"><div>band</div></div><div class="xr-index-preview">PandasIndex</div><div></div><input id="index-f4cc6f8f-3b89-429d-a087-1ff64204c100" class="xr-index-data-in" type="checkbox"><label for="index-f4cc6f8f-3b89-429d-a087-1ff64204c100" title="Show/Hide index repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-index-data"><pre>PandasIndex(Index([1], dtype='int64', name='band'))</pre></div></li></ul></div></li><li class="xr-section-item"><input id="section-1e5baeb7-4e86-4af3-aef7-f8eb587b3b30" class="xr-section-summary-in" type="checkbox" checked=""><label for="section-1e5baeb7-4e86-4af3-aef7-f8eb587b3b30" class="xr-section-summary">Attributes: <span>(5)</span></label><div class="xr-section-inline-details"></div><div class="xr-section-details"><dl class="xr-attrs"><dt><span>AREA_OR_POINT :</span></dt><dd>Area</dd><dt><span>RepresentationType :</span></dt><dd>THEMATIC</dd><dt><span>scale_factor :</span></dt><dd>1.0</dd><dt><span>add_offset :</span></dt><dd>0.0</dd><dt><span>long_name :</span></dt><dd>Band_1</dd></dl></div></li></ul></div></div>
</div>
</div>
<div id="2b5f6110-bb1d-4493-a2e3-6431d83eeace" class="cell" data-execution_count="29">
<div class="sourceCode cell-code" id="cb42" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb42-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Summary stats of new resampled raster</span></span>
<span id="cb42-2"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'crs: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>dem_102008_xr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>rio<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>crs<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb42-3"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'bounds: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>dem_102008_xr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>rio<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>bounds()<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb42-4"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'shape: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>dem_102008_xr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>shape<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb42-5"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'Number of cells: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>dem_102008_xr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> dem_102008_xr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:,}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb42-6"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'Number of data cells: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>dem_102008_xr.isnull())<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>values<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>()<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:,}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb42-7"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'nodata value: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>dem_102008_xr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>rio<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>nodata<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb42-8"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'original nodata value: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>dem_102008_xr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>rio<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>encoded_nodata<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>crs: ESRI:102008
bounds: (-2554208.4616283686, -565249.3965455483, -457748.2620432562, 1722498.447752379)
shape: (1, 2040, 2760)
Number of cells: 5,630,400
Number of data cells: 3,720,192
nodata value: nan
original nodata value: 32767.0</code></pre>
</div>
</div>
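<p>The bounds printed above and the <code>x</code> and <code>y</code> coordinate values in the <code>DataArray</code> can both be recovered from the <code>GeoTransform</code> stored on <code>spatial_ref</code>. The sketch below assumes the usual GDAL ordering of the six numbers (x origin, pixel width, row rotation, y origin, column rotation, pixel height) and a north-up raster with zero rotation; the helper names are hypothetical.</p>

```python
# GeoTransform string copied from the spatial_ref attributes of dem_102008_xr
geotransform = ("-2554208.4616283686 759.5870288351857 0.0 "
                "1722498.447752379 0.0 -1121.4450217146702")
x_origin, x_res, _, y_origin, _, y_res = map(float, geotransform.split())
height, width = 2040, 2760  # from the shape printed above


def bounds_from_geotransform():
    """Return (minx, miny, maxx, maxy) of the raster's outer cell edges."""
    maxx = x_origin + width * x_res
    miny = y_origin + height * y_res  # y_res is negative for north-up rasters
    return x_origin, miny, maxx, y_origin


def cell_center(row, col):
    """Return the (x, y) coordinate at the center of a cell."""
    return x_origin + (col + 0.5) * x_res, y_origin + (row + 0.5) * y_res


print(bounds_from_geotransform())
print(cell_center(0, 0))
print(f"cell size: {x_res:.1f} m wide by {abs(y_res):.1f} m tall")
```

<p>Note that the cells are not square: about 760 m wide by 1121 m tall. Presumably this is because we asked for a specific output shape, so the projected extent gets divided into that many rows and columns regardless of its aspect ratio.</p>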
<div id="d2785e1e-1049-4dc8-9975-426fdd606ad7" class="cell" data-execution_count="30">
<div class="sourceCode cell-code" id="cb44" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb44-1">dem_102008_xr[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>:<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>:<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>].values</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="30">
<pre><code>array([[nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       ...,
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan]], dtype=float32)</code></pre>
</div>
</div>
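<p>One thing to notice about the cell above: the printed array still elides columns with <code>...</code>, which a 10 by 10 result would not do with NumPy’s default print options. Chained slices like <code>[0:10][0:10]</code> each act on the first remaining axis, so the second slice re-slices the same ten rows rather than picking ten columns. The same behavior shows up with plain nested lists, as in the sketch below (the grid is made up). For a NumPy array or <code>DataArray</code>, a single index such as <code>[0, 0:10, 0:10]</code> or <code>.isel(y=slice(0, 10), x=slice(0, 10))</code> would select a true 10 by 10 window.</p>

```python
# A made-up 20 x 30 grid standing in for one band of a raster
grid = [[row * 100 + col for col in range(30)] for row in range(20)]

# Chained slices: both slices act on rows, so every column is kept
chained = grid[0:10][0:10]
print(len(chained), len(chained[0]))

# Slicing rows first and then each row's columns gives the intended window
window = [row[0:10] for row in grid[0:10]]
print(len(window), len(window[0]))
```

<p>For our purposes the difference is harmless since we only wanted a peek at the corner values, but it matters as soon as the slice feeds into a calculation.</p>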
<div id="0d2db57e-e445-4902-981e-c06123dd0535" class="cell" data-execution_count="31">
<div class="sourceCode cell-code" id="cb46" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb46-1">extent_esri_102008 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> dem_102008_xr.rio.bounds()</span>
<span id="cb46-2"></span>
<span id="cb46-3">minx_102008, miny_102008, maxx_102008, maxy_102008 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> extent_esri_102008</span></code></pre></div>
</div>
<p>Let’s just plot this raster with matplotlib. Since <code>nan</code> is the no data value, we don’t need to manually mask.</p>
<div id="e9d5c313-f783-4255-a347-73d5bf500e99" class="cell" data-execution_count="32">
<div class="sourceCode cell-code" id="cb47" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb47-1">dem_102008_xr.plot()</span></code></pre></div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/river_map_python/index_files/figure-html/cell-33-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>Success. Clearly, this <code>DataArray</code> contains information that allows matplotlib to create these “projection aware” axis labels. Or is matplotlib making some kind of assumption due to the non-rectangular nature of the data values? Hold that thought.</p>
<div id="78878438-b993-4e0f-9875-b18534df39a1" class="cell" data-execution_count="33">
<div class="sourceCode cell-code" id="cb48" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb48-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> pyproj <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> CRS</span></code></pre></div>
</div>
<div id="baedce50-d1bc-45d5-b1e2-765222e9dee0" class="cell" data-execution_count="34">
<div class="sourceCode cell-code" id="cb49" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb49-1">crs_102008 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> CRS(dem_102008_xr.rio.crs)</span>
<span id="cb49-2">crs_102008.axis_info</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="34">
<pre><code>[Axis(name=Easting, abbrev=, direction=east, unit_auth_code=EPSG, unit_code=9001, unit_name=metre),
 Axis(name=Northing, abbrev=, direction=north, unit_auth_code=EPSG, unit_code=9001, unit_name=metre)]</code></pre>
</div>
</div>
<p>While I was figuring all this out, my JupyterLab kernel frequently crashed during reprojection-related operations.</p>
<p>This led me to explore other options such as:</p>
<ul>
<li><a href="http://erouault.blogspot.com/2015/10/gdal-and-ogr-utilities-as-library.html">using the GDAL</a> executable program <a href="https://gdal.org/programs/gdalwarp.html#gdalwarp">gdalwarp</a> to do the reprojection, with the underlying TIFF as the source and a new TIFF as the output,</li>
<li>using the Python bindings to GDAL (<code>from osgeo import gdal</code>) and calling <code>gdal.Warp</code> - see <a href="https://gis.stackexchange.com/questions/233589/re-project-raster-in-python-using-gdal">https://gis.stackexchange.com/questions/233589/re-project-raster-in-python-using-gdal</a>.</li>
</ul>
<p>With respect to the Python bindings, I wasn’t sure whether this was any different from using rioxarray, as I was guessing that, under the hood, both approaches still use Python to access GDAL. Also, it’s <a href="https://rasterio.readthedocs.io/en/stable/topics/switch.html">strongly recommended</a> not to import <code>gdal</code> and <code>rasterio</code> in the same notebook session.</p>
</section>
<section id="reprojecting-and-resampling-with-gdal-command-line-utilities" class="level3">
<h3 class="anchored" data-anchor-id="reprojecting-and-resampling-with-gdal-command-line-utilities">Reprojecting and resampling with GDAL command line utilities</h3>
<p>As I describe here, <a href="https://bitsofanalytics.org/posts/algaebloom-part1/#gdal---the-geospatial-data-abstraction-library">GDAL is an indispensable part of computational geospatial work</a>. In its set of command line utilities you can find <a href="https://gdal.org/programs/gdalwarp.html#gdalwarp">gdalwarp</a> which facilitates resampling, reprojection and even clipping. I found this <a href="http://jgomezdans.github.io/gdal_notes/index.html">set of notes</a> that introduces GDAL along with Python.</p>
<p>Let’s create some variables for our input and output filenames.</p>
<div id="871f8034-90eb-43f9-9cbe-4cde6ca808a8" class="cell" data-execution_count="35">
<div class="sourceCode cell-code" id="cb51" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb51-1">dem_file <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Path(WSFR_DATA_ROOT, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'hydrosheds/na_con_3s/na_con_3s.tif'</span>)</span>
<span id="cb51-2">dem_file_102008_gdalwarp <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Path(WSFR_DATA_ROOT, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'hydrosheds/na_con_3s/na_con_3s_102008_gdalwarp.tif'</span>)</span></code></pre></div>
</div>
<p>Recall that we’ve got a bounding box specified in longitude and latitude.</p>
<div id="3e97dc10-466e-4cb4-83d2-a048b35b2b1e" class="cell" data-execution_count="36">
<div class="sourceCode cell-code" id="cb52" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb52-1"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(extent_epsg_4326)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[-126, 35, -103, 52]</code></pre>
</div>
</div>
<p>Here are the arguments used for <code>gdalwarp</code>:</p>
<ul>
<li><code>-overwrite</code> - overwrite the target file if it already exists,</li>
<li><code>-t_srs ESRI:102008</code> - spatial reference system for the target,</li>
<li><code>-srcnodata 32767</code> - the no data value in the source file,</li>
<li><code>-dstnodata -9999</code> - the no data value to use in the target file,</li>
<li><code>-r bilinear</code> - resampling method,</li>
<li><code>-ts 2760 2040</code> - width and height in pixels for the target,</li>
<li><code>-te -126 35 -103 52</code> - the extent for the destination file. The default is to use the target spatial reference system, but…</li>
<li><code>-te_srs EPSG:4326</code> - you can specify the spatial reference system used to define the extent.</li>
</ul>
<div id="e8aa57b2-a64e-41ac-b776-6921a1137b62" class="cell" data-execution_count="37">
<div class="sourceCode cell-code" id="cb54" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb54-1">gdalwarp_resample_cmd <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'gdalwarp -overwrite -t_srs ESRI:102008 -srcnodata 32767 -dstnodata -9999 -r bilinear -ts </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>dem_102008_xr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>dem_102008_xr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> -te </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>minx_4326<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>miny_4326<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>maxx_4326<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>maxy_4326<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> -te_srs EPSG:4326 </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>dem_file<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>dem_file_102008_gdalwarp<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span></span>
<span id="cb54-2"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span>{gdalwarp_resample_cmd}</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Creating output file that is 2760P x 2040L.
Processing /home/mark/Documents/projects/driven_data/wsfrodeo/data/hydrosheds/na_con_3s/na_con_3s.tif [1/1] : 0...10...20...30...40...50...60...70...80...90...100 - done.</code></pre>
</div>
</div>
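<p>If you would rather not build one long shell string, the same flags can be assembled as an argument list. This is only a minimal sketch and not the approach used in this post: <code>build_gdalwarp_args</code> is a hypothetical helper, the file names are placeholders, and the size and extent are the values shown above.</p>

```python
import shlex
from pathlib import Path


def build_gdalwarp_args(src, dst, width, height, bbox_4326):
    """Return a gdalwarp argument list that reprojects src to ESRI:102008.

    Hypothetical helper for illustration; flags mirror the command above.
    """
    minx, miny, maxx, maxy = bbox_4326
    return [
        "gdalwarp", "-overwrite",
        "-t_srs", "ESRI:102008",
        "-srcnodata", "32767",
        "-dstnodata", "-9999",
        "-r", "bilinear",
        "-ts", str(width), str(height),
        "-te", str(minx), str(miny), str(maxx), str(maxy),
        "-te_srs", "EPSG:4326",
        str(src), str(dst),
    ]


args = build_gdalwarp_args(
    Path("na_con_3s.tif"),                  # placeholder input path
    Path("na_con_3s_102008_gdalwarp.tif"),  # placeholder output path
    2760, 2040,                             # target width and height in pixels
    [-126, 35, -103, 52],                   # extent in longitude and latitude
)

# Show the command as a single, safely quoted string.
print(shlex.join(args))

# To actually run it (requires gdalwarp on the PATH):
# import subprocess
# subprocess.run(args, check=True)
```

<p>Passing a list to <code>subprocess.run</code> avoids shell quoting problems if a path contains spaces, at the cost of being a little more verbose than the notebook <code>!</code> syntax.</p>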
<p>Now read in the resampled and clipped GeoTIFF. We can use the <code>masked=True</code> option to convert the no data values to NaN.</p>
<div id="2a8336ff-91ae-430d-819d-9fbbc615588a" class="cell" data-execution_count="38">
<div class="sourceCode cell-code" id="cb56" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb56-1">dem_102008_gdalwarp_xr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rioxarray.open_rasterio(Path(dem_file_102008_gdalwarp), masked<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span></code></pre></div>
</div>
<div id="e9cf745e-3fc1-4a5d-b1a2-ad30a2319cec" class="cell" data-execution_count="39">
<div class="sourceCode cell-code" id="cb57" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb57-1">dem_102008_gdalwarp_xr</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="39">
<div><svg style="position: absolute; width: 0; height: 0; overflow: hidden">
<defs>
<symbol id="icon-database" viewbox="0 0 32 32">
<path d="M16 0c-8.837 0-16 2.239-16 5v4c0 2.761 7.163 5 16 5s16-2.239 16-5v-4c0-2.761-7.163-5-16-5z"></path>
<path d="M16 17c-8.837 0-16-2.239-16-5v6c0 2.761 7.163 5 16 5s16-2.239 16-5v-6c0 2.761-7.163 5-16 5z"></path>
<path d="M16 26c-8.837 0-16-2.239-16-5v6c0 2.761 7.163 5 16 5s16-2.239 16-5v-6c0 2.761-7.163 5-16 5z"></path>
</symbol>
<symbol id="icon-file-text2" viewbox="0 0 32 32">
<path d="M28.681 7.159c-0.694-0.947-1.662-2.053-2.724-3.116s-2.169-2.030-3.116-2.724c-1.612-1.182-2.393-1.319-2.841-1.319h-15.5c-1.378 0-2.5 1.121-2.5 2.5v27c0 1.378 1.122 2.5 2.5 2.5h23c1.378 0 2.5-1.122 2.5-2.5v-19.5c0-0.448-0.137-1.23-1.319-2.841zM24.543 5.457c0.959 0.959 1.712 1.825 2.268 2.543h-4.811v-4.811c0.718 0.556 1.584 1.309 2.543 2.268zM28 29.5c0 0.271-0.229 0.5-0.5 0.5h-23c-0.271 0-0.5-0.229-0.5-0.5v-27c0-0.271 0.229-0.5 0.5-0.5 0 0 15.499-0 15.5 0v7c0 0.552 0.448 1 1 1h7v19.5z"></path>
<path d="M23 26h-14c-0.552 0-1-0.448-1-1s0.448-1 1-1h14c0.552 0 1 0.448 1 1s-0.448 1-1 1z"></path>
<path d="M23 22h-14c-0.552 0-1-0.448-1-1s0.448-1 1-1h14c0.552 0 1 0.448 1 1s-0.448 1-1 1z"></path>
<path d="M23 18h-14c-0.552 0-1-0.448-1-1s0.448-1 1-1h14c0.552 0 1 0.448 1 1s-0.448 1-1 1z"></path>
</symbol>
</defs>
</svg>
<style>/* CSS stylesheet for displaying xarray objects in jupyterlab.
 *
 */

:root {
  --xr-font-color0: var(--jp-content-font-color0, rgba(0, 0, 0, 1));
  --xr-font-color2: var(--jp-content-font-color2, rgba(0, 0, 0, 0.54));
  --xr-font-color3: var(--jp-content-font-color3, rgba(0, 0, 0, 0.38));
  --xr-border-color: var(--jp-border-color2, #e0e0e0);
  --xr-disabled-color: var(--jp-layout-color3, #bdbdbd);
  --xr-background-color: var(--jp-layout-color0, white);
  --xr-background-color-row-even: var(--jp-layout-color1, white);
  --xr-background-color-row-odd: var(--jp-layout-color2, #eeeeee);
}

html[theme=dark],
body[data-theme=dark],
body.vscode-dark {
  --xr-font-color0: rgba(255, 255, 255, 1);
  --xr-font-color2: rgba(255, 255, 255, 0.54);
  --xr-font-color3: rgba(255, 255, 255, 0.38);
  --xr-border-color: #1F1F1F;
  --xr-disabled-color: #515151;
  --xr-background-color: #111111;
  --xr-background-color-row-even: #111111;
  --xr-background-color-row-odd: #313131;
}

.xr-wrap {
  display: block !important;
  min-width: 300px;
  max-width: 700px;
}

.xr-text-repr-fallback {
  /* fallback to plain text repr when CSS is not injected (untrusted notebook) */
  display: none;
}

.xr-header {
  padding-top: 6px;
  padding-bottom: 6px;
  margin-bottom: 4px;
  border-bottom: solid 1px var(--xr-border-color);
}

.xr-header > div,
.xr-header > ul {
  display: inline;
  margin-top: 0;
  margin-bottom: 0;
}

.xr-obj-type,
.xr-array-name {
  margin-left: 2px;
  margin-right: 10px;
}

.xr-obj-type {
  color: var(--xr-font-color2);
}

.xr-sections {
  padding-left: 0 !important;
  display: grid;
  grid-template-columns: 150px auto auto 1fr 20px 20px;
}

.xr-section-item {
  display: contents;
}

.xr-section-item input {
  display: none;
}

.xr-section-item input + label {
  color: var(--xr-disabled-color);
}

.xr-section-item input:enabled + label {
  cursor: pointer;
  color: var(--xr-font-color2);
}

.xr-section-item input:enabled + label:hover {
  color: var(--xr-font-color0);
}

.xr-section-summary {
  grid-column: 1;
  color: var(--xr-font-color2);
  font-weight: 500;
}

.xr-section-summary > span {
  display: inline-block;
  padding-left: 0.5em;
}

.xr-section-summary-in:disabled + label {
  color: var(--xr-font-color2);
}

.xr-section-summary-in + label:before {
  display: inline-block;
  content: '►';
  font-size: 11px;
  width: 15px;
  text-align: center;
}

.xr-section-summary-in:disabled + label:before {
  color: var(--xr-disabled-color);
}

.xr-section-summary-in:checked + label:before {
  content: '▼';
}

.xr-section-summary-in:checked + label > span {
  display: none;
}

.xr-section-summary,
.xr-section-inline-details {
  padding-top: 4px;
  padding-bottom: 4px;
}

.xr-section-inline-details {
  grid-column: 2 / -1;
}

.xr-section-details {
  display: none;
  grid-column: 1 / -1;
  margin-bottom: 5px;
}

.xr-section-summary-in:checked ~ .xr-section-details {
  display: contents;
}

.xr-array-wrap {
  grid-column: 1 / -1;
  display: grid;
  grid-template-columns: 20px auto;
}

.xr-array-wrap > label {
  grid-column: 1;
  vertical-align: top;
}

.xr-preview {
  color: var(--xr-font-color3);
}

.xr-array-preview,
.xr-array-data {
  padding: 0 5px !important;
  grid-column: 2;
}

.xr-array-data,
.xr-array-in:checked ~ .xr-array-preview {
  display: none;
}

.xr-array-in:checked ~ .xr-array-data,
.xr-array-preview {
  display: inline-block;
}

.xr-dim-list {
  display: inline-block !important;
  list-style: none;
  padding: 0 !important;
  margin: 0;
}

.xr-dim-list li {
  display: inline-block;
  padding: 0;
  margin: 0;
}

.xr-dim-list:before {
  content: '(';
}

.xr-dim-list:after {
  content: ')';
}

.xr-dim-list li:not(:last-child):after {
  content: ',';
  padding-right: 5px;
}

.xr-has-index {
  font-weight: bold;
}

.xr-var-list,
.xr-var-item {
  display: contents;
}

.xr-var-item > div,
.xr-var-item label,
.xr-var-item > .xr-var-name span {
  background-color: var(--xr-background-color-row-even);
  margin-bottom: 0;
}

.xr-var-item > .xr-var-name:hover span {
  padding-right: 5px;
}

.xr-var-list > li:nth-child(odd) > div,
.xr-var-list > li:nth-child(odd) > label,
.xr-var-list > li:nth-child(odd) > .xr-var-name span {
  background-color: var(--xr-background-color-row-odd);
}

.xr-var-name {
  grid-column: 1;
}

.xr-var-dims {
  grid-column: 2;
}

.xr-var-dtype {
  grid-column: 3;
  text-align: right;
  color: var(--xr-font-color2);
}

.xr-var-preview {
  grid-column: 4;
}

.xr-index-preview {
  grid-column: 2 / 5;
  color: var(--xr-font-color2);
}

.xr-var-name,
.xr-var-dims,
.xr-var-dtype,
.xr-preview,
.xr-attrs dt {
  white-space: nowrap;
  overflow: hidden;
  text-overflow: ellipsis;
  padding-right: 10px;
}

.xr-var-name:hover,
.xr-var-dims:hover,
.xr-var-dtype:hover,
.xr-attrs dt:hover {
  overflow: visible;
  width: auto;
  z-index: 1;
}

.xr-var-attrs,
.xr-var-data,
.xr-index-data {
  display: none;
  background-color: var(--xr-background-color) !important;
  padding-bottom: 5px !important;
}

.xr-var-attrs-in:checked ~ .xr-var-attrs,
.xr-var-data-in:checked ~ .xr-var-data,
.xr-index-data-in:checked ~ .xr-index-data {
  display: block;
}

.xr-var-data > table {
  float: right;
}

.xr-var-name span,
.xr-var-data,
.xr-index-name div,
.xr-index-data,
.xr-attrs {
  padding-left: 25px !important;
}

.xr-attrs,
.xr-var-attrs,
.xr-var-data,
.xr-index-data {
  grid-column: 1 / -1;
}

dl.xr-attrs {
  padding: 0;
  margin: 0;
  display: grid;
  grid-template-columns: 125px auto;
}

.xr-attrs dt,
.xr-attrs dd {
  padding: 0;
  margin: 0;
  float: left;
  padding-right: 10px;
  width: auto;
}

.xr-attrs dt {
  font-weight: normal;
  grid-column: 1;
}

.xr-attrs dt:hover span {
  display: inline-block;
  background: var(--xr-background-color);
  padding-right: 10px;
}

.xr-attrs dd {
  grid-column: 2;
  white-space: pre-wrap;
  word-break: break-all;
}

.xr-icon-database,
.xr-icon-file-text2,
.xr-no-icon {
  display: inline-block;
  vertical-align: middle;
  width: 1em;
  height: 1.5em !important;
  stroke-width: 0;
  stroke: currentColor;
  fill: currentColor;
}
</style><pre class="xr-text-repr-fallback">&lt;xarray.DataArray (band: 1, y: 2040, x: 2760)&gt;
[5630400 values with dtype=float32]
Coordinates:
  * band         (band) int64 1
  * x            (x) float64 -2.554e+06 -2.553e+06 ... -4.589e+05 -4.582e+05
  * y            (y) float64 1.431e+06 1.43e+06 ... -1.789e+05 -1.797e+05
    spatial_ref  int64 0
Attributes:
    AREA_OR_POINT:       Area
    RepresentationType:  THEMATIC
    scale_factor:        1.0
    add_offset:          0.0
    long_name:           Band_1</pre><div class="xr-wrap" style="display:none"><div class="xr-header"><div class="xr-obj-type">xarray.DataArray</div><div class="xr-array-name"></div><ul class="xr-dim-list"><li><span class="xr-has-index">band</span>: 1</li><li><span class="xr-has-index">y</span>: 2040</li><li><span class="xr-has-index">x</span>: 2760</li></ul></div><ul class="xr-sections"><li class="xr-section-item"><div class="xr-array-wrap"><input id="section-07a44216-c7e9-4c3e-bf09-05797f282889" class="xr-array-in" type="checkbox" checked=""><label for="section-07a44216-c7e9-4c3e-bf09-05797f282889" title="Show/hide data repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-array-preview xr-preview"><span>...</span></div><div class="xr-array-data"><pre>[5630400 values with dtype=float32]</pre></div></div></li><li class="xr-section-item"><input id="section-45702ccd-9313-4a8b-9f36-2f571d7bb2d2" class="xr-section-summary-in" type="checkbox" checked=""><label for="section-45702ccd-9313-4a8b-9f36-2f571d7bb2d2" class="xr-section-summary">Coordinates: <span>(4)</span></label><div class="xr-section-inline-details"></div><div class="xr-section-details"><ul class="xr-var-list"><li class="xr-var-item"><div class="xr-var-name"><span class="xr-has-index">band</span></div><div class="xr-var-dims">(band)</div><div class="xr-var-dtype">int64</div><div class="xr-var-preview xr-preview">1</div><input id="attrs-7abe2348-9c18-4330-81ae-bb29b9719977" class="xr-var-attrs-in" type="checkbox" disabled=""><label for="attrs-7abe2348-9c18-4330-81ae-bb29b9719977" title="Show/Hide attributes"><svg class="icon xr-icon-file-text2"><use href="#icon-file-text2"></use></svg></label><input id="data-7ab7283f-5d0b-4006-a98d-f2d9ae45bc18" class="xr-var-data-in" type="checkbox"><label for="data-7ab7283f-5d0b-4006-a98d-f2d9ae45bc18" title="Show/Hide data repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div 
class="xr-var-attrs"><dl class="xr-attrs"></dl></div><div class="xr-var-data"><pre>array([1])</pre></div></li><li class="xr-var-item"><div class="xr-var-name"><span class="xr-has-index">x</span></div><div class="xr-var-dims">(x)</div><div class="xr-var-dtype">float64</div><div class="xr-var-preview xr-preview">-2.554e+06 ... -4.582e+05</div><input id="attrs-82bdd010-689c-45de-b968-6dee99bbb119" class="xr-var-attrs-in" type="checkbox" disabled=""><label for="attrs-82bdd010-689c-45de-b968-6dee99bbb119" title="Show/Hide attributes"><svg class="icon xr-icon-file-text2"><use href="#icon-file-text2"></use></svg></label><input id="data-d4e93f34-6b7f-4f0c-9ee8-295caeb80f02" class="xr-var-data-in" type="checkbox"><label for="data-d4e93f34-6b7f-4f0c-9ee8-295caeb80f02" title="Show/Hide data repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-var-attrs"><dl class="xr-attrs"></dl></div><div class="xr-var-data"><pre>array([-2553828.679341, -2553069.114766, -2552309.550191, ...,  -459709.146456,
        -458949.581881,  -458190.017307])</pre></div></li><li class="xr-var-item"><div class="xr-var-name"><span class="xr-has-index">y</span></div><div class="xr-var-dims">(y)</div><div class="xr-var-dtype">float64</div><div class="xr-var-preview xr-preview">1.431e+06 1.43e+06 ... -1.797e+05</div><input id="attrs-41c08bdf-3a8b-4b31-ba02-aec1281d9ae6" class="xr-var-attrs-in" type="checkbox" disabled=""><label for="attrs-41c08bdf-3a8b-4b31-ba02-aec1281d9ae6" title="Show/Hide attributes"><svg class="icon xr-icon-file-text2"><use href="#icon-file-text2"></use></svg></label><input id="data-64354b5e-c81f-4b1d-a90d-705946daa73b" class="xr-var-data-in" type="checkbox"><label for="data-64354b5e-c81f-4b1d-a90d-705946daa73b" title="Show/Hide data repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-var-attrs"><dl class="xr-attrs"></dl></div><div class="xr-var-data"><pre>array([1430769.171373, 1429979.350551, 1429189.52973 , ..., -178095.842566,
       -178885.663388, -179675.484209])</pre></div></li><li class="xr-var-item"><div class="xr-var-name"><span>spatial_ref</span></div><div class="xr-var-dims">()</div><div class="xr-var-dtype">int64</div><div class="xr-var-preview xr-preview">0</div><input id="attrs-a2b71f70-7a76-4351-ab30-e0581f61d3a1" class="xr-var-attrs-in" type="checkbox"><label for="attrs-a2b71f70-7a76-4351-ab30-e0581f61d3a1" title="Show/Hide attributes"><svg class="icon xr-icon-file-text2"><use href="#icon-file-text2"></use></svg></label><input id="data-47472c06-2d9b-4ffa-9b5b-e59b5479f32e" class="xr-var-data-in" type="checkbox"><label for="data-47472c06-2d9b-4ffa-9b5b-e59b5479f32e" title="Show/Hide data repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-var-attrs"><dl class="xr-attrs"><dt><span>crs_wkt :</span></dt><dd>PROJCS["North_America_Albers_Equal_Area_Conic",GEOGCS["NAD83",DATUM["North_American_Datum_1983",SPHEROID["GRS 1980",6378137,298.257222101004,AUTHORITY["EPSG","7019"]],AUTHORITY["EPSG","6269"]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4269"]],PROJECTION["Albers_Conic_Equal_Area"],PARAMETER["latitude_of_center",40],PARAMETER["longitude_of_center",-96],PARAMETER["standard_parallel_1",20],PARAMETER["standard_parallel_2",60],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH]]</dd><dt><span>semi_major_axis :</span></dt><dd>6378137.0</dd><dt><span>semi_minor_axis :</span></dt><dd>6356752.314140356</dd><dt><span>inverse_flattening :</span></dt><dd>298.257222101004</dd><dt><span>reference_ellipsoid_name :</span></dt><dd>GRS 1980</dd><dt><span>longitude_of_prime_meridian :</span></dt><dd>0.0</dd><dt><span>prime_meridian_name :</span></dt><dd>Greenwich</dd><dt><span>geographic_crs_name :</span></dt><dd>NAD83</dd><dt><span>horizontal_datum_name :</span></dt><dd>North American Datum 
1983</dd><dt><span>projected_crs_name :</span></dt><dd>North_America_Albers_Equal_Area_Conic</dd><dt><span>grid_mapping_name :</span></dt><dd>albers_conical_equal_area</dd><dt><span>standard_parallel :</span></dt><dd>(20.0, 60.0)</dd><dt><span>latitude_of_projection_origin :</span></dt><dd>40.0</dd><dt><span>longitude_of_central_meridian :</span></dt><dd>-96.0</dd><dt><span>false_easting :</span></dt><dd>0.0</dd><dt><span>false_northing :</span></dt><dd>0.0</dd><dt><span>spatial_ref :</span></dt><dd>PROJCS["North_America_Albers_Equal_Area_Conic",GEOGCS["NAD83",DATUM["North_American_Datum_1983",SPHEROID["GRS 1980",6378137,298.257222101004,AUTHORITY["EPSG","7019"]],AUTHORITY["EPSG","6269"]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4269"]],PROJECTION["Albers_Conic_Equal_Area"],PARAMETER["latitude_of_center",40],PARAMETER["longitude_of_center",-96],PARAMETER["standard_parallel_1",20],PARAMETER["standard_parallel_2",60],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH]]</dd><dt><span>GeoTransform :</span></dt><dd>-2554208.4616283667 759.5645748584358 0.0 1431164.0817839599 0.0 -789.8208217668104</dd></dl></div><div class="xr-var-data"><pre>array(0)</pre></div></li></ul></div></li><li class="xr-section-item"><input id="section-0c22755c-9fd1-435b-9177-ac7aec848868" class="xr-section-summary-in" type="checkbox"><label for="section-0c22755c-9fd1-435b-9177-ac7aec848868" class="xr-section-summary">Indexes: <span>(3)</span></label><div class="xr-section-inline-details"></div><div class="xr-section-details"><ul class="xr-var-list"><li class="xr-var-item"><div class="xr-index-name"><div>band</div></div><div class="xr-index-preview">PandasIndex</div><div></div><input id="index-974586d0-dc52-4a9e-9c7a-203a22f5884f" class="xr-index-data-in" type="checkbox"><label for="index-974586d0-dc52-4a9e-9c7a-203a22f5884f" title="Show/Hide 
index repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-index-data"><pre>PandasIndex(Index([1], dtype='int64', name='band'))</pre></div></li><li class="xr-var-item"><div class="xr-index-name"><div>x</div></div><div class="xr-index-preview">PandasIndex</div><div></div><input id="index-80c8ba46-445d-4b30-ab9d-1f2b399ca488" class="xr-index-data-in" type="checkbox"><label for="index-80c8ba46-445d-4b30-ab9d-1f2b399ca488" title="Show/Hide index repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-index-data"><pre>PandasIndex(Index([-2553828.6793409376,  -2553069.114766079,  -2552309.550191221,
        -2551549.985616362,  -2550790.421041504, -2550030.8564666454,
        -2549271.291891787, -2548511.7273169286,   -2547752.16274207,
       -2546992.5981672117,
       ...
        -465026.0984802393,  -464266.5339053809, -463506.96933052246,
       -462747.40475566406,  -461987.8401808054,   -461228.275605947,
        -460468.7110310886,  -459709.1464562302,  -458949.5818813718,
        -458190.0173065134],
      dtype='float64', name='x', length=2760))</pre></div></li><li class="xr-var-item"><div class="xr-index-name"><div>y</div></div><div class="xr-index-preview">PandasIndex</div><div></div><input id="index-d77cccbc-6dce-483d-8d25-12b63bee5e9b" class="xr-index-data-in" type="checkbox"><label for="index-d77cccbc-6dce-483d-8d25-12b63bee5e9b" title="Show/Hide index repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-index-data"><pre>PandasIndex(Index([ 1430769.1713730765,  1429979.3505513098,  1429189.5297295428,
         1428399.708907776,  1427609.8880860093,  1426820.0672642426,
        1426030.2464424756,  1425240.4256207088,   1424450.604798942,
        1423660.7839771751,
       ...
        -172567.0968135486, -173356.91763531533, -174146.73845708207,
       -174936.55927884905,  -175726.3801006158, -176516.20092238253,
       -177306.02174414927, -178095.84256591625,   -178885.663387683,
       -179675.48420944973],
      dtype='float64', name='y', length=2040))</pre></div></li></ul></div></li><li class="xr-section-item"><input id="section-779d0067-53ab-4d9e-b497-cf5d542f8cd7" class="xr-section-summary-in" type="checkbox" checked=""><label for="section-779d0067-53ab-4d9e-b497-cf5d542f8cd7" class="xr-section-summary">Attributes: <span>(5)</span></label><div class="xr-section-inline-details"></div><div class="xr-section-details"><dl class="xr-attrs"><dt><span>AREA_OR_POINT :</span></dt><dd>Area</dd><dt><span>RepresentationType :</span></dt><dd>THEMATIC</dd><dt><span>scale_factor :</span></dt><dd>1.0</dd><dt><span>add_offset :</span></dt><dd>0.0</dd><dt><span>long_name :</span></dt><dd>Band_1</dd></dl></div></li></ul></div></div>
</div>
</div>
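<p>As a quick sanity check on the output above, the <code>x</code> and <code>y</code> coordinate values can be recovered from the <code>GeoTransform</code> entry in the <code>spatial_ref</code> attributes. The following is a minimal sketch that assumes a north-up raster with no rotation (the two zero terms) and that coordinates refer to pixel centres; <code>pixel_center</code> is a hypothetical helper name.</p>

```python
# GeoTransform values copied from the spatial_ref attributes shown above:
# (x origin, pixel width, rotation, y origin, rotation, pixel height)
x_origin, pixel_width, _, y_origin, _, pixel_height = (
    -2554208.4616283667, 759.5645748584358, 0.0,
    1431164.0817839599, 0.0, -789.8208217668104,
)


def pixel_center(col, row):
    """Return the (x, y) coordinate of the centre of pixel (col, row)."""
    x = x_origin + (col + 0.5) * pixel_width
    y = y_origin + (row + 0.5) * pixel_height
    return x, y


print(pixel_center(0, 0))        # compare with the first x and y values
print(pixel_center(2759, 2039))  # compare with the last x and y values
```

<p>The origin in the <code>GeoTransform</code> is the outer corner of the top-left pixel, which is why the first coordinate value is offset from it by half a pixel.</p>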
<div id="2ef42b6f-38af-4547-984e-ff736b6498e4" class="cell" data-execution_count="40">
<div class="sourceCode cell-code" id="cb58" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb58-1">dem_102008_gdalwarp_xr[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>:<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>:<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>].values</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="40">
<pre><code>array([[ nan,  nan,  nan, ..., 609., 609., 609.],
       [ nan,  nan,  nan, ..., 608., 607., 606.],
       [ nan,  nan,  nan, ..., 607., 606., 605.],
       ...,
       [ nan,  nan,  nan, ..., 601., 599., 595.],
       [ nan,  nan,  nan, ..., 598., 596., 593.],
       [ nan,  nan,  nan, ..., 595., 593., 590.]], dtype=float32)</code></pre>
</div>
</div>
<p>These values clearly differ from those in <code>dem_102008_xr</code>, where the same slice showed only <code>nan</code>: here the right-hand columns hold valid elevations. It seems that <code>rio.reproject</code> is behaving differently than <code>gdalwarp</code>.</p>
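<p>A side note on the indexing used in these cells: with chained slices such as <code>[0][0:10][0:10]</code>, the second slice is applied to the rows again rather than to the columns, which is why the displayed arrays span the full width of the raster (hence the <code>...</code> between columns). Here is a minimal sketch of that behaviour using plain Python lists; the <code>grid</code> name and its values are made up for illustration. With a NumPy array or a <code>DataArray</code>, the top-left corner would be selected with <code>arr[0, 0:10, 0:10]</code>.</p>

```python
# A small 4x6 grid standing in for one band of a raster (each row is a list).
grid = [[10 * r + c for c in range(6)] for r in range(4)]

# Chained slices both act on the row axis: the second [0:2] re-slices rows.
chained = grid[0:2][0:2]

# Slicing rows, then columns within each row, gives the true top-left corner.
corner = [row[0:2] for row in grid[0:2]]

print(len(chained), len(chained[0]))  # 2 rows, all 6 columns
print(len(corner), len(corner[0]))    # 2 rows, 2 columns
```

<p>The comparison in this section still holds, since both rasters were sliced the same way, but it covers the first ten rows rather than a ten by ten corner.</p>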
<div id="759b3f05-de5b-48f1-b188-17a66405ad60" class="cell" data-execution_count="41">
<div class="sourceCode cell-code" id="cb60" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb60-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Summary stats of new resample raster</span></span>
<span id="cb60-2"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'crs: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>dem_102008_gdalwarp_xr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>rio<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>crs<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb60-3"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'bounds: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>dem_102008_gdalwarp_xr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>rio<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>bounds()<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb60-4"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'shape: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>dem_102008_gdalwarp_xr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>shape<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb60-5"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'Number of cells: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>dem_102008_gdalwarp_xr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> dem_102008_gdalwarp_xr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:,}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb60-6"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'Number of data cells: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>dem_102008_gdalwarp_xr.isnull())<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>values<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>()<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:,}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb60-7"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'nodata value: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>dem_102008_gdalwarp_xr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>rio<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>nodata<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb60-8"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'original nodata value: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>dem_102008_gdalwarp_xr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>rio<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>encoded_nodata<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>crs: ESRI:102008
bounds: (-2554208.4616283667, -180070.39462033313, -457810.2350190842, 1431164.0817839599)
shape: (1, 2040, 2760)
Number of cells: 5,630,400
Number of data cells: 4,421,827
nodata value: nan
original nodata value: -9999.0</code></pre>
</div>
</div>
<p>So, there are many more data values in this raster than there are in <code>dem_102008_xr</code>. The bounds are clearly different (I’m not sure why), but should that, in and of itself, affect the individual pixel values within the bounds?</p>
<div id="46e8f144-3430-4773-92ab-4e1b2f04071a" class="cell" data-execution_count="42">
<div class="sourceCode cell-code" id="cb62" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb62-1">dem_102008_gdalwarp_xr.plot()</span></code></pre></div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/river_map_python/index_files/figure-html/cell-43-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>Why does this look different from the raster plotted based on the reprojection done with <code>rio.reproject</code>? The rasters are the same shape and have the same CRS. But the bounds are different, individual data values differ for some pixels, and the resulting plots are different. The plot based on the gdalwarp raster has standard x and y coordinates.</p>
<div id="f0fb484f-fd32-48de-8844-7fdb24bc43a1" class="cell" data-execution_count="43">
<div class="sourceCode cell-code" id="cb63" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb63-1">CRS(dem_102008_gdalwarp_xr.rio.crs).axis_info</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="43">
<pre><code>[Axis(name=Easting, abbrev=, direction=east, unit_auth_code=EPSG, unit_code=9001, unit_name=metre),
 Axis(name=Northing, abbrev=, direction=north, unit_auth_code=EPSG, unit_code=9001, unit_name=metre)]</code></pre>
</div>
</div>
<p>Let’s make sure they both have the same CRS.</p>
<div id="db2c9889-afed-46f2-b21b-1ecc36315527" class="cell" data-execution_count="44">
<div class="sourceCode cell-code" id="cb65" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb65-1"><span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> dem_102008_xr.rio.crs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> dem_102008_gdalwarp_xr.rio.crs:</span>
<span id="cb65-2">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'Both have CRS = </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>dem_102008_xr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>rio<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>crs<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb65-3"><span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">else</span>:</span>
<span id="cb65-4">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'rioxarray CRS: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>dem_102008_xr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>rio<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>crs<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">gdalwarp CRS: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>dem_102008_gdalwarp_xr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>rio<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>crs<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Both have CRS = ESRI:102008</code></pre>
</div>
</div>
<p>So, the discrepancy doesn’t seem to come from a difference in the CRS itself.</p>
<div id="dd194a0d-5c97-446f-afd5-6d07db4d9222" class="cell" data-execution_count="45">
<div class="sourceCode cell-code" id="cb67" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb67-1"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'resolution: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>dem_102008_xr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>rio<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>resolution()<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb67-2"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'gdalwarp  : </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>dem_102008_gdalwarp_xr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>rio<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>resolution()<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>resolution: (759.5870288351857, -1121.4450217146702)
gdalwarp  : (759.5645748584358, -789.8208217668104)</code></pre>
</div>
</div>
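<p>The pixel widths are nearly identical (about 759.6 m), but the pixel heights differ substantially: about 1121.4 m for the <code>rio.reproject</code> raster versus about 789.8 m for the <code>gdalwarp</code> raster.</p>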
<div id="9c39099e-0d7b-4d8a-a5e6-f01c8c9caa75" class="cell" data-execution_count="46">
<div class="sourceCode cell-code" id="cb69" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb69-1"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(dem_102008_xr.attrs)</span>
<span id="cb69-2"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(dem_102008_gdalwarp_xr.attrs)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>{'AREA_OR_POINT': 'Area', 'RepresentationType': 'THEMATIC', 'scale_factor': 1.0, 'add_offset': 0.0, 'long_name': 'Band_1'}
{'AREA_OR_POINT': 'Area', 'RepresentationType': 'THEMATIC', 'scale_factor': 1.0, 'add_offset': 0.0, 'long_name': 'Band_1'}</code></pre>
</div>
</div>
<div id="b3c92e22-ae45-40b3-b0f6-e229220df4d4" class="cell" data-execution_count="47">
<div class="sourceCode cell-code" id="cb71" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb71-1"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(dem_102008_xr.rio.grid_mapping)</span>
<span id="cb71-2"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(dem_102008_gdalwarp_xr.rio.grid_mapping)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>spatial_ref
spatial_ref</code></pre>
</div>
</div>
<p>Let’s check the affine transform that was used for each of these.</p>
<div id="afa459c0-cf91-490f-abcd-2a2b1558e505" class="cell" data-execution_count="48">
<div class="sourceCode cell-code" id="cb73" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb73-1"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'transform</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>,dem_102008_xr.rio.transform(),<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb73-2"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'_cached_transform</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>,dem_102008_xr.rio._cached_transform(), <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb73-3"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'recalced transform</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>, dem_102008_xr.rio.transform(recalc<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>))</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>transform
 | 759.59, 0.00,-2554208.46|
| 0.00,-1121.45, 1722498.45|
| 0.00, 0.00, 1.00| 

_cached_transform
 | 759.59, 0.00,-2554208.46|
| 0.00,-1121.45, 1722498.45|
| 0.00, 0.00, 1.00| 

recalced transform
 | 759.59, 0.00,-2554208.46|
| 0.00,-1121.45, 1722498.45|
| 0.00, 0.00, 1.00|</code></pre>
</div>
</div>
<div id="2e6de7d0-e9f6-40b5-8521-92d509ddcab0" class="cell" data-execution_count="49">
<div class="sourceCode cell-code" id="cb75" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb75-1"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'transform</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>,dem_102008_gdalwarp_xr.rio.transform(),<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb75-2"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'_cached_transform</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>,dem_102008_gdalwarp_xr.rio._cached_transform(),<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb75-3"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'recalced transform</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>,dem_102008_gdalwarp_xr.rio.transform(recalc<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>))</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>transform
 | 759.56, 0.00,-2554208.46|
| 0.00,-789.82, 1431164.08|
| 0.00, 0.00, 1.00| 

_cached_transform
 | 759.56, 0.00,-2554208.46|
| 0.00,-789.82, 1431164.08|
| 0.00, 0.00, 1.00| 

recalced transform
 | 759.56, 0.00,-2554208.46|
| 0.00,-789.82, 1431164.08|
| 0.00, 0.00, 1.00|</code></pre>
</div>
</div>
<p>Ok, there’s another difference. Time to learn more about the transforms. The transform matrices are used to make <a href="https://en.wikipedia.org/wiki/Affine_transformation">affine transformations</a> between raster (column, row) positions and coordinates in the CRS. <a href="https://www.perrygeo.com/python-affine-transforms.html">This post</a> explains affine transformations in the context of using Python for GIS work.</p>
<p>The elements of the above matrices are:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Bbmatrix%7D%0Aa%20&amp;%20b%20&amp;%20c%5C%5C%0Ad%20&amp;%20e%20&amp;%20f%5C%5C%0Ag%20&amp;%20h%20&amp;%20i%0A%5Cend%7Bbmatrix%7D%0A"></p>
<p>where</p>
<ul>
<li><img src="https://latex.codecogs.com/png.latex?a"> - width of a pixel</li>
<li><img src="https://latex.codecogs.com/png.latex?b"> - row rotation (typically 0)</li>
<li><img src="https://latex.codecogs.com/png.latex?c"> - x-coordinate of upper left corner of upper left pixel</li>
<li><img src="https://latex.codecogs.com/png.latex?d"> - column rotation (typically 0)</li>
<li><img src="https://latex.codecogs.com/png.latex?e"> - height of a pixel (typically negative)</li>
<li><img src="https://latex.codecogs.com/png.latex?f"> - y-coordinate of upper left corner of upper left pixel</li>
</ul>
<p>The <img src="https://latex.codecogs.com/png.latex?g=0">, <img src="https://latex.codecogs.com/png.latex?h=0">, and <img src="https://latex.codecogs.com/png.latex?i=1"> entries are constants.</p>
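<p>To make the role of each entry concrete, here is a minimal sketch in plain Python of how such a matrix maps a pixel position to CRS coordinates. The function name <code>apply_affine</code> is hypothetical (it is not part of rioxarray), and the coefficients are the rounded values printed above for <code>dem_102008_xr</code>, so the results are only approximate.</p>

```python
def apply_affine(transform, col, row):
    """Map a (col, row) pixel position to (x, y) CRS coordinates.

    `transform` holds the six non-constant matrix entries (a, b, c, d, e, f).
    """
    a, b, c, d, e, f = transform
    x = a * col + b * row + c
    y = d * col + e * row + f
    return x, y


# Rounded coefficients copied from the printed transform of dem_102008_xr
# (an assumption: the true values carry more decimal places).
rio_transform = (759.59, 0.0, -2554208.46, 0.0, -1121.45, 1722498.45)

# Position (0, 0) is the upper left corner of the upper left pixel.
upper_left = apply_affine(rio_transform, 0, 0)

# The raster has 2760 columns and 2040 rows, so (2760, 2040) is the
# lower right corner of the lower right pixel.
lower_right = apply_affine(rio_transform, 2760, 2040)

print(f"upper left : {upper_left}")
print(f"lower right: {lower_right}")
```

<p>With no rotation (b and d both zero), x depends only on the column and y only on the row, which is why the pixel height and the y-coordinate of the upper left corner fully determine the vertical extent of the raster.</p>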
<p>It’s the second row of the matrices for <code>dem_102008_xr</code> and <code>dem_102008_gdalwarp_xr</code> that differ by a large amount. The pixel height and the y-coordinate of the upper left corner are really quite different for the two matrices.</p>
<ul>
<li><code>dem_102008_xr</code> - | 0.00,-1121.45, 1722498.45|</li>
<li><code>dem_102008_gdalwarp_xr</code> - | 0.00,-789.82, 1431164.08|</li>
</ul>
<p>Both rasters have 2040 rows.</p>
<div id="8203fe66-3a95-4249-b134-4124784bdd44" class="cell" data-execution_count="50">
<div class="sourceCode cell-code" id="cb77" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb77-1"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'bounds:</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\t\t</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>dem_102008_xr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>rio<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>bounds()<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb77-2"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'gdalbounds:</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\t</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>dem_102008_gdalwarp_xr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>rio<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>bounds()<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>bounds:      (-2554208.4616283686, -565249.3965455483, -457748.2620432562, 1722498.447752379)
gdalbounds:  (-2554208.4616283667, -180070.39462033313, -457810.2350190842, 1431164.0817839599)</code></pre>
</div>
</div>
<p>We should be able to match the bounds using pixel height and the number of rows.</p>
<div id="f19cbda0-20a8-407a-89c3-d93037bc8fe6" class="cell" data-execution_count="51">
<div class="sourceCode cell-code" id="cb79" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb79-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># rioxarray version</span></span>
<span id="cb79-2"><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2040</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1121.45</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1722498.45</span></span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="51">
<pre><code>-565259.55</code></pre>
</div>
</div>
<div id="84508b81-a1ca-4282-b9f4-53edbb095fd2" class="cell" data-execution_count="52">
<div class="sourceCode cell-code" id="cb81" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb81-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># gdalwarp version</span></span>
<span id="cb81-2"><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2040</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">789.82</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1431164.08</span></span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="52">
<pre><code>-180068.71999999997</code></pre>
</div>
</div>
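<p>Yep, the relationship between the pixel height and the bounds in the y-dimension is as expected. The small gaps relative to the reported lower bounds (about 10 m and 2 m, respectively) come from using the pixel heights rounded to two decimal places.</p>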
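<p>The same check can be run in the other direction at full precision: divide the vertical extent given by the bounds by the number of rows and compare with the reported resolution. This is only a sketch. The helper name <code>pixel_height</code> and the variable names are hypothetical, and the bounds are copied from the output above in the (minx, miny, maxx, maxy) order that <code>rio.bounds()</code> returns.</p>

```python
def pixel_height(bounds, n_rows):
    """Return the pixel height implied by raster bounds and a row count."""
    minx, miny, maxx, maxy = bounds
    return (maxy - miny) / n_rows


# Bounds copied from the rio.bounds() output shown earlier.
bounds_rio = (-2554208.4616283686, -565249.3965455483,
              -457748.2620432562, 1722498.447752379)
bounds_gdalwarp = (-2554208.4616283667, -180070.39462033313,
                   -457810.2350190842, 1431164.0817839599)

n_rows = 2040  # both rasters have 2040 rows

height_rio = pixel_height(bounds_rio, n_rows)
height_gdalwarp = pixel_height(bounds_gdalwarp, n_rows)

print(f"rio.reproject pixel height: {height_rio}")
print(f"gdalwarp pixel height     : {height_gdalwarp}")
```

<p>Both values should agree with the magnitudes reported by <code>rio.resolution()</code>. With the same number of rows in each raster, the different pixel heights mean that the two outputs span different vertical extents (roughly 2,288 km versus 1,611 km).</p>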
<p><strong>So, the question is, why do <code>rio.reproject</code> and <code>gdalwarp</code> use different affine transformations for what seems like the same reprojection to ESRI:102008 with the same shape?</strong></p>
<p>It appears from these two sources that GDAL/PROJ make the choice from among possibly multiple suitable transformations:</p>
<ul>
<li><a href="https://gis.stackexchange.com/questions/441871/return-used-transformation-from-gdal-warp-or-similar">https://gis.stackexchange.com/questions/441871/return-used-transformation-from-gdal-warp-or-similar</a></li>
<li><a href="https://gdal.org/tutorials/osr_api_tut.html">https://gdal.org/tutorials/osr_api_tut.html</a></li>
</ul>
<p>It seems like the <code>-ct</code> argument in gdalwarp can be used to specify a PROJ or WKT2 string that forces a specific transformation to be used.</p>
<p>You can see what PROJ is doing by running this at the command line.</p>
<pre><code>projinfo -s epsg:4326 -t esri:102008</code></pre>
<p>Similar issues related to differences between projections done with rioxarray and <code>gdalwarp</code> include:</p>
<ul>
<li><a href="https://github.com/rasterio/rasterio/issues/2052">https://github.com/rasterio/rasterio/issues/2052</a></li>
<li><a href="https://github.com/rasterio/rasterio/issues/1990">https://github.com/rasterio/rasterio/issues/1990</a></li>
</ul>
<p>Okay, we’ll just leave this issue here for now.</p>
</section>
<section id="a-little-more-reprojection-exploration" class="level3">
<h3 class="anchored" data-anchor-id="a-little-more-reprojection-exploration">A little more reprojection exploration</h3>
<p>As part of trying to understand the various ways of getting the reprojection done, I tried to explore things in a bit greater detail. For example, what was the relationship between the bounds reported by <code>rio.bounds</code> and the actual values in the xarray <code>DataArray</code>?</p>
<div id="cfe4422d-55fe-4520-af0b-37af3f33d4b1" class="cell" data-execution_count="53">
<div class="sourceCode cell-code" id="cb84" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb84-1">gdal_bounds <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> dem_102008_gdalwarp_xr.rio.bounds()</span>
<span id="cb84-2">gdal_bounds</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="53">
<pre><code>(-2554208.4616283667,
 -180070.39462033313,
 -457810.2350190842,
 1431164.0817839599)</code></pre>
</div>
</div>
<div id="09db2c31-89a4-480b-a0bb-1ac866e1c5ce" class="cell" data-execution_count="54">
<div class="sourceCode cell-code" id="cb86" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb86-1">minx_gdal <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> dem_102008_gdalwarp_xr[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'x'</span>].values.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">min</span>()</span>
<span id="cb86-2">maxx_gdal <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> dem_102008_gdalwarp_xr[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'x'</span>].values.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">max</span>()</span>
<span id="cb86-3">miny_gdal <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> dem_102008_gdalwarp_xr[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'y'</span>].values.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">min</span>()</span>
<span id="cb86-4">maxy_gdal <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> dem_102008_gdalwarp_xr[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'y'</span>].values.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">max</span>()</span>
<span id="cb86-5"></span>
<span id="cb86-6"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"minx: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>minx_gdal<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb86-7"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"maxx: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>maxx_gdal<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb86-8"></span>
<span id="cb86-9"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"miny: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>miny_gdal<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb86-10"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"maxy: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>maxy_gdal<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>minx: -2553828.6793409376
maxx: -458190.0173065134
miny: -179675.48420944973
maxy: 1430769.1713730765</code></pre>
</div>
</div>
<div id="cb2dbbb6-b198-419a-9245-dd2bc3bc223b" class="cell" data-execution_count="55">
<div class="sourceCode cell-code" id="cb88" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb88-1"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"minx diff: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>ureg<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>Quantity(gdal_bounds[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> minx_gdal, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'meters'</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>) </span>
<span id="cb88-2"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"miny diff: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>ureg<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>Quantity(gdal_bounds[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> miny_gdal, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'meters'</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>) </span>
<span id="cb88-3"></span>
<span id="cb88-4"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"maxx diff: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>ureg<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>Quantity(gdal_bounds[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> maxx_gdal, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'meters'</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>) </span>
<span id="cb88-5"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"maxy diff: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>ureg<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>Quantity(gdal_bounds[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> maxy_gdal, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'meters'</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>) </span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>minx diff: -379.78228742908686 meter
miny diff: -394.9104108833999 meter
maxx diff: 379.7822874292033 meter
maxy diff: 394.9104108833708 meter</code></pre>
</div>
</div>
<p>The differences between the bounds as reported by rioxarray and the minimums and maximums based on the <code>values</code> of the corresponding dimensions in the underlying xarray <code>DataArray</code> are half of the x and y pixel sizes. So, <code>rio.bounds</code> gives the outer edges of the pixels and <code>values</code> must be the midpoints of the pixels. Makes sense.</p>
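<p>As a minimal sketch of that relationship (plain Python with made-up coordinates rather than the actual DEM, and a hypothetical helper named <code>edges_from_centers</code>), the outer edges of an evenly spaced grid can be recovered from its pixel centers by stepping out half a pixel on each side:</p>

```python
def edges_from_centers(centers):
    """Return (min_edge, max_edge) for evenly spaced pixel-center coordinates.

    Hypothetical helper for illustration; assumes at least two centers
    and a constant pixel size.
    """
    pixel_size = abs(centers[1] - centers[0])
    half_pixel = pixel_size / 2
    return min(centers) - half_pixel, max(centers) + half_pixel


# Made-up pixel centers with a pixel size of 200 units
x_centers = [100.0, 300.0, 500.0, 700.0]
print(edges_from_centers(x_centers))  # (0.0, 800.0)
```

<p>The same idea works for a descending coordinate such as <code>y</code>, since only the absolute spacing and the min and max of the centers are used.</p>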
</section>
</section>
<section id="creating-maps-with-cartopy" class="level2">
<h2 class="anchored" data-anchor-id="creating-maps-with-cartopy">Creating maps with Cartopy</h2>
<p>One method for creating maps in Python has been the <a href="https://matplotlib.org/basemap/stable/">Basemap</a> toolkit, an add-on toolkit for matplotlib. It has now been deprecated in favor of the <a href="http://scitools.org.uk/cartopy/docs/latest/">Cartopy project</a>. So, that’s what we’ll use.</p>
<p>From the <a href="https://scitools.org.uk/cartopy/docs/latest/">Cartopy docs</a>:</p>
<blockquote class="blockquote">
<ul>
<li><p>Cartopy is a Python package designed for geospatial data processing in order to produce maps and other geospatial data analyses.</p></li>
<li><p>Cartopy makes use of the powerful PROJ, NumPy and Shapely libraries and includes a programmatic interface built on top of Matplotlib for the creation of publication quality maps.</p></li>
<li><p>Key features of cartopy are its object oriented projection definitions, and its ability to transform points, lines, vectors, polygons and images between those projections.</p></li>
<li><p>You will find cartopy especially useful for large area / small scale data, where Cartesian assumptions of spherical data traditionally break down. If you’ve ever experienced a singularity at the pole or a cut-off at the dateline, it is likely you will appreciate cartopy’s unique features!</p></li>
</ul>
</blockquote>
<section id="projections-in-cartopy" class="level3">
<h3 class="anchored" data-anchor-id="projections-in-cartopy">Projections in Cartopy</h3>
<p>Cartopy provides an object oriented way to work with map projections through its <code>cartopy.crs.CRS</code> class. We need to import the <code>cartopy.crs</code> module.</p>
<div id="4ef9cdf0-679d-4d92-867f-abcfa35b3b9e" class="cell" data-execution_count="56">
<div class="sourceCode cell-code" id="cb90" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb90-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> cartopy.crs <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> ccrs                   <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># import projections</span></span></code></pre></div>
</div>
<p>Let’s create a <code>CRS</code> object for the Albers Equal Area projection that we’ll use for the map. Set the central lat/lon and standard parallels to something that makes sense for the region being mapped. I’m going to specify these parameters explicitly, since Cartopy’s defaults for <code>AlbersEqualArea</code> (central longitude and latitude of 0 and standard parallels of 20 and 50) don’t match the projection used for our reprojected layers.</p>
<div id="c5bfdb53-c692-4a0b-ab8f-797881da062d" class="cell" data-execution_count="57">
<div class="sourceCode cell-code" id="cb91" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb91-1">crs_aea <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ccrs.AlbersEqualArea(central_longitude<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">96</span>,</span>
<span id="cb91-2">                               central_latitude<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">40</span>,</span>
<span id="cb91-3">                               standard_parallels<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">20.0</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">60.0</span>))</span></code></pre></div>
</div>
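<p>To make the meaning of these parameters a bit more concrete, here is a rough sketch of how they map onto the parameters of a PROJ string for the same kind of projection. The helper <code>albers_proj_string</code> is hypothetical and only builds a string; it ignores the ellipsoid and false easting/northing, so treat it as an illustration rather than an exact description of what Cartopy constructs internally.</p>

```python
def albers_proj_string(central_longitude, central_latitude, standard_parallels):
    """Build a PROJ-style string for an Albers Equal Area projection.

    Hypothetical helper for illustration only; ellipsoid and false
    easting/northing are left out.
    """
    lat_1, lat_2 = standard_parallels
    return (
        f"+proj=aea +lat_0={central_latitude} +lon_0={central_longitude} "
        f"+lat_1={lat_1} +lat_2={lat_2}"
    )


# Same parameter values as used for crs_aea
print(albers_proj_string(-96, 40, (20.0, 60.0)))
# +proj=aea +lat_0=40 +lon_0=-96 +lat_1=20.0 +lat_2=60.0
```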
</section>
<section id="basic-cartopy-plot" class="level3">
<h3 class="anchored" data-anchor-id="basic-cartopy-plot">Basic cartopy plot</h3>
<p>For the map, we’ll experiment with using map extents based on both of the DEM rasters we created.</p>
<div id="ad259bbb-db15-4ea1-8507-dfdb1623a24a" class="cell" data-execution_count="58">
<div class="sourceCode cell-code" id="cb92" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb92-1">minx_102008_gdalwarp, miny_102008_gdalwarp, maxx_102008_gdalwarp, maxy_102008_gdalwarp <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb92-2">    dem_102008_gdalwarp_xr.rio.bounds()</span></code></pre></div>
</div>
<p>The order of these bounds in Cartopy’s <a href="https://scitools.org.uk/cartopy/docs/latest/reference/generated/cartopy.mpl.geoaxes.GeoAxes.html#cartopy.mpl.geoaxes.GeoAxes.set_extent">set_extent()</a> function is different from the order reported by rioxarray. The <code>rio.bounds()</code> method returns (minx, miny, maxx, maxy), whereas Cartopy uses (minx, maxx, miny, maxy).</p>
<div id="a7d13832-c28c-48c1-97fd-47244e0f77c3" class="cell" data-execution_count="59">
<div class="sourceCode cell-code" id="cb93" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb93-1">cartopy_extent_102008_gdalwarp <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (minx_102008_gdalwarp, maxx_102008_gdalwarp, miny_102008_gdalwarp, maxy_102008_gdalwarp)</span>
<span id="cb93-2">cartopy_extent_102008 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (minx_102008, maxx_102008, miny_102008, maxy_102008)</span></code></pre></div>
</div>
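<p>Because the two orderings are easy to mix up, a small helper can make the reordering explicit. This is just a sketch: <code>bounds_to_extent</code> is a hypothetical function (not part of rioxarray or Cartopy), it assumes the input is in (minx, miny, maxx, maxy) order, and the numbers are made up.</p>

```python
def bounds_to_extent(bounds):
    """Reorder (minx, miny, maxx, maxy) into (minx, maxx, miny, maxy).

    Hypothetical helper; the input order is assumed to match what
    rio.bounds() reports and the output order what set_extent() expects.
    """
    minx, miny, maxx, maxy = bounds
    return (minx, maxx, miny, maxy)


# Illustrative bounds only, not the actual DEM bounds
example_bounds = (-2500000.0, -180000.0, -450000.0, 1400000.0)
print(bounds_to_extent(example_bounds))
# (-2500000.0, -450000.0, -180000.0, 1400000.0)
```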
<p>We start with standard matplotlib figure creation.</p>
<div id="9aa32eb3-eb9f-4e25-9418-e81b5994c16a" class="cell" data-execution_count="60">
<div class="sourceCode cell-code" id="cb94" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb94-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create figure and set size</span></span>
<span id="cb94-2">fig1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> plt.figure()</span>
<span id="cb94-3">fig1.set_figheight(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">9</span>)</span>
<span id="cb94-4">fig1.set_figwidth(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">16</span>)</span></code></pre></div>
<div class="cell-output cell-output-display">
<pre><code>&lt;Figure size 1600x900 with 0 Axes&gt;</code></pre>
</div>
</div>
<p>Cartopy contains a <code>GeoAxes</code> class that is built on top of the regular matplotlib <code>Axes</code> class and imbues it with geospatial powers. When creating such an <code>Axes</code> object, you need to pass in the CRS.</p>
<p>Next we can set the map extent with <code>GeoAxes.set_extent()</code>. We’ll start with the extent based on the raster reprojected with rioxarray.</p>
<p>Then, the <code>add_geometries</code> method is used to add the vector layers to the <code>GeoAxes</code> instance. One gotcha is that this method doesn’t seem to work correctly with points. To get around this, we can simply use the <code>plot</code> method in GeoPandas (which is also based on matplotlib).</p>
<p>Similarly, we can use the <code>plot</code> method from xarray and rioxarray to add the DEM raster to the plot.</p>
<div id="d3b065bd-68a3-4033-bb1a-4237fabbcf8f" class="cell" data-execution_count="61">
<div class="sourceCode cell-code" id="cb96" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb96-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create figure and set size</span></span>
<span id="cb96-2">fig1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> plt.figure()</span>
<span id="cb96-3">fig1.set_figheight(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">9</span>)</span>
<span id="cb96-4">fig1.set_figwidth(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">16</span>)</span>
<span id="cb96-5"></span>
<span id="cb96-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># create a set of axes with desired projection  </span></span>
<span id="cb96-7">ax1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> plt.axes(projection <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> crs_aea)</span>
<span id="cb96-8"></span>
<span id="cb96-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Set map extent</span></span>
<span id="cb96-10">ax1.set_extent(cartopy_extent_102008, crs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> crs_aea)</span>
<span id="cb96-11"></span>
<span id="cb96-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add sites and basins</span></span>
<span id="cb96-13">ax1.add_geometries(basins_102008_gdf.geometry, crs<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>crs_aea, facecolor<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#a2d572"</span>, edgecolor<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"grey"</span>, linewidth<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># geometries are already in crs_aea coordinates</span></span>
<span id="cb96-14">sites_102008_gdf.plot(ax<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>ax1, marker<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"D"</span>) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># add_geometries doesn't seem to work with points</span></span>
<span id="cb96-15"></span>
<span id="cb96-16"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add state and province boundaries</span></span>
<span id="cb96-17">ax1.add_geometries(state_102008_gdf.geometry, crs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> crs_aea, facecolor<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"none"</span>, edgecolor<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"black"</span>, linewidth<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.3</span>)</span>
<span id="cb96-18">ax1.add_geometries(province_102008_gdf.geometry, crs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> crs_aea, facecolor<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"none"</span>, edgecolor<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"black"</span>, linewidth<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.3</span>)</span>
<span id="cb96-19"></span>
<span id="cb96-20"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add rivers</span></span>
<span id="cb96-21">ax1.add_geometries(rivers_102008_gdf.geometry, crs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> crs_aea, edgecolor<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#487bb6"</span>, linewidth<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.75</span>)</span>
<span id="cb96-22"></span>
<span id="cb96-23"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add DEM raster</span></span>
<span id="cb96-24">dem_102008_xr.plot(ax<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>ax1, cmap<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'terrain'</span>)</span>
<span id="cb96-25"></span>
<span id="cb96-26">ax1.set_title(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Map 1 - using DEM and extent from rioxarray processed raster"</span>)                    </span>
<span id="cb96-27">plt.show()</span></code></pre></div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/river_map_python/index_files/figure-html/cell-62-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>Hmm, the way that rioxarray did the projection resulted in numerous no data values within the extent of the map. At this point, I don’t completely understand what is going on. What happens if we use the extent and raster based on <code>gdalwarp</code>?</p>
<div id="989445b8-fef4-4368-b635-27472763242f" class="cell" data-execution_count="62">
<div class="sourceCode cell-code" id="cb97" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb97-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create figure and set size</span></span>
<span id="cb97-2">fig2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> plt.figure()</span>
<span id="cb97-3">fig2.set_figheight(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">9</span>)</span>
<span id="cb97-4">fig2.set_figwidth(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">16</span>)</span>
<span id="cb97-5"></span>
<span id="cb97-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># create a set of axes with desired projection  </span></span>
<span id="cb97-7">ax2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> plt.axes(projection <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> crs_aea)</span>
<span id="cb97-8"></span>
<span id="cb97-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Set map extent</span></span>
<span id="cb97-10">ax2.set_extent(cartopy_extent_102008_gdalwarp, crs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> crs_aea)</span>
<span id="cb97-11"></span>
<span id="cb97-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add sites and basins</span></span>
<span id="cb97-13">ax2.add_geometries(basins_102008_gdf.geometry, crs<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>crs_aea, facecolor<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#a2d572"</span>, edgecolor<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"grey"</span>, linewidth<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># geometries are already in crs_aea coordinates</span></span>
<span id="cb97-14">sites_102008_gdf.plot(ax<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>ax2, marker<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"D"</span>) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># add_geometries doesn't seem to work with points</span></span>
<span id="cb97-15"></span>
<span id="cb97-16"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add state and province boundaries</span></span>
<span id="cb97-17">ax2.add_geometries(state_102008_gdf.geometry, crs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> crs_aea, facecolor<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"none"</span>, edgecolor<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"black"</span>, linewidth<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.3</span>)</span>
<span id="cb97-18">ax2.add_geometries(province_102008_gdf.geometry, crs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> crs_aea, facecolor<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"none"</span>, edgecolor<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"black"</span>, linewidth<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.3</span>)</span>
<span id="cb97-19"></span>
<span id="cb97-20"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add rivers</span></span>
<span id="cb97-21">ax2.add_geometries(rivers_102008_gdf.geometry, crs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> crs_aea, edgecolor<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#487bb6"</span>, linewidth<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.75</span>)</span>
<span id="cb97-22"></span>
<span id="cb97-23"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add DEM raster</span></span>
<span id="cb97-24">dem_102008_gdalwarp_xr.plot(ax<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>ax2, cmap<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'terrain'</span>)</span>
<span id="cb97-25"></span>
<span id="cb97-26">ax2.set_title(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Map 2 - using DEM and extent from gdalwarp processed raster"</span>)                    </span>
<span id="cb97-27">plt.show()</span></code></pre></div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/river_map_python/index_files/figure-html/cell-63-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>Now the raster doesn’t cover the entire map in the y-dimension. This is consistent with the affine transformation used by <code>gdalwarp</code> but certainly isn’t desirable. The original lat/lon bounds based on EPSG:4326 included all of the vector layers in their entirety. I’m a bit baffled by these reprojection and bounding box issues. It will have to be the topic of another post as I’ve wasted way too much time fighting with these problems.</p>
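<p>One way to start diagnosing this kind of mismatch is to compare the map extent with the raster bounds numerically instead of by eye. The following is only a sketch with a hypothetical helper, <code>uncovered_margins</code>, and made-up numbers. It assumes the extent is in Cartopy’s (minx, maxx, miny, maxy) order and the bounds are in (minx, miny, maxx, maxy) order.</p>

```python
def uncovered_margins(extent, bounds):
    """Width of the strip on each side of the map that the raster does not reach.

    Hypothetical helper for illustration.
    extent: (minx, maxx, miny, maxy), the order assumed for the map extent
    bounds: (minx, miny, maxx, maxy), the order assumed for the raster bounds
    """
    ext_minx, ext_maxx, ext_miny, ext_maxy = extent
    bnd_minx, bnd_miny, bnd_maxx, bnd_maxy = bounds
    return {
        "left": max(0.0, bnd_minx - ext_minx),
        "right": max(0.0, ext_maxx - bnd_maxx),
        "bottom": max(0.0, bnd_miny - ext_miny),
        "top": max(0.0, ext_maxy - bnd_maxy),
    }


# Made-up example: the raster starts 20 units above the bottom of the map
map_extent = (0.0, 100.0, 0.0, 100.0)
raster_bounds = (0.0, 20.0, 100.0, 100.0)
print(uncovered_margins(map_extent, raster_bounds))
# {'left': 0.0, 'right': 0.0, 'bottom': 20.0, 'top': 0.0}
```

<p>A nonzero value for any side means the raster stops short of the map edge on that side, which is the kind of gap visible in Map 2.</p>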
</section>
<section id="using-built-in-and-naturalearth-features" class="level3">
<h3 class="anchored" data-anchor-id="using-built-in-and-naturalearth-features">Using built-in and Natural Earth features</h3>
<p>Cartopy makes it easy to use preinstalled features such as state borders and rivers as well as those from the <a href="https://www.naturalearthdata.com/">Natural Earth dataset</a> through the <code>cartopy.feature</code> module. To do this we use the <a href="https://scitools.org.uk/cartopy/docs/latest/reference/generated/cartopy.mpl.geoaxes.GeoAxes.html#cartopy.mpl.geoaxes.GeoAxes.add_feature">add_feature</a> <code>GeoAxes</code> method. Standard matplotlib keyword arguments can be passed in as well to style the feature. Use the <a href="https://matplotlib.org/stable/gallery/misc/zorder_demo.html">zorder parameter</a> to control which layers are above and below other layers. Higher numbered layers are plotted on top of lower numbered layers.</p>
<div id="0d320a43-36a8-4ed2-a6ef-a4fc05308da8" class="cell" data-execution_count="63">
<div class="sourceCode cell-code" id="cb98" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb98-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> cartopy.feature <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> cf                 <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># import features</span></span>
<span id="cb98-2"></span>
<span id="cb98-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create a land feature to act as the basemap</span></span>
<span id="cb98-4">land_50m <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> cf.NaturalEarthFeature(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'physical'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'land'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'50m'</span>,</span>
<span id="cb98-5">                                  edgecolor<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'k'</span>, facecolor<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"none"</span>, alpha<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>)</span>
<span id="cb98-6"></span>
<span id="cb98-7">fig <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> plt.figure()</span>
<span id="cb98-8">fig.set_figheight(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">8</span>)</span>
<span id="cb98-9">fig.set_figwidth(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>)</span>
<span id="cb98-10"></span>
<span id="cb98-11">crs_aea <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ccrs.AlbersEqualArea(central_longitude<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">96</span>,</span>
<span id="cb98-12">                               central_latitude<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">40</span>,</span>
<span id="cb98-13">                               standard_parallels<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">20.0</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">60.0</span>))</span>
<span id="cb98-14"></span>
<span id="cb98-15">ax <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> plt.axes(projection <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> crs_aea)</span>
<span id="cb98-16">ax.set_extent(cartopy_extent_102008, crs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> crs_aea)</span>
<span id="cb98-17"></span>
<span id="cb98-18"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add features</span></span>
<span id="cb98-19">ax.add_feature(land_50m, zorder<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">8</span>, alpha<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>)</span>
<span id="cb98-20">ax.add_feature(cf.LAKES, zorder<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>)</span>
<span id="cb98-21">ax.add_feature(cf.OCEAN, zorder<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>)</span>
<span id="cb98-22">ax.add_feature(cf.BORDERS, zorder<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>)</span>
<span id="cb98-23">ax.add_feature(cf.COASTLINE, zorder<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>)</span>
<span id="cb98-24">ax.add_feature(cf.STATES, zorder<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>)</span>
<span id="cb98-25">ax.add_feature(cf.RIVERS, zorder<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">9</span>)</span>
<span id="cb98-26"></span>
<span id="cb98-27"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add our vector layers</span></span>
<span id="cb98-28">ax.add_geometries(basins_102008_gdf.geometry, crs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> crs_aea, zorder<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>,</span>
<span id="cb98-29">                  facecolor<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#a2d572"</span>, edgecolor<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"grey"</span>, linewidth<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># geometries are already in the Albers Equal Area CRS</span></span>
<span id="cb98-30">sites_102008_gdf.plot(ax<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>ax, marker<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"D"</span>)</span>
<span id="cb98-31"></span>
<span id="cb98-32"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add DEM raster</span></span>
<span id="cb98-33">dem_102008_xr.plot(ax<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>ax, cmap<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'terrain'</span>)</span></code></pre></div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/river_map_python/index_files/figure-html/cell-64-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
</section>
<section id="more-cartopy-and-other-resources" class="level3">
<h3 class="anchored" data-anchor-id="more-cartopy-and-other-resources">More Cartopy and other resources</h3>
<ul>
<li><a href="https://uoftcoders.github.io/studyGroup/lessons/python/cartography/lesson/">Cartography and mapping in Python</a></li>
<li><a href="https://geohackweek.github.io/visualization/04-geopandas-and-cartopy/">Plotting Actual Things: geopandas and cartopy</a></li>
<li><a href="https://earth-env-data-science.github.io/lectures/mapping_cartopy.html">Maps in Scientific Python</a></li>
<li><a href="https://github.com/mdbartos/pysheds">pysheds - Simple and fast watershed delineation in Python</a></li>
<li><a href="https://geophysics.in/how-to-plot-raster-and-vector-maps-together-in-python/">Plotting raster and vector data together in python</a></li>
</ul>
</section>
</section>
<section id="more-python-mapping-tools" class="level2">
<h2 class="anchored" data-anchor-id="more-python-mapping-tools">More Python mapping tools</h2>
<p>This post has barely scratched the surface of <a href="https://pyviz.org/tools.html#geospatial">Python based mapping tools</a>. We will explore other tools such as Folium and leafmap in subsequent posts.</p>
<ul>
<li>Folium - https://python-visualization.github.io/folium/latest/</li>
<li>leafmap - https://leafmap.org/</li>
<li>ipyleaflet - https://github.com/jupyter-widgets/ipyleaflet</li>
<li>hvPlot - https://github.com/holoviz/hvplot</li>
<li>Cartopy - https://github.com/SciTools/cartopy</li>
<li>geoviews - https://github.com/holoviz/geoviews</li>
<li>eomaps - https://github.com/raphaelquast/eomaps</li>
</ul>
</section>
<section id="using-r-to-create-a-river-map" class="level2">
<h2 class="anchored" data-anchor-id="using-r-to-create-a-river-map">Using R to create a river map</h2>
<p>In Part 3 of this series we’ll take a look at using R to create a similar river map. For the R based map, I decided to create a separate post written with <a href="https://quarto.org/">Quarto</a> to avoid messing around with the <a href="https://rpy2.github.io/">rpy2 package</a> within Jupyter notebooks.</p>


</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{isken2024,
  author = {Isken, Mark},
  title = {A Vector-Raster River Map, Three Ways - {Part} 2: {Python}},
  date = {2024-02-16},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-isken2024" class="csl-entry quarto-appendix-citeas">
Isken, Mark. 2024. <span>“A Vector-Raster River Map, Three Ways - Part
2: Python.”</span> February 16, 2024.
</div></div></section></div> ]]></description>
  <category>geonewb</category>
  <category>geospatial</category>
  <category>python</category>
  <category>qgis</category>
  <guid>https://bitsofanalytics.org/posts/river_map_python/</guid>
  <pubDate>Fri, 16 Feb 2024 05:00:00 GMT</pubDate>
</item>
<item>
  <title>A vector-raster river map, three ways - Part 1: QGIS</title>
  <dc:creator>Mark Isken</dc:creator>
  <link>https://bitsofanalytics.org/posts/river_map_qgis/</link>
  <description><![CDATA[ 



<section id="creating-a-river-map-using-qgis-python-and-r" class="level2">
<h2 class="anchored" data-anchor-id="creating-a-river-map-using-qgis-python-and-r">Creating a river map using QGIS, Python, and R</h2>
<p>This is part of my <a href="https://bitsofanalytics.org/posts/geonewb-intro/">geonewb</a> series of posts.</p>
<p>I’m using the <a href="https://www.drivendata.org/competitions/254/reclamation-water-supply-forecast-dev/page/796/">Water Supply Forecast Rodeo</a> challenge run by the <a href="https://www.drivendata.org/">DrivenData</a> folks to further my geospatial analysis learning. This series of posts focuses on a basic step motivated by the challenge: creating a map showing the relevant rivers and river basins. Such a map would combine several vector and raster datasets:</p>
<ul>
<li>a vector layer with the stream gauge sites</li>
<li>a vector layer with watershed basin boundaries</li>
<li>a vector layer showing US state boundaries and Canadian province boundaries</li>
<li>a vector layer showing rivers</li>
<li>a raster layer containing a DEM</li>
</ul>
<p>My goal is to create maps using three different technologies: QGIS, Python and R. By doing this I can get a sense of the open source mapping landscape at a beginner level. In this first post, we’ll use <a href="https://qgis.org/en/site/">QGIS</a>, a widely used open source GIS package. This is <strong>NOT</strong> a basic QGIS tutorial - there are plenty of those out there and the documentation is quite good.</p>
<div class="cell" data-execution_count="1">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> io</span>
<span id="cb1-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> os</span>
<span id="cb1-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> pathlib <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Path</span>
<span id="cb1-4"></span>
<span id="cb1-5"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pandas <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> pd</span>
<span id="cb1-6"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> IPython.display <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Image</span></code></pre></div>
</div>
<div class="cell" data-execution_count="2">
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1">WSFR_DATA_ROOT <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> os.environ[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'WSFR_DATA_ROOT'</span>]</span></code></pre></div>
</div>
<p>Here’s what one of the final QGIS maps looks like:</p>
<div class="cell" data-execution_count="3">
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1">Image(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'./images/WaterSupplyForecastRodeo.png'</span>)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="3">
<p><img src="https://bitsofanalytics.org/posts/river_map_qgis/index_files/figure-html/cell-4-output-1.png" class="img-fluid"></p>
</div>
</div>
</section>
<section id="data-sources-and-python-libraries" class="level2">
<h2 class="anchored" data-anchor-id="data-sources-and-python-libraries">Data sources and Python libraries</h2>
<p>All of the data we need is freely available. Even though this first map is created with QGIS, we will start to use some Python libraries to explore the data that we’ll be using. In particular, we will use the <a href="https://geopandas.org/en/stable/index.html">GeoPandas</a> package to read vector data into <code>GeoDataFrame</code> objects. GeoPandas adds geospatial functionality to pandas.</p>
<blockquote class="blockquote">
<p>GeoPandas is an open source project to make working with geospatial data in python easier. GeoPandas extends the datatypes used by pandas to allow spatial operations on geometric types. Geometric operations are performed by <a href="https://shapely.readthedocs.io/en/stable/index.html">shapely</a>. GeoPandas further depends on <a href="https://fiona.readthedocs.io/">fiona</a> for file access and matplotlib for plotting.</p>
</blockquote>
<p>For raster data, we’ll use the <a href="https://docs.xarray.dev/en/stable/">xarray</a> and <a href="https://corteva.github.io/rioxarray/html/readme.html">rioxarray</a> packages.</p>
<p>The xarray package builds on top of NumPy N-d arrays and adds the ability to create and work with labels for the dimensions.</p>
<p>The two main data structures are <code>DataArray</code> (an N-d generalization of a <code>pandas.Series</code>) and <code>Dataset</code> (an N-d generalization of a <code>pandas.DataFrame</code>). The <a href="https://docs.xarray.dev/en/stable/getting-started-guide/why-xarray.html">Overview: Why xarray?</a> page has a nice level of detail on the case for xarray and its link to geospatial analysis.</p>
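<p>To make the idea of labeled dimensions concrete, here is a minimal sketch of a <code>DataArray</code>. The tiny grid, its values, and the coordinate labels are all made up for illustration and have nothing to do with the DEM used later in this post.</p>

```python
import numpy as np
import xarray as xr

# A made-up 2 x 3 "elevation" grid; values and coordinates are illustrative only
elev = xr.DataArray(
    np.array([[1500.0, 1520.0, 1490.0],
              [1480.0, 1510.0, 1475.0]]),
    dims=("y", "x"),
    coords={"y": [40.0, 39.0], "x": [-110.0, -109.0, -108.0]},
    name="elevation",
)

# Select a cell by its coordinate labels instead of by integer position
point = elev.sel(y=39.0, x=-109.0).item()

# Reduce along a named dimension rather than a numbered axis
row_means = elev.mean(dim="x")
```

<p>Selecting with <code>sel</code> and reducing with <code>dim="x"</code> is what the labels buy you: the code says which dimension it means instead of relying on axis order.</p>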
<p>The rioxarray package extends the xarray package to facilitate reading raster data into xarray objects. The actual reading of the raster file is done using another Python package, <a href="https://rasterio.readthedocs.io/en/latest/">rasterio</a>. From the rasterio docs:</p>
<blockquote class="blockquote">
<p>Geographic information systems use GeoTIFF and other formats to organize and store gridded raster datasets such as satellite imagery and terrain models. Rasterio reads and writes these formats and provides a Python API based on Numpy N-dimensional arrays and GeoJSON.</p>
</blockquote>
<section id="sites-and-basins-vector" class="level3">
<h3 class="anchored" data-anchor-id="sites-and-basins-vector">Sites and basins (vector)</h3>
<p>The streamflow gauge sites and associated watershed basins are defined in the <code>geospatial.gpkg</code> <a href="https://mapscaping.com/reading-and-writing-geopackage-in-python/">GeoPackage</a> file available from the <a href="https://www.drivendata.org/competitions/254/reclamation-water-supply-forecast-dev/data/">WSFR data downloads page</a>. GeoPackage files can contain multiple layers of both vector and raster data. Individual layers can be read using GeoPandas’ <code>read_file</code> function by passing the layer name. The <code>layer=</code> argument is actually passed along to <a href="https://fiona.readthedocs.io/en/stable/index.html">Fiona</a>, a Python wrapper for accessing vector data via the <a href="https://gdal.org/">GDAL/OGR library</a>.</p>
<p>Under the hood, a GeoPackage is a SQLite database that conforms to the <a href="https://www.geopackage.org/spec/">GeoPackage standard</a> developed by the <a href="https://www.ogc.org/">Open Geospatial Consortium</a>. Both <code>basins</code> and <code>sites</code> are tables in the GeoPackage (SQLite database) and can be accessed directly with a SQLite database browser or any tool for working with SQLite databases. If you omit the <code>layer=</code> argument, GeoPandas returns the <code>basins</code> layer, probably because it is the first layer in whatever internal ordering is used for the collection of layers.</p>
<div class="cell" data-execution_count="4">
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> geopandas <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> gpd</span>
<span id="cb4-2"></span>
<span id="cb4-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Reading a GeoPackage file</span></span>
<span id="cb4-4">geospatial_input_file <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Path(WSFR_DATA_ROOT, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'geospatial.gpkg'</span>)</span>
<span id="cb4-5">basins_gdf <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> gpd.read_file(geospatial_input_file, layer<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'basins'</span>)</span>
<span id="cb4-6">sites_gdf <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> gpd.read_file(geospatial_input_file, layer<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'sites'</span>)</span></code></pre></div>
</div>
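<p>Since a GeoPackage is just a SQLite database, one way to see which layers it holds is to query the <code>gpkg_contents</code> table that the GeoPackage standard requires. The sketch below uses only Python’s built-in <code>sqlite3</code> module. The helper name <code>list_gpkg_layers</code> is hypothetical and is not part of GeoPandas or Fiona.</p>

```python
import sqlite3
from contextlib import closing


def list_gpkg_layers(gpkg_path):
    """Return (table_name, data_type) pairs listed in a GeoPackage.

    Hypothetical helper: it reads the gpkg_contents table directly
    instead of going through GeoPandas or Fiona.
    """
    # closing() makes sure the connection is released when we are done
    with closing(sqlite3.connect(gpkg_path)) as conn:
        rows = conn.execute(
            "SELECT table_name, data_type FROM gpkg_contents ORDER BY table_name"
        ).fetchall()
    return rows
```

<p>Calling <code>list_gpkg_layers(geospatial_input_file)</code> on the challenge file should list the <code>basins</code> and <code>sites</code> tables, assuming the file follows the standard. Vector layers show up with a <code>data_type</code> of <code>features</code>.</p>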
<div class="cell" data-execution_count="5">
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1">basins_gdf</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="5">
<div>


<table class="dataframe table table-sm table-striped small" data-quarto-postprocess="true" data-border="1">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th"></th>
<th data-quarto-table-cell-role="th">site_id</th>
<th data-quarto-table-cell-role="th">name</th>
<th data-quarto-table-cell-role="th">area</th>
<th data-quarto-table-cell-role="th">geometry</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td data-quarto-table-cell-role="th">0</td>
<td>hungry_horse_reservoir_inflow</td>
<td>Hungry Horse Reservoir Inflow</td>
<td>1681.780</td>
<td>POLYGON ((-113.09701 47.24399, -113.09730 47.2...</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">1</td>
<td>snake_r_nr_heise</td>
<td>Snake River near Heise</td>
<td>5719.410</td>
<td>MULTIPOLYGON (((-110.79196 44.40127, -110.7922...</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">2</td>
<td>pueblo_reservoir_inflow</td>
<td>Pueblo Reservoir Inflow</td>
<td>4615.460</td>
<td>POLYGON ((-105.67340 38.15883, -105.67315 38.1...</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">3</td>
<td>sweetwater_r_nr_alcova</td>
<td>Sweetwater River near Alcova</td>
<td>2377.280</td>
<td>POLYGON ((-107.32822 42.21621, -107.32903 42.2...</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">4</td>
<td>missouri_r_at_toston</td>
<td>Missouri River at Toston</td>
<td>14676.200</td>
<td>POLYGON ((-110.63001 46.30856, -110.63006 46.3...</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">5</td>
<td>animas_r_at_durango</td>
<td>Animas River at Durango</td>
<td>700.901</td>
<td>POLYGON ((-107.87584 37.27614, -107.87759 37.2...</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">6</td>
<td>yampa_r_nr_maybell</td>
<td>Yampa River near Maybell</td>
<td>3381.680</td>
<td>MULTIPOLYGON (((-107.02802 40.02615, -107.0277...</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">7</td>
<td>libby_reservoir_inflow</td>
<td>Libby Reservoir Inflow</td>
<td>9030.450</td>
<td>MULTIPOLYGON (((-114.85817 48.50961, -114.8582...</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">8</td>
<td>boise_r_nr_boise</td>
<td>Boise River near Boise</td>
<td>2687.340</td>
<td>MULTIPOLYGON (((-115.23528 44.09596, -115.2355...</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">9</td>
<td>green_r_bl_howard_a_hanson_dam</td>
<td>Green River below Howard Hanson Dam</td>
<td>221.234</td>
<td>POLYGON ((-121.31474 47.13373, -121.31530 47.1...</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">10</td>
<td>taylor_park_reservoir_inflow</td>
<td>Taylor Park Reservoir Inflow</td>
<td>254.415</td>
<td>MULTIPOLYGON (((-106.74944 39.04268, -106.7499...</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">11</td>
<td>dillon_reservoir_inflow</td>
<td>Dillon Reservoir Inflow</td>
<td>328.429</td>
<td>POLYGON ((-106.04383 39.35748, -106.04564 39.3...</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">12</td>
<td>ruedi_reservoir_inflow</td>
<td>Ruedi Reservoir Inflow</td>
<td>223.740</td>
<td>POLYGON ((-106.52087 39.15795, -106.52216 39.1...</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">13</td>
<td>fontenelle_reservoir_inflow</td>
<td>Fontenelle Reservoir Inflow</td>
<td>4199.150</td>
<td>POLYGON ((-110.06865 42.02568, -110.06944 42.0...</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">14</td>
<td>weber_r_nr_oakley</td>
<td>Weber River near Oakley</td>
<td>162.429</td>
<td>POLYGON ((-111.06529 40.67623, -111.06633 40.6...</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">15</td>
<td>san_joaquin_river_millerton_reservoir</td>
<td>San Joaquin River - Millerton Reservoir</td>
<td>1886.720</td>
<td>MULTIPOLYGON (((-120.44227 39.29433, -120.4423...</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">16</td>
<td>merced_river_yosemite_at_pohono_bridge</td>
<td>Merced River - Yosemite at Pohono Bridge</td>
<td>321.858</td>
<td>POLYGON ((-119.44368 37.86102, -119.44401 37.8...</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">17</td>
<td>american_river_folsom_lake</td>
<td>American River - Folsom Reservoir</td>
<td>1677.530</td>
<td>MULTIPOLYGON (((-119.52661 37.04015, -119.5265...</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">18</td>
<td>colville_r_at_kettle_falls</td>
<td>Colville River at Kettle Falls</td>
<td>1086.250</td>
<td>POLYGON ((-117.88747 48.70666, -117.88776 48.7...</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">19</td>
<td>stehekin_r_at_stehekin</td>
<td>Stehekin River at Stehekin</td>
<td>319.664</td>
<td>POLYGON ((-120.93255 48.19748, -120.93284 48.1...</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">20</td>
<td>detroit_lake_inflow</td>
<td>Detroit Lake Inflow</td>
<td>452.383</td>
<td>POLYGON ((-121.85271 44.46766, -121.85379 44.4...</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">21</td>
<td>virgin_r_at_virtin</td>
<td>Virgin River at Virgin</td>
<td>948.039</td>
<td>MULTIPOLYGON (((-112.73274 37.06034, -112.7327...</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">22</td>
<td>skagit_ross_reservoir</td>
<td>Ross Reservoir Inflow</td>
<td>800.749</td>
<td>MULTIPOLYGON (((-120.98078 48.74358, -120.9810...</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">23</td>
<td>boysen_reservoir_inflow</td>
<td>Boysen Reservoir Inflow</td>
<td>7706.580</td>
<td>POLYGON ((-108.50590 42.46772, -108.50643 42.4...</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">24</td>
<td>pecos_r_nr_pecos</td>
<td>Pecos River near Pecos</td>
<td>171.839</td>
<td>MULTIPOLYGON (((-105.50099 35.90638, -105.5007...</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">25</td>
<td>owyhee_r_bl_owyhee_dam</td>
<td>Owyhee River below Owyhee Dam</td>
<td>11659.800</td>
<td>POLYGON ((-118.19040 43.09658, -118.19034 43.0...</td>
</tr>
</tbody>
</table>

</div>
</div>
</div>
<div class="cell" data-execution_count="6">
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1">sites_gdf</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="6">
<div>


<table class="dataframe table table-sm table-striped small" data-quarto-postprocess="true" data-border="1">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th"></th>
<th data-quarto-table-cell-role="th">site_id</th>
<th data-quarto-table-cell-role="th">name</th>
<th data-quarto-table-cell-role="th">geometry</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td data-quarto-table-cell-role="th">0</td>
<td>hungry_horse_reservoir_inflow</td>
<td>Hungry Horse Reservoir Inflow</td>
<td>POINT (-114.03786 48.35658)</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">1</td>
<td>snake_r_nr_heise</td>
<td>Snake River near Heise</td>
<td>POINT (-111.66000 43.61250)</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">2</td>
<td>pueblo_reservoir_inflow</td>
<td>Pueblo Reservoir Inflow</td>
<td>POINT (-104.71803 38.27167)</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">3</td>
<td>sweetwater_r_nr_alcova</td>
<td>Sweetwater River near Alcova</td>
<td>POINT (-107.13394 42.48996)</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">4</td>
<td>missouri_r_at_toston</td>
<td>Missouri River at Toston</td>
<td>POINT (-111.42028 46.14657)</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">5</td>
<td>animas_r_at_durango</td>
<td>Animas River at Durango</td>
<td>POINT (-107.88035 37.27917)</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">6</td>
<td>yampa_r_nr_maybell</td>
<td>Yampa River near Maybell</td>
<td>POINT (-108.03341 40.50275)</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">7</td>
<td>libby_reservoir_inflow</td>
<td>Libby Reservoir Inflow</td>
<td>POINT (-115.31872 48.40066)</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">8</td>
<td>boise_r_nr_boise</td>
<td>Boise River near Boise</td>
<td>POINT (-116.05955 43.52767)</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">9</td>
<td>green_r_bl_howard_a_hanson_dam</td>
<td>Green River below Howard Hanson Dam</td>
<td>POINT (-121.79789 47.28371)</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">10</td>
<td>taylor_park_reservoir_inflow</td>
<td>Taylor Park Reservoir Inflow</td>
<td>POINT (-106.60920 38.81833)</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">11</td>
<td>dillon_reservoir_inflow</td>
<td>Dillon Reservoir Inflow</td>
<td>POINT (-106.06641 39.62554)</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">12</td>
<td>ruedi_reservoir_inflow</td>
<td>Ruedi Reservoir Inflow</td>
<td>POINT (-106.81865 39.36387)</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">13</td>
<td>fontenelle_reservoir_inflow</td>
<td>Fontenelle Reservoir Inflow</td>
<td>POINT (-110.06667 42.02778)</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">14</td>
<td>weber_r_nr_oakley</td>
<td>Weber River near Oakley</td>
<td>POINT (-111.24796 40.73717)</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">15</td>
<td>san_joaquin_river_millerton_reservoir</td>
<td>San Joaquin River - Millerton Reservoir</td>
<td>POINT (-119.72431 36.98439)</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">16</td>
<td>merced_river_yosemite_at_pohono_bridge</td>
<td>Merced River - Yosemite at Pohono Bridge</td>
<td>POINT (-119.66567 37.71628)</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">17</td>
<td>american_river_folsom_lake</td>
<td>American River - Folsom Reservoir</td>
<td>POINT (-121.16436 38.70453)</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">18</td>
<td>colville_r_at_kettle_falls</td>
<td>Colville River at Kettle Falls</td>
<td>POINT (-118.06249 48.59435)</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">19</td>
<td>stehekin_r_at_stehekin</td>
<td>Stehekin River at Stehekin</td>
<td>POINT (-120.69177 48.32958)</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">20</td>
<td>detroit_lake_inflow</td>
<td>Detroit Lake Inflow</td>
<td>POINT (-122.29744 44.75378)</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">21</td>
<td>virgin_r_at_virtin</td>
<td>Virgin River at Virgin</td>
<td>POINT (-113.18078 37.20415)</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">22</td>
<td>skagit_ross_reservoir</td>
<td>Ross Reservoir Inflow</td>
<td>POINT (-121.06761 48.73217)</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">23</td>
<td>boysen_reservoir_inflow</td>
<td>Boysen Reservoir Inflow</td>
<td>POINT (-108.17899 43.42496)</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">24</td>
<td>pecos_r_nr_pecos</td>
<td>Pecos River near Pecos</td>
<td>POINT (-105.68270 35.70835)</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">25</td>
<td>owyhee_r_bl_owyhee_dam</td>
<td>Owyhee River below Owyhee Dam</td>
<td>POINT (-117.25583 43.65444)</td>
</tr>
</tbody>
</table>

</div>
</div>
</div>
</section>
<section id="state-and-provincial-boundaries-vector" class="level3">
<h3 class="anchored" data-anchor-id="state-and-provincial-boundaries-vector">State and provincial boundaries (vector)</h3>
<p>The US government provides <a href="https://www.census.gov/geographies/mapping-files/time-series/geo/cartographic-boundary.html">Cartographic Boundary Files</a> in both geodatabase and shapefile formats. They are available at different levels of resolution and also by geographic region.</p>
<blockquote class="blockquote">
<p>The cartographic boundary files are simplified representations of selected geographic areas from the Census Bureau’s Master Address File/Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) System. These boundary files are specifically designed for small scale thematic mapping. As of 2019, cartographic boundary files are available in shapefile, geodatabase, and Keyhole Markup Language (KML) format. For more details about these files, including their appropriate usage, please see our <a href="https://www.census.gov/programs-surveys/geography/technical-documentation/naming-convention/cartographic-boundary-file.html">Cartographic Boundary File Description</a> page.</p>
</blockquote>
<p>I downloaded the <a href="https://www2.census.gov/geo/tiger/GENZ2022/gdb/cb_2022_us_all_20m.zip">lowest resolution (1:20,000,000) zipped geodatabase file</a> of the entire US. Uncompressing it results in a folder that is the geodatabase.</p>
<p>Since some of the basins are in Canada, I found <a href="https://open.canada.ca/data/en/dataset/a883eb14-0c0e-45c4-b8c4-b54c4a819edb">similar maps for Canada in shapefile format</a>.</p>
</section>
<section id="river-layer-vector" class="level3">
<h3 class="anchored" data-anchor-id="river-layer-vector">River layer (vector)</h3>
<p>For this layer we can use <a href="https://www.hydrosheds.org/products/hydrorivers">HydroRIVERS</a> data, a part of the <a href="https://www.hydrosheds.org/">HydroSHEDS</a> project. Both geodatabase and shapefile formats are downloadable from <a href="https://www.hydrosheds.org/products/hydrorivers">https://www.hydrosheds.org/products/hydrorivers</a>. There are different files for different regions of the world. The North and Central America data as a zipped geodatabase is available from <a href="https://data.hydrosheds.org/file/HydroRIVERS/HydroRIVERS_v10_na.gdb.zip">here</a> and is ~72 MB in size (compressed). Unzip it after downloading. It’s a lot of rivers.</p>
</section>
<section id="dem-layer-raster" class="level3">
<h3 class="anchored" data-anchor-id="dem-layer-raster">DEM layer (raster)</h3>
<p>Since this project is all about hydrology, we will use a “hydrologically conditioned” DEM available as part of the <a href="https://www.hydrosheds.org/products/hydrosheds">HydroSHEDS core layer</a>.</p>
<ul>
<li>go to <a href="https://www.hydrosheds.org/hydrosheds-core-downloads">https://www.hydrosheds.org/hydrosheds-core-downloads</a>,</li>
<li>select the Conditioned DEM tab near the bottom of the page,</li>
<li>download the <a href="https://data.hydrosheds.org/file/hydrosheds-v1-con/na_con_3s.zip">compressed file for North and Central America</a>; it’s ~2.7GB,</li>
<li>uncompress the downloaded file and you’ll get a folder containing documentation and the GeoTIFF file, <code>na_con_3s.tif</code>, with the DEM raster.</li>
</ul>
</section>
</section>
<section id="using-qgis-to-create-the-map" class="level2">
<h2 class="anchored" data-anchor-id="using-qgis-to-create-the-map">Using QGIS to create the map</h2>
<p>QGIS is a widely used, free and open source, GIS package. Check out the <a href="https://docs.qgis.org/3.28/en/docs/user_manual/index.html">documentation</a>.</p>
<p>The main steps we’ll do are:</p>
<ol type="1">
<li>Create a new project</li>
<li>Connect to the geospatial.gpkg file using the Data Source Manager and add the sites (points) and basins (polygons) layers.</li>
<li>Add the state and province boundaries from a geodatabase and shapefile, respectively.</li>
<li>Add the DEM layer from a GeoTIFF file.</li>
<li>Add the rivers from a geodatabase.</li>
<li>Make styling changes and create a print layout.</li>
</ol>
<section id="adding-the-sites-and-basins" class="level3">
<h3 class="anchored" data-anchor-id="adding-the-sites-and-basins">Adding the sites and basins</h3>
<p>Use the Data Source Manager to connect to the <code>geospatial.gpkg</code> file and select the <code>sites</code> and <code>basins</code> layers to add to the map. After doing that, the map looks like this.</p>
<div class="cell" data-execution_count="7">
<div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1">Image(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'./images/qgis_sites_basins.png'</span>)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="7">
<p><img src="https://bitsofanalytics.org/posts/river_map_qgis/index_files/figure-html/cell-8-output-1.png" class="img-fluid"></p>
</div>
</div>
</section>
<section id="adding-state-and-provice-boundaries" class="level3">
<h3 class="anchored" data-anchor-id="adding-state-and-provice-boundaries">Adding state and province boundaries</h3>
<p>Add a new vector layer (using Data Source Manager or Layer | Add Layer | Add Vector Layer …). Set the option to Directory instead of File and set the Source Type to OpenFileGDB. Browse to the <code>cb_2022_us_all_20m.gdb</code> folder and select it. Click the Add button. You’ll see a list of all the available layers. We just want the state boundary layer. After moving the added layer to the bottom of the map, we have something that looks like this. We’ll change the styling later.</p>
<div class="cell" data-execution_count="8">
<div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1">Image(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'./images/add_state_boundaries.png'</span>)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="8">
<p><img src="https://bitsofanalytics.org/posts/river_map_qgis/index_files/figure-html/cell-9-output-1.png" class="img-fluid"></p>
</div>
</div>
<p>For the Canadian province borders, just add another vector layer but browse to the <code>lpr_000b16a_e.shp</code> shapefile that was part of the <a href="https://www12.statcan.gc.ca/census-recensement/2011/geo/bound-limit/files-fichiers/2016/lpr_000b16a_e.zip">downloaded zip file</a>.</p>
<p>Then change the styling of these two layers as you see fit. I gave them no fill and different colored borders.</p>
</section>
<section id="adding-dem-layer" class="level3">
<h3 class="anchored" data-anchor-id="adding-dem-layer">Adding DEM layer</h3>
<p>Add a new raster layer based on the <code>na_con_3s.tif</code> GeoTIFF file downloaded earlier. By default it will load as a singleband gray scale item. I changed this to a singleband pseudocolor and chose a color gradient using earth tones.</p>
<div class="cell" data-execution_count="9">
<div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1">Image(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'./images/add_DEM.png'</span>)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="9">
<p><img src="https://bitsofanalytics.org/posts/river_map_qgis/index_files/figure-html/cell-10-output-1.png" class="img-fluid"></p>
</div>
</div>
</section>
<section id="adding-rivers-vector" class="level3">
<h3 class="anchored" data-anchor-id="adding-rivers-vector">Adding rivers (vector)</h3>
<p>Like the state boundaries, the river data is in a geodatabase folder. Be careful: when you uncompress the zip file, <code>HydroRIVERS_v10_na.gdb.zip</code>, you end up with a folder, <code>HydroRIVERS_v10_na.gdb</code>, that is NOT a geodatabase. The actual geodatabase folder is inside this folder and is also named <code>HydroRIVERS_v10_na.gdb</code>. This layer may take a bit to load as there are a lot of rivers.</p>
<p>I decided to filter this layer to include only rivers whose <code>ORD_FLOW</code> value was less than or equal to 6. You can right click on the layer and select Filter and then build the expression <code>ORD_FLOW &lt;= 6</code> or whatever you want.</p>
<p>From the HydroRIVERS technical documentation, the <code>ORD_FLOW</code> attribute is defined as:</p>
<blockquote class="blockquote">
<p>Indicator of river order using river flow to distinguish logarithmic size classes: order 1 represents river reaches with a long-term average discharge ≥ 100,000 m<sup>3</sup>/s; order 2 represents river reaches with a long-term average discharge ≥ 10,000 m<sup>3</sup>/s and &lt; 100,000 m<sup>3</sup>/s; … order 9 represents river reaches with a long-term average discharge ≥ 0.001 m<sup>3</sup>/s and &lt; 0.01 m<sup>3</sup>/s; and order 10 represents river reaches with a long-term average discharge &lt; 0.001 m<sup>3</sup>/s (i.e., 0 in the provided data due to rounding to 3 digits).</p>
</blockquote>
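<p>To make the size classes concrete, here is a minimal Python sketch that maps a long-term average discharge to its <code>ORD_FLOW</code> class. The function name <code>ord_flow</code> and the sample discharges are hypothetical, and the thresholds for orders 3 through 8 are an assumption based on the “logarithmic size classes” pattern, since the quoted definition elides them. Under that assumption, the filter <code>ORD_FLOW &lt;= 6</code> keeps reaches with an average discharge of at least 1 m<sup>3</sup>/s.</p>

```python
def ord_flow(discharge_m3s):
    """Return the ORD_FLOW class (1-10) for a long-term average discharge in m3/s."""
    # Lower bounds for orders 1..9; orders 3-8 are assumed from the log10 pattern.
    lower_bounds = [100_000, 10_000, 1_000, 100, 10, 1, 0.1, 0.01, 0.001]
    for order, lower in enumerate(lower_bounds, start=1):
        if discharge_m3s >= lower:
            return order
    return 10  # anything below 0.001 m3/s


# Made-up discharges (m3/s) to show which reaches survive a filter at order 6
max_order = 6
sample = {"big river": 2500.0, "creek": 1.0, "trickle": 0.4}
kept = [name for name, q in sample.items() if max_order >= ord_flow(q)]
print(kept)
```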
<div class="cell" data-execution_count="10">
<div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1">Image(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'./images/add_rivers.png'</span>)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="10">
<p><img src="https://bitsofanalytics.org/posts/river_map_qgis/index_files/figure-html/cell-11-output-1.png" class="img-fluid"></p>
</div>
</div>
</section>
<section id="styling-changes-title-and-legend" class="level3">
<h3 class="anchored" data-anchor-id="styling-changes-title-and-legend">Styling changes, title and legend</h3>
<p>I experimented with the colors and styles for the sites and basins layers until I found something I liked.</p>
<p>In order to add things like titles, legends, scale bar or other annotations, you need to create a new Print Layout from the main Project menu. See <a href="https://docs.qgis.org/3.28/en/docs/user_manual/print_composer/overview_composer.html">this QGIS documentation page</a> for all the details.</p>
<p>This seemed like a good stopping point and our map looks like the first version shown in this notebook.</p>
<p>Now our challenge is to duplicate this map using Python and R.</p>
</section>
</section>
<section id="glossing-over-some-important-issues" class="level2">
<h2 class="anchored" data-anchor-id="glossing-over-some-important-issues">Glossing over some important issues</h2>
<p>When using QGIS, sometimes we can get a false sense of security regarding our understanding of fundamental, and often complex, issues. In particular, the above figure shows a 2D map on a screen. The world is 3D and roughly shaped like an ellipsoid. So, how exactly were the different datasets projected onto a 2D surface? What coordinate reference systems (CRS) do the various vector and raster datasets use? Did QGIS automatically convert all of the data to use a common CRS? In the next post, when we try to recreate the above map with Python, we’ll be forced to confront these issues, and many more. For example, here’s another version of the map above which uses what is known as the Albers Equal-Area projection. We’ll wade into the thicket that is CRSs and projections in the next post.</p>
<div class="cell" data-execution_count="11">
<div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1">Image(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'./images/WaterSupplyForecastRodeo_Albers.png'</span>)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="11">
<p><img src="https://bitsofanalytics.org/posts/river_map_qgis/index_files/figure-html/cell-12-output-1.png" class="img-fluid"></p>
</div>
</div>
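<p>To give a feel for what a projection actually does, here is a minimal sketch of the forward Albers Equal-Area Conic equations on a sphere. The function name, the Earth radius, and the parameter values (standard parallels of 29.5°N and 45.5°N with an origin at 23°N, 96°W, a common choice for the contiguous US) are assumptions for illustration and are not necessarily the settings behind the map above. Real GIS software works on an ellipsoid, so treat this as a sketch rather than a substitute for a proper CRS transformation.</p>

```python
import math


def albers_sphere(lat, lon, lat1=29.5, lat2=45.5, lat0=23.0, lon0=-96.0, radius=6371.0):
    """Project (lat, lon) in degrees to Albers equal-area (x, y) in km on a sphere."""
    phi, phi1, phi2, phi0 = (math.radians(v) for v in (lat, lat1, lat2, lat0))
    n = (math.sin(phi1) + math.sin(phi2)) / 2  # cone constant
    c = math.cos(phi1) ** 2 + 2 * n * math.sin(phi1)
    rho = radius * math.sqrt(c - 2 * n * math.sin(phi)) / n    # radius of the point's parallel
    rho0 = radius * math.sqrt(c - 2 * n * math.sin(phi0)) / n  # radius of the origin's parallel
    theta = n * math.radians(lon - lon0)
    return rho * math.sin(theta), rho0 - rho * math.cos(theta)


# A point west of the central meridian and north of the origin
x, y = albers_sphere(40.0, -105.0)
print(round(x), round(y))
```

<p>Points west of the central meridian get negative x values and points north of the origin latitude get positive y values, which is the kind of planar coordinate a 2D map ultimately plots.</p>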


</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div id="quarto-reuse" class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</a></div></div></section><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{isken2024,
  author = {Isken, Mark},
  title = {A Vector-Raster River Map, Three Ways - {Part} 1: {QGIS}},
  date = {2024-01-15},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-isken2024" class="csl-entry quarto-appendix-citeas">
Isken, Mark. 2024. <span>“A Vector-Raster River Map, Three Ways - Part
1: QGIS.”</span> January 15, 2024.
</div></div></section></div> ]]></description>
  <category>geonewb</category>
  <category>geospatial</category>
  <category>python</category>
  <category>qgis</category>
  <guid>https://bitsofanalytics.org/posts/river_map_qgis/</guid>
  <pubDate>Mon, 15 Jan 2024 05:00:00 GMT</pubDate>
</item>
<item>
  <title>hillmaker 0.8.0 released</title>
  <dc:creator>Mark Isken</dc:creator>
  <link>https://bitsofanalytics.org/posts/hillmaker-080-released/hillmaker_080_released.html</link>
  <description><![CDATA[ 



<p>The <a href="https://github.com/misken/hillmaker">hillmaker team</a> is happy to announce that we have released a new version of hillmaker that contains some significant improvements. Probably the biggest improvement is actual documentation - <a href="https://hillmaker.readthedocs.io/en/latest/intro.html">https://hillmaker.readthedocs.io/en/latest/intro.html</a></p>
<p>Other improvements include:</p>
<ul>
<li>choice of CLI, an object oriented API and a function based API that make hillmaker easy to use from the command line, Jupyter notebooks or Python scripts,</li>
<li>many input parameters for controlling hillmaker computations and outputs,</li>
<li>enhanced plotting capabilities,</li>
<li>can specify inputs via a <a href="https://toml.io/en/">TOML</a> formatted config file,</li>
<li>input validation handled via a <a href="https://docs.pydantic.dev/latest/">pydantic</a> model,</li>
<li>automated tests for occupancy computations,</li>
<li>added extensive conservation of flow checks,</li>
<li>detailed logging,</li>
<li>summary reports for length of stay and implied operating hours,</li>
<li>added examples to docstrings for API related elements.</li>
</ul>
<p>See the <a href="https://github.com/misken/hillmaker/blob/main/CHANGELOG.md">CHANGELOG</a> for more details on changes in this version.</p>
<p>A few things haven’t changed:</p>
<ul>
<li>main GitHub site is still <a href="https://github.com/misken/hillmaker" class="uri">https://github.com/misken/hillmaker</a>,</li>
<li>you can install using pip or conda-forge (coming soon) or using source from GitHub (see main GitHub site or the docs),</li>
<li>in addition to the new documentation, you can find other example Jupyter notebooks and Python scripts at the <a href="https://github.com/misken/hillmaker-examples">hillmaker-examples repo on GitHub</a>.</li>
</ul>
<section id="future-plans" class="level2">
<h2 class="anchored" data-anchor-id="future-plans">Future plans</h2>
<p>For the next major release we hope to have a form based GUI as another user interface option. Those who have used the old <a href="https://sourceforge.net/projects/hillmaker/">MS Access version of hillmaker</a> will recall that it had a form based GUI that made it easy to use by non-programmers. We also will add additional analytical capabilities as well as better scenario management functionality.</p>


</section>

 ]]></description>
  <category>python</category>
  <category>hillmaker</category>
  <guid>https://bitsofanalytics.org/posts/hillmaker-080-released/hillmaker_080_released.html</guid>
  <pubDate>Thu, 16 Nov 2023 05:00:00 GMT</pubDate>
</item>
<item>
  <title>Trying out DataCamp AI Assistant</title>
  <dc:creator>Mark Isken</dc:creator>
  <link>https://bitsofanalytics.org/posts/datacamp_ai_1/</link>
  <description><![CDATA[ 



<section id="an-invitation" class="level1">
<h1>An invitation</h1>
<p>Today I got an email from <a href="">DataCamp</a> inviting me to try out their new AI Assistant (powered by OpenAI) within their cloud based notebook environment called Workspace. I teach analytics courses in a business school. For one of those courses, students get access to DataCamp for the semester but its use in the course is entirely optional. DataCamp has some very nice tutorials for those getting started in analytics and data science and it gives students another way to start to learn to do analytics work in R and Python.</p>
<p>The AI Assistant invite was not expected and I really hadn’t even thought about DataCamp rolling out something like this - but it makes perfect sense. So, I figured I might as well give it a try. Now, I’ve been trying to keep up with the LLM/GPT/AI frenzy that’s been happening since last fall, but other than a few basic interactions with ChatGPT3.5, I really haven’t used them much at all. As an educator, I have been closely following discussions in the higher ed community about the impact of these tools. I say all that to emphasize that I’m very much a GPT newb and what follows is by no means a rigorous or exhaustive look at this AI Assistant. I’m sure there are better ways to interact with this thing. This is just me messing around with some typical kinds of things I teach and use in my classes.</p>
<section id="basic-and-not-so-basic-analysis-of-cycle-share-data" class="level2">
<h2 class="anchored" data-anchor-id="basic-and-not-so-basic-analysis-of-cycle-share-data">Basic (and not so basic) analysis of cycle share data</h2>
<p>Within the Workspace cloud notebook environment, there are a bunch of preloaded datasets including some cycle share data that has already been summarized in terms of number of rides per date and includes some weather related variables. I did some basic pandas group by queries and simple plots and the AI Assistant worked quite well. In my classes I’ve used cycle share data but we typically work with the raw trip data in which each row is a bike rental instance. I noticed that you can upload your own dataset into a Workspace. Ok, let’s see what this thing can do with a dataset that is not pre-loaded, but is very prominent out on the interwebs.</p>
<p>I uploaded the <code>trip.csv</code> datafile from the <a href="https://www.kaggle.com/datasets/pronto/cycle-share-dataset">Pronto Cycleshare Dataset</a>. It’s a few hundred thousand rows and just under 50MB in size. No problem with the upload.</p>
<p>Since I knew we were going to be doing some datetime work, I helped the AI with the import by asking:</p>
<blockquote class="blockquote">
<p>Import the trip.csv file into a pandas dataframe. The starttime and stoptime fields are datetimes.</p>
</blockquote>
<p>It generated the following perfectly correct code.</p>
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pandas <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> pd</span>
<span id="cb1-2"></span>
<span id="cb1-3">df <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.read_csv(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'trip.csv'</span>, parse_dates<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'starttime'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'stoptime'</span>])</span></code></pre></div>
<p>Here’s what the dataframe looks like in terms of column names.</p>
<div class="cell" data-execution_count="16">
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1">df.info()</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>&lt;class 'pandas.core.frame.DataFrame'&gt;
RangeIndex: 286857 entries, 0 to 286856
Data columns (total 12 columns):
 #   Column             Non-Null Count   Dtype         
---  ------             --------------   -----         
 0   trip_id            286857 non-null  int64         
 1   starttime          286857 non-null  datetime64[ns]
 2   stoptime           286857 non-null  datetime64[ns]
 3   bikeid             286857 non-null  object        
 4   tripduration       286857 non-null  float64       
 5   from_station_name  286857 non-null  object        
 6   to_station_name    286857 non-null  object        
 7   from_station_id    286857 non-null  object        
 8   to_station_id      286857 non-null  object        
 9   usertype           286857 non-null  object        
 10  gender             181557 non-null  object        
 11  birthyear          181553 non-null  float64       
dtypes: datetime64[ns](2), float64(2), int64(1), object(7)
memory usage: 26.3+ MB</code></pre>
</div>
</div>
<p>Let’s find the 8 most popular stations to rent from.</p>
<blockquote class="blockquote">
<p>List the 8 most popular stations from which to rent a bike.</p>
</blockquote>
<p>The generated code contains a column name that doesn’t exist in the dataframe. No doubt, there’s another cycle share dataset out there that does use this column name.</p>
<div class="sourceCode" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1">popular_stations <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'start station name'</span>].value_counts().head(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">8</span>)</span>
<span id="cb4-2"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(popular_stations)</span></code></pre></div>
<p>When I clicked <strong>Fix Error</strong>, I got:</p>
<p><em>The error is caused by the key ‘start station name’ not being present in the DataFrame ‘df’. This could be due to a misspelling or the key not existing in the DataFrame. To fix the error, we can check the column names in the DataFrame and use the correct key. Here’s the corrected code:</em></p>
<pre><code>popular_stations = df['Start Station'].value_counts().head(8)
print(popular_stations)</code></pre>
<p><em>Assuming the column name in the DataFrame is ‘Start Station’, this code should work without any errors.</em></p>
<p>So, it can’t go find a potentially correct field name but did help find the root cause of the error. I manually fixed the code.</p>
<div class="cell" data-executiontime="71" data-lastsuccessfullyexecutedcode="popular_stations = df['from_station_id'].value_counts().head(8) print(popular_stations)" data-execution_count="17">
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1">popular_stations <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'from_station_id'</span>].value_counts().head(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">8</span>)</span>
<span id="cb6-2"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(popular_stations)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>WF-01     13054
CH-07     11392
BT-01     10934
CBD-13    10049
SLU-15     9994
CH-02      9639
CH-08      9468
SLU-01     8382
Name: from_station_id, dtype: int64</code></pre>
</div>
</div>
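<p>A fix-up tool could go one step further and propose existing columns that resemble the bad name. Here is a minimal sketch using <code>difflib</code> from the Python standard library. The helper name <code>suggest_columns</code> is hypothetical and this is not how the AI Assistant works. The column list is copied from the <code>df.info()</code> output above (in a notebook you would pass <code>df.columns</code>).</p>

```python
import difflib


def suggest_columns(bad_name, columns, n=3, cutoff=0.6):
    """Return up to n existing column names that look similar to bad_name."""
    return difflib.get_close_matches(bad_name, list(columns), n=n, cutoff=cutoff)


# Column names as reported by df.info() for trip.csv
columns = ["trip_id", "starttime", "stoptime", "bikeid", "tripduration",
           "from_station_name", "to_station_name", "from_station_id",
           "to_station_id", "usertype", "gender", "birthyear"]

suggestions = suggest_columns("start station name", columns)
print(suggestions)
```

<p>The suggestions are only candidates to inspect. String similarity can’t tell a start station from an end station, so a human still has to pick the right column.</p>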
<p>Let’s turn this into a bar chart. Can I somehow reference the previous output? Does it know about the correct field name? I tried this:</p>
<blockquote class="blockquote">
<p>Create a bar plot from popular_stations.</p>
</blockquote>
<p>Again, it used the wrong column name to (re)compute <code>popular_stations</code> (even though it didn’t need to recompute it). I fixed up the code manually.</p>
<div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> matplotlib.pyplot <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> plt</span>
<span id="cb8-2"></span>
<span id="cb8-3">popular_stations <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'from_station_id'</span>].value_counts().head(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">8</span>)</span>
<span id="cb8-4"></span>
<span id="cb8-5">plt.bar(popular_stations.index, popular_stations.values)</span>
<span id="cb8-6">plt.title(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Most Popular Stations to Rent From'</span>)</span>
<span id="cb8-7">plt.xlabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Station ID'</span>)</span>
<span id="cb8-8">plt.ylabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Number of Rentals'</span>)</span>
<span id="cb8-9">plt.show()</span></code></pre></div>
<p><img src="https://bitsofanalytics.org/posts/datacamp_ai_1/index_files/figure-html/cell-6-output-1.png" class="img-fluid"></p>
<p>In another attempt at doing the same thing, the AI Assistant used the correct column name from the previous code chunk. Strangely, it then recomputed <code>popular_stations</code> instead of just using it.</p>
<p>Now let’s make a line chart of the number of rides by date.</p>
<blockquote class="blockquote">
<p>make a line chart of the number of rides by date.</p>
</blockquote>
<p>It used a non-existent column for the start date even though we told it that <code>starttime</code> and <code>stoptime</code> were dates. Not surprisingly, the Fix Error identified the bad column name but didn’t go further than that in terms of suggesting possible columns to use (i.e.&nbsp;the ones whose datatype is date). So, here’s the manually fixed up code:</p>
<div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> matplotlib.pyplot <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> plt</span>
<span id="cb9-2"></span>
<span id="cb9-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># group the data by date and count the number of rides</span></span>
<span id="cb9-4">rides_by_date <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df.groupby(df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'starttime'</span>].dt.date).size()</span>
<span id="cb9-5"></span>
<span id="cb9-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># plot the line chart</span></span>
<span id="cb9-7">plt.plot(rides_by_date.index, rides_by_date.values)</span>
<span id="cb9-8">plt.title(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Number of Rides by Date'</span>)</span>
<span id="cb9-9">plt.xlabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Date'</span>)</span>
<span id="cb9-10">plt.ylabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Number of Rides'</span>)</span>
<span id="cb9-11">plt.show()</span></code></pre></div>
<p><img src="https://bitsofanalytics.org/posts/datacamp_ai_1/index_files/figure-html/cell-7-output-1.png" class="img-fluid"></p>
<p>I asked it to make the x-axis more readable and it produced the following bit of correct code with rotated axis labels.</p>
<div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> matplotlib.pyplot <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> plt</span>
<span id="cb10-2"></span>
<span id="cb10-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># group the data by date and count the number of rides</span></span>
<span id="cb10-4">rides_by_date <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df.groupby(df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'starttime'</span>].dt.date).size()</span>
<span id="cb10-5"></span>
<span id="cb10-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># plot the line chart</span></span>
<span id="cb10-7">plt.plot(rides_by_date.index, rides_by_date.values)</span>
<span id="cb10-8">plt.title(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Number of Rides by Date'</span>)</span>
<span id="cb10-9">plt.xlabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Date'</span>)</span>
<span id="cb10-10">plt.ylabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Number of Rides'</span>)</span>
<span id="cb10-11"></span>
<span id="cb10-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># rotate the x-axis labels for readability</span></span>
<span id="cb10-13">plt.xticks(rotation<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">45</span>)</span>
<span id="cb10-14"></span>
<span id="cb10-15">plt.show()</span></code></pre></div>
<p><img src="https://bitsofanalytics.org/posts/datacamp_ai_1/index_files/figure-html/cell-8-output-1.png" class="img-fluid"></p>
<p>Strangely, when I tried to reproduce the above behavior, the generated code went back to using non-existent column names and, instead of rotating the text, it used <code>DateFormatter</code> objects. Also, the generated code would do things like trying to convert columns to datetimes even if they were already datetimes.</p>
</section>
<section id="and-now-for-something-more-difficult" class="level2">
<h2 class="anchored" data-anchor-id="and-now-for-something-more-difficult">And now for something more difficult</h2>
<p>Ok, let’s try something tougher like computing the average number of rides by day of week. This is not particularly easy since you have to worry about things like the specific date range used (could have a different number of, say, Mondays than the other days of the week) and the possibility of there being zero rides within a given group. Now, the zero thing is a non-issue for overall average number of rides by day of week, but, if we do something like computing the average number of rides by starting station by day of week and hour of day, there most certainly will be instances of zero rides in some of those groups. If they aren’t properly accounted for, the averages will be inflated. I did a <a href="https://bitsofanalytics.org/posts/daily-averages-cycleshare-part1/daily_averages_cycleshare_part1.html">blog post on this issue</a> several years ago (so, there’s a chance it’s in the training data for the LLM).</p>
<blockquote class="blockquote">
<p>Compute the average number of bike rentals by day of week.</p>
</blockquote>
<p>Here’s what I got for one of the attempts:</p>
<div class="cell" data-execution_count="21">
<div class="cell-output cell-output-display" data-execution_count="21">
<p><img src="https://bitsofanalytics.org/posts/datacamp_ai_1/index_files/figure-html/cell-9-output-1.png" class="img-fluid"></p>
</div>
</div>
<p>Another time I tried the same query I got the following code. Well, actually it wanted to read some completely different csv file but I manually fixed it. Nevertheless, the computation is wrong (really wrong). If you look back at the line plot, you’ll see that there are around 400-500 rides per day.</p>
<div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#import pandas as pd</span></span>
<span id="cb11-2"></span>
<span id="cb11-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># load the data</span></span>
<span id="cb11-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#df = pd.read_csv('trip.csv')</span></span>
<span id="cb11-5"></span>
<span id="cb11-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># convert starttime column to datetime datatype</span></span>
<span id="cb11-7">df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'starttime'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.to_datetime(df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'starttime'</span>])</span>
<span id="cb11-8"></span>
<span id="cb11-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># create a new column for day of week</span></span>
<span id="cb11-10">df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'starttime'</span>].dt.day_name()</span>
<span id="cb11-11"></span>
<span id="cb11-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># group the data by day of week and calculate the average number of rides</span></span>
<span id="cb11-13">avg_rides_by_day <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df.groupby(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>)[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'tripduration'</span>].count() <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'day_of_week'</span>].nunique()</span></code></pre></div>
<pre><code>day_of_week
Friday       6185.714286
Monday       6029.142857
Saturday     5496.000000
Sunday       4729.714286
Thursday     6278.000000
Tuesday      6126.571429
Wednesday    6134.428571
Name: tripduration, dtype: float64</code></pre>
<p>Yeah, that’s not even close to being right. The AI Assistant also likes to repeat import statements and reread files. This makes it difficult to create a coherent workflow as it’s easy to inadvertently clobber previous data prep code or you have to remember to comment out unneeded file rereads.</p>
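<p>The generated code divides each day-of-week ride count by the number of distinct day names (7) rather than by the number of dates of that weekday in the data. As a rough sketch of one way to do it instead, the snippet below first counts rides per calendar date and then averages those daily counts by day of week. The tiny <code>trips</code> table is made up for illustration; only the <code>starttime</code> column name comes from the code above.</p>

```python
import pandas as pd

# Made-up trip records; 2023-01-02 and 2023-01-09 are Mondays.
trips = pd.DataFrame({
    "starttime": pd.to_datetime([
        "2023-01-02 08:00", "2023-01-02 09:30", "2023-01-02 17:15",  # Monday: 3 rides
        "2023-01-03 08:10",                                          # Tuesday: 1 ride
        "2023-01-09 07:45",                                          # Monday: 1 ride
        "2023-01-10 12:00", "2023-01-10 18:20",                      # Tuesday: 2 rides
    ])
})

# Step 1: count rides on each calendar date.
rides_per_date = trips.groupby(trips["starttime"].dt.normalize()).size()

# Step 2: average those daily counts within each day of week.
avg_rides_by_dow = rides_per_date.groupby(rides_per_date.index.day_name()).mean()

print(avg_rides_by_dow)
```

<p>Here the Monday average is (3 + 1) / 2 = 2.0 and the Tuesday average is (1 + 2) / 2 = 1.5. Note that a date with no rides at all would be missing from <code>rides_per_date</code>, so this sketch assumes every date in the range has at least one ride.</p>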
<p>I also tried asking for code to do average number of rides by station by day of week by hour of day. Again, got bad fields and incorrect computations. There are definitely groups with zero rides and this code isn’t taking that into account.</p>
<div class="cell" data-execution_count="23">
<div class="cell-output cell-output-display" data-execution_count="23">
<p><img src="https://bitsofanalytics.org/posts/datacamp_ai_1/index_files/figure-html/cell-11-output-1.png" class="img-fluid"></p>
</div>
</div>
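<p>To make the zero rides issue concrete, here is a minimal sketch of one way to account for empty groups: count the observed rides, reindex onto every station, date and hour combination with zeros filled in, and only then average. The data and the <code>from_station</code> column name are hypothetical, and this is not necessarily the approach taken in the blog post linked earlier.</p>

```python
import pandas as pd

# Made-up trips; 'from_station' is a hypothetical column name.
trips = pd.DataFrame({
    "from_station": ["A", "A", "A", "B"],
    "starttime": pd.to_datetime([
        "2023-01-02 08:05", "2023-01-02 08:40",  # station A, Monday, 8am: 2 rides
        "2023-01-09 08:15",                      # station A, next Monday, 8am: 1 ride
        "2023-01-02 08:30",                      # station B, first Monday only
    ]),
})
trips["date"] = trips["starttime"].dt.normalize()
trips["hour"] = trips["starttime"].dt.hour.astype("int64")

# Observed counts only: combinations with no rides are simply missing.
counts = trips.groupby(["from_station", "date", "hour"]).size()

# Build every station x date x hour combination and fill the gaps with 0.
full_index = pd.MultiIndex.from_product(
    [
        sorted(trips["from_station"].unique()),
        pd.date_range(trips["date"].min(), trips["date"].max(), freq="D"),
        range(24),
    ],
    names=["from_station", "date", "hour"],
)
counts = counts.reindex(full_index, fill_value=0).rename("rides").reset_index()

# Average the zero-filled counts by station, day of week and hour.
counts["day_of_week"] = counts["date"].dt.day_name()
avg = counts.groupby(["from_station", "day_of_week", "hour"])["rides"].mean()

print(avg.loc[("B", "Monday", 8)])
```

<p>Station B had one 8am ride on the first Monday and none on the second, so its zero-filled average is 0.5. Averaging only the observed counts would have given 1.0, which is exactly the inflation described above. The sketch assumes every station was in service for the whole date range; that would need checking with real data.</p>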
<p>That’s enough for now.</p>
</section>
<section id="parting-thoughts" class="level2">
<h2 class="anchored" data-anchor-id="parting-thoughts">Parting thoughts</h2>
<p>I certainly wasn’t surprised that the AI Assistant failed miserably on this last task as it’s not a simple query and the correct approach is unlikely to be very prominent in the training data. I was a bit surprised how often the AI Assistant would use non-existent column names or filenames, or try to do datatype conversions on columns that were already of the desired data type. It is certainly capable of creating boilerplate code for simple things which can then be manually patched up (e.g.&nbsp;fixing column names). I’m sure it will improve over time via some sort of reinforcement learning or non-LLM based tweaks to prevent things like nonexistent column name use. For now, I’m sticking with StackOverflow and writing my own code.</p>


</section>
</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div quarto-reuse="quarto-reuse" class="quarto-appendix-contents"><a rel="license" href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</a></div></section><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{isken2023,
  author = {Mark Isken},
  title = {Trying Out {DataCamp} {AI} {Assistant}},
  date = {2023-04-17},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-isken2023" class="csl-entry quarto-appendix-citeas">
Mark Isken. 2023. <span>“Trying Out DataCamp AI Assistant.”</span> April
17, 2023.
</div></div></section></div> ]]></description>
  <category>python</category>
  <category>teaching</category>
  <category>AI</category>
  <guid>https://bitsofanalytics.org/posts/datacamp_ai_1/</guid>
  <pubDate>Mon, 17 Apr 2023 04:00:00 GMT</pubDate>
</item>
<item>
  <title>Hello world maps in R using both raster and vector data</title>
  <dc:creator>Mark Isken</dc:creator>
  <link>https://bitsofanalytics.org/posts/hello_world_mapping_r/hello_world_map_r.html</link>
  <description><![CDATA[ 




<p>Recently I’ve done a bunch of <a href="https://bitsofanalytics.org/#category=geonewb">geonewb posts</a> focusing on learning about geospatial analysis using Python. While I’ve barely scratched the surface, I decided to simultaneously explore the same topic using R. I find it helpful to conceptually map (no pun intended) between Python and R packages for doing similar tasks.</p>
<p>A bunch of years ago I did do a <a href="https://bitsofanalytics.org/posts/state-parks-ggmap/state-parks-ggmap.html">post on using ggmap</a> for a simple plot of the state of Michigan (raster data) with dots showing the location and relative size of our state parks (vector data). At the time, one could use <code>ggmap::get_map()</code> to download raster basemaps from Google without an API key. Those days are over. Similarly, the terms of use for <a href="https://www.openstreetmap.org/#map=16/42.7348/-83.1595">Open Street Map</a> don’t allow for the <code>get_map()</code> function to use OSM as a source of data for <strong>ggmap</strong> (see <a href="https://github.com/dkahle/ggmap/issues/117">Do not use OpenStreetMap export endpoint #117</a>). More importantly, the software landscape has continued to evolve with new packages such as <a href="https://r-spatial.github.io/sf/">sf</a>, <a href="https://rspatial.org/pkg/index.html">terra</a> and <a href="https://r-tmap.github.io/tmap/">tmap</a>.</p>
<p>Let’s start out with a super simple project and use that to provide a framework for learning basic geocomputation in R. I used the word <em>geocomputation</em> purposefully as one of the first resources I stumbled across in doing this project is the <a href="https://r.geocompx.org/index.html">Geocomputation with R</a> (GCwR) online book. It’s a terrific resource. I particularly liked the inclusion of a detailed historical perspective on the geospatial landscape in R. <a href="https://r.geocompx.org/adv-map.html">Chapter 9 on Making Maps with R</a> introduces a number of R packages for creating a wide variety of map types. We will see a few of these packages in this brief tutorial.</p>
<p>Ok, let’s start our “Hello World” level project.</p>
<section id="mapping-blue-bird-nest-box-locations" class="level2">
<h2 class="anchored" data-anchor-id="mapping-blue-bird-nest-box-locations">Mapping blue bird nest box locations</h2>
<p>One of the programs run by the <a href="https://www.birds.cornell.edu/home/">Cornell Lab of Ornithology</a> is known as <a href="https://nestwatch.org/">NestWatch</a>.</p>
<blockquote class="blockquote">
<p>NestWatch is a nationwide nest-monitoring program designed to track status and trends in the reproductive biology of birds.</p>
</blockquote>
<p>Our local birding group has participated in this program for a number of years. One member has built numerous <a href="https://nestwatch.org/learn/all-about-birdhouses/birds/eastern-bluebird/">bluebird boxes</a> and our group has installed and monitored these nest boxes for a number of years. The location (longitude and latitude) of each box is recorded in a database and our goal is to construct a very simple map having:</p>
<ul>
<li>a background basemap showing roads and park boundaries,</li>
<li>each bluebird box shown on the map as a symbol,</li>
<li>the style of the box symbols should depend on the type of nest box - <a href="https://www.birdsandblooms.com/backyard-projects/diy-birdhouse/diy-birdhouse-bluebirds/">traditional</a> vs <a href="http://www.nabluebirdsociety.org/PDF/Peterson.pdf">Peterson</a>.</li>
</ul>
<div class="cell">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(dplyr)    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># General data wrangling</span></span>
<span id="cb1-2"></span>
<span id="cb1-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(sf)       <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Simple features such as points, lines, polygons</span></span>
<span id="cb1-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(terra)    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Using raster data</span></span>
<span id="cb1-5"></span>
<span id="cb1-6"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(maptiles) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Fetch map data from places like OpenStreetMap</span></span>
<span id="cb1-7"></span>
<span id="cb1-8"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(ggplot2)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Is capable of mapping through geom_sf()</span></span>
<span id="cb1-9"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(ggmap)    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Fetch map data and plot it</span></span>
<span id="cb1-10"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyterra) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Use tidyverse and ggplot2 with terra objects</span></span>
<span id="cb1-11"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tmap)     <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Static and interactive maps</span></span>
<span id="cb1-12"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(mapsf)    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Static plotting of raster and vector data</span></span>
<span id="cb1-13"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(leaflet)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Interactive maps</span></span></code></pre></div>
</div>
<p>Read the data into a dataframe from an RDS file.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1">site_df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">readRDS</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'data/site_df.rds'</span>)</span></code></pre></div>
</div>
<p>List key columns for the nest boxes.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1">site_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb3-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(site_name, latitude, longitude, box_type)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 50 × 4
   site_name   latitude longitude box_type   
   &lt;chr&gt;          &lt;dbl&gt;     &lt;dbl&gt; &lt;fct&gt;      
 1 Kestrel Box     42.8     -83.1 Kestrel    
 2 DTLP_K1         42.8     -83.1 Traditional
 3 CIP_K2          42.8     -83.1 Kestrel    
 4 PCT01           42.7     -83.2 Traditional
 5 PCT02           42.7     -83.2 Traditional
 6 PCT04           42.7     -83.2 Traditional
 7 PCT03           42.7     -83.2 Traditional
 8 PCT05           42.7     -83.2 Traditional
 9 PCT06           42.7     -83.2 Traditional
10 PCT07           42.7     -83.2 Traditional
# … with 40 more rows</code></pre>
</div>
</div>
<p>Count the number of nest boxes by box type.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1">site_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb5-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(box_type) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb5-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">count</span>()</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 3 × 2
# Groups:   box_type [3]
  box_type        n
  &lt;fct&gt;       &lt;int&gt;
1 Kestrel         2
2 Peterson        9
3 Traditional    39</code></pre>
</div>
</div>
</section>
<section id="raster-and-vector-data-for-our-map" class="level2">
<h2 class="anchored" data-anchor-id="raster-and-vector-data-for-our-map">Raster and vector data for our map</h2>
<p>As the background for the map, we want to show roads, park boundaries and other standard street map elements. This can’t simply be an image file as we also need geographic, or spatial, awareness to be able to overlay the nest box location symbols. As you are likely aware, the two major classes of spatial data are <em>vector</em> and <em>raster</em>. Vector data consists of points, lines, polygons and generalizations of them. Raster data is based on a matrix of cells overlaid on a surface.</p>
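<p>As a rough illustration of the difference between the two data classes, here is a minimal plain-Python sketch (no geospatial libraries, no CRS, all values made up). The vector data is a list of explicit coordinates, while the raster is a matrix of cell values plus the metadata needed to place that matrix in space. The dictionary keys and the <code>cell_value</code> helper are hypothetical names used only for this example.</p>

```python
# Vector data: explicit coordinates for each feature (made-up values).
points = [(2.5, 7.5), (6.0, 1.0)]  # two point features as (x, y)

# Raster data: a matrix of cell values plus enough metadata to place it in space.
raster = {
    "x_min": 0.0,      # x coordinate of the left edge
    "y_max": 10.0,     # y coordinate of the top edge
    "cell_size": 5.0,  # width and height of each square cell
    "values": [
        [1, 2],        # top row
        [3, 4],        # bottom row
    ],
}


def cell_value(raster, x, y):
    """Return the raster value of the cell containing the point (x, y)."""
    col = int((x - raster["x_min"]) // raster["cell_size"])
    row = int((raster["y_max"] - y) // raster["cell_size"])
    return raster["values"][row][col]


# Overlaying vector points on a raster amounts to a lookup like this.
sampled = [cell_value(raster, x, y) for x, y in points]
print(sampled)
```

<p>The first point falls in the top-left cell and the second in the bottom-right cell. The helper does no bounds checking, and real packages also track the CRS so that both layers agree on what the coordinates mean.</p>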
<p>Imbuing these data types with geographic awareness involves things like choosing a specific coordinate reference system, or CRS (i.e.&nbsp;the meaning of the X and Y coordinates), an underlying model of our earth and a way of projecting 3D objects onto a 2D surface. See Chapters <a href="https://r.geocompx.org/spatial-class.html">2</a> and <a href="">7</a> of the <em>Geocomputation with R</em> book for the details. The <a href="https://docs.qgis.org/3.22/en/docs/gentle_gis_introduction/coordinate_reference_systems.html">QGIS (a free and open source desktop GIS package) documentation</a> also has a good overview of these topics as does the online text <a href="https://mgimond.github.io/Spatial/index.html">Introduction to GIS and Spatial Analysis</a> by Manuel Gimond.</p>
<p>Our underlying base map will be based on some sort of raster file and the nest box points will be vector data based on the latitude and longitude columns of <code>site_df</code>. Two questions immediately arise:</p>
<ul>
<li>Where do we get a raster file for the base map of the area we are interested in?</li>
<li>How do we give true <em>spatial awareness</em> to the <code>site_df</code> dataframe?</li>
</ul>
<p>Let’s tackle the second question first.</p>
<section id="simple-features-and-creating-a-geographically-aware-dataframe" class="level3">
<h3 class="anchored" data-anchor-id="simple-features-and-creating-a-geographically-aware-dataframe">Simple features and creating a geographically aware dataframe</h3>
<p>While we have longitude and latitude fields in our dataframe, <code>site_df</code> is not really a spatially aware object. In order to make it easy to do spatial related queries, computations and visualizations, we need to fortify it with some spatial powers. To do this we’ll use the <strong>sf</strong> package.</p>
<blockquote class="blockquote">
<p>A package that provides <a href="https://en.wikipedia.org/wiki/Simple_Features">simple features access</a> for R.</p>
</blockquote>
<p>With <strong>sf</strong> we can represent simple planar features such as points, lines and polygons (and multi versions of these things) and manipulate them in dataframes or tibbles. In the Python world we used the <a href="https://geopandas.org/en/stable/">GeoPandas</a> package (which relied on the <a href="https://shapely.readthedocs.io/en/stable/">shapely</a> package). Simple features is a general concept supported in many geocomputational tools including things like QGIS and PostGIS.</p>
<p>Before diving into using <strong>sf</strong> with our longitude/latitude fields, let’s create some basic things like points, lines, and polygons on a Cartesian X-Y coordinate plane. The example below is based on the <strong>sf</strong> docs with a bit of commentary and more detailed exploration added.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create a numeric matrix</span></span>
<span id="cb7-2">p <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rbind</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">3.2</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">4.6</span>), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">3.8</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">4.4</span>), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">3.5</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">3.8</span>), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">3.4</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">3.6</span>), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">3.9</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">4.5</span>))</span>
<span id="cb7-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Convert the matrix to a MULTIPOINT object</span></span>
<span id="cb7-4">(mp <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">st_multipoint</span>(p))</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>MULTIPOINT ((3.2 4), (3 4.6), (3.8 4.4), (3.5 3.8), (3.4 3.6), (3.9 4.5))</code></pre>
</div>
<div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">class</span>(mp)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] "XY"         "MULTIPOINT" "sfg"       </code></pre>
</div>
</div>
<p>So, <code>mp</code> is an <code>sfg</code> object. These <code>sfg</code> objects could be stored in an <code>sfc</code> object as a column in a spatially aware dataframe. It also looks like this specific object instance is a MULTIPOINT - of course it is, we used <code>st_multipoint</code>. I’m guessing that the <code>"XY"</code> means that it’s two dimensional with X and Y dimensions (no Z or M dimensions). The <code>st_multipoint</code> object does not have a CRS associated with it. That will happen later when we create an <strong>sf</strong> dataframe. We can use <code>sf::plot</code>, which just extends <code>base::plot</code>, to visualize the points.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot</span>(mp, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">pch =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">16</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">col =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"blue"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">bg =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"blue"</span>,</span>
<span id="cb11-2">     <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axes =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xlab =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"X"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ylab =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Y"</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/hello_world_mapping_r/hello_world_map_r_files/figure-html/unnamed-chunk-4-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>What about lines? They are just ordered collections of POINT objects.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1">s1 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rbind</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>))</span>
<span id="cb12-2">(linestring <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">st_linestring</span>(s1))</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>LINESTRING (0 3, 0 4, 1 5, 2 5)</code></pre>
</div>
<div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot</span>(linestring, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">col =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"blue"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axes =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xlab =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"X"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ylab =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Y"</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/hello_world_mapping_r/hello_world_map_r_files/figure-html/unnamed-chunk-5-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>You can create a MULTILINESTRING from a collection of lines in a list.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb15-1">s2 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rbind</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.2</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.2</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">4.8</span>), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">4.8</span>))</span>
<span id="cb15-2">s3 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rbind</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">4.4</span>), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.6</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>))</span>
<span id="cb15-3">(mls <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">st_multilinestring</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(s1, s2, s3)))</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>MULTILINESTRING ((0 3, 0 4, 1 5, 2 5), (0.2 3, 0.2 4, 1 4.8, 2 4.8), (0 4.4, 0.6 5))</code></pre>
</div>
<div class="sourceCode cell-code" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb17-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot</span>(mls, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">col =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"blue"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axes =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xlab =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"X"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ylab =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Y"</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/hello_world_mapping_r/hello_world_map_r_files/figure-html/unnamed-chunk-6-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>POLYGON objects are just a set of ordered points that start and end with the same point.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb18-1">p1 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rbind</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>))</span>
<span id="cb18-2">pol1 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">st_polygon</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(p1))</span>
<span id="cb18-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot</span>(pol1, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axes =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xlab =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"X"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ylab =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Y"</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/hello_world_mapping_r/hello_world_map_r_files/figure-html/unnamed-chunk-7-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
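<p>Because the closing point has to repeat the opening one, it can be handy to check a coordinate matrix before handing it to <code>st_polygon</code>. Here is a minimal base R sketch of such a check. The helper name <code>is_closed_ring</code> is made up for illustration and is not part of <strong>sf</strong>.</p>

```r
# Coordinates of the ring used above: one row per point, columns are X and Y
p1 <- rbind(c(0,0), c(1,0), c(3,2), c(2,4), c(1,4), c(0,0))

# Hypothetical helper: TRUE if the matrix has at least 4 rows and
# the first and last rows are identical
is_closed_ring <- function(m) {
  nrow(m) >= 4 && all(m[1, ] == m[nrow(m), ])
}

is_closed_ring(p1)

# Dropping the last row leaves the ring open
p1_open <- p1[-nrow(p1), ]
is_closed_ring(p1_open)

# One way to close an open ring is to append its first point
p1_closed <- rbind(p1_open, p1_open[1, ])
is_closed_ring(p1_closed)
```

<p>If you pass an unclosed ring to <code>st_polygon</code>, you should get an error rather than a silently repaired polygon, so a check like this is mostly useful for catching the problem earlier in a pipeline.</p>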
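<p>We can make holes (or islands) by passing more than one ring to <code>st_polygon</code>. The first ring in the list is the outer boundary and each additional ring cuts a hole in it.</p>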
<div class="cell">
<div class="sourceCode cell-code" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb19-1">p2 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rbind</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>))</span>
<span id="cb19-2">pol2 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">st_polygon</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(p1, p2))</span>
<span id="cb19-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot</span>(pol2, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axes =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xlab =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"X"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ylab =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Y"</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/hello_world_mapping_r/hello_world_map_r_files/figure-html/unnamed-chunk-8-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
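<p>One way to convince yourself that the second ring really is a hole is to compare areas. The sketch below uses the shoelace formula in base R. The function <code>ring_signed_area</code> is a name invented here, not an <strong>sf</strong> function. If <strong>sf</strong> is loaded, <code>st_area(pol2)</code> should agree with the final number.</p>

```r
p1 <- rbind(c(0,0), c(1,0), c(3,2), c(2,4), c(1,4), c(0,0))
p2 <- rbind(c(1,1), c(1,2), c(2,2), c(1,1))

# Hypothetical helper: shoelace formula for a closed ring.
# A positive result means a counterclockwise ring, negative means clockwise.
ring_signed_area <- function(m) {
  i <- seq_len(nrow(m) - 1)
  sum(m[i, 1] * m[i + 1, 2] - m[i + 1, 1] * m[i, 2]) / 2
}

outer_area <- abs(ring_signed_area(p1))   # 7
hole_area  <- abs(ring_signed_area(p2))   # 0.5

# Area of the polygon once the hole is removed
outer_area - hole_area                    # 6.5
```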
<p>To represent a collection of polygons, use MULTIPOLYGON.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb20" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb20-1">p3 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rbind</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>))</span>
<span id="cb20-2">p4 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rbind</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">3.3</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.3</span>), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">3.8</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.3</span>), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">3.8</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.8</span>), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">3.3</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.8</span>), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">3.3</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.3</span>))[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,]</span>
<span id="cb20-3">p5 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rbind</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>))</span>
<span id="cb20-4">(mpol <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">st_multipolygon</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(p1, p2), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(p3, p4), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(p5))))</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>MULTIPOLYGON (((0 0, 1 0, 3 2, 2 4, 1 4, 0 0), (1 1, 1 2, 2 2, 1 1)), ((3 0, 4 0, 4 1, 3 1, 3 0), (3.3 0.3, 3.3 0.8, 3.8 0.8, 3.8 0.3, 3.3 0.3)), ((3 3, 4 2, 4 3, 3 3)))</code></pre>
</div>
<div class="sourceCode cell-code" id="cb22" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb22-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot</span>(mpol, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axes =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xlab =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"X"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ylab =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Y"</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/hello_world_mapping_r/hello_world_map_r_files/figure-html/unnamed-chunk-9-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
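<p>The <code>[5:1,]</code> at the end of the <code>p4</code> line reverses the order of its points. A common convention for simple features is counterclockwise outer rings and clockwise holes, which is presumably why the hole is flipped. My understanding is that <code>st_multipolygon</code> does not insist on it. The base R sketch below checks the winding direction before and after the reversal. The helper <code>ring_direction</code> is hypothetical and not part of <strong>sf</strong>.</p>

```r
# p4 as typed (before the [5:1,] reversal) and after reversing its rows
p4_typed    <- rbind(c(3.3,0.3), c(3.8,0.3), c(3.8,0.8), c(3.3,0.8), c(3.3,0.3))
p4_reversed <- p4_typed[5:1, ]

# Hypothetical helper: the sign of the shoelace sum gives the winding direction
ring_direction <- function(m) {
  i <- seq_len(nrow(m) - 1)
  s <- sum(m[i, 1] * m[i + 1, 2] - m[i + 1, 1] * m[i, 2])
  if (s > 0) "counterclockwise" else "clockwise"
}

ring_direction(p4_typed)     # "counterclockwise"
ring_direction(p4_reversed)  # "clockwise"
```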
<p>And to bundle geometries of different types into a single object, we can use a GEOMETRYCOLLECTION.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb23" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb23-1">(gc <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">st_geometrycollection</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(mp, mpol, linestring)))</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>GEOMETRYCOLLECTION (MULTIPOINT ((3.2 4), (3 4.6), (3.8 4.4), (3.5 3.8), (3.4 3.6), (3.9 4.5)), MULTIPOLYGON (((0 0, 1 0, 3 2, 2 4, 1 4, 0 0), (1 1, 1 2, 2 2, 1 1)), ((3 0, 4 0, 4 1, 3 1, 3 0), (3.3 0.3, 3.3 0.8, 3.8 0.8, 3.8 0.3, 3.3 0.3)), ((3 3, 4 2, 4 3, 3 3))), LINESTRING (0 3, 0 4, 1 5, 2 5))</code></pre>
</div>
<div class="sourceCode cell-code" id="cb25" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb25-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot</span>(gc, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axes =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xlab =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"X"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ylab =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Y"</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/hello_world_mapping_r/hello_world_map_r_files/figure-html/unnamed-chunk-10-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
</section>
<section id="creating-a-geodataframe" class="level3">
<h3 class="anchored" data-anchor-id="creating-a-geodataframe">Creating a geodataframe</h3>
<p>By creating a column in a dataframe (an <code>sfc</code> object) containing a geometric object (an <code>sfg</code> object) in each row and assigning some sort of CRS, we can get a spatially aware dataframe - a <em>geodataframe</em> you might say (that’s what Python says). The <code>site_df</code> dataframe contains <code>longitude</code> and <code>latitude</code> columns and we can use them with <code>sf::st_as_sf</code> to create a geodataframe. Be careful with the order of longitude and latitude (think X before Y).</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb26" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb26-1">site_sf <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">st_as_sf</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> site_df, </span>
<span id="cb26-2">                    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">coords =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"longitude"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"latitude"</span>),</span>
<span id="cb26-3">                    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">crs =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0"</span>)</span></code></pre></div>
</div>
<p>The <code>"+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0"</code> is what is known as a PROJ4 string and it contains the parameters of a given CRS. The <a href="https://proj.org/usage/quickstart.html">PROJ library</a> is the thing that does transformations between different coordinate reference systems. The <code>+datum=WGS84 +ellps=WGS84</code> part specifies which specific ellipsoid model of the earth to use. There are other, simpler, ways to specify the CRS such as using <a href="https://en.wikipedia.org/wiki/EPSG_Geodetic_Parameter_Dataset">EPSG codes</a>. In this case, <code>crs = 4326</code> gives us the same CRS as the PROJ4 string and is the familiar long/lat system using the WGS84 earth-centered ellipsoid model.</p>
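<p>As a quick illustration of the EPSG shortcut, here is a small sketch that builds a geodataframe from a toy data frame. The data frame <code>toy_df</code> and its two rows are made-up stand-ins for <code>site_df</code>, and the block assumes <strong>sf</strong> is installed.</p>

```r
library(sf)

# Toy stand-in for site_df (illustrative values only)
toy_df <- data.frame(
  site_name = c("BOX01", "BOX02"),
  longitude = c(-83.15468, -83.1547),
  latitude  = c(42.733, 42.73292)
)

# X (longitude) first, then Y (latitude); 4326 is the EPSG code for WGS84 long/lat
toy_sf <- st_as_sf(toy_df, coords = c("longitude", "latitude"), crs = 4326)

st_crs(toy_sf)$epsg      # EPSG code attached to the geometry column
st_coordinates(toy_sf)   # matrix with X and Y columns
```

<p>Swapping the two names in <code>coords</code> would not raise an error here. It would simply put the points in the wrong place, so it is worth a glance at <code>st_coordinates</code> after the conversion.</p>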
<p>Under the hood, <strong>sf</strong> has built-in interfaces to key geospatial libraries such as GDAL (reading and writing geographic data files), PROJ (representing and transforming projected CRS), and GEOS (geometric operations). It seems that <strong>sf</strong> interacts directly with these low-level libraries instead of using R-based wrappers such as <strong>rgdal</strong> and <strong>rgeos</strong>. These wrapper libraries are being deprecated as we speak. Depending on whether you are using Windows, Mac or Linux, the GDAL, PROJ, and GEOS libraries may get installed when you install <strong>sf</strong> or you may need to do some library installation yourself (Linux) - all the details are in the <a href="https://r-spatial.github.io/sf/#installing">installation section of the <strong>sf</strong> docs</a>.</p>
<p>Here’s a look at a few important columns of <code>site_sf</code> for one of the parks. Notice that we don’t have to explicitly select the <code>geometry</code> column as it gets displayed automatically just because it is a geodataframe (it’s an <code>sf</code> object). We can use <code>mutate</code> to show a <a href="https://en.wikipedia.org/wiki/Well-known_text_representation_of_coordinate_reference_systems">WKT (well known text)</a> representation of each point object - in this case we see the longitude and latitude values for each nest box.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb27" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb27-1">site_sf <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb27-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(area <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%in%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'BCNP'</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb27-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(site_name, box_type) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb27-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">coords_wkt =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">st_as_text</span>(geometry)) </span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Simple feature collection with 6 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -83.15487 ymin: 42.73292 xmax: -83.154 ymax: 42.7351
CRS:           +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
# A tibble: 6 × 4
  site_name box_type                geometry coords_wkt                
* &lt;chr&gt;     &lt;fct&gt;                &lt;POINT [°]&gt; &lt;chr&gt;                     
1 BCNP01    Traditional   (-83.15468 42.733) POINT (-83.15468 42.733)  
2 BCNP02    Peterson     (-83.1547 42.73292) POINT (-83.1547 42.73292) 
3 BCNP03    Peterson    (-83.15487 42.73436) POINT (-83.15487 42.73436)
4 BCNP04    Traditional (-83.15484 42.73442) POINT (-83.15484 42.73442)
5 BCNP05    Peterson    (-83.15403 42.73501) POINT (-83.15403 42.73501)
6 BCNP06    Traditional    (-83.154 42.7351) POINT (-83.154 42.7351)   </code></pre>
</div>
</div>
</section>
<section id="plotting-geodataframes---sf-ggplot2-and-tmap" class="level3">
<h3 class="anchored" data-anchor-id="plotting-geodataframes---sf-ggplot2-and-tmap">Plotting geodataframes - sf, ggplot2 and tmap</h3>
<p>What happens if we plot the <code>geometry</code> column?</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb29" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb29-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot</span>(site_sf<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>geometry, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axes =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/hello_world_mapping_r/hello_world_map_r_files/figure-html/unnamed-chunk-13-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>Not the prettiest map, but it’s a start. Since <code>sf::plot</code> is based on the base R plotting system, you might be wondering if we can use <strong>ggplot2</strong> to create this same plot - just in a more aesthetically pleasing ggplot style. Yep, you can use <code>geom_sf</code>. Notice that we don’t need to specify the <code>geometry</code> column. Each <strong>sf</strong> geodataframe has one active default geometry column and that’s the column being used to locate the points on the plot. To plot other geometry columns (<code>sfc</code> objects) just specify them explicitly when using <strong>ggplot2</strong> functions.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb30" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb30-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb30-2">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_sf</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> site_sf, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">shape =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">21</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#3399FF"</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/hello_world_mapping_r/hello_world_map_r_files/figure-html/unnamed-chunk-14-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>What about other mapping packages like <strong>tmap</strong>?</p>
<ul>
<li>similar in spirit and philosophy to <strong>ggplot2</strong> but focused on mapping,</li>
<li>can make both static and interactive maps via mode setting,</li>
<li>can handle both vector and raster objects directly.</li>
</ul>
<p>Let’s figure out how to work with raster files and find a base map for our NestWatch map. Then we’ll use <strong>tmap</strong> (and try out a few other R mapping packages) to create our map.</p>
</section>
<section id="adding-a-raster-base-map" class="level3">
<h3 class="anchored" data-anchor-id="adding-a-raster-base-map">Adding a raster base map</h3>
<p>We return to our first question - where do we get a raster (or vector) file for the base map of the area we are interested in? Ideally, we’d like a free and open licensed raster map showing things like roads, park boundaries and trails in an aesthetically pleasing manner. We aren’t interested in satellite or aerial images, just a simple static background map on which to overlay the nest box points from <code>site_sf</code>.</p>
<p>Turns out there are several possible sources for a basemap for my use case. From GCwR, <a href="https://r.geocompx.org/read-write.html#retrieving-data">8.2 Retrieving open data</a> talks about general sources and <a href="https://r.geocompx.org/read-write.html#geographic-data-packages">8.3 Geographic data packages</a> is about R packages providing access to various open map data sources.</p>
<p>Since we want a map of a pretty limited area, any map retrieval function (in any R package or web service) is going to want a specification of our location of interest. Usually this is done either by specifying a longitude/latitude pair or by a <em>bounding box</em> (left/bottom/right/top). To get the values for this map, I just used OpenStreetMap to find the park I was interested in, <a href="https://www.openstreetmap.org/way/788666584">Bear Creek Nature Park</a>, and downloaded the XML file containing the vector data (nodes making up the polygon for the park borders). However, OSM uses its own special data structures for representing vector data and translating them to <strong>sf</strong> objects is a bit tricky to understand. It’s all explained in <a href="https://cran.r-project.org/web/packages/osmdata/vignettes/osm-sf-translation.html">this vignette for the <code>osmdata</code> package</a>.</p>
<p>Let’s read that file with the <strong>osmdata</strong> package.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb31" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb31-1">bcnp_osmdata_sf <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> osmdata<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">osmdata_sf</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">doc =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'basemaps/bcnp_vector.osm'</span>)</span></code></pre></div>
<div class="cell-output cell-output-error">
<pre><code>Error in eval(expr, envir, enclos): way can not be found</code></pre>
</div>
</div>
<p>This error is related to the way that features are represented in OSM and in the downloaded XML file. Let’s not get bogged down and instead look for alternative ways to get a base map.</p>
<p>When I was downloading the XML file from OpenStreetMap, I grabbed the geographic coordinates for the bounding box of interest.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb33" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb33-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># left</span></span>
<span id="cb33-2">xmin <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">83.1567</span></span>
<span id="cb33-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># bottom</span></span>
<span id="cb33-4">ymin <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">42.7314</span></span>
<span id="cb33-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># right</span></span>
<span id="cb33-6">xmax <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">83.1509</span></span>
<span id="cb33-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># top</span></span>
<span id="cb33-8">ymax <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">42.7368</span></span>
<span id="cb33-9"></span>
<span id="cb33-10">bbox <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(xmin, ymin, xmax, ymax)</span>
<span id="cb33-11"></span>
<span id="cb33-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># interior point</span></span>
<span id="cb33-13">longitude <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">83.1510</span></span>
<span id="cb33-14">latitude <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">42.7325</span></span></code></pre></div>
</div>
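<p>Different packages expect the four bounding box values in different orders, which is an easy thing to trip over. The base R sketch below keeps the values in a named vector and reorders them on demand. The names <code>bbox_lbrt</code>, <code>bbox_xxyy</code> and <code>in_bbox</code> are invented for this example.</p>

```r
# Same corner values as above
xmin <- -83.1567; ymin <- 42.7314
xmax <- -83.1509; ymax <- 42.7368

# left/bottom/right/top order
bbox_lbrt <- c(left = xmin, bottom = ymin, right = xmax, top = ymax)

# xmin/xmax/ymin/ymax order (the order that terra::ext() takes)
bbox_xxyy <- unname(bbox_lbrt[c("left", "right", "bottom", "top")])

# Hypothetical helper: is a long/lat point inside the box?
in_bbox <- function(lon, lat, b) {
  lon >= b[["left"]] && lon <= b[["right"]] &&
    lat >= b[["bottom"]] && lat <= b[["top"]]
}

in_bbox(-83.1510, 42.7325, bbox_lbrt)   # TRUE
in_bbox(42.7325, -83.1510, bbox_lbrt)   # FALSE: longitude and latitude swapped
```

<p>Naming the elements costs nothing and makes it harder to hand a function the values in the wrong order.</p>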
<p>Let’s start with <strong>ggmap</strong>. From the <a href="https://github.com/dkahle/ggmap">ggmap GitHub repo</a>:</p>
<blockquote class="blockquote">
<p><strong>ggmap</strong> is an R package that makes it easy to retrieve raster map tiles from popular online mapping services like <a href="https://developers.google.com/maps/documentation/maps-static?hl=en">Google Maps</a> and <a href="http://maps.stamen.com/">Stamen Maps</a> and plot them using the <strong>ggplot2</strong> framework:</p>
</blockquote>
<p>Since <strong>ggmap</strong> is built on top of <strong>ggplot2</strong>, you can use all of the <code>geom_*</code>’s in <strong>ggplot2</strong>. There is a <code>qmplot</code> function that is analogous to <code>qplot</code> but with a background map automatically included. The map sources that were originally available via <strong>ggmap</strong> include Google, OpenStreetMap (OSM) and Stamen Maps. Unfortunately, due to OSM usage restrictions, it is no longer a valid source for the <code>get_map</code> function. You can learn more about this from one of the <strong>ggmap</strong> GitHub issues:</p>
<ul>
<li><a href="https://github.com/dkahle/ggmap/issues/117">Do not use OpenStreetMap export endpoint #117</a></li>
</ul>
<p>Google Maps needs an API key. I also tried Stamen Maps as the source but got nothing but 404 errors. Let’s try something else. The <a href="https://github.com/riatelab/maptiles">maptiles package</a> looks promising.</p>
<blockquote class="blockquote">
<p>To create maps from tiles, maptiles downloads, composes and displays tiles from a large number of providers (e.g.&nbsp;OpenStreetMap, Stamen, Esri, CARTO, or Thunderforest).</p>
</blockquote>
<p>Let’s get the default with just a bounding box specified. The default provider is Open Street Map and we’ll get the entire tile (or tiles) needed to contain the bounding box.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb34" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb34-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(maptiles)</span>
<span id="cb34-2"></span>
<span id="cb34-3">bbox_maptiles <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ext</span>(xmin, xmax, ymin, ymax)</span>
<span id="cb34-4">bcnp_basemap_osm_default <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> maptiles<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">get_tiles</span>(bbox_maptiles)</span></code></pre></div>
</div>
<p>This should be a <code>SpatRaster</code>.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb35" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb35-1">bcnp_basemap_osm_default</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>class       : SpatRaster 
dimensions  : 430, 584, 3  (nrow, ncol, nlyr)
resolution  : 1.88e-05, 1.88e-05  (x, y)
extent      : -83.16101, -83.15003, 42.73087, 42.73895  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326) 
source(s)   : memory
colors RGB  : 1, 2, 3 
names       : red, green, blue 
min values  :  12,    21,    8 
max values  : 255,   255,  255 </code></pre>
</div>
</div>
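<p>One thing to watch when building the extent for <code>get_tiles()</code> is argument order: the bounding box vector created earlier is ordered <code>(xmin, ymin, xmax, ymax)</code>, while <code>ext()</code> is called with <code>(xmin, xmax, ymin, ymax)</code>. Here is a minimal base R sketch of one way to avoid mixing them up, by keeping the values in a named vector and reordering by name. The coordinate values and the names <code>bbox_named</code> and <code>ext_order</code> are made up for illustration and are not the ones used in this post.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical bounding box values, for illustration only
bbox_named &lt;- c(xmin = -83.161, ymin = 42.731, xmax = -83.150, ymax = 42.739)

# Reorder by name into the (xmin, xmax, ymin, ymax) order passed to ext()
ext_order &lt;- bbox_named[c("xmin", "xmax", "ymin", "ymax")]
ext_order

# Basic sanity checks before requesting any tiles
stopifnot(ext_order[["xmin"]] &lt; ext_order[["xmax"]],
          ext_order[["ymin"]] &lt; ext_order[["ymax"]])</code></pre></div>
</div>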
<p>Plot it.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb37" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb37-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot</span>(bcnp_basemap_osm_default)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/hello_world_mapping_r/hello_world_map_r_files/figure-html/unnamed-chunk-19-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>We can either crop it now, or we could have requested a cropped raster at download time with <code>crop = TRUE</code>.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb38" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb38-1">bcnp_basemap_osm_default <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">crop</span>(bcnp_basemap_osm_default, bbox_maptiles)</span>
<span id="cb38-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot</span>(bcnp_basemap_osm_default)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/hello_world_mapping_r/hello_world_map_r_files/figure-html/unnamed-chunk-20-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
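<p>For reference, here is a rough sketch of the other option, passing <code>crop = TRUE</code> to <code>get_tiles()</code> so that the raster comes back already cropped. The helper name <code>get_cropped_basemap()</code> and the coordinates in the usage comment are hypothetical, and I have wrapped the call in a function so that nothing is downloaded until you call it.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical helper: download tiles already cropped to a lon/lat bounding box
get_cropped_basemap &lt;- function(xmin, xmax, ymin, ymax) {
  # terra::ext() takes (xmin, xmax, ymin, ymax)
  bb &lt;- terra::ext(xmin, xmax, ymin, ymax)
  # Default provider, cropped to the extent instead of whole tiles
  maptiles::get_tiles(bb, crop = TRUE)
}

# Usage (needs an internet connection); the coordinates are placeholders:
# basemap_cropped &lt;- get_cropped_basemap(-83.161, -83.150, 42.731, 42.739)
# terra::plotRGB(basemap_cropped)</code></pre></div>
</div>
<p>Cropping after the fact, as done above, has the advantage that you keep the full tiles around if you later decide to widen the map a bit.</p>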
<p>Now let’s overlay the points. We’ll plot the map using <code>terra::plotRGB</code> as well as <code>terra::plot()</code> (which extends <code>base::plot()</code>).</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb39" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb39-1">bcnp_points <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> site_sf <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb39-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(area <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%in%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'BCNP'</span>))</span>
<span id="cb39-3"></span>
<span id="cb39-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># plot_tiles() is a wrapper for terra::plotRGB()</span></span>
<span id="cb39-5">terra<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plotRGB</span>(bcnp_basemap_osm_default)</span>
<span id="cb39-6"></span>
<span id="cb39-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Notice the add=TRUE so that points added to previous plot</span></span>
<span id="cb39-8">terra<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">st_geometry</span>(bcnp_points), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">add =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>,</span>
<span id="cb39-9">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">pch =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">16</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">col =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#3399FF"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">bg =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#3399FF"</span>)</span>
<span id="cb39-10"></span>
<span id="cb39-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add credit</span></span>
<span id="cb39-12"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mtext</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">text =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">get_credit</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"OpenStreetMap"</span>),</span>
<span id="cb39-13">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">side =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">line =</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">adj =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.6</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cex =</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">9</span>,</span>
<span id="cb39-14">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">font =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/hello_world_mapping_r/hello_world_map_r_files/figure-html/unnamed-chunk-21-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>Each pair of nest boxes consists of one traditional box and one Peterson style box. The point color should correspond to the nest box type. I’m no expert with base plotting, but I’m guessing that the easiest way to do this is to plot separate sets of points for each box type.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb40" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb40-1">bcnp_points_traditional <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> site_sf <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb40-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(area <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%in%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'BCNP'</span>), box_type <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Traditional'</span>)</span>
<span id="cb40-3"></span>
<span id="cb40-4">bcnp_points_peterson <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> site_sf <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb40-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(area <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%in%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'BCNP'</span>), box_type <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Peterson'</span>)</span>
<span id="cb40-6"></span>
<span id="cb40-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># plot_tiles() is a wrapper for terra::plotRGB()</span></span>
<span id="cb40-8">terra<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plotRGB</span>(bcnp_basemap_osm_default)</span>
<span id="cb40-9"></span>
<span id="cb40-10"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Notice the add=TRUE so that points added to previous plot</span></span>
<span id="cb40-11">terra<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">st_geometry</span>(bcnp_points_traditional), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">add =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>,</span>
<span id="cb40-12">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">pch =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">16</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">col =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#3399FF"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">bg =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#3399FF"</span>)</span>
<span id="cb40-13"></span>
<span id="cb40-14">terra<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">st_geometry</span>(bcnp_points_peterson), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">add =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>,</span>
<span id="cb40-15">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">pch =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">16</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">col =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#CC3333"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">bg =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#CC3333"</span>)</span>
<span id="cb40-16"></span>
<span id="cb40-17"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add credit</span></span>
<span id="cb40-18"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mtext</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">text =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">get_credit</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"OpenStreetMap"</span>),</span>
<span id="cb40-19">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">side =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">line =</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">adj =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.6</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cex =</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">9</span>,</span>
<span id="cb40-20">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">font =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/hello_world_mapping_r/hello_world_map_r_files/figure-html/unnamed-chunk-22-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
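<p>A possible alternative to plotting each box type separately is to build one color per point from a named palette and pass that vector as <code>col</code> in a single call. The sketch below only shows the lookup step in base R. The <code>box_type</code> vector is a made-up stand-in for the real column, the names <code>box_pal</code> and <code>point_cols</code> are mine, and the commented call at the end has not been run against the nest box data.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical stand-in for the box_type column of the nest box points
box_type &lt;- c("Traditional", "Peterson", "Traditional", "Peterson")

# Named palette: one color per box type (same colors as in the map above)
box_pal &lt;- c(Traditional = "#3399FF", Peterson = "#CC3333")

# Look up one color per point by name; as.character() matters if
# box_type is a factor, since indexing by a factor uses its integer codes
point_cols &lt;- unname(box_pal[as.character(box_type)])
point_cols

# With the real data, something along these lines might replace the two calls:
# plot(st_geometry(bcnp_points), add = TRUE, pch = 16,
#      col = unname(box_pal[as.character(bcnp_points$box_type)]))</code></pre></div>
</div>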
<p>Instead of adding more elements to this map, let’s explore other options for making maps from this same data.</p>
</section>
</section>
<section id="other-mapping-packages" class="level2">
<h2 class="anchored" data-anchor-id="other-mapping-packages">Other mapping packages</h2>
<section id="ggplot2---yes-it-can-make-maps" class="level3">
<h3 class="anchored" data-anchor-id="ggplot2---yes-it-can-make-maps">ggplot2 - yes, it can make maps</h3>
<p>As we saw above, the very popular <strong>ggplot2</strong> package has some built in mapping capabilities. The <code>geom_sf()</code> function makes it easy to plot simple features such as the nest box points. What about adding the base map?</p>
<p>The <a href="https://dieghernan.github.io/tidyterra/">tidyterra package</a> provides tidyverse functionality for <strong>terra</strong> <code>SpatVector</code> and <code>SpatRaster</code> objects. This includes <code>geom_</code> functions for <strong>ggplot2</strong>.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb41" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb41-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyterra)</span></code></pre></div>
</div>
<div class="cell">
<div class="sourceCode cell-code" id="cb42" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb42-1">plot_ggplot <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb42-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_spatraster_rgb</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> bcnp_basemap_osm_default) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb42-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_spatvector</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> bcnp_points, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> box_type, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">shape =</span> box_type), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb42-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggtitle</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"BCNP Nest Box Locations"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb42-5">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Need both the following with same name to harmonize the two legends</span></span>
<span id="cb42-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_discrete</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Nest Box Type"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb42-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_shape_discrete</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Nest Box Type"</span>)</span>
<span id="cb42-8"></span>
<span id="cb42-9">plot_ggplot</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/hello_world_mapping_r/hello_world_map_r_files/figure-html/unnamed-chunk-24-1.png" class="img-fluid figure-img" width="720"></p>
</figure>
</div>
</div>
</div>
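<p>As an aside on the legend comment in the code above: giving the color and shape scales the same name is what lets <strong>ggplot2</strong> draw one combined legend. Here is a small sketch of the same idea with manual scales, which also lets you pick the colors. It uses a plain data frame of made-up points rather than the nest box data, and the specific colors and shapes are just my choices.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r">library(ggplot2)

# Hypothetical points standing in for the nest box data
boxes &lt;- data.frame(
  lon = c(-83.155, -83.154, -83.153, -83.152),
  lat = c(42.734, 42.735, 42.733, 42.736),
  box_type = c("Traditional", "Peterson", "Traditional", "Peterson")
)

legend_title &lt;- "Nest Box Type"

p &lt;- ggplot(boxes, aes(x = lon, y = lat, color = box_type, shape = box_type)) +
  geom_point(size = 3) +
  # Same name on both scales so the two legends are merged into one
  scale_color_manual(name = legend_title,
                     values = c(Traditional = "#3399FF", Peterson = "#CC3333")) +
  scale_shape_manual(name = legend_title,
                     values = c(Traditional = 16, Peterson = 17))

p</code></pre></div>
</div>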
<p>Well, that was easy. Let’s try <strong>tmap</strong>.</p>
</section>
<section id="tmap---thematic-maps" class="level3">
<h3 class="anchored" data-anchor-id="tmap---thematic-maps">tmap - thematic maps</h3>
<p>The <a href="https://r-tmap.github.io/tmap/"><strong>tmap</strong> package</a> is very <strong>ggplot2</strong>ish. However, it can create both static and interactive maps by setting its <em>mode</em> via <code>tmap_mode("plot")</code> and <code>tmap_mode("view")</code>, respectively.</p>
<p>The main idea is that each distinct geographic item (e.g.&nbsp;a <code>SpatRaster</code> object) is added to the plot with <code>tm_shape()</code> followed by a series of layer functions that specify the specifics of how to display the item.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb43" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb43-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tmap)</span></code></pre></div>
</div>
<div class="cell">
<div class="sourceCode cell-code" id="cb44" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb44-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Set mode for static plot</span></span>
<span id="cb44-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tmap_mode</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"plot"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>tmap mode set to plotting</code></pre>
</div>
<div class="sourceCode cell-code" id="cb46" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb46-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add basemap raster</span></span>
<span id="cb46-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tm_shape</span>(bcnp_basemap_osm_default) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb46-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tm_rgb</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb46-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add nest box points</span></span>
<span id="cb46-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tm_shape</span>(bcnp_points) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb46-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tm_symbols</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">col =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"box_type"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb46-7"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tm_layout</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"BCNP Nest Box Locations"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb46-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tm_compass</span>()</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/hello_world_mapping_r/hello_world_map_r_files/figure-html/unnamed-chunk-26-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>Even though there are no Kestrel boxes among these points, Kestrel still shows up in the legend because it’s a level in the <code>box_type</code> column. Also, if I map symbol shape to <code>box_type</code>, we get the double legend result we saw with <strong>ggplot2</strong>. There are a ton of arguments to <code>tm_layout()</code> for customizing different aspects of the map layout (e.g.&nbsp;titles, legends, margins, frames, …). These are details for another day.</p>
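<p>If the unused legend entry is a nuisance, one thing to try is dropping unused factor levels before mapping. The base R sketch below shows what <code>droplevels()</code> does to a factor. The factor here is a made-up stand-in for <code>box_type</code>, and I haven’t checked how the legend changes when this is applied to the actual nest box data.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical factor with a level (Kestrel) that none of the points use
box_type &lt;- factor(c("Traditional", "Peterson", "Traditional"),
                   levels = c("Kestrel", "Peterson", "Traditional"))
levels(box_type)

# droplevels() keeps only the levels that actually occur in the data
box_type_used &lt;- droplevels(box_type)
levels(box_type_used)</code></pre></div>
</div>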
<p>Overall, it was quite easy to create a basic thematic map with <strong>tmap</strong> and the similarities in philosophy to <strong>ggplot2</strong> should make it attractive to a large number of R users. Additional <strong>tmap</strong> resources include:</p>
<ul>
<li><a href="https://cran.r-project.org/web/packages/tmap/vignettes/tmap-getstarted.html">Getting started tutorial</a> - Hello world for <strong>tmap</strong></li>
<li><a href="https://r.geocompx.org/adv-map.html">Ch 9 of GCwR on Making Maps</a> - <strong>tmap</strong> featured prominently</li>
<li><a href="https://r-tmap.github.io/tmap-book/">Elegant and informative maps with tmap</a> - draft of online book</li>
<li><a href="https://mtennekes.github.io/tmap4/index.html">Sneak peek at v4 of tmap</a> - extendability, more aesthetics support</li>
</ul>
<p>It looks like v4 of <strong>tmap</strong> has some large changes. In addition to the major push for extendability by developers, <strong>tmap</strong> currently uses <a href="https://r-spatial.github.io/stars/"><strong>stars</strong> for raster manipulation</a> and v4 is slated to have better support for <strong>terra</strong> raster objects. You can install the <a href="https://github.com/r-tmap/tmap/">development version</a> if you want to check it out.</p>
</section>
<section id="mapsf---another-thematic-cartography-package" class="level3">
<h3 class="anchored" data-anchor-id="mapsf---another-thematic-cartography-package">mapsf - another thematic cartography package</h3>
<p>The <a href="https://riatelab.github.io/mapsf/">mapsf package</a> is the successor to the <a href="https://github.com/riatelab/cartography"><strong>cartography</strong> package</a>.</p>
<blockquote class="blockquote">
<p>Create and integrate thematic maps in your R workflow. This package helps to design various cartographic representations such as proportional symbols, choropleth or typology maps. It also offers several functions to display layout elements that improve the graphic presentation of maps (e.g.&nbsp;scale bar, north arrow, title, labels). mapsf maps sf objects on base graphics.</p>
</blockquote>
<p>While <strong>tmap</strong> has a <strong>ggplot2</strong>-like syntax (its plots are drawn with <strong>grid</strong> graphics), <strong>mapsf</strong> relies on extensions to <strong>base</strong> plotting.</p>
<p>There is a <a href="https://riatelab.github.io/mapsf/articles/mapsf.html">Getting Started tutorial</a>. The package has three main groups of functions:</p>
<ul>
<li>Symbology - <code>mf_map</code> is the main function which takes as input an <strong>sf</strong> object, the variables to plot and a plot type.</li>
<li>Map layout - legends, title, scale bar, and more.</li>
<li>Utilities - other stuff</li>
</ul>
<p>Between the Getting Started tutorial, the <a href="https://riatelab.github.io/mapsf/reference/index.html">Reference documentation</a>, and a little trial and error, it was pretty easy to put the following map together.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb47" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb47-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(mapsf)</span></code></pre></div>
</div>
<div class="cell">
<div class="sourceCode cell-code" id="cb48" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb48-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mf_theme</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"default"</span>)</span>
<span id="cb48-2"><span class="co" style="color: #5E5E5E;
background-color: null;
# Initialize a map">
font-style: inherit;"># Initialize a map from the bounding box extent created earlier</span></span>
<span id="cb48-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mf_init</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> bbox_maptiles)</span>
<span id="cb48-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add the raster background</span></span>
<span id="cb48-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mf_raster</span>(bcnp_basemap_osm_default, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">add =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span>
<span id="cb48-6"></span>
<span id="cb48-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># plot the points, specifying a vector of symbols and colors for the </span></span>
<span id="cb48-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># levels of the box_type variable.</span></span>
<span id="cb48-9"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mf_symb</span>(bcnp_points, </span>
<span id="cb48-10">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">var =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"box_type"</span>, </span>
<span id="cb48-11">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">pch =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">21</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">22</span>),</span>
<span id="cb48-12">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">pal =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"red1"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"green"</span>),</span>
<span id="cb48-13">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">add =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span>
<span id="cb48-14"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># layout</span></span>
<span id="cb48-15"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mf_layout</span>(</span>
<span id="cb48-16">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"BCNP Nest Box Locations"</span>,</span>
<span id="cb48-17">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">credits =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Base raster from OpenStreetMap"</span></span>
<span id="cb48-18">)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/hello_world_mapping_r/hello_world_map_r_files/figure-html/unnamed-chunk-28-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
</section>
<section id="leaflet-for-r---interactive-mapping-with-js-leaflet" class="level3">
<h3 class="anchored" data-anchor-id="leaflet-for-r---interactive-mapping-with-js-leaflet">leaflet for R - interactive mapping with JS leaflet</h3>
<p>The R <a href="https://rstudio.github.io/leaflet/">leaflet package</a> wraps the JavaScript Leaflet library so that you can create interactive maps from R.</p>
<blockquote class="blockquote">
<p><a href="https://leafletjs.com/">Leaflet</a> is one of the most popular open-source JavaScript libraries for interactive maps. It’s used by websites ranging from The New York Times and The Washington Post to GitHub and Flickr, as well as GIS specialists like OpenStreetMap, Mapbox, and CartoDB.</p>
</blockquote>
<p>What kind of interactivity is possible? This list is from the <strong>leaflet</strong> docs:</p>
<ul>
<li>Interactive panning/zooming</li>
<li>Compose maps using arbitrary combinations of:
<ul>
<li>Map tiles</li>
<li>Markers</li>
<li>Polygons</li>
<li>Lines</li>
<li>Popups</li>
<li>GeoJSON</li>
</ul></li>
<li>Create maps right from the R console or RStudio</li>
<li>Embed maps in knitr/R Markdown documents and Shiny apps</li>
<li>Easily render spatial objects from the sp or sf packages, or data frames with latitude/longitude columns</li>
<li>Use map bounds and mouse events to drive Shiny logic</li>
<li>Display maps in non spherical mercator projections</li>
<li>Augment map features using chosen plugins from leaflet plugins repository</li>
</ul>
<p>Let’s recreate our simple Nest Watch map with <strong>leaflet</strong>. Creating a map with <strong>leaflet</strong> consists of the same basic steps we’ve seen with the other packages.</p>
<ul>
<li>Create a map object based on some extent or bounding box</li>
<li>Add layers corresponding to the things we want to display on the map</li>
<li>Show the map</li>
</ul>
<div class="cell">
<div class="sourceCode cell-code" id="cb49" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb49-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(leaflet)</span></code></pre></div>
</div>
<p>Let’s create a map by grabbing tiles from OpenStreetMap. Since we want to be able to pan and zoom, we need a dynamic source of map tiles instead of the static raster file we’ve used so far.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb50" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb50-1">m1 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">leaflet</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb50-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">addTiles</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span>  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add default OpenStreetMap map tiles</span></span>
<span id="cb50-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">addMarkers</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lng=</span>longitude, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lat=</span>latitude, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">popup=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Bear Creek Nature Park"</span>)</span>
<span id="cb50-4">m1  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Print the map</span></span></code></pre></div>
<div class="cell-output-display">
<div class="leaflet html-widget html-fill-item-overflow-hidden html-fill-item" id="htmlwidget-b7a2ac69d48e301e0ed6" style="width:100%;height:464px;"></div>
<script type="application/json" data-for="htmlwidget-b7a2ac69d48e301e0ed6">{"x":{"options":{"crs":{"crsClass":"L.CRS.EPSG3857","code":null,"proj4def":null,"projectedBounds":null,"options":{}}},"calls":[{"method":"addTiles","args":["https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png",null,null,{"minZoom":0,"maxZoom":18,"tileSize":256,"subdomains":"abc","errorTileUrl":"","tms":false,"noWrap":false,"zoomOffset":0,"zoomReverse":false,"opacity":1,"zIndex":1,"detectRetina":false,"attribution":"&copy; <a href=\"https://openstreetmap.org\">OpenStreetMap<\/a> contributors, <a href=\"https://creativecommons.org/licenses/by-sa/2.0/\">CC-BY-SA<\/a>"}]},{"method":"addMarkers","args":[42.7325,-83.151,null,null,null,{"interactive":true,"draggable":false,"keyboard":true,"title":"","alt":"","zIndexOffset":0,"opacity":1,"riseOnHover":false,"riseOffset":250},"Bear Creek Nature Park",null,null,null,null,{"interactive":false,"permanent":false,"direction":"auto","opacity":1,"offset":[0,0],"textsize":"10px","textOnly":false,"className":"","sticky":true},null]}],"limits":{"lat":[42.7325,42.7325],"lng":[-83.151,-83.151]}},"evals":[],"jsHooks":[]}</script>
</div>
</div>
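<p>The first step listed above mentions an extent or bounding box, but in the map we just made, <strong>leaflet</strong> (as far as I can tell) simply picked the view from the marker we added. Here is a minimal sketch, not part of the original analysis, of setting the initial view explicitly with <code>fitBounds()</code>. The corner coordinates are the approximate bounding box of the nest box points, and the names <code>lng_min</code>, <code>lat_min</code>, <code>lng_max</code>, <code>lat_max</code> and <code>m_extent</code> are just for illustration.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(leaflet)

# Approximate bounding box of the nest boxes (illustrative values)
lng_min &lt;- -83.15487
lat_min &lt;- 42.73292
lng_max &lt;- -83.154
lat_max &lt;- 42.7351

# Create the map, add OpenStreetMap tiles, then fit the view to the box
m_extent &lt;- leaflet() %&gt;%
  addTiles() %&gt;%
  fitBounds(lng1 = lng_min, lat1 = lat_min, lng2 = lng_max, lat2 = lat_max)

m_extent  # Print the map</code></pre></div>
</div>
<p>If you would rather center on a single point and choose a zoom level yourself, <code>setView()</code> is the alternative.</p>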
<p>We can pan and zoom. We can click the marker to see the pop-up message. Pretty cool. Let’s add the nest boxes. A few things to note about the code below:</p>
<ul>
<li>uses the <code>colorFactor()</code> convenience function to map box type to colors. Note how its return value is used in the <code>color =</code> argument to <code>addCircleMarkers</code>.</li>
<li><code>addCircleMarkers</code> adds circular markers whose size remains constant regardless of zoom level.</li>
<li>the <code>values =</code> argument for <code>addLegend</code> fails for me if I use a formula such as <code>~box_type</code>, even though the default <code>data = getMapData(map)</code> looks like it should inherit the <code>data</code> from the call to <code>addCircleMarkers</code>. See <a href="https://github.com/rstudio/leaflet/issues/485">this GitHub issue</a>.</li>
</ul>
<div class="cell">
<div class="sourceCode cell-code" id="cb51" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb51-1">m2 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">leaflet</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb51-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">addTiles</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span>  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add default OpenStreetMap map tiles</span></span>
<span id="cb51-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">addMarkers</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lng=</span>longitude, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lat=</span>latitude, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">popup=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Bear Creek Nature Park"</span>)</span>
<span id="cb51-4"></span>
<span id="cb51-5">pal <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">colorFactor</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"red1"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"green"</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">domain =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Peterson"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Traditional"</span>))</span>
<span id="cb51-6">m2 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">addCircleMarkers</span>(m2, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data=</span>bcnp_points, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pal</span>(box_type), </span>
<span id="cb51-7">                       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">stroke =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fillOpacity =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>Warning: sf layer has inconsistent datum (+proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs).
Need '+proj=longlat +datum=WGS84'</code></pre>
</div>
<div class="sourceCode cell-code" id="cb53" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb53-1">m2 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">addLegend</span>(m2, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"bottomright"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">pal =</span> pal, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> bcnp_points<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>box_type,</span>
<span id="cb53-2">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Nest Box Types"</span>,</span>
<span id="cb53-3">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">opacity =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb53-4"></span>
<span id="cb53-5">m2  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Print the map</span></span></code></pre></div>
<div class="cell-output-display">
<div class="leaflet html-widget html-fill-item-overflow-hidden html-fill-item" id="htmlwidget-d8a1cf1160ae9842f28f" style="width:100%;height:464px;"></div>
<script type="application/json" data-for="htmlwidget-d8a1cf1160ae9842f28f">{"x":{"options":{"crs":{"crsClass":"L.CRS.EPSG3857","code":null,"proj4def":null,"projectedBounds":null,"options":{}}},"calls":[{"method":"addTiles","args":["https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png",null,null,{"minZoom":0,"maxZoom":18,"tileSize":256,"subdomains":"abc","errorTileUrl":"","tms":false,"noWrap":false,"zoomOffset":0,"zoomReverse":false,"opacity":1,"zIndex":1,"detectRetina":false,"attribution":"&copy; <a href=\"https://openstreetmap.org\">OpenStreetMap<\/a> contributors, <a href=\"https://creativecommons.org/licenses/by-sa/2.0/\">CC-BY-SA<\/a>"}]},{"method":"addMarkers","args":[42.7325,-83.151,null,null,null,{"interactive":true,"draggable":false,"keyboard":true,"title":"","alt":"","zIndexOffset":0,"opacity":1,"riseOnHover":false,"riseOffset":250},"Bear Creek Nature Park",null,null,null,null,{"interactive":false,"permanent":false,"direction":"auto","opacity":1,"offset":[0,0],"textsize":"10px","textOnly":false,"className":"","sticky":true},null]},{"method":"addCircleMarkers","args":[[42.733,42.73292,42.73436,42.73442,42.73501,42.7351],[-83.15468,-83.1547,-83.15487,-83.15484,-83.15403,-83.154],10,null,null,{"interactive":true,"className":"","stroke":false,"color":["#00FF00","#FF0000","#FF0000","#00FF00","#FF0000","#00FF00"],"weight":5,"opacity":0.5,"fill":true,"fillColor":["#00FF00","#FF0000","#FF0000","#00FF00","#FF0000","#00FF00"],"fillOpacity":0.5},null,null,null,null,null,{"interactive":false,"permanent":false,"direction":"auto","opacity":1,"offset":[0,0],"textsize":"10px","textOnly":false,"className":"","sticky":true},null]},{"method":"addLegend","args":[{"colors":["#FF0000","#00FF00"],"labels":["Peterson","Traditional"],"na_color":null,"na_label":"NA","opacity":1,"position":"bottomright","type":"factor","title":"Nest Box Types","extra":null,"layerId":null,"className":"info 
legend","group":null}]}],"limits":{"lat":[42.7325,42.7351],"lng":[-83.15487,-83.151]}},"evals":[],"jsHooks":[]}</script>
</div>
</div>
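<p>About that <code>addLegend</code> quirk: my understanding is that <code>getMapData(map)</code> returns whatever data was passed to the initial <code>leaflet()</code> call, not data supplied to an individual layer. If that is right, handing the data to <code>leaflet()</code> itself should let the formula work. The sketch below is one way to try this. To keep it self-contained, it uses a small stand-in data frame named <code>boxes</code> (a hypothetical name) built from the nest box coordinates and box types that show up in the widget output in this post. With the real <code>bcnp_points</code> object, I would expect the same pattern to apply.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(leaflet)

# Stand-in for bcnp_points: a plain data frame with lng/lat columns
boxes &lt;- data.frame(
  lng = c(-83.15468, -83.1547, -83.15487, -83.15484, -83.15403, -83.154),
  lat = c(42.733, 42.73292, 42.73436, 42.73442, 42.73501, 42.7351),
  box_type = c("Traditional", "Peterson", "Peterson",
               "Traditional", "Peterson", "Traditional")
)

# Same palette as above
pal &lt;- colorFactor(c("red1", "green"), domain = c("Peterson", "Traditional"))

# Data goes to leaflet(), so every layer (and the legend) can use formulas
m3 &lt;- leaflet(data = boxes) %&gt;%
  addTiles() %&gt;%
  addCircleMarkers(lng = ~lng, lat = ~lat, color = ~pal(box_type),
                   stroke = FALSE, fillOpacity = 0.5) %&gt;%
  addLegend("bottomright", pal = pal, values = ~box_type,
            title = "Nest Box Types", opacity = 1)

m3  # Print the map</code></pre></div>
</div>
<p>The tradeoff is that every layer now defaults to the same data, so a layer that needs something different has to override it with its own <code>data =</code> argument.</p>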
<p>Obviously, this barely scratches the surface of <strong>leaflet</strong> for interactive maps in R. But, it gives us a good place to stop in this Hello World map post.</p>
</section>
<section id="mapview---create-interactive-maps-quickly" class="level3">
<h3 class="anchored" data-anchor-id="mapview---create-interactive-maps-quickly">mapview - create interactive maps quickly</h3>
<p>The R <a href="https://r-spatial.github.io/mapview/index.html">mapview package</a> is all about making it easy to create interactive map visualizations.</p>
<blockquote class="blockquote">
<p>mapview provides functions to very quickly and conveniently create interactive visualisations of spatial data. It’s main goal is to fill the gap of quick (not presentation grade) interactive plotting to examine and visually investigate both aspects of spatial data, the geometries and their attributes.</p>
</blockquote>
<p>Often a one-liner specifying a vector data source is enough to create an interactive map with (list from the docs):</p>
<ul>
<li>a layer control to switch between 5 different background maps</li>
<li>a scale bar</li>
<li>information on mouse cursor position and zoom level of the current view</li>
<li>a zoom-to-layer button to easily navigate to the displayed layer</li>
<li>popups listing all attribute entries of the respective features (when clicked)</li>
<li>labels of the feature IDs (when hovered over)</li>
<li>zoom control buttons provided by the underlying leaflet map</li>
<li>attribution information of the active map layer in the bottom right corner of the map</li>
</ul>
<p>Let’s give it a whirl.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb54" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb54-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(mapview)</span></code></pre></div>
</div>
<div class="cell">
<div class="sourceCode cell-code" id="cb55" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb55-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mapview</span>(bcnp_points, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">zcol=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"box_type"</span>))</span></code></pre></div>
<div class="cell-output-display">
<div class="leaflet html-widget html-fill-item-overflow-hidden html-fill-item" id="htmlwidget-9462f643ee30f8b81ae6" style="width:100%;height:464px;"></div>
<script type="application/json" data-for="htmlwidget-9462f643ee30f8b81ae6">{"x":{"options":{"minZoom":1,"maxZoom":52,"crs":{"crsClass":"L.CRS.EPSG3857","code":null,"proj4def":null,"projectedBounds":null,"options":{}},"preferCanvas":false,"bounceAtZoomLimits":false,"maxBounds":[[[-90,-370]],[[90,370]]]},"calls":[{"method":"addProviderTiles","args":["CartoDB.Positron","CartoDB.Positron","CartoDB.Positron",{"errorTileUrl":"","noWrap":false,"detectRetina":false,"pane":"tilePane"}]},{"method":"addProviderTiles","args":["CartoDB.DarkMatter","CartoDB.DarkMatter","CartoDB.DarkMatter",{"errorTileUrl":"","noWrap":false,"detectRetina":false,"pane":"tilePane"}]},{"method":"addProviderTiles","args":["OpenStreetMap","OpenStreetMap","OpenStreetMap",{"errorTileUrl":"","noWrap":false,"detectRetina":false,"pane":"tilePane"}]},{"method":"addProviderTiles","args":["Esri.WorldImagery","Esri.WorldImagery","Esri.WorldImagery",{"errorTileUrl":"","noWrap":false,"detectRetina":false,"pane":"tilePane"}]},{"method":"addProviderTiles","args":["OpenTopoMap","OpenTopoMap","OpenTopoMap",{"errorTileUrl":"","noWrap":false,"detectRetina":false,"pane":"tilePane"}]},{"method":"createMapPane","args":["point",440]},{"method":"addCircleMarkers","args":[[42.733,42.73292,42.73436,42.73442,42.73501,42.7351],[-83.15468,-83.1547,-83.15487,-83.15484,-83.15403,-83.154],6,null,"bcnp_points - box_type",{"crs":{"crsClass":"L.CRS.EPSG3857","code":null,"proj4def":null,"projectedBounds":null,"options":{}},"pane":"point","stroke":true,"color":"#333333","weight":1,"opacity":[0.9,0.9,0.9,0.9,0.9,0.9],"fill":true,"fillColor":["#FDE333","#009B95","#009B95","#FDE333","#009B95","#FDE333"],"fillOpacity":[0.6,0.6,0.6,0.6,0.6,0.6]},null,null,["<div class='scrollableContainer'><table class=mapview-popup id='popup'><tr class='coord'><td><\/td><th><b>Feature ID&emsp;<\/b><\/th><td>1&emsp;<\/td><\/tr><tr><td>1<\/td><th>site_name&emsp;<\/th><td>BCNP01&emsp;<\/td><\/tr><tr><td>2<\/td><th>substrate&emsp;<\/th><td>nest box / 
birdhouse&emsp;<\/td><\/tr><tr><td>3<\/td><th>height_above_ground&emsp;<\/th><td>5&emsp;<\/td><\/tr><tr><td>4<\/td><th>height_above_ground_units&emsp;<\/th><td>FALSE&emsp;<\/td><\/tr><tr><td>5<\/td><th>entrance_diameter&emsp;<\/th><td>1.5&emsp;<\/td><\/tr><tr><td>6<\/td><th>entrance_diameter_units&emsp;<\/th><td>in&emsp;<\/td><\/tr><tr><td>7<\/td><th>entrance_orientation&emsp;<\/th><td>se&emsp;<\/td><\/tr><tr><td>8<\/td><th>site_elevation&emsp;<\/th><td>957&emsp;<\/td><\/tr><tr><td>9<\/td><th>site_elevation_units&emsp;<\/th><td>FALSE&emsp;<\/td><\/tr><tr><td>10<\/td><th>habitat_info_1&emsp;<\/th><td>natural grassland and prairie&emsp;<\/td><\/tr><tr><td>11<\/td><th>habitat_info_2&emsp;<\/th><td>woodland/forest&emsp;<\/td><\/tr><tr><td>12<\/td><th>habitat_info_3&emsp;<\/th><td>NA&emsp;<\/td><\/tr><tr><td>13<\/td><th>nest_attempts&emsp;<\/th><td>5&emsp;<\/td><\/tr><tr><td>14<\/td><th>comments&emsp;<\/th><td>Traditional style bluebird nest box&emsp;<\/td><\/tr><tr><td>15<\/td><th>box_type&emsp;<\/th><td>Traditional&emsp;<\/td><\/tr><tr><td>16<\/td><th>area&emsp;<\/th><td>BCNP&emsp;<\/td><\/tr><tr><td>17<\/td><th>geometry&emsp;<\/th><td>sfc_POINT&emsp;<\/td><\/tr><\/table><\/div>","<div class='scrollableContainer'><table class=mapview-popup id='popup'><tr class='coord'><td><\/td><th><b>Feature ID&emsp;<\/b><\/th><td>2&emsp;<\/td><\/tr><tr><td>1<\/td><th>site_name&emsp;<\/th><td>BCNP02&emsp;<\/td><\/tr><tr><td>2<\/td><th>substrate&emsp;<\/th><td>nest box / 
birdhouse&emsp;<\/td><\/tr><tr><td>3<\/td><th>height_above_ground&emsp;<\/th><td>5&emsp;<\/td><\/tr><tr><td>4<\/td><th>height_above_ground_units&emsp;<\/th><td>FALSE&emsp;<\/td><\/tr><tr><td>5<\/td><th>entrance_diameter&emsp;<\/th><td>1.5&emsp;<\/td><\/tr><tr><td>6<\/td><th>entrance_diameter_units&emsp;<\/th><td>in&emsp;<\/td><\/tr><tr><td>7<\/td><th>entrance_orientation&emsp;<\/th><td>se&emsp;<\/td><\/tr><tr><td>8<\/td><th>site_elevation&emsp;<\/th><td>957&emsp;<\/td><\/tr><tr><td>9<\/td><th>site_elevation_units&emsp;<\/th><td>FALSE&emsp;<\/td><\/tr><tr><td>10<\/td><th>habitat_info_1&emsp;<\/th><td>natural grassland and prairie&emsp;<\/td><\/tr><tr><td>11<\/td><th>habitat_info_2&emsp;<\/th><td>woodland/forest&emsp;<\/td><\/tr><tr><td>12<\/td><th>habitat_info_3&emsp;<\/th><td>NA&emsp;<\/td><\/tr><tr><td>13<\/td><th>nest_attempts&emsp;<\/th><td>8&emsp;<\/td><\/tr><tr><td>14<\/td><th>comments&emsp;<\/th><td>Peterson nest box&emsp;<\/td><\/tr><tr><td>15<\/td><th>box_type&emsp;<\/th><td>Peterson&emsp;<\/td><\/tr><tr><td>16<\/td><th>area&emsp;<\/th><td>BCNP&emsp;<\/td><\/tr><tr><td>17<\/td><th>geometry&emsp;<\/th><td>sfc_POINT&emsp;<\/td><\/tr><\/table><\/div>","<div class='scrollableContainer'><table class=mapview-popup id='popup'><tr class='coord'><td><\/td><th><b>Feature ID&emsp;<\/b><\/th><td>3&emsp;<\/td><\/tr><tr><td>1<\/td><th>site_name&emsp;<\/th><td>BCNP03&emsp;<\/td><\/tr><tr><td>2<\/td><th>substrate&emsp;<\/th><td>nest box / 
birdhouse&emsp;<\/td><\/tr><tr><td>3<\/td><th>height_above_ground&emsp;<\/th><td>5&emsp;<\/td><\/tr><tr><td>4<\/td><th>height_above_ground_units&emsp;<\/th><td>FALSE&emsp;<\/td><\/tr><tr><td>5<\/td><th>entrance_diameter&emsp;<\/th><td>1.5&emsp;<\/td><\/tr><tr><td>6<\/td><th>entrance_diameter_units&emsp;<\/th><td>in&emsp;<\/td><\/tr><tr><td>7<\/td><th>entrance_orientation&emsp;<\/th><td>se&emsp;<\/td><\/tr><tr><td>8<\/td><th>site_elevation&emsp;<\/th><td>957&emsp;<\/td><\/tr><tr><td>9<\/td><th>site_elevation_units&emsp;<\/th><td>FALSE&emsp;<\/td><\/tr><tr><td>10<\/td><th>habitat_info_1&emsp;<\/th><td>natural grassland and prairie&emsp;<\/td><\/tr><tr><td>11<\/td><th>habitat_info_2&emsp;<\/th><td>woodland/forest&emsp;<\/td><\/tr><tr><td>12<\/td><th>habitat_info_3&emsp;<\/th><td>NA&emsp;<\/td><\/tr><tr><td>13<\/td><th>nest_attempts&emsp;<\/th><td>7&emsp;<\/td><\/tr><tr><td>14<\/td><th>comments&emsp;<\/th><td>Peterson nest box&emsp;<\/td><\/tr><tr><td>15<\/td><th>box_type&emsp;<\/th><td>Peterson&emsp;<\/td><\/tr><tr><td>16<\/td><th>area&emsp;<\/th><td>BCNP&emsp;<\/td><\/tr><tr><td>17<\/td><th>geometry&emsp;<\/th><td>sfc_POINT&emsp;<\/td><\/tr><\/table><\/div>","<div class='scrollableContainer'><table class=mapview-popup id='popup'><tr class='coord'><td><\/td><th><b>Feature ID&emsp;<\/b><\/th><td>4&emsp;<\/td><\/tr><tr><td>1<\/td><th>site_name&emsp;<\/th><td>BCNP04&emsp;<\/td><\/tr><tr><td>2<\/td><th>substrate&emsp;<\/th><td>nest box / 
birdhouse&emsp;<\/td><\/tr><tr><td>3<\/td><th>height_above_ground&emsp;<\/th><td>5&emsp;<\/td><\/tr><tr><td>4<\/td><th>height_above_ground_units&emsp;<\/th><td>FALSE&emsp;<\/td><\/tr><tr><td>5<\/td><th>entrance_diameter&emsp;<\/th><td>1.5&emsp;<\/td><\/tr><tr><td>6<\/td><th>entrance_diameter_units&emsp;<\/th><td>in&emsp;<\/td><\/tr><tr><td>7<\/td><th>entrance_orientation&emsp;<\/th><td>se&emsp;<\/td><\/tr><tr><td>8<\/td><th>site_elevation&emsp;<\/th><td>958&emsp;<\/td><\/tr><tr><td>9<\/td><th>site_elevation_units&emsp;<\/th><td>FALSE&emsp;<\/td><\/tr><tr><td>10<\/td><th>habitat_info_1&emsp;<\/th><td>natural grassland and prairie&emsp;<\/td><\/tr><tr><td>11<\/td><th>habitat_info_2&emsp;<\/th><td>woodland/forest&emsp;<\/td><\/tr><tr><td>12<\/td><th>habitat_info_3&emsp;<\/th><td>NA&emsp;<\/td><\/tr><tr><td>13<\/td><th>nest_attempts&emsp;<\/th><td>4&emsp;<\/td><\/tr><tr><td>14<\/td><th>comments&emsp;<\/th><td>Traditional style nest box&emsp;<\/td><\/tr><tr><td>15<\/td><th>box_type&emsp;<\/th><td>Traditional&emsp;<\/td><\/tr><tr><td>16<\/td><th>area&emsp;<\/th><td>BCNP&emsp;<\/td><\/tr><tr><td>17<\/td><th>geometry&emsp;<\/th><td>sfc_POINT&emsp;<\/td><\/tr><\/table><\/div>","<div class='scrollableContainer'><table class=mapview-popup id='popup'><tr class='coord'><td><\/td><th><b>Feature ID&emsp;<\/b><\/th><td>5&emsp;<\/td><\/tr><tr><td>1<\/td><th>site_name&emsp;<\/th><td>BCNP05&emsp;<\/td><\/tr><tr><td>2<\/td><th>substrate&emsp;<\/th><td>nest box / 
birdhouse&emsp;<\/td><\/tr><tr><td>3<\/td><th>height_above_ground&emsp;<\/th><td>5&emsp;<\/td><\/tr><tr><td>4<\/td><th>height_above_ground_units&emsp;<\/th><td>FALSE&emsp;<\/td><\/tr><tr><td>5<\/td><th>entrance_diameter&emsp;<\/th><td>1.5&emsp;<\/td><\/tr><tr><td>6<\/td><th>entrance_diameter_units&emsp;<\/th><td>in&emsp;<\/td><\/tr><tr><td>7<\/td><th>entrance_orientation&emsp;<\/th><td>se&emsp;<\/td><\/tr><tr><td>8<\/td><th>site_elevation&emsp;<\/th><td>950&emsp;<\/td><\/tr><tr><td>9<\/td><th>site_elevation_units&emsp;<\/th><td>FALSE&emsp;<\/td><\/tr><tr><td>10<\/td><th>habitat_info_1&emsp;<\/th><td>natural grassland and prairie&emsp;<\/td><\/tr><tr><td>11<\/td><th>habitat_info_2&emsp;<\/th><td>natural grassland and prairie&emsp;<\/td><\/tr><tr><td>12<\/td><th>habitat_info_3&emsp;<\/th><td>NA&emsp;<\/td><\/tr><tr><td>13<\/td><th>nest_attempts&emsp;<\/th><td>8&emsp;<\/td><\/tr><tr><td>14<\/td><th>comments&emsp;<\/th><td>Peterson nest box&emsp;<\/td><\/tr><tr><td>15<\/td><th>box_type&emsp;<\/th><td>Peterson&emsp;<\/td><\/tr><tr><td>16<\/td><th>area&emsp;<\/th><td>BCNP&emsp;<\/td><\/tr><tr><td>17<\/td><th>geometry&emsp;<\/th><td>sfc_POINT&emsp;<\/td><\/tr><\/table><\/div>","<div class='scrollableContainer'><table class=mapview-popup id='popup'><tr class='coord'><td><\/td><th><b>Feature ID&emsp;<\/b><\/th><td>6&emsp;<\/td><\/tr><tr><td>1<\/td><th>site_name&emsp;<\/th><td>BCNP06&emsp;<\/td><\/tr><tr><td>2<\/td><th>substrate&emsp;<\/th><td>nest box / 
birdhouse&emsp;<\/td><\/tr><tr><td>3<\/td><th>height_above_ground&emsp;<\/th><td>5&emsp;<\/td><\/tr><tr><td>4<\/td><th>height_above_ground_units&emsp;<\/th><td>FALSE&emsp;<\/td><\/tr><tr><td>5<\/td><th>entrance_diameter&emsp;<\/th><td>1.5&emsp;<\/td><\/tr><tr><td>6<\/td><th>entrance_diameter_units&emsp;<\/th><td>in&emsp;<\/td><\/tr><tr><td>7<\/td><th>entrance_orientation&emsp;<\/th><td>se&emsp;<\/td><\/tr><tr><td>8<\/td><th>site_elevation&emsp;<\/th><td>948&emsp;<\/td><\/tr><tr><td>9<\/td><th>site_elevation_units&emsp;<\/th><td>FALSE&emsp;<\/td><\/tr><tr><td>10<\/td><th>habitat_info_1&emsp;<\/th><td>natural grassland and prairie&emsp;<\/td><\/tr><tr><td>11<\/td><th>habitat_info_2&emsp;<\/th><td>woodland/forest&emsp;<\/td><\/tr><tr><td>12<\/td><th>habitat_info_3&emsp;<\/th><td>NA&emsp;<\/td><\/tr><tr><td>13<\/td><th>nest_attempts&emsp;<\/th><td>4&emsp;<\/td><\/tr><tr><td>14<\/td><th>comments&emsp;<\/th><td>Traditional style nest box&emsp;<\/td><\/tr><tr><td>15<\/td><th>box_type&emsp;<\/th><td>Traditional&emsp;<\/td><\/tr><tr><td>16<\/td><th>area&emsp;<\/th><td>BCNP&emsp;<\/td><\/tr><tr><td>17<\/td><th>geometry&emsp;<\/th><td>sfc_POINT&emsp;<\/td><\/tr><\/table><\/div>"],{"maxWidth":800,"minWidth":50,"autoPan":true,"keepInView":false,"closeButton":true,"closeOnClick":true,"className":""},["Traditional","Peterson","Peterson","Traditional","Peterson","Traditional"],{"interactive":false,"permanent":false,"direction":"auto","opacity":1,"offset":[0,0],"textsize":"10px","textOnly":false,"className":"","sticky":true},null]},{"method":"addScaleBar","args":[{"maxWidth":100,"metric":true,"imperial":true,"updateWhenIdle":true,"position":"bottomleft"}]},{"method":"addHomeButton","args":[-83.15487,42.73292,-83.154,42.7351,true,"bcnp_points - box_type","Zoom to bcnp_points - box_type","<strong> bcnp_points - box_type 
<\/strong>","bottomright"]},{"method":"addLayersControl","args":[["CartoDB.Positron","CartoDB.DarkMatter","OpenStreetMap","Esri.WorldImagery","OpenTopoMap"],"bcnp_points - box_type",{"collapsed":true,"autoZIndex":true,"position":"topleft"}]},{"method":"addLegend","args":[{"colors":["#A6BF6F","#FDE333"],"labels":["Peterson","Traditional"],"na_color":null,"na_label":"NA","opacity":1,"position":"topright","type":"factor","title":"bcnp_points - box_type","extra":null,"layerId":null,"className":"info legend","group":"bcnp_points - box_type"}]}],"limits":{"lat":[42.73292,42.7351],"lng":[-83.15487,-83.154]},"fitBounds":[42.73292,-83.15487,42.7351,-83.154,[]]},"evals":[],"jsHooks":{"render":[{"code":"function(el, x, data) {\n  return (\n      function(el, x, data) {\n      // get the leaflet map\n      var map = this; //HTMLWidgets.find('#' + el.id);\n      // we need a new div element because we have to handle\n      // the mouseover output separately\n      // debugger;\n      function addElement () {\n      // generate new div Element\n      var newDiv = $(document.createElement('div'));\n      // append at end of leaflet htmlwidget container\n      $(el).append(newDiv);\n      //provide ID and style\n      newDiv.addClass('lnlt');\n      newDiv.css({\n      'position': 'relative',\n      'bottomleft':  '0px',\n      'background-color': 'rgba(255, 255, 255, 0.7)',\n      'box-shadow': '0 0 2px #bbb',\n      'background-clip': 'padding-box',\n      'margin': '0',\n      'padding-left': '5px',\n      'color': '#333',\n      'font': '9px/1.5 \"Helvetica Neue\", Arial, Helvetica, sans-serif',\n      'z-index': '700',\n      });\n      return newDiv;\n      }\n\n\n      // check for already existing lnlt class to not duplicate\n      var lnlt = $(el).find('.lnlt');\n\n      if(!lnlt.length) {\n      lnlt = addElement();\n\n      // grab the special div we generated in the beginning\n      // and put the mousmove output there\n\n      map.on('mousemove', function (e) {\n      
if (e.originalEvent.ctrlKey) {\n      if (document.querySelector('.lnlt') === null) lnlt = addElement();\n      lnlt.text(\n                           ' lon: ' + (e.latlng.lng).toFixed(5) +\n                           ' | lat: ' + (e.latlng.lat).toFixed(5) +\n                           ' | zoom: ' + map.getZoom() +\n                           ' | x: ' + L.CRS.EPSG3857.project(e.latlng).x.toFixed(0) +\n                           ' | y: ' + L.CRS.EPSG3857.project(e.latlng).y.toFixed(0) +\n                           ' | epsg: 3857 ' +\n                           ' | proj4: +proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null +no_defs ');\n      } else {\n      if (document.querySelector('.lnlt') === null) lnlt = addElement();\n      lnlt.text(\n                      ' lon: ' + (e.latlng.lng).toFixed(5) +\n                      ' | lat: ' + (e.latlng.lat).toFixed(5) +\n                      ' | zoom: ' + map.getZoom() + ' ');\n      }\n      });\n\n      // remove the lnlt div when mouse leaves map\n      map.on('mouseout', function (e) {\n      var strip = document.querySelector('.lnlt');\n      if( strip !==null) strip.remove();\n      });\n\n      };\n\n      //$(el).keypress(67, function(e) {\n      map.on('preclick', function(e) {\n      if (e.originalEvent.ctrlKey) {\n      if (document.querySelector('.lnlt') === null) lnlt = addElement();\n      lnlt.text(\n                      ' lon: ' + (e.latlng.lng).toFixed(5) +\n                      ' | lat: ' + (e.latlng.lat).toFixed(5) +\n                      ' | zoom: ' + map.getZoom() + ' ');\n      var txt = document.querySelector('.lnlt').textContent;\n      console.log(txt);\n      //txt.innerText.focus();\n      //txt.select();\n      setClipboardText('\"' + txt + '\"');\n      }\n      });\n\n      }\n      ).call(this.getMap(), el, x, data);\n}","data":null},{"code":"function(el, x, data) {\n  return (function(el,x,data){\n           var map = this;\n\n         
  map.on('keypress', function(e) {\n               console.log(e.originalEvent.code);\n               var key = e.originalEvent.code;\n               if (key === 'KeyE') {\n                   var bb = this.getBounds();\n                   var txt = JSON.stringify(bb);\n                   console.log(txt);\n\n                   setClipboardText('\\'' + txt + '\\'');\n               }\n           })\n        }).call(this.getMap(), el, x, data);\n}","data":null}]}}</script>
</div>
</div>
<p>That was impressive! Play around with changing the background layer - if you set it to OpenStreetMap, you’ll see our familiar map. Given that it’s a park, the OpenTopoMap option is a nice one. And, we didn’t even need to go find our own basemaps.</p>
</section>
</section>
<section id="in-summary" class="level2">
<h2 class="anchored" data-anchor-id="in-summary">In summary</h2>
<p>Having not done mapping in R for quite some time, it was exciting to see how this space has evolved. Key libraries are being updated (or replaced) and there are a multitude of mapping and geocomputation packages. There are also several nice online books for learning about doing geocomputation in R. The parallels with the Python geocomputation community make it pretty easy to wrap your brain around key concepts and the relationship between the various R and Python libraries (e.g.&nbsp;<strong>sf</strong> and <strong>GeoPandas</strong>). At this point, I don’t really have a clear preference between R and Python for geocomputation and mapping. I imagine I’ll end up using both quite a bit and plan to keep doing geonewb blog posts that focus on both.</p>


</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{isken2023,
  author = {Isken, Mark},
  title = {Hello World Maps in {R} Using Both Raster and Vector Data},
  date = {2023-04-03},
  url = {https://bitsofanalytics.org//posts/hello_world_mapping_r/hello_world_map_r.html},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-isken2023" class="csl-entry quarto-appendix-citeas">
Isken, Mark. 2023. <span>“Hello World Maps in R Using Both Raster and
Vector Data.”</span> April 3, 2023. <a href="https://bitsofanalytics.org//posts/hello_world_mapping_r/hello_world_map_r.html">https://bitsofanalytics.org//posts/hello_world_mapping_r/hello_world_map_r.html</a>.
</div></div></section></div> ]]></description>
  <category>R</category>
  <category>geonewb</category>
  <category>mapping</category>
  <category>geocomputation</category>
  <category>birding</category>
  <category>nestwatch</category>
  <guid>https://bitsofanalytics.org/posts/hello_world_mapping_r/hello_world_map_r.html</guid>
  <pubDate>Mon, 03 Apr 2023 04:00:00 GMT</pubDate>
</item>
<item>
  <title>Using tidymodels for simulation metamodeling</title>
  <dc:creator>Mark Isken</dc:creator>
  <link>https://bitsofanalytics.org/posts/tidymodels-mm/tidymodels_mm.html</link>
  <description><![CDATA[ 



<div class="cell">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(ggplot2)</span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(dplyr)</span>
<span id="cb1-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidymodels)</span>
<span id="cb1-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(MLmetrics)</span>
<span id="cb1-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(stringr)</span></code></pre></div>
</div>
<section id="background" class="level2">
<h2 class="anchored" data-anchor-id="background">Background</h2>
<p>A few years ago I wrote a multipart blog series on using R to do simulation metamodeling. The blog posts were inspired by a research project in which I was using queueing inspired feature engineering to try to improve metamodel accuracy and interpretability. The setting for the study was a simulation of inpatient flow through hospital obstetrical units. That study also was an opportunity to learn about the <a href="https://topepo.github.io/caret/index.html">caret</a> package and do some more advanced R programming to automate model fitting and assessment with caret.</p>
<ul>
<li><a href="https://bitsofanalytics.org/posts/obsim-caret-1/obsim_caret_part1.html">Comparing predictive models for obstetrical unit occupancy using caret - Part 1</a></li>
<li><a href="https://bitsofanalytics.org/posts/obsim-caret-2/obsim_caret_part2.html">Comparing predictive model performance using caret - Part 2: A simple caret automation function</a></li>
<li><a href="https://bitsofanalytics.org/posts/obsim-caret-3/obsim_caret_part1.htm3">Comparing predictive model performance using caret - Part 3: Put it all together</a></li>
</ul>
<p>Eventually, I redid the entire study in Python and, along with my co-authors Osman Aydas and Yazan Roumani, turned it into a paper that was just accepted by the <a href="https://www.tandfonline.com/journals/tjsm20">Journal of Simulation</a> (I’m not sure when it will appear, but I’ll update the link when it does). For the new study, I rewrote the discrete event simulation model using <a href="https://simpy.readthedocs.io/en/latest/">SimPy</a>, and we have released both the model and all of the metamodeling machinery as an open source project so that others can see exactly what we did and verify that everything is reproducible. Here are links to that repo as well as some blog posts I did on using SimPy for patient flow simulation modeling.</p>
<ul>
<li><a href="https://github.com/misken/obflowsim-mm">obflowsim-mm</a></li>
<li><a href="https://bitsofanalytics.org/posts/simpy-getting-started-patflow-model/simpy-getting-started.html">Getting started with SimPy for patient flow modeling</a></li>
<li><a href="https://bitsofanalytics.org/posts/simpy-oo-patflow-model/simpy-oo-patflow-model.html">An object oriented SimPy patient flow simulation model</a></li>
</ul>
<p>In addition to this inpatient flow model, we also created a SimPy model of an outpatient clinic and did another set of metamodeling experiments using it. For the clinic model, I decided to take the opportunity to become more familiar with the R <a href="https://www.tidymodels.org/">tidymodels</a> package. I thought it would be a good idea to document some of the challenges and benefits of using tidymodels.</p>
</section>
<section id="background-on-the-simulation-study" class="level2">
<h2 class="anchored" data-anchor-id="background-on-the-simulation-study">Background on the simulation study</h2>
<p>Long ago, when I worked for a large healthcare system, I was involved in a project in which we were building decision support tools for clinic management. Our goal was to have a relatively simple, spreadsheet based tool that could be used to assess the performance implications of various combinations of key clinic demand and resource related variables. For example, we wanted to be able to see how increasing the number of patients scheduled in a four hour clinic block or the number of exam rooms per physician would impact things like patient wait times, resource utilization, and the length of time needed to care for all of the patients (since running the clinic past the scheduled closing time resulted in undesirable staffing related consequences and costs).</p>
<p>We developed a discrete event simulation model and ran a series of experiments in which we systematically varied key inputs and tracked key performance measures. Here’s a flow chart of the clinic process that the simulation model represents:</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://bitsofanalytics.org/posts/tidymodels-mm/images/clinic_flow.png" class="img-fluid figure-img"></p>
<p></p><figcaption class="figure-caption">Clinic flow</figcaption><p></p>
</figure>
</div>
<p>The experimental design included the following input variables and their associated levels:</p>
<ul>
<li>number of medical technicians (2, 3, 4, 5)</li>
<li>number of exam rooms per physician (1, 2, 3)</li>
<li>mean number of minutes to complete the vital signs portion of the exam by the support staff (6, 9, 12 minutes)</li>
<li>mean number of minutes to complete the exam by the physician (10, 15, 20 mins)</li>
<li>the coefficient of variation squared of the exam time (1.0, 0.5, 0.2)</li>
<li>mean number of minutes to complete post-exam portion of the visit by the support staff (2, 5, 8 minutes)</li>
<li>number of patients in the 4 hour clinic session - patients arrive in groups of two at fixed intervals based on number of total patients. The 5 arrival levels depend on the exam time mean:
<ul>
<li>10 min: 32, 36, 40, 44, 48 patients</li>
<li>15 min: 24, 28, 32, 36, 40 patients</li>
<li>20 min: 16, 20, 24, 28, 32 patients</li>
</ul></li>
</ul>
<p>The design above leads to 4860 scenarios that were simulated.</p>
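<p>As a quick check on that scenario count, here is a minimal sketch that rebuilds the full factorial design from the levels listed above using base R’s <code>expand.grid()</code>. The column names and the <code>arrival_level</code> index are just illustrative choices, and the row ordering is not meant to match the actual simulation output.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Factor levels taken from the design list above
design &lt;- expand.grid(
  num_med_techs = c(2, 3, 4, 5),
  num_rooms_per_provider = c(1, 2, 3),
  vitals_time_mean = c(6, 9, 12),
  exam_time_mean = c(10, 15, 20),
  exam_time_cv2 = c(1.0, 0.5, 0.2),
  post_exam_time_mean = c(2, 5, 8),
  arrival_level = 1:5  # hypothetical index for the 5 arrival levels
)

# Lowest patient volume for each exam time mean; each level adds 4 patients
lowest_volume &lt;- c("10" = 32, "15" = 24, "20" = 16)
design$patients_per_clinic_block &lt;- unname(
  lowest_volume[as.character(design$exam_time_mean)]
) + 4 * (design$arrival_level - 1)

nrow(design)                             # 4 * 3^5 * 5 = 4860
range(design$patients_per_clinic_block)  # 16 48</code></pre></div>
</div>
<p>Because the patient volume levels are nested within the exam time mean, the arrival factor still contributes exactly 5 levels to the product, which is why the count works out to 4860.</p>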
<p>The key performance measures (output variables) were:</p>
<ul>
<li>initial wait time for patient to see med tech for vital signs</li>
<li>mean time until patient saw physician to start exam</li>
<li>mean total time in clinic for the patient</li>
<li>end of the clinic day time</li>
</ul>
<p>We then developed an Excel based tool that simply did “table lookups” to create plots that showed how the performance measures vary across the levels of the input variables. That original study was then followed by a research project into simulation metamodels for this same clinic experiment. That led to a <a href="https://www.researchgate.net/publication/362805793_An_outpatient_clinic_simulation_metamodeling_odyssey">presentation at the INFORMS 2006 Annual Conference</a>.</p>
<p>Now I am revisiting this same scenario for the current paper. This time, our goal is to see how queueing inspired feature engineering can improve various simulation metamodeling approaches such as polynomial regression, cubic splines, and random forests. In addition, a nonlinear queueing based power model was also developed - it has far fewer parameters than the polynomial or spline models and is much more transparent than any of the other modeling approaches.</p>
<p>I entirely rewrote the simulation model in Python (using SimPy) as well as all of the data input machinery for the metamodeling. Then, since I’ve been wanting to explore the tidymodels R package, I used R for the model fitting and output analysis. The simulation model can be found at my <a href="https://github.com/misken/op_clinic">op_clinic repo</a>.</p>
<p>In this blog post, I’m just going to focus on using the tidymodels package for model fitting and evaluation. We’ll consider a pretty straightforward case - the polynomial regression model, as well as a more complicated case - the nonlinear power model. The complicating factor in using tidymodels for the nonlinear model is that the <code>nls()</code> function is not yet supported as a built in <em>model engine</em> in tidymodels for general nonlinear regression modeling.</p>
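<p>For readers who haven’t used <code>nls()</code> directly, here is a minimal sketch of fitting a multiplicative power model with base R, outside of tidymodels. The data are synthetic, and the predictors <code>x1</code> and <code>x2</code>, the true coefficients, and the starting values are all made up for illustration - this is not the model from the paper.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">set.seed(42)

# Synthetic data from y = b0 * x1^b1 * x2^b2 plus a little additive noise
n &lt;- 200
toy &lt;- data.frame(x1 = runif(n, 0.5, 1.5), x2 = runif(n, 1, 3))
toy$y &lt;- 2 * toy$x1^1.5 * toy$x2^0.5 + rnorm(n, sd = 0.05)

# nls() needs an explicit formula in the parameters plus starting values
power_fit &lt;- nls(
  y ~ b0 * x1^b1 * x2^b2,
  data = toy,
  start = list(b0 = 1, b1 = 1, b2 = 1)
)

# Estimates should land close to the generating values (2, 1.5, 0.5)
coef(power_fit)</code></pre></div>
</div>
<p>Note that, unlike <code>lm()</code>, the formula names the parameters explicitly and requires starting values. That difference in interface is part of what makes a general nonlinear model awkward to slot into a standard model specification.</p>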
</section>
<section id="about-the-queueing-inspired-terms" class="level2">
<h2 class="anchored" data-anchor-id="about-the-queueing-inspired-terms">About the queueing inspired terms</h2>
<p>In the OB patient flow metamodels featured in my previous blog posts, the queueing inspired terms were things related to steady state analysis of the tandem queueing system and relied on things like overall resource load and utilization and steady state queueing results for <img src="https://latex.codecogs.com/png.latex?M/G/s"> queueing systems. In contrast, this outpatient modeling problem involves transient analysis (a 4hr clinic block) of a system in which a finite number of scheduled patients arrive for care. Thus, steady state results are <strong>not</strong> relevant and we can (and do) intentionally overload the system since there are a finite number of patients who will eventually go through the clinic. Obviously, the more you overload the system, the longer the patient wait times and the longer the system has to operate beyond the 4 hour planned end of the day to clear all of the patients.</p>
<p>So, the queueing inspired features for this problem use known results for mean wait time in transient <img src="https://latex.codecogs.com/png.latex?D/M/1"> queues (deterministic arrivals, exponentially distributed service times, and a single server). In addition, staff, room, and physician offered utilization terms were included. Unlike in the steady state based scenario of the OB patient flow model, these terms could be greater than 1 (again, an overloaded system). All of the feature engineering was done as part of the Python simulation output analysis. The result is a csv file, <code>xy_q.csv</code>, that contains all of the base inputs, queueing related features, and output measures. Our paper (link forthcoming when it’s eventually published, we hope) has all the details.</p>
<p>Let’s dive into tidymodels.</p>
</section>
<section id="overview-of-the-analysis-with-tidymodels" class="level2">
<h2 class="anchored" data-anchor-id="overview-of-the-analysis-with-tidymodels">Overview of the analysis with tidymodels</h2>
<p>Let’s just focus on one of the performance measures - the initial wait time until staff begin the clinic care process for a patient.</p>
<p>For this blog post we’ll fit three totally different types of metamodels:</p>
<ul>
<li>polynomial regression</li>
<li>nonlinear multiplicative power model based on queueing inspired features</li>
<li>random forest</li>
</ul>
<p>For the polynomial models, I was able to use the tidymodels package to create workflows that made it easy to fit a model using repeated k-fold cross-validation and then compute RMSE and an actual vs predicted plot for each model. The <a href="https://www.tmwr.org/">Tidy Modeling with R</a> online book by Kuhn and Silge provides a very good introduction to the tidymodels package and how its constituent packages can be used for different parts of the modeling process.</p>
<p>For those who have used the <a href="https://scikit-learn.org/stable/index.html">scikit-learn</a> package in Python, tidymodels will feel familiar except that different parts of the modeling process are handled by different packages under the tidymodels meta-package:</p>
<ul>
<li>“empty” model objects are created (<a href="https://parsnip.tidymodels.org/">parsnip</a>)</li>
<li>model formulas and various preprocessing steps are specified by creating a <em>recipe</em> (<a href="https://recipes.tidymodels.org/">recipes</a>)</li>
<li>resampling schemes such as k-fold cross-validation or bootstrapping (<a href="https://rsample.tidymodels.org/">rsample</a>)</li>
<li>hyperparameter tuning (<a href="https://tune.tidymodels.org/">tune</a> and <a href="https://dials.tidymodels.org/">dials</a>)</li>
<li>model fitting (<a href="https://parsnip.tidymodels.org/">parsnip</a> and <a href="https://tune.tidymodels.org/">tune</a>)</li>
<li>model predictions (<a href="https://parsnip.tidymodels.org/">parsnip</a>)</li>
<li>model assessment (<a href="https://yardstick.tidymodels.org/">yardstick</a> and <a href="https://tune.tidymodels.org/">tune</a>)</li>
<li>put it all together into a modeling workflow, similar to a pipeline in scikit-learn (<a href="https://workflows.tidymodels.org/">workflows</a>)</li>
</ul>
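<p>To make the division of labor above a bit more concrete, here is a minimal sketch that touches most of those packages in a few lines. It uses a made up toy dataset and a quadratic polynomial, so the variable names, degree, and fold count are assumptions for illustration and not the models fit later in this post.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(tidymodels)

set.seed(1)
# Toy data: a noisy quadratic in a single predictor
toy &lt;- data.frame(x = runif(100))
toy$y &lt;- 1 + 2 * toy$x + 3 * toy$x^2 + rnorm(100, sd = 0.1)

# parsnip: an "empty" linear regression model using the lm engine
lm_spec &lt;- linear_reg() %&gt;% set_engine("lm")

# recipes: model formula plus a polynomial expansion preprocessing step
poly_rec &lt;- recipe(y ~ x, data = toy) %&gt;%
  step_poly(x, degree = 2)

# workflows: bundle the recipe and the model specification
poly_wf &lt;- workflow() %&gt;%
  add_recipe(poly_rec) %&gt;%
  add_model(lm_spec)

# rsample: 5-fold cross-validation splits
toy_folds &lt;- vfold_cv(toy, v = 5)

# tune + yardstick: fit on each resample and compute RMSE
poly_res &lt;- fit_resamples(poly_wf, resamples = toy_folds,
                          metrics = metric_set(rmse))
collect_metrics(poly_res)</code></pre></div>
</div>
<p>The workflow object plays a role similar to a scikit-learn pipeline: the preprocessing is re-estimated on the analysis portion of each resample, which avoids leaking information from the assessment portion.</p>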
<p>While the presence of so many different packages can be daunting at first, by loading tidymodels you get access to all of the packages. The details of which package does what became very important for this project when we needed to do some deep digging into the API and even the source code to figure out where things were going awry. More on this later.</p>
</section>
<section id="reading-in-of-the-full-input-output-dataset" class="level2">
<h2 class="anchored" data-anchor-id="reading-in-of-the-full-input-output-dataset">Reading in of the full input-output dataset</h2>
<p>First, read in the dataset that contains all possible predictors as well as all possible target variables. Note that some of the predictors are the queueing inspired terms and others are the base inputs used in the simulation experimental design.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1">xy_q <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read.csv</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'data/xy_q.csv'</span>)</span></code></pre></div>
</div>
<div class="cell">
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str</span>(xy_q)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>'data.frame':   4860 obs. of  32 variables:
 $ patients_per_clinic_block: int  32 36 40 44 48 24 28 32 36 40 ...
 $ num_med_techs            : int  2 2 2 2 2 2 2 2 2 2 ...
 $ num_rooms                : int  2 2 2 2 2 2 2 2 2 2 ...
 $ num_rooms_per_provider   : int  1 1 1 1 1 1 1 1 1 1 ...
 $ num_physicians           : int  2 2 2 2 2 2 2 2 2 2 ...
 $ vitals_time_mean         : int  6 6 6 6 6 6 6 6 6 6 ...
 $ vitals_time_cv2          : num  0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 ...
 $ exam_time_mean           : int  10 10 10 10 10 15 15 15 15 15 ...
 $ exam_time_cv2            : num  1 1 1 1 1 1 1 1 1 1 ...
 $ post_exam_time_mean      : int  2 2 2 2 2 2 2 2 2 2 ...
 $ post_exam_time_cv2       : num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
 $ room_turnover_time_mean  : num  2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5 ...
 $ room_turnover_time_cv2   : num  0.143 0.143 0.143 0.143 0.143 ...
 $ off_util_staff           : num  0.7 0.787 0.875 0.963 1.05 ...
 $ off_util_physician       : num  0.667 0.75 0.833 0.917 1 ...
 $ off_util_room            : num  0.967 1.087 1.208 1.329 1.45 ...
 $ staff_eff_ia_time_mean   : num  15 13.3 12 10.9 10 ...
 $ staff_eff_svc_time_mean  : num  10.5 10.5 10.5 10.5 10.5 10.5 10.5 10.5 10.5 10.5 ...
 $ staff_eff_svc_time_cv2   : num  0.419 0.419 0.419 0.419 0.419 ...
 $ exam_eff_ia_time_mean    : num  15 13.3 12 10.9 10 ...
 $ exam_eff_svc_time_mean   : num  27.1 32.8 40.1 49.4 60.6 ...
 $ exam_eff_svc_time_cv2    : num  1.19 1.06 1.06 1.09 1.11 ...
 $ mean_wait_i_dm1          : num  6.88 9.99 14.02 19.02 25.03 ...
 $ mean_wait_p_dm1          : num  5.68 8.27 11.63 15.88 21.05 ...
 $ mean_wait_r_dm1          : num  17.6 44.4 92.4 162.6 256 ...
 $ mean_wait_rp_dm1         : num  23.3 52.7 104 178.5 277 ...
 $ mean_wait_i              : num  0.359 0.44 0.544 0.622 0.959 ...
 $ mean_wait_r              : num  7.85 12.99 28.72 35.2 52.19 ...
 $ mean_wait_p              : num  0 0 0 0 0 0 0 0 0 0 ...
 $ mean_wait_rp             : num  7.85 12.99 28.72 35.2 52.19 ...
 $ mean_time_in_system      : num  27.3 32.3 48.8 54.5 72.6 ...
 $ mean_eod                 : num  267 274 306 317 356 ...</code></pre>
</div>
</div>
<p>The output (dependent) variables, or performance measures, are the last 6 in the list above. We won’t be using all of the input variables in the study for the paper as not all are relevant for the <code>mean_wait_i</code> (“i” for initial) variable we’ll be focusing on.</p>
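<p>As a rough sanity check on the offered utilization columns in the listing above, the sketch below reproduces the first row’s values with simple arithmetic. The expressions are inferred from the listed values rather than taken from the paper, so treat the helper function <code>offered_util()</code> and the assignment of tasks to resources as assumptions.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hypothetical helper: total work demanded divided by the capacity
# available in the 240 minute (4 hour) clinic block
offered_util &lt;- function(n_patients, minutes_per_patient, n_servers,
                         block_minutes = 240) {
  (n_patients * minutes_per_patient) / (n_servers * block_minutes)
}

# Inputs from the first row of xy_q shown above
n_patients &lt;- 32
vitals &lt;- 6
exam &lt;- 10
post_exam &lt;- 2
turnover &lt;- 2.5

# Staff: vitals + post-exam + room turnover, 2 med techs (assumed task split)
offered_util(n_patients, vitals + post_exam + turnover, n_servers = 2)  # 0.7
# Physicians: exam time only, 2 physicians (assumed)
offered_util(n_patients, exam, n_servers = 2)                           # about 0.667
# Rooms: exam + post-exam + turnover, 2 rooms (assumed)
offered_util(n_patients, exam + post_exam + turnover, n_servers = 2)    # about 0.967</code></pre></div>
</div>
<p>With 48 patients instead of 32, the same arithmetic gives 1.05 for staff and 1.45 for rooms, which matches the fifth row of the listing and illustrates how these terms exceed 1 when the clinic is intentionally overloaded.</p>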
</section>
<section id="details-on-data-partitioning" class="level2">
<h2 class="anchored" data-anchor-id="details-on-data-partitioning">Details on data partitioning</h2>
<section id="create-in-design-and-out-of-design-dataframes" class="level3">
<h3 class="anchored" data-anchor-id="create-in-design-and-out-of-design-dataframes">Create in design and out of design dataframes</h3>
<p>As described above, there are a total of 4860 scenarios and within those there are 5 different levels of number of patients per clinic session. In addition to assessing how accurate the various models are within the confines of the experimental design, I thought it would also be interesting to assess their relative performance in extrapolation beyond the experimental design. So, I decided to treat the highest arrival rate level as the “out of design” holdout set. These scenarios correspond to every 5th row in the <code>xy_q</code> dataframe.</p>
<p>Create sequences of row numbers to include in the out of design set and the in design set.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1">ood_rows <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">seq</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4860</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>)</span>
<span id="cb5-2">design_rows <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">setdiff</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">seq</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4860</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>), ood_rows)</span></code></pre></div>
</div>
<p>Now subset the main dataframe to create the “in design” dataframe and “out of design” dataframe.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1">xy_q_in <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> xy_q[design_rows, ]</span>
<span id="cb6-2">xy_q_out <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> xy_q[ood_rows, ]</span></code></pre></div>
</div>
<p>There are 3888 rows in the in design dataframe and 972 rows in the out of design dataframe.</p>
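<p>Since the “every 5th row” rule depends on the row ordering of the file, a data driven cross-check can be reassuring. Here is a minimal sketch on a toy dataframe that mirrors the first ten rows of the <code>str()</code> output above. It flags the rows with the highest patient volume within each exam time mean, which is an assumed definition of the highest arrival level based on the design description.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Toy stand-in for the first ten rows of xy_q
toy &lt;- data.frame(
  patients_per_clinic_block = c(32, 36, 40, 44, 48, 24, 28, 32, 36, 40),
  exam_time_mean = rep(c(10, 15), each = 5)
)

# Highest patient volume within each exam time mean, repeated per row
max_by_exam &lt;- ave(toy$patients_per_clinic_block, toy$exam_time_mean,
                   FUN = max)

# Rows at the highest arrival level
ood_rows_check &lt;- which(toy$patients_per_clinic_block == max_by_exam)
ood_rows_check                          # 5 10

# Agrees with the positional rule used above
all(ood_rows_check == seq(5, 10, 5))    # TRUE</code></pre></div>
</div>
<p>The same comparison could be run on the full dataframe as a guard against the file ever being re-sorted.</p>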
<p>To begin with, we’ll just be focusing on the in design dataframe, <code>xy_q_in</code>.</p>
</section>
<section id="initial-split-of-in_split" class="level3">
<h3 class="anchored" data-anchor-id="initial-split-of-in_split">Initial split of in_split</h3>
<p>The partitioning that we did so far of <code>xy_q</code> into <code>xy_q_in</code> and <code>xy_q_out</code> is for eventually seeing how well the various models can extrapolate beyond the experimental design space. Now, let’s do an initial split of <code>xy_q_in</code> into training and test dataframes. Then we’ll use k-fold cross-validation on the training data and eventually do final model comparisons on the test data. The <code>initial_split</code> function is from the rsample package, which I describe more in the next section. I’m using the <code>rsample::</code> form in the function call just to highlight the fact that these functions are from the <code>rsample</code> package - we’ve loaded <code>tidymodels</code> so there’s really no need to do this.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">23</span>)</span>
<span id="cb7-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Do the 75/25 split</span></span>
<span id="cb7-3">xy_q_in_split <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> rsample<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">initial_split</span>(xy_q_in, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">prop =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.75</span>)</span>
<span id="cb7-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create train and test dataframes based on the split</span></span>
<span id="cb7-5">xy_q_in_train <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> rsample<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">training</span>(xy_q_in_split)</span>
<span id="cb7-6">xy_q_in_test <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> rsample<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">testing</span>(xy_q_in_split)</span></code></pre></div>
</div>
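<p>To see what sizes a 75/25 split produces, here is a small sketch using a placeholder dataframe with the same number of rows as <code>xy_q_in</code>. The <code>placeholder</code> dataframe and its <code>row_id</code> column are stand-ins for illustration only.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(rsample)

# Placeholder with the same number of rows as the in design dataframe
placeholder &lt;- data.frame(row_id = seq_len(3888))

set.seed(23)
placeholder_split &lt;- initial_split(placeholder, prop = 0.75)

train_ids &lt;- training(placeholder_split)$row_id
test_ids &lt;- testing(placeholder_split)$row_id

length(train_ids)                      # 2916
length(test_ids)                       # 972
length(intersect(train_ids, test_ids)) # 0, the two sets do not overlap</code></pre></div>
</div>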
</section>
<section id="k-fold-cross-validation-on-in-design-scenarios-using-the-rsample-package" class="level3">
<h3 class="anchored" data-anchor-id="k-fold-cross-validation-on-in-design-scenarios-using-the-rsample-package">k-fold cross-validation on in design scenarios using the rsample package</h3>
<p>One of the packages that falls under the tidymodels umbrella is called rsample. As its name suggests, it supports various resampling schemes such as bootstrapping and cross-validation.</p>
<p>In a nutshell, the <code>rsample::vfold_cv</code> function creates a dataframe of <a href="https://cran.r-project.org/web/packages/rsample/vignettes/rsample.html#individual-resamples-are-rsplit-objects">rsplit objects</a> that contains the dataset partition information for each resampling step (fold and repeat). The user can then fit models and compute errors on each partition and do whatever kind of error metric averaging desired.</p>
<p>I decided to do 10 repeats of 5-fold cross validation.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">57</span>)</span>
<span id="cb8-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Number of folds</span></span>
<span id="cb8-3">kfold_number <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span></span>
<span id="cb8-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Number of repeats of entire k-fold process</span></span>
<span id="cb8-5">kfold_repeats <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span></span>
<span id="cb8-6"></span>
<span id="cb8-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create the split object</span></span>
<span id="cb8-8">in_train_splits <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vfold_cv</span>(xy_q_in_train, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">v =</span> kfold_number, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">repeats =</span> kfold_repeats)</span></code></pre></div>
</div>
<p>We can see that <code>in_train_splits</code> is a tibble of <code>vfold_split</code> objects (which are just a special case of <code>rsplit</code> objects).</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1">in_train_splits</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>#  5-fold cross-validation repeated 10 times 
# A tibble: 50 × 3
   splits             id       id2  
   &lt;list&gt;             &lt;chr&gt;    &lt;chr&gt;
 1 &lt;split [2332/584]&gt; Repeat01 Fold1
 2 &lt;split [2333/583]&gt; Repeat01 Fold2
 3 &lt;split [2333/583]&gt; Repeat01 Fold3
 4 &lt;split [2333/583]&gt; Repeat01 Fold4
 5 &lt;split [2333/583]&gt; Repeat01 Fold5
 6 &lt;split [2332/584]&gt; Repeat02 Fold1
 7 &lt;split [2333/583]&gt; Repeat02 Fold2
 8 &lt;split [2333/583]&gt; Repeat02 Fold3
 9 &lt;split [2333/583]&gt; Repeat02 Fold4
10 &lt;split [2333/583]&gt; Repeat02 Fold5
# … with 40 more rows</code></pre>
</div>
</div>
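<p>The split sizes in that printout follow from simple arithmetic: 2916 training rows do not divide evenly by 5, so the assessment sets have either 583 or 584 rows. Here is a minimal sketch that tallies the sizes on a placeholder dataframe with the same number of rows as <code>xy_q_in_train</code>. The <code>placeholder</code> dataframe is a stand-in for illustration.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(rsample)

# Placeholder with the same number of rows as the training dataframe
placeholder &lt;- data.frame(row_id = seq_len(2916))

set.seed(57)
folds &lt;- vfold_cv(placeholder, v = 5, repeats = 10)

# Rows used for fitting (analysis) and for error estimation (assessment)
analysis_n &lt;- sapply(folds$splits, function(s) nrow(analysis(s)))
assess_n &lt;- sapply(folds$splits, function(s) nrow(assessment(s)))

nrow(folds)                          # 50 = 5 folds x 10 repeats
all(analysis_n + assess_n == 2916)   # TRUE
# Within each repeat, the assessment sets cover every row exactly once
tapply(assess_n, folds$id, sum)</code></pre></div>
</div>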
<p>Here’s how we can access individual split data. The rsample library uses the term <em>analysis</em> data instead of <em>train</em> and <em>assessment</em> data instead of <em>test</em>.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Get the first resample object</span></span>
<span id="cb11-2">first_resample <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> in_train_splits<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>splits[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]</span>
<span id="cb11-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Peek at train</span></span>
<span id="cb11-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">head</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">analysis</span>(first_resample))</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>     patients_per_clinic_block num_med_techs num_rooms num_rooms_per_provider
1156                        32             4         6                      3
2914                        44             5         2                      1
1589                        28             5         6                      3
1903                        24             2         6                      3
569                         28             3         4                      2
2133                        40             3         2                      1
     num_physicians vitals_time_mean vitals_time_cv2 exam_time_mean
1156              2               12             0.2             10
2914              2               12             0.2             10
1589              2                6             0.2             20
1903              2                6             0.2             20
569               2                9             0.2             20
2133              2                9             0.2             10
     exam_time_cv2 post_exam_time_mean post_exam_time_cv2
1156           1.0                   5                0.5
2914           0.2                   5                0.5
1589           1.0                   8                0.5
1903           0.2                   2                0.5
569            1.0                   2                0.5
2133           0.2                   8                0.5
     room_turnover_time_mean room_turnover_time_cv2 off_util_staff
1156                     2.5              0.1428571       0.650000
2914                     2.5              0.1428571       0.715000
1589                     2.5              0.1428571       0.385000
1903                     2.5              0.1428571       0.525000
569                      2.5              0.1428571       0.525000
2133                     2.5              0.1428571       1.083333
     off_util_physician off_util_room staff_eff_ia_time_mean
1156          0.6666667     0.3888889               15.00000
2914          0.9166667     1.6041667               10.90909
1589          1.1666667     0.5930556               17.14286
1903          1.0000000     0.4083333               20.00000
569           1.1666667     0.7145833               17.14286
2133          0.8333333     1.7083333               12.00000
     staff_eff_svc_time_mean staff_eff_svc_time_cv2 exam_eff_ia_time_mean
1156                    9.75              0.4469540              15.00000
2914                    7.80              0.4469540              10.90909
1589                    6.60              0.5274468              17.14286
1903                   10.50              0.4188926              20.00000
569                     9.00              0.4667363              17.14286
2133                   13.00              0.4155455              12.00000
     exam_eff_svc_time_mean exam_eff_svc_time_cv2 mean_wait_i_dm1
1156               28.31571             1.2781330       5.1373286
2914               39.24771             1.7920118       5.8719084
1589               72.35856             4.5544262       0.6727572
1903               55.25698             2.1498538       2.8133543
569                68.13819             5.2083896       2.4523833
2133               61.72793             0.9745578      29.5965876
     mean_wait_p_dm1 mean_wait_r_dm1 mean_wait_rp_dm1 mean_wait_i mean_wait_r
1156        5.678379       0.2374558         5.915835  0.12145435  0.06615067
2914       15.875804     106.2935891       122.169393  0.04530920 56.45996278
1589       41.185808       7.7328543        48.918662  0.01092027  8.06474237
1903       27.943627       1.6566003        29.600227  0.24243105  0.17063897
569        41.185808      25.7387255        66.924533  0.02372471 12.89212641
2133       11.631343     201.6879895       213.319333  1.01295877 74.78263011
     mean_wait_p mean_wait_rp mean_time_in_system mean_eod
1156    2.382189     2.448340            30.18538 262.8861
2914    0.000000    56.459963            83.92953 371.1970
1589   16.301227    24.365969            57.58385 319.3207
1903    9.240288     9.410927            38.58189 272.0874
569    10.966330    23.858456            54.59284 319.7577
2133    0.000000    74.782630           103.74635 406.5454</code></pre>
</div>
<div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Peek at test</span></span>
<span id="cb13-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">head</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">assessment</span>(first_resample))</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>     patients_per_clinic_block num_med_techs num_rooms num_rooms_per_provider
3991                        32             3         6                      3
1268                        32             5         2                      1
1958                        32             2         6                      3
2006                        16             2         6                      3
2233                        24             3         4                      2
3953                        32             3         6                      3
     num_physicians vitals_time_mean vitals_time_cv2 exam_time_mean
3991              2               12             0.2             10
1268              2                6             0.2             15
1958              2                9             0.2             15
2006              2                9             0.2             20
2233              2                9             0.2             20
3953              2               12             0.2             15
     exam_time_cv2 post_exam_time_mean post_exam_time_cv2
3991           0.5                   5                0.5
1268           1.0                   5                0.5
1958           0.2                   5                0.5
2006           0.2                   8                0.5
2233           0.2                   5                0.5
3953           0.5                   2                0.5
     room_turnover_time_mean room_turnover_time_cv2 off_util_staff
3991                     2.5              0.1428571      0.8666667
1268                     2.5              0.1428571      0.3600000
1958                     2.5              0.1428571      1.1000000
2006                     2.5              0.1428571      0.6500000
2233                     2.5              0.1428571      0.5500000
3953                     2.5              0.1428571      0.7333333
     off_util_physician off_util_room staff_eff_ia_time_mean
3991          0.6666667     0.3888889                     15
1268          1.0000000     1.5000000                     15
1958          1.0000000     0.5000000                     15
2006          0.6666667     0.3388889                     30
2233          1.0000000     0.6875000                     20
3953          1.0000000     0.4333333                     15
     staff_eff_svc_time_mean staff_eff_svc_time_cv2 exam_eff_ia_time_mean
3991                    13.0              0.4469540                    15
1268                     5.4              0.3867186                    15
1958                    16.5              0.4144845                    15
2006                    19.5              0.4155455                    30
2233                    11.0              0.4144845                    20
3953                    11.0              0.4587043                    15
     exam_eff_svc_time_mean exam_eff_svc_time_cv2 mean_wait_i_dm1
3991               38.48086             0.8543810      15.3024853
1268               47.80331             4.0144639       0.4398748
1958               80.76920             0.9994200      33.4057701
2006               48.63299             0.6357165       8.6701294
2233               58.82556             2.0912196       3.3819381
3953               52.59627             2.3435171       8.2328360
     mean_wait_p_dm1 mean_wait_r_dm1 mean_wait_rp_dm1 mean_wait_i mean_wait_r
3991        5.678379       0.9271693         6.605548  0.90496542  0.07052544
1268       24.863430      85.6364573       110.499888  0.00000000 42.49793207
1958       24.863430      17.3003260        42.163756  5.25727485  1.40778217
2006        9.462865       0.2235870         9.686452  1.87931011  0.00000000
2233       27.943627      10.1332013        38.076828  0.09146909  2.76873372
3953       24.863430       3.3569486        28.220379  0.24323800  2.42631747
     mean_wait_p mean_wait_rp mean_time_in_system mean_eod
3991    1.991905     2.062430            31.77847 260.0836
1268    0.000000    42.497932            68.23150 348.0091
1958    7.753168     9.160950            50.01683 283.9539
2006    1.297761     1.297761            42.44758 257.8852
2233    6.998359     9.767093            44.14195 276.3151
3953   12.107860    14.534178            44.72451 286.2879</code></pre>
</div>
</div>
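<p>To make the split sizes shown earlier less mysterious, here’s a minimal base R sketch of how a single 5-fold partition of the 2916 training rows could be built by hand. This is not how rsample implements things internally; the names <code>n_train</code>, <code>v</code> and <code>fold_id</code> are made up for illustration, and only the row count and the number of folds come from the output above.</p>
<pre><code># Hypothetical by-hand version of one repeat of 5-fold cross-validation
set.seed(42)
n_train &lt;- 2916   # rows in the training data (2332 + 584)
v &lt;- 5            # number of folds

# Randomly assign each row to a fold; since 2916 = 5 * 583 + 1,
# one fold gets the single leftover row
fold_id &lt;- sample(rep(seq_len(v), length.out = n_train))

# Assessment ("test") size is the fold size; analysis ("train") is the rest
assessment_n &lt;- as.integer(table(fold_id))
analysis_n &lt;- n_train - assessment_n

data.frame(fold = seq_len(v), analysis_n, assessment_n)</code></pre>
<p>One fold ends up with 584 rows and the other four with 583, which lines up with the 2332/584 and 2333/583 splits reported by rsample.</p>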
</section>
</section>
<section id="polynomial-regression-model-for-initial-wait" class="level2">
<h2 class="anchored" data-anchor-id="polynomial-regression-model-for-initial-wait">Polynomial regression model for initial wait</h2>
<p>Now let’s fit a polynomial regression model to try to predict <code>mean_wait_i</code> (the initial wait time experienced by the patient before their vital signs are taken by a medical technician). We want to do repeated k-fold cross-validation to get a sense of how variable the model is when fit to different subsets of the training data. We can do this all quite easily using tidymodels by building workflow objects to properly handle the model fitting and assessment within the context of resampling.</p>
<p>Start by creating an empty model object of the appropriate type. Since polynomial regression is really just a linear regression model with some added features (powers of inputs), we use the <code>parsnip::linear_reg</code> function - <a href="https://parsnip.tidymodels.org/reference/linear_reg.html">https://parsnip.tidymodels.org/reference/linear_reg.html</a>. Nothing special is needed for fitting a polynomial regression model in R beyond the <code>lm</code> function and we use <code>set_engine</code> to specify this choice for how the regression model should be fit. Technically, <code>lm</code> is the default engine for <code>linear_reg</code> but we are just being explicit. While we don’t need them here, the <code>linear_reg</code> function also includes optional arguments related to regularization.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb15-1">wait_i_poly_mod <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">linear_reg</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mode =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"regression"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb15-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set_engine</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">engine =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lm"</span>)</span></code></pre></div>
</div>
<p>For scikit-learn users, the above is analogous to:</p>
<pre><code>from sklearn import linear_model
wait_i_poly_mod = linear_model.LinearRegression()</code></pre>
<p>Next we need to create the specific linear regression formula object we want to fit. This first formula only contains basic inputs and does not include any queueing inspired features. This next line of code has nothing to do with tidymodels; it’s just the creation of an R formula object.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb17-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create a formula object. Later we'll use a recipe to generate the polynomial terms.</span></span>
<span id="cb17-2">wait_i_poly_noq_formula <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> mean_wait_i <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> patients_per_clinic_block <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb17-3">                         num_med_techs <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb17-4">                         num_rooms <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb17-5">                         vitals_time_mean <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb17-6">                         exam_time_mean <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb17-7">                         exam_time_cv2 <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb17-8">                         post_exam_time_mean</span></code></pre></div>
</div>
<p>Now we’ll create a recipe object that contains two steps - the formula and the polynomial transforms. Tidymodels uses the recipes package to let you create objects that encapsulate various modeling pre-processing tasks that occur prior to actually fitting the model. A peek at the <a href="https://recipes.tidymodels.org/reference/index.html">recipes docs</a> reveals that recipes tries to provide a wide range of pre-processing functionality such as formula specification, imputation, transformations, dummy encoding, and more. There has been some <a href="https://staffblogs.le.ac.uk/teachingr/2020/10/05/on-not-using-tidymodels/">criticism of recipes (and tidymodels in general)</a> in this regard in that it tries to do too much and does not do it very well. I haven’t used tidymodels nearly enough to have formed an opinion yet on this point. In scikit-learn, the <code>sklearn.preprocessing</code> module provides functionality that overlaps with the recipes package in R.</p>
<p>In our case, the only pre-processing we need is to specify the base linear model formula and then add in the polynomial terms. We do the latter by using one of the many <code>step_*</code> functions available in recipes - <code>step_poly</code>. For <code>step_poly</code>, the default of <code>degree = 2</code> corresponds to using up to squared terms.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb18-1">wait_i_poly_noq_recipe <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">recipe</span>(wait_i_poly_noq_formula, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> xy_q_in_train) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb18-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">step_poly</span>(patients_per_clinic_block, num_med_techs, num_rooms,</span>
<span id="cb18-3">          vitals_time_mean, exam_time_mean, exam_time_cv2, post_exam_time_mean,</span>
<span id="cb18-4">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">degree =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span></code></pre></div>
</div>
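<p>The recipes documentation describes <code>step_poly</code> as building its new columns from orthogonal polynomials (via base R’s <code>poly</code>) rather than from raw powers. The base R sketch below, on simulated data that has nothing to do with the clinic dataset, illustrates what that means and why it doesn’t change the fitted model. The variables <code>x</code>, <code>y</code>, <code>x_poly</code>, <code>fit_orth</code> and <code>fit_raw</code> are made up for this illustration.</p>
<pre><code># Simulated data (made up for illustration; not the clinic dataset)
set.seed(1)
x &lt;- runif(100, min = 10, max = 50)
y &lt;- 2 + 0.5 * x + 0.03 * x^2 + rnorm(100)

# Degree 2 orthogonal polynomial basis: two new columns derived from x
x_poly &lt;- poly(x, degree = 2)
round(crossprod(x_poly), 10)   # the two columns are orthonormal

# Fit once with orthogonal terms and once with raw powers
fit_orth &lt;- lm(y ~ poly(x, degree = 2))
fit_raw &lt;- lm(y ~ x + I(x^2))

# The coefficients differ, but the fitted values agree
all.equal(fitted(fit_orth), fitted(fit_raw))</code></pre>
<p>Both fits span the same column space, so predictions are the same. The orthogonal version just avoids the strong correlation between <code>x</code> and <code>x^2</code>, at the cost of coefficients that are harder to interpret directly.</p>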
<p>Now we can combine our model and our recipe into a workflow. This is reminiscent of <a href="https://scikit-learn.org/stable/modules/compose.html#combining-estimators">Pipeline objects in scikit-learn</a>.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb19-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create a workflow object that uses the model and recipe above</span></span>
<span id="cb19-2">wait_i_poly_noq_wflow <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> </span>
<span id="cb19-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">workflow</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb19-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_model</span>(wait_i_poly_mod) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb19-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_recipe</span>(wait_i_poly_noq_recipe)</span></code></pre></div>
</div>
<p>At this point, we still haven’t fit any models nor done any resampling. We have simply set up a resampling scheme through our creation of the <code>in_train_splits</code> variable (a tibble whose <code>splits</code> column holds <code>vfold_split</code> objects) and created a workflow consisting of a model and a pre-processing recipe. Before launching the resampling and fitting process, we can set some resampling control options through the <code>tune::control_resamples</code> function. For example, we can choose to save the out of sample predictions for later analysis or append the workflow object onto its output.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb20" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb20-1">keep_pred <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">control_resamples</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">save_pred =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">save_workflow =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>)</span></code></pre></div>
</div>
<p>Finally, we can pipe our workflow through <code>tune::fit_resamples()</code>.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb21" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb21-1">wait_i_poly_noq_results <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> </span>
<span id="cb21-2">  wait_i_poly_noq_wflow <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb21-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fit_resamples</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">resamples =</span> in_train_splits, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">control =</span> keep_pred)</span></code></pre></div>
</div>
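<p>Since the clinic data isn’t bundled with R, here’s a small self-contained sketch of the same sequence of steps using the built-in <code>mtcars</code> dataset. It assumes the tidymodels meta-package is installed. The <code>toy_*</code> names, the choice of <code>mpg ~ wt + hp</code>, and the use of 2 repeats are all made up for illustration and aren’t part of the analysis in this post.</p>
<pre><code>library(tidymodels)

set.seed(57)
# 5-fold cross-validation repeated twice on a built-in dataset
toy_splits &lt;- vfold_cv(mtcars, v = 5, repeats = 2)

# Same kind of model object as above
toy_mod &lt;- linear_reg(mode = "regression") %&gt;%
  set_engine(engine = "lm")

# Formula plus polynomial terms
toy_recipe &lt;- recipe(mpg ~ wt + hp, data = mtcars) %&gt;%
  step_poly(wt, hp, degree = 2)

toy_wflow &lt;- workflow() %&gt;%
  add_model(toy_mod) %&gt;%
  add_recipe(toy_recipe)

# Fit the workflow on each of the 10 splits
toy_results &lt;- toy_wflow %&gt;%
  fit_resamples(resamples = toy_splits,
                control = control_resamples(save_pred = TRUE))

# One row per metric, averaged over the 10 splits
toy_metrics &lt;- collect_metrics(toy_results)
toy_metrics

# Out of sample predictions are available because save_pred = TRUE
toy_preds &lt;- collect_predictions(toy_results)
head(toy_preds)</code></pre>
<p>Each row of <code>mtcars</code> lands in exactly one assessment set per repeat, so the predictions tibble should contain one prediction per row per repeat.</p>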
<p>Now let’s look at the metrics averaged over the 50 splits (10 repeats x 5 folds). Notice that there’s a small standard error on the metrics indicating that there’s not a lot of variability between splits. This is good. The <code>tune::collect_metrics</code> function makes it easy to pull out and consolidate metrics.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb22" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb22-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">collect_metrics</span>(wait_i_poly_noq_results)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 2 × 6
  .metric .estimator  mean     n std_err .config             
  &lt;chr&gt;   &lt;chr&gt;      &lt;dbl&gt; &lt;int&gt;   &lt;dbl&gt; &lt;chr&gt;               
1 rmse    standard   4.86     50 0.0605  Preprocessor1_Model1
2 rsq     standard   0.408    50 0.00304 Preprocessor1_Model1</code></pre>
</div>
</div>
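<p>The <code>mean</code> and <code>std_err</code> columns are summaries of the per-split estimates. Assuming <code>std_err</code> is the usual standard error of the mean (standard deviation divided by the square root of <code>n</code>), the calculation can be sketched in base R as follows. The per-split values here are randomly generated stand-ins, not the actual resampling results.</p>
<pre><code># Made-up per-split RMSE values standing in for the 50 resample estimates
set.seed(7)
split_rmse &lt;- rnorm(50, mean = 4.9, sd = 0.4)

n &lt;- length(split_rmse)
mean_rmse &lt;- mean(split_rmse)
std_err &lt;- sd(split_rmse) / sqrt(n)   # standard error of the mean

c(mean = mean_rmse, n = n, std_err = std_err)</code></pre>
<p>Under that same assumption, the reported standard error of 0.0605 with <code>n = 50</code> implies a standard deviation of roughly 0.43 in RMSE across the individual splits (0.0605 times the square root of 50).</p>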
<p>Now we’ll refit one last model using the entire training set and assess its performance on the in design test data. For this, we use <code>tune::last_fit()</code>.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb24" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb24-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Do last fit</span></span>
<span id="cb24-2">wait_i_last_poly_noq_results <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">last_fit</span>(wait_i_poly_noq_wflow, xy_q_in_split)</span></code></pre></div>
</div>
<p>What exactly is returned from <code>last_fit()</code>? It’s a one row tibble of tibbles.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb25" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb25-1">wait_i_last_poly_noq_results</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># Resampling results
# Manual resampling 
# A tibble: 1 × 6
  splits             id               .metrics .notes   .predictions .workflow 
  &lt;list&gt;             &lt;chr&gt;            &lt;list&gt;   &lt;list&gt;   &lt;list&gt;       &lt;list&gt;    
1 &lt;split [2916/972]&gt; train/test split &lt;tibble&gt; &lt;tibble&gt; &lt;tibble&gt;     &lt;workflow&gt;</code></pre>
</div>
</div>
<p>There’s only one row and it corresponds to the initial train/test split that was done. From the column names we see that things like the accuracy metrics and final model predictions are tibbles within this container tibble.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb27" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb27-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">collect_metrics</span>(wait_i_last_poly_noq_results)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 2 × 4
  .metric .estimator .estimate .config             
  &lt;chr&gt;   &lt;chr&gt;          &lt;dbl&gt; &lt;chr&gt;               
1 rmse    standard       4.76  Preprocessor1_Model1
2 rsq     standard       0.398 Preprocessor1_Model1</code></pre>
</div>
</div>
<p>The RMSE on the test data (4.76) is slightly smaller than the resampled RMSE from the training data (4.86), which is unusual but can happen due to randomness.</p>
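<p>As a reminder of what these two metrics measure, here’s a base R sketch that computes them by hand. The <code>actual</code> and <code>predicted</code> vectors are made up for illustration. The R-squared shown is the squared correlation between actual and predicted values, which is my understanding of how the yardstick <code>rsq</code> metric is defined.</p>
<pre><code># Made-up actual and predicted values (not from the clinic data)
actual &lt;- c(0.1, 0.0, 5.3, 1.9, 0.2, 12.4)
predicted &lt;- c(1.2, -0.5, 4.1, 2.8, 1.0, 8.9)

# Root mean squared error, in the units of the response (minutes here)
rmse &lt;- sqrt(mean((actual - predicted)^2))

# R-squared computed as the squared correlation of actual and predicted
rsq &lt;- cor(actual, predicted)^2

c(rmse = rmse, rsq = rsq)</code></pre>
<p>Because RMSE is in the units of the response, an RMSE near 5 minutes is large relative to the mostly small values of <code>mean_wait_i</code> seen in the data peeks earlier.</p>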
<p>Let’s plot the actual vs predicted values based on the in design test dataframe. The predictions on the test data based on the model fit by <code>last_fit</code> can be obtained by <code>tune::collect_predictions()</code>.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb29" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb29-1">wait_i_last_poly_noq_rmse <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">collect_metrics</span>(wait_i_last_poly_noq_results) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb29-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(.metric <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'rmse'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb29-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pull</span>(.estimate) </span>
<span id="cb29-4"></span>
<span id="cb29-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Grab the predictions on the test data</span></span>
<span id="cb29-6">assess_wait_i_last_poly_noq_results <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">collect_predictions</span>(wait_i_last_poly_noq_results)</span>
<span id="cb29-7"></span>
<span id="cb29-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Plot actual vs predicted.</span></span>
<span id="cb29-9">assess_wait_i_last_poly_noq_results <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb29-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> mean_wait_i, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> .pred)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb29-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">15</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb29-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_abline</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"red"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb29-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coord_obs_pred</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb29-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">xlab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Actual mean initial wait time (min)"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb29-15">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ylab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Metamodel prediction"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb29-16">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggtitle</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Polynomial regression model - no queueing terms"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb29-17">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">annotate</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"text"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">40</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">label =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sprintf</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"RMSE = %8.3f"</span>, wait_i_last_poly_noq_rmse))</span></code></pre></div>
<div class="cell-output-display">
<p><img src="https://bitsofanalytics.org/posts/tidymodels-mm/tidymodels_mm_files/figure-html/assess_wait_i_last_poly_res-1.png" class="img-fluid" width="672"></p>
</div>
</div>
<p>Not great, pretty horrible. Clearly there’s some weirdness with the zero values of actual mean wait time. Let’s see how the addition of the queueing inspired features affects model accuracy. We’ll pull everything we did above into a single code chunk.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb30" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb30-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create a formula object. </span></span>
<span id="cb30-2">wait_i_poly_q_formula <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> mean_wait_i <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> patients_per_clinic_block <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb30-3">                         num_med_techs <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb30-4">                         num_rooms <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb30-5">                         vitals_time_mean <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb30-6">                         exam_time_mean <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb30-7">                         exam_time_cv2 <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb30-8">                         post_exam_time_mean <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb30-9">                         off_util_staff <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> off_util_physician <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> off_util_room <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb30-10">                         mean_wait_i_dm1</span>
<span id="cb30-11"></span>
<span id="cb30-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create recipe</span></span>
<span id="cb30-13">wait_i_poly_q_recipe <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">recipe</span>(wait_i_poly_q_formula, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> xy_q_in_train) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb30-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">step_poly</span>(patients_per_clinic_block, num_med_techs, num_rooms,</span>
<span id="cb30-15">          vitals_time_mean, exam_time_mean, exam_time_cv2, post_exam_time_mean,</span>
<span id="cb30-16">          off_util_staff, off_util_physician, off_util_room, mean_wait_i_dm1,</span>
<span id="cb30-17">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">degree =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb30-18"></span>
<span id="cb30-19"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create workflow object (uses model object we created earlier)</span></span>
<span id="cb30-20">wait_i_poly_q_wflow <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> </span>
<span id="cb30-21">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">workflow</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb30-22">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_model</span>(wait_i_poly_mod) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb30-23">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_recipe</span>(wait_i_poly_q_recipe)</span>
<span id="cb30-24"></span>
<span id="cb30-25"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Fit the models</span></span>
<span id="cb30-26">keep_pred <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">control_resamples</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">save_pred =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">save_workflow =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>)</span>
<span id="cb30-27"></span>
<span id="cb30-28">wait_i_poly_q_results <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> </span>
<span id="cb30-29">  wait_i_poly_q_wflow <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb30-30">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fit_resamples</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">resamples =</span> in_train_splits, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">control =</span> keep_pred)</span>
<span id="cb30-31"></span>
<span id="cb30-32"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">collect_metrics</span>(wait_i_poly_q_results)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 2 × 6
  .metric .estimator  mean     n std_err .config             
  &lt;chr&gt;   &lt;chr&gt;      &lt;dbl&gt; &lt;int&gt;   &lt;dbl&gt; &lt;chr&gt;               
1 rmse    standard   1.82     50 0.0219  Preprocessor1_Model1
2 rsq     standard   0.916    50 0.00266 Preprocessor1_Model1</code></pre>
</div>
<div class="sourceCode cell-code" id="cb32" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb32-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Do last fit</span></span>
<span id="cb32-2">wait_i_last_poly_q_results <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">last_fit</span>(wait_i_poly_q_wflow, xy_q_in_split)</span>
<span id="cb32-3"></span>
<span id="cb32-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">collect_metrics</span>(wait_i_last_poly_q_results)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 2 × 4
  .metric .estimator .estimate .config             
  &lt;chr&gt;   &lt;chr&gt;          &lt;dbl&gt; &lt;chr&gt;               
1 rmse    standard       1.76  Preprocessor1_Model1
2 rsq     standard       0.917 Preprocessor1_Model1</code></pre>
</div>
<div class="sourceCode cell-code" id="cb34" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb34-1">wait_i_last_poly_q_rmse <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">collect_metrics</span>(wait_i_last_poly_q_results) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb34-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(.metric <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'rmse'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb34-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pull</span>(.estimate) </span>
<span id="cb34-4"></span>
<span id="cb34-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Grab the predictions on the test data</span></span>
<span id="cb34-6">assess_wait_i_last_poly_q_results <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">collect_predictions</span>(wait_i_last_poly_q_results)</span>
<span id="cb34-7"></span>
<span id="cb34-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Plot actual vs predicted.</span></span>
<span id="cb34-9">assess_wait_i_last_poly_q_results <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb34-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> mean_wait_i, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> .pred)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb34-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">15</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb34-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_abline</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"red"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb34-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coord_obs_pred</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb34-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">xlab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Actual mean initial wait time (min)"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb34-15">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ylab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Metamodel prediction"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb34-16">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggtitle</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Polynomial regression model - including queueing terms"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb34-17">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">annotate</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"text"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">40</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">label =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sprintf</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"RMSE = %8.3f"</span>, wait_i_last_poly_q_rmse))</span></code></pre></div>
<div class="cell-output-display">
<p><img src="https://bitsofanalytics.org/posts/tidymodels-mm/tidymodels_mm_files/figure-html/poly_q-1.png" class="img-fluid" width="672"></p>
</div>
</div>
<p>Clearly there is value in the queueing-inspired terms. OK, that’s the polynomial regression model. Tidymodels made it easy to create a coherent workflow that was easy to modify. For example, we could swap <code>step_poly</code> for <code>step_ns</code> in the recipe to fit a natural cubic spline instead.</p>
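<p>As a rough sketch of that kind of swap, here is a tiny recipe that uses <code>step_ns</code>. It uses the built-in <code>mtcars</code> data as a stand-in for the clinic data, and the choice of <code>deg_free = 3</code> is just an assumption for illustration. Note that <code>step_ns</code> controls flexibility through <code>deg_free</code> rather than <code>degree</code>.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(recipes)

# Toy data (mtcars) stands in for the clinic training data
spline_recipe &lt;- recipe(mpg ~ disp + hp, data = mtcars) %&gt;%
  step_ns(disp, hp, deg_free = 3)  # natural spline basis for each predictor

# prep() estimates the spline basis and bake() applies it to the training data
spline_data &lt;- spline_recipe %&gt;%
  prep() %&gt;%
  bake(new_data = NULL)

# The original predictors are replaced by their spline basis columns
names(spline_data)</code></pre></div>
</div>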
<p>Now let’s look at the more complicated case of using tidymodels to fit a nonlinear model with <code>nls()</code>.</p>
</section>
<section id="nonlinear-queueing-inspired-model-for-initital-wait" class="level2">
<h2 class="anchored" data-anchor-id="nonlinear-queueing-inspired-model-for-initital-wait">Nonlinear queueing-inspired model for initial wait</h2>
<p>In the dataset, you’ll find a queueing-related feature named <code>mean_wait_i_dm1</code>, which corresponds to a <img src="https://latex.codecogs.com/png.latex?D/M/1"> based approximation for the mean initial wait time. The details are available in our paper, but for now we’ll just say that we fit a multiplicative power model that includes terms that attempt to correct for the violated assumptions of the <img src="https://latex.codecogs.com/png.latex?D/M/1"> model.</p>
<p>The proposed model is</p>
<p><img src="https://latex.codecogs.com/png.latex?%0AW_I%20=%20b_1%20%5Cfrac%7BW_%7BD%7D%5E%7Bb_2%7DR%5E%7Bb_3%7DC%5E%7Bb_4%7D%7D%7BS%5E%7Bb_5%7D%7D%0A"></p>
<p>where</p>
<ul>
<li><img src="https://latex.codecogs.com/png.latex?W_%7BD%7D"> is the mean wait in the approximating <img src="https://latex.codecogs.com/png.latex?D/M/1"> queue</li>
<li><img src="https://latex.codecogs.com/png.latex?R"> is the number of exam rooms</li>
<li><img src="https://latex.codecogs.com/png.latex?C"> is the squared coefficient of variation of an approximate effective service time distribution for staff (hyper-Erlang)</li>
<li><img src="https://latex.codecogs.com/png.latex?S"> is the number of medical technician staff</li>
</ul>
<p>You can think of the <img src="https://latex.codecogs.com/png.latex?R,%20C,%20S"> terms along with the power parameters as being adjustments to the raw <img src="https://latex.codecogs.com/png.latex?D/M/1"> based term, <img src="https://latex.codecogs.com/png.latex?W_%7BD%7D">. Clearly, this is a nonlinear model and we will estimate <img src="https://latex.codecogs.com/png.latex?b_1,%20...,%20b_5"> using R’s <code>nls()</code> function.</p>
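<p>To make the form of the model concrete, here is a minimal sketch of fitting this kind of multiplicative power model directly with <code>nls()</code>. The data are simulated, and the “true” parameter values, input ranges, and starting guesses are all made up for illustration. They are not the values from our paper or dataset.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">set.seed(123)
n &lt;- 300

# Simulated inputs; the ranges are arbitrary assumptions
sim &lt;- data.frame(
  W_D = runif(n, 1, 30),                 # D/M/1 based wait approximation
  R   = sample(2:6, n, replace = TRUE),  # number of exam rooms
  C   = runif(n, 0.3, 1.5),              # squared coefficient of variation
  S   = sample(1:4, n, replace = TRUE)   # number of med tech staff
)

# Made-up "true" parameters b_1, ..., b_5 with multiplicative noise
b &lt;- c(1.2, 0.9, 0.4, 0.3, 0.5)
sim$W_I &lt;- b[1] * sim$W_D^b[2] * sim$R^b[3] * sim$C^b[4] / sim$S^b[5] *
  exp(rnorm(n, sd = 0.05))

# nls() requires a starting guess for every parameter
fit &lt;- nls(
  W_I ~ b1 * W_D^b2 * R^b3 * C^b4 / S^b5,
  data  = sim,
  start = list(b1 = 1, b2 = 1, b3 = 0.5, b4 = 0.5, b5 = 0.5)
)

# Estimates should land near the made-up values in b
coef(fit)</code></pre></div>
</div>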
<section id="developing-a-parsnip-model-that-uses-nls" class="level3">
<h3 class="anchored" data-anchor-id="developing-a-parsnip-model-that-uses-nls">Developing a parsnip model that uses nls()</h3>
<p>It would be great if we could simply reuse the kind of code we wrote above for the polynomial models. Unfortunately, it’s not that easy. To begin with, nonlinear regression, and in particular the <code>nls()</code> engine, isn’t yet supported in <a href="https://parsnip.tidymodels.org/">parsnip</a>, the tidymodels package that handles model building. As I started digging into (no pun intended) this package for the first time, I realized that, of course, a “parsnip” is not a “carrot”. The goal of parsnip, written by the same author who developed the caret package, is stated on its webpage.</p>
<blockquote class="blockquote">
<p>The goal of parsnip is to provide a tidy, unified interface to models that can be used to try a range of models without getting bogged down in the syntactical minutiae of the underlying packages.</p>
</blockquote>
<p>However, parsnip is <strong>not</strong> the new caret. It’s more like tidymodels is the new caret. Parsnip just handles the unified modeling interface and was designed to make it easy for users to add their own custom models. They have a very nice tutorial on doing just that:</p>
<ul>
<li><a href="https://www.tidymodels.org/learn/develop/models/">How to build a parsnip model</a></li>
</ul>
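<p>For reference, this is roughly what the unified interface looks like for a model that parsnip already supports. It’s a minimal sketch that uses the built-in <code>mtcars</code> data rather than the clinic data. The custom model we build below is meant to slot into this same pattern of model specification, engine, and fit.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(parsnip)

# Model specification: what kind of model, and which engine fits it
lm_spec &lt;- linear_reg() %&gt;%
  set_engine("lm")

# Fit with the formula interface; mtcars is just stand-in data
lm_fit &lt;- lm_spec %&gt;%
  fit(mpg ~ wt + hp, data = mtcars)

# Predictions come back as a tibble with a .pred column
predict(lm_fit, new_data = head(mtcars))</code></pre></div>
</div>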
<p>So, let’s create our own <code>"nonlinear_reg"</code> model that relies on <code>nls()</code> as the model fitting engine. Since <code>nls()</code> needs a starting guess for the parameter values, we need to figure out how and where such a starting guess fits into a custom parsnip model.</p>
<p>I just followed the steps in the tutorial listed above.</p>
<p><strong>Step 1. Register the model, modes, and arguments</strong></p>
<div class="cell">
<div class="sourceCode cell-code" id="cb35" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb35-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set_new_model</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"nonlinear_reg"</span>)</span>
<span id="cb35-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set_model_mode</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"nonlinear_reg"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mode =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"regression"</span>)</span>
<span id="cb35-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set_model_engine</span>(</span>
<span id="cb35-4">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"nonlinear_reg"</span>, </span>
<span id="cb35-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mode =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"regression"</span>, </span>
<span id="cb35-6">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">eng =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"nls"</span></span>
<span id="cb35-7">)</span>
<span id="cb35-8"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set_dependency</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"nonlinear_reg"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">eng =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"nls"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">pkg =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"stats"</span>)</span>
<span id="cb35-9"></span>
<span id="cb35-10"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># We can examine what we just created</span></span>
<span id="cb35-11"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">show_model_info</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"nonlinear_reg"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Information for `nonlinear_reg`
 modes: unknown, regression 

 engines: 
   regression: nlsNA

¹The model can use case weights.

 no registered arguments.

 no registered fit modules.

 no registered prediction modules.</code></pre>
</div>
</div>
<p><strong>Step 2. Create the model function</strong></p>
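<p>Normally, this is where we would declare the main arguments for the model with <code>set_model_arg()</code>. This is intended for tuning parameters (I think) and isn’t needed here, so the call below is left commented out.</p>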
<div class="cell">
<div class="sourceCode cell-code" id="cb37" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb37-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># set_model_arg(</span></span>
<span id="cb37-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#   model = "nonlinear_reg",</span></span>
<span id="cb37-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#   eng = "nls",</span></span>
<span id="cb37-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#   parsnip = "start",</span></span>
<span id="cb37-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#   original = "start",</span></span>
<span id="cb37-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#   # Not sure what to use for func arg</span></span>
<span id="cb37-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#   func = list(pkg = "foo", fun = "bar"),</span></span>
<span id="cb37-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#   has_submodel = FALSE</span></span>
<span id="cb37-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># )</span></span>
<span id="cb37-10"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># show_model_info("nonlinear_reg")</span></span></code></pre></div>
</div>
<p>This next function is the one users will call. It uses <code>new_model_spec()</code> to create the model specification object for our custom model.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb38" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb38-1">nonlinear_reg <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb38-2">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">function</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mode =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"regression"</span>,  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">start =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NULL</span>) {</span>
<span id="cb38-3">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Check for correct mode</span></span>
<span id="cb38-4">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> (mode  <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"regression"</span>) {</span>
<span id="cb38-5">      rlang<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">abort</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"`mode` should be 'regression'"</span>)</span>
<span id="cb38-6">    }</span>
<span id="cb38-7">    </span>
<span id="cb38-8">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Capture the arguments in quosures</span></span>
<span id="cb38-9">    args <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">start =</span> rlang<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">enquo</span>(start))</span>
<span id="cb38-10">    </span>
<span id="cb38-11">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create specification</span></span>
<span id="cb38-12">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">new_model_spec</span>(</span>
<span id="cb38-13">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"nonlinear_reg"</span>,</span>
<span id="cb38-14">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">args =</span> args,</span>
<span id="cb38-15">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mode =</span> mode,</span>
<span id="cb38-16">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">engine =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NULL</span>,</span>
<span id="cb38-17">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">eng_args =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NULL</span>,</span>
<span id="cb38-18">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">method =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NULL</span></span>
<span id="cb38-19">    )</span>
<span id="cb38-20">  }</span></code></pre></div>
</div>
<p><strong>Step 3. Add a fit module</strong></p>
<div class="cell">
<div class="sourceCode cell-code" id="cb39" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb39-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set_fit</span>(</span>
<span id="cb39-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"nonlinear_reg"</span>,</span>
<span id="cb39-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">eng =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"nls"</span>,</span>
<span id="cb39-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mode =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"regression"</span>,</span>
<span id="cb39-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">value =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb39-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">interface =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"formula"</span>,</span>
<span id="cb39-7">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">protect =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"formula"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"data"</span>),</span>
<span id="cb39-8">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">func =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">pkg =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"stats"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fun =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"nls"</span>),</span>
<span id="cb39-9">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">defaults =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>()</span>
<span id="cb39-10">  )</span>
<span id="cb39-11">)</span>
<span id="cb39-12"></span>
<span id="cb39-13"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">show_model_info</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"nonlinear_reg"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Information for `nonlinear_reg`
 modes: unknown, regression 

 engines: 
   regression: nls

¹The model can use case weights.

 no registered arguments.

 fit modules:
 engine       mode
    nls regression

 no registered prediction modules.</code></pre>
</div>
</div>
<div class="cell">
<div class="sourceCode cell-code" id="cb41" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb41-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set_encoding</span>(</span>
<span id="cb41-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"nonlinear_reg"</span>,</span>
<span id="cb41-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">eng =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"nls"</span>,</span>
<span id="cb41-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mode =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"regression"</span>,</span>
<span id="cb41-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">options =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb41-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">predictor_indicators =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"none"</span>,</span>
<span id="cb41-7">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">compute_intercept =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>,</span>
<span id="cb41-8">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">remove_intercept =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>,</span>
<span id="cb41-9">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">allow_sparse_x =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span></span>
<span id="cb41-10">  )</span>
<span id="cb41-11">)</span></code></pre></div>
</div>
<p><strong>Step 4. Add modules for prediction</strong></p>
<div class="cell">
<div class="sourceCode cell-code" id="cb42" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb42-1">response_info <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> </span>
<span id="cb42-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb42-3">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">pre =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NULL</span>,</span>
<span id="cb42-4">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">post =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NULL</span>,</span>
<span id="cb42-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">func =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fun =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"predict"</span>),</span>
<span id="cb42-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">args =</span></span>
<span id="cb42-7">      <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># These lists should be of the form:</span></span>
<span id="cb42-8">      <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># {predict.nls argument name} = {values provided from parsnip objects}</span></span>
<span id="cb42-9">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb42-10">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># We don't want the first two arguments evaluated right now</span></span>
<span id="cb42-11">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># since they don't exist yet. </span></span>
<span id="cb42-12">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">object =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">quote</span>(object<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>fit),</span>
<span id="cb42-13">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">newdata =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">quote</span>(new_data)</span>
<span id="cb42-14">      )</span>
<span id="cb42-15">  )</span>
<span id="cb42-16"></span>
<span id="cb42-17"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set_pred</span>(</span>
<span id="cb42-18">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"nonlinear_reg"</span>,</span>
<span id="cb42-19">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">eng =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"nls"</span>,</span>
<span id="cb42-20">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mode =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"regression"</span>,</span>
<span id="cb42-21">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"numeric"</span>,</span>
<span id="cb42-22">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">value =</span> response_info</span>
<span id="cb42-23">)</span></code></pre></div>
</div>
<p>Now we have a custom model type named <code>nonlinear_reg</code> that we can use just like (we hope) any built-in parsnip model. Before trying to use it with the full-blown resampling scheme, let’s just see if we can make it work for a simple fit, predict, and assess cycle. We’ll just fit the model on the entire within-design training set, <code>xy_q_in_train</code>.</p>
<p>The <code>nls()</code> function would like a vector of initial values for all parameters to be fit.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb43" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb43-1">init_nls_wait_i <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">b1=</span>.<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">b2=</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.6</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">b3=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">b4=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">b5=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span></code></pre></div>
</div>
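<p>As an aside, one common way to come up with starting values for a multiplicative power model like this one is to take logs of both sides, which gives a model that is linear in the exponents, and fit that with <code>lm()</code>. The following is only a sketch of that idea and not necessarily how the values above were chosen. It uses simulated data with hypothetical columns <code>x1</code>, <code>x2</code> and <code>y</code>, and assumes all values are positive so that the logs exist.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Simulated data (an assumption for illustration):
# y = 0.5 * x1^(-0.6) * x2^1.5 with multiplicative noise
set.seed(1)
dat &lt;- data.frame(x1 = runif(100, 1, 5), x2 = runif(100, 1, 5))
dat$y &lt;- 0.5 * dat$x1^(-0.6) * dat$x2^1.5 * exp(rnorm(100, sd = 0.05))

# log(y) = log(b1) + b2 * log(x1) + b3 * log(x2) is linear in the unknowns
loglin_fit &lt;- lm(log(y) ~ log(x1) + log(x2), data = dat)
loglin_coefs &lt;- unname(coef(loglin_fit))

# Back-transform the intercept and use the slopes as starting exponents
start_vals &lt;- c(b1 = exp(loglin_coefs[1]),
                b2 = loglin_coefs[2],
                b3 = loglin_coefs[3])

# Hand the starting values to nls()
nls_fit &lt;- nls(y ~ b1 * x1^b2 * x2^b3, data = dat, start = start_vals)
coef(nls_fit)</code></pre></div>
</div>
<p>The log-linear fit and the <code>nls()</code> fit minimize different error criteria, so the estimates will generally differ a little, but the former usually lands close enough to give <code>nls()</code> a good starting point.</p>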
<p>Create a model object, a formula object and then fit the model.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb44" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb44-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create nonlinear_reg model object</span></span>
<span id="cb44-2">wait_i_nls_mod <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">nonlinear_reg</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set_engine</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">engine =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"nls"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">start =</span> init_nls_wait_i)</span>
<span id="cb44-3"></span>
<span id="cb44-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create the nonlinear formula</span></span>
<span id="cb44-5">wait_i_nls_mod_formula <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> mean_wait_i <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> b1 <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> (num_med_techs <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span> b2) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> </span>
<span id="cb44-6">  (mean_wait_i_dm1 <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span> b3) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>(num_rooms <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span> b4) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> (staff_eff_svc_time_cv2 <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span> b5)</span>
<span id="cb44-7"></span>
<span id="cb44-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Fit the model</span></span>
<span id="cb44-9">wait_i_nls_fit <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> wait_i_nls_mod <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb44-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fit</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">formula =</span> wait_i_nls_mod_formula, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> xy_q_in_train)</span>
<span id="cb44-11"></span>
<span id="cb44-12">wait_i_nls_fit</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>parsnip model object

Nonlinear regression model
  model: mean_wait_i ~ b1 * (num_med_techs^b2) * (mean_wait_i_dm1^b3) *     (num_rooms^b4) * (staff_eff_svc_time_cv2^b5)
   data: data
      b1       b2       b3       b4       b5 
 0.01395 -0.33003  1.64832  0.54281  0.38448 
 residual sum-of-squares: 4656

Number of iterations to convergence: 7 
Achieved convergence tolerance: 1.525e-06</code></pre>
</div>
</div>
<p>Great. It works. We can make predictions and compute error metrics.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb46" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb46-1">wait_i_nls_predicted <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">predict</span>(wait_i_nls_fit, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">new_data =</span> xy_q_in_test)</span>
<span id="cb46-2"></span>
<span id="cb46-3">wait_i_nls_rmse <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rmse_vec</span>(xy_q_in_test<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>mean_wait_i, wait_i_nls_predicted<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>.pred)</span>
<span id="cb46-4"></span>
<span id="cb46-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Plot actual vs predicted.</span></span>
<span id="cb46-6">xy_q_in_test <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb46-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(mean_wait_i) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb46-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">bind_cols</span>(wait_i_nls_predicted) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span>  </span>
<span id="cb46-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> mean_wait_i, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> .pred)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb46-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">15</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb46-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_abline</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"red"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb46-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coord_obs_pred</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb46-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">xlab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Actual mean initial wait time (min)"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb46-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ylab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Metamodel prediction"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb46-15">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggtitle</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Nonlinear queueing based model"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb46-16">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">annotate</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"text"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">40</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">label =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sprintf</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"RMSE = %8.3f"</span>, wait_i_nls_rmse))</span></code></pre></div>
<div class="cell-output-display">
<p><img src="https://bitsofanalytics.org/posts/tidymodels-mm/tidymodels_mm_files/figure-html/nls_wait_i_plot-1.png" class="img-fluid" width="672"></p>
</div>
</div>
<p>Very nice fit and only five parameters that have physical underpinnings that make sense in the context of queueing models.</p>
<p>To get a sense of the stability of the model, we really need to add the k-crossfold resampling just as we did with the polynomial models. Before we try that, we can actually get a preview of the impending problem by tweaking the previous example to use a simple tidy workflow object. The model and formula objects are fine. Let’s try to use them in a workflow.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb47" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb47-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create simple workflow</span></span>
<span id="cb47-2">wait_i_nls_wf <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">workflow</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb47-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_model</span>(wait_i_nls_mod) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb47-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_formula</span>(wait_i_nls_mod_formula)</span>
<span id="cb47-5"></span>
<span id="cb47-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Pipe the workflow into fit() to fit the model</span></span>
<span id="cb47-7">wait_i_nls_wf <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb47-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fit</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> xy_q_in_train)</span></code></pre></div>
<div class="cell-output cell-output-error">
<pre><code>Error in terms.formula(formula, data = data): invalid power in formula</code></pre>
</div>
</div>
<p>And there’s the problem. When the workflow processes the formula we supplied via <code>workflows::add_formula</code>, an internal check of the proposed formula flags it as having an invalid power. No doubt this is because the powers in our formula are parameters to be estimated by <code>nls()</code>. This same problem also occurs if we try to use <code>fit_resamples()</code> as we did with the polynomial models.</p>
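<p>The same error can be reproduced outside of tidymodels with base R alone. Here is a minimal sketch using a hypothetical two-parameter power model and simulated data (neither is from the clinic dataset). In standard model formulas the <code>^</code> operator denotes crossing of terms to a given degree and so it needs a numeric power, whereas <code>nls()</code> treats the right-hand side as an ordinary R expression.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hypothetical two-parameter power model: y = a * x^b
f &lt;- y ~ a * x^b

# terms() applies the standard formula rules and rejects the symbolic power
terms_msg &lt;- tryCatch(
  {
    terms(f)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
terms_msg

# nls() evaluates the right-hand side as plain arithmetic, so it fits fine
set.seed(123)
sim &lt;- data.frame(x = runif(50, 1, 10))
sim$y &lt;- 3 * sim$x^0.7 + rnorm(50, sd = 0.1)
nls_coefs &lt;- coef(nls(f, data = sim, start = c(a = 1, b = 1)))
nls_coefs</code></pre></div>
</div>
<p>So the formula itself is fine for <code>nls()</code>. It only becomes a problem when some other piece of code tries to interpret it as a standard model formula.</p>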
<p>To summarize what we just learned, while I was able to successfully use my custom <code>nonlinear_reg</code> model to fit a model and assess its predictive accuracy, I cannot use it in any workflow that uses <code>add_formula</code> or <code>fit_resamples()</code>. So, it appears that we are going to have to write our own function for iterating over our k-crossfold splits object and fitting and assessing models for each split.</p>
</section>
<section id="fit-queueing-based-nls-model-and-use-cross-validation" class="level3">
<h3 class="anchored" data-anchor-id="fit-queueing-based-nls-model-and-use-cross-validation">Fit queueing based nls model and use cross-validation</h3>
<p>I hacked together a few functions to do what <code>tune::fit_resamples()</code> does. They basically just loop through the splits doing model fitting, predicting, scoring and plotting. You can find the <a href="https://github.com/misken/op_clinic_mm/blob/main/lib/myparsnip_funcs.R">source code here</a> within the GitHub repo for this project.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb49" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb49-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">source</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'lib/myparsnip_funcs.R'</span>)</span></code></pre></div>
</div>
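<p>To give a feel for what such a loop involves, here is a stripped-down sketch in base R. It is not the code in <code>myparsnip_funcs.R</code>. It assumes simulated data, a hypothetical one-predictor power model, and a single round of 5-fold cross-validation with no repeats and no plotting.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Simulated data (an assumption for illustration): y = 2 * x^0.5 plus noise
set.seed(42)
n &lt;- 200
dat &lt;- data.frame(x = runif(n, 1, 10))
dat$y &lt;- 2 * dat$x^0.5 + rnorm(n, sd = 0.2)

# Randomly assign each row to one of k folds
k &lt;- 5
fold_id &lt;- sample(rep(seq_len(k), length.out = n))

# For each fold: fit on the other folds, predict the held-out fold, score it
rmse_by_fold &lt;- vapply(seq_len(k), function(i) {
  train &lt;- dat[fold_id != i, ]
  holdout &lt;- dat[fold_id == i, ]
  fit &lt;- nls(y ~ b1 * x^b2, data = train, start = c(b1 = 1, b2 = 1))
  pred &lt;- predict(fit, newdata = holdout)
  sqrt(mean((holdout$y - pred)^2))
}, numeric(1))

# Summarize across folds
c(mean = mean(rmse_by_fold), std_err = sd(rmse_by_fold) / sqrt(k))</code></pre></div>
</div>
<p>The functions used below do the same kind of thing, but they work with the rsample splits object created earlier and return results shaped like those produced by <code>tune::fit_resamples()</code>.</p>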
<div class="cell">
<div class="sourceCode cell-code" id="cb50" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb50-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create nonlinear_reg model object</span></span>
<span id="cb50-2">init_nls_wait_i <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">b1=</span>.<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">b2=</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.6</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">b3=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">b4=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">b5=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb50-3">wait_i_nls_mod <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">nonlinear_reg</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set_engine</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">engine =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"nls"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">start =</span> init_nls_wait_i)</span>
<span id="cb50-4"></span>
<span id="cb50-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create the nonlinear formula</span></span>
<span id="cb50-6">wait_i_nls_mod_formula <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> mean_wait_i <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> b1 <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> (num_med_techs <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span> b2) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> </span>
<span id="cb50-7">  (mean_wait_i_dm1 <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span> b3) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>(num_rooms <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span> b4) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> (staff_eff_svc_time_cv2 <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span> b5)</span>
<span id="cb50-8"></span>
<span id="cb50-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Call my custom function to fit the models for each split</span></span>
<span id="cb50-10">wait_i_kfold_res <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">myparsnip_fit_resamples</span>(wait_i_nls_mod, wait_i_nls_mod_formula,</span>
<span id="cb50-11">                                          in_train_splits, kfold_number, kfold_repeats)</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
ℹ Please use tidy evaluation ideoms with `aes()`</code></pre>
</div>
</div>
<p>Now let’s look at the metrics averaged over the 50 splits (10 repeats x 5 folds). The standard error is small and the RMSE value for this nonlinear model is better than both the spline and poly RMSE values. I attempted to have <code>myparsnip_fit_resamples()</code> return an object that I could pass directly to <code>tune::collect_metrics</code>, but I couldn’t quite get it to work. Digging into the <a href="https://github.com/tidymodels/tune/blob/main/R/collect.R">source code</a> showed that <code>tune::collect_metrics</code> calls <code>UseMethod</code>, so I tried coercing my function’s return value to the correct class so that the correct method would get dispatched, without success. Then I realized I could just call the desired method directly using the triple colon notation, even though it isn’t exported from the <code>tune</code> namespace.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb52" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb52-1">tune<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">collect_metrics.tune_results</span>(wait_i_kfold_res, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">summarize =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 2 × 6
  .metric .estimator  mean     n std_err .config             
  &lt;chr&gt;   &lt;chr&gt;      &lt;dbl&gt; &lt;int&gt;   &lt;dbl&gt; &lt;chr&gt;               
1 mae     standard   0.472    50 0.00548 Preprocessor1_Model1
2 rmse    standard   1.29     50 0.0189  Preprocessor1_Model1</code></pre>
</div>
</div>
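<p>If the triple colon trick is unfamiliar, the same idea can be seen with a base R function. In this small sketch (the data values are made up), <code>t.test()</code> is a generic whose default method lives in the <code>stats</code> namespace but is not exported. We can reach it with <code>:::</code> or look it up with <code>getS3method()</code>.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Made-up sample values for illustration
x &lt;- c(5.1, 4.9, 5.6, 5.8, 6.0)

# Normal route: the generic dispatches on the class of x
via_generic &lt;- t.test(x, mu = 5)

# Direct route: call the unexported method with the triple colon
via_triple_colon &lt;- stats:::t.test.default(x, mu = 5)

# Alternative: look the method up by generic name and class
via_lookup &lt;- getS3method("t.test", "default")(x, mu = 5)

# All three routes compute the same test statistic
all.equal(via_generic$statistic, via_triple_colon$statistic)
all.equal(via_generic$statistic, via_lookup$statistic)</code></pre></div>
</div>
<p>Calling a method directly skips S3 dispatch, so nothing checks that the object is really of the class the method expects. Unexported functions are also free to change between package versions, which makes this more of a workaround than something to rely on.</p>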
<p>Now we’ll refit one last model using the entire training set and assess its performance on the in-design test data - again using a custom function.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb54" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb54-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Do last fit</span></span>
<span id="cb54-2">wait_i_last_res <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">myparsnip_last_fit</span>(wait_i_nls_mod, wait_i_nls_mod_formula, xy_q_in_split)</span>
<span id="cb54-3"></span>
<span id="cb54-4">tune<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">collect_metrics.tune_results</span>(wait_i_last_res<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>metrics_summary, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">summarize =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 2 × 5
  id        .metric .estimator .estimate .config             
  &lt;chr&gt;     &lt;chr&gt;   &lt;chr&gt;          &lt;dbl&gt; &lt;chr&gt;               
1 Resample1 rmse    standard       1.16  Preprocessor1_Model1
2 Resample1 mae     standard       0.418 Preprocessor1_Model1</code></pre>
</div>
</div>
<p>Now we can check out the actual vs predicted plot based on the test data. This plot also shows that the queueing based model outperforms the polynomial regression models - and only has 5 parameters whereas the polynomial models have many parameters (10 base inputs plus 10 quadratic terms). As an aside, a natural cubic spline model with one knot performed very similarly to the polynomial regression model.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb56" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb56-1">wait_i_last_res<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>act_vs_pred_plot</span></code></pre></div>
<div class="cell-output-display">
<p><img src="https://bitsofanalytics.org/posts/tidymodels-mm/tidymodels_mm_files/figure-html/nls_plot-1.png" class="img-fluid" width="672"></p>
</div>
</div>
</section>
</section>
<section id="random-forest-model" class="level2">
<h2 class="anchored" data-anchor-id="random-forest-model">Random forest model</h2>
<p>Just for grins, let’s try a totally different type of model - a random forest. Again, we’ll create one model without the queueing related terms and one with. Tidymodels makes it easy to reuse code - just drop in a different model engine.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb57" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb57-1">wait_i_rf_mod <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rand_forest</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mode =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"regression"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb57-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set_engine</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">engine =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ranger"</span>)</span>
<span id="cb57-3"></span>
<span id="cb57-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create a formula object. </span></span>
<span id="cb57-5">wait_i_rf_q_formula <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> mean_wait_i <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> patients_per_clinic_block <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb57-6">                         num_med_techs <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb57-7">                         num_rooms <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb57-8">                         vitals_time_mean <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb57-9">                         exam_time_mean <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb57-10">                         exam_time_cv2 <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb57-11">                         post_exam_time_mean <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb57-12">                         off_util_staff <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> off_util_physician <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> off_util_room <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb57-13">                         mean_wait_i_dm1</span>
<span id="cb57-14"></span>
<span id="cb57-15"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create recipe</span></span>
<span id="cb57-16">wait_i_rf_q_recipe <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">recipe</span>(wait_i_rf_q_formula, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> xy_q_in_train) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb57-17">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">step_poly</span>(patients_per_clinic_block, num_med_techs, num_rooms,</span>
<span id="cb57-18">          vitals_time_mean, exam_time_mean, exam_time_cv2, post_exam_time_mean,</span>
<span id="cb57-19">          off_util_staff, off_util_physician, off_util_room, mean_wait_i_dm1,</span>
<span id="cb57-20">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">degree =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb57-21"></span>
<span id="cb57-22"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create workflow object (uses model object we created earlier)</span></span>
<span id="cb57-23">wait_i_rf_q_wflow <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> </span>
<span id="cb57-24">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">workflow</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb57-25">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_model</span>(wait_i_rf_mod) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb57-26">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_recipe</span>(wait_i_rf_q_recipe)</span>
<span id="cb57-27"></span>
<span id="cb57-28"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Fit the models</span></span>
<span id="cb57-29">keep_pred <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">control_resamples</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">save_pred =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">save_workflow =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>)</span>
<span id="cb57-30"></span>
<span id="cb57-31">wait_i_rf_q_results <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> </span>
<span id="cb57-32">  wait_i_rf_q_wflow <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb57-33">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fit_resamples</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">resamples =</span> in_train_splits, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">control =</span> keep_pred)</span>
<span id="cb57-34"></span>
<span id="cb57-35"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">collect_metrics</span>(wait_i_rf_q_results)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 2 × 6
  .metric .estimator  mean     n  std_err .config             
  &lt;chr&gt;   &lt;chr&gt;      &lt;dbl&gt; &lt;int&gt;    &lt;dbl&gt; &lt;chr&gt;               
1 rmse    standard   0.765    50 0.0221   Preprocessor1_Model1
2 rsq     standard   0.986    50 0.000678 Preprocessor1_Model1</code></pre>
</div>
<div class="sourceCode cell-code" id="cb59" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb59-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Do last fit</span></span>
<span id="cb59-2">wait_i_last_rf_q_results <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">last_fit</span>(wait_i_rf_q_wflow, xy_q_in_split)</span>
<span id="cb59-3"></span>
<span id="cb59-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">collect_metrics</span>(wait_i_last_rf_q_results)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 2 × 4
  .metric .estimator .estimate .config             
  &lt;chr&gt;   &lt;chr&gt;          &lt;dbl&gt; &lt;chr&gt;               
1 rmse    standard       0.586 Preprocessor1_Model1
2 rsq     standard       0.991 Preprocessor1_Model1</code></pre>
</div>
<div class="sourceCode cell-code" id="cb61" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb61-1">wait_i_last_rf_q_rmse <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">collect_metrics</span>(wait_i_last_rf_q_results) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb61-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(.metric <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'rmse'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb61-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pull</span>(.estimate) </span>
<span id="cb61-4"></span>
<span id="cb61-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Grab the predictions on the test data</span></span>
<span id="cb61-6">assess_wait_i_last_rf_q_results <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">collect_predictions</span>(wait_i_last_rf_q_results)</span>
<span id="cb61-7"></span>
<span id="cb61-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Plot actual vs predicted.</span></span>
<span id="cb61-9">assess_wait_i_last_rf_q_results <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb61-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> mean_wait_i, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> .pred)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb61-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">15</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb61-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_abline</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"red"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb61-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coord_obs_pred</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb61-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">xlab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Actual mean initial wait time (min)"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb61-15">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ylab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Metamodel prediction"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb61-16">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggtitle</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Random forest model - including queueing terms"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb61-17">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">annotate</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"text"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">40</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">label =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sprintf</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"RMSE = %8.3f"</span>, wait_i_last_rf_q_rmse))</span></code></pre></div>
<div class="cell-output-display">
<p><img src="https://bitsofanalytics.org/posts/tidymodels-mm/tidymodels_mm_files/figure-html/rf_q-1.png" class="img-fluid" width="672"></p>
</div>
</div>
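<p>Wow, those random forests can be tough to beat. Let’s see whether the queueing-related features helped the random forest by dropping those four terms (<code>off_util_staff</code>, <code>off_util_physician</code>, <code>off_util_room</code>, and <code>mean_wait_i_dm1</code>) from the formula.</p>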
<div class="cell">
<div class="sourceCode cell-code" id="cb62" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb62-1">wait_i_rf_mod <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rand_forest</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mode =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"regression"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb62-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set_engine</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">engine =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ranger"</span>)</span>
<span id="cb62-3"></span>
<span id="cb62-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create a formula object. </span></span>
<span id="cb62-5">wait_i_rf_noq_formula <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> mean_wait_i <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> patients_per_clinic_block <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb62-6">                         num_med_techs <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb62-7">                         num_rooms <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb62-8">                         vitals_time_mean <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb62-9">                         exam_time_mean <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb62-10">                         exam_time_cv2 <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb62-11">                         post_exam_time_mean</span>
<span id="cb62-12"></span>
<span id="cb62-13"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create recipe</span></span>
<span id="cb62-14">wait_i_rf_noq_recipe <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">recipe</span>(wait_i_rf_noq_formula, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> xy_q_in_train) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb62-15">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">step_poly</span>(patients_per_clinic_block, num_med_techs, num_rooms,</span>
<span id="cb62-16">          vitals_time_mean, exam_time_mean, exam_time_cv2, post_exam_time_mean,</span>
<span id="cb62-17">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">degree =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb62-18"></span>
<span id="cb62-19"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create workflow object (uses model object we created earlier)</span></span>
<span id="cb62-20">wait_i_rf_noq_wflow <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> </span>
<span id="cb62-21">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">workflow</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb62-22">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_model</span>(wait_i_rf_mod) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb62-23">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_recipe</span>(wait_i_rf_noq_recipe)</span>
<span id="cb62-24"></span>
<span id="cb62-25"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Fit the models</span></span>
<span id="cb62-26">keep_pred <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">control_resamples</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">save_pred =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">save_workflow =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>)</span>
<span id="cb62-27"></span>
<span id="cb62-28">wait_i_rf_noq_results <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> </span>
<span id="cb62-29">  wait_i_rf_noq_wflow <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb62-30">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fit_resamples</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">resamples =</span> in_train_splits, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">control =</span> keep_pred)</span>
<span id="cb62-31"></span>
<span id="cb62-32"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">collect_metrics</span>(wait_i_rf_noq_results)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 2 × 6
  .metric .estimator  mean     n std_err .config             
  &lt;chr&gt;   &lt;chr&gt;      &lt;dbl&gt; &lt;int&gt;   &lt;dbl&gt; &lt;chr&gt;               
1 rmse    standard   1.71     50 0.0393  Preprocessor1_Model1
2 rsq     standard   0.951    50 0.00167 Preprocessor1_Model1</code></pre>
</div>
<div class="sourceCode cell-code" id="cb64" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb64-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Do last fit</span></span>
<span id="cb64-2">wait_i_last_rf_noq_results <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">last_fit</span>(wait_i_rf_noq_wflow, xy_q_in_split)</span>
<span id="cb64-3"></span>
<span id="cb64-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">collect_metrics</span>(wait_i_last_rf_noq_results)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 2 × 4
  .metric .estimator .estimate .config             
  &lt;chr&gt;   &lt;chr&gt;          &lt;dbl&gt; &lt;chr&gt;               
1 rmse    standard       1.57  Preprocessor1_Model1
2 rsq     standard       0.963 Preprocessor1_Model1</code></pre>
</div>
<div class="sourceCode cell-code" id="cb66" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb66-1">wait_i_last_rf_noq_rmse <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">collect_metrics</span>(wait_i_last_rf_noq_results) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb66-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(.metric <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'rmse'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb66-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pull</span>(.estimate) </span>
<span id="cb66-4"></span>
<span id="cb66-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Grab the predictions on the test data</span></span>
<span id="cb66-6">assess_wait_i_last_rf_noq_results <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">collect_predictions</span>(wait_i_last_rf_noq_results)</span>
<span id="cb66-7"></span>
<span id="cb66-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Plot actual vs predicted.</span></span>
<span id="cb66-9">assess_wait_i_last_rf_noq_results <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb66-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> mean_wait_i, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> .pred)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb66-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">15</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb66-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_abline</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"red"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb66-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coord_obs_pred</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb66-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">xlab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Actual mean initial wait time (min)"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb66-15">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ylab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Metamodel prediction"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb66-16">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggtitle</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Random forest model - no queueing terms"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb66-17">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">annotate</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"text"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">40</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">label =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sprintf</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"RMSE = %8.3f"</span>, wait_i_last_rf_noq_rmse))</span></code></pre></div>
<div class="cell-output-display">
<p><img src="https://bitsofanalytics.org/posts/tidymodels-mm/tidymodels_mm_files/figure-html/rf_noq-1.png" class="img-fluid" width="672"></p>
</div>
</div>
<p>Yep, the queueing terms definitely helped the random forest, just as they did with the polynomial regression model: without them, the test RMSE rises from 0.586 to 1.57 and the cross-validated RMSE from 0.765 to 1.71. You can also see systematic underprediction for larger wait times.</p>
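<p>One way to check the underprediction numerically, rather than only by eye, is to average the residuals within ranges of the actual wait time. The following is a minimal sketch under stated assumptions, not part of the original analysis: <code>toy_preds</code> is a made-up stand-in for the output of <code>collect_predictions()</code> (only the <code>mean_wait_i</code> and <code>.pred</code> columns are used), and the bin edges are arbitrary values chosen for illustration.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(dplyr)

# Hypothetical stand-in for collect_predictions() output: actual values
# in mean_wait_i and model predictions in .pred (values are made up).
toy_preds &lt;- tibble(
  mean_wait_i = c(2, 5, 10, 20, 40, 60),
  .pred       = c(2.1, 5.2, 9.8, 19.0, 36.0, 52.0)
)

# Mean residual (predicted minus actual) within ranges of the actual wait.
# The bin edges 0, 10, 30 are illustrative assumptions.
resid_by_bin &lt;- toy_preds %&gt;% 
  mutate(resid = .pred - mean_wait_i,
         wait_bin = cut(mean_wait_i, breaks = c(0, 10, 30, Inf))) %&gt;% 
  group_by(wait_bin) %&gt;% 
  summarise(mean_resid = mean(resid), n = n())

resid_by_bin</code></pre></div>
</div>
<p>Consistently negative mean residuals in the upper bins would point to systematic underprediction. To apply this to the real results, <code>assess_wait_i_last_rf_noq_results</code> could presumably be used in place of <code>toy_preds</code>, with bin edges chosen to suit the observed range of wait times.</p>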
</section>
<section id="out-of-design-analysis" class="level2">
<h2 class="anchored" data-anchor-id="out-of-design-analysis">Out of design analysis</h2>
<p>Now let’s compare the models on their ability to extrapolate beyond the experimental design points. All of the models contain the queueing-inspired features. We’ll start with the polynomial model.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb67" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb67-1">wait_i_poly_out_q_wflow <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> </span>
<span id="cb67-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">workflow</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb67-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_model</span>(wait_i_poly_mod) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb67-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_recipe</span>(wait_i_poly_q_recipe)</span>
<span id="cb67-5">  </span>
<span id="cb67-6">wait_i_poly_out_fit_q <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fit</span>(wait_i_poly_out_q_wflow, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> xy_q_in)</span>
<span id="cb67-7">wait_i_poly_out_pred_q <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">predict</span>(wait_i_poly_out_fit_q, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">new_data =</span> xy_q_out)</span>
<span id="cb67-8"></span>
<span id="cb67-9"></span>
<span id="cb67-10"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Plot actual vs predicted.</span></span>
<span id="cb67-11">xy_q_out <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb67-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(mean_wait_i) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb67-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">bind_cols</span>(wait_i_poly_out_pred_q) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span>  </span>
<span id="cb67-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> mean_wait_i, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> .pred)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb67-15">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">15</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb67-16">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_abline</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"red"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb67-17">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coord_obs_pred</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb67-18">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">xlab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Actual mean initial wait time (min)"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb67-19">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ylab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Metamodel prediction"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb67-20">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggtitle</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Polynomial model - out of design extrapolation"</span>)</span></code></pre></div>
<div class="cell-output-display">
<p><img src="https://bitsofanalytics.org/posts/tidymodels-mm/tidymodels_mm_files/figure-html/metrics_poly_final_out-1.png" class="img-fluid" width="672"></p>
</div>
<div class="sourceCode cell-code" id="cb68" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb68-1">wait_i_poly_out_rmse_q <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rmse_vec</span>(xy_q_out<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>mean_wait_i, wait_i_poly_out_pred_q<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>.pred)</span>
<span id="cb68-2"></span>
<span id="cb68-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sprintf</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"RMSE for poly model on out of design points: %10.3f"</span>, wait_i_poly_out_rmse_q)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] "RMSE for poly model on out of design points:      3.408"</code></pre>
</div>
</div>
<p>Now for the random forest.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb70" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb70-1">wait_i_rf_out_wflow_q <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> </span>
<span id="cb70-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">workflow</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb70-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_model</span>(wait_i_rf_mod) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb70-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_recipe</span>(wait_i_rf_q_recipe)</span>
<span id="cb70-5"></span>
<span id="cb70-6">wait_i_rf_out_wflow_noq <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> </span>
<span id="cb70-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">workflow</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb70-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_model</span>(wait_i_rf_mod) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb70-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_recipe</span>(wait_i_rf_noq_recipe)</span>
<span id="cb70-10"></span>
<span id="cb70-11">wait_i_rf_out_fit_q <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fit</span>(wait_i_rf_out_wflow_q, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> xy_q_in)</span>
<span id="cb70-12">wait_i_rf_out_pred_q <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">predict</span>(wait_i_rf_out_fit_q, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">new_data =</span> xy_q_out)</span>
<span id="cb70-13"></span>
<span id="cb70-14">wait_i_rf_out_fit_noq <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fit</span>(wait_i_rf_out_wflow_noq, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> xy_q_in)</span>
<span id="cb70-15">wait_i_rf_out_pred_noq <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">predict</span>(wait_i_rf_out_fit_noq, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">new_data =</span> xy_q_out)</span>
<span id="cb70-16"></span>
<span id="cb70-17"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Plot actual vs predicted.</span></span>
<span id="cb70-18">xy_q_out <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb70-19">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(mean_wait_i) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb70-20">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">bind_cols</span>(wait_i_rf_out_pred_q) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span>  </span>
<span id="cb70-21">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> mean_wait_i, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> .pred)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb70-22">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">15</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb70-23">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_abline</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"red"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb70-24">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coord_obs_pred</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb70-25">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">xlab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Actual mean initial wait time (min)"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb70-26">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ylab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Metamodel prediction"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb70-27">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggtitle</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Random forest model - out of design extrapolation"</span>)</span></code></pre></div>
<div class="cell-output-display">
<p><img src="https://bitsofanalytics.org/posts/tidymodels-mm/tidymodels_mm_files/figure-html/metrics_rf_final_out-1.png" class="img-fluid" width="672"></p>
</div>
<div class="sourceCode cell-code" id="cb71" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb71-1">wait_i_rf_out_rmse_q <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rmse_vec</span>(xy_q_out<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>mean_wait_i, wait_i_rf_out_pred_q<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>.pred)</span>
<span id="cb71-2">wait_i_rf_out_rmse_noq <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rmse_vec</span>(xy_q_out<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>mean_wait_i, wait_i_rf_out_pred_noq<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>.pred)</span>
<span id="cb71-3"></span>
<span id="cb71-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sprintf</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"RMSE for rf model on out of design points with queueing: %10.3f"</span>, wait_i_rf_out_rmse_q)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] "RMSE for rf model on out of design points with queueing:      2.195"</code></pre>
</div>
<div class="sourceCode cell-code" id="cb73" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb73-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sprintf</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"RMSE for rf model on out of design points, no queueing: %10.3f"</span>, wait_i_rf_out_rmse_noq)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] "RMSE for rf model on out of design points, no queueing:      3.889"</code></pre>
</div>
</div>
<p>Finally, here’s the nonlinear model.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb75" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb75-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Nonlinear models need starting guesses for the params</span></span>
<span id="cb75-2">init_nls_wait_i <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">b1=</span>.<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">b2=</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.6</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">b3=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">b4=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">b5=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb75-3"></span>
<span id="cb75-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Fit the model</span></span>
<span id="cb75-5">wait_i_nls_out_fit <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">nls</span>(mean_wait_i <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> b1 <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> (num_med_techs <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span> b2) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> </span>
<span id="cb75-6">  (mean_wait_i_dm1 <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span> b3) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>(num_rooms <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span> b4) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> (staff_eff_svc_time_cv2 <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span> b5),</span>
<span id="cb75-7">                 <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data=</span>xy_q_in, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">start=</span>init_nls_wait_i)</span>
<span id="cb75-8"></span>
<span id="cb75-9">wait_i_nls_out_pred <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">predict</span>(wait_i_nls_out_fit, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">newdata =</span> xy_q_out)</span>
<span id="cb75-10"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#wait_i_nls_out_pred &lt;- enframe(wait_i_nls_out_pred, name=NULL, value="wait_i_nls_out_pred")</span></span></code></pre></div>
</div>
<div class="cell">
<div class="sourceCode cell-code" id="cb76" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb76-1">wait_i_nls_out_rmse <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rmse_vec</span>(xy_q_out<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>mean_wait_i, wait_i_nls_out_pred)</span>
<span id="cb76-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sprintf</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"RMSE for nls model on out of design points: %10.3f"</span>, wait_i_nls_out_rmse)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] "RMSE for nls model on out of design points:      3.033"</code></pre>
</div>
<div class="sourceCode cell-code" id="cb78" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb78-1">nls_plot_data <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> xy_q_out <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb78-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(mean_wait_i) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb78-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">bind_cols</span>(wait_i_nls_out_pred) </span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>New names:
• `` -&gt; `...2`</code></pre>
</div>
<div class="sourceCode cell-code" id="cb80" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb80-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data=</span>nls_plot_data, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> mean_wait_i, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> wait_i_nls_out_pred)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb80-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">15</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb80-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_abline</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"red"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb80-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coord_obs_pred</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb80-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">xlab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Actual mean initial wait time (min)"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb80-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ylab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Metamodel prediction"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb80-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggtitle</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Queueing based model - out of design extrapolation"</span>)</span></code></pre></div>
<div class="cell-output-display">
<p><img src="https://bitsofanalytics.org/posts/tidymodels-mm/tidymodels_mm_files/figure-html/metrics_nls_final_out-1.png" class="img-fluid" width="672"></p>
</div>
</div>
<p>Like I said earlier, those darn random forests are tough to beat. :)</p>
</section>
<section id="key-takeaways" class="level2">
<h2 class="anchored" data-anchor-id="key-takeaways">Key takeaways</h2>
<p>The bottom line is that:</p>
<ul>
<li>we used a queueing inspired nonlinear model with only five parameters that outperformed polynomial and spline models with many more parameters on both in design and out of design predictions (interpolation and extrapolation). The random forest model outperformed the other three techniques on the in design dataset but exhibited some systematic bias on the out of design dataset. Of course, the random forest model is a black box in comparison to the regression models,</li>
<li>the nonlinear queueing based model made sense in the context of the underlying process physics of the simulation model - each term was interpretable and had a basis in theory,</li>
<li>there is no simple cookbook procedure for exploiting domain knowledge (queueing knowledge in this case). Each situation is different and the details of the feature engineering will depend on the problem at hand. It’s unrealistic to expect more,</li>
<li>offered load and resource utilization based features are often simple to develop and can be quite predictive.</li>
</ul>
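<p>To make that last point concrete, here’s a minimal sketch of the two features, written in Python for brevity even though this post uses R. It assumes the textbook definitions (offered load is arrival rate times mean service time, and utilization is offered load divided by the number of servers). The function names and the numbers are made up for illustration and are not the actual columns from the experiment.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python">def offered_load(arrival_rate, mean_service_time):
    """Offered load: average amount of work arriving per unit time."""
    return arrival_rate * mean_service_time


def utilization(arrival_rate, mean_service_time, num_servers):
    """Fraction of total server capacity consumed by the offered load."""
    return offered_load(arrival_rate, mean_service_time) / num_servers


# Hypothetical numbers: 6 patients per hour, 0.5 hours of room time each, 4 rooms
load = offered_load(6.0, 0.5)
rho = utilization(6.0, 0.5, 4)
print(load, rho)</code></pre></div>
<p>Both values only need inputs that are known before the simulation is run, which is what makes them usable as metamodel features.</p>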


</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{isken2023,
  author = {Mark Isken},
  title = {Using Tidymodels for Simulation Metamodeling},
  date = {2023-02-12},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-isken2023" class="csl-entry quarto-appendix-citeas">
Mark Isken. 2023. <span>“Using Tidymodels for Simulation
Metamodeling.”</span> February 12, 2023.
</div></div></section></div> ]]></description>
  <category>R</category>
  <category>simulation</category>
  <category>metamodeling</category>
  <guid>https://bitsofanalytics.org/posts/tidymodels-mm/tidymodels_mm.html</guid>
  <pubDate>Sun, 12 Feb 2023 05:00:00 GMT</pubDate>
</item>
<item>
  <title>First impressions of blogging with Quarto</title>
  <dc:creator>Mark Isken</dc:creator>
  <link>https://bitsofanalytics.org/posts/port-to-quarto/</link>
  <description><![CDATA[ 



<p>As I mentioned in a <a href="https://bitsofanalytics.org/posts/welcome/">recent short post</a>, I’ve moved my blog to Quarto. I’ve been using various static blogging tools since 2014 when I moved my blog from <a href="https://bitsofanalytics.org/posts/drupal-to-pelican/drupal-to-pelican.html">Drupal to Pelican</a>. Pelican served me well for many years and allowed me to write posts in either reStructuredText or markdown, and I could even create posts from Jupyter notebooks or R Markdown documents. Yes, there was quite a bit of extension installing and fiddling with <code>conf.py</code>, but all in all, it worked really well.</p>
<p>Based on my experience with creating my <a href="https://bitsofanalytics.org/posts/sphinx-coursewebs/sphinx_coursewebs.html">teaching websites with Sphinx</a>, I decided to give it a try for blogging when I happened to run across the <a href="https://bitsofanalytics.org/posts/ablog-first-post/ablog_first_post.html">ablog Sphinx extension</a>. There were a lot of exciting things happening in the Python publishing world such as <a href="https://jupyterbook.org/en/stable/intro.html">JupyterBook</a>, the <a href="https://myst-parser.readthedocs.io/en/latest/">MyST parser</a>, and <a href="https://myst-nb.readthedocs.io/">MyST-NB</a>. I was already a huge fan of Sphinx. Again, I was able to author posts in either reST or markdown and could use Jupyter and RStudio. Some fiddling was involved to make all these things play nicely together, but it all worked. I also moved my blog off of a commercial hosting site and used GitHub Pages. The workflow of pushing a commit to a repo resulting in a website refresh was a nice benefit.</p>
<p>So, why switch to <a href="https://quarto.org/">Quarto</a>?</p>
<section id="the-community" class="level2">
<h2 class="anchored" data-anchor-id="the-community">The community</h2>
<p>While the Sphinx community is terrific, not a lot of people seem to be blogging with it. The ablog extension changed hands and there’s not much active development aimed at new features for general users. As a long time member of both the Python and R communities, I really wanted a similar experience specific to blogging. When Quarto made it clear that it wasn’t just about R, I started paying attention. Then, a few Quarto based blogs caught my eye, such as <a href="https://blog.djnavarro.net/">Notes from a data witch</a> (easily the best data science blog title I’ve ever run across), <a href="https://www.javierorracadeatcu.com/blog.html">Data Science Bytes - Blog</a> and <a href="https://www.crosstab.io/articles/">Crosstab Data Science</a>. They looked great. The authors seemed happy with the authoring tools and experience. I got the feeling that this thing was really going to take off. Then I found these incredibly useful blog posts by new Quarto users and I was sold.</p>
<ul>
<li><a href="https://beamilz.com/posts/2022-06-05-creating-a-blog-with-quarto/en/">Creating a blog with Quarto in 10 steps</a></li>
<li><a href="https://blog.djnavarro.net/posts/2022-04-20_porting-to-quarto/">Porting a distill blog to quarto</a></li>
<li><a href="https://albert-rapp.de/posts/13_quarto_blog_writing_guide/13_quarto_blog_writing_guide.html">The ultimate guide to starting a Quarto blog</a></li>
</ul>
</section>
<section id="multi-language-support-that-just-works" class="level2">
<h2 class="anchored" data-anchor-id="multi-language-support-that-just-works">Multi-language support that just works</h2>
<p>I want to be able to blog from within RStudio or JupyterLab, or by writing markdown in a text editor. I also want a command line interface available when I need it. Quarto supports all of this. You can even mix R and Python (and more) within the same Quarto markdown document.</p>
</section>
<section id="great-documentation" class="level2">
<h2 class="anchored" data-anchor-id="great-documentation">Great documentation</h2>
<p><a href="https://quarto.org/docs/guide/">Check</a></p>
</section>
<section id="not-too-painful-to-port-sphinx-blog-to-quarto-blog" class="level2">
<h2 class="anchored" data-anchor-id="not-too-painful-to-port-sphinx-blog-to-quarto-blog">Not too painful to port Sphinx blog to Quarto blog</h2>
<p>My Sphinx based blog was made up of a mix of <a href="https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html">reStructuredText</a>, markdown and Jupyter notebook documents. I did end up touching every post (~60 posts) as this gave me a chance to fix up some tags and add some update notices at the top for some of my more dated posts. The strategy for each document type was as follows.</p>
<p>I used a Quarto folder structure in which each post lived in its own subfolder inside of a folder named <code>posts</code>. Inside of each individual post folder there were folders for <code>images</code> and <code>data</code> as needed.</p>
<section id="markdown-files" class="level3">
<h3 class="anchored" data-anchor-id="markdown-files">Markdown files</h3>
<p>I replaced the existing YAML header with a Quarto specific version and renamed the file to have a <code>.qmd</code> extension. I made sure any necessary image and data files got put into the appropriate subfolders. All of my previous posts had used the same <code>images</code> and <code>data</code> subfolders so no path changes were needed. That’s it.</p>
</section>
<section id="restructuredtext-files" class="level3">
<h3 class="anchored" data-anchor-id="restructuredtext-files">reStructuredText files</h3>
<p>For these, I first converted them to markdown format using <a href="https://pandoc.org/">pandoc</a>.</p>
<div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb1-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">pandoc</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-f</span> rst <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-t</span> markdown some-post.rst <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-o</span> some-post.qmd</span></code></pre></div>
<p>Then I did the same things I did with markdown files.</p>
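<p>If there are many reST posts, the conversion can be scripted. Here’s a minimal Python sketch that builds the same pandoc call as above for every <code>.rst</code> file in a folder. The folder name <code>old_posts</code> and the helper names are made up for illustration, and it assumes <code>pandoc</code> is on your path.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python">from pathlib import Path
import subprocess


def pandoc_command(rst_path):
    """Build the pandoc call that converts one reST file to a .qmd file."""
    rst_path = Path(rst_path)
    qmd_path = rst_path.with_suffix(".qmd")
    return ["pandoc", "-f", "rst", "-t", "markdown", str(rst_path), "-o", str(qmd_path)]


def convert_folder(folder):
    """Run pandoc on every .rst file found directly inside folder."""
    for rst_path in sorted(Path(folder).glob("*.rst")):
        # check=True stops the loop if pandoc reports an error
        subprocess.run(pandoc_command(rst_path), check=True)


# Show the command for a single file without running pandoc
print(" ".join(pandoc_command("some-post.rst")))

# To convert a whole (hypothetical) folder of old posts:
# convert_folder("old_posts")</code></pre></div>
<p>The converted files still need the Quarto YAML header and the move into their own post folders, just like the markdown files.</p>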
</section>
<section id="jupyter-notebooks" class="level3">
<h3 class="anchored" data-anchor-id="jupyter-notebooks">Jupyter notebooks</h3>
<p>These already had a <em>raw</em> cell at the top of the notebook containing a YAML header. I just replaced the existing header with the Quarto version. I reran the entire notebook and then just plopped it into its new home in a subfolder of <code>posts</code>. The images get encoded and embedded in the notebook itself.</p>
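<p>Since an <code>.ipynb</code> file is just JSON, the header swap could also be scripted. The following is a minimal sketch, assuming the YAML header lives in the first raw cell as described above. The function name <code>replace_front_matter</code> and the header values are placeholders, and the small dictionary stands in for a notebook loaded with <code>json.load()</code>.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python">import json


def replace_front_matter(notebook, new_header):
    """Replace the source of the first raw cell with new_header."""
    for cell in notebook["cells"]:
        if cell["cell_type"] == "raw":
            # Notebook JSON stores cell source as a list of lines
            cell["source"] = new_header.splitlines(keepends=True)
            return notebook
    raise ValueError("notebook has no raw cell to hold the header")


# Hypothetical Quarto header; the values are placeholders
quarto_header = """---
title: "Some post"
date: "2023-02-03"
categories: [python]
---
"""

# Tiny stand-in for a notebook loaded with json.load()
notebook = {
    "cells": [
        {"cell_type": "raw", "metadata": {}, "source": ["---\n", "old: header\n", "---\n"]},
        {"cell_type": "code", "metadata": {}, "source": ["print(1)\n"]},
    ]
}
notebook = replace_front_matter(notebook, quarto_header)
print(json.dumps(notebook["cells"][0]["source"]))</code></pre></div>
<p>The notebook would still need to be written back out with <code>json.dump()</code> and rerun before being moved into its post folder.</p>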
</section>
</section>
<section id="ongoing-authoring" class="level2">
<h2 class="anchored" data-anchor-id="ongoing-authoring">Ongoing authoring</h2>
<p>I’ve authored a number of new posts using both RStudio and Jupyter notebooks. It’s extremely simple to just click “Render” from within RStudio and the site gets updated locally. Then using git to stage, commit, and push the changes to GitHub is all that’s needed to publish the updates to my Netlify hosted site. Setting up Netlify was super easy - I just followed Step 6 in <a href="https://beamilz.com/posts/2022-06-05-creating-a-blog-with-quarto/en/">Creating a blog with Quarto in 10 steps</a> and the Netlify docs.</p>
<p>The workflow for authoring and publishing is so simple that I find it motivates me to write more often.</p>
<p>As I get more time, I’ll dig back into some of the helpful blog posts mentioned above and do some more site customization. But for now, I’ve got it looking ok and I’m just focusing on creating and posting content.</p>


</section>

 ]]></description>
  <category>quarto</category>
  <category>blogging</category>
  <category>R</category>
  <category>python</category>
  <guid>https://bitsofanalytics.org/posts/port-to-quarto/</guid>
  <pubDate>Fri, 03 Feb 2023 05:00:00 GMT</pubDate>
</item>
<item>
  <title>Algal bloom detection extended tutorial - Part 3: Finding images of interest</title>
  <dc:creator>Mark Isken</dc:creator>
  <link>https://bitsofanalytics.org/posts/algaebloom-part3/</link>
  <description><![CDATA[ 



<p>This is part of the <a href="https://bitsofanalytics.org/posts/geonewb-intro/">geonewb</a> series of posts.</p>
<p>In <a href="https://bitsofanalytics.org/posts/algaebloom-part2/">Part 2</a> of this series we got familiar with Microsoft’s Planetary Computer and learned a bit about Sentinel-2 image files. Now we’ll move on in the tutorial to exploring:</p>
<ul>
<li><strong>programmatically finding and acquiring satellite image data based on location and date range</strong> (both Sentinel-2 and Landsat) from <strong>Microsoft’s Planetary Computer</strong>,</li>
<li>using GeoPandas for working with multiple image items</li>
</ul>
<p>As mentioned in Part 1, I’m following along and taking some deeper dives and various detours from the official <a href="https://drivendata.co/blog/tick-tick-bloom-benchmark">Getting Started Tutorial</a>.</p>
<p>In subsequent parts we’ll tackle the <strong>feature engineering</strong> and <strong>predictive modeling</strong> sections of the original tutorial.</p>
<div class="cell" data-execution_count="1">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Need to do some date math and need to work with file paths</span></span>
<span id="cb1-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> datetime <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> timedelta, datetime</span>
<span id="cb1-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> pathlib <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Path</span></code></pre></div>
</div>
<div class="cell" data-execution_count="2">
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> matplotlib.pyplot <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> plt</span>
<span id="cb2-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> np</span>
<span id="cb2-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pandas <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> pd</span></code></pre></div>
</div>
<div class="cell" data-execution_count="3">
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> IPython.display <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Image</span>
<span id="cb3-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> PIL <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Image <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> PILImage</span></code></pre></div>
</div>
<div class="cell" data-execution_count="4">
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>matplotlib inline</span></code></pre></div>
</div>
<section id="finding-and-acquiring-satellite-imagery-data" class="level1">
<h1>Finding and acquiring satellite imagery data</h1>
<p>Unlike many challenges at <a href="https://www.drivendata.org/">DrivenData</a>, the feature data for this challenge is not directly provided. We need to get it using various APIs from specific sources. The date and location for each <code>uid</code> in the metadata can be used to find relevant satellite images from a number of different places. There are four approved data sources and all the details are described on the project home page at the following links:</p>
<ul>
<li><a href="https://www.drivendata.org/competitions/143/tick-tick-bloom/page/650/#sentinel-2-1">Sentinel-2 satellite imagery</a></li>
<li><a href="https://www.drivendata.org/competitions/143/tick-tick-bloom/page/650/#landsat-1">Landsat satellite imagery</a></li>
<li><a href="https://www.drivendata.org/competitions/143/tick-tick-bloom/page/650/#climate-data-1">NOAA’s High-Resolution Rapid Refresh (HRRR) climate data</a></li>
<li><a href="https://www.drivendata.org/competitions/143/tick-tick-bloom/page/650/#elevation-data-1">Copernicus DEM elevation data</a></li>
</ul>
<p>For now we will just focus on finding relevant Sentinel-2 and Landsat data. This <a href="https://www.drivendata.org/competitions/143/tick-tick-bloom/page/650/#satellite-imagery">page in the challenge site</a> has additional information and resources related to retrieving satellite imagery data. That page gives a good high-level overview of the different <em>levels</em> of satellite imagery data, top of atmosphere reflectance vs bottom of atmosphere reflectance, atmospheric corrections, spectral bands and algorithmic bands, as well as the relevant links for accessing data from the MPC.</p>
<p>From the main tutorial:</p>
<blockquote class="blockquote">
<p>The general steps we’ll use to pull satellite data are:</p>
<ol type="1">
<li><p>Establish a connection to the Planetary Computer’s STAC API using the planetary_computer and pystac_client Python packages.</p></li>
<li><p>Query the STAC API for scenes that capture our in situ labels. For each sample, we’ll search for imagery that includes the sample’s location (latitude and longitude) around the date the sample was taken. In this benchmark, we’ll use only Sentinel-2 L2A and Landsat Level-2 data.</p></li>
<li><p>Select one image for each sample. We’ll use Sentinel-2 data wherever it is available, because it is higher resolution. We’ll have to use Landsat for data before roughly 2016, because Sentinel-2 was not available yet.</p></li>
<li><p>Convert the image to a 1-dimensional list of features that can be input into our tree model</p></li>
</ol>
</blockquote>
</section>
<section id="code-driven-search-for-images-with-the-stac-api" class="level1">
<h1>Code driven search for images with the STAC API</h1>
<p>In Part 2, we manually used the MPC Explore feature to find an image of interest. Now, we’ll use a lat-long pair along with a date range to find all the images available for that location at that time. For the lat-long value of interest I picked out a point in the little body of water outlined below. The underlying image is from <a href="https://portal.opentopography.org/datasets">Open Topography</a>.</p>
<div class="cell" data-execution_count="5">
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1">Image(url<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'images/thunder_bay_lake_point_labelled.png'</span>)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="5">
<img src="https://bitsofanalytics.org/posts/algaebloom-part3/images/thunder_bay_lake_point_labelled.png">
</div>
</div>
<div class="cell" data-execution_count="6">
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1">lat <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">45.03967636446461</span></span>
<span id="cb6-2"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">long</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">83.30284787280465</span></span></code></pre></div>
</div>
<div class="cell" data-tags="[]" data-execution_count="7">
<div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Establish a connection to the STAC API</span></span>
<span id="cb7-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> planetary_computer <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> pc</span>
<span id="cb7-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> pystac_client <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Client</span>
<span id="cb7-4"></span>
<span id="cb7-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Useful libs for working with this data</span></span>
<span id="cb7-6"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> geopy.distance <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> distance</span>
<span id="cb7-7"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> geopandas <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> gpd</span>
<span id="cb7-8"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> shapely</span>
<span id="cb7-9"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rioxarray</span></code></pre></div>
</div>
<p>Let’s create a few functions to make it easy to define a bounding box around a lat, long pair and a date range leading up to a sample date. This code is right from the tutorial, which has this to say about choosing the time range:</p>
<blockquote class="blockquote">
<p>Time range: We want our feature data to be as close to the time of the sample as possible, because in algal blooms in small water bodies form and move very rapidly. Remember, you cannot use any data collected after the date of the sample.</p>
</blockquote>
<blockquote class="blockquote">
<p>Imagery taken with roughly 10 days of the sample will generally still be an accurate representation of environmental conditions at the time of the sample. For some data points you may not be able to get data within 10 days, and may have to use earlier data. We’ll search the fifteen days up to the sample time, including the sample date.</p>
</blockquote>
<div class="cell" data-execution_count="8">
<div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># get our bounding box to search latitude and longitude coordinates</span></span>
<span id="cb8-2"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> get_bounding_box(latitude, longitude, meter_buffer<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span>):</span>
<span id="cb8-3">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">"""</span></span>
<span id="cb8-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    Given a latitude, longitude, and buffer in meters, returns a bounding</span></span>
<span id="cb8-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    box around the point with the buffer on the left, right, top, and bottom.</span></span>
<span id="cb8-6"></span>
<span id="cb8-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    Returns a list of [minx, miny, maxx, maxy]</span></span>
<span id="cb8-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    """</span></span>
<span id="cb8-9">    distance_search <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> distance.distance(meters<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>meter_buffer)</span>
<span id="cb8-10"></span>
<span id="cb8-11">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># calculate the lat/long bounds based on ground distance</span></span>
<span id="cb8-12">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># bearings are cardinal directions to move (south, west, north, and east)</span></span>
<span id="cb8-13">    min_lat <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> distance_search.destination((latitude, longitude), bearing<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">180</span>)[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb8-14">    min_long <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> distance_search.destination((latitude, longitude), bearing<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">270</span>)[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]</span>
<span id="cb8-15">    max_lat <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> distance_search.destination((latitude, longitude), bearing<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb8-16">    max_long <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> distance_search.destination((latitude, longitude), bearing<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">90</span>)[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]</span>
<span id="cb8-17"></span>
<span id="cb8-18">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> [min_long, min_lat, max_long, max_lat]</span></code></pre></div>
</div>
<div class="cell" data-execution_count="9">
<div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># get our date range to search, and format correctly for query</span></span>
<span id="cb9-2"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> get_date_range(date, time_buffer_days<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>):</span>
<span id="cb9-3">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">"""Get a date range to search for in the planetary computer based</span></span>
<span id="cb9-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    on a sample's date. The time range will include the sample date</span></span>
<span id="cb9-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    and time_buffer_days days prior</span></span>
<span id="cb9-6"></span>
<span id="cb9-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    Returns a string"""</span></span>
<span id="cb9-8">    datetime_format <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"%Y-%m-</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%d</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">T"</span></span>
<span id="cb9-9">    range_start <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.to_datetime(date) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> timedelta(days<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>time_buffer_days)</span>
<span id="cb9-10">    date_range <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>range_start<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>strftime(datetime_format)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">/</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>pd<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>to_datetime(date)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>strftime(datetime_format)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span></span>
<span id="cb9-11"></span>
<span id="cb9-12">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> date_range</span></code></pre></div>
</div>
<div class="cell" data-execution_count="10">
<div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1">target_date <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"2022-06-15"</span></span></code></pre></div>
</div>
<div class="cell" data-execution_count="11">
<div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1">target_date_range <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> get_date_range(target_date)</span>
<span id="cb11-2">target_date_range</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="11">
<pre><code>'2022-06-05T/2022-06-15T'</code></pre>
</div>
</div>
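<p>To see the shape of the date range string without pandas, here is a minimal standard-library sketch. The function name <code>date_range_string</code> is hypothetical, and this variant drops the trailing <code>T</code> from each date, on the assumption that a plain <code>start/end</code> interval of ISO dates is also acceptable to the search.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python">from datetime import date, timedelta


def date_range_string(sample_date, time_buffer_days=10):
    """Return 'start/end' covering the sample date and the days before it."""
    end = date.fromisoformat(sample_date)
    start = end - timedelta(days=time_buffer_days)
    return f"{start.isoformat()}/{end.isoformat()}"


print(date_range_string("2022-06-15"))  # 2022-06-05/2022-06-15</code></pre></div>
<p>Passing <code>time_buffer_days=15</code> would give the fifteen day window mentioned in the quote above.</p>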
<p>This next step essentially “signs in” to the MPC catalog of data so that we can search and acquire the data we are interested in.</p>
<div class="cell" data-execution_count="12">
<div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb13-1">catalog <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Client.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">open</span>(</span>
<span id="cb13-2">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"https://planetarycomputer.microsoft.com/api/stac/v1"</span>, modifier<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>pc.sign_inplace</span>
<span id="cb13-3">)</span>
<span id="cb13-4"></span>
<span id="cb13-5">catalog</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="12">
&lt;Client id=microsoft-pc&gt;
</div>
</div>
<p>To search the catalog we will supply three different types of criteria:</p>
<ul>
<li>which collections to search (e.g.&nbsp;“sentinel-2-l2a”)</li>
<li>a bounding box of coordinates</li>
<li>a date range</li>
</ul>
<p>Any item in the specified collection(s) that intersects the bounding box and was acquired within the date range will be returned.</p>
<div class="cell" data-tags="[]" data-execution_count="13">
<div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb14-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#help(Client.search)</span></span></code></pre></div>
</div>
<div class="cell" data-execution_count="14">
<div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb15-1">bbox <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> get_bounding_box(lat, <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">long</span>, meter_buffer<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3000</span>)</span>
<span id="cb15-2">bbox</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="14">
<pre><code>[-83.34092260849519, 45.01268150971338, -83.2647731371141, 45.06667109107233]</code></pre>
</div>
</div>
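<p>To make the <code>[min_long, min_lat, max_long, max_lat]</code> ordering concrete, here is a rough, standard-library-only sketch of the same idea. It is not the tutorial’s method: <code>approx_bounding_box</code> and the 111,320 meters-per-degree constant are assumptions for illustration, and a simple spherical approximation stands in for the geodesic calculation done by <code>geopy</code>.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python">import math

# Rough length of one degree of latitude in meters (spherical-Earth assumption)
METERS_PER_DEGREE_LAT = 111_320


def approx_bounding_box(latitude, longitude, meter_buffer=1000):
    """Approximate [min_long, min_lat, max_long, max_lat] around a point."""
    dlat = meter_buffer / METERS_PER_DEGREE_LAT
    # A degree of longitude shrinks with the cosine of the latitude
    dlong = meter_buffer / (METERS_PER_DEGREE_LAT * math.cos(math.radians(latitude)))
    return [longitude - dlong, latitude - dlat, longitude + dlong, latitude + dlat]


approx_bbox = approx_bounding_box(45.03967636446461, -83.30284787280465, meter_buffer=3000)
print([round(v, 3) for v in approx_bbox])</code></pre></div>
<p>For a 3000 meter buffer around our point, the approximate values should agree with the <code>geopy</code> result above to within about a thousandth of a degree, which is a handy sanity check on the coordinate order.</p>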
<div class="cell" data-execution_count="15">
<div class="sourceCode cell-code" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb17-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># search the planetary computer sentinel-l2a and Landsat level 2</span></span>
<span id="cb17-2">search <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> catalog.search(</span>
<span id="cb17-3">    collections<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sentinel-2-l2a"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"landsat-c2-l2"</span>], bbox<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>bbox, datetime<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>target_date_range</span>
<span id="cb17-4">)</span>
<span id="cb17-5"></span>
<span id="cb17-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># see how many items were returned</span></span>
<span id="cb17-7">items <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> search.get_all_items()</span>
<span id="cb17-8"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(items)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> items found'</span>)</span>
<span id="cb17-9"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'items is a </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">type</span>(items)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb17-10"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'items[0] is a </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">type</span>(items[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>])<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>17 items found
items is a &lt;class 'pystac.item_collection.ItemCollection'&gt;
items[0] is a &lt;class 'pystac.item.Item'&gt;</code></pre>
</div>
</div>
<p>Great, it worked. By looking at the <code>id</code> values, we can see how many Sentinel-2 vs Landsat images we found.</p>
<div class="cell" data-execution_count="16">
<div class="sourceCode cell-code" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb19-1"><span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> item <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> items:</span>
<span id="cb19-2">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(item.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">id</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>S2B_MSIL2A_20220613T161829_R040_T17TLL_20220614T113426
S2B_MSIL2A_20220613T161829_R040_T17TLK_20220614T110341
S2B_MSIL2A_20220613T161829_R040_T16TGR_20220614T112541
S2B_MSIL2A_20220613T161829_R040_T16TGQ_20220614T112402
LC09_L2SP_020029_20220612_02_T1
S2A_MSIL2A_20220611T162911_R083_T17TLL_20220612T151658
S2A_MSIL2A_20220611T162911_R083_T16TGR_20220612T145248
LC08_L2SP_021029_20220611_02_T1
LC08_L2SP_021028_20220611_02_T1
S2A_MSIL2A_20220608T161841_R040_T17TLL_20220609T103649
S2A_MSIL2A_20220608T161841_R040_T17TLK_20220609T095528
S2A_MSIL2A_20220608T161841_R040_T16TGR_20220609T105106
S2A_MSIL2A_20220608T161841_R040_T16TGQ_20220609T103242
S2B_MSIL2A_20220606T162839_R083_T17TLL_20220607T020323
S2B_MSIL2A_20220606T162839_R083_T17TLK_20220607T023733
S2B_MSIL2A_20220606T162839_R083_T16TGR_20220607T015546
S2B_MSIL2A_20220606T162839_R083_T16TGQ_20220607T015218</code></pre>
</div>
</div>
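<p>Rather than counting by eye, we can tally the ids by prefix. This is a small sketch that hard-codes the ids printed above so it runs on its own; the helper <code>sensor_family</code> is hypothetical and the prefix rule (<code>S2</code> for Sentinel-2, <code>LC</code> for Landsat) is an assumption based only on the ids in this result.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python">from collections import Counter

# The ids printed by the search above
item_ids = [
    "S2B_MSIL2A_20220613T161829_R040_T17TLL_20220614T113426",
    "S2B_MSIL2A_20220613T161829_R040_T17TLK_20220614T110341",
    "S2B_MSIL2A_20220613T161829_R040_T16TGR_20220614T112541",
    "S2B_MSIL2A_20220613T161829_R040_T16TGQ_20220614T112402",
    "LC09_L2SP_020029_20220612_02_T1",
    "S2A_MSIL2A_20220611T162911_R083_T17TLL_20220612T151658",
    "S2A_MSIL2A_20220611T162911_R083_T16TGR_20220612T145248",
    "LC08_L2SP_021029_20220611_02_T1",
    "LC08_L2SP_021028_20220611_02_T1",
    "S2A_MSIL2A_20220608T161841_R040_T17TLL_20220609T103649",
    "S2A_MSIL2A_20220608T161841_R040_T17TLK_20220609T095528",
    "S2A_MSIL2A_20220608T161841_R040_T16TGR_20220609T105106",
    "S2A_MSIL2A_20220608T161841_R040_T16TGQ_20220609T103242",
    "S2B_MSIL2A_20220606T162839_R083_T17TLL_20220607T020323",
    "S2B_MSIL2A_20220606T162839_R083_T17TLK_20220607T023733",
    "S2B_MSIL2A_20220606T162839_R083_T16TGR_20220607T015546",
    "S2B_MSIL2A_20220606T162839_R083_T16TGQ_20220607T015218",
]


def sensor_family(item_id):
    """Classify an item id by its prefix (a rule inferred from the ids above)."""
    if item_id.startswith("S2"):
        return "Sentinel-2"
    if item_id.startswith("LC"):
        return "Landsat"
    return "other"


counts = Counter(sensor_family(item_id) for item_id in item_ids)
print(counts)</code></pre></div>
<p>For this search that works out to 14 Sentinel-2 items and 3 Landsat items. In real code, the <code>collection_id</code> attribute of each item should be a more robust way to tell the two apart than parsing the id string.</p>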
<p>Look at the properties for a Sentinel-2 item and a Landsat item.</p>
<div class="cell" data-execution_count="17">
<div class="sourceCode cell-code" id="cb21" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb21-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Sentinel-2 item</span></span>
<span id="cb21-2">items[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>].properties</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="17">
<pre><code>{'datetime': '2022-06-13T16:18:29.024000Z',
 'platform': 'Sentinel-2B',
 'proj:epsg': 32617,
 'instruments': ['msi'],
 's2:mgrs_tile': '17TLL',
 'constellation': 'Sentinel 2',
 's2:granule_id': 'S2B_OPER_MSI_L2A_TL_ESRI_20220614T113427_A027522_T17TLL_N04.00',
 'eo:cloud_cover': 87.161016,
 's2:datatake_id': 'GS2B_20220613T161829_027522_N04.00',
 's2:product_uri': 'S2B_MSIL2A_20220613T161829_N0400_R040_T17TLL_20220614T113426.SAFE',
 's2:datastrip_id': 'S2B_OPER_MSI_L2A_DS_ESRI_20220614T113427_S20220613T162212_N04.00',
 's2:product_type': 'S2MSI2A',
 'sat:orbit_state': 'descending',
 's2:datatake_type': 'INS-NOBS',
 's2:generation_time': '2022-06-14T11:34:26.294534Z',
 'sat:relative_orbit': 40,
 's2:water_percentage': 6.461042,
 's2:mean_solar_zenith': 25.4829228231717,
 's2:mean_solar_azimuth': 146.024549969213,
 's2:processing_baseline': '04.00',
 's2:snow_ice_percentage': 0.0,
 's2:vegetation_percentage': 6.113103,
 's2:thin_cirrus_percentage': 33.120424,
 's2:cloud_shadow_percentage': 0.067276,
 's2:nodata_pixel_percentage': 14.849125,
 's2:unclassified_percentage': 0.045276,
 's2:dark_features_percentage': 0.000943,
 's2:not_vegetated_percentage': 0.151341,
 's2:degraded_msi_data_percentage': 0.0127,
 's2:high_proba_clouds_percentage': 3.011082,
 's2:reflectance_conversion_factor': 0.970306720412633,
 's2:medium_proba_clouds_percentage': 51.029509,
 's2:saturated_defective_pixel_percentage': 0.0}</code></pre>
</div>
</div>
<div class="cell" data-execution_count="18">
<div class="sourceCode cell-code" id="cb23" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb23-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Landsat item</span></span>
<span id="cb23-2">items[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>].properties</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="18">
<pre><code>{'gsd': 30,
 'created': '2022-06-28T23:39:39.840750Z',
 'sci:doi': '10.5066/P9OGBGM6',
 'datetime': '2022-06-12T16:15:20.609822Z',
 'platform': 'landsat-9',
 'proj:epsg': 32617,
 'proj:shape': [8001, 7901],
 'description': 'Landsat Collection 2 Level-2',
 'instruments': ['oli', 'tirs'],
 'eo:cloud_cover': 79.41,
 'proj:transform': [30.0, 0.0, 245685.0, 0.0, -30.0, 5060415.0],
 'view:off_nadir': 0,
 'landsat:wrs_row': '029',
 'landsat:scene_id': 'LC90200292022163LGN00',
 'landsat:wrs_path': '020',
 'landsat:wrs_type': '2',
 'view:sun_azimuth': 138.06479334,
 'landsat:correction': 'L2SP',
 'view:sun_elevation': 63.60158079,
 'landsat:cloud_cover_land': 86.06,
 'landsat:collection_number': '02',
 'landsat:collection_category': 'T1'}</code></pre>
</div>
</div>
<p>Yep, some of the properties are different, though some are shared. Down below we’ll see that we’ll need to process the Landsat assets differently than we do the Sentinel-2 assets:</p>
<ul>
<li>Sentinel-2 contains a ‘visual’ band that includes the red, green, and blue bands.</li>
<li>Landsat has individual red, green and blue bands, but not a convenient ‘visual’ band.</li>
<li>From the <code>crop_landsat_image()</code> function in the tutorial, it looks like Landsat RGB values need to be normalized to 0-255 to be consistent with Sentinel-2.</li>
<li>That same function uses <code>odc.stac.load()</code> instead of <code>rioxarray</code>.</li>
<li>The <code>gsd</code> property of the Landsat item indicates that the resolution is 30m. Sentinel-2 gives us 10m resolution for several of the bands.</li>
</ul>
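<p>To give a feel for what normalizing to 0-255 means, here is a minimal sketch of min-max scaling on a tiny, made-up band. This is one common way to do it and not necessarily what the tutorial’s <code>crop_landsat_image()</code> does; the function name <code>normalize_to_uint8</code> and the sample values are assumptions for illustration.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python">def normalize_to_uint8(band):
    """Min-max scale a 2-D list of numbers to integers in the range 0-255."""
    values = [v for row in band for v in row]
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant band has no contrast, so map every pixel to 0
        return [[0 for _ in row] for row in band]
    scale = 255 / (hi - lo)
    return [[round((v - lo) * scale) for v in row] for row in band]


raw_red = [[7500, 9000], [12000, 20000]]  # made-up pixel values
scaled_red = normalize_to_uint8(raw_red)
print(scaled_red)</code></pre></div>
<p>The smallest raw value maps to 0 and the largest to 255, with everything else spread proportionally in between.</p>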
</section>
<section id="do-items-contain-our-sample-point" class="level1">
<h1>Do items contain our sample point?</h1>
<p>GeoPandas can help us answer this question with spatial queries. From the original tutorial (yes, we are using a different point and different target date):</p>
<blockquote class="blockquote">
<p>Remember that our example measurement was taken on 2021-09-27 at coordinates (41.98006, -110.65734). Because we used a bounding box around the sample to search, the Planetary Computer returned all items that contain any part of that bounding box. This means we still have to double check whether each item actually contains our sample point.</p>
</blockquote>
<p><strong>UPDATE 2023-02-07</strong></p>
<p>While digging through some other tutorials and the STAC API docs, I realized that there is a built-in <code>intersects=&lt;POINT&gt;</code> argument that avoids this whole issue of downloading a bunch of images that might not even intersect our point of interest. We can also easily avoid images with large cloud cover values. We can just do this:</p>
<div class="sourceCode" id="cb25" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb25-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> shapely.geometry <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Point</span>
<span id="cb25-2"></span>
<span id="cb25-3">cloud_cover_thresh <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span></span>
<span id="cb25-4"></span>
<span id="cb25-5">search <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> catalog.search(</span>
<span id="cb25-6">    collections<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sentinel-2-l2a"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"landsat-c2-l2"</span>],</span>
<span id="cb25-7">    intersects<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>Point((<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">long</span>, lat)),</span>
<span id="cb25-8">    datetime<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>target_date_range,</span>
<span id="cb25-9">    query<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>{<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"eo:cloud_cover"</span>: {<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lt"</span>: cloud_cover_thresh}},</span>
<span id="cb25-10">)</span></code></pre></div>
<p>Oh well, I needed to start to learn to use GeoPandas anyway.</p>
<p><strong>END OF UPDATE</strong></p>
<p>We will use GeoPandas to create a <code>GeoDataFrame</code> based on the collection of STAC items. This will allow us to do spatial queries such as checking if each item contains our sample point (a lat-long pair). In the tutorial, it appears that a subset of this <code>GeoDataFrame</code> is converted to a pandas <code>DataFrame</code> and then the lat-long sample point is manually checked to see if it’s in each item’s bounding box. It seems like there must be a GeoPandas way to do the same.</p>
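<p>The manual bounding-box check boils down to a couple of comparisons. Here is a minimal sketch, assuming a bounding box in <code>[minx, miny, maxx, maxy]</code> order; the helper <code>bbox_contains</code> is hypothetical, and the search box from earlier stands in for an item’s bounding box.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python">def bbox_contains(bbox, x, y):
    """Return True if the point (x, y) lies inside bbox = [minx, miny, maxx, maxy]."""
    minx, miny, maxx, maxy = bbox
    # Clamping a point to the box leaves it unchanged only if it is inside
    clamped = (min(max(x, minx), maxx), min(max(y, miny), maxy))
    return clamped == (x, y)


# Stand-in box: the search bounding box computed earlier
example_bbox = [-83.34092260849519, 45.01268150971338, -83.2647731371141, 45.06667109107233]
sample_point = (-83.30284787280465, 45.03967636446461)  # (long, lat)
tutorial_point = (-110.65734, 41.98006)  # (long, lat) from the original tutorial

print(bbox_contains(example_bbox, *sample_point))    # True
print(bbox_contains(example_bbox, *tutorial_point))  # False</code></pre></div>
<p>Note that x is the longitude and y is the latitude. With GeoPandas, calling <code>contains()</code> on the <code>GeoDataFrame</code> with a shapely <code>Point</code> should give a similar mask, and it tests against each item’s actual footprint polygon rather than just its bounding box.</p>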
<p>What exactly is GeoPandas again? The basic idea is to combine the capabilities of pandas with the shapely library to allow you to work with geospatial data in a pandas-like way.</p>
<blockquote class="blockquote">
<p>GeoPandas is an open source project to make working with geospatial data in python easier. GeoPandas extends the datatypes used by pandas to allow spatial operations on geometric types. Geometric operations are performed by <a href="https://shapely.readthedocs.io/en/stable/index.html">shapely</a>. GeoPandas further depends on <a href="https://fiona.readthedocs.io/">fiona</a> for file access and matplotlib for plotting.</p>
</blockquote>
<p>Conveniently, you can create a <code>GeoDataFrame</code> from the features dictionary returned by the STAC items collection’s <code>to_dict</code> method.</p>
<div class="cell" data-tags="[]" data-execution_count="19">
<div class="sourceCode cell-code" id="cb26" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb26-1">items_df <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> gpd.GeoDataFrame.from_features(items.to_dict(), crs<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"epsg:4326"</span>)</span>
<span id="cb26-2">items_df.head()</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="19">

<div>

<table class="dataframe table table-sm table-striped">
  <thead>
    <tr>
      <th></th>
      <th>geometry</th>
      <th>datetime</th>
      <th>platform</th>
      <th>proj:epsg</th>
      <th>instruments</th>
      <th>s2:mgrs_tile</th>
      <th>constellation</th>
      <th>s2:granule_id</th>
      <th>eo:cloud_cover</th>
      <th>s2:datatake_id</th>
      <th>...</th>
      <th>landsat:wrs_row</th>
      <th>landsat:scene_id</th>
      <th>landsat:wrs_path</th>
      <th>landsat:wrs_type</th>
      <th>view:sun_azimuth</th>
      <th>landsat:correction</th>
      <th>view:sun_elevation</th>
      <th>landsat:cloud_cover_land</th>
      <th>landsat:collection_number</th>
      <th>landsat:collection_category</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>POLYGON ((-83.52837 45.03720, -83.50498 45.106...</td>
      <td>2022-06-13T16:18:29.024000Z</td>
      <td>Sentinel-2B</td>
      <td>32617</td>
      <td>[msi]</td>
      <td>17TLL</td>
      <td>Sentinel 2</td>
      <td>S2B_OPER_MSI_L2A_TL_ESRI_20220614T113427_A0275...</td>
      <td>87.161016</td>
      <td>GS2B_20220613T161829_027522_N04.00</td>
      <td>...</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>1</th>
      <td>POLYGON ((-83.53807 45.00860, -83.50498 45.106...</td>
      <td>2022-06-13T16:18:29.024000Z</td>
      <td>Sentinel-2B</td>
      <td>32617</td>
      <td>[msi]</td>
      <td>17TLK</td>
      <td>Sentinel 2</td>
      <td>S2B_OPER_MSI_L2A_TL_ESRI_20220614T110342_A0275...</td>
      <td>95.372748</td>
      <td>GS2B_20220613T161829_027522_N04.00</td>
      <td>...</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>2</th>
      <td>POLYGON ((-83.53731 45.01084, -83.50498 45.106...</td>
      <td>2022-06-13T16:18:29.024000Z</td>
      <td>Sentinel-2B</td>
      <td>32616</td>
      <td>[msi]</td>
      <td>16TGR</td>
      <td>Sentinel 2</td>
      <td>S2B_OPER_MSI_L2A_TL_ESRI_20220614T112542_A0275...</td>
      <td>99.999827</td>
      <td>GS2B_20220613T161829_027522_N04.00</td>
      <td>...</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>3</th>
      <td>POLYGON ((-83.83576 44.11947, -83.80033 44.226...</td>
      <td>2022-06-13T16:18:29.024000Z</td>
      <td>Sentinel-2B</td>
      <td>32616</td>
      <td>[msi]</td>
      <td>16TGQ</td>
      <td>Sentinel 2</td>
      <td>S2B_OPER_MSI_L2A_TL_ESRI_20220614T112404_A0275...</td>
      <td>99.957287</td>
      <td>GS2B_20220613T161829_027522_N04.00</td>
      <td>...</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>4</th>
      <td>POLYGON ((-83.56556 45.66118, -84.14055 43.950...</td>
      <td>2022-06-12T16:15:20.609822Z</td>
      <td>landsat-9</td>
      <td>32617</td>
      <td>[oli, tirs]</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>79.410000</td>
      <td>NaN</td>
      <td>...</td>
      <td>029</td>
      <td>LC90200292022163LGN00</td>
      <td>020</td>
      <td>2</td>
      <td>138.064793</td>
      <td>L2SP</td>
      <td>63.601581</td>
      <td>86.06</td>
      <td>02</td>
      <td>T1</td>
    </tr>
  </tbody>
</table>
<p>5 rows × 51 columns</p>
</div>
</div>
</div>
<div class="cell" data-tags="[]" data-execution_count="20">
<div class="sourceCode cell-code" id="cb27" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb27-1">items_df.info()</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>&lt;class 'geopandas.geodataframe.GeoDataFrame'&gt;
RangeIndex: 17 entries, 0 to 16
Data columns (total 51 columns):
 #   Column                                   Non-Null Count  Dtype   
---  ------                                   --------------  -----   
 0   geometry                                 17 non-null     geometry
 1   datetime                                 17 non-null     object  
 2   platform                                 17 non-null     object  
 3   proj:epsg                                17 non-null     int64   
 4   instruments                              17 non-null     object  
 5   s2:mgrs_tile                             14 non-null     object  
 6   constellation                            14 non-null     object  
 7   s2:granule_id                            14 non-null     object  
 8   eo:cloud_cover                           17 non-null     float64 
 9   s2:datatake_id                           14 non-null     object  
 10  s2:product_uri                           14 non-null     object  
 11  s2:datastrip_id                          14 non-null     object  
 12  s2:product_type                          14 non-null     object  
 13  sat:orbit_state                          14 non-null     object  
 14  s2:datatake_type                         14 non-null     object  
 15  s2:generation_time                       14 non-null     object  
 16  sat:relative_orbit                       14 non-null     float64 
 17  s2:water_percentage                      14 non-null     float64 
 18  s2:mean_solar_zenith                     14 non-null     float64 
 19  s2:mean_solar_azimuth                    14 non-null     float64 
 20  s2:processing_baseline                   14 non-null     object  
 21  s2:snow_ice_percentage                   14 non-null     float64 
 22  s2:vegetation_percentage                 14 non-null     float64 
 23  s2:thin_cirrus_percentage                14 non-null     float64 
 24  s2:cloud_shadow_percentage               14 non-null     float64 
 25  s2:nodata_pixel_percentage               14 non-null     float64 
 26  s2:unclassified_percentage               14 non-null     float64 
 27  s2:dark_features_percentage              14 non-null     float64 
 28  s2:not_vegetated_percentage              14 non-null     float64 
 29  s2:degraded_msi_data_percentage          14 non-null     float64 
 30  s2:high_proba_clouds_percentage          14 non-null     float64 
 31  s2:reflectance_conversion_factor         14 non-null     float64 
 32  s2:medium_proba_clouds_percentage        14 non-null     float64 
 33  s2:saturated_defective_pixel_percentage  14 non-null     float64 
 34  gsd                                      3 non-null      float64 
 35  created                                  3 non-null      object  
 36  sci:doi                                  3 non-null      object  
 37  proj:shape                               3 non-null      object  
 38  description                              3 non-null      object  
 39  proj:transform                           3 non-null      object  
 40  view:off_nadir                           3 non-null      float64 
 41  landsat:wrs_row                          3 non-null      object  
 42  landsat:scene_id                         3 non-null      object  
 43  landsat:wrs_path                         3 non-null      object  
 44  landsat:wrs_type                         3 non-null      object  
 45  view:sun_azimuth                         3 non-null      float64 
 46  landsat:correction                       3 non-null      object  
 47  view:sun_elevation                       3 non-null      float64 
 48  landsat:cloud_cover_land                 3 non-null      float64 
 49  landsat:collection_number                3 non-null      object  
 50  landsat:collection_category              3 non-null      object  
dtypes: float64(23), geometry(1), int64(1), object(26)
memory usage: 6.9+ KB</code></pre>
</div>
</div>
<p>The <code>geometry</code> column has a <code>geometry</code> data type (a GeoPandas thing) and the values are POLYGON objects from the shapely library.</p>
<p>It looks like there are two different groups of images: fourteen Sentinel-2 images and three Landsat images. Let’s confirm by using the pandas-like capabilities of GeoPandas.</p>
<div class="cell" data-execution_count="21">
<div class="sourceCode cell-code" id="cb29" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb29-1">items_df.groupby([<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'platform'</span>])[[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'platform'</span>]].count()</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="21">

<div>

<table class="dataframe table table-sm table-striped">
  <thead>
    <tr>
      <th></th>
      <th>platform</th>
    </tr>
    <tr>
      <th>platform</th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>Sentinel-2A</th>
      <td>6</td>
    </tr>
    <tr>
      <th>Sentinel-2B</th>
      <td>8</td>
    </tr>
    <tr>
      <th>landsat-8</th>
      <td>2</td>
    </tr>
    <tr>
      <th>landsat-9</th>
      <td>1</td>
    </tr>
  </tbody>
</table>
</div>
</div>
</div>
<div class="cell" data-execution_count="22">
<div class="sourceCode cell-code" id="cb30" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb30-1">items_df.groupby([<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'platform'</span>])[[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'platform'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'s2:mgrs_tile'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'gsd'</span>]].count()</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="22">

<div>

<table class="dataframe table table-sm table-striped">
  <thead>
    <tr>
      <th></th>
      <th>platform</th>
      <th>s2:mgrs_tile</th>
      <th>gsd</th>
    </tr>
    <tr>
      <th>platform</th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>Sentinel-2A</th>
      <td>6</td>
      <td>6</td>
      <td>0</td>
    </tr>
    <tr>
      <th>Sentinel-2B</th>
      <td>8</td>
      <td>8</td>
      <td>0</td>
    </tr>
    <tr>
      <th>landsat-8</th>
      <td>2</td>
      <td>0</td>
      <td>2</td>
    </tr>
    <tr>
      <th>landsat-9</th>
      <td>1</td>
      <td>0</td>
      <td>1</td>
    </tr>
  </tbody>
</table>
</div>
</div>
</div>
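<p>The counts in the tables above depend on <code>count()</code> tallying only the non-null values in each group, which is why the Sentinel rows show zero for <code>gsd</code> and the Landsat rows show zero for <code>s2:mgrs_tile</code>. Here is a minimal sketch of that behavior using a toy pandas <code>DataFrame</code>. The rows and values are made up for illustration and are not taken from the STAC results.</p>

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values): Sentinel rows have a tile id but no gsd,
# the Landsat row has a gsd but no tile id.
toy = pd.DataFrame(
    {
        "platform": ["Sentinel-2A", "Sentinel-2B", "Sentinel-2B", "landsat-8"],
        "s2:mgrs_tile": ["17TLK", "17TLL", "16TGR", np.nan],
        "gsd": [np.nan, np.nan, np.nan, 30.0],
    }
)

# count() tallies non-null values per column within each group
non_null_counts = toy.groupby("platform")[["s2:mgrs_tile", "gsd"]].count()

# size() reports the number of rows per group, regardless of nulls
group_sizes = toy.groupby("platform").size()

print(non_null_counts)
print(group_sizes)
```

<p>If all you want is the number of rows per platform, <code>size()</code> avoids having to pick a column that is known to be fully populated.</p>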
<p>One of the advantages of GeoPandas is that we can do spatial queries.</p>
<div class="cell" data-execution_count="23">
<div class="sourceCode cell-code" id="cb31" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb31-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create a shapely Point object using the lat-long coordinates of interest</span></span>
<span id="cb31-2">sample_point <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> shapely.geometry.Point((<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">long</span>, lat))</span></code></pre></div>
</div>
<p>We can use <code>contains()</code> to check if the sample point is contained within the <code>geometry</code> object of each row in the <code>GeoDataFrame</code>. We’ll add a new boolean column which indicates whether or not GeoPandas classifies each row as containing the sample point.</p>
<div class="cell" data-execution_count="24">
<div class="sourceCode cell-code" id="cb32" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb32-1">items_df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'gpd_contains_point'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> items_df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'geometry'</span>].contains(sample_point)</span>
<span id="cb32-2">items_df[[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'geometry'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'gpd_contains_point'</span>]]</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="24">

<div>

<table class="dataframe table table-sm table-striped">
  <thead>
    <tr>
      <th></th>
      <th>geometry</th>
      <th>gpd_contains_point</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>POLYGON ((-83.52837 45.03720, -83.50498 45.106...</td>
      <td>False</td>
    </tr>
    <tr>
      <th>1</th>
      <td>POLYGON ((-83.53807 45.00860, -83.50498 45.106...</td>
      <td>True</td>
    </tr>
    <tr>
      <th>2</th>
      <td>POLYGON ((-83.53731 45.01084, -83.50498 45.106...</td>
      <td>True</td>
    </tr>
    <tr>
      <th>3</th>
      <td>POLYGON ((-83.83576 44.11947, -83.80033 44.226...</td>
      <td>True</td>
    </tr>
    <tr>
      <th>4</th>
      <td>POLYGON ((-83.56556 45.66118, -84.14055 43.950...</td>
      <td>True</td>
    </tr>
    <tr>
      <th>5</th>
      <td>POLYGON ((-82.14621 45.08712, -82.15765 45.059...</td>
      <td>False</td>
    </tr>
    <tr>
      <th>6</th>
      <td>POLYGON ((-84.41644 46.02437, -83.00067 45.983...</td>
      <td>True</td>
    </tr>
    <tr>
      <th>7</th>
      <td>POLYGON ((-85.12959 45.66361, -85.70455 43.952...</td>
      <td>True</td>
    </tr>
    <tr>
      <th>8</th>
      <td>POLYGON ((-84.62940 47.08717, -85.22759 45.378...</td>
      <td>True</td>
    </tr>
    <tr>
      <th>9</th>
      <td>POLYGON ((-83.51840 45.03736, -83.48218 45.144...</td>
      <td>False</td>
    </tr>
    <tr>
      <th>10</th>
      <td>POLYGON ((-83.53696 44.98261, -83.53192 44.997...</td>
      <td>True</td>
    </tr>
    <tr>
      <th>11</th>
      <td>POLYGON ((-83.52750 45.01056, -83.48218 45.144...</td>
      <td>True</td>
    </tr>
    <tr>
      <th>12</th>
      <td>POLYGON ((-83.82636 44.11920, -83.77829 44.263...</td>
      <td>True</td>
    </tr>
    <tr>
      <th>13</th>
      <td>POLYGON ((-82.14615 45.08446, -82.15646 45.059...</td>
      <td>False</td>
    </tr>
    <tr>
      <th>14</th>
      <td>POLYGON ((-82.14615 45.08442, -82.20325 44.945...</td>
      <td>True</td>
    </tr>
    <tr>
      <th>15</th>
      <td>POLYGON ((-84.41644 46.02437, -83.00067 45.983...</td>
      <td>True</td>
    </tr>
    <tr>
      <th>16</th>
      <td>POLYGON ((-84.45737 45.12552, -83.06390 45.085...</td>
      <td>True</td>
    </tr>
  </tbody>
</table>
</div>
</div>
</div>
<p>Great. Now let’s pluck out just some key metadata along with the STAC item object itself and store them in a <code>GeoDataFrame</code>. Then we’ll add a column indicating whether that item contains our sample point.</p>
<p>Ah, now I see why <a href="https://drivendata.co/blog/tick-tick-bloom-benchmark#Select-one-image">the original tutorial did the creation of a pandas DataFrame and a manual check of the sample point against the bbox</a> - the bbox doesn’t get added as a column to the <code>GeoDataFrame</code>. I’ll take a slightly different approach and leverage GeoPandas and shapely.</p>
<div class="cell" data-execution_count="25">
<div class="sourceCode cell-code" id="cb33" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb33-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Need shape function from shapely to convert geometry dict to shape object</span></span>
<span id="cb33-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># https://stackoverflow.com/questions/68820085/how-to-convert-geojson-to-shapely-polygon</span></span>
<span id="cb33-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> shapely.geometry <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> shape</span>
<span id="cb33-4"></span>
<span id="cb33-5">item_details_gdf <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> gpd.GeoDataFrame(</span>
<span id="cb33-6">    [</span>
<span id="cb33-7">        {</span>
<span id="cb33-8">            <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"datetime"</span>: item.datetime.strftime(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"%Y-%m-</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%d</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>),</span>
<span id="cb33-9">            <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"geometry"</span>: shape(item.geometry),</span>
<span id="cb33-10">            <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"platform"</span>: item.properties[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"platform"</span>],</span>
<span id="cb33-11">            <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"cloud_cover"</span>: item.properties[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'eo:cloud_cover'</span>],</span>
<span id="cb33-12">            <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"min_long"</span>: item.bbox[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>],</span>
<span id="cb33-13">            <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"max_long"</span>: item.bbox[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>],</span>
<span id="cb33-14">            <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"min_lat"</span>: item.bbox[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>],</span>
<span id="cb33-15">            <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"max_lat"</span>: item.bbox[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>],</span>
<span id="cb33-16">            <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"bbox"</span>: item.bbox,</span>
<span id="cb33-17">            <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sample_point"</span>: sample_point,</span>
<span id="cb33-18">            <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"item_obj"</span>: item,</span>
<span id="cb33-19">        }</span>
<span id="cb33-20">        <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> item <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> items</span>
<span id="cb33-21">    ]</span>
<span id="cb33-22">)</span>
<span id="cb33-23"></span>
<span id="cb33-24"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add column indicating if sample point contained in item geometry</span></span>
<span id="cb33-25">item_details_gdf[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"contains_sample_point"</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> item_details_gdf.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">apply</span>(<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span> x: x.geometry.contains(x.sample_point), axis<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb33-26"></span>
<span id="cb33-27"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(</span>
<span id="cb33-28">    <span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Filtering the GeoDataFrame resulted in </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>item_details_gdf<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>contains_sample_point<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>()<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">/</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(item_details_gdf)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> items that contain the sample location</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span></span>
<span id="cb33-29">)</span>
<span id="cb33-30"></span>
<span id="cb33-31"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Filter out the items that do NOT contain our sample point</span></span>
<span id="cb33-32">item_details_gdf <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> item_details_gdf[item_details_gdf[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"contains_sample_point"</span>]]</span>
<span id="cb33-33">item_details_gdf[[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"datetime"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"platform"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"contains_sample_point"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"cloud_cover"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"bbox"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"item_obj"</span>]].sort_values(</span>
<span id="cb33-34">    by<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"datetime"</span></span>
<span id="cb33-35">)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Filtering the GeoDataFrame resulted in 13/17 items that contain the sample location
</code></pre>
</div>
<div class="cell-output cell-output-display" data-execution_count="25">

<div>

<table class="dataframe table table-sm table-striped">
  <thead>
    <tr>
      <th></th>
      <th>datetime</th>
      <th>platform</th>
      <th>contains_sample_point</th>
      <th>cloud_cover</th>
      <th>bbox</th>
      <th>item_obj</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>14</th>
      <td>2022-06-06</td>
      <td>Sentinel-2B</td>
      <td>True</td>
      <td>100.000000</td>
      <td>[-83.54315, 44.13799675, -82.14615, 45.14807318]</td>
      <td>&lt;Item id=S2B_MSIL2A_20220606T162839_R083_T17TL...</td>
    </tr>
    <tr>
      <th>15</th>
      <td>2022-06-06</td>
      <td>Sentinel-2B</td>
      <td>True</td>
      <td>99.779093</td>
      <td>[-84.4613, 44.99758761, -83.00067, 46.02436508]</td>
      <td>&lt;Item id=S2B_MSIL2A_20220606T162839_R083_T16TG...</td>
    </tr>
    <tr>
      <th>16</th>
      <td>2022-06-06</td>
      <td>Sentinel-2B</td>
      <td>True</td>
      <td>100.000000</td>
      <td>[-84.50015, 44.09977092, -83.0639, 45.12552488]</td>
      <td>&lt;Item id=S2B_MSIL2A_20220606T162839_R083_T16TG...</td>
    </tr>
    <tr>
      <th>10</th>
      <td>2022-06-08</td>
      <td>Sentinel-2A</td>
      <td>True</td>
      <td>73.155582</td>
      <td>[-83.53696, 44.13799675, -82.12808, 45.14807318]</td>
      <td>&lt;Item id=S2A_MSIL2A_20220608T161841_R040_T17TL...</td>
    </tr>
    <tr>
      <th>11</th>
      <td>2022-06-08</td>
      <td>Sentinel-2A</td>
      <td>True</td>
      <td>62.271535</td>
      <td>[-83.5275, 44.99758761, -83.00067, 45.98904329]</td>
      <td>&lt;Item id=S2A_MSIL2A_20220608T161841_R040_T16TG...</td>
    </tr>
    <tr>
      <th>12</th>
      <td>2022-06-08</td>
      <td>Sentinel-2A</td>
      <td>True</td>
      <td>97.201419</td>
      <td>[-83.826355, 44.09977092, -83.0639, 45.09827524]</td>
      <td>&lt;Item id=S2A_MSIL2A_20220608T161841_R040_T16TG...</td>
    </tr>
    <tr>
      <th>6</th>
      <td>2022-06-11</td>
      <td>Sentinel-2A</td>
      <td>True</td>
      <td>37.726283</td>
      <td>[-84.4613, 44.99758761, -83.00067, 46.02436508]</td>
      <td>&lt;Item id=S2A_MSIL2A_20220611T162911_R083_T16TG...</td>
    </tr>
    <tr>
      <th>7</th>
      <td>2022-06-11</td>
      <td>landsat-8</td>
      <td>True</td>
      <td>36.880000</td>
      <td>[-85.73558553, 43.51420503, -82.75131771, 45.6...</td>
      <td>&lt;Item id=LC08_L2SP_021029_20220611_02_T1&gt;</td>
    </tr>
    <tr>
      <th>8</th>
      <td>2022-06-11</td>
      <td>landsat-8</td>
      <td>True</td>
      <td>49.950000</td>
      <td>[-85.3846599, 44.88733525, -82.18147251, 47.14...</td>
      <td>&lt;Item id=LC08_L2SP_021028_20220611_02_T1&gt;</td>
    </tr>
    <tr>
      <th>4</th>
      <td>2022-06-12</td>
      <td>landsat-9</td>
      <td>True</td>
      <td>79.410000</td>
      <td>[-84.26389548, 43.49313512, -81.21392752, 45.6...</td>
      <td>&lt;Item id=LC09_L2SP_020029_20220612_02_T1&gt;</td>
    </tr>
    <tr>
      <th>1</th>
      <td>2022-06-13</td>
      <td>Sentinel-2B</td>
      <td>True</td>
      <td>95.372748</td>
      <td>[-83.53807319, 44.13799675, -82.1280813, 45.14...</td>
      <td>&lt;Item id=S2B_MSIL2A_20220613T161829_R040_T17TL...</td>
    </tr>
    <tr>
      <th>2</th>
      <td>2022-06-13</td>
      <td>Sentinel-2B</td>
      <td>True</td>
      <td>99.999827</td>
      <td>[-83.5373128, 44.99758761, -83.00067944, 45.98...</td>
      <td>&lt;Item id=S2B_MSIL2A_20220613T161829_R040_T16TG...</td>
    </tr>
    <tr>
      <th>3</th>
      <td>2022-06-13</td>
      <td>Sentinel-2B</td>
      <td>True</td>
      <td>99.957287</td>
      <td>[-83.83576164, 44.09977092, -83.06390244, 45.0...</td>
      <td>&lt;Item id=S2B_MSIL2A_20220613T161829_R040_T16TG...</td>
    </tr>
  </tbody>
</table>
</div>
</div>
</div>
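<p>A bounding box check and a geometry <code>contains()</code> check are not guaranteed to agree, since an image footprint can be a slanted polygon that fills only part of its bounding box. The sketch below illustrates this with shapely alone. The footprint coordinates and the two test points are hypothetical and chosen only to make the difference visible; they are not taken from the items above.</p>

```python
from shapely.geometry import Point, Polygon, box

# Hypothetical slanted footprint, loosely mimicking the edge of a satellite swath
footprint = Polygon([(0, 0), (4, 0), (6, 4), (2, 4)])

# Axis-aligned bounding box built from (min_x, min_y, max_x, max_y)
bbox_poly = box(*footprint.bounds)

inside_both = Point(3, 2)    # inside the footprint and its bounding box
bbox_only = Point(0.5, 3.5)  # inside the bounding box, outside the footprint

for name, pt in [("inside_both", inside_both), ("bbox_only", bbox_only)]:
    print(name, "bbox:", bbox_poly.contains(pt), "footprint:", footprint.contains(pt))
```

<p>Checking against the actual geometry is the stricter test, so it may report fewer matching items than a bounding box comparison would.</p>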
<p>Confirm we’ve created a <code>GeoDataFrame</code> containing a geometry column with an actual <code>geometry</code> dtype.</p>
<div class="cell" data-tags="[]" data-execution_count="26">
<div class="sourceCode cell-code" id="cb35" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb35-1"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">type</span>(item_details_gdf)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="26">
<pre><code>geopandas.geodataframe.GeoDataFrame</code></pre>
</div>
</div>
<div class="cell" data-execution_count="27">
<div class="sourceCode cell-code" id="cb37" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb37-1">item_details_gdf.info()</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>&lt;class 'geopandas.geodataframe.GeoDataFrame'&gt;
Int64Index: 13 entries, 1 to 16
Data columns (total 12 columns):
 #   Column                 Non-Null Count  Dtype   
---  ------                 --------------  -----   
 0   datetime               13 non-null     object  
 1   geometry               13 non-null     geometry
 2   platform               13 non-null     object  
 3   cloud_cover            13 non-null     float64 
 4   min_long               13 non-null     float64 
 5   max_long               13 non-null     float64 
 6   min_lat                13 non-null     float64 
 7   max_lat                13 non-null     float64 
 8   bbox                   13 non-null     object  
 9   sample_point           13 non-null     object  
 10  item_obj               13 non-null     object  
 11  contains_sample_point  13 non-null     bool    
dtypes: bool(1), float64(5), geometry(1), object(5)
memory usage: 1.2+ KB</code></pre>
</div>
</div>
</section>
<section id="first-steps-in-getting-to-modeling-features" class="level1">
<h1>First steps in getting to modeling features</h1>
<p>So, how do we make use of these samples for a predictive model? For now, we’ll take an approach similar to the one taken in the original tutorial.</p>
<blockquote class="blockquote">
<p>To keep things simple in this benchmark, we’ll just choose one to input into our benchmark model. Note that in your solution, you could find a way to incorporate multiple images!</p>
<p>We’ll narrow to one image in two steps:</p>
<ul>
<li>If any Sentinel imagery is available, filter to only Sentinel imagery. Sentinel-2 is higher resolution than Landsat, which is extremely helpful for blooms in small water bodies. In this case, two images are from Sentinel and contain the actual sample location.</li>
<li>Select the item that is the closest time wise to the sampling date. This gives us a Sentinel-2A item that was captured on 10/20/2022 - two days before our sample was collected on 10/22.</li>
</ul>
<p>This is a very simple way to choose the best image. You may want to explore additional strategies like selecting an image with less cloud cover obscuring the Earth’s surface (as in this <a href="https://planetarycomputer.microsoft.com/dataset/sentinel-2-l2a#Example-Notebook">tutorial</a>).</p>
</blockquote>
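<p>The two-step rule quoted above can be sketched in plain Python before applying it to the <code>GeoDataFrame</code>. This is only an illustration: the candidate list, the <code>sample_date</code> value, and the <code>choose_item</code> helper are hypothetical names and values, and "closest" is taken here to mean the smallest absolute difference in days.</p>

```python
from datetime import date

# Hypothetical candidates; platforms and dates are illustrative only
candidates = [
    {"platform": "Sentinel-2B", "date": date(2022, 6, 6)},
    {"platform": "Sentinel-2A", "date": date(2022, 6, 8)},
    {"platform": "landsat-8", "date": date(2022, 6, 11)},
    {"platform": "landsat-9", "date": date(2022, 6, 12)},
]
sample_date = date(2022, 6, 10)  # hypothetical sampling date


def choose_item(candidates, sample_date):
    """Prefer Sentinel imagery if any exists, then pick the closest date."""
    # Step 1: keep only Sentinel items when at least one is available
    sentinel = [c for c in candidates if "Sentinel" in c["platform"]]
    pool = sentinel if sentinel else candidates
    # Step 2: pick the item with the smallest absolute gap in days
    return min(pool, key=lambda c: abs((c["date"] - sample_date).days))


best = choose_item(candidates, sample_date)
print(best["platform"], best["date"])
```

<p>Note that the Sentinel preference is applied first, so a Landsat image that is closer in time can still lose out to a Sentinel image that is further away.</p>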
<div class="cell" data-execution_count="28">
<div class="sourceCode cell-code" id="cb39" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb39-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 1 - filter to sentinel using the str accessor</span></span>
<span id="cb39-2">item_details_gdf[item_details_gdf.platform.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">str</span>.contains(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Sentinel"</span>)]</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="28">

<div>

<table class="dataframe table table-sm table-striped">
  <thead>
    <tr>
      <th></th>
      <th>datetime</th>
      <th>geometry</th>
      <th>platform</th>
      <th>cloud_cover</th>
      <th>min_long</th>
      <th>max_long</th>
      <th>min_lat</th>
      <th>max_lat</th>
      <th>bbox</th>
      <th>sample_point</th>
      <th>item_obj</th>
      <th>contains_sample_point</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>1</th>
      <td>2022-06-13</td>
      <td>POLYGON ((-83.53807 45.00860, -83.50498 45.106...</td>
      <td>Sentinel-2B</td>
      <td>95.372748</td>
      <td>-83.538073</td>
      <td>-82.128081</td>
      <td>44.137997</td>
      <td>45.148073</td>
      <td>[-83.53807319, 44.13799675, -82.1280813, 45.14...</td>
      <td>POINT (-83.30284787280465 45.03967636446461)</td>
      <td>&lt;Item id=S2B_MSIL2A_20220613T161829_R040_T17TL...</td>
      <td>True</td>
    </tr>
    <tr>
      <th>2</th>
      <td>2022-06-13</td>
      <td>POLYGON ((-83.53731 45.01084, -83.50498 45.106...</td>
      <td>Sentinel-2B</td>
      <td>99.999827</td>
      <td>-83.537313</td>
      <td>-83.000679</td>
      <td>44.997588</td>
      <td>45.989316</td>
      <td>[-83.5373128, 44.99758761, -83.00067944, 45.98...</td>
      <td>POINT (-83.30284787280465 45.03967636446461)</td>
      <td>&lt;Item id=S2B_MSIL2A_20220613T161829_R040_T16TG...</td>
      <td>True</td>
    </tr>
    <tr>
      <th>3</th>
      <td>2022-06-13</td>
      <td>POLYGON ((-83.83576 44.11947, -83.80033 44.226...</td>
      <td>Sentinel-2B</td>
      <td>99.957287</td>
      <td>-83.835762</td>
      <td>-83.063902</td>
      <td>44.099771</td>
      <td>45.098555</td>
      <td>[-83.83576164, 44.09977092, -83.06390244, 45.0...</td>
      <td>POINT (-83.30284787280465 45.03967636446461)</td>
      <td>&lt;Item id=S2B_MSIL2A_20220613T161829_R040_T16TG...</td>
      <td>True</td>
    </tr>
    <tr>
      <th>6</th>
      <td>2022-06-11</td>
      <td>POLYGON ((-84.41644 46.02437, -83.00067 45.983...</td>
      <td>Sentinel-2A</td>
      <td>37.726283</td>
      <td>-84.461300</td>
      <td>-83.000670</td>
      <td>44.997588</td>
      <td>46.024365</td>
      <td>[-84.4613, 44.99758761, -83.00067, 46.02436508]</td>
      <td>POINT (-83.30284787280465 45.03967636446461)</td>
      <td>&lt;Item id=S2A_MSIL2A_20220611T162911_R083_T16TG...</td>
      <td>True</td>
    </tr>
    <tr>
      <th>10</th>
      <td>2022-06-08</td>
      <td>POLYGON ((-83.53696 44.98261, -83.53192 44.997...</td>
      <td>Sentinel-2A</td>
      <td>73.155582</td>
      <td>-83.536960</td>
      <td>-82.128080</td>
      <td>44.137997</td>
      <td>45.148073</td>
      <td>[-83.53696, 44.13799675, -82.12808, 45.14807318]</td>
      <td>POINT (-83.30284787280465 45.03967636446461)</td>
      <td>&lt;Item id=S2A_MSIL2A_20220608T161841_R040_T17TL...</td>
      <td>True</td>
    </tr>
    <tr>
      <th>11</th>
      <td>2022-06-08</td>
      <td>POLYGON ((-83.52750 45.01056, -83.48218 45.144...</td>
      <td>Sentinel-2A</td>
      <td>62.271535</td>
      <td>-83.527500</td>
      <td>-83.000670</td>
      <td>44.997588</td>
      <td>45.989043</td>
      <td>[-83.5275, 44.99758761, -83.00067, 45.98904329]</td>
      <td>POINT (-83.30284787280465 45.03967636446461)</td>
      <td>&lt;Item id=S2A_MSIL2A_20220608T161841_R040_T16TG...</td>
      <td>True</td>
    </tr>
    <tr>
      <th>12</th>
      <td>2022-06-08</td>
      <td>POLYGON ((-83.82636 44.11920, -83.77829 44.263...</td>
      <td>Sentinel-2A</td>
      <td>97.201419</td>
      <td>-83.826355</td>
      <td>-83.063900</td>
      <td>44.099771</td>
      <td>45.098275</td>
      <td>[-83.826355, 44.09977092, -83.0639, 45.09827524]</td>
      <td>POINT (-83.30284787280465 45.03967636446461)</td>
      <td>&lt;Item id=S2A_MSIL2A_20220608T161841_R040_T16TG...</td>
      <td>True</td>
    </tr>
    <tr>
      <th>14</th>
      <td>2022-06-06</td>
      <td>POLYGON ((-82.14615 45.08442, -82.20325 44.945...</td>
      <td>Sentinel-2B</td>
      <td>100.000000</td>
      <td>-83.543150</td>
      <td>-82.146150</td>
      <td>44.137997</td>
      <td>45.148073</td>
      <td>[-83.54315, 44.13799675, -82.14615, 45.14807318]</td>
      <td>POINT (-83.30284787280465 45.03967636446461)</td>
      <td>&lt;Item id=S2B_MSIL2A_20220606T162839_R083_T17TL...</td>
      <td>True</td>
    </tr>
    <tr>
      <th>15</th>
      <td>2022-06-06</td>
      <td>POLYGON ((-84.41644 46.02437, -83.00067 45.983...</td>
      <td>Sentinel-2B</td>
      <td>99.779093</td>
      <td>-84.461300</td>
      <td>-83.000670</td>
      <td>44.997588</td>
      <td>46.024365</td>
      <td>[-84.4613, 44.99758761, -83.00067, 46.02436508]</td>
      <td>POINT (-83.30284787280465 45.03967636446461)</td>
      <td>&lt;Item id=S2B_MSIL2A_20220606T162839_R083_T16TG...</td>
      <td>True</td>
    </tr>
    <tr>
      <th>16</th>
      <td>2022-06-06</td>
      <td>POLYGON ((-84.45737 45.12552, -83.06390 45.085...</td>
      <td>Sentinel-2B</td>
      <td>100.000000</td>
      <td>-84.500150</td>
      <td>-83.063900</td>
      <td>44.099771</td>
      <td>45.125525</td>
      <td>[-84.50015, 44.09977092, -83.0639, 45.12552488]</td>
      <td>POINT (-83.30284787280465 45.03967636446461)</td>
      <td>&lt;Item id=S2B_MSIL2A_20220606T162839_R083_T16TG...</td>
      <td>True</td>
    </tr>
  </tbody>
</table>
</div>
</div>
</div>
<p>The closest images in time appear to be the Sentinel-2B images taken on 2022-06-13. However, the images from that date have really high cloud cover values (all above 95%). So, let’s take the Sentinel-2 image with the lowest cloud cover value instead.</p>
<div class="cell" data-execution_count="29">
<div class="sourceCode cell-code" id="cb40" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb40-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 2 - take lowest cloud cover</span></span>
<span id="cb40-2">best_item <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (</span>
<span id="cb40-3">    item_details_gdf[item_details_gdf.platform.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">str</span>.contains(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Sentinel"</span>)]</span>
<span id="cb40-4">    .sort_values(by<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"cloud_cover"</span>, ascending<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb40-5">    .iloc[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb40-6">)</span>
<span id="cb40-7">best_item</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="29">
<pre><code>datetime                                                        2022-06-11
geometry                 POLYGON ((-84.41644 46.0243651, -83.00067 45.9...
platform                                                       Sentinel-2A
cloud_cover                                                      37.726283
min_long                                                          -84.4613
max_long                                                         -83.00067
min_lat                                                          44.997588
max_lat                                                          46.024365
bbox                       [-84.4613, 44.99758761, -83.00067, 46.02436508]
sample_point                  POINT (-83.30284787280465 45.03967636446461)
item_obj                 &lt;Item id=S2A_MSIL2A_20220611T162911_R083_T16TG...
contains_sample_point                                                 True
Name: 6, dtype: object</code></pre>
</div>
</div>
<p>The actual <a href="https://www.cogeo.org/">COG</a> is accessible through <code>item_obj</code> (our name), which is just a <code>pystac.item.Item</code> object.</p>
<div class="cell" data-tags="[]" data-toc-hr-collapsed="true" data-execution_count="30">
<div class="sourceCode cell-code" id="cb42" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb42-1">best_item.item_obj</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="30">
&lt;Item id=S2A_MSIL2A_20220611T162911_R083_T16TGR_20220612T145248&gt;
</div>
</div>
<p>Using stuff we did in Part 2, let’s crop the image around our sample point and take a look.</p>
<div class="cell" data-execution_count="31">
<div class="sourceCode cell-code" id="cb43" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb43-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> crop_sentinel_image(item, bounding_box, asset_str):</span>
<span id="cb43-2">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">"""</span></span>
<span id="cb43-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    Given a STAC item from Sentinel-2 and a bounding box tuple in the format</span></span>
<span id="cb43-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    (minx, miny, maxx, maxy), return a cropped portion of the item's visual</span></span>
<span id="cb43-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    imagery in the bounding box.</span></span>
<span id="cb43-6"></span>
<span id="cb43-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    Returns the image as a numpy array with dimensions (color band, height, width)</span></span>
<span id="cb43-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    """</span></span>
<span id="cb43-9">    (minx, miny, maxx, maxy) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> bounding_box</span>
<span id="cb43-10"></span>
<span id="cb43-11">    cropped_image <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rioxarray.open_rasterio(pc.sign(item.assets[asset_str].href)).rio.clip_box(</span>
<span id="cb43-12">        minx<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>minx,</span>
<span id="cb43-13">        miny<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>miny,</span>
<span id="cb43-14">        maxx<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>maxx,</span>
<span id="cb43-15">        maxy<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>maxy,</span>
<span id="cb43-16">        crs<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"EPSG:4326"</span>,</span>
<span id="cb43-17">    )</span>
<span id="cb43-18"></span>
<span id="cb43-19">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> cropped_image</span></code></pre></div>
</div>
<div class="cell" data-execution_count="32">
<div class="sourceCode cell-code" id="cb44" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb44-1">bbox_small <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> get_bounding_box(lat, <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">long</span>, meter_buffer<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">500</span>)</span>
<span id="cb44-2">bbox_small</span>
<span id="cb44-3"></span>
<span id="cb44-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Crop the image</span></span>
<span id="cb44-5">cropped_img <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> crop_sentinel_image(best_item.item_obj, bbox_small, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'visual'</span>)</span>
<span id="cb44-6"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'cropped_image is a </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">type</span>(cropped_img)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> with dimensions of </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>cropped_img<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>dims<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> and shape = </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>cropped_img<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>shape<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb44-7"></span>
<span id="cb44-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create a numpy array from the cropped image</span></span>
<span id="cb44-9">cropped_img_array <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> cropped_img.to_numpy()</span>
<span id="cb44-10"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'cropped_image_array is a </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">type</span>(cropped_img_array)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> with shape = </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>cropped_img_array<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>shape<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>cropped_image is a &lt;class 'xarray.core.dataarray.DataArray'&gt; with dimensions of ('band', 'y', 'x') and shape = (3, 106, 106)
cropped_image_array is a &lt;class 'numpy.ndarray'&gt; with shape = (3, 106, 106)</code></pre>
</div>
</div>
<p>You can see how the xarray package adds dimension names to numpy arrays.</p>
<p>To plot the image, we have to reorder the dimensions, since matplotlib’s <code>imshow</code> expects them in (y, x, band) order. Note that the band dimension is of length 3: red, green and blue.</p>
<div class="cell" data-tags="[]" data-execution_count="33">
<div class="sourceCode cell-code" id="cb46" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb46-1">plt.imshow(np.transpose(cropped_img_array, axes<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]))</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="33">
<pre><code>&lt;matplotlib.image.AxesImage at 0x7ff7dca7d0a0&gt;</code></pre>
</div>
<div class="cell-output cell-output-display">
<p><img src="https://bitsofanalytics.org/posts/algaebloom-part3/index_files/figure-html/cell-34-output-2.png" class="img-fluid"></p>
</div>
</div>
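<p>If the <code>axes=[1, 2, 0]</code> argument looks cryptic, here is a minimal sketch of what it does, using a small made-up array in place of <code>cropped_img_array</code>. The array contents are arbitrary and the variable names are mine.</p>
<div class="sourceCode"><pre class="sourceCode python code-with-copy"><code class="sourceCode python">import numpy as np

# Made-up stand-in for cropped_img_array: 3 bands, 4 rows, 5 columns.
bands_first = np.arange(3 * 4 * 5).reshape(3, 4, 5)

# axes=[1, 2, 0] means: new axis 0 is old axis 1 (y), new axis 1 is old
# axis 2 (x), and new axis 2 is old axis 0 (band).
bands_last = np.transpose(bands_first, axes=[1, 2, 0])
print(bands_first.shape, bands_last.shape)

# The pixel at row 2, column 3 keeps its three band values; only the
# position of the band index changes.
print(bands_first[:, 2, 3], bands_last[2, 3, :])

# np.moveaxis states the same move more directly: send axis 0 to the end.
same = np.moveaxis(bands_first, 0, -1)
print(np.array_equal(same, bands_last))</code></pre></div>
<p>No pixel values are changed by the transpose. Only the order in which the axes are indexed is different.</p>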
</section>
<section id="what-about-those-landsat-images" class="level1">
<h1>What about those Landsat images?</h1>
<p>We kind of just dismissed the Landsat images in the previous section. Let’s learn a little more about Landsat imagery and do some pre-processing to get these images to be more comparable to the Sentinel-2 images in terms of the underlying pixel values.</p>
<section id="some-landsat-background-and-analysis-challenges" class="level2">
<h2 class="anchored" data-anchor-id="some-landsat-background-and-analysis-challenges">Some Landsat background and analysis challenges</h2>
<p>Landsat, a joint NASA/USGS program, provides the longest continuous space-based record of Earth’s land in existence.</p>
<p>There have been many Landsat missions since the original launch in 1972. The competition data only goes back to 2013, so participants should only use Landsat 8 and Landsat 9. Participants may not use any previous Landsat missions. Landsat 8 and Landsat 9 satellites are out of phase with one another, so that between the two each point on the Earth is revisited every 8 days. The data collected by Landsat 9 is very similar to Landsat 8.</p>
<p>Participants may use either level-1 or level-2 data, but may not use level-3. In addition to bottom-of-atmosphere reflectance, Landsat level-2 also includes a measurement of surface temperature, which is relevant to the behavior of algal blooms.</p>
<p>From the <a href="https://planetarycomputer.microsoft.com/dataset/landsat-c2-l2">MPC Landsat Collection 2 Level-2 page</a>:</p>
<blockquote class="blockquote">
<p>Landsat Collection 2 Level-2 Science Products, consisting of atmospherically corrected surface reflectance and surface temperature image data. Collection 2 Level-2 Science Products are available from August 22, 1982 to present.</p>
<p>This dataset represents the global archive of Level-2 data from Landsat Collection 2 acquired by the Thematic Mapper onboard Landsat 4 and 5, the Enhanced Thematic Mapper onboard Landsat 7, and the Operational Land Imager and Thermal Infrared Sensor onboard Landsat 8 and 9. Images are stored in cloud-optimized GeoTIFF format.</p>
</blockquote>
<p>From the main tutorial:</p>
<blockquote class="blockquote">
<p>Note that unlike Sentinel-2 imagery, Landsat imagery is not originally returned with image values scaled to 0-255. Our function above scales the pixel values with cv2.normalize(image_array, None, 0, 255, cv2.NORM_MINMAX) so that it is more comparable to our Sentinel-2 imagery, and we can input both of them as features into our model. You may want to explore other methods of converting Sentinel and Landsat imagery to comparable scales to make sure that no information is lost when re-scaling.</p>
</blockquote>
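<p>To make the rescaling in that quote concrete, here is a minimal NumPy sketch of linear min-max scaling, which is the kind of rescaling that <code>cv2.NORM_MINMAX</code> with bounds of 0 and 255 performs. The <code>minmax_scale</code> name and the pixel values are made up for illustration, and the handling of a constant image is my own assumption rather than something taken from OpenCV.</p>
<div class="sourceCode"><pre class="sourceCode python code-with-copy"><code class="sourceCode python">import numpy as np


def minmax_scale(arr, new_min=0.0, new_max=255.0):
    """Linearly rescale arr so its smallest value maps to new_min and
    its largest value maps to new_max."""
    arr = np.asarray(arr, dtype="float64")
    old_min, old_max = arr.min(), arr.max()
    if old_max == old_min:
        # A constant image has no range to stretch (assumption: map it
        # all to new_min).
        return np.full_like(arr, new_min)
    return (arr - old_min) / (old_max - old_min) * (new_max - new_min) + new_min


# Made-up pixel values standing in for one Landsat band.
band = np.array([[7000, 8000], [9000, 11000]])
scaled = minmax_scale(band)
print(scaled)</code></pre></div>
<p>Note that the result depends on the minimum and maximum of the particular array being scaled. The same raw value can map to different scaled values in two different images, which is one reason the tutorial suggests exploring other ways of putting Sentinel and Landsat imagery on comparable scales.</p>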
</section>
<section id="exploring-a-landsat-image" class="level2">
<h2 class="anchored" data-anchor-id="exploring-a-landsat-image">Exploring a Landsat image</h2>
<p>We have a few Landsat images in the collection of items we found and they had pretty low cloud cover values. Let’s get the one with the lowest cloud cover.</p>
<div class="cell" data-execution_count="34">
<div class="sourceCode cell-code" id="cb48" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb48-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 2 - take lowest cloud cover</span></span>
<span id="cb48-2">best_item_landsat <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (</span>
<span id="cb48-3">    item_details_gdf[item_details_gdf.platform.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">str</span>.contains(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"landsat"</span>)]</span>
<span id="cb48-4">    .sort_values(by<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"cloud_cover"</span>, ascending<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb48-5">    .iloc[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb48-6">)</span>
<span id="cb48-7">best_item_landsat</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="34">
<pre><code>datetime                                                        2022-06-11
geometry                 POLYGON ((-85.1295869 45.6636126, -85.7045499 ...
platform                                                         landsat-8
cloud_cover                                                          36.88
min_long                                                        -85.735586
max_long                                                        -82.751318
min_lat                                                          43.514205
max_lat                                                          45.676265
bbox                     [-85.73558553, 43.51420503, -82.75131771, 45.6...
sample_point                  POINT (-83.30284787280465 45.03967636446461)
item_obj                         &lt;Item id=LC08_L2SP_021029_20220611_02_T1&gt;
contains_sample_point                                                 True
Name: 7, dtype: object</code></pre>
</div>
</div>
<p>Again, the COG itself is in the <code>item_obj</code> column. The problem is that the assets that exist for Landsat COGs are slightly different than those for Sentinel-2.</p>
<div class="cell" data-execution_count="35">
<div class="sourceCode cell-code" id="cb50" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb50-1"><span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> asset_key, asset <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> best_item_landsat.item_obj.assets.items():</span>
<span id="cb50-2">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>asset_key<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:&lt;25}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> - </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>asset<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>title<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>qa                        - Surface Temperature Quality Assessment Band
ang                       - Angle Coefficients File
red                       - Red Band
blue                      - Blue Band
drad                      - Downwelled Radiance Band
emis                      - Emissivity Band
emsd                      - Emissivity Standard Deviation Band
trad                      - Thermal Radiance Band
urad                      - Upwelled Radiance Band
atran                     - Atmospheric Transmittance Band
cdist                     - Cloud Distance Band
green                     - Green Band
nir08                     - Near Infrared Band 0.8
lwir11                    - Surface Temperature Band
swir16                    - Short-wave Infrared Band 1.6
swir22                    - Short-wave Infrared Band 2.2
coastal                   - Coastal/Aerosol Band
mtl.txt                   - Product Metadata File (txt)
mtl.xml                   - Product Metadata File (xml)
mtl.json                  - Product Metadata File (json)
qa_pixel                  - Pixel Quality Assessment Band
qa_radsat                 - Radiometric Saturation and Terrain Occlusion Quality Assessment Band
qa_aerosol                - Aerosol Quality Assessment Band
tilejson                  - TileJSON with default rendering
rendered_preview          - Rendered preview</code></pre>
</div>
</div>
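<p>One difference that matters for us: the Landsat asset list has separate <code>red</code>, <code>green</code> and <code>blue</code> assets rather than a single three-band <code>visual</code> asset like the one we cropped for Sentinel-2. To end up with an array in the same (band, y, x) layout, the three bands have to be stacked. Here is a minimal NumPy sketch, with tiny made-up arrays standing in for the band rasters.</p>
<div class="sourceCode"><pre class="sourceCode python code-with-copy"><code class="sourceCode python">import numpy as np

# Made-up 2x2 arrays standing in for the red, green and blue band rasters.
red = np.array([[10, 20], [30, 40]])
green = np.array([[11, 21], [31, 41]])
blue = np.array([[12, 22], [32, 42]])

# Stack along a new leading axis to get (band, y, x), the same layout as
# the cropped Sentinel-2 array earlier in this post.
rgb = np.stack([red, green, blue], axis=0)
print(rgb.shape)</code></pre></div>
<p>The order of the list passed to <code>np.stack</code> determines the band order, so it needs to be red, green, blue for the result to plot with the expected colors.</p>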
<p>You can find a <a href="https://planetarycomputer.microsoft.com/dataset/landsat-c2-l2">good data dictionary at this MPC Landsat page</a>.</p>
<p>We aren’t going to dig into all the details here, but instead will just focus on the issue raised in the original tutorial: the red, green and blue bands in the Landsat item are not on a 0-255 scale. Here’s the code from the original tutorial that uses the odc-stac and opencv libraries to rescale those bands to 0-255. A few things of note:</p>
<ul>
<li>the <code>odc.stac.stac_load()</code> function allows you to specify a list of bands to load</li>
<li>the <code>cv2.normalize()</code> function is used to put the values on the 0-255 scale</li>
</ul>
<div class="sourceCode" id="cb52" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb52-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> crop_landsat_image(item, bounding_box):</span>
<span id="cb52-2">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">"""</span></span>
<span id="cb52-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    Given a STAC item from Landsat and a bounding box tuple in the format</span></span>
<span id="cb52-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    (minx, miny, maxx, maxy), return a cropped portion of the item's visual</span></span>
<span id="cb52-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    imagery in the bounding box.</span></span>
<span id="cb52-6"></span>
<span id="cb52-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    Returns the image as a numpy array with dimensions (color band, height, width)</span></span>
<span id="cb52-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    """</span></span>
<span id="cb52-9">    (minx, miny, maxx, maxy) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> bounding_box</span>
<span id="cb52-10"></span>
<span id="cb52-11">    image <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> odc.stac.stac_load(</span>
<span id="cb52-12">        [pc.sign(item)], bands<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"red"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"green"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"blue"</span>], bbox<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[minx, miny, maxx, maxy]</span>
<span id="cb52-13">    ).isel(time<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)</span>
<span id="cb52-14">    image_array <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> image[[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"red"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"green"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"blue"</span>]].to_array().to_numpy()</span>
<span id="cb52-15"></span>
<span id="cb52-16">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># normalize to 0 - 255 values</span></span>
<span id="cb52-17">    image_array <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> cv2.normalize(image_array, <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">255</span>, cv2.NORM_MINMAX)</span>
<span id="cb52-18"></span>
<span id="cb52-19">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> image_array</span></code></pre></div>
<p>Unfortunately, I had all kinds of problems getting odc-stac and opencv (which provides the <code>cv2</code> module) installed in my conda virtual environment. Rather than fight with that, I decided to use rioxarray instead of odc-stac to load the bands into arrays and simply write my own normalize function to do the rescaling.</p>
<p>First, just to illustrate the scale problem, I’ll load the red band.</p>
<div class="cell" data-execution_count="36">
<div class="sourceCode cell-code" id="cb53" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb53-1">red_href <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> best_item_landsat.item_obj.assets[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"red"</span>].href</span>
<span id="cb53-2">ds <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rioxarray.open_rasterio(red_href)</span>
<span id="cb53-3">ds</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="36">

<div><svg style="position: absolute; width: 0; height: 0; overflow: hidden">
<defs>
<symbol id="icon-database" viewbox="0 0 32 32">
<path d="M16 0c-8.837 0-16 2.239-16 5v4c0 2.761 7.163 5 16 5s16-2.239 16-5v-4c0-2.761-7.163-5-16-5z"></path>
<path d="M16 17c-8.837 0-16-2.239-16-5v6c0 2.761 7.163 5 16 5s16-2.239 16-5v-6c0 2.761-7.163 5-16 5z"></path>
<path d="M16 26c-8.837 0-16-2.239-16-5v6c0 2.761 7.163 5 16 5s16-2.239 16-5v-6c0 2.761-7.163 5-16 5z"></path>
</symbol>
<symbol id="icon-file-text2" viewbox="0 0 32 32">
<path d="M28.681 7.159c-0.694-0.947-1.662-2.053-2.724-3.116s-2.169-2.030-3.116-2.724c-1.612-1.182-2.393-1.319-2.841-1.319h-15.5c-1.378 0-2.5 1.121-2.5 2.5v27c0 1.378 1.122 2.5 2.5 2.5h23c1.378 0 2.5-1.122 2.5-2.5v-19.5c0-0.448-0.137-1.23-1.319-2.841zM24.543 5.457c0.959 0.959 1.712 1.825 2.268 2.543h-4.811v-4.811c0.718 0.556 1.584 1.309 2.543 2.268zM28 29.5c0 0.271-0.229 0.5-0.5 0.5h-23c-0.271 0-0.5-0.229-0.5-0.5v-27c0-0.271 0.229-0.5 0.5-0.5 0 0 15.499-0 15.5 0v7c0 0.552 0.448 1 1 1h7v19.5z"></path>
<path d="M23 26h-14c-0.552 0-1-0.448-1-1s0.448-1 1-1h14c0.552 0 1 0.448 1 1s-0.448 1-1 1z"></path>
<path d="M23 22h-14c-0.552 0-1-0.448-1-1s0.448-1 1-1h14c0.552 0 1 0.448 1 1s-0.448 1-1 1z"></path>
<path d="M23 18h-14c-0.552 0-1-0.448-1-1s0.448-1 1-1h14c0.552 0 1 0.448 1 1s-0.448 1-1 1z"></path>
</symbol>
</defs>
</svg>
<style>/* CSS stylesheet for displaying xarray objects in jupyterlab.
 *
 */

:root {
  --xr-font-color0: var(--jp-content-font-color0, rgba(0, 0, 0, 1));
  --xr-font-color2: var(--jp-content-font-color2, rgba(0, 0, 0, 0.54));
  --xr-font-color3: var(--jp-content-font-color3, rgba(0, 0, 0, 0.38));
  --xr-border-color: var(--jp-border-color2, #e0e0e0);
  --xr-disabled-color: var(--jp-layout-color3, #bdbdbd);
  --xr-background-color: var(--jp-layout-color0, white);
  --xr-background-color-row-even: var(--jp-layout-color1, white);
  --xr-background-color-row-odd: var(--jp-layout-color2, #eeeeee);
}

html[theme=dark],
body[data-theme=dark],
body.vscode-dark {
  --xr-font-color0: rgba(255, 255, 255, 1);
  --xr-font-color2: rgba(255, 255, 255, 0.54);
  --xr-font-color3: rgba(255, 255, 255, 0.38);
  --xr-border-color: #1F1F1F;
  --xr-disabled-color: #515151;
  --xr-background-color: #111111;
  --xr-background-color-row-even: #111111;
  --xr-background-color-row-odd: #313131;
}

.xr-wrap {
  display: block !important;
  min-width: 300px;
  max-width: 700px;
}

.xr-text-repr-fallback {
  /* fallback to plain text repr when CSS is not injected (untrusted notebook) */
  display: none;
}

.xr-header {
  padding-top: 6px;
  padding-bottom: 6px;
  margin-bottom: 4px;
  border-bottom: solid 1px var(--xr-border-color);
}

.xr-header > div,
.xr-header > ul {
  display: inline;
  margin-top: 0;
  margin-bottom: 0;
}

.xr-obj-type,
.xr-array-name {
  margin-left: 2px;
  margin-right: 10px;
}

.xr-obj-type {
  color: var(--xr-font-color2);
}

.xr-sections {
  padding-left: 0 !important;
  display: grid;
  grid-template-columns: 150px auto auto 1fr 20px 20px;
}

.xr-section-item {
  display: contents;
}

.xr-section-item input {
  display: none;
}

.xr-section-item input + label {
  color: var(--xr-disabled-color);
}

.xr-section-item input:enabled + label {
  cursor: pointer;
  color: var(--xr-font-color2);
}

.xr-section-item input:enabled + label:hover {
  color: var(--xr-font-color0);
}

.xr-section-summary {
  grid-column: 1;
  color: var(--xr-font-color2);
  font-weight: 500;
}

.xr-section-summary > span {
  display: inline-block;
  padding-left: 0.5em;
}

.xr-section-summary-in:disabled + label {
  color: var(--xr-font-color2);
}

.xr-section-summary-in + label:before {
  display: inline-block;
  content: '►';
  font-size: 11px;
  width: 15px;
  text-align: center;
}

.xr-section-summary-in:disabled + label:before {
  color: var(--xr-disabled-color);
}

.xr-section-summary-in:checked + label:before {
  content: '▼';
}

.xr-section-summary-in:checked + label > span {
  display: none;
}

.xr-section-summary,
.xr-section-inline-details {
  padding-top: 4px;
  padding-bottom: 4px;
}

.xr-section-inline-details {
  grid-column: 2 / -1;
}

.xr-section-details {
  display: none;
  grid-column: 1 / -1;
  margin-bottom: 5px;
}

.xr-section-summary-in:checked ~ .xr-section-details {
  display: contents;
}

.xr-array-wrap {
  grid-column: 1 / -1;
  display: grid;
  grid-template-columns: 20px auto;
}

.xr-array-wrap > label {
  grid-column: 1;
  vertical-align: top;
}

.xr-preview {
  color: var(--xr-font-color3);
}

.xr-array-preview,
.xr-array-data {
  padding: 0 5px !important;
  grid-column: 2;
}

.xr-array-data,
.xr-array-in:checked ~ .xr-array-preview {
  display: none;
}

.xr-array-in:checked ~ .xr-array-data,
.xr-array-preview {
  display: inline-block;
}

.xr-dim-list {
  display: inline-block !important;
  list-style: none;
  padding: 0 !important;
  margin: 0;
}

.xr-dim-list li {
  display: inline-block;
  padding: 0;
  margin: 0;
}

.xr-dim-list:before {
  content: '(';
}

.xr-dim-list:after {
  content: ')';
}

.xr-dim-list li:not(:last-child):after {
  content: ',';
  padding-right: 5px;
}

.xr-has-index {
  font-weight: bold;
}

.xr-var-list,
.xr-var-item {
  display: contents;
}

.xr-var-item > div,
.xr-var-item label,
.xr-var-item > .xr-var-name span {
  background-color: var(--xr-background-color-row-even);
  margin-bottom: 0;
}

.xr-var-item > .xr-var-name:hover span {
  padding-right: 5px;
}

.xr-var-list > li:nth-child(odd) > div,
.xr-var-list > li:nth-child(odd) > label,
.xr-var-list > li:nth-child(odd) > .xr-var-name span {
  background-color: var(--xr-background-color-row-odd);
}

.xr-var-name {
  grid-column: 1;
}

.xr-var-dims {
  grid-column: 2;
}

.xr-var-dtype {
  grid-column: 3;
  text-align: right;
  color: var(--xr-font-color2);
}

.xr-var-preview {
  grid-column: 4;
}

.xr-index-preview {
  grid-column: 2 / 5;
  color: var(--xr-font-color2);
}

.xr-var-name,
.xr-var-dims,
.xr-var-dtype,
.xr-preview,
.xr-attrs dt {
  white-space: nowrap;
  overflow: hidden;
  text-overflow: ellipsis;
  padding-right: 10px;
}

.xr-var-name:hover,
.xr-var-dims:hover,
.xr-var-dtype:hover,
.xr-attrs dt:hover {
  overflow: visible;
  width: auto;
  z-index: 1;
}

.xr-var-attrs,
.xr-var-data,
.xr-index-data {
  display: none;
  background-color: var(--xr-background-color) !important;
  padding-bottom: 5px !important;
}

.xr-var-attrs-in:checked ~ .xr-var-attrs,
.xr-var-data-in:checked ~ .xr-var-data,
.xr-index-data-in:checked ~ .xr-index-data {
  display: block;
}

.xr-var-data > table {
  float: right;
}

.xr-var-name span,
.xr-var-data,
.xr-index-name div,
.xr-index-data,
.xr-attrs {
  padding-left: 25px !important;
}

.xr-attrs,
.xr-var-attrs,
.xr-var-data,
.xr-index-data {
  grid-column: 1 / -1;
}

dl.xr-attrs {
  padding: 0;
  margin: 0;
  display: grid;
  grid-template-columns: 125px auto;
}

.xr-attrs dt,
.xr-attrs dd {
  padding: 0;
  margin: 0;
  float: left;
  padding-right: 10px;
  width: auto;
}

.xr-attrs dt {
  font-weight: normal;
  grid-column: 1;
}

.xr-attrs dt:hover span {
  display: inline-block;
  background: var(--xr-background-color);
  padding-right: 10px;
}

.xr-attrs dd {
  grid-column: 2;
  white-space: pre-wrap;
  word-break: break-all;
}

.xr-icon-database,
.xr-icon-file-text2,
.xr-no-icon {
  display: inline-block;
  vertical-align: middle;
  width: 1em;
  height: 1.5em !important;
  stroke-width: 0;
  stroke: currentColor;
  fill: currentColor;
}
</style><pre class="xr-text-repr-fallback">&lt;xarray.DataArray (band: 1, y: 7761, x: 7641)&gt;
[59301801 values with dtype=uint16]
Coordinates:
  * band         (band) int64 1
  * x            (x) float64 6.021e+05 6.021e+05 ... 8.313e+05 8.313e+05
  * y            (y) float64 5.059e+06 5.059e+06 ... 4.826e+06 4.826e+06
    spatial_ref  int64 0
Attributes:
    AREA_OR_POINT:  Point
    _FillValue:     0
    scale_factor:   1.0
    add_offset:     0.0</pre><div class="xr-wrap" style="display:none"><div class="xr-header"><div class="xr-obj-type">xarray.DataArray</div><div class="xr-array-name"></div><ul class="xr-dim-list"><li><span class="xr-has-index">band</span>: 1</li><li><span class="xr-has-index">y</span>: 7761</li><li><span class="xr-has-index">x</span>: 7641</li></ul></div><ul class="xr-sections"><li class="xr-section-item"><div class="xr-array-wrap"><input id="section-a7a7315e-40a3-4af4-9599-671d1a1a20ee" class="xr-array-in" type="checkbox" checked=""><label for="section-a7a7315e-40a3-4af4-9599-671d1a1a20ee" title="Show/hide data repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-array-preview xr-preview"><span>...</span></div><div class="xr-array-data"><pre>[59301801 values with dtype=uint16]</pre></div></div></li><li class="xr-section-item"><input id="section-f25565a3-f3fa-4c24-bc86-7ba638d74dc1" class="xr-section-summary-in" type="checkbox" checked=""><label for="section-f25565a3-f3fa-4c24-bc86-7ba638d74dc1" class="xr-section-summary">Coordinates: <span>(4)</span></label><div class="xr-section-inline-details"></div><div class="xr-section-details"><ul class="xr-var-list"><li class="xr-var-item"><div class="xr-var-name"><span class="xr-has-index">band</span></div><div class="xr-var-dims">(band)</div><div class="xr-var-dtype">int64</div><div class="xr-var-preview xr-preview">1</div><input id="attrs-7ac7412a-4b2a-46a3-aa07-08da5a75f22e" class="xr-var-attrs-in" type="checkbox" disabled=""><label for="attrs-7ac7412a-4b2a-46a3-aa07-08da5a75f22e" title="Show/Hide attributes"><svg class="icon xr-icon-file-text2"><use href="#icon-file-text2"></use></svg></label><input id="data-4ea6d8e0-5891-4ee2-89ae-f26ec69bf17a" class="xr-var-data-in" type="checkbox"><label for="data-4ea6d8e0-5891-4ee2-89ae-f26ec69bf17a" title="Show/Hide data repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div 
class="xr-var-attrs"><dl class="xr-attrs"></dl></div><div class="xr-var-data"><pre>array([1])</pre></div></li><li class="xr-var-item"><div class="xr-var-name"><span class="xr-has-index">x</span></div><div class="xr-var-dims">(x)</div><div class="xr-var-dtype">float64</div><div class="xr-var-preview xr-preview">6.021e+05 6.021e+05 ... 8.313e+05</div><input id="attrs-c30fa42c-c91b-417b-80da-393bc43919f9" class="xr-var-attrs-in" type="checkbox" disabled=""><label for="attrs-c30fa42c-c91b-417b-80da-393bc43919f9" title="Show/Hide attributes"><svg class="icon xr-icon-file-text2"><use href="#icon-file-text2"></use></svg></label><input id="data-c913ac06-429c-4837-8a9c-c0f2a2202a9b" class="xr-var-data-in" type="checkbox"><label for="data-c913ac06-429c-4837-8a9c-c0f2a2202a9b" title="Show/Hide data repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-var-attrs"><dl class="xr-attrs"></dl></div><div class="xr-var-data"><pre>array([602100., 602130., 602160., ..., 831240., 831270., 831300.])</pre></div></li><li class="xr-var-item"><div class="xr-var-name"><span class="xr-has-index">y</span></div><div class="xr-var-dims">(y)</div><div class="xr-var-dtype">float64</div><div class="xr-var-preview xr-preview">5.059e+06 5.059e+06 ... 
4.826e+06</div><input id="attrs-88c1bac9-3b50-4466-8cd1-28fc02140218" class="xr-var-attrs-in" type="checkbox" disabled=""><label for="attrs-88c1bac9-3b50-4466-8cd1-28fc02140218" title="Show/Hide attributes"><svg class="icon xr-icon-file-text2"><use href="#icon-file-text2"></use></svg></label><input id="data-d0fc14ac-4b15-4843-959e-820e1174d6db" class="xr-var-data-in" type="checkbox"><label for="data-d0fc14ac-4b15-4843-959e-820e1174d6db" title="Show/Hide data repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-var-attrs"><dl class="xr-attrs"></dl></div><div class="xr-var-data"><pre>array([5058900., 5058870., 5058840., ..., 4826160., 4826130., 4826100.])</pre></div></li><li class="xr-var-item"><div class="xr-var-name"><span>spatial_ref</span></div><div class="xr-var-dims">()</div><div class="xr-var-dtype">int64</div><div class="xr-var-preview xr-preview">0</div><input id="attrs-92756a39-d172-4cb6-9217-0c79ee535505" class="xr-var-attrs-in" type="checkbox"><label for="attrs-92756a39-d172-4cb6-9217-0c79ee535505" title="Show/Hide attributes"><svg class="icon xr-icon-file-text2"><use href="#icon-file-text2"></use></svg></label><input id="data-c7888b70-c804-49a5-aec7-9fd83f06bb5e" class="xr-var-data-in" type="checkbox"><label for="data-c7888b70-c804-49a5-aec7-9fd83f06bb5e" title="Show/Hide data repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-var-attrs"><dl class="xr-attrs"><dt><span>crs_wkt :</span></dt><dd>PROJCS["WGS 84 / UTM zone 16N",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 
84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",-87],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG","32616"]]</dd><dt><span>semi_major_axis :</span></dt><dd>6378137.0</dd><dt><span>semi_minor_axis :</span></dt><dd>6356752.314245179</dd><dt><span>inverse_flattening :</span></dt><dd>298.257223563</dd><dt><span>reference_ellipsoid_name :</span></dt><dd>WGS 84</dd><dt><span>longitude_of_prime_meridian :</span></dt><dd>0.0</dd><dt><span>prime_meridian_name :</span></dt><dd>Greenwich</dd><dt><span>geographic_crs_name :</span></dt><dd>WGS 84</dd><dt><span>horizontal_datum_name :</span></dt><dd>World Geodetic System 1984</dd><dt><span>projected_crs_name :</span></dt><dd>WGS 84 / UTM zone 16N</dd><dt><span>grid_mapping_name :</span></dt><dd>transverse_mercator</dd><dt><span>latitude_of_projection_origin :</span></dt><dd>0.0</dd><dt><span>longitude_of_central_meridian :</span></dt><dd>-87.0</dd><dt><span>false_easting :</span></dt><dd>500000.0</dd><dt><span>false_northing :</span></dt><dd>0.0</dd><dt><span>scale_factor_at_central_meridian :</span></dt><dd>0.9996</dd><dt><span>spatial_ref :</span></dt><dd>PROJCS["WGS 84 / UTM zone 16N",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 
84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",-87],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG","32616"]]</dd><dt><span>GeoTransform :</span></dt><dd>602085.0 30.0 0.0 5058915.0 0.0 -30.0</dd></dl></div><div class="xr-var-data"><pre>array(0)</pre></div></li></ul></div></li><li class="xr-section-item"><input id="section-1cd17c3d-b777-4ea2-85cd-67617ce657d2" class="xr-section-summary-in" type="checkbox"><label for="section-1cd17c3d-b777-4ea2-85cd-67617ce657d2" class="xr-section-summary">Indexes: <span>(3)</span></label><div class="xr-section-inline-details"></div><div class="xr-section-details"><ul class="xr-var-list"><li class="xr-var-item"><div class="xr-index-name"><div>band</div></div><div class="xr-index-preview">PandasIndex</div><div></div><input id="index-f66727cf-215c-4e11-a689-59555c16c66e" class="xr-index-data-in" type="checkbox"><label for="index-f66727cf-215c-4e11-a689-59555c16c66e" title="Show/Hide index repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-index-data"><pre>PandasIndex(Int64Index([1], dtype='int64', name='band'))</pre></div></li><li class="xr-var-item"><div class="xr-index-name"><div>x</div></div><div class="xr-index-preview">PandasIndex</div><div></div><input id="index-3a877350-66da-4f58-9053-9514206c21fb" class="xr-index-data-in" type="checkbox"><label for="index-3a877350-66da-4f58-9053-9514206c21fb" title="Show/Hide index repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-index-data"><pre>PandasIndex(Float64Index([602100.0, 
602130.0, 602160.0, 602190.0, 602220.0, 602250.0,
              602280.0, 602310.0, 602340.0, 602370.0,
              ...
              831030.0, 831060.0, 831090.0, 831120.0, 831150.0, 831180.0,
              831210.0, 831240.0, 831270.0, 831300.0],
             dtype='float64', name='x', length=7641))</pre></div></li><li class="xr-var-item"><div class="xr-index-name"><div>y</div></div><div class="xr-index-preview">PandasIndex</div><div></div><input id="index-fb84ab0f-2d1b-4675-8de5-553f2e5de072" class="xr-index-data-in" type="checkbox"><label for="index-fb84ab0f-2d1b-4675-8de5-553f2e5de072" title="Show/Hide index repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-index-data"><pre>PandasIndex(Float64Index([5058900.0, 5058870.0, 5058840.0, 5058810.0, 5058780.0, 5058750.0,
              5058720.0, 5058690.0, 5058660.0, 5058630.0,
              ...
              4826370.0, 4826340.0, 4826310.0, 4826280.0, 4826250.0, 4826220.0,
              4826190.0, 4826160.0, 4826130.0, 4826100.0],
             dtype='float64', name='y', length=7761))</pre></div></li></ul></div></li><li class="xr-section-item"><input id="section-2af89791-3c20-4ae5-90df-3729333e3d0d" class="xr-section-summary-in" type="checkbox" checked=""><label for="section-2af89791-3c20-4ae5-90df-3729333e3d0d" class="xr-section-summary">Attributes: <span>(4)</span></label><div class="xr-section-inline-details"></div><div class="xr-section-details"><dl class="xr-attrs"><dt><span>AREA_OR_POINT :</span></dt><dd>Point</dd><dt><span>_FillValue :</span></dt><dd>0</dd><dt><span>scale_factor :</span></dt><dd>1.0</dd><dt><span>add_offset :</span></dt><dd>0.0</dd></dl></div></li></ul></div></div>
</div>
</div>
<div class="cell" data-execution_count="37">
<div class="sourceCode cell-code" id="cb54" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb54-1">ds[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4000</span>:<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4005</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4000</span>:<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4005</span>].values</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="37">
<pre><code>array([[8521, 8506, 8422, 8492, 8494],
       [8470, 8602, 8435, 8549, 8613],
       [8554, 8512, 8454, 8454, 8633],
       [8506, 8521, 8315, 8299, 8424],
       [8612, 8578, 8402, 8225, 8215]], dtype=uint16)</code></pre>
</div>
</div>
<p>Yep, the values aren’t between 0 and 255.</p>
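<p>Rather than eyeballing a slice, the same check can be done numerically. The sketch below retypes the 5x5 window printed above by hand (so it only covers those 25 pixels, not the whole band) and compares its maximum against the top of the 8-bit range. The variable names are just for illustration.</p>

```python
import numpy as np

# The 5x5 window printed above, typed in by hand
window = np.array(
    [[8521, 8506, 8422, 8492, 8494],
     [8470, 8602, 8435, 8549, 8613],
     [8554, 8512, 8454, 8454, 8633],
     [8506, 8521, 8315, 8299, 8424],
     [8612, 8578, 8402, 8225, 8215]],
    dtype="uint16",
)

# 255 is the largest value an unsigned 8-bit integer can hold
uint8_max = np.iinfo(np.uint8).max

# True when at least one pixel is too large for an 8-bit display range
exceeds_8_bit = bool(window.max() > uint8_max)

print(window.dtype, int(window.min()), int(window.max()), exceeds_8_bit)
```

<p>Every pixel in this window sits well above 255, so some form of rescaling is needed before the band can be treated as an ordinary 8-bit image channel.</p>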
<p>I found that <code>rioxarray.open_rasterio</code> has the following input parameter:</p>
<ul>
<li><strong>band_as_variable (bool, default=False)</strong> – If True, will load bands in a raster to separate variables.</li>
</ul>
<p>Hmm, I’m not sure that will work, as we don’t have a single raster file with all the bands. Instead, we can load the red, green, and blue bands separately, rescale them, and then merge them.</p>
<p>Rescaling a vector of values to a new min and max is pretty easy.</p>
<pre><code>rescaled_value = new_min + (new_max - new_min) * (value - value.min) / (value.max - value.min)</code></pre>
<p>Since our new min is 0 and our new max is 255, this simplifies to:</p>
<pre><code>rescaled_value = 255 * (value - value.min) / (value.max - value.min)</code></pre>
<p>There was a little numpy trickiness (see the last line of the code below).</p>
<div class="cell" data-tags="[]" data-execution_count="38">
<div class="sourceCode cell-code" id="cb58" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb58-1"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'bbox</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>bbox<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb58-2">minx, miny, maxx, maxy <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> bbox</span>
<span id="cb58-3"></span>
<span id="cb58-4">bands <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> {}</span>
<span id="cb58-5">bands_rescaled <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> {}</span>
<span id="cb58-6"><span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> band <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'red'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'green'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'blue'</span>]:</span>
<span id="cb58-7">    bands[band] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rioxarray.open_rasterio(best_item_landsat.item_obj.assets[band].href).rio.clip_box(</span>
<span id="cb58-8">            minx<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>minx,</span>
<span id="cb58-9">            miny<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>miny,</span>
<span id="cb58-10">            maxx<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>maxx,</span>
<span id="cb58-11">            maxy<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>maxy,</span>
<span id="cb58-12">            crs<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"EPSG:4326"</span>,</span>
<span id="cb58-13">    )</span>
<span id="cb58-14">    </span>
<span id="cb58-15">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Rescale to 0-255</span></span>
<span id="cb58-16">    min_of_band <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> bands[band].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">min</span>(dim<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'x'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'y'</span>]).values</span>
<span id="cb58-17">    max_of_band <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> bands[band].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">max</span>(dim<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'x'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'y'</span>]).values</span>
<span id="cb58-18">    range_of_band <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> max_of_band <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> min_of_band</span>
<span id="cb58-19">    bands_rescaled[band] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> bands[band].copy()</span>
<span id="cb58-20">    bands_rescaled[band].values <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.around(np.array([<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">255</span>]) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> (bands[band].values <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> min_of_band) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> range_of_band).astype(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'int'</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>bbox
[-83.34092260849519, 45.01268150971338, -83.2647731371141, 45.06667109107233]
</code></pre>
</div>
</div>
<div class="cell" data-execution_count="39">
<div class="sourceCode cell-code" id="cb60" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb60-1">bands_rescaled[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'red'</span>].values</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="39">
<pre><code>array([[[34, 34, 35, ...,  9, 10,  9],
        [33, 35, 36, ...,  9, 10, 10],
        [33, 35, 35, ..., 10, 10, 10],
        ...,
        [29, 33, 33, ..., 37, 38, 37],
        [28, 32, 31, ..., 37, 36, 37],
        [30, 32, 31, ..., 37, 36, 36]]])</code></pre>
</div>
</div>
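<p>As a standalone illustration of the rescaling formula above, here is a minimal sketch using plain numpy on a small made-up array rather than the Landsat bands. The function name <code>rescale_to_range</code> and its parameters are hypothetical, not part of numpy or rioxarray. It casts to float before doing the arithmetic, which is one way to avoid integer overflow with <code>uint16</code> inputs; multiplying by <code>np.array([255])</code>, as in the loop above, also moves the arithmetic out of <code>uint16</code>.</p>

```python
import numpy as np


def rescale_to_range(values, new_min=0.0, new_max=255.0):
    """Linearly rescale values so they span [new_min, new_max] (hypothetical helper)."""
    # Work in float64 so uint16 inputs cannot wrap around during the multiply
    values = np.asarray(values, dtype="float64")
    old_min = values.min()
    old_range = values.max() - old_min
    if old_range == 0:
        # A constant array has no spread; map everything to new_min
        return np.full(values.shape, int(round(new_min)))
    scaled = new_min + (new_max - new_min) * (values - old_min) / old_range
    # Round to the nearest integer and store as a platform integer
    return np.around(scaled).astype("int")


# Small made-up uint16 sample, not real Landsat data
sample = np.array([[8000, 8400], [9200, 10000]], dtype="uint16")
rescaled = rescale_to_range(sample)
print(rescaled)
```

<p>For this sample, the smallest value maps to 0, the largest to 255, and the two in between land at 51 and 153. Note that the result depends on the min and max of whatever array is passed in, so rescaling each band separately (as done above) stretches each band over the full range independently.</p>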
<p>Next, let’s combine the red, green, and blue bands to create the equivalent of the Sentinel-2 visual band. We don’t really <em>need</em> to do this for purposes of developing features for machine learning models, but it seems like a useful thing to know how to do, and it will let us directly compare the Landsat and Sentinel-2 images. The major steps are:</p>
<ul>
<li>create a list containing three <code>DataArray</code>s corresponding to the rescaled red, green and blue bands</li>
<li>before adding each array to the list, add a band identifier as a coordinate</li>
<li>combine the three <code>DataArray</code>s using <code>xarray.concat</code></li>
</ul>
<div class="cell" data-execution_count="40">
<div class="sourceCode cell-code" id="cb62" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb62-1"><span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> band <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'red'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'green'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'blue'</span>]:</span>
<span id="cb62-2">    band_name <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.array([band])</span>
<span id="cb62-3">    bands_rescaled[band] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> bands_rescaled[band].assign_coords(band_name <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"band"</span>, band_name))</span>
<span id="cb62-4">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># keep the unscaled version around for comparison</span></span>
<span id="cb62-5">    bands[band] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> bands[band].assign_coords(band_name <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"band"</span>, band_name))</span>
<span id="cb62-6">    </span>
<span id="cb62-7">bands_list <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [val <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> key, val <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> bands_rescaled.items()]</span>
<span id="cb62-8">bands_list</span>
<span id="cb62-9"></span>
<span id="cb62-10">bands_list_unscaled <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [val <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> key, val <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> bands.items()]</span></code></pre></div>
</div>
<p>Combine the arrays into a single <code>DataArray</code>.</p>
<div class="cell" data-execution_count="41">
<div class="sourceCode cell-code" id="cb63" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb63-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> xarray <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> xr</span></code></pre></div>
</div>
<div class="cell" data-execution_count="42">
<div class="sourceCode cell-code" id="cb64" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb64-1">bands_combined <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> xr.concat(bands_list, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'band'</span>)</span>
<span id="cb64-2">bands_combined_unscaled <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> xr.concat(bands_list_unscaled, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'band'</span>)</span></code></pre></div>
</div>
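<p>To see what the labeling and concatenation steps do without downloading any imagery, here is a minimal sketch on tiny made-up arrays. The dictionary name <code>toy_bands</code>, the 2x2 shape, and the fill values are all assumptions for illustration; each toy array has a length-1 <code>band</code> dimension, mimicking what <code>rioxarray.open_rasterio</code> returned for a single asset above.</p>

```python
import numpy as np
import xarray as xr

# Tiny made-up 2x2 "bands", each with a length-1 band dimension
toy_bands = {}
for i, name in enumerate(["red", "green", "blue"]):
    data = np.full((1, 2, 2), i * 10, dtype="int64")
    da = xr.DataArray(
        data,
        dims=("band", "y", "x"),
        coords={"band": [1], "y": [1.0, 0.0], "x": [0.0, 1.0]},
    )
    # Attach the colour name as an extra (non-index) coordinate along "band"
    toy_bands[name] = da.assign_coords(band_name=("band", np.array([name])))

# Stack the three single-band arrays along the existing "band" dimension
combined = xr.concat(list(toy_bands.values()), dim="band")

print(combined.shape)
print(combined["band_name"].values)

# band_name is not an index, so pick a band by position instead
green = combined.isel(band=1)
print(green.values)
```

<p>The combined array has three entries along <code>band</code>, and the <code>band_name</code> coordinate records which colour each one came from. Since every input carried the same <code>band</code> value of 1, that coordinate is not unique after concatenation, which is why the sketch selects by position with <code>isel</code>.</p>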
<p>Now we’ve got a single, already normalized array that we can crop.</p>
<div class="cell" data-execution_count="43">
<div class="sourceCode cell-code" id="cb65" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb65-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> crop_landsat_image(combined_xr, bounding_box):</span>
<span id="cb65-2">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">"""</span></span>
<span id="cb65-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    Given the combined Landsat DataArray and a bounding box tuple in the format</span></span>
<span id="cb65-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    (minx, miny, maxx, maxy), return a cropped portion of the array's visual</span></span>
<span id="cb65-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    imagery in the bounding box.</span></span>
<span id="cb65-6"></span>
<span id="cb65-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    Returns the image as a numpy array with dimensions (color band, height, width)</span></span>
<span id="cb65-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    """</span></span>
<span id="cb65-9">    (minx, miny, maxx, maxy) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> bounding_box</span>
<span id="cb65-10"></span>
<span id="cb65-11">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#image = odc.stac.stac_load(</span></span>
<span id="cb65-12">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#    [pc.sign(item)], bands=["red", "green", "blue"], bbox=[minx, miny, maxx, maxy]</span></span>
<span id="cb65-13">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#).isel(time=0)</span></span>
<span id="cb65-14">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#image_array = image[["red", "green", "blue"]].to_array().to_numpy()</span></span>
<span id="cb65-15">    </span>
<span id="cb65-16">    cropped_image <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> combined_xr.rio.clip_box(</span>
<span id="cb65-17">        minx<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>minx,</span>
<span id="cb65-18">        miny<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>miny,</span>
<span id="cb65-19">        maxx<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>maxx,</span>
<span id="cb65-20">        maxy<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>maxy,</span>
<span id="cb65-21">        crs<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"EPSG:4326"</span>,</span>
<span id="cb65-22">    )</span>
<span id="cb65-23"></span>
<span id="cb65-24">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> cropped_image</span>
<span id="cb65-25"></span></code></pre></div>
</div>
<div class="cell" data-execution_count="44">
<div class="sourceCode cell-code" id="cb66" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb66-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># we'll use the same cropped area as above</span></span>
<span id="cb66-2">landsat_image_array <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> crop_landsat_image(bands_combined, bbox)</span>
<span id="cb66-3">landsat_image_array_unscaled <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> crop_landsat_image(bands_combined_unscaled, bbox)</span>
<span id="cb66-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Show the red band</span></span>
<span id="cb66-5">landsat_image_array[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="44">

<div><svg style="position: absolute; width: 0; height: 0; overflow: hidden">
<defs>
<symbol id="icon-database" viewbox="0 0 32 32">
<path d="M16 0c-8.837 0-16 2.239-16 5v4c0 2.761 7.163 5 16 5s16-2.239 16-5v-4c0-2.761-7.163-5-16-5z"></path>
<path d="M16 17c-8.837 0-16-2.239-16-5v6c0 2.761 7.163 5 16 5s16-2.239 16-5v-6c0 2.761-7.163 5-16 5z"></path>
<path d="M16 26c-8.837 0-16-2.239-16-5v6c0 2.761 7.163 5 16 5s16-2.239 16-5v-6c0 2.761-7.163 5-16 5z"></path>
</symbol>
<symbol id="icon-file-text2" viewbox="0 0 32 32">
<path d="M28.681 7.159c-0.694-0.947-1.662-2.053-2.724-3.116s-2.169-2.030-3.116-2.724c-1.612-1.182-2.393-1.319-2.841-1.319h-15.5c-1.378 0-2.5 1.121-2.5 2.5v27c0 1.378 1.122 2.5 2.5 2.5h23c1.378 0 2.5-1.122 2.5-2.5v-19.5c0-0.448-0.137-1.23-1.319-2.841zM24.543 5.457c0.959 0.959 1.712 1.825 2.268 2.543h-4.811v-4.811c0.718 0.556 1.584 1.309 2.543 2.268zM28 29.5c0 0.271-0.229 0.5-0.5 0.5h-23c-0.271 0-0.5-0.229-0.5-0.5v-27c0-0.271 0.229-0.5 0.5-0.5 0 0 15.499-0 15.5 0v7c0 0.552 0.448 1 1 1h7v19.5z"></path>
<path d="M23 26h-14c-0.552 0-1-0.448-1-1s0.448-1 1-1h14c0.552 0 1 0.448 1 1s-0.448 1-1 1z"></path>
<path d="M23 22h-14c-0.552 0-1-0.448-1-1s0.448-1 1-1h14c0.552 0 1 0.448 1 1s-0.448 1-1 1z"></path>
<path d="M23 18h-14c-0.552 0-1-0.448-1-1s0.448-1 1-1h14c0.552 0 1 0.448 1 1s-0.448 1-1 1z"></path>
</symbol>
</defs>
</svg>
<style>/* CSS stylesheet for displaying xarray objects in jupyterlab.
 *
 */

:root {
  --xr-font-color0: var(--jp-content-font-color0, rgba(0, 0, 0, 1));
  --xr-font-color2: var(--jp-content-font-color2, rgba(0, 0, 0, 0.54));
  --xr-font-color3: var(--jp-content-font-color3, rgba(0, 0, 0, 0.38));
  --xr-border-color: var(--jp-border-color2, #e0e0e0);
  --xr-disabled-color: var(--jp-layout-color3, #bdbdbd);
  --xr-background-color: var(--jp-layout-color0, white);
  --xr-background-color-row-even: var(--jp-layout-color1, white);
  --xr-background-color-row-odd: var(--jp-layout-color2, #eeeeee);
}

html[theme=dark],
body[data-theme=dark],
body.vscode-dark {
  --xr-font-color0: rgba(255, 255, 255, 1);
  --xr-font-color2: rgba(255, 255, 255, 0.54);
  --xr-font-color3: rgba(255, 255, 255, 0.38);
  --xr-border-color: #1F1F1F;
  --xr-disabled-color: #515151;
  --xr-background-color: #111111;
  --xr-background-color-row-even: #111111;
  --xr-background-color-row-odd: #313131;
}

.xr-wrap {
  display: block !important;
  min-width: 300px;
  max-width: 700px;
}

.xr-text-repr-fallback {
  /* fallback to plain text repr when CSS is not injected (untrusted notebook) */
  display: none;
}

.xr-header {
  padding-top: 6px;
  padding-bottom: 6px;
  margin-bottom: 4px;
  border-bottom: solid 1px var(--xr-border-color);
}

.xr-header > div,
.xr-header > ul {
  display: inline;
  margin-top: 0;
  margin-bottom: 0;
}

.xr-obj-type,
.xr-array-name {
  margin-left: 2px;
  margin-right: 10px;
}

.xr-obj-type {
  color: var(--xr-font-color2);
}

.xr-sections {
  padding-left: 0 !important;
  display: grid;
  grid-template-columns: 150px auto auto 1fr 20px 20px;
}

.xr-section-item {
  display: contents;
}

.xr-section-item input {
  display: none;
}

.xr-section-item input + label {
  color: var(--xr-disabled-color);
}

.xr-section-item input:enabled + label {
  cursor: pointer;
  color: var(--xr-font-color2);
}

.xr-section-item input:enabled + label:hover {
  color: var(--xr-font-color0);
}

.xr-section-summary {
  grid-column: 1;
  color: var(--xr-font-color2);
  font-weight: 500;
}

.xr-section-summary > span {
  display: inline-block;
  padding-left: 0.5em;
}

.xr-section-summary-in:disabled + label {
  color: var(--xr-font-color2);
}

.xr-section-summary-in + label:before {
  display: inline-block;
  content: '►';
  font-size: 11px;
  width: 15px;
  text-align: center;
}

.xr-section-summary-in:disabled + label:before {
  color: var(--xr-disabled-color);
}

.xr-section-summary-in:checked + label:before {
  content: '▼';
}

.xr-section-summary-in:checked + label > span {
  display: none;
}

.xr-section-summary,
.xr-section-inline-details {
  padding-top: 4px;
  padding-bottom: 4px;
}

.xr-section-inline-details {
  grid-column: 2 / -1;
}

.xr-section-details {
  display: none;
  grid-column: 1 / -1;
  margin-bottom: 5px;
}

.xr-section-summary-in:checked ~ .xr-section-details {
  display: contents;
}

.xr-array-wrap {
  grid-column: 1 / -1;
  display: grid;
  grid-template-columns: 20px auto;
}

.xr-array-wrap > label {
  grid-column: 1;
  vertical-align: top;
}

.xr-preview {
  color: var(--xr-font-color3);
}

.xr-array-preview,
.xr-array-data {
  padding: 0 5px !important;
  grid-column: 2;
}

.xr-array-data,
.xr-array-in:checked ~ .xr-array-preview {
  display: none;
}

.xr-array-in:checked ~ .xr-array-data,
.xr-array-preview {
  display: inline-block;
}

.xr-dim-list {
  display: inline-block !important;
  list-style: none;
  padding: 0 !important;
  margin: 0;
}

.xr-dim-list li {
  display: inline-block;
  padding: 0;
  margin: 0;
}

.xr-dim-list:before {
  content: '(';
}

.xr-dim-list:after {
  content: ')';
}

.xr-dim-list li:not(:last-child):after {
  content: ',';
  padding-right: 5px;
}

.xr-has-index {
  font-weight: bold;
}

.xr-var-list,
.xr-var-item {
  display: contents;
}

.xr-var-item > div,
.xr-var-item label,
.xr-var-item > .xr-var-name span {
  background-color: var(--xr-background-color-row-even);
  margin-bottom: 0;
}

.xr-var-item > .xr-var-name:hover span {
  padding-right: 5px;
}

.xr-var-list > li:nth-child(odd) > div,
.xr-var-list > li:nth-child(odd) > label,
.xr-var-list > li:nth-child(odd) > .xr-var-name span {
  background-color: var(--xr-background-color-row-odd);
}

.xr-var-name {
  grid-column: 1;
}

.xr-var-dims {
  grid-column: 2;
}

.xr-var-dtype {
  grid-column: 3;
  text-align: right;
  color: var(--xr-font-color2);
}

.xr-var-preview {
  grid-column: 4;
}

.xr-index-preview {
  grid-column: 2 / 5;
  color: var(--xr-font-color2);
}

.xr-var-name,
.xr-var-dims,
.xr-var-dtype,
.xr-preview,
.xr-attrs dt {
  white-space: nowrap;
  overflow: hidden;
  text-overflow: ellipsis;
  padding-right: 10px;
}

.xr-var-name:hover,
.xr-var-dims:hover,
.xr-var-dtype:hover,
.xr-attrs dt:hover {
  overflow: visible;
  width: auto;
  z-index: 1;
}

.xr-var-attrs,
.xr-var-data,
.xr-index-data {
  display: none;
  background-color: var(--xr-background-color) !important;
  padding-bottom: 5px !important;
}

.xr-var-attrs-in:checked ~ .xr-var-attrs,
.xr-var-data-in:checked ~ .xr-var-data,
.xr-index-data-in:checked ~ .xr-index-data {
  display: block;
}

.xr-var-data > table {
  float: right;
}

.xr-var-name span,
.xr-var-data,
.xr-index-name div,
.xr-index-data,
.xr-attrs {
  padding-left: 25px !important;
}

.xr-attrs,
.xr-var-attrs,
.xr-var-data,
.xr-index-data {
  grid-column: 1 / -1;
}

dl.xr-attrs {
  padding: 0;
  margin: 0;
  display: grid;
  grid-template-columns: 125px auto;
}

.xr-attrs dt,
.xr-attrs dd {
  padding: 0;
  margin: 0;
  float: left;
  padding-right: 10px;
  width: auto;
}

.xr-attrs dt {
  font-weight: normal;
  grid-column: 1;
}

.xr-attrs dt:hover span {
  display: inline-block;
  background: var(--xr-background-color);
  padding-right: 10px;
}

.xr-attrs dd {
  grid-column: 2;
  white-space: pre-wrap;
  word-break: break-all;
}

.xr-icon-database,
.xr-icon-file-text2,
.xr-no-icon {
  display: inline-block;
  vertical-align: middle;
  width: 1em;
  height: 1.5em !important;
  stroke-width: 0;
  stroke: currentColor;
  fill: currentColor;
}
</style><pre class="xr-text-repr-fallback">&lt;xarray.DataArray (y: 210, x: 210)&gt;
array([[34, 34, 35, ...,  9, 10,  9],
       [33, 35, 36, ...,  9, 10, 10],
       [33, 35, 35, ..., 10, 10, 10],
       ...,
       [29, 33, 33, ..., 37, 38, 37],
       [28, 32, 31, ..., 37, 36, 37],
       [30, 32, 31, ..., 37, 36, 36]])
Coordinates:
    band         int64 1
  * x            (x) float64 7.881e+05 7.881e+05 ... 7.943e+05 7.943e+05
  * y            (y) float64 4.997e+06 4.997e+06 ... 4.991e+06 4.991e+06
    band_name    &lt;U5 'red'
    spatial_ref  int64 0
Attributes:
    AREA_OR_POINT:  Point
    scale_factor:   1.0
    add_offset:     0.0
    _FillValue:     0</pre><div class="xr-wrap" style="display:none"><div class="xr-header"><div class="xr-obj-type">xarray.DataArray</div><div class="xr-array-name"></div><ul class="xr-dim-list"><li><span class="xr-has-index">y</span>: 210</li><li><span class="xr-has-index">x</span>: 210</li></ul></div><ul class="xr-sections"><li class="xr-section-item"><div class="xr-array-wrap"><input id="section-5064dce0-e045-471b-97fb-1bef3be2b2ff" class="xr-array-in" type="checkbox" checked=""><label for="section-5064dce0-e045-471b-97fb-1bef3be2b2ff" title="Show/hide data repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-array-preview xr-preview"><span>34 34 35 38 40 40 40 37 34 34 35 ... 36 37 36 38 37 37 36 37 37 36 36</span></div><div class="xr-array-data"><pre>array([[34, 34, 35, ...,  9, 10,  9],
       [33, 35, 36, ...,  9, 10, 10],
       [33, 35, 35, ..., 10, 10, 10],
       ...,
       [29, 33, 33, ..., 37, 38, 37],
       [28, 32, 31, ..., 37, 36, 37],
       [30, 32, 31, ..., 37, 36, 36]])</pre></div></div></li><li class="xr-section-item"><input id="section-2264442c-4465-422e-bf4e-773b85bdf88f" class="xr-section-summary-in" type="checkbox" checked=""><label for="section-2264442c-4465-422e-bf4e-773b85bdf88f" class="xr-section-summary">Coordinates: <span>(5)</span></label><div class="xr-section-inline-details"></div><div class="xr-section-details"><ul class="xr-var-list"><li class="xr-var-item"><div class="xr-var-name"><span>band</span></div><div class="xr-var-dims">()</div><div class="xr-var-dtype">int64</div><div class="xr-var-preview xr-preview">1</div><input id="attrs-1166be8b-3d10-4ae6-9aee-ea79aafc5529" class="xr-var-attrs-in" type="checkbox" disabled=""><label for="attrs-1166be8b-3d10-4ae6-9aee-ea79aafc5529" title="Show/Hide attributes"><svg class="icon xr-icon-file-text2"><use href="#icon-file-text2"></use></svg></label><input id="data-6c9d3173-5d7b-40b5-8721-d3379fcd9bd1" class="xr-var-data-in" type="checkbox"><label for="data-6c9d3173-5d7b-40b5-8721-d3379fcd9bd1" title="Show/Hide data repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-var-attrs"><dl class="xr-attrs"></dl></div><div class="xr-var-data"><pre>array(1)</pre></div></li><li class="xr-var-item"><div class="xr-var-name"><span class="xr-has-index">x</span></div><div class="xr-var-dims">(x)</div><div class="xr-var-dtype">float64</div><div class="xr-var-preview xr-preview">7.881e+05 7.881e+05 ... 
7.943e+05</div><input id="attrs-4b2c3366-2952-425d-8410-a30351d99730" class="xr-var-attrs-in" type="checkbox"><label for="attrs-4b2c3366-2952-425d-8410-a30351d99730" title="Show/Hide attributes"><svg class="icon xr-icon-file-text2"><use href="#icon-file-text2"></use></svg></label><input id="data-0276abdd-d405-4916-ba9e-4cf82d43f49d" class="xr-var-data-in" type="checkbox"><label for="data-0276abdd-d405-4916-ba9e-4cf82d43f49d" title="Show/Hide data repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-var-attrs"><dl class="xr-attrs"><dt><span>axis :</span></dt><dd>X</dd><dt><span>long_name :</span></dt><dd>x coordinate of projection</dd><dt><span>standard_name :</span></dt><dd>projection_x_coordinate</dd><dt><span>units :</span></dt><dd>metre</dd></dl></div><div class="xr-var-data"><pre>array([788070., 788100., 788130., ..., 794280., 794310., 794340.])</pre></div></li><li class="xr-var-item"><div class="xr-var-name"><span class="xr-has-index">y</span></div><div class="xr-var-dims">(y)</div><div class="xr-var-dtype">float64</div><div class="xr-var-preview xr-preview">4.997e+06 4.997e+06 ... 
4.991e+06</div><input id="attrs-d7ae30af-7f2d-4181-a68b-1749d4785f74" class="xr-var-attrs-in" type="checkbox"><label for="attrs-d7ae30af-7f2d-4181-a68b-1749d4785f74" title="Show/Hide attributes"><svg class="icon xr-icon-file-text2"><use href="#icon-file-text2"></use></svg></label><input id="data-699bb393-aa61-458c-bdfb-1a60960e2d82" class="xr-var-data-in" type="checkbox"><label for="data-699bb393-aa61-458c-bdfb-1a60960e2d82" title="Show/Hide data repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-var-attrs"><dl class="xr-attrs"><dt><span>axis :</span></dt><dd>Y</dd><dt><span>long_name :</span></dt><dd>y coordinate of projection</dd><dt><span>standard_name :</span></dt><dd>projection_y_coordinate</dd><dt><span>units :</span></dt><dd>metre</dd></dl></div><div class="xr-var-data"><pre>array([4997160., 4997130., 4997100., ..., 4990950., 4990920., 4990890.])</pre></div></li><li class="xr-var-item"><div class="xr-var-name"><span>band_name</span></div><div class="xr-var-dims">()</div><div class="xr-var-dtype">&lt;U5</div><div class="xr-var-preview xr-preview">'red'</div><input id="attrs-1a01d582-1ea1-4cf8-8dfb-a81f62f3d033" class="xr-var-attrs-in" type="checkbox" disabled=""><label for="attrs-1a01d582-1ea1-4cf8-8dfb-a81f62f3d033" title="Show/Hide attributes"><svg class="icon xr-icon-file-text2"><use href="#icon-file-text2"></use></svg></label><input id="data-4da9568e-f277-49b4-a867-753cb35ad4ac" class="xr-var-data-in" type="checkbox"><label for="data-4da9568e-f277-49b4-a867-753cb35ad4ac" title="Show/Hide data repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-var-attrs"><dl class="xr-attrs"></dl></div><div class="xr-var-data"><pre>array('red', dtype='&lt;U5')</pre></div></li><li class="xr-var-item"><div class="xr-var-name"><span>spatial_ref</span></div><div class="xr-var-dims">()</div><div class="xr-var-dtype">int64</div><div class="xr-var-preview 
xr-preview">0</div><input id="attrs-bdf1ff91-d6ee-421d-bb90-c83963196a23" class="xr-var-attrs-in" type="checkbox"><label for="attrs-bdf1ff91-d6ee-421d-bb90-c83963196a23" title="Show/Hide attributes"><svg class="icon xr-icon-file-text2"><use href="#icon-file-text2"></use></svg></label><input id="data-e397b3e5-8132-49b2-bcca-5cd495830c19" class="xr-var-data-in" type="checkbox"><label for="data-e397b3e5-8132-49b2-bcca-5cd495830c19" title="Show/Hide data repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-var-attrs"><dl class="xr-attrs"><dt><span>crs_wkt :</span></dt><dd>PROJCS["WGS 84 / UTM zone 16N",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",-87],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG","32616"]]</dd><dt><span>semi_major_axis :</span></dt><dd>6378137.0</dd><dt><span>semi_minor_axis :</span></dt><dd>6356752.314245179</dd><dt><span>inverse_flattening :</span></dt><dd>298.257223563</dd><dt><span>reference_ellipsoid_name :</span></dt><dd>WGS 84</dd><dt><span>longitude_of_prime_meridian :</span></dt><dd>0.0</dd><dt><span>prime_meridian_name :</span></dt><dd>Greenwich</dd><dt><span>geographic_crs_name :</span></dt><dd>WGS 84</dd><dt><span>horizontal_datum_name :</span></dt><dd>World Geodetic System 1984</dd><dt><span>projected_crs_name :</span></dt><dd>WGS 84 / UTM zone 16N</dd><dt><span>grid_mapping_name :</span></dt><dd>transverse_mercator</dd><dt><span>latitude_of_projection_origin :</span></dt><dd>0.0</dd><dt><span>longitude_of_central_meridian 
:</span></dt><dd>-87.0</dd><dt><span>false_easting :</span></dt><dd>500000.0</dd><dt><span>false_northing :</span></dt><dd>0.0</dd><dt><span>scale_factor_at_central_meridian :</span></dt><dd>0.9996</dd><dt><span>spatial_ref :</span></dt><dd>PROJCS["WGS 84 / UTM zone 16N",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",-87],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG","32616"]]</dd><dt><span>GeoTransform :</span></dt><dd>788055.0 30.0 0.0 4997175.0 0.0 -30.0</dd></dl></div><div class="xr-var-data"><pre>array(0)</pre></div></li></ul></div></li><li class="xr-section-item"><input id="section-6475eab7-e59d-4d19-89b7-a497fb5f5c70" class="xr-section-summary-in" type="checkbox"><label for="section-6475eab7-e59d-4d19-89b7-a497fb5f5c70" class="xr-section-summary">Indexes: <span>(2)</span></label><div class="xr-section-inline-details"></div><div class="xr-section-details"><ul class="xr-var-list"><li class="xr-var-item"><div class="xr-index-name"><div>x</div></div><div class="xr-index-preview">PandasIndex</div><div></div><input id="index-939d5038-52d4-4958-aa84-75d09961f9f0" class="xr-index-data-in" type="checkbox"><label for="index-939d5038-52d4-4958-aa84-75d09961f9f0" title="Show/Hide index repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-index-data"><pre>PandasIndex(Float64Index([788070.0, 788100.0, 788130.0, 788160.0, 788190.0, 788220.0,
              788250.0, 788280.0, 788310.0, 788340.0,
              ...
              794070.0, 794100.0, 794130.0, 794160.0, 794190.0, 794220.0,
              794250.0, 794280.0, 794310.0, 794340.0],
             dtype='float64', name='x', length=210))</pre></div></li><li class="xr-var-item"><div class="xr-index-name"><div>y</div></div><div class="xr-index-preview">PandasIndex</div><div></div><input id="index-22a5f913-1b78-46f9-aa18-b1b1d57210f0" class="xr-index-data-in" type="checkbox"><label for="index-22a5f913-1b78-46f9-aa18-b1b1d57210f0" title="Show/Hide index repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-index-data"><pre>PandasIndex(Float64Index([4997160.0, 4997130.0, 4997100.0, 4997070.0, 4997040.0, 4997010.0,
              4996980.0, 4996950.0, 4996920.0, 4996890.0,
              ...
              4991160.0, 4991130.0, 4991100.0, 4991070.0, 4991040.0, 4991010.0,
              4990980.0, 4990950.0, 4990920.0, 4990890.0],
             dtype='float64', name='y', length=210))</pre></div></li></ul></div></li><li class="xr-section-item"><input id="section-84778894-e687-4977-ba7f-c991cdfa2119" class="xr-section-summary-in" type="checkbox" checked=""><label for="section-84778894-e687-4977-ba7f-c991cdfa2119" class="xr-section-summary">Attributes: <span>(4)</span></label><div class="xr-section-inline-details"></div><div class="xr-section-details"><dl class="xr-attrs"><dt><span>AREA_OR_POINT :</span></dt><dd>Point</dd><dt><span>scale_factor :</span></dt><dd>1.0</dd><dt><span>add_offset :</span></dt><dd>0.0</dd><dt><span>_FillValue :</span></dt><dd>0</dd></dl></div></li></ul></div></div>
</div>
</div>
<div class="cell" data-execution_count="45">
<div class="sourceCode cell-code" id="cb67" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb67-1">plt.imshow(np.transpose(landsat_image_array, axes<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]))</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="45">
<pre><code>&lt;matplotlib.image.AxesImage at 0x7ff7dc92d550&gt;</code></pre>
</div>
<div class="cell-output cell-output-display">
<p><img src="https://bitsofanalytics.org/posts/algaebloom-part3/index_files/figure-html/cell-46-output-2.png" class="img-fluid"></p>
</div>
</div>
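<p>The <code>np.transpose</code> call above is needed because the cropped array is stored band-first, while <code>plt.imshow</code> expects the band axis last. Here’s a minimal sketch of that reordering; the array is a made-up stand-in, not the actual Landsat data.</p>

```python
import numpy as np

# Hypothetical tiny image stored band-first: (band, height, width)
band_first = np.arange(3 * 2 * 4).reshape(3, 2, 4)

# Move the band axis to the end so the layout is (height, width, band)
band_last = np.transpose(band_first, axes=[1, 2, 0])

print(band_first.shape)  # (3, 2, 4)
print(band_last.shape)   # (2, 4, 3)

# The pixel at row 1, column 2 keeps the same three band values
print(band_first[:, 1, 2])
print(band_last[1, 2, :])
```

<p>Only the axis order changes; every pixel keeps its band values.</p>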
<p>I should probably revisit trying to get odc-stac and opencv installed in my conda virtual environment, but it was time well spent doing this manually - learned some xarray things that will be useful in the future.</p>
</section>
<section id="odc-stac-update" class="level2">
<h2 class="anchored" data-anchor-id="odc-stac-update">odc-stac update</h2>
<p>I was reading another <a href="https://github.com/microsoft/PlanetaryComputerExamples/blob/main/datasets/landsat-c2/landsat-c2-example.ipynb">official tutorial on reading Landsat data from MPC</a> and I saw how handy it was going to be to have odc-stac installed. I also realized that my previous issue with installing <a href="https://odc-stac.readthedocs.io/en/latest/">odc-stac</a> was almost certainly caused by my own poor choices in mixing pip and conda installs. I just added odc-stac to my conda env (see Part 1) with:</p>
<pre><code>conda install -c conda-forge odc-stac</code></pre>
<p>and everything still seems to be working just fine. Now we can simply use the code (with a few changes) from the original tutorial to load the raster file and crop the image. I added in the normalizing code I wrote earlier in this tutorial. I’m also folding in some code from the tutorial I just mentioned that adds some additional bands to the mix.</p>
<div class="cell" data-execution_count="46">
<div class="sourceCode cell-code" id="cb70" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb70-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> odc.stac</span>
<span id="cb70-2"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> crop_landsat_image_odcstac(item, bounding_box, normalize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>):</span>
<span id="cb70-3">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">"""</span></span>
<span id="cb70-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    Given a STAC item from Landsat and a bounding box tuple in the format</span></span>
<span id="cb70-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    (minx, miny, maxx, maxy), return a cropped portion of the item's visual</span></span>
<span id="cb70-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    imagery in the bounding box.</span></span>
<span id="cb70-7"></span>
<span id="cb70-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    Returns the image as an xarray Dataset with one variable per band</span></span>
<span id="cb70-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    """</span></span>
<span id="cb70-10">    (minx, miny, maxx, maxy) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> bounding_box</span>
<span id="cb70-11"></span>
<span id="cb70-12">    bands_of_interest <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"nir08"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"red"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"green"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"blue"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"qa_pixel"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lwir11"</span>]</span>
<span id="cb70-13">    image <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> odc.stac.stac_load(</span>
<span id="cb70-14">        [pc.sign(item)], bands<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>bands_of_interest, bbox<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[minx, miny, maxx, maxy]</span>
<span id="cb70-15">    ).isel(time<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)</span>
<span id="cb70-16">    </span>
<span id="cb70-17">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> normalize:</span>
<span id="cb70-18">        <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> band <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'red'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'green'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'blue'</span>]:</span>
<span id="cb70-19">            min_of_band <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> image[band].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">min</span>(dim<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'x'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'y'</span>]).values</span>
<span id="cb70-20">            max_of_band <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> image[band].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">max</span>(dim<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'x'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'y'</span>]).values</span>
<span id="cb70-21">            range_of_band <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> max_of_band <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> min_of_band</span>
<span id="cb70-22">            image[band].values <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>  np.around(np.array([<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">255</span>]) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> (image[band].values <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> min_of_band) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> range_of_band).astype(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'int'</span>)   </span>
<span id="cb70-23"></span>
<span id="cb70-24">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> image</span></code></pre></div>
</div>
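<p>The <code>normalize</code> branch is a per-band min-max rescaling to the 0-255 range. Here’s a small sketch of the same arithmetic on a plain numpy array. The values are invented for illustration and are not from the actual scene.</p>

```python
import numpy as np

# Hypothetical raw band values (illustrative only)
raw_band = np.array([[7000, 8000], [10000, 12000]])

min_of_band = raw_band.min()
max_of_band = raw_band.max()
range_of_band = max_of_band - min_of_band

# Map the band minimum to 0 and the band maximum to 255, then round to integers
scaled_band = np.around(255 * (raw_band - min_of_band) / range_of_band).astype('int')

print(scaled_band.tolist())  # [[0, 51], [153, 255]]
```

<p>Note that a band with a single constant value has a range of zero, so this formula would divide by zero. The sketch doesn’t guard against that case.</p>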
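<p>To plot the unscaled image, we need to pass the <code>robust=True</code> argument to xarray’s <code>plot.imshow</code>. It sets the color limits from the 2nd and 98th percentiles of the data instead of the extremes. Without it, matplotlib clips RGB values that fall outside the valid range (0-255 for integers, 0-1 for floats).</p>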
<p>See <a href="https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.imshow.html#matplotlib.pyplot.imshow">this page in the matplotlib docs</a>.</p>
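<p>For a rough picture of what <code>robust=True</code> does: the xarray docs describe it as computing the color range from the 2nd and 98th percentiles rather than the extreme values. The sketch below imitates that idea with numpy. The <code>robust_rescale</code> helper and the random data are my own stand-ins, not xarray’s actual implementation.</p>

```python
import numpy as np

def robust_rescale(values, lower_pct=2.0, upper_pct=98.0):
    """Rescale values to 0-1 using percentile limits, clipping the tails."""
    vmin = np.nanpercentile(values, lower_pct)
    vmax = np.nanpercentile(values, upper_pct)
    scaled = (values.astype("float64") - vmin) / (vmax - vmin)
    return np.clip(scaled, 0.0, 1.0)

# Hypothetical unscaled band with one extreme outlier pixel
rng = np.random.default_rng(0)
raw = rng.integers(7000, 12000, size=(50, 50))
raw[0, 0] = 60000

stretched = robust_rescale(raw)
print(stretched.min(), stretched.max())
```

<p>Because the limits come from percentiles, the single outlier pixel gets clipped instead of compressing the rest of the image into a narrow, dark range.</p>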
<div class="cell" data-execution_count="47">
<div class="sourceCode cell-code" id="cb71" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb71-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># we'll use the same cropped area as above</span></span>
<span id="cb71-2">landsat_image_array_odcstac <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> crop_landsat_image_odcstac(best_item_landsat.item_obj, bbox, normalize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>)</span>
<span id="cb71-3">fig, ax <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> plt.subplots(figsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>))</span>
<span id="cb71-4">landsat_image_array_odcstac[[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"red"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"green"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"blue"</span>]].to_array().plot.imshow(robust<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>, ax<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>ax)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="47">
<pre><code>&lt;matplotlib.image.AxesImage at 0x7ff7ef318040&gt;</code></pre>
</div>
<div class="cell-output cell-output-display">
<p><img src="https://bitsofanalytics.org/posts/algaebloom-part3/index_files/figure-html/cell-48-output-2.png" class="img-fluid"></p>
</div>
</div>
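<p>Here’s the normalized version. Since the values have already been rescaled to 0-255, we can drop the <code>robust=True</code> argument.</p>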
<div class="cell" data-execution_count="48">
<div class="sourceCode cell-code" id="cb73" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb73-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># we'll use the same cropped area as above</span></span>
<span id="cb73-2">landsat_image_array_odcstac <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> crop_landsat_image_odcstac(best_item_landsat.item_obj, bbox, normalize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb73-3">fig, ax <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> plt.subplots(figsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>))</span>
<span id="cb73-4">landsat_image_array_odcstac[[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"red"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"green"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"blue"</span>]].to_array().plot.imshow(ax<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>ax)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="48">
<pre><code>&lt;matplotlib.image.AxesImage at 0x7ff7ef2c9820&gt;</code></pre>
</div>
<div class="cell-output cell-output-display">
<p><img src="https://bitsofanalytics.org/posts/algaebloom-part3/index_files/figure-html/cell-49-output-2.png" class="img-fluid"></p>
</div>
</div>
</section>
<section id="ndvi-and-surface-temperature" class="level2">
<h2 class="anchored" data-anchor-id="ndvi-and-surface-temperature">NDVI and Surface Temperature</h2>
<p>Finally, since we loaded some additional bands, we can compute derived indices such as NDVI and display them. From the MPC tutorial mentioned above:</p>
<blockquote class="blockquote">
<p>Landsat has several bands, and with them we can go beyond rendering natural color imagery; for example, the following code computes a <a href="https://en.wikipedia.org/wiki/Normalized_difference_vegetation_index">Normalized Difference Vegetation Index (NDVI)</a> using the near-infrared and red bands. Note that we convert the red and near infrared bands to a data type that can contain negative values; if this is not done, negative NDVI values will be incorrectly stored.</p>
</blockquote>
<div class="cell" data-execution_count="49">
<div class="sourceCode cell-code" id="cb75" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb75-1">red <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> landsat_image_array_odcstac[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"red"</span>].astype(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"float"</span>)</span>
<span id="cb75-2">nir <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> landsat_image_array_odcstac[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"nir08"</span>].astype(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"float"</span>)</span>
<span id="cb75-3">ndvi <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (nir <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> red) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> (nir <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> red)</span>
<span id="cb75-4"></span>
<span id="cb75-5">fig, ax <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> plt.subplots(figsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>))</span>
<span id="cb75-6">ndvi.plot.imshow(ax<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>ax, cmap<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"viridis"</span>)</span>
<span id="cb75-7">ax.set_title(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"NDVI, Thunder Bay (MI)"</span>)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span></code></pre></div>
<div class="cell-output cell-output-display">
<p><img src="https://bitsofanalytics.org/posts/algaebloom-part3/index_files/figure-html/cell-50-output-1.png" class="img-fluid"></p>
</div>
</div>
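<p>The data type warning in the quoted tutorial text is easy to reproduce. The sketch below uses made-up <code>uint16</code> values (not taken from the scene above) to show that unsigned subtraction wraps around instead of going negative, and that casting to float first keeps NDVI within its expected range of -1 to 1.</p>

```python
import numpy as np

# Toy band values (made up for illustration), stored as unsigned 16-bit integers
red = np.array([900, 1200], dtype="uint16")
nir = np.array([300, 2400], dtype="uint16")

# Unsigned subtraction wraps around: 300 - 900 becomes 65536 - 600
wrapped = nir - red

# Casting to float first preserves the sign of the difference
red_f = red.astype("float64")
nir_f = nir.astype("float64")
ndvi = (nir_f - red_f) / (nir_f + red_f)

print(wrapped)
print(ndvi)
```

<p>The first pixel has more red than near-infrared reflectance, so its NDVI should be negative. Only the float version gets that right.</p>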
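<p>What if we wanted to add <code>ndvi</code> to the xarray <code>Dataset</code> as a new data variable? One caution when reading the output below: <code>red</code> holds values rescaled to 0-255 while <code>nir08</code> is still in its raw <code>uint16</code> units, so the resulting NDVI values (all close to 1) reflect that mismatch in scale rather than actual vegetation. For a meaningful NDVI, both bands should be on the same scale.</p>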
<div class="cell" data-execution_count="50">
<div class="sourceCode cell-code" id="cb76" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb76-1">landsat_image_array_odcstac[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ndvi"</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ndvi</span>
<span id="cb76-2">landsat_image_array_odcstac</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="50">

<div><svg style="position: absolute; width: 0; height: 0; overflow: hidden">
<defs>
<symbol id="icon-database" viewbox="0 0 32 32">
<path d="M16 0c-8.837 0-16 2.239-16 5v4c0 2.761 7.163 5 16 5s16-2.239 16-5v-4c0-2.761-7.163-5-16-5z"></path>
<path d="M16 17c-8.837 0-16-2.239-16-5v6c0 2.761 7.163 5 16 5s16-2.239 16-5v-6c0 2.761-7.163 5-16 5z"></path>
<path d="M16 26c-8.837 0-16-2.239-16-5v6c0 2.761 7.163 5 16 5s16-2.239 16-5v-6c0 2.761-7.163 5-16 5z"></path>
</symbol>
<symbol id="icon-file-text2" viewbox="0 0 32 32">
<path d="M28.681 7.159c-0.694-0.947-1.662-2.053-2.724-3.116s-2.169-2.030-3.116-2.724c-1.612-1.182-2.393-1.319-2.841-1.319h-15.5c-1.378 0-2.5 1.121-2.5 2.5v27c0 1.378 1.122 2.5 2.5 2.5h23c1.378 0 2.5-1.122 2.5-2.5v-19.5c0-0.448-0.137-1.23-1.319-2.841zM24.543 5.457c0.959 0.959 1.712 1.825 2.268 2.543h-4.811v-4.811c0.718 0.556 1.584 1.309 2.543 2.268zM28 29.5c0 0.271-0.229 0.5-0.5 0.5h-23c-0.271 0-0.5-0.229-0.5-0.5v-27c0-0.271 0.229-0.5 0.5-0.5 0 0 15.499-0 15.5 0v7c0 0.552 0.448 1 1 1h7v19.5z"></path>
<path d="M23 26h-14c-0.552 0-1-0.448-1-1s0.448-1 1-1h14c0.552 0 1 0.448 1 1s-0.448 1-1 1z"></path>
<path d="M23 22h-14c-0.552 0-1-0.448-1-1s0.448-1 1-1h14c0.552 0 1 0.448 1 1s-0.448 1-1 1z"></path>
<path d="M23 18h-14c-0.552 0-1-0.448-1-1s0.448-1 1-1h14c0.552 0 1 0.448 1 1s-0.448 1-1 1z"></path>
</symbol>
</defs>
</svg>
<style>/* CSS stylesheet for displaying xarray objects in jupyterlab.
 *
 */

:root {
  --xr-font-color0: var(--jp-content-font-color0, rgba(0, 0, 0, 1));
  --xr-font-color2: var(--jp-content-font-color2, rgba(0, 0, 0, 0.54));
  --xr-font-color3: var(--jp-content-font-color3, rgba(0, 0, 0, 0.38));
  --xr-border-color: var(--jp-border-color2, #e0e0e0);
  --xr-disabled-color: var(--jp-layout-color3, #bdbdbd);
  --xr-background-color: var(--jp-layout-color0, white);
  --xr-background-color-row-even: var(--jp-layout-color1, white);
  --xr-background-color-row-odd: var(--jp-layout-color2, #eeeeee);
}

html[theme=dark],
body[data-theme=dark],
body.vscode-dark {
  --xr-font-color0: rgba(255, 255, 255, 1);
  --xr-font-color2: rgba(255, 255, 255, 0.54);
  --xr-font-color3: rgba(255, 255, 255, 0.38);
  --xr-border-color: #1F1F1F;
  --xr-disabled-color: #515151;
  --xr-background-color: #111111;
  --xr-background-color-row-even: #111111;
  --xr-background-color-row-odd: #313131;
}

.xr-wrap {
  display: block !important;
  min-width: 300px;
  max-width: 700px;
}

.xr-text-repr-fallback {
  /* fallback to plain text repr when CSS is not injected (untrusted notebook) */
  display: none;
}

.xr-header {
  padding-top: 6px;
  padding-bottom: 6px;
  margin-bottom: 4px;
  border-bottom: solid 1px var(--xr-border-color);
}

.xr-header > div,
.xr-header > ul {
  display: inline;
  margin-top: 0;
  margin-bottom: 0;
}

.xr-obj-type,
.xr-array-name {
  margin-left: 2px;
  margin-right: 10px;
}

.xr-obj-type {
  color: var(--xr-font-color2);
}

.xr-sections {
  padding-left: 0 !important;
  display: grid;
  grid-template-columns: 150px auto auto 1fr 20px 20px;
}

.xr-section-item {
  display: contents;
}

.xr-section-item input {
  display: none;
}

.xr-section-item input + label {
  color: var(--xr-disabled-color);
}

.xr-section-item input:enabled + label {
  cursor: pointer;
  color: var(--xr-font-color2);
}

.xr-section-item input:enabled + label:hover {
  color: var(--xr-font-color0);
}

.xr-section-summary {
  grid-column: 1;
  color: var(--xr-font-color2);
  font-weight: 500;
}

.xr-section-summary > span {
  display: inline-block;
  padding-left: 0.5em;
}

.xr-section-summary-in:disabled + label {
  color: var(--xr-font-color2);
}

.xr-section-summary-in + label:before {
  display: inline-block;
  content: '►';
  font-size: 11px;
  width: 15px;
  text-align: center;
}

.xr-section-summary-in:disabled + label:before {
  color: var(--xr-disabled-color);
}

.xr-section-summary-in:checked + label:before {
  content: '▼';
}

.xr-section-summary-in:checked + label > span {
  display: none;
}

.xr-section-summary,
.xr-section-inline-details {
  padding-top: 4px;
  padding-bottom: 4px;
}

.xr-section-inline-details {
  grid-column: 2 / -1;
}

.xr-section-details {
  display: none;
  grid-column: 1 / -1;
  margin-bottom: 5px;
}

.xr-section-summary-in:checked ~ .xr-section-details {
  display: contents;
}

.xr-array-wrap {
  grid-column: 1 / -1;
  display: grid;
  grid-template-columns: 20px auto;
}

.xr-array-wrap > label {
  grid-column: 1;
  vertical-align: top;
}

.xr-preview {
  color: var(--xr-font-color3);
}

.xr-array-preview,
.xr-array-data {
  padding: 0 5px !important;
  grid-column: 2;
}

.xr-array-data,
.xr-array-in:checked ~ .xr-array-preview {
  display: none;
}

.xr-array-in:checked ~ .xr-array-data,
.xr-array-preview {
  display: inline-block;
}

.xr-dim-list {
  display: inline-block !important;
  list-style: none;
  padding: 0 !important;
  margin: 0;
}

.xr-dim-list li {
  display: inline-block;
  padding: 0;
  margin: 0;
}

.xr-dim-list:before {
  content: '(';
}

.xr-dim-list:after {
  content: ')';
}

.xr-dim-list li:not(:last-child):after {
  content: ',';
  padding-right: 5px;
}

.xr-has-index {
  font-weight: bold;
}

.xr-var-list,
.xr-var-item {
  display: contents;
}

.xr-var-item > div,
.xr-var-item label,
.xr-var-item > .xr-var-name span {
  background-color: var(--xr-background-color-row-even);
  margin-bottom: 0;
}

.xr-var-item > .xr-var-name:hover span {
  padding-right: 5px;
}

.xr-var-list > li:nth-child(odd) > div,
.xr-var-list > li:nth-child(odd) > label,
.xr-var-list > li:nth-child(odd) > .xr-var-name span {
  background-color: var(--xr-background-color-row-odd);
}

.xr-var-name {
  grid-column: 1;
}

.xr-var-dims {
  grid-column: 2;
}

.xr-var-dtype {
  grid-column: 3;
  text-align: right;
  color: var(--xr-font-color2);
}

.xr-var-preview {
  grid-column: 4;
}

.xr-index-preview {
  grid-column: 2 / 5;
  color: var(--xr-font-color2);
}

.xr-var-name,
.xr-var-dims,
.xr-var-dtype,
.xr-preview,
.xr-attrs dt {
  white-space: nowrap;
  overflow: hidden;
  text-overflow: ellipsis;
  padding-right: 10px;
}

.xr-var-name:hover,
.xr-var-dims:hover,
.xr-var-dtype:hover,
.xr-attrs dt:hover {
  overflow: visible;
  width: auto;
  z-index: 1;
}

.xr-var-attrs,
.xr-var-data,
.xr-index-data {
  display: none;
  background-color: var(--xr-background-color) !important;
  padding-bottom: 5px !important;
}

.xr-var-attrs-in:checked ~ .xr-var-attrs,
.xr-var-data-in:checked ~ .xr-var-data,
.xr-index-data-in:checked ~ .xr-index-data {
  display: block;
}

.xr-var-data > table {
  float: right;
}

.xr-var-name span,
.xr-var-data,
.xr-index-name div,
.xr-index-data,
.xr-attrs {
  padding-left: 25px !important;
}

.xr-attrs,
.xr-var-attrs,
.xr-var-data,
.xr-index-data {
  grid-column: 1 / -1;
}

dl.xr-attrs {
  padding: 0;
  margin: 0;
  display: grid;
  grid-template-columns: 125px auto;
}

.xr-attrs dt,
.xr-attrs dd {
  padding: 0;
  margin: 0;
  float: left;
  padding-right: 10px;
  width: auto;
}

.xr-attrs dt {
  font-weight: normal;
  grid-column: 1;
}

.xr-attrs dt:hover span {
  display: inline-block;
  background: var(--xr-background-color);
  padding-right: 10px;
}

.xr-attrs dd {
  grid-column: 2;
  white-space: pre-wrap;
  word-break: break-all;
}

.xr-icon-database,
.xr-icon-file-text2,
.xr-no-icon {
  display: inline-block;
  vertical-align: middle;
  width: 1em;
  height: 1.5em !important;
  stroke-width: 0;
  stroke: currentColor;
  fill: currentColor;
}
</style><pre class="xr-text-repr-fallback">&lt;xarray.Dataset&gt;
Dimensions:      (y: 210, x: 210)
Coordinates:
  * y            (y) float64 4.997e+06 4.997e+06 ... 4.991e+06 4.991e+06
  * x            (x) float64 7.881e+05 7.881e+05 ... 7.943e+05 7.943e+05
    spatial_ref  int32 32616
    time         datetime64[ns] 2022-06-11T16:22:04.584079
Data variables:
    nir08        (y, x) uint16 16636 16749 16906 16423 ... 9239 9247 9233 9219
    red          (y, x) int64 33 34 34 35 38 40 40 40 ... 38 37 37 36 37 37 36
    green        (y, x) int64 32 33 34 36 39 39 41 41 ... 32 32 29 29 29 29 29
    blue         (y, x) int64 43 47 45 49 52 53 55 54 ... 73 72 70 69 68 66 66
    qa_pixel     (y, x) uint16 21824 21824 21824 21824 ... 22280 22280 22280
    lwir11       (y, x) uint16 39453 39482 39491 39496 ... 38820 38829 38837
    ndvi         (y, x) float64 0.996 0.9959 0.996 0.9957 ... 0.992 0.992 0.9922</pre><div class="xr-wrap" style="display:none"><div class="xr-header"><div class="xr-obj-type">xarray.Dataset</div></div><ul class="xr-sections"><li class="xr-section-item"><input id="section-6a58828e-bc9f-4589-bf95-e1743674649a" class="xr-section-summary-in" type="checkbox" disabled=""><label for="section-6a58828e-bc9f-4589-bf95-e1743674649a" class="xr-section-summary" title="Expand/collapse section">Dimensions:</label><div class="xr-section-inline-details"><ul class="xr-dim-list"><li><span class="xr-has-index">y</span>: 210</li><li><span class="xr-has-index">x</span>: 210</li></ul></div><div class="xr-section-details"></div></li><li class="xr-section-item"><input id="section-57d94c26-d283-4079-9a8c-34d937d37064" class="xr-section-summary-in" type="checkbox" checked=""><label for="section-57d94c26-d283-4079-9a8c-34d937d37064" class="xr-section-summary">Coordinates: <span>(4)</span></label><div class="xr-section-inline-details"></div><div class="xr-section-details"><ul class="xr-var-list"><li class="xr-var-item"><div class="xr-var-name"><span class="xr-has-index">y</span></div><div class="xr-var-dims">(y)</div><div class="xr-var-dtype">float64</div><div class="xr-var-preview xr-preview">4.997e+06 4.997e+06 ... 
4.991e+06</div><input id="attrs-8f5e0e83-9410-4299-b245-780ec46d5a2a" class="xr-var-attrs-in" type="checkbox"><label for="attrs-8f5e0e83-9410-4299-b245-780ec46d5a2a" title="Show/Hide attributes"><svg class="icon xr-icon-file-text2"><use href="#icon-file-text2"></use></svg></label><input id="data-d4691103-1a7a-4595-af25-d7dd31869e39" class="xr-var-data-in" type="checkbox"><label for="data-d4691103-1a7a-4595-af25-d7dd31869e39" title="Show/Hide data repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-var-attrs"><dl class="xr-attrs"><dt><span>units :</span></dt><dd>metre</dd><dt><span>resolution :</span></dt><dd>-30.0</dd><dt><span>crs :</span></dt><dd>EPSG:32616</dd></dl></div><div class="xr-var-data"><pre>array([4997145., 4997115., 4997085., ..., 4990935., 4990905., 4990875.])</pre></div></li><li class="xr-var-item"><div class="xr-var-name"><span class="xr-has-index">x</span></div><div class="xr-var-dims">(x)</div><div class="xr-var-dtype">float64</div><div class="xr-var-preview xr-preview">7.881e+05 7.881e+05 ... 
7.943e+05</div><input id="attrs-90e8d463-20df-4d7b-92c2-bbef97d903f2" class="xr-var-attrs-in" type="checkbox"><label for="attrs-90e8d463-20df-4d7b-92c2-bbef97d903f2" title="Show/Hide attributes"><svg class="icon xr-icon-file-text2"><use href="#icon-file-text2"></use></svg></label><input id="data-79088250-f214-4bd9-9919-39aff4f299cf" class="xr-var-data-in" type="checkbox"><label for="data-79088250-f214-4bd9-9919-39aff4f299cf" title="Show/Hide data repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-var-attrs"><dl class="xr-attrs"><dt><span>units :</span></dt><dd>metre</dd><dt><span>resolution :</span></dt><dd>30.0</dd><dt><span>crs :</span></dt><dd>EPSG:32616</dd></dl></div><div class="xr-var-data"><pre>array([788055., 788085., 788115., ..., 794265., 794295., 794325.])</pre></div></li><li class="xr-var-item"><div class="xr-var-name"><span>spatial_ref</span></div><div class="xr-var-dims">()</div><div class="xr-var-dtype">int32</div><div class="xr-var-preview xr-preview">32616</div><input id="attrs-0e686cc9-72ff-4ad0-8c54-04e141f3811d" class="xr-var-attrs-in" type="checkbox"><label for="attrs-0e686cc9-72ff-4ad0-8c54-04e141f3811d" title="Show/Hide attributes"><svg class="icon xr-icon-file-text2"><use href="#icon-file-text2"></use></svg></label><input id="data-75c72a21-361a-402b-825c-f9e7f8f9e284" class="xr-var-data-in" type="checkbox"><label for="data-75c72a21-361a-402b-825c-f9e7f8f9e284" title="Show/Hide data repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-var-attrs"><dl class="xr-attrs"><dt><span>spatial_ref :</span></dt><dd>PROJCRS["WGS 84 / UTM zone 16N",BASEGEOGCRS["WGS 84",ENSEMBLE["World Geodetic System 1984 ensemble",MEMBER["World Geodetic System 1984 (Transit)"],MEMBER["World Geodetic System 1984 (G730)"],MEMBER["World Geodetic System 1984 (G873)"],MEMBER["World Geodetic System 1984 (G1150)"],MEMBER["World Geodetic System 1984 (G1674)"],MEMBER["World 
Geodetic System 1984 (G1762)"],MEMBER["World Geodetic System 1984 (G2139)"],ELLIPSOID["WGS 84",6378137,298.257223563,LENGTHUNIT["metre",1]],ENSEMBLEACCURACY[2.0]],PRIMEM["Greenwich",0,ANGLEUNIT["degree",0.0174532925199433]],ID["EPSG",4326]],CONVERSION["UTM zone 16N",METHOD["Transverse Mercator",ID["EPSG",9807]],PARAMETER["Latitude of natural origin",0,ANGLEUNIT["degree",0.0174532925199433],ID["EPSG",8801]],PARAMETER["Longitude of natural origin",-87,ANGLEUNIT["degree",0.0174532925199433],ID["EPSG",8802]],PARAMETER["Scale factor at natural origin",0.9996,SCALEUNIT["unity",1],ID["EPSG",8805]],PARAMETER["False easting",500000,LENGTHUNIT["metre",1],ID["EPSG",8806]],PARAMETER["False northing",0,LENGTHUNIT["metre",1],ID["EPSG",8807]]],CS[Cartesian,2],AXIS["(E)",east,ORDER[1],LENGTHUNIT["metre",1]],AXIS["(N)",north,ORDER[2],LENGTHUNIT["metre",1]],USAGE[SCOPE["Engineering survey, topographic mapping."],AREA["Between 90°W and 84°W, northern hemisphere between equator and 84°N, onshore and offshore. Belize. Canada - Manitoba; Nunavut; Ontario. Costa Rica. Cuba. Ecuador - Galapagos. El Salvador. Guatemala. Honduras. Mexico. Nicaragua. 
United States (USA)."],BBOX[0,-90,84,-84]],ID["EPSG",32616]]</dd><dt><span>crs_wkt :</span></dt><dd>PROJCRS["WGS 84 / UTM zone 16N",BASEGEOGCRS["WGS 84",ENSEMBLE["World Geodetic System 1984 ensemble",MEMBER["World Geodetic System 1984 (Transit)"],MEMBER["World Geodetic System 1984 (G730)"],MEMBER["World Geodetic System 1984 (G873)"],MEMBER["World Geodetic System 1984 (G1150)"],MEMBER["World Geodetic System 1984 (G1674)"],MEMBER["World Geodetic System 1984 (G1762)"],MEMBER["World Geodetic System 1984 (G2139)"],ELLIPSOID["WGS 84",6378137,298.257223563,LENGTHUNIT["metre",1]],ENSEMBLEACCURACY[2.0]],PRIMEM["Greenwich",0,ANGLEUNIT["degree",0.0174532925199433]],ID["EPSG",4326]],CONVERSION["UTM zone 16N",METHOD["Transverse Mercator",ID["EPSG",9807]],PARAMETER["Latitude of natural origin",0,ANGLEUNIT["degree",0.0174532925199433],ID["EPSG",8801]],PARAMETER["Longitude of natural origin",-87,ANGLEUNIT["degree",0.0174532925199433],ID["EPSG",8802]],PARAMETER["Scale factor at natural origin",0.9996,SCALEUNIT["unity",1],ID["EPSG",8805]],PARAMETER["False easting",500000,LENGTHUNIT["metre",1],ID["EPSG",8806]],PARAMETER["False northing",0,LENGTHUNIT["metre",1],ID["EPSG",8807]]],CS[Cartesian,2],AXIS["(E)",east,ORDER[1],LENGTHUNIT["metre",1]],AXIS["(N)",north,ORDER[2],LENGTHUNIT["metre",1]],USAGE[SCOPE["Engineering survey, topographic mapping."],AREA["Between 90°W and 84°W, northern hemisphere between equator and 84°N, onshore and offshore. Belize. Canada - Manitoba; Nunavut; Ontario. Costa Rica. Cuba. Ecuador - Galapagos. El Salvador. Guatemala. Honduras. Mexico. Nicaragua. 
United States (USA)."],BBOX[0,-90,84,-84]],ID["EPSG",32616]]</dd><dt><span>semi_major_axis :</span></dt><dd>6378137.0</dd><dt><span>semi_minor_axis :</span></dt><dd>6356752.314245179</dd><dt><span>inverse_flattening :</span></dt><dd>298.257223563</dd><dt><span>reference_ellipsoid_name :</span></dt><dd>WGS 84</dd><dt><span>longitude_of_prime_meridian :</span></dt><dd>0.0</dd><dt><span>prime_meridian_name :</span></dt><dd>Greenwich</dd><dt><span>geographic_crs_name :</span></dt><dd>WGS 84</dd><dt><span>horizontal_datum_name :</span></dt><dd>World Geodetic System 1984 ensemble</dd><dt><span>projected_crs_name :</span></dt><dd>WGS 84 / UTM zone 16N</dd><dt><span>grid_mapping_name :</span></dt><dd>transverse_mercator</dd><dt><span>latitude_of_projection_origin :</span></dt><dd>0.0</dd><dt><span>longitude_of_central_meridian :</span></dt><dd>-87.0</dd><dt><span>false_easting :</span></dt><dd>500000.0</dd><dt><span>false_northing :</span></dt><dd>0.0</dd><dt><span>scale_factor_at_central_meridian :</span></dt><dd>0.9996</dd></dl></div><div class="xr-var-data"><pre>array(32616, dtype=int32)</pre></div></li><li class="xr-var-item"><div class="xr-var-name"><span>time</span></div><div class="xr-var-dims">()</div><div class="xr-var-dtype">datetime64[ns]</div><div class="xr-var-preview xr-preview">2022-06-11T16:22:04.584079</div><input id="attrs-5c41f103-ab73-41ed-9735-43cbd65106ef" class="xr-var-attrs-in" type="checkbox" disabled=""><label for="attrs-5c41f103-ab73-41ed-9735-43cbd65106ef" title="Show/Hide attributes"><svg class="icon xr-icon-file-text2"><use href="#icon-file-text2"></use></svg></label><input id="data-9f5627ba-9e0c-45b2-94b3-6baf44004305" class="xr-var-data-in" type="checkbox"><label for="data-9f5627ba-9e0c-45b2-94b3-6baf44004305" title="Show/Hide data repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-var-attrs"><dl class="xr-attrs"></dl></div><div 
class="xr-var-data"><pre>array('2022-06-11T16:22:04.584079000', dtype='datetime64[ns]')</pre></div></li></ul></div></li><li class="xr-section-item"><input id="section-9bf8849f-d394-4d7c-850a-fa5fa17f19c7" class="xr-section-summary-in" type="checkbox" checked=""><label for="section-9bf8849f-d394-4d7c-850a-fa5fa17f19c7" class="xr-section-summary">Data variables: <span>(7)</span></label><div class="xr-section-inline-details"></div><div class="xr-section-details"><ul class="xr-var-list"><li class="xr-var-item"><div class="xr-var-name"><span>nir08</span></div><div class="xr-var-dims">(y, x)</div><div class="xr-var-dtype">uint16</div><div class="xr-var-preview xr-preview">16636 16749 16906 ... 9233 9219</div><input id="attrs-b5295db9-9dc9-4b62-8445-bb560127bb19" class="xr-var-attrs-in" type="checkbox"><label for="attrs-b5295db9-9dc9-4b62-8445-bb560127bb19" title="Show/Hide attributes"><svg class="icon xr-icon-file-text2"><use href="#icon-file-text2"></use></svg></label><input id="data-06ff78b5-b33d-4d85-9cbe-8bd3e377d084" class="xr-var-data-in" type="checkbox"><label for="data-06ff78b5-b33d-4d85-9cbe-8bd3e377d084" title="Show/Hide data repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-var-attrs"><dl class="xr-attrs"><dt><span>nodata :</span></dt><dd>0</dd></dl></div><div class="xr-var-data"><pre>array([[16636, 16749, 16906, ...,  8536,  8521,  8502],
       [17144, 17292, 17315, ...,  8538,  8537,  8517],
       [17331, 17437, 16964, ...,  8574,  8597,  8541],
       ...,
       [ 9111,  9094,  9205, ...,  9298,  9273,  9279],
       [ 9111,  9079,  9165, ...,  9333,  9263,  9247],
       [ 9160,  9123,  9186, ...,  9247,  9233,  9219]], dtype=uint16)</pre></div></li><li class="xr-var-item"><div class="xr-var-name"><span>red</span></div><div class="xr-var-dims">(y, x)</div><div class="xr-var-dtype">int64</div><div class="xr-var-preview xr-preview">33 34 34 35 38 ... 37 36 37 37 36</div><input id="attrs-1983f6c9-51d7-45b0-95d7-3e5fdc71f531" class="xr-var-attrs-in" type="checkbox"><label for="attrs-1983f6c9-51d7-45b0-95d7-3e5fdc71f531" title="Show/Hide attributes"><svg class="icon xr-icon-file-text2"><use href="#icon-file-text2"></use></svg></label><input id="data-d097f64f-dc47-4e06-8ac9-9297b2bd8f53" class="xr-var-data-in" type="checkbox"><label for="data-d097f64f-dc47-4e06-8ac9-9297b2bd8f53" title="Show/Hide data repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-var-attrs"><dl class="xr-attrs"><dt><span>nodata :</span></dt><dd>0</dd></dl></div><div class="xr-var-data"><pre>array([[33, 34, 34, ...,  9,  9, 10],
       [33, 33, 35, ...,  9,  9, 10],
       [33, 33, 35, ...,  9, 10, 10],
       ...,
       [29, 29, 33, ..., 37, 37, 38],
       [29, 28, 32, ..., 39, 37, 36],
       [30, 30, 32, ..., 37, 37, 36]])</pre></div></li><li class="xr-var-item"><div class="xr-var-name"><span>green</span></div><div class="xr-var-dims">(y, x)</div><div class="xr-var-dtype">int64</div><div class="xr-var-preview xr-preview">32 33 34 36 39 ... 29 29 29 29 29</div><input id="attrs-33ec3220-4360-4153-9708-d00a3306e369" class="xr-var-attrs-in" type="checkbox"><label for="attrs-33ec3220-4360-4153-9708-d00a3306e369" title="Show/Hide attributes"><svg class="icon xr-icon-file-text2"><use href="#icon-file-text2"></use></svg></label><input id="data-b170a77d-1a2f-464f-ba3f-b110ed6c2ff7" class="xr-var-data-in" type="checkbox"><label for="data-b170a77d-1a2f-464f-ba3f-b110ed6c2ff7" title="Show/Hide data repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-var-attrs"><dl class="xr-attrs"><dt><span>nodata :</span></dt><dd>0</dd></dl></div><div class="xr-var-data"><pre>array([[32, 33, 34, ..., 10, 10,  9],
       [32, 32, 35, ..., 10,  9,  7],
       [33, 33, 34, ..., 10, 11, 11],
       ...,
       [15, 15, 18, ..., 31, 30, 32],
       [14, 14, 17, ..., 32, 30, 30],
       [16, 16, 17, ..., 29, 29, 29]])</pre></div></li><li class="xr-var-item"><div class="xr-var-name"><span>blue</span></div><div class="xr-var-dims">(y, x)</div><div class="xr-var-dtype">int64</div><div class="xr-var-preview xr-preview">43 47 45 49 52 ... 70 69 68 66 66</div><input id="attrs-83c5c71f-73f6-451e-88c6-c6495f868a9b" class="xr-var-attrs-in" type="checkbox"><label for="attrs-83c5c71f-73f6-451e-88c6-c6495f868a9b" title="Show/Hide attributes"><svg class="icon xr-icon-file-text2"><use href="#icon-file-text2"></use></svg></label><input id="data-bb3be79b-5ba0-46a4-af24-92c9139afae0" class="xr-var-data-in" type="checkbox"><label for="data-bb3be79b-5ba0-46a4-af24-92c9139afae0" title="Show/Hide data repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-var-attrs"><dl class="xr-attrs"><dt><span>nodata :</span></dt><dd>0</dd></dl></div><div class="xr-var-data"><pre>array([[43, 47, 45, ..., 19, 21, 20],
       [43, 43, 44, ..., 19, 22, 20],
       [43, 43, 45, ..., 20, 23, 22],
       ...,
       [55, 55, 60, ..., 67, 69, 69],
       [55, 54, 57, ..., 69, 69, 69],
       [55, 56, 58, ..., 68, 66, 66]])</pre></div></li><li class="xr-var-item"><div class="xr-var-name"><span>qa_pixel</span></div><div class="xr-var-dims">(y, x)</div><div class="xr-var-dtype">uint16</div><div class="xr-var-preview xr-preview">21824 21824 21824 ... 22280 22280</div><input id="attrs-a9c55737-d634-4397-bdbf-16eb23bdc785" class="xr-var-attrs-in" type="checkbox"><label for="attrs-a9c55737-d634-4397-bdbf-16eb23bdc785" title="Show/Hide attributes"><svg class="icon xr-icon-file-text2"><use href="#icon-file-text2"></use></svg></label><input id="data-b3e4aaf5-3344-48c1-a609-cd3d96b8ce7e" class="xr-var-data-in" type="checkbox"><label for="data-b3e4aaf5-3344-48c1-a609-cd3d96b8ce7e" title="Show/Hide data repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-var-attrs"><dl class="xr-attrs"><dt><span>nodata :</span></dt><dd>1</dd></dl></div><div class="xr-var-data"><pre>array([[21824, 21824, 21824, ..., 22280, 22280, 22280],
       [21824, 21824, 21824, ..., 22280, 22280, 22280],
       [21824, 21824, 21824, ..., 22280, 22280, 22280],
       ...,
       [22280, 22280, 22280, ..., 22280, 22280, 22280],
       [22280, 22280, 22280, ..., 22280, 22280, 22280],
       [22280, 22280, 22280, ..., 22280, 22280, 22280]], dtype=uint16)</pre></div></li><li class="xr-var-item"><div class="xr-var-name"><span>lwir11</span></div><div class="xr-var-dims">(y, x)</div><div class="xr-var-dtype">uint16</div><div class="xr-var-preview xr-preview">39453 39482 39491 ... 38829 38837</div><input id="attrs-62acb11e-e677-4747-8dd1-3fd28d8334f0" class="xr-var-attrs-in" type="checkbox"><label for="attrs-62acb11e-e677-4747-8dd1-3fd28d8334f0" title="Show/Hide attributes"><svg class="icon xr-icon-file-text2"><use href="#icon-file-text2"></use></svg></label><input id="data-b521683c-fcee-47ea-b793-0c6a6b9bb70f" class="xr-var-data-in" type="checkbox"><label for="data-b521683c-fcee-47ea-b793-0c6a6b9bb70f" title="Show/Hide data repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-var-attrs"><dl class="xr-attrs"><dt><span>nodata :</span></dt><dd>0</dd></dl></div><div class="xr-var-data"><pre>array([[39453, 39482, 39491, ..., 38376, 38349, 38294],
       [39438, 39433, 39437, ..., 38369, 38341, 38292],
       [39419, 39399, 39390, ..., 38356, 38332, 38294],
       ...,
       [38304, 38354, 38403, ..., 38795, 38818, 38849],
       [38282, 38344, 38397, ..., 38812, 38828, 38847],
       [38242, 38308, 38367, ..., 38820, 38829, 38837]], dtype=uint16)</pre></div></li><li class="xr-var-item"><div class="xr-var-name"><span>ndvi</span></div><div class="xr-var-dims">(y, x)</div><div class="xr-var-dtype">float64</div><div class="xr-var-preview xr-preview">0.996 0.9959 0.996 ... 0.992 0.9922</div><input id="attrs-d852b1ec-7ac6-4a44-9d9a-3bf70363a58c" class="xr-var-attrs-in" type="checkbox" disabled=""><label for="attrs-d852b1ec-7ac6-4a44-9d9a-3bf70363a58c" title="Show/Hide attributes"><svg class="icon xr-icon-file-text2"><use href="#icon-file-text2"></use></svg></label><input id="data-88cfe6bd-e7be-4f35-a6f9-889c9b946eb5" class="xr-var-data-in" type="checkbox"><label for="data-88cfe6bd-e7be-4f35-a6f9-889c9b946eb5" title="Show/Hide data repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-var-attrs"><dl class="xr-attrs"></dl></div><div class="xr-var-data"><pre>array([[0.99604055, 0.99594828, 0.99598583, ..., 0.9978935 , 0.9978898 ,
        0.99765038],
       [0.99615765, 0.99619048, 0.99596542, ..., 0.997894  , 0.99789375,
        0.99765451],
       [0.99619903, 0.9962221 , 0.99588211, ..., 0.99790283, 0.99767631,
        0.99766109],
       ...,
       [0.99365427, 0.99364244, 0.9928556 , ..., 0.99207284, 0.99205156,
        0.99184287],
       [0.99365427, 0.99385088, 0.99304121, ..., 0.99167734, 0.99204301,
        0.99224389],
       [0.99347116, 0.99344477, 0.99305706, ..., 0.9920293 , 0.99201726,
        0.99222042]])</pre></div></li></ul></div></li><li class="xr-section-item"><input id="section-36e5929a-1948-4b9a-a28f-3340fca7ab20" class="xr-section-summary-in" type="checkbox"><label for="section-36e5929a-1948-4b9a-a28f-3340fca7ab20" class="xr-section-summary">Indexes: <span>(2)</span></label><div class="xr-section-inline-details"></div><div class="xr-section-details"><ul class="xr-var-list"><li class="xr-var-item"><div class="xr-index-name"><div>y</div></div><div class="xr-index-preview">PandasIndex</div><div></div><input id="index-4809feee-9958-455c-9019-48411f843943" class="xr-index-data-in" type="checkbox"><label for="index-4809feee-9958-455c-9019-48411f843943" title="Show/Hide index repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-index-data"><pre>PandasIndex(Float64Index([4997145.0, 4997115.0, 4997085.0, 4997055.0, 4997025.0, 4996995.0,
              4996965.0, 4996935.0, 4996905.0, 4996875.0,
              ...
              4991145.0, 4991115.0, 4991085.0, 4991055.0, 4991025.0, 4990995.0,
              4990965.0, 4990935.0, 4990905.0, 4990875.0],
             dtype='float64', name='y', length=210))</pre></div></li><li class="xr-var-item"><div class="xr-index-name"><div>x</div></div><div class="xr-index-preview">PandasIndex</div><div></div><input id="index-94f16ee9-6caf-4fb6-a51d-cbbd73a70561" class="xr-index-data-in" type="checkbox"><label for="index-94f16ee9-6caf-4fb6-a51d-cbbd73a70561" title="Show/Hide index repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-index-data"><pre>PandasIndex(Float64Index([788055.0, 788085.0, 788115.0, 788145.0, 788175.0, 788205.0,
              788235.0, 788265.0, 788295.0, 788325.0,
              ...
              794055.0, 794085.0, 794115.0, 794145.0, 794175.0, 794205.0,
              794235.0, 794265.0, 794295.0, 794325.0],
             dtype='float64', name='x', length=210))</pre></div></li></ul></div></li><li class="xr-section-item"><input id="section-0293e846-0de0-45c3-8ec0-e8ac80a04ada" class="xr-section-summary-in" type="checkbox" disabled=""><label for="section-0293e846-0de0-45c3-8ec0-e8ac80a04ada" class="xr-section-summary" title="Expand/collapse section">Attributes: <span>(0)</span></label><div class="xr-section-inline-details"></div><div class="xr-section-details"><dl class="xr-attrs"></dl></div></li></ul></div></div>
</div>
</div>
<p>Finally, let’s plot surface temperature (available via the <code>lwir11</code> key). According to the MPC tutorial:</p>
<blockquote class="blockquote">
<p>The raw values are rescaled, so you should scale and offset the data before interpreting it. Use the metadata in the asset’s raster_bands to find the scale and offset values:</p>
</blockquote>
<div class="cell" data-execution_count="51">
<div class="sourceCode cell-code" id="cb77" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb77-1">band_info <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> best_item_landsat.item_obj.assets[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lwir11"</span>].extra_fields[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"raster:bands"</span>][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb77-2">band_info</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="51">
<pre><code>{'unit': 'kelvin',
 'scale': 0.00341802,
 'nodata': 0,
 'offset': 149.0,
 'data_type': 'uint16',
 'spatial_resolution': 30}</code></pre>
</div>
</div>
<p>To go from raw values to Kelvin we do this (again, just following the MPC tutorial):</p>
<div class="cell" data-execution_count="52">
<div class="sourceCode cell-code" id="cb79" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb79-1">temperature <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> landsat_image_array_odcstac[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lwir11"</span>].astype(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">float</span>)</span>
<span id="cb79-2">temperature <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*=</span> band_info[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"scale"</span>]</span>
<span id="cb79-3">temperature <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span> band_info[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"offset"</span>]</span>
<span id="cb79-4">temperature[:<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>, :<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>]</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="52">

<div><svg style="position: absolute; width: 0; height: 0; overflow: hidden">
<defs>
<symbol id="icon-database" viewbox="0 0 32 32">
<path d="M16 0c-8.837 0-16 2.239-16 5v4c0 2.761 7.163 5 16 5s16-2.239 16-5v-4c0-2.761-7.163-5-16-5z"></path>
<path d="M16 17c-8.837 0-16-2.239-16-5v6c0 2.761 7.163 5 16 5s16-2.239 16-5v-6c0 2.761-7.163 5-16 5z"></path>
<path d="M16 26c-8.837 0-16-2.239-16-5v6c0 2.761 7.163 5 16 5s16-2.239 16-5v-6c0 2.761-7.163 5-16 5z"></path>
</symbol>
<symbol id="icon-file-text2" viewbox="0 0 32 32">
<path d="M28.681 7.159c-0.694-0.947-1.662-2.053-2.724-3.116s-2.169-2.030-3.116-2.724c-1.612-1.182-2.393-1.319-2.841-1.319h-15.5c-1.378 0-2.5 1.121-2.5 2.5v27c0 1.378 1.122 2.5 2.5 2.5h23c1.378 0 2.5-1.122 2.5-2.5v-19.5c0-0.448-0.137-1.23-1.319-2.841zM24.543 5.457c0.959 0.959 1.712 1.825 2.268 2.543h-4.811v-4.811c0.718 0.556 1.584 1.309 2.543 2.268zM28 29.5c0 0.271-0.229 0.5-0.5 0.5h-23c-0.271 0-0.5-0.229-0.5-0.5v-27c0-0.271 0.229-0.5 0.5-0.5 0 0 15.499-0 15.5 0v7c0 0.552 0.448 1 1 1h7v19.5z"></path>
<path d="M23 26h-14c-0.552 0-1-0.448-1-1s0.448-1 1-1h14c0.552 0 1 0.448 1 1s-0.448 1-1 1z"></path>
<path d="M23 22h-14c-0.552 0-1-0.448-1-1s0.448-1 1-1h14c0.552 0 1 0.448 1 1s-0.448 1-1 1z"></path>
<path d="M23 18h-14c-0.552 0-1-0.448-1-1s0.448-1 1-1h14c0.552 0 1 0.448 1 1s-0.448 1-1 1z"></path>
</symbol>
</defs>
</svg>
<style>/* CSS stylesheet for displaying xarray objects in jupyterlab.
 *
 */

:root {
  --xr-font-color0: var(--jp-content-font-color0, rgba(0, 0, 0, 1));
  --xr-font-color2: var(--jp-content-font-color2, rgba(0, 0, 0, 0.54));
  --xr-font-color3: var(--jp-content-font-color3, rgba(0, 0, 0, 0.38));
  --xr-border-color: var(--jp-border-color2, #e0e0e0);
  --xr-disabled-color: var(--jp-layout-color3, #bdbdbd);
  --xr-background-color: var(--jp-layout-color0, white);
  --xr-background-color-row-even: var(--jp-layout-color1, white);
  --xr-background-color-row-odd: var(--jp-layout-color2, #eeeeee);
}

html[theme=dark],
body[data-theme=dark],
body.vscode-dark {
  --xr-font-color0: rgba(255, 255, 255, 1);
  --xr-font-color2: rgba(255, 255, 255, 0.54);
  --xr-font-color3: rgba(255, 255, 255, 0.38);
  --xr-border-color: #1F1F1F;
  --xr-disabled-color: #515151;
  --xr-background-color: #111111;
  --xr-background-color-row-even: #111111;
  --xr-background-color-row-odd: #313131;
}

.xr-wrap {
  display: block !important;
  min-width: 300px;
  max-width: 700px;
}

.xr-text-repr-fallback {
  /* fallback to plain text repr when CSS is not injected (untrusted notebook) */
  display: none;
}

.xr-header {
  padding-top: 6px;
  padding-bottom: 6px;
  margin-bottom: 4px;
  border-bottom: solid 1px var(--xr-border-color);
}

.xr-header > div,
.xr-header > ul {
  display: inline;
  margin-top: 0;
  margin-bottom: 0;
}

.xr-obj-type,
.xr-array-name {
  margin-left: 2px;
  margin-right: 10px;
}

.xr-obj-type {
  color: var(--xr-font-color2);
}

.xr-sections {
  padding-left: 0 !important;
  display: grid;
  grid-template-columns: 150px auto auto 1fr 20px 20px;
}

.xr-section-item {
  display: contents;
}

.xr-section-item input {
  display: none;
}

.xr-section-item input + label {
  color: var(--xr-disabled-color);
}

.xr-section-item input:enabled + label {
  cursor: pointer;
  color: var(--xr-font-color2);
}

.xr-section-item input:enabled + label:hover {
  color: var(--xr-font-color0);
}

.xr-section-summary {
  grid-column: 1;
  color: var(--xr-font-color2);
  font-weight: 500;
}

.xr-section-summary > span {
  display: inline-block;
  padding-left: 0.5em;
}

.xr-section-summary-in:disabled + label {
  color: var(--xr-font-color2);
}

.xr-section-summary-in + label:before {
  display: inline-block;
  content: '►';
  font-size: 11px;
  width: 15px;
  text-align: center;
}

.xr-section-summary-in:disabled + label:before {
  color: var(--xr-disabled-color);
}

.xr-section-summary-in:checked + label:before {
  content: '▼';
}

.xr-section-summary-in:checked + label > span {
  display: none;
}

.xr-section-summary,
.xr-section-inline-details {
  padding-top: 4px;
  padding-bottom: 4px;
}

.xr-section-inline-details {
  grid-column: 2 / -1;
}

.xr-section-details {
  display: none;
  grid-column: 1 / -1;
  margin-bottom: 5px;
}

.xr-section-summary-in:checked ~ .xr-section-details {
  display: contents;
}

.xr-array-wrap {
  grid-column: 1 / -1;
  display: grid;
  grid-template-columns: 20px auto;
}

.xr-array-wrap > label {
  grid-column: 1;
  vertical-align: top;
}

.xr-preview {
  color: var(--xr-font-color3);
}

.xr-array-preview,
.xr-array-data {
  padding: 0 5px !important;
  grid-column: 2;
}

.xr-array-data,
.xr-array-in:checked ~ .xr-array-preview {
  display: none;
}

.xr-array-in:checked ~ .xr-array-data,
.xr-array-preview {
  display: inline-block;
}

.xr-dim-list {
  display: inline-block !important;
  list-style: none;
  padding: 0 !important;
  margin: 0;
}

.xr-dim-list li {
  display: inline-block;
  padding: 0;
  margin: 0;
}

.xr-dim-list:before {
  content: '(';
}

.xr-dim-list:after {
  content: ')';
}

.xr-dim-list li:not(:last-child):after {
  content: ',';
  padding-right: 5px;
}

.xr-has-index {
  font-weight: bold;
}

.xr-var-list,
.xr-var-item {
  display: contents;
}

.xr-var-item > div,
.xr-var-item label,
.xr-var-item > .xr-var-name span {
  background-color: var(--xr-background-color-row-even);
  margin-bottom: 0;
}

.xr-var-item > .xr-var-name:hover span {
  padding-right: 5px;
}

.xr-var-list > li:nth-child(odd) > div,
.xr-var-list > li:nth-child(odd) > label,
.xr-var-list > li:nth-child(odd) > .xr-var-name span {
  background-color: var(--xr-background-color-row-odd);
}

.xr-var-name {
  grid-column: 1;
}

.xr-var-dims {
  grid-column: 2;
}

.xr-var-dtype {
  grid-column: 3;
  text-align: right;
  color: var(--xr-font-color2);
}

.xr-var-preview {
  grid-column: 4;
}

.xr-index-preview {
  grid-column: 2 / 5;
  color: var(--xr-font-color2);
}

.xr-var-name,
.xr-var-dims,
.xr-var-dtype,
.xr-preview,
.xr-attrs dt {
  white-space: nowrap;
  overflow: hidden;
  text-overflow: ellipsis;
  padding-right: 10px;
}

.xr-var-name:hover,
.xr-var-dims:hover,
.xr-var-dtype:hover,
.xr-attrs dt:hover {
  overflow: visible;
  width: auto;
  z-index: 1;
}

.xr-var-attrs,
.xr-var-data,
.xr-index-data {
  display: none;
  background-color: var(--xr-background-color) !important;
  padding-bottom: 5px !important;
}

.xr-var-attrs-in:checked ~ .xr-var-attrs,
.xr-var-data-in:checked ~ .xr-var-data,
.xr-index-data-in:checked ~ .xr-index-data {
  display: block;
}

.xr-var-data > table {
  float: right;
}

.xr-var-name span,
.xr-var-data,
.xr-index-name div,
.xr-index-data,
.xr-attrs {
  padding-left: 25px !important;
}

.xr-attrs,
.xr-var-attrs,
.xr-var-data,
.xr-index-data {
  grid-column: 1 / -1;
}

dl.xr-attrs {
  padding: 0;
  margin: 0;
  display: grid;
  grid-template-columns: 125px auto;
}

.xr-attrs dt,
.xr-attrs dd {
  padding: 0;
  margin: 0;
  float: left;
  padding-right: 10px;
  width: auto;
}

.xr-attrs dt {
  font-weight: normal;
  grid-column: 1;
}

.xr-attrs dt:hover span {
  display: inline-block;
  background: var(--xr-background-color);
  padding-right: 10px;
}

.xr-attrs dd {
  grid-column: 2;
  white-space: pre-wrap;
  word-break: break-all;
}

.xr-icon-database,
.xr-icon-file-text2,
.xr-no-icon {
  display: inline-block;
  vertical-align: middle;
  width: 1em;
  height: 1.5em !important;
  stroke-width: 0;
  stroke: currentColor;
  fill: currentColor;
}
</style><pre class="xr-text-repr-fallback">&lt;xarray.DataArray 'lwir11' (y: 5, x: 5)&gt;
array([[283.85114306, 283.95026564, 283.98102782, 283.99811792,
        283.97419178],
       [283.79987276, 283.78278266, 283.79645474, 283.75885652,
        283.7383484 ],
       [283.73493038, 283.66656998, 283.6358078 , 283.56061136,
        283.57086542],
       [283.6528979 , 283.62213572, 283.52984918, 283.41705452,
        283.42730858],
       [283.66315196, 283.57770146, 283.45465274, 283.32476798,
        283.35894818]])
Coordinates:
  * y            (y) float64 4.997e+06 4.997e+06 4.997e+06 4.997e+06 4.997e+06
  * x            (x) float64 7.881e+05 7.881e+05 7.881e+05 7.881e+05 7.882e+05
    spatial_ref  int32 32616
    time         datetime64[ns] 2022-06-11T16:22:04.584079
Attributes:
    nodata:   0</pre><div class="xr-wrap" style="display:none"><div class="xr-header"><div class="xr-obj-type">xarray.DataArray</div><div class="xr-array-name">'lwir11'</div><ul class="xr-dim-list"><li><span class="xr-has-index">y</span>: 5</li><li><span class="xr-has-index">x</span>: 5</li></ul></div><ul class="xr-sections"><li class="xr-section-item"><div class="xr-array-wrap"><input id="section-03c9c43b-a4c1-44ac-af20-018f6cb36680" class="xr-array-in" type="checkbox" checked=""><label for="section-03c9c43b-a4c1-44ac-af20-018f6cb36680" title="Show/hide data repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-array-preview xr-preview"><span>283.9 284.0 284.0 284.0 284.0 283.8 ... 283.7 283.6 283.5 283.3 283.4</span></div><div class="xr-array-data"><pre>array([[283.85114306, 283.95026564, 283.98102782, 283.99811792,
        283.97419178],
       [283.79987276, 283.78278266, 283.79645474, 283.75885652,
        283.7383484 ],
       [283.73493038, 283.66656998, 283.6358078 , 283.56061136,
        283.57086542],
       [283.6528979 , 283.62213572, 283.52984918, 283.41705452,
        283.42730858],
       [283.66315196, 283.57770146, 283.45465274, 283.32476798,
        283.35894818]])</pre></div></div></li><li class="xr-section-item"><input id="section-3c586603-8ee9-474e-8322-f43c49b012ef" class="xr-section-summary-in" type="checkbox" checked=""><label for="section-3c586603-8ee9-474e-8322-f43c49b012ef" class="xr-section-summary">Coordinates: <span>(4)</span></label><div class="xr-section-inline-details"></div><div class="xr-section-details"><ul class="xr-var-list"><li class="xr-var-item"><div class="xr-var-name"><span class="xr-has-index">y</span></div><div class="xr-var-dims">(y)</div><div class="xr-var-dtype">float64</div><div class="xr-var-preview xr-preview">4.997e+06 4.997e+06 ... 4.997e+06</div><input id="attrs-cec49ca4-2df4-4d74-b409-6e5ff5a8d1b2" class="xr-var-attrs-in" type="checkbox"><label for="attrs-cec49ca4-2df4-4d74-b409-6e5ff5a8d1b2" title="Show/Hide attributes"><svg class="icon xr-icon-file-text2"><use href="#icon-file-text2"></use></svg></label><input id="data-1717e072-7eea-4cc3-b0ff-750f8031bb64" class="xr-var-data-in" type="checkbox"><label for="data-1717e072-7eea-4cc3-b0ff-750f8031bb64" title="Show/Hide data repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-var-attrs"><dl class="xr-attrs"><dt><span>units :</span></dt><dd>metre</dd><dt><span>resolution :</span></dt><dd>-30.0</dd><dt><span>crs :</span></dt><dd>EPSG:32616</dd></dl></div><div class="xr-var-data"><pre>array([4997145., 4997115., 4997085., 4997055., 4997025.])</pre></div></li><li class="xr-var-item"><div class="xr-var-name"><span class="xr-has-index">x</span></div><div class="xr-var-dims">(x)</div><div class="xr-var-dtype">float64</div><div class="xr-var-preview xr-preview">7.881e+05 7.881e+05 ... 
7.882e+05</div><input id="attrs-1da47d74-4aa0-410e-a36d-4e543618b5b2" class="xr-var-attrs-in" type="checkbox"><label for="attrs-1da47d74-4aa0-410e-a36d-4e543618b5b2" title="Show/Hide attributes"><svg class="icon xr-icon-file-text2"><use href="#icon-file-text2"></use></svg></label><input id="data-373d757f-fc74-41be-9928-44ee462c3faf" class="xr-var-data-in" type="checkbox"><label for="data-373d757f-fc74-41be-9928-44ee462c3faf" title="Show/Hide data repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-var-attrs"><dl class="xr-attrs"><dt><span>units :</span></dt><dd>metre</dd><dt><span>resolution :</span></dt><dd>30.0</dd><dt><span>crs :</span></dt><dd>EPSG:32616</dd></dl></div><div class="xr-var-data"><pre>array([788055., 788085., 788115., 788145., 788175.])</pre></div></li><li class="xr-var-item"><div class="xr-var-name"><span>spatial_ref</span></div><div class="xr-var-dims">()</div><div class="xr-var-dtype">int32</div><div class="xr-var-preview xr-preview">32616</div><input id="attrs-d7be170d-b008-4dad-b500-b626fe0e057a" class="xr-var-attrs-in" type="checkbox"><label for="attrs-d7be170d-b008-4dad-b500-b626fe0e057a" title="Show/Hide attributes"><svg class="icon xr-icon-file-text2"><use href="#icon-file-text2"></use></svg></label><input id="data-74f66321-003c-430d-82ce-d4c683da0d54" class="xr-var-data-in" type="checkbox"><label for="data-74f66321-003c-430d-82ce-d4c683da0d54" title="Show/Hide data repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-var-attrs"><dl class="xr-attrs"><dt><span>spatial_ref :</span></dt><dd>PROJCRS["WGS 84 / UTM zone 16N",BASEGEOGCRS["WGS 84",ENSEMBLE["World Geodetic System 1984 ensemble",MEMBER["World Geodetic System 1984 (Transit)"],MEMBER["World Geodetic System 1984 (G730)"],MEMBER["World Geodetic System 1984 (G873)"],MEMBER["World Geodetic System 1984 (G1150)"],MEMBER["World Geodetic System 1984 (G1674)"],MEMBER["World Geodetic System 
1984 (G1762)"],MEMBER["World Geodetic System 1984 (G2139)"],ELLIPSOID["WGS 84",6378137,298.257223563,LENGTHUNIT["metre",1]],ENSEMBLEACCURACY[2.0]],PRIMEM["Greenwich",0,ANGLEUNIT["degree",0.0174532925199433]],ID["EPSG",4326]],CONVERSION["UTM zone 16N",METHOD["Transverse Mercator",ID["EPSG",9807]],PARAMETER["Latitude of natural origin",0,ANGLEUNIT["degree",0.0174532925199433],ID["EPSG",8801]],PARAMETER["Longitude of natural origin",-87,ANGLEUNIT["degree",0.0174532925199433],ID["EPSG",8802]],PARAMETER["Scale factor at natural origin",0.9996,SCALEUNIT["unity",1],ID["EPSG",8805]],PARAMETER["False easting",500000,LENGTHUNIT["metre",1],ID["EPSG",8806]],PARAMETER["False northing",0,LENGTHUNIT["metre",1],ID["EPSG",8807]]],CS[Cartesian,2],AXIS["(E)",east,ORDER[1],LENGTHUNIT["metre",1]],AXIS["(N)",north,ORDER[2],LENGTHUNIT["metre",1]],USAGE[SCOPE["Engineering survey, topographic mapping."],AREA["Between 90°W and 84°W, northern hemisphere between equator and 84°N, onshore and offshore. Belize. Canada - Manitoba; Nunavut; Ontario. Costa Rica. Cuba. Ecuador - Galapagos. El Salvador. Guatemala. Honduras. Mexico. Nicaragua. 
United States (USA)."],BBOX[0,-90,84,-84]],ID["EPSG",32616]]</dd><dt><span>crs_wkt :</span></dt><dd>PROJCRS["WGS 84 / UTM zone 16N",BASEGEOGCRS["WGS 84",ENSEMBLE["World Geodetic System 1984 ensemble",MEMBER["World Geodetic System 1984 (Transit)"],MEMBER["World Geodetic System 1984 (G730)"],MEMBER["World Geodetic System 1984 (G873)"],MEMBER["World Geodetic System 1984 (G1150)"],MEMBER["World Geodetic System 1984 (G1674)"],MEMBER["World Geodetic System 1984 (G1762)"],MEMBER["World Geodetic System 1984 (G2139)"],ELLIPSOID["WGS 84",6378137,298.257223563,LENGTHUNIT["metre",1]],ENSEMBLEACCURACY[2.0]],PRIMEM["Greenwich",0,ANGLEUNIT["degree",0.0174532925199433]],ID["EPSG",4326]],CONVERSION["UTM zone 16N",METHOD["Transverse Mercator",ID["EPSG",9807]],PARAMETER["Latitude of natural origin",0,ANGLEUNIT["degree",0.0174532925199433],ID["EPSG",8801]],PARAMETER["Longitude of natural origin",-87,ANGLEUNIT["degree",0.0174532925199433],ID["EPSG",8802]],PARAMETER["Scale factor at natural origin",0.9996,SCALEUNIT["unity",1],ID["EPSG",8805]],PARAMETER["False easting",500000,LENGTHUNIT["metre",1],ID["EPSG",8806]],PARAMETER["False northing",0,LENGTHUNIT["metre",1],ID["EPSG",8807]]],CS[Cartesian,2],AXIS["(E)",east,ORDER[1],LENGTHUNIT["metre",1]],AXIS["(N)",north,ORDER[2],LENGTHUNIT["metre",1]],USAGE[SCOPE["Engineering survey, topographic mapping."],AREA["Between 90°W and 84°W, northern hemisphere between equator and 84°N, onshore and offshore. Belize. Canada - Manitoba; Nunavut; Ontario. Costa Rica. Cuba. Ecuador - Galapagos. El Salvador. Guatemala. Honduras. Mexico. Nicaragua. 
United States (USA)."],BBOX[0,-90,84,-84]],ID["EPSG",32616]]</dd><dt><span>semi_major_axis :</span></dt><dd>6378137.0</dd><dt><span>semi_minor_axis :</span></dt><dd>6356752.314245179</dd><dt><span>inverse_flattening :</span></dt><dd>298.257223563</dd><dt><span>reference_ellipsoid_name :</span></dt><dd>WGS 84</dd><dt><span>longitude_of_prime_meridian :</span></dt><dd>0.0</dd><dt><span>prime_meridian_name :</span></dt><dd>Greenwich</dd><dt><span>geographic_crs_name :</span></dt><dd>WGS 84</dd><dt><span>horizontal_datum_name :</span></dt><dd>World Geodetic System 1984 ensemble</dd><dt><span>projected_crs_name :</span></dt><dd>WGS 84 / UTM zone 16N</dd><dt><span>grid_mapping_name :</span></dt><dd>transverse_mercator</dd><dt><span>latitude_of_projection_origin :</span></dt><dd>0.0</dd><dt><span>longitude_of_central_meridian :</span></dt><dd>-87.0</dd><dt><span>false_easting :</span></dt><dd>500000.0</dd><dt><span>false_northing :</span></dt><dd>0.0</dd><dt><span>scale_factor_at_central_meridian :</span></dt><dd>0.9996</dd></dl></div><div class="xr-var-data"><pre>array(32616, dtype=int32)</pre></div></li><li class="xr-var-item"><div class="xr-var-name"><span>time</span></div><div class="xr-var-dims">()</div><div class="xr-var-dtype">datetime64[ns]</div><div class="xr-var-preview xr-preview">2022-06-11T16:22:04.584079</div><input id="attrs-ee0a6eda-d788-4416-9c19-0ca5aad59312" class="xr-var-attrs-in" type="checkbox" disabled=""><label for="attrs-ee0a6eda-d788-4416-9c19-0ca5aad59312" title="Show/Hide attributes"><svg class="icon xr-icon-file-text2"><use href="#icon-file-text2"></use></svg></label><input id="data-2c03697b-d4c6-4c8e-ad1d-0ef2edf66ac4" class="xr-var-data-in" type="checkbox"><label for="data-2c03697b-d4c6-4c8e-ad1d-0ef2edf66ac4" title="Show/Hide data repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-var-attrs"><dl class="xr-attrs"></dl></div><div 
class="xr-var-data"><pre>array('2022-06-11T16:22:04.584079000', dtype='datetime64[ns]')</pre></div></li></ul></div></li><li class="xr-section-item"><input id="section-6e1a7e98-bc26-4b43-befa-04144f2662dc" class="xr-section-summary-in" type="checkbox"><label for="section-6e1a7e98-bc26-4b43-befa-04144f2662dc" class="xr-section-summary">Indexes: <span>(2)</span></label><div class="xr-section-inline-details"></div><div class="xr-section-details"><ul class="xr-var-list"><li class="xr-var-item"><div class="xr-index-name"><div>y</div></div><div class="xr-index-preview">PandasIndex</div><div></div><input id="index-1864d885-fc0b-447e-a358-37ffa3957bbd" class="xr-index-data-in" type="checkbox"><label for="index-1864d885-fc0b-447e-a358-37ffa3957bbd" title="Show/Hide index repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-index-data"><pre>PandasIndex(Float64Index([4997145.0, 4997115.0, 4997085.0, 4997055.0, 4997025.0], dtype='float64', name='y'))</pre></div></li><li class="xr-var-item"><div class="xr-index-name"><div>x</div></div><div class="xr-index-preview">PandasIndex</div><div></div><input id="index-5daca901-b96f-43c9-9b1a-17cb193fe83d" class="xr-index-data-in" type="checkbox"><label for="index-5daca901-b96f-43c9-9b1a-17cb193fe83d" title="Show/Hide index repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-index-data"><pre>PandasIndex(Float64Index([788055.0, 788085.0, 788115.0, 788145.0, 788175.0], dtype='float64', name='x'))</pre></div></li></ul></div></li><li class="xr-section-item"><input id="section-8c6d9f08-c56b-4f89-ac99-ecdc5c478294" class="xr-section-summary-in" type="checkbox" checked=""><label for="section-8c6d9f08-c56b-4f89-ac99-ecdc5c478294" class="xr-section-summary">Attributes: <span>(1)</span></label><div class="xr-section-inline-details"></div><div class="xr-section-details"><dl class="xr-attrs"><dt><span>nodata 
:</span></dt><dd>0</dd></dl></div></li></ul></div></div>
</div>
</div>
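<p>The rescaling above can also be sketched as a small standalone function. This is only an illustration, not part of the MPC tutorial: the function name <code>raw_to_kelvin</code> is hypothetical, the constants are copied from the <code>band_info</code> output shown earlier, and treating pixels equal to <code>nodata</code> as missing (NaN) is an assumption about how you might want to handle them.</p>

```python
import numpy as np

# Values copied from the band_info output shown earlier
SCALE = 0.00341802
OFFSET = 149.0
NODATA = 0


def raw_to_kelvin(raw, scale=SCALE, offset=OFFSET, nodata=NODATA):
    """Convert raw uint16 lwir11 values to Kelvin (hypothetical helper).

    Pixels equal to the nodata value are returned as NaN instead of
    being rescaled. That masking step is an assumption, not something
    the original tutorial does.
    """
    raw = np.asarray(raw)
    kelvin = raw.astype(float) * scale + offset
    return np.where(raw == nodata, np.nan, kelvin)


# 39453 is the first raw lwir11 value shown earlier;
# the 0 is a made-up nodata pixel for illustration.
sample = np.array([[39453, 0]], dtype=np.uint16)
kelvin = raw_to_kelvin(sample)
celsius = kelvin - 273.15
```

<p>Without the masking step, a nodata pixel would be rescaled to exactly the offset (149 K) and could distort the color scale of a plot. If you keep working with the xarray object instead of a NumPy array, something like <code>temperature.where(raw != 0)</code> should achieve a similar effect.</p>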
<div class="cell" data-execution_count="53">
<div class="sourceCode cell-code" id="cb80" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb80-1">celsius <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> temperature <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">273.15</span></span>
<span id="cb80-2">celsius.plot(cmap<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"magma"</span>, size<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span></code></pre></div>
<div class="cell-output cell-output-display">
<p><img src="https://bitsofanalytics.org/posts/algaebloom-part3/index_files/figure-html/cell-54-output-1.png" class="img-fluid"></p>
</div>
</div>
</section>
</section>
<section id="a-good-place-to-stop-for-part-3" class="level1">
<h1>A good place to stop for part 3</h1>
<p>In this part we:</p>
<ul>
<li>Used the STAC API to search for images of interest in MPC based on a target location and target date.</li>
<li>Used GeoPandas to manage the process of filtering the search results and finding the best image.</li>
<li>Used xarray to manually rescale Landsat bands to a 0-255 scale.</li>
<li>Installed odc-stac and updated the Landsat cropping procedure.</li>
<li>Plotted NDVI and surface temperature.</li>
</ul>
<p>In the next part, we’ll focus on converting our imagery data to features that can be used in a machine learning model.</p>


</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div quarto-reuse="quarto-reuse" class="quarto-appendix-contents"><a rel="license" href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</a></div></section><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{isken2023,
  author = {Mark Isken},
  title = {Algal Bloom Detection Extended Tutorial - {Part} 3: {Finding}
    Images of Interest},
  date = {2023-01-29},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-isken2023" class="csl-entry quarto-appendix-citeas">
Mark Isken. 2023. <span>“Algal Bloom Detection Extended Tutorial - Part
3: Finding Images of Interest.”</span> January 29, 2023.
</div></div></section></div> ]]></description>
  <category>geospatial</category>
  <category>geonewb</category>
  <category>python</category>
  <category>geopandas</category>
  <category>sentinel-2</category>
  <category>landsat</category>
  <category>planetary-computer</category>
  <guid>https://bitsofanalytics.org/posts/algaebloom-part3/</guid>
  <pubDate>Sun, 29 Jan 2023 05:00:00 GMT</pubDate>
</item>
<item>
  <title>Algal bloom detection extended tutorial - Part 2: Planetary Computer intro</title>
  <dc:creator>Mark Isken</dc:creator>
  <link>https://bitsofanalytics.org/posts/algaebloom-part2/</link>
  <description><![CDATA[ 



<p>This is part of the <a href="https://bitsofanalytics.org/posts/geonewb-intro/">geonewb</a> series of posts.</p>
<p>In <a href="https://bitsofanalytics.org/posts/algaebloom-part1/">Part 1</a> of this series we got an overview of some of the Python geospatial libraries, created a conda virtual environment and explored the project data associated with the <a href="https://www.drivendata.org/competitions/143/tick-tick-bloom/page/649/">Detecting harmful algal bloom challenge</a>. Now we’ll move on in the tutorial to:</p>
<ul>
<li><strong>acquiring satellite image data</strong> (both Sentinel-2 and Landsat) from <strong>Microsoft’s Planetary Computer</strong>,</li>
<li>building a basic understanding of the <strong>structure and data content</strong> of these satellite-created images,</li>
<li>basic <strong>viewing and manipulation of satellite imagery</strong>.</li>
</ul>
<p>As mentioned in Part 1, I’m following along and taking some deeper dives and various detours from the official <a href="https://drivendata.co/blog/tick-tick-bloom-benchmark">Getting Started Tutorial</a>.</p>
<p>In this part we’ll primarily work with a single image file. In Part 3 we’ll tackle the more general problem of programmatically finding images that intersect some point of interest and that were created within a specific date range (as well as doing some basic GeoPandas work).</p>
<p>In subsequent parts we’ll tackle the <strong>feature engineering</strong> and <strong>predictive modeling</strong> sections of the original tutorial.</p>
<div class="cell" data-execution_count="1">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Need to do some date math and need to work with file paths</span></span>
<span id="cb1-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> datetime <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> timedelta</span>
<span id="cb1-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> pathlib <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Path</span></code></pre></div>
</div>
<div class="cell" data-execution_count="2">
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Big 3 - come on, of course you'll need these</span></span>
<span id="cb2-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> matplotlib.pyplot <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> plt</span>
<span id="cb2-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> np</span>
<span id="cb2-4"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pandas <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> pd</span></code></pre></div>
</div>
<div class="cell" data-execution_count="3">
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> IPython.display <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Image</span>
<span id="cb3-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> PIL <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Image <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> PILImage</span></code></pre></div>
</div>
<div class="cell" data-execution_count="4">
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>matplotlib inline</span></code></pre></div>
</div>
<section id="finding-and-acquiring-satellite-imagery-data" class="level1">
<h1>Finding and acquiring satellite imagery data</h1>
<p>Unlike many challenges at <a href="https://www.drivendata.org/">DrivenData</a>, the feature data for this challenge is not directly provided. We need to get it using various APIs from specific sources. The date and location for each <code>uid</code> in the metadata can be used to find relevant satellite images from a number of different places. There are four approved data sources and all the details are described on the project home page at the following links:</p>
<ul>
<li><a href="https://www.drivendata.org/competitions/143/tick-tick-bloom/page/650/#sentinel-2-1">Sentinel-2 satellite imagery</a></li>
<li><a href="https://www.drivendata.org/competitions/143/tick-tick-bloom/page/650/#landsat-1">Landsat satellite imagery</a></li>
<li><a href="https://www.drivendata.org/competitions/143/tick-tick-bloom/page/650/#climate-data-1">NOAA’s High-Resolution Rapid Refresh (HRRR) climate data</a></li>
<li><a href="https://www.drivendata.org/competitions/143/tick-tick-bloom/page/650/#elevation-data-1">Copernicus DEM elevation data</a></li>
</ul>
<p>For now we will just focus on finding relevant Sentinel-2 and Landsat data. This <a href="https://www.drivendata.org/competitions/143/tick-tick-bloom/page/650/#satellite-imagery">page in the challenge site</a> has additional information and resources related to retrieving satellite imagery data. From that page you can get a very good high level overview of the different <em>levels</em> of satellite imagery data, top of atmosphere reflectance vs bottom of atmosphere reflectance, atmospheric corrections, spectral bands and algorithmic bands, as well as the relevant links for accessing data from the MPC.</p>
<p>From the main tutorial:</p>
<blockquote class="blockquote">
<p>The general steps we’ll use to pull satellite data are:</p>
<ol type="1">
<li><p>Establish a connection to the Planetary Computer’s STAC API using the planetary_computer and pystac_client Python packages.</p></li>
<li><p>Query the STAC API for scenes that capture our in situ labels. For each sample, we’ll search for imagery that includes the sample’s location (latitude and longitude) around the date the sample was taken. In this benchmark, we’ll use only Sentinel-2 L2A and Landsat Level-2 data.</p></li>
<li><p>Select one image for each sample. We’ll use Sentinel-2 data wherever it is available, because it is higher resolution. We’ll have to use Landsat for data before roughly 2016, because Sentinel-2 was not available yet.</p></li>
<li><p>Convert the image to a 1-dimensional list of features that can be input into our tree model</p></li>
</ol>
</blockquote>
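<p>To make step 2 a little more concrete, here is a minimal sketch of the date math involved in searching “around the date the sample was taken”. The helper name <code>date_range_string</code>, the 15 day lookback window and the sample date are all assumptions made for illustration. The result uses the <code>start/end</code> interval form that STAC API searches accept for their <code>datetime</code> parameter.</p>

```python
from datetime import date, timedelta


def date_range_string(sample_date: date, days_before: int = 15) -> str:
    """Build a 'start/end' interval that ends on the sample date."""
    start = sample_date - timedelta(days=days_before)
    # Two ISO 8601 dates joined by a slash describe a closed interval
    return f"{start.isoformat()}/{sample_date.isoformat()}"


# Hypothetical sample date, used only to show the shape of the result
print(date_range_string(date(2022, 11, 3)))
```

<p>How far back to look is a judgment call: a wider window finds more candidate images, but they may be less representative of conditions on the sampling date.</p>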
<p>Before trying to grab any data, let’s learn a bit more about Sentinel-2.</p>
</section>
<section id="understanding-sentinel-2-imaging-data" class="level1">
<h1>Understanding Sentinel-2 imaging data</h1>
<p>Start by going to the <a href="https://sentinels.copernicus.eu/web/sentinel/user-guides/sentinel-2-msi/overview">Sentinel Mission Overview page</a> and spend some time reading about the mission and the associated data. You can find even more detail in the <a href="https://sentinel.esa.int/documents/247904/685211/Sentinel-2_User_Handbook">Sentinel 2 - User Handbook</a>.</p>
<div class="cell" data-execution_count="5">
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1">Image(url<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'images/sentinel2_overview.png'</span>)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="5">
<img src="https://bitsofanalytics.org/posts/algaebloom-part2/images/sentinel2_overview.png">
</div>
</div>
<p>After doing a bunch of reading, I still had many questions.</p>
<ul>
<li>How do we do basic things like find image (GeoTIFF) files and save the image itself to disk?</li>
<li>What exactly is in this imaging data? There are numerous bands with different types of data and at different resolution levels (size in meters of each cell width).</li>
<li>How do we read this imaging data into data structures amenable to further analysis?</li>
<li>What kinds of analysis can we do with this data?</li>
<li>What existing resources are available for newbies to Sentinel image data analysis?</li>
</ul>
<p>A good place to start is with Microsoft’s Planetary Computer (MPC) and its Explore feature. We’ll use that to find an image of interest and start to figure out how to work with it.</p>
</section>
<section id="walk-through-for-one-sentinel-2-sample-from-microsofts-planetary-computer" class="level1">
<h1>Walk through for one Sentinel-2 sample from Microsoft’s Planetary Computer</h1>
<p>The <a href="https://planetarycomputer.microsoft.com/docs/overview/about">Planetary Computer</a> has a few different pieces:</p>
<ul>
<li><a href="https://planetarycomputer.microsoft.com/catalog">Data Catalog</a> - It is a ginormous repository of well cataloged data all about Earth’s various systems and includes a web based interface that allows users to find relevant data - for free.</li>
<li><a href="https://planetarycomputer.microsoft.com/docs/quickstarts/reading-stac/">API</a> - It has an API that leverages open source tools to make it easy to do data searches by time and location. Focuses on Python.</li>
<li><a href="https://planetarycomputer.microsoft.com/docs/overview/environment/">Hub</a> - A managed compute environment for doing cloud based geospatial analysis at scale. For this part you need to apply for access.</li>
<li><a href="https://planetarycomputer.microsoft.com/applications">Applications</a> - an ecosystem of people doing meaningful work with the Planetary Computer.</li>
</ul>
<p>You can manually explore the data catalogs by going to <a href="https://planetarycomputer.microsoft.com/explore">https://planetarycomputer.microsoft.com/explore</a>. You can zoom on the map, specify a dataset collection, do some filtering, and get a list of items. You can view an item’s metadata through the online interface, but to get at the actual <em>assets</em>, you need to use the <a href="https://pystac.readthedocs.io/en/stable/api.html">PyStac API</a>. For Sentinel data, the assets include the <a href="https://www.cogeo.org/">COGs (cloud optimized GeoTIFF)</a> associated with each spectral band’s reflectance level for each pixel measured by the satellite. There is additional metadata at the item level and the assets level. The actual image related data is stored in Azure blobs and an <code>href</code> property is included in each COG.</p>
<p>First, let’s see how to grab a specific item from MPC if you know the item URL. How would you know that? You can use the online MPC catalog explorer to find an item you are interested in and then ask for a code snippet. For example, I zoomed in around the eastern coast of Michigan near Thunder Bay. Then, in the left side menu, I selected the Sentinel-2 Level-2A collection and got a list of images sorted in decreasing date order - you can also filter by date range. As you scroll through the search results you’ll start to get a sense of some of the challenges associated with imagery data - like clouds!</p>
<p>I picked one from early November, 2022 that looked relatively cloud free. After selecting an image item, you’ll be able to click on the squiggly brackets icon to get a Python code snippet for accessing this item via the PyStac API. Here’s the snippet:</p>
<div class="cell" data-execution_count="6">
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pystac</span>
<span id="cb6-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> planetary_computer</span>
<span id="cb6-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rioxarray</span>
<span id="cb6-4"></span>
<span id="cb6-5">item_url <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"https://planetarycomputer.microsoft.com/api/stac/v1/collections/sentinel-2-l2a/items/S2B_MSIL2A_20221103T163439_R083_T16TGQ_20221104T085305"</span></span>
<span id="cb6-6"></span>
<span id="cb6-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Load the individual item metadata and sign the assets</span></span>
<span id="cb6-8">item <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pystac.Item.from_file(item_url)</span>
<span id="cb6-9">signed_item <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> planetary_computer.sign(item)</span>
<span id="cb6-10"></span>
<span id="cb6-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Open one of the data assets (other asset keys to use: 'B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B09', 'B11', 'B12', 'B8A', 'SCL', 'WVP', 'visual')</span></span>
<span id="cb6-12">asset_href <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> signed_item.assets[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"AOT"</span>].href</span>
<span id="cb6-13">ds <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rioxarray.open_rasterio(asset_href)</span>
<span id="cb6-14">ds</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="6">

<div><svg style="position: absolute; width: 0; height: 0; overflow: hidden">
<defs>
<symbol id="icon-database" viewbox="0 0 32 32">
<path d="M16 0c-8.837 0-16 2.239-16 5v4c0 2.761 7.163 5 16 5s16-2.239 16-5v-4c0-2.761-7.163-5-16-5z"></path>
<path d="M16 17c-8.837 0-16-2.239-16-5v6c0 2.761 7.163 5 16 5s16-2.239 16-5v-6c0 2.761-7.163 5-16 5z"></path>
<path d="M16 26c-8.837 0-16-2.239-16-5v6c0 2.761 7.163 5 16 5s16-2.239 16-5v-6c0 2.761-7.163 5-16 5z"></path>
</symbol>
<symbol id="icon-file-text2" viewbox="0 0 32 32">
<path d="M28.681 7.159c-0.694-0.947-1.662-2.053-2.724-3.116s-2.169-2.030-3.116-2.724c-1.612-1.182-2.393-1.319-2.841-1.319h-15.5c-1.378 0-2.5 1.121-2.5 2.5v27c0 1.378 1.122 2.5 2.5 2.5h23c1.378 0 2.5-1.122 2.5-2.5v-19.5c0-0.448-0.137-1.23-1.319-2.841zM24.543 5.457c0.959 0.959 1.712 1.825 2.268 2.543h-4.811v-4.811c0.718 0.556 1.584 1.309 2.543 2.268zM28 29.5c0 0.271-0.229 0.5-0.5 0.5h-23c-0.271 0-0.5-0.229-0.5-0.5v-27c0-0.271 0.229-0.5 0.5-0.5 0 0 15.499-0 15.5 0v7c0 0.552 0.448 1 1 1h7v19.5z"></path>
<path d="M23 26h-14c-0.552 0-1-0.448-1-1s0.448-1 1-1h14c0.552 0 1 0.448 1 1s-0.448 1-1 1z"></path>
<path d="M23 22h-14c-0.552 0-1-0.448-1-1s0.448-1 1-1h14c0.552 0 1 0.448 1 1s-0.448 1-1 1z"></path>
<path d="M23 18h-14c-0.552 0-1-0.448-1-1s0.448-1 1-1h14c0.552 0 1 0.448 1 1s-0.448 1-1 1z"></path>
</symbol>
</defs>
</svg>
<style>/* CSS stylesheet for displaying xarray objects in jupyterlab.
 *
 */

:root {
  --xr-font-color0: var(--jp-content-font-color0, rgba(0, 0, 0, 1));
  --xr-font-color2: var(--jp-content-font-color2, rgba(0, 0, 0, 0.54));
  --xr-font-color3: var(--jp-content-font-color3, rgba(0, 0, 0, 0.38));
  --xr-border-color: var(--jp-border-color2, #e0e0e0);
  --xr-disabled-color: var(--jp-layout-color3, #bdbdbd);
  --xr-background-color: var(--jp-layout-color0, white);
  --xr-background-color-row-even: var(--jp-layout-color1, white);
  --xr-background-color-row-odd: var(--jp-layout-color2, #eeeeee);
}

html[theme=dark],
body[data-theme=dark],
body.vscode-dark {
  --xr-font-color0: rgba(255, 255, 255, 1);
  --xr-font-color2: rgba(255, 255, 255, 0.54);
  --xr-font-color3: rgba(255, 255, 255, 0.38);
  --xr-border-color: #1F1F1F;
  --xr-disabled-color: #515151;
  --xr-background-color: #111111;
  --xr-background-color-row-even: #111111;
  --xr-background-color-row-odd: #313131;
}

.xr-wrap {
  display: block !important;
  min-width: 300px;
  max-width: 700px;
}

.xr-text-repr-fallback {
  /* fallback to plain text repr when CSS is not injected (untrusted notebook) */
  display: none;
}

.xr-header {
  padding-top: 6px;
  padding-bottom: 6px;
  margin-bottom: 4px;
  border-bottom: solid 1px var(--xr-border-color);
}

.xr-header > div,
.xr-header > ul {
  display: inline;
  margin-top: 0;
  margin-bottom: 0;
}

.xr-obj-type,
.xr-array-name {
  margin-left: 2px;
  margin-right: 10px;
}

.xr-obj-type {
  color: var(--xr-font-color2);
}

.xr-sections {
  padding-left: 0 !important;
  display: grid;
  grid-template-columns: 150px auto auto 1fr 20px 20px;
}

.xr-section-item {
  display: contents;
}

.xr-section-item input {
  display: none;
}

.xr-section-item input + label {
  color: var(--xr-disabled-color);
}

.xr-section-item input:enabled + label {
  cursor: pointer;
  color: var(--xr-font-color2);
}

.xr-section-item input:enabled + label:hover {
  color: var(--xr-font-color0);
}

.xr-section-summary {
  grid-column: 1;
  color: var(--xr-font-color2);
  font-weight: 500;
}

.xr-section-summary > span {
  display: inline-block;
  padding-left: 0.5em;
}

.xr-section-summary-in:disabled + label {
  color: var(--xr-font-color2);
}

.xr-section-summary-in + label:before {
  display: inline-block;
  content: '►';
  font-size: 11px;
  width: 15px;
  text-align: center;
}

.xr-section-summary-in:disabled + label:before {
  color: var(--xr-disabled-color);
}

.xr-section-summary-in:checked + label:before {
  content: '▼';
}

.xr-section-summary-in:checked + label > span {
  display: none;
}

.xr-section-summary,
.xr-section-inline-details {
  padding-top: 4px;
  padding-bottom: 4px;
}

.xr-section-inline-details {
  grid-column: 2 / -1;
}

.xr-section-details {
  display: none;
  grid-column: 1 / -1;
  margin-bottom: 5px;
}

.xr-section-summary-in:checked ~ .xr-section-details {
  display: contents;
}

.xr-array-wrap {
  grid-column: 1 / -1;
  display: grid;
  grid-template-columns: 20px auto;
}

.xr-array-wrap > label {
  grid-column: 1;
  vertical-align: top;
}

.xr-preview {
  color: var(--xr-font-color3);
}

.xr-array-preview,
.xr-array-data {
  padding: 0 5px !important;
  grid-column: 2;
}

.xr-array-data,
.xr-array-in:checked ~ .xr-array-preview {
  display: none;
}

.xr-array-in:checked ~ .xr-array-data,
.xr-array-preview {
  display: inline-block;
}

.xr-dim-list {
  display: inline-block !important;
  list-style: none;
  padding: 0 !important;
  margin: 0;
}

.xr-dim-list li {
  display: inline-block;
  padding: 0;
  margin: 0;
}

.xr-dim-list:before {
  content: '(';
}

.xr-dim-list:after {
  content: ')';
}

.xr-dim-list li:not(:last-child):after {
  content: ',';
  padding-right: 5px;
}

.xr-has-index {
  font-weight: bold;
}

.xr-var-list,
.xr-var-item {
  display: contents;
}

.xr-var-item > div,
.xr-var-item label,
.xr-var-item > .xr-var-name span {
  background-color: var(--xr-background-color-row-even);
  margin-bottom: 0;
}

.xr-var-item > .xr-var-name:hover span {
  padding-right: 5px;
}

.xr-var-list > li:nth-child(odd) > div,
.xr-var-list > li:nth-child(odd) > label,
.xr-var-list > li:nth-child(odd) > .xr-var-name span {
  background-color: var(--xr-background-color-row-odd);
}

.xr-var-name {
  grid-column: 1;
}

.xr-var-dims {
  grid-column: 2;
}

.xr-var-dtype {
  grid-column: 3;
  text-align: right;
  color: var(--xr-font-color2);
}

.xr-var-preview {
  grid-column: 4;
}

.xr-index-preview {
  grid-column: 2 / 5;
  color: var(--xr-font-color2);
}

.xr-var-name,
.xr-var-dims,
.xr-var-dtype,
.xr-preview,
.xr-attrs dt {
  white-space: nowrap;
  overflow: hidden;
  text-overflow: ellipsis;
  padding-right: 10px;
}

.xr-var-name:hover,
.xr-var-dims:hover,
.xr-var-dtype:hover,
.xr-attrs dt:hover {
  overflow: visible;
  width: auto;
  z-index: 1;
}

.xr-var-attrs,
.xr-var-data,
.xr-index-data {
  display: none;
  background-color: var(--xr-background-color) !important;
  padding-bottom: 5px !important;
}

.xr-var-attrs-in:checked ~ .xr-var-attrs,
.xr-var-data-in:checked ~ .xr-var-data,
.xr-index-data-in:checked ~ .xr-index-data {
  display: block;
}

.xr-var-data > table {
  float: right;
}

.xr-var-name span,
.xr-var-data,
.xr-index-name div,
.xr-index-data,
.xr-attrs {
  padding-left: 25px !important;
}

.xr-attrs,
.xr-var-attrs,
.xr-var-data,
.xr-index-data {
  grid-column: 1 / -1;
}

dl.xr-attrs {
  padding: 0;
  margin: 0;
  display: grid;
  grid-template-columns: 125px auto;
}

.xr-attrs dt,
.xr-attrs dd {
  padding: 0;
  margin: 0;
  float: left;
  padding-right: 10px;
  width: auto;
}

.xr-attrs dt {
  font-weight: normal;
  grid-column: 1;
}

.xr-attrs dt:hover span {
  display: inline-block;
  background: var(--xr-background-color);
  padding-right: 10px;
}

.xr-attrs dd {
  grid-column: 2;
  white-space: pre-wrap;
  word-break: break-all;
}

.xr-icon-database,
.xr-icon-file-text2,
.xr-no-icon {
  display: inline-block;
  vertical-align: middle;
  width: 1em;
  height: 1.5em !important;
  stroke-width: 0;
  stroke: currentColor;
  fill: currentColor;
}
</style><pre class="xr-text-repr-fallback">&lt;xarray.DataArray (band: 1, y: 10980, x: 10980)&gt;
[120560400 values with dtype=uint16]
Coordinates:
  * band         (band) int64 1
  * x            (x) float64 7e+05 7e+05 7e+05 ... 8.097e+05 8.097e+05 8.098e+05
  * y            (y) float64 5e+06 5e+06 5e+06 ... 4.89e+06 4.89e+06 4.89e+06
    spatial_ref  int64 0
Attributes:
    AREA_OR_POINT:  Area
    _FillValue:     0
    scale_factor:   1.0
    add_offset:     0.0</pre><div class="xr-wrap" style="display:none"><div class="xr-header"><div class="xr-obj-type">xarray.DataArray</div><div class="xr-array-name"></div><ul class="xr-dim-list"><li><span class="xr-has-index">band</span>: 1</li><li><span class="xr-has-index">y</span>: 10980</li><li><span class="xr-has-index">x</span>: 10980</li></ul></div><ul class="xr-sections"><li class="xr-section-item"><div class="xr-array-wrap"><input id="section-72ad2e75-5775-43da-947c-f99559be2cfe" class="xr-array-in" type="checkbox" checked=""><label for="section-72ad2e75-5775-43da-947c-f99559be2cfe" title="Show/hide data repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-array-preview xr-preview"><span>...</span></div><div class="xr-array-data"><pre>[120560400 values with dtype=uint16]</pre></div></div></li><li class="xr-section-item"><input id="section-a8de3853-4983-48ff-a0f2-a30090e24bb6" class="xr-section-summary-in" type="checkbox" checked=""><label for="section-a8de3853-4983-48ff-a0f2-a30090e24bb6" class="xr-section-summary">Coordinates: <span>(4)</span></label><div class="xr-section-inline-details"></div><div class="xr-section-details"><ul class="xr-var-list"><li class="xr-var-item"><div class="xr-var-name"><span class="xr-has-index">band</span></div><div class="xr-var-dims">(band)</div><div class="xr-var-dtype">int64</div><div class="xr-var-preview xr-preview">1</div><input id="attrs-5459c199-0b75-4c16-9388-12877b393a01" class="xr-var-attrs-in" type="checkbox" disabled=""><label for="attrs-5459c199-0b75-4c16-9388-12877b393a01" title="Show/Hide attributes"><svg class="icon xr-icon-file-text2"><use href="#icon-file-text2"></use></svg></label><input id="data-9cd5f3a9-0c7a-4c77-9420-beec1117a2e9" class="xr-var-data-in" type="checkbox"><label for="data-9cd5f3a9-0c7a-4c77-9420-beec1117a2e9" title="Show/Hide data repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div 
class="xr-var-attrs"><dl class="xr-attrs"></dl></div><div class="xr-var-data"><pre>array([1])</pre></div></li><li class="xr-var-item"><div class="xr-var-name"><span class="xr-has-index">x</span></div><div class="xr-var-dims">(x)</div><div class="xr-var-dtype">float64</div><div class="xr-var-preview xr-preview">7e+05 7e+05 ... 8.097e+05 8.098e+05</div><input id="attrs-0c210a6b-666a-4539-93b2-3f9991165898" class="xr-var-attrs-in" type="checkbox" disabled=""><label for="attrs-0c210a6b-666a-4539-93b2-3f9991165898" title="Show/Hide attributes"><svg class="icon xr-icon-file-text2"><use href="#icon-file-text2"></use></svg></label><input id="data-2e1273cb-7148-4ffc-b464-337be4ac0867" class="xr-var-data-in" type="checkbox"><label for="data-2e1273cb-7148-4ffc-b464-337be4ac0867" title="Show/Hide data repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-var-attrs"><dl class="xr-attrs"></dl></div><div class="xr-var-data"><pre>array([699965., 699975., 699985., ..., 809735., 809745., 809755.])</pre></div></li><li class="xr-var-item"><div class="xr-var-name"><span class="xr-has-index">y</span></div><div class="xr-var-dims">(y)</div><div class="xr-var-dtype">float64</div><div class="xr-var-preview xr-preview">5e+06 5e+06 ... 
4.89e+06 4.89e+06</div><input id="attrs-6a713549-fd70-4441-b4c4-a3c3bbe8c9bb" class="xr-var-attrs-in" type="checkbox" disabled=""><label for="attrs-6a713549-fd70-4441-b4c4-a3c3bbe8c9bb" title="Show/Hide attributes"><svg class="icon xr-icon-file-text2"><use href="#icon-file-text2"></use></svg></label><input id="data-5bdff61c-495b-45e9-8c93-0efa28da885b" class="xr-var-data-in" type="checkbox"><label for="data-5bdff61c-495b-45e9-8c93-0efa28da885b" title="Show/Hide data repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-var-attrs"><dl class="xr-attrs"></dl></div><div class="xr-var-data"><pre>array([5000035., 5000025., 5000015., ..., 4890265., 4890255., 4890245.])</pre></div></li><li class="xr-var-item"><div class="xr-var-name"><span>spatial_ref</span></div><div class="xr-var-dims">()</div><div class="xr-var-dtype">int64</div><div class="xr-var-preview xr-preview">0</div><input id="attrs-050ea770-f9f4-4cfd-9904-8f62250cfe6c" class="xr-var-attrs-in" type="checkbox"><label for="attrs-050ea770-f9f4-4cfd-9904-8f62250cfe6c" title="Show/Hide attributes"><svg class="icon xr-icon-file-text2"><use href="#icon-file-text2"></use></svg></label><input id="data-8e89235e-90f3-4504-a84b-d62a6fd39f64" class="xr-var-data-in" type="checkbox"><label for="data-8e89235e-90f3-4504-a84b-d62a6fd39f64" title="Show/Hide data repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-var-attrs"><dl class="xr-attrs"><dt><span>crs_wkt :</span></dt><dd>PROJCS["WGS 84 / UTM zone 16N",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 
84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",-87],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG","32616"]]</dd><dt><span>semi_major_axis :</span></dt><dd>6378137.0</dd><dt><span>semi_minor_axis :</span></dt><dd>6356752.314245179</dd><dt><span>inverse_flattening :</span></dt><dd>298.257223563</dd><dt><span>reference_ellipsoid_name :</span></dt><dd>WGS 84</dd><dt><span>longitude_of_prime_meridian :</span></dt><dd>0.0</dd><dt><span>prime_meridian_name :</span></dt><dd>Greenwich</dd><dt><span>geographic_crs_name :</span></dt><dd>WGS 84</dd><dt><span>horizontal_datum_name :</span></dt><dd>World Geodetic System 1984</dd><dt><span>projected_crs_name :</span></dt><dd>WGS 84 / UTM zone 16N</dd><dt><span>grid_mapping_name :</span></dt><dd>transverse_mercator</dd><dt><span>latitude_of_projection_origin :</span></dt><dd>0.0</dd><dt><span>longitude_of_central_meridian :</span></dt><dd>-87.0</dd><dt><span>false_easting :</span></dt><dd>500000.0</dd><dt><span>false_northing :</span></dt><dd>0.0</dd><dt><span>scale_factor_at_central_meridian :</span></dt><dd>0.9996</dd><dt><span>spatial_ref :</span></dt><dd>PROJCS["WGS 84 / UTM zone 16N",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 
84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",-87],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG","32616"]]</dd><dt><span>GeoTransform :</span></dt><dd>699960.0 10.0 0.0 5000040.0 0.0 -10.0</dd></dl></div><div class="xr-var-data"><pre>array(0)</pre></div></li></ul></div></li><li class="xr-section-item"><input id="section-041d7b2b-316b-455d-aed4-5267a2e23856" class="xr-section-summary-in" type="checkbox"><label for="section-041d7b2b-316b-455d-aed4-5267a2e23856" class="xr-section-summary">Indexes: <span>(3)</span></label><div class="xr-section-inline-details"></div><div class="xr-section-details"><ul class="xr-var-list"><li class="xr-var-item"><div class="xr-index-name"><div>band</div></div><div class="xr-index-preview">PandasIndex</div><div></div><input id="index-ac66cd82-a65d-4b2c-98f9-b95ba12b1267" class="xr-index-data-in" type="checkbox"><label for="index-ac66cd82-a65d-4b2c-98f9-b95ba12b1267" title="Show/Hide index repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-index-data"><pre>PandasIndex(Int64Index([1], dtype='int64', name='band'))</pre></div></li><li class="xr-var-item"><div class="xr-index-name"><div>x</div></div><div class="xr-index-preview">PandasIndex</div><div></div><input id="index-3d6674bf-fcbe-419d-8518-f1187494bbe8" class="xr-index-data-in" type="checkbox"><label for="index-3d6674bf-fcbe-419d-8518-f1187494bbe8" title="Show/Hide index repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-index-data"><pre>PandasIndex(Float64Index([699965.0, 
699975.0, 699985.0, 699995.0, 700005.0, 700015.0,
              700025.0, 700035.0, 700045.0, 700055.0,
              ...
              809665.0, 809675.0, 809685.0, 809695.0, 809705.0, 809715.0,
              809725.0, 809735.0, 809745.0, 809755.0],
             dtype='float64', name='x', length=10980))</pre></div></li><li class="xr-var-item"><div class="xr-index-name"><div>y</div></div><div class="xr-index-preview">PandasIndex</div><div></div><input id="index-7487bf52-26af-4600-9506-1d1795489dae" class="xr-index-data-in" type="checkbox"><label for="index-7487bf52-26af-4600-9506-1d1795489dae" title="Show/Hide index repr"><svg class="icon xr-icon-database"><use href="#icon-database"></use></svg></label><div class="xr-index-data"><pre>PandasIndex(Float64Index([5000035.0, 5000025.0, 5000015.0, 5000005.0, 4999995.0, 4999985.0,
              4999975.0, 4999965.0, 4999955.0, 4999945.0,
              ...
              4890335.0, 4890325.0, 4890315.0, 4890305.0, 4890295.0, 4890285.0,
              4890275.0, 4890265.0, 4890255.0, 4890245.0],
             dtype='float64', name='y', length=10980))</pre></div></li></ul></div></li><li class="xr-section-item"><input id="section-71b5e9d2-1c0d-4ed4-8967-c086a52b3cdb" class="xr-section-summary-in" type="checkbox" checked=""><label for="section-71b5e9d2-1c0d-4ed4-8967-c086a52b3cdb" class="xr-section-summary">Attributes: <span>(4)</span></label><div class="xr-section-inline-details"></div><div class="xr-section-details"><dl class="xr-attrs"><dt><span>AREA_OR_POINT :</span></dt><dd>Area</dd><dt><span>_FillValue :</span></dt><dd>0</dd><dt><span>scale_factor :</span></dt><dd>1.0</dd><dt><span>add_offset :</span></dt><dd>0.0</dd></dl></div></li></ul></div></div>
</div>
</div>
<p>There’s much to be learned from a careful look at this code. First, the imports.</p>
<ul>
<li><code>pystac</code> - we need this to work with MPC’s STAC API</li>
<li><code>planetary_computer</code> - even though MPC allows free access via the STAC API, we need this library in order to <em>sign</em> the item we are trying to retrieve. As we’ll see later, this results in a long token string getting appended to the item URL. More on signing later.</li>
<li><code>rioxarray</code> - as we saw in <a href="https://bitsofanalytics.org/posts/algaebloom-part1/#rioxarray---read-raster-data-into-xarray-objects">this section of Part 1</a>, rioxarray is needed to open the actual raster image file and tuck it into an xarray <code>DataArray</code>.</li>
</ul>
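<p>As a quick aside on the <code>DataArray</code> output shown above, the <code>x</code> and <code>y</code> coordinate values can be tied back to the <code>GeoTransform</code> attribute stored under <code>spatial_ref</code>. The following is a small sketch, assuming the usual GDAL ordering of the six numbers (origin x, pixel width, row rotation, origin y, column rotation, pixel height) and assuming that the coordinates refer to pixel centers. The function name <code>pixel_center</code> is my own and is not part of rioxarray.</p>

```python
# GeoTransform string copied from the spatial_ref attributes in the output
geotransform = "699960.0 10.0 0.0 5000040.0 0.0 -10.0"
x_origin, x_res, _, y_origin, _, y_res = (float(v) for v in geotransform.split())


def pixel_center(row: int, col: int) -> tuple[float, float]:
    """Return the (x, y) coordinate of a pixel center, assuming no rotation."""
    x = x_origin + (col + 0.5) * x_res
    y = y_origin + (row + 0.5) * y_res
    return x, y


n_rows, n_cols = 10980, 10980  # the y and x dimension sizes in the output

print(pixel_center(0, 0))                    # upper left pixel
print(pixel_center(n_rows - 1, n_cols - 1))  # lower right pixel
print(n_rows * n_cols)                       # total number of values
```

<p>The two corner coordinates should match the first and last values in the <code>x</code> and <code>y</code> arrays, and the pixel size of 10.0 is in meters since the projection is UTM zone 16N.</p>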
<p>Now for the <code>item_url</code>. It looks like this:</p>
<pre><code>https://planetarycomputer.microsoft.com/api/stac/v1/collections/sentinel-2-l2a/items/
S2B_MSIL2A_20221103T163439_R083_T16TGQ_20221104T085305</code></pre>
<p>We see from the URL that:</p>
<ul>
<li>we are going to be using the STAC API</li>
<li>this image is from the Sentinel-2 mission</li>
<li>this is Level 2A data</li>
<li>this particular image is one of a larger collection of image items</li>
<li>the particular image has a unique id of <code>S2B_MSIL2A_20221103T163439_R083_T16TGQ_20221104T085305</code></li>
</ul>
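<p>As a small illustration of the points above, here is a minimal sketch that pulls the collection and item id out of the URL using only the standard library. It assumes the URL follows the <code>.../collections/{collection_id}/items/{item_id}</code> layout seen above; the variable names are just for this example.</p>

```python
from urllib.parse import urlparse

# The item URL from above, joined onto a single line
item_url = (
    "https://planetarycomputer.microsoft.com/api/stac/v1/collections/"
    "sentinel-2-l2a/items/S2B_MSIL2A_20221103T163439_R083_T16TGQ_20221104T085305"
)

# Split the URL path into its non-empty segments
segments = [s for s in urlparse(item_url).path.split("/") if s]

# Assumed layout: .../collections/{collection_id}/items/{item_id}
collection_id = segments[segments.index("collections") + 1]
item_id = segments[segments.index("items") + 1]

print(f"collection: {collection_id}")
print(f"item id:    {item_id}")
```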
<p>Notice that the <em>datatake sensing time</em> (a date and time) is embedded in the <code>id</code>. The <code>id</code> naming conventions are explained at <a href="https://sentinels.copernicus.eu/web/sentinel/user-guides/sentinel-2-msi/naming-convention">https://sentinels.copernicus.eu/web/sentinel/user-guides/sentinel-2-msi/naming-convention</a>, which also includes links to detailed product specification pages.</p>
<p>The <code>S2B</code> is the <em>mission id</em> and the <code>MSIL2A</code> is the product level:</p>
<blockquote class="blockquote">
<p>MSIL1C denotes the Level-1C product level/ MSIL2A denotes the Level-2A product level</p>
</blockquote>
<p>The <code>R083</code> is the <em>relative orbit number</em>, the <code>T16TGQ</code> is a <em>tile number field</em>, and the second datetime is:</p>
<blockquote class="blockquote">
<p>The second date is the &lt;Product Discriminator&gt; field, which is 15 characters in length, and is used to distinguish between different end user products from the same datatake. Depending on the instance, the time in this field can be earlier or slightly later than the datatake sensing time.</p>
</blockquote>
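<p>To make the naming convention concrete, here is a rough sketch that splits the <code>id</code> into its parts. It assumes the six-field, underscore-separated form used for this item on MPC (the full product name in <code>s2:product_uri</code> carries an extra processing baseline field). The function and dictionary key names are made up for this example.</p>

```python
from datetime import datetime

item_id = "S2B_MSIL2A_20221103T163439_R083_T16TGQ_20221104T085305"


def parse_compact(ts):
    """Parse a compact timestamp such as 20221103T163439."""
    return datetime.strptime(ts, "%Y%m%dT%H%M%S")


# Assumes exactly six underscore-separated fields, as in the id above
mission, product_level, sensing, orbit, tile, discriminator = item_id.split("_")

parsed = {
    "mission_id": mission,
    "product_level": product_level,
    "sensing_time": parse_compact(sensing),
    "relative_orbit": int(orbit[1:]),  # drop the leading "R"
    "tile": tile[1:],  # drop the leading "T"
    "product_discriminator": parse_compact(discriminator),
}

for key, value in parsed.items():
    print(f"{key:<22} {value}")
```

<p>The relative orbit and tile recovered this way should agree with the <code>sat:relative_orbit</code> and <code>s2:mgrs_tile</code> values in the item’s <code>properties</code>.</p>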
<p>Grab the image item and sign it.</p>
<div class="sourceCode" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1">item <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pystac.Item.from_file(item_url)</span>
<span id="cb8-2">signed_item <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> planetary_computer.sign(item)</span></code></pre></div>
<p>We are just passing the URL for the item we want and using <code>pystac</code> to get it and <code>planetary_computer</code> to sign it. If it’s not signed, we won’t be able to read the underlying asset files. Let’s explore this PySTAC <code>Item</code> object.</p>
<div class="cell" data-execution_count="7">
<div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(signed_item)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>&lt;Item id=S2B_MSIL2A_20221103T163439_R083_T16TGQ_20221104T085305&gt;</code></pre>
</div>
</div>
<p>What about the attributes of an <code>Item</code>?</p>
<div class="cell" data-tags="[]" data-execution_count="8">
<div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1">[att <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> att <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">dir</span>(signed_item) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'__'</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">not</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> att]</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="8">
<pre><code>['STAC_OBJECT_TYPE',
 '_abc_impl',
 '_object_links',
 '_repr_html_',
 'add_asset',
 'add_link',
 'add_links',
 'assets',
 'bbox',
 'clear_links',
 'clone',
 'collection_id',
 'common_metadata',
 'datetime',
 'extra_fields',
 'from_dict',
 'from_file',
 'full_copy',
 'geometry',
 'get_assets',
 'get_collection',
 'get_datetime',
 'get_links',
 'get_parent',
 'get_root',
 'get_root_link',
 'get_self_href',
 'get_single_link',
 'get_stac_objects',
 'id',
 'links',
 'make_asset_hrefs_absolute',
 'make_asset_hrefs_relative',
 'matches_object_type',
 'properties',
 'remove_links',
 'resolve_links',
 'save_object',
 'self_href',
 'set_collection',
 'set_datetime',
 'set_parent',
 'set_root',
 'set_self_href',
 'stac_extensions',
 'to_dict',
 'validate']</code></pre>
</div>
</div>
<p>Let’s check out a few basic things.</p>
<div class="cell" data-execution_count="9">
<div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb13-1"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'The item is id </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>signed_item<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">id</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb13-2"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'The bounding box for this item is </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>signed_item<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>bbox<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>The item is id S2B_MSIL2A_20221103T163439_R083_T16TGQ_20221104T085305
The bounding box for this item is [-84.50015, 44.09977092, -83.0639, 45.12552488]</code></pre>
</div>
</div>
<div class="cell" data-execution_count="10">
<div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb15-1">signed_item.properties</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="10">
<pre><code>{'datetime': '2022-11-03T16:34:39.024000Z',
 'platform': 'Sentinel-2B',
 'proj:epsg': 32616,
 'instruments': ['msi'],
 's2:mgrs_tile': '16TGQ',
 'constellation': 'Sentinel 2',
 's2:granule_id': 'S2B_OPER_MSI_L2A_TL_ESRI_20221104T085306_A029567_T16TGQ_N04.00',
 'eo:cloud_cover': 8.857534,
 's2:datatake_id': 'GS2B_20221103T163439_029567_N04.00',
 's2:product_uri': 'S2B_MSIL2A_20221103T163439_N0400_R083_T16TGQ_20221104T085305.SAFE',
 's2:datastrip_id': 'S2B_OPER_MSI_L2A_DS_ESRI_20221104T085306_S20221103T163800_N04.00',
 's2:product_type': 'S2MSI2A',
 'sat:orbit_state': 'descending',
 's2:datatake_type': 'INS-NOBS',
 's2:generation_time': '2022-11-04T08:53:05.399641Z',
 'sat:relative_orbit': 83,
 's2:water_percentage': 21.674606,
 's2:mean_solar_zenith': 60.4296636848309,
 's2:mean_solar_azimuth': 169.548110700749,
 's2:processing_baseline': '04.00',
 's2:snow_ice_percentage': 0.000169,
 's2:vegetation_percentage': 35.986048,
 's2:thin_cirrus_percentage': 7.807394,
 's2:cloud_shadow_percentage': 0.276844,
 's2:nodata_pixel_percentage': 0.00358,
 's2:unclassified_percentage': 0.003371,
 's2:dark_features_percentage': 0.002027,
 's2:not_vegetated_percentage': 33.199403,
 's2:degraded_msi_data_percentage': 0.0027,
 's2:high_proba_clouds_percentage': 0.065832,
 's2:reflectance_conversion_factor': 1.0148973472866,
 's2:medium_proba_clouds_percentage': 0.984309,
 's2:saturated_defective_pixel_percentage': 0.0}</code></pre>
</div>
</div>
<p>The <code>'proj:epsg': 32616</code> corresponds to the WGS 84 / UTM zone 16N coordinate reference system. See <a href="https://epsg.io/32616">https://epsg.io/32616</a>. The units are in meters.</p>
<p>The actual data we are after lives in the <code>assets</code> dictionary. Before diving in, let’s see what assets are available.</p>
<div class="cell" data-execution_count="11">
<div class="sourceCode cell-code" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb17-1"><span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> asset_key, asset <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> signed_item.assets.items():</span>
<span id="cb17-2">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>asset_key<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:&lt;25}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> - </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>asset<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>title<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>AOT                       - Aerosol optical thickness (AOT)
B01                       - Band 1 - Coastal aerosol - 60m
B02                       - Band 2 - Blue - 10m
B03                       - Band 3 - Green - 10m
B04                       - Band 4 - Red - 10m
B05                       - Band 5 - Vegetation red edge 1 - 20m
B06                       - Band 6 - Vegetation red edge 2 - 20m
B07                       - Band 7 - Vegetation red edge 3 - 20m
B08                       - Band 8 - NIR - 10m
B09                       - Band 9 - Water vapor - 60m
B11                       - Band 11 - SWIR (1.6) - 20m
B12                       - Band 12 - SWIR (2.2) - 20m
B8A                       - Band 8A - Vegetation red edge 4 - 20m
SCL                       - Scene classfication map (SCL)
WVP                       - Water vapour (WVP)
visual                    - True color image
preview                   - Thumbnail
safe-manifest             - SAFE manifest
granule-metadata          - Granule metadata
inspire-metadata          - INSPIRE metadata
product-metadata          - Product metadata
datastrip-metadata        - Datastrip metadata
tilejson                  - TileJSON with default rendering
rendered_preview          - Rendered preview</code></pre>
</div>
</div>
<p>More about these bands from the original tutorial:</p>
<blockquote class="blockquote">
<p>We have visible bands (red, green, and blue), as well as a number of other spectral ranges and a few algorithmic bands. The Sentinel-2 <a href="https://docs.sentinel-hub.com/api/latest/data/sentinel-2-l2a/#available-bands-and-data">mission guide</a> has more details about what these bands are and how to use them!</p>
<p>A few of the <a href="https://sentinels.copernicus.eu/web/sentinel/technical-guides/sentinel-2-msi/level-2a/algorithm">algorithmic bands</a> that may be useful are:</p>
<ul>
<li><a href="https://docs.sentinel-hub.com/api/latest/data/sentinel-2-l2a/#units">Scene classification (SCL)</a>: The scene classification band sorts pixels into categories including water, high cloud probability, medium cloud probability, and vegetation. Water pixels could be used to calculate the size of a given water body, which impacts the behavior of blooms. Vegetation can indicate non-toxic marine life like sea grass that sometimes resembles cyanobacteria.</li>
</ul>
</blockquote>
<p><strong>NOTE</strong> The tutorial then mentions the CLM band, but this no longer seems to exist.</p>
<p>The SCL, AOT (haze), and WVP are considered <em>Quality Assurance</em> bands and can be useful in filtering out low quality images. See <a href="https://docs.digitalearthafrica.org/en/latest/data_specs/Sentinel-2_Level-2A_specs.html">https://docs.digitalearthafrica.org/en/latest/data_specs/Sentinel-2_Level-2A_specs.html</a> for a nice summary of these.</p>
<p>Here are the SCL values (the color values are taken from the <a href="https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S2_SR#image-properties">Google Earth Engine docs</a>):</p>
<pre><code>    Value   Color   Description
    1       ff0004  Saturated or defective
    2       868686  Dark Area Pixels
    3       774b0a  Cloud Shadows
    4       10d22c  Vegetation
    5       ffff52  Bare Soils
    6       0000ff  Water
    7       818181  Clouds Low Probability / Unclassified
    8       c0c0c0  Clouds Medium Probability
    9       f1f1f1  Clouds High Probability
    10      bac5eb  Cirrus
    11      52fff9  Snow / Ice</code></pre>
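<p>To show how these codes might be used for quality filtering, here is a minimal sketch. The descriptions are copied from the table above, but the set of classes treated as unusable, the function name, and the sample pixel codes are all assumptions for illustration and not part of the tutorial.</p>

```python
# SCL codes and descriptions copied from the table above
SCL_CLASSES = {
    1: "Saturated or defective",
    2: "Dark Area Pixels",
    3: "Cloud Shadows",
    4: "Vegetation",
    5: "Bare Soils",
    6: "Water",
    7: "Clouds Low Probability / Unclassified",
    8: "Clouds Medium Probability",
    9: "Clouds High Probability",
    10: "Cirrus",
    11: "Snow / Ice",
}

# Hypothetical choice of classes to treat as unusable
CLOUDY_OR_BAD = {1, 3, 8, 9, 10}


def usable_fraction(scl_values):
    """Return the share of pixels whose SCL code is not in CLOUDY_OR_BAD."""
    values = list(scl_values)
    if not values:
        return 0.0
    good = sum(1 for v in values if v not in CLOUDY_OR_BAD)
    return good / len(values)


# Made-up pixel codes standing in for a flattened SCL band
sample = [6, 6, 4, 9, 8, 6, 3, 5]
print(f"usable fraction: {usable_fraction(sample)}")
print([SCL_CLASSES[v] for v in sample])
```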
<p>As we’ll soon see, the <code>visual</code> asset contains the red, green, and blue bands (not surprising). The <code>rendered_preview</code> is a PNG file, so it is straightforward to view using <code>IPython.display.Image</code>.</p>
<div class="cell" data-execution_count="12">
<div class="sourceCode cell-code" id="cb20" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb20-1">Image(url<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>signed_item.assets[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'rendered_preview'</span>].href)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="12">
<img src="https://planetarycomputer.microsoft.com/api/data/v1/item/preview.png?collection=sentinel-2-l2a&amp;item=S2B_MSIL2A_20221103T163439_R083_T16TGQ_20221104T085305&amp;assets=visual&amp;asset_bidx=visual%7C1%2C2%2C3&amp;nodata=0&amp;format=png">
</div>
</div>
<section id="cropping-or-clipping-the-image" class="level2">
<h2 class="anchored" data-anchor-id="cropping-or-clipping-the-image">Cropping or clipping the image</h2>
<p>Before taking a closer look at the details of specific items in the <code>assets</code> dictionary, let’s see how we can crop an item asset using a bounding box. Often we’ll want to do this because we are interested in some small portion of the overall image.</p>
<p>The geopy library has a distance related function that we can use to find a bounding box of a specific size.</p>
<div class="cell" data-tags="[]" data-execution_count="13">
<div class="sourceCode cell-code" id="cb21" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb21-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> geopy.distance <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> distance</span></code></pre></div>
</div>
<p>Here’s a short example from the geopy docs.</p>
<div class="cell" data-execution_count="14">
<div class="sourceCode cell-code" id="cb22" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb22-1">newport_ri <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">41.49008</span>, <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">71.312796</span>)</span>
<span id="cb22-2">cleveland_oh <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">41.499498</span>, <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">81.695391</span>)</span>
<span id="cb22-3"></span>
<span id="cb22-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Use default geodesic distance</span></span>
<span id="cb22-5"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'newport to cleveland (miles): </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>distance<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>distance(newport_ri, cleveland_oh)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>miles<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb22-6"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'newport to cleveland (km): </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>distance<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>distance(newport_ri, cleveland_oh)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>km<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb22-7"></span>
<span id="cb22-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Use great_circle distance</span></span>
<span id="cb22-9"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'newport to cleveland (miles) using great_circle: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>distance<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>great_circle(newport_ri, cleveland_oh)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>miles<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb22-10"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'newport to cleveland (km) using great_circle: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>distance<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>great_circle(newport_ri, cleveland_oh)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>km<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb22-11"></span>
<span id="cb22-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Use a different ellipsoid</span></span>
<span id="cb22-13"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'newport to cleveland (miles) ellipsoid=GRS-80: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>distance<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>distance(newport_ri, cleveland_oh, ellipsoid<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"GRS-80"</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>miles<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb22-14"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'newport to cleveland (km) ellipsoid=GRS-80: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>distance<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>distance(newport_ri, cleveland_oh, ellipsoid<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"GRS-80"</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>km<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>newport to cleveland (miles): 538.3904453677203
newport to cleveland (km): 866.4554329098685

newport to cleveland (miles) using great_circle: 536.9979906964344
newport to cleveland (km) using great_circle: 864.2144943393625

newport to cleveland (miles) ellipsoid=GRS-80: 538.3904453716029
newport to cleveland (km) ellipsoid=GRS-80: 866.4554329161169</code></pre>
</div>
</div>
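<p>The great-circle figures treat the Earth as a sphere, while the default geodesic distance uses an ellipsoidal model, which is why the two differ by a couple of kilometers. As a rough cross-check, here is a sketch of the haversine formula using only the standard library. The mean Earth radius used here is an assumption, so expect the result to be close to, but not necessarily identical to, the geopy <code>great_circle</code> output.</p>

```python
import math

# Assumed mean Earth radius in km
EARTH_RADIUS_KM = 6371.009


def haversine_km(p1, p2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p1)
    lat2, lon2 = map(math.radians, p2)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (
        math.sin(dlat / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


newport_ri = (41.49008, -71.312796)
cleveland_oh = (41.499498, -81.695391)

print(f"newport to cleveland (km) via haversine: {haversine_km(newport_ri, cleveland_oh)}")
```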
<p>Let’s create a function to make it easy to define a bounding box around a lat, long pair. This code is right from the <a href="https://drivendata.co/blog/tick-tick-bloom-benchmark">algae bloom tutorial</a>.</p>
<div class="cell" data-execution_count="15">
<div class="sourceCode cell-code" id="cb24" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb24-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># get our bounding box to search latitude and longitude coordinates</span></span>
<span id="cb24-2"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> get_bounding_box(latitude, longitude, meter_buffer<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span>):</span>
<span id="cb24-3">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">"""</span></span>
<span id="cb24-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    Given a latitude, longitude, and buffer in meters, returns a bounding</span></span>
<span id="cb24-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    box around the point with the buffer on the left, right, top, and bottom.</span></span>
<span id="cb24-6"></span>
<span id="cb24-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    Returns a list of [minx, miny, maxx, maxy]</span></span>
<span id="cb24-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    """</span></span>
<span id="cb24-9">    distance_search <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> distance.distance(meters<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>meter_buffer)</span>
<span id="cb24-10"></span>
<span id="cb24-11">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># calculate the lat/long bounds based on ground distance</span></span>
<span id="cb24-12">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># bearings are cardinal directions to move (south, west, north, and east)</span></span>
<span id="cb24-13">    min_lat <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> distance_search.destination((latitude, longitude), bearing<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">180</span>)[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb24-14">    min_long <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> distance_search.destination((latitude, longitude), bearing<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">270</span>)[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]</span>
<span id="cb24-15">    max_lat <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> distance_search.destination((latitude, longitude), bearing<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb24-16">    max_long <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> distance_search.destination((latitude, longitude), bearing<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">90</span>)[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]</span>
<span id="cb24-17"></span>
<span id="cb24-18">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> [min_long, min_lat, max_long, max_lat]</span></code></pre></div>
</div>
<div class="cell" data-execution_count="16">
<div class="sourceCode cell-code" id="cb25" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb25-1">lat <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">45.0393354</span></span>
<span id="cb25-2"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">long</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">83.3037169</span></span></code></pre></div>
</div>
<div class="cell" data-execution_count="17">
<div class="sourceCode cell-code" id="cb26" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb26-1">bbox <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> get_bounding_box(lat, <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">long</span>, meter_buffer<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span>)</span>
<span id="cb26-2">bbox</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="17">
<pre><code>[-83.31640840401764, 45.030337128781625, -83.29102539598236, 45.04833365698021]</code></pre>
</div>
</div>
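<p>As a quick sanity check, we can convert the degree spans of this bounding box back to approximate ground distances. This sketch uses a simple spherical approximation (an assumption; geopy’s default is an ellipsoidal model), so we should only expect the sides to come out near 2000 meters, which is twice the 1000 meter buffer.</p>

```python
import math

# The bounding box computed above: [minx, miny, maxx, maxy] in degrees
bbox = [-83.31640840401764, 45.030337128781625, -83.29102539598236, 45.04833365698021]
minx, miny, maxx, maxy = bbox

# Meters per degree of latitude on a sphere with an assumed mean radius
M_PER_DEG = 2 * math.pi * 6371008.8 / 360

# Longitude degrees shrink with the cosine of the latitude
mid_lat = (miny + maxy) / 2
height_m = (maxy - miny) * M_PER_DEG
width_m = (maxx - minx) * M_PER_DEG * math.cos(math.radians(mid_lat))

print(f"approx height (m): {height_m:.1f}")
print(f"approx width  (m): {width_m:.1f}")
```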
<p>We need to pick a specific item asset to crop - let’s do the <code>visual</code> asset. It’s the true color image and contains red, blue and green spectral bands. This is where we need the <a href="https://corteva.github.io/rioxarray/stable/">rioxarray</a> library to read in the raster file which is available from MPC using the asset’s <code>href</code> attribute.</p>
<p>Use <code>rioxarray.open_rasterio</code> along with <code>rio.clip_box</code> to clip (crop) the image. The <code>open_rasterio</code> function is actually wrapping functionality from the <a href="https://rasterio.readthedocs.io/en/latest/">rasterio library</a> and <code>rio</code> is an <em>accessor</em> which lets us get at <code>rioxarray</code> functions. Let’s just get this working and then we can create a function we can reuse.</p>
<p>For the coordinate reference system we specify <code>"EPSG:4326"</code> - the WGS84 standard. This can look confusing because the <code>signed_item</code> properties above include <code>'proj:epsg': 32616</code>, and EPSG:32616 is a specific UTM zone (16N) based on WGS 84 with units in meters. The <code>crs</code> argument of <code>clip_box</code> describes the coordinates of the bounding box we pass in, not the CRS of the raster. Our bbox is in longitude and latitude degrees, so EPSG:4326 is the right choice. If we specify EPSG:32616 instead, the degree values are interpreted as meters and we are told that our bbox doesn’t intersect the image.</p>
<div class="cell" data-tags="[]" data-execution_count="18">
<div class="sourceCode cell-code" id="cb28" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb28-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Specify which asset we want</span></span>
<span id="cb28-2">minx, miny, maxx, maxy <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> bbox</span>
<span id="cb28-3">which_asset <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'visual'</span></span>
<span id="cb28-4"></span>
<span id="cb28-5">image <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rioxarray.open_rasterio(signed_item.assets[which_asset].href).rio.clip_box(</span>
<span id="cb28-6">        minx<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>minx,</span>
<span id="cb28-7">        miny<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>miny,</span>
<span id="cb28-8">        maxx<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>maxx,</span>
<span id="cb28-9">        maxy<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>maxy,</span>
<span id="cb28-10">        crs<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"EPSG:4326"</span>,</span>
<span id="cb28-11">)</span>
<span id="cb28-12"></span>
<span id="cb28-13"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'image data type: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">type</span>(image)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>image data type: &lt;class 'xarray.core.dataarray.DataArray'&gt;</code></pre>
</div>
</div>
<p>The raster file gets read into a <code>DataArray</code> from the xarray package.</p>
<div class="cell" data-execution_count="19">
<div class="sourceCode cell-code" id="cb30" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb30-1">image.shape</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="19">
<pre><code>(3, 210, 210)</code></pre>
</div>
</div>
<p>The first dimension is color band, the second is height and the third is width.</p>
<p>To get at the actual values we use the <code>values</code> property of the image xarray. For assets with multiple bands (such as <code>visual</code>), each band has an associated (y, x) matrix.</p>
<div class="cell" data-tags="[]" data-execution_count="20">
<div class="sourceCode cell-code" id="cb32" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb32-1"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(image[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>:<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">115</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>:<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">105</span>].values)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[[115 124 128 134 133]
 [120 131 129 129 133]
 [129 136 131 129 128]
 [140 137 135 135 131]
 [109 124 138 140 138]
 [ 50  75 112 128 122]
 [ 36  38  61  75  71]
 [ 52  48  49  36  34]
 [ 58  59  52  42  48]
 [ 57  59  50  49  54]
 [ 43  43  53  53  53]
 [ 47  40  53  51  46]
 [ 42  39  43  41  37]
 [ 38  37  32  29  30]
 [ 43  40  31  27  26]]</code></pre>
</div>
</div>
<p>The values in the <code>visual</code> asset are on a 0-255 scale. If you look at some of the other bands, the values are NOT scaled this way. More on that later.</p>
<p>Let’s wrap this in a function. We’ll make sure we are opening a signed item by using <code>planetary_computer.sign</code>.</p>
<div class="cell" data-execution_count="21">
<div class="sourceCode cell-code" id="cb34" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb34-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> crop_sentinel_image(item, bounding_box, asset_str):</span>
<span id="cb34-2">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">"""</span></span>
<span id="cb34-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    Given a STAC item from Sentinel-2 and a bounding box tuple in the format</span></span>
<span id="cb34-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    (minx, miny, maxx, maxy), return the portion of the asset named by</span></span>
<span id="cb34-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    asset_str that falls inside the bounding box.</span></span>
<span id="cb34-6"></span>
<span id="cb34-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    Returns the image as an xarray DataArray with dimensions (band, y, x)</span></span>
<span id="cb34-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    """</span></span>
<span id="cb34-9">    (minx, miny, maxx, maxy) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> bounding_box</span>
<span id="cb34-10"></span>
<span id="cb34-11">    cropped_image <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rioxarray.open_rasterio(planetary_computer.sign(item.assets[asset_str].href)).rio.clip_box(</span>
<span id="cb34-12">        minx<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>minx,</span>
<span id="cb34-13">        miny<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>miny,</span>
<span id="cb34-14">        maxx<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>maxx,</span>
<span id="cb34-15">        maxy<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>maxy,</span>
<span id="cb34-16">        crs<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"EPSG:4326"</span>,</span>
<span id="cb34-17">    )</span>
<span id="cb34-18"></span>
<span id="cb34-19">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> cropped_image</span></code></pre></div>
</div>
<p>Now we’ll crop the image, returning a <code>DataArray</code>, and create a numpy array from that. Then we can view the image by plotting the numpy array with matplotlib.</p>
<div class="cell" data-execution_count="22">
<div class="sourceCode cell-code" id="cb35" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb35-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Crop the image</span></span>
<span id="cb35-2">cropped_img <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> crop_sentinel_image(item, bbox, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'visual'</span>)</span>
<span id="cb35-3"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'cropped_image is a </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">type</span>(cropped_img)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> with dimensions of </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>cropped_img<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>dims<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> and shape = </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>cropped_img<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>shape<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb35-4"></span>
<span id="cb35-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create a numpy array from the cropped image</span></span>
<span id="cb35-6">cropped_img_array <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> cropped_img.to_numpy()</span>
<span id="cb35-7"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'cropped_image_array is a </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">type</span>(cropped_img_array)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> with shape = </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>cropped_img_array<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>shape<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>cropped_image is a &lt;class 'xarray.core.dataarray.DataArray'&gt; with dimensions of ('band', 'y', 'x') and shape = (3, 210, 210)
cropped_image_array is a &lt;class 'numpy.ndarray'&gt; with shape = (3, 210, 210)</code></pre>
</div>
</div>
<p>You can see how the xarray package adds dimension names to numpy arrays.</p>
<p>We have to transpose the dimensions to plot since matplotlib’s <code>imshow</code> expects an RGB image with its dimensions ordered as (y, x, band). Note that the band dimension is of length 3 - red, green and blue.</p>
<div class="cell" data-tags="[]" data-execution_count="23">
<div class="sourceCode cell-code" id="cb37" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb37-1">plt.imshow(np.transpose(cropped_img_array, axes<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]))</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="23">
<pre><code>&lt;matplotlib.image.AxesImage at 0x7f96e4360fa0&gt;</code></pre>
</div>
<div class="cell-output cell-output-display">
<p><img src="https://bitsofanalytics.org/posts/algaebloom-part2/index_files/figure-html/cell-24-output-2.png" class="img-fluid"></p>
</div>
</div>
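<p>To make the axis reordering concrete, here’s a minimal sketch that uses a small dummy array in place of the real image (the array and the variable names are made up for illustration). It also shows that <code>np.moveaxis</code> gives the same result as the <code>np.transpose</code> call, which some people find easier to read.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python">import numpy as np

# Dummy stand-in for the cropped image: 3 bands, 4 rows (y), 5 columns (x)
band_first = np.arange(3 * 4 * 5).reshape(3, 4, 5)

# Reorder (band, y, x) to (y, x, band), the layout imshow expects for RGB
band_last = np.transpose(band_first, axes=[1, 2, 0])

# np.moveaxis expresses the same reordering as "move axis 0 to the end"
band_last_alt = np.moveaxis(band_first, 0, -1)

print(band_first.shape, band_last.shape)
print(np.array_equal(band_last, band_last_alt))</code></pre></div>
<p>Either call returns a view with the axes reordered, so the pixel at row <code>y</code>, column <code>x</code> of band <code>b</code> ends up at index <code>[y, x, b]</code>.</p>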
<p>One last detour before looking a little closer at the <code>assets</code> dictionary - downloading and saving one of these GeoTIFF files on a local machine. While it’s great that all the images are always available from the cloud that is the MPC, sometimes we want to have a few local images to play around with during our learning.</p>
</section>
<section id="downloading-and-saving-a-sentinel-2-image-on-local-machine" class="level2">
<h2 class="anchored" data-anchor-id="downloading-and-saving-a-sentinel-2-image-on-local-machine">Downloading and saving a Sentinel-2 image on local machine</h2>
<p>The following tutorial walks through the steps of finding, exploring, saving locally, and processing an image file from MPC.</p>
<ul>
<li><a href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/introduce-a-notebook-gallery-image-to-process-geospatial-data/ba-p/3582865">Introduce a Notebook gallery image to process Geospatial data from Planetary Computer with STAC API</a></li>
</ul>
<p>We are simply using the <code>requests</code> library to grab the image using a signed item’s <code>href</code> property and then writing it as a binary file to our local filesystem.</p>
<div class="cell" data-execution_count="24">
<div class="sourceCode cell-code" id="cb39" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb39-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> requests</span></code></pre></div>
</div>
<div class="cell" data-execution_count="25">
<div class="sourceCode cell-code" id="cb40" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb40-1">signed_href <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> signed_item.assets[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'visual'</span>].href</span></code></pre></div>
</div>
<p>We’ll keep the original filename, which we can extract from the URL by splitting on the <code>'?'</code> and then using the <code>name</code> attribute of a pathlib <code>Path</code>.</p>
<div class="cell" data-execution_count="26">
<div class="sourceCode cell-code" id="cb41" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb41-1">base_url <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> signed_item.assets[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'visual'</span>].href.split(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'?'</span>)[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb41-2"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(base_url)</span>
<span id="cb41-3">filename <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Path(base_url).name</span>
<span id="cb41-4"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(filename)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>https://sentinel2l2a01.blob.core.windows.net/sentinel2-l2/16/T/GQ/2022/11/03/S2B_MSIL2A_20221103T163439_N0400_R083_T16TGQ_20221104T085305.SAFE/GRANULE/L2A_T16TGQ_A029567_20221103T163800/IMG_DATA/R10m/T16TGQ_20221103T163439_TCI_10m.tif
T16TGQ_20221103T163439_TCI_10m.tif</code></pre>
</div>
</div>
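<p>Splitting on <code>'?'</code> works fine here. As an alternative, here’s a small sketch that lets the standard library’s <code>urllib.parse.urlsplit</code> separate the path from the query string. The helper name <code>filename_from_url</code>, the example URL, and its token are all made up for illustration.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python">from pathlib import PurePosixPath
from urllib.parse import urlsplit


def filename_from_url(url):
    """Return the last path component of a URL, ignoring any query string."""
    # urlsplit keeps the query string (everything after '?') out of .path
    return PurePosixPath(urlsplit(url).path).name


# Hypothetical signed URL; the token after '?' is not a real one
example_url = (
    "https://example.blob.core.windows.net/container/IMG_DATA/R10m/"
    "T16TGQ_TCI_10m.tif?sig=made-up-token"
)
print(filename_from_url(example_url))</code></pre></div>
<p>I used <code>PurePosixPath</code> in this sketch because URL paths always use forward slashes, regardless of the operating system the code runs on.</p>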
<p>Ok, let’s save it in the <code>images/</code> subfolder.</p>
<div class="cell" data-execution_count="27">
<div class="sourceCode cell-code" id="cb43" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb43-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># use requests to grab the file and write it out</span></span>
<span id="cb43-2"></span>
<span id="cb43-3">redownload <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span> <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Avoiding redownloading</span></span>
<span id="cb43-4"></span>
<span id="cb43-5"><span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> redownload:</span>
<span id="cb43-6">    response <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> requests.get(signed_href)</span>
<span id="cb43-7">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">with</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">open</span>(Path(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'images'</span>, filename), <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"wb"</span>) <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> f:</span>
<span id="cb43-8">        f.write(response.content)</span></code></pre></div>
</div>
<p>Great, we can save a file locally if we want. Here’s an overly verbose functionized version. We’ll use it later to download one of the other image bands.</p>
<div class="cell" data-execution_count="28">
<div class="sourceCode cell-code" id="cb44" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb44-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> download_tif_from_mpc(item, which_asset, dest_folder):</span>
<span id="cb44-2">    signed_href <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pc.sign(item).assets[which_asset].href</span>
<span id="cb44-3">    base_url <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> signed_href.split(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'?'</span>)[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb44-4">    filename <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Path(base_url).name</span>
<span id="cb44-5">    save_path <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Path(dest_folder, filename)</span>
<span id="cb44-6">    response <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> requests.get(signed_href)</span>
<span id="cb44-7">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">with</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">open</span>(save_path, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"wb"</span>) <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> f:</span>
<span id="cb44-8">        f.write(response.content)</span></code></pre></div>
</div>
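<p>These GeoTIFF files can be large, so it may be worth skipping files we already have and streaming the download to disk instead of holding it all in memory. Here’s a rough sketch of that idea using only the standard library. The function name <code>download_if_missing</code> and the <code>chunk_size</code> parameter are my own inventions for illustration, and it takes an already signed URL rather than a STAC item.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python">import shutil
from pathlib import Path, PurePosixPath
from urllib.parse import urlsplit
from urllib.request import urlopen


def download_if_missing(url, dest_folder, chunk_size=1024 * 1024):
    """Download url into dest_folder unless the file is already there.

    The filename is taken from the URL path. Returns the local path.
    """
    dest_folder = Path(dest_folder)
    dest_folder.mkdir(parents=True, exist_ok=True)
    save_path = dest_folder / PurePosixPath(urlsplit(url).path).name

    if save_path.exists():
        # We already have a local copy, so skip the download
        return save_path

    # Copy the response to disk in chunks instead of reading it all at once
    with urlopen(url) as response, open(save_path, "wb") as f:
        shutil.copyfileobj(response, f, chunk_size)
    return save_path


# Hypothetical usage with the signed URL from earlier:
# local_path = download_if_missing(signed_href, 'images')</code></pre></div>
<p>One caveat with this sketch is that an interrupted download leaves a partial file behind, which would then be treated as already downloaded. You’d need to delete it by hand before retrying.</p>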
<p>One reason that I wanted to have a few local TIFF files was so that I could explore and mess around with them in <a href="https://www.qgis.org/en/site/">QGIS</a>, a terrific free and open source desktop GIS package. There are tons of tutorials, it’s easy to use, and it has a vibrant user community and numerous specialized tools (both built in and via plug-ins). You can even do things like launch GDAL commands from a GUI and then see the corresponding command line code. Here’s a screenshot of the TIFF file we just downloaded. You can see the individual RGB bands in the Layers panel.</p>
<div>
<p><img src="https://bitsofanalytics.org/posts/algaebloom-part2/images/thunder_bay_qgis.png" width="500"></p>
</div>
</section>
<section id="exploring-the-assets-of-our-stac-item" class="level2">
<h2 class="anchored" data-anchor-id="exploring-the-assets-of-our-stac-item">Exploring the <code>assets</code> of our STAC item</h2>
<p>Each dictionary key is one of the names that we saw earlier (e.g.&nbsp;<code>B01</code>, <code>B02</code>, …) and each value is an <code>Asset</code> object with an <code>href</code> attribute that points to where the asset is stored in MPC. If you look at the URLs, you can see the appended tokens indicating that the item has been signed. For example (it’s wide, you’ll need to scroll):</p>
<div class="sourceCode" id="cb45" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb45-1">{<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'AOT'</span>: <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span>Asset href<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>https:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//</span>sentinel2l2a01.blob.core.windows.net<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>sentinel2<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>l2<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">16</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>T<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>GQ<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2022</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">11</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>S2B_MSIL2A_20221103T163439_N0400_R083_T16TGQ_20221104T085305.SAFE<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>GRANULE<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>L2A_T16TGQ_A029567_20221103T163800<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>IMG_DATA<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>R10m<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>T16TGQ_20221103T163439_AOT_10m.tif?st<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2023</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">27</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">T22</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">A47</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">A33Z</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span>se<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2023</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">28</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">T23</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">A32</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">A33Z</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span>sp<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>rl<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span>sv<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2021</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">8</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span>sr<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>c<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span>skoid<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>c85c15d6<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>d1ae<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">42</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">d4</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>af60<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>e2ca0f81359b<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span>sktid<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">72</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">f988bf</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">86</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">f1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">41</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">af</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">91</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">ab</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">d7cd011db47</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span>skt<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2023</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">28</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">T21</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">A28</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">A38Z</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span>ske<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2023</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">4T21</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">A28</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">A38Z</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span>sks<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>b<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span>skv<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2021</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">8</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span>sig<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>ZrRJpxPGmswyY7GH3HlNiuFKJPR5VA4hbne4gbDoUR0<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">D</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span>,</span>
<span id="cb45-2"> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'B01'</span>: <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span>Asset href<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>https:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//</span>sentinel2l2a01.blob.core.windows.net<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>sentinel2<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>l2<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">16</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>T<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>GQ<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2022</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">11</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>S2B_MSIL2A_20221103T163439_N0400_R083_T16TGQ_20221104T085305.SAFE<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>GRANULE<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>L2A_T16TGQ_A029567_20221103T163800<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>IMG_DATA<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>R60m<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>T16TGQ_20221103T163439_B01_60m.tif?st<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2023</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">27</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">T22</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">A47</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">A33Z</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span>se<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2023</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">28</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">T23</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">A32</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">A33Z</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span>sp<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>rl<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span>sv<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2021</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">8</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span>sr<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>c<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span>skoid<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>c85c15d6<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>d1ae<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">42</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">d4</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>af60<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>e2ca0f81359b<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span>sktid<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">72</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">f988bf</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">86</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">f1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">41</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">af</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">91</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">ab</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">d7cd011db47</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span>skt<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2023</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">28</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">T21</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">A28</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">A38Z</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span>ske<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2023</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">4T21</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">A28</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">A38Z</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span>sks<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>b<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span>skv<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2021</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">8</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span>sig<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>ZrRJpxPGmswyY7GH3HlNiuFKJPR5VA4hbne4gbDoUR0<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">D</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span>,</span>
<span id="cb45-3">... </span></code></pre></div>
<p>You’ll also see that the bands are <a href="https://en.wikipedia.org/wiki/TIFF">TIFF files</a> (actually <a href="https://en.wikipedia.org/wiki/GeoTIFF">GeoTIFF</a> files) - they have a <code>.tif</code> extension. TIFF files are commonly used for storing raster data and can include tagged metadata and even multiple “images” within one file. The GeoTIFF standard allows georeferencing info to be included as tagged metadata in a TIFF file. There are also some XML assets that appear to be metadata related.</p>
<p>The STAC spec can be found at <a href="https://github.com/radiantearth/stac-spec">https://github.com/radiantearth/stac-spec</a>.</p>
<p>Let’s examine the <code>'B08'</code> asset - <a href="https://en.wikipedia.org/wiki/Infrared">the NIR (near infrared) spectral band</a>.</p>
<div class="cell" data-execution_count="30">
<div class="sourceCode cell-code" id="cb46" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb46-1">nir_asset <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> signed_item.assets[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'B08'</span>]</span>
<span id="cb46-2">[att <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> att <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">dir</span>(nir_asset) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'__'</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">not</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> att]</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="30">
<pre><code>['_repr_html_',
 'clone',
 'common_metadata',
 'description',
 'extra_fields',
 'from_dict',
 'get_absolute_href',
 'href',
 'media_type',
 'owner',
 'roles',
 'set_owner',
 'title',
 'to_dict']</code></pre>
</div>
</div>
<p>The <code>to_dict</code> method will give us details about the band.</p>
<div class="cell" data-execution_count="31">
<div class="sourceCode cell-code" id="cb48" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb48-1">nir_asset.to_dict()</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="31">
<pre><code>{'href': 'https://sentinel2l2a01.blob.core.windows.net/sentinel2-l2/16/T/GQ/2022/11/03/S2B_MSIL2A_20221103T163439_N0400_R083_T16TGQ_20221104T085305.SAFE/GRANULE/L2A_T16TGQ_A029567_20221103T163800/IMG_DATA/R10m/T16TGQ_20221103T163439_B08_10m.tif?st=2023-01-27T22%3A47%3A33Z&amp;se=2023-01-28T23%3A32%3A33Z&amp;sp=rl&amp;sv=2021-06-08&amp;sr=c&amp;skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&amp;sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&amp;skt=2023-01-28T21%3A28%3A38Z&amp;ske=2023-02-04T21%3A28%3A38Z&amp;sks=b&amp;skv=2021-06-08&amp;sig=ZrRJpxPGmswyY7GH3HlNiuFKJPR5VA4hbne4gbDoUR0%3D',
 'type': 'image/tiff; application=geotiff; profile=cloud-optimized',
 'title': 'Band 8 - NIR - 10m',
 'proj:bbox': [699960.0, 4890240.0, 809760.0, 5000040.0],
 'proj:shape': [10980, 10980],
 'proj:transform': [10.0, 0.0, 699960.0, 0.0, -10.0, 5000040.0],
 'gsd': 10.0,
 'eo:bands': [{'name': 'B08',
   'common_name': 'nir',
   'description': 'Band 8 - NIR',
   'center_wavelength': 0.842,
   'full_width_half_max': 0.145}],
 'roles': ['data']}</code></pre>
</div>
</div>
<div class="cell" data-execution_count="32">
<div class="sourceCode cell-code" id="cb50" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb50-1">[key <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> key <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> nir_asset.to_dict()]</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="32">
<pre><code>['href',
 'type',
 'title',
 'proj:bbox',
 'proj:shape',
 'proj:transform',
 'gsd',
 'eo:bands',
 'roles']</code></pre>
</div>
</div>
<p>What are these?</p>
<p>The first three are general information about the item band.</p>
<ul>
<li><code>href</code> - URL for the data in Azure blob storage</li>
<li><code>type</code> - the image is a <a href="https://www.cogeo.org/">COG, or cloud-optimized GeoTIFF</a></li>
<li><code>title</code> - it’s Band 8, NIR and it’s at 10m resolution</li>
</ul>
<p>Next are three projection-related properties that are part of the STAC extensions ecosystem. See <a href="https://github.com/stac-extensions/projection">the STAC Extensions repo</a> for more details.</p>
<ul>
<li><code>proj:bbox</code> - bounding box of the asset, expressed in the asset’s CRS (WGS84/UTM zone 16N)</li>
</ul>
<p>The bbox <code>[699960.0, 4890240.0, 809760.0, 5000040.0]</code> corresponds to [min x, min y, max x, max y] ([west, south, east, north]) where <code>x</code> is the easting and <code>y</code> is the northing in UTM zone 16N. The first two coordinates specify the lower left of the bbox and the last two, the upper right. The center coordinates of UTM zone 16N are (500000.0, 4649776.22). All <a href="https://en.wikipedia.org/wiki/Universal_Transverse_Mercator_coordinate_system">UTM zones</a> are centered longitudinally at 500000 to prevent negative eastings. So, the bbox for this image is in the upper right (NE) part of the zone, relative to its center coordinates.</p>
<ul>
<li><code>proj:shape</code> - number of rows and columns in the raster image (i.e., in Y, X order)</li>
<li><code>proj:transform</code> - this is a linear mapping from pixel coordinate space to projection coordinate space. In other words, this mapping is used to map a specific pixel to a spot on the earth using the projected coordinate system we are using. See <a href="https://github.com/stac-extensions/projection#projtransform">https://github.com/stac-extensions/projection#projtransform</a> for details. The numbers are actually the first two rows of the 3x3 matrix used to do the affine transformation.</li>
</ul>
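<p>To make the <code>proj:transform</code> mapping concrete, here is a minimal sketch that applies those six numbers by hand. It assumes the usual row-major ordering <code>[a, b, c, d, e, f]</code>, with <code>x = a*col + b*row + c</code> and <code>y = d*col + e*row + f</code>. The function name <code>pixel_to_projected</code> is made up for illustration and isn’t part of any library.</p>

```python
# proj:transform values copied from the B08 asset metadata shown above
transform = [10.0, 0.0, 699960.0, 0.0, -10.0, 5000040.0]


def pixel_to_projected(row, col, t):
    """Map a pixel (row, col) to the projected (x, y) of its upper left corner.

    Assumes t holds the first two rows of the affine matrix: [a, b, c, d, e, f].
    """
    a, b, c, d, e, f = t
    x = a * col + b * row + c
    y = d * col + e * row + f
    return x, y


# First pixel, and the corner just past the last pixel (shape is 10980 x 10980)
upper_left = pixel_to_projected(0, 0, transform)
lower_right = pixel_to_projected(10980, 10980, transform)

print(upper_left)   # (699960.0, 5000040.0)
print(lower_right)  # (809760.0, 4890240.0)
```

<p>Pixel (0, 0) lands on the upper left corner of the bbox and (10980, 10980) lands on the lower right corner, which agrees with the <code>proj:bbox</code> and <code>proj:shape</code> values above.</p>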
<p>Next is the <code>gsd</code> - <em>Ground Sample Distance</em> (or resolution) in the base units of the projected coordinate reference system.</p>
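<p>As a quick sanity check on how these properties fit together, dividing the bbox extent by the <code>gsd</code> should reproduce <code>proj:shape</code>. This small sketch assumes the projected units are meters (as they are for UTM) and simply reuses the values printed above.</p>

```python
# Values copied from the B08 asset metadata shown above
bbox = [699960.0, 4890240.0, 809760.0, 5000040.0]  # proj:bbox
shape = [10980, 10980]                             # proj:shape (rows, cols)
gsd = 10.0                                         # meters per pixel (assumed units)

min_x, min_y, max_x, max_y = bbox

# Extent in meters divided by meters per pixel gives the pixel counts
n_cols = (max_x - min_x) / gsd
n_rows = (max_y - min_y) / gsd

print(n_rows, n_cols)                              # 10980.0 10980.0
print([int(n_rows), int(n_cols)] == shape)         # True
```

<p>So this band covers a square roughly 110 km on a side, sampled every 10 m.</p>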
<p>The electro-optical <code>eo:bands</code> property gives more details about this specific spectral band - see <a href="https://github.com/stac-extensions/eo">https://github.com/stac-extensions/eo</a>. Finally, the <code>roles</code> property gives us a sense of how we might use this item asset. Besides <code>data</code>, other common roles are <code>thumbnail</code>, <code>overview</code>, and <code>metadata</code>. See <a href="https://github.com/radiantearth/stac-spec/blob/v1.0.0/item-spec/item-spec.md#asset-roles">https://github.com/radiantearth/stac-spec/blob/v1.0.0/item-spec/item-spec.md#asset-roles</a> for more details.</p>
<p>Now we’ll crop the <code>B08</code> band using the same bbox we’ve been using. After cropping we’ll look at the raw data values as well as preview the image. Remember, we are using rioxarray to read the TIFF file into an xarray <code>DataArray</code>.</p>
<div class="cell" data-execution_count="33">
<div class="sourceCode cell-code" id="cb52" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb52-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Crop the image</span></span>
<span id="cb52-2">cropped_B08_img <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> crop_sentinel_image(signed_item, bbox, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'B08'</span>)</span>
<span id="cb52-3"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'cropped_image is a </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">type</span>(cropped_B08_img)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> with dimensions of </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>cropped_B08_img<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>dims<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> and shape = </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>cropped_B08_img<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>shape<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb52-4"></span>
<span id="cb52-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create a numpy array from the cropped image</span></span>
<span id="cb52-6">cropped_B08_img_array <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> cropped_B08_img.to_numpy()</span>
<span id="cb52-7"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'cropped_image_array is a </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">type</span>(cropped_B08_img_array)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> with shape = </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>cropped_B08_img_array<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>shape<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>cropped_image is a &lt;class 'xarray.core.dataarray.DataArray'&gt; with dimensions of ('band', 'y', 'x') and shape = (1, 210, 210)
cropped_image_array is a &lt;class 'numpy.ndarray'&gt; with shape = (1, 210, 210)</code></pre>
</div>
</div>
<p>Let’s plot it.</p>
<div class="cell" data-tags="[]" data-execution_count="34">
<div class="sourceCode cell-code" id="cb54" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb54-1">plt.imshow(np.transpose(cropped_B08_img_array, axes<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]))</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="34">
<pre><code>&lt;matplotlib.image.AxesImage at 0x7f96deaab5b0&gt;</code></pre>
</div>
<div class="cell-output cell-output-display">
<p><img src="https://bitsofanalytics.org/posts/algaebloom-part2/index_files/figure-html/cell-34-output-2.png" class="img-fluid"></p>
</div>
</div>
<p>To see the actual underlying data values, we again use the <code>values</code> property of the <code>DataArray</code> object.</p>
<div class="cell" data-execution_count="35">
<div class="sourceCode cell-code" id="cb56" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb56-1"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(cropped_B08_img[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>:<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">115</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>:<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">105</span>].values)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[[1830 1852 1842 1907 1822]
 [1973 1890 1874 1878 1862]
 [2174 2014 1958 1906 1874]
 [2394 2190 2070 1979 1932]
 [2218 2212 2190 2108 2011]
 [1880 2024 2110 2296 2106]
 [2354 2100 1860 1903 1710]
 [3034 2852 2510 2076 1886]
 [3408 3308 2936 2652 2652]
 [3382 3144 3100 3188 3146]
 [2776 2598 2760 3166 3210]
 [2548 2550 2714 2826 2646]
 [2592 2598 2584 2670 2620]
 [2456 2432 2494 2660 2644]
 [2620 2656 2670 2770 2804]]</code></pre>
</div>
</div>
<p>Obviously, unlike the RGB bands, these are NOT on a 0-255 scale. What are they? They are <em>surface reflectance values</em> and have been scaled to be integers between 1 and 10000 with 0 being the “nodata” value (different bands will have different unscaled ranges of values). They are considered <em>Analysis Ready Data</em> and have been produced after atmospheric corrections have been applied to the raw data from the satellite sensors. The combination of atmospheric corrections and the common scale makes it easier to use spectral data in machine learning models.</p>
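<p>If you want the values back on a unitless reflectance scale, a rough sketch looks like the following. It assumes a plain scale factor of 10000 with 0 as nodata, as described above. Some Sentinel-2 processing baselines may also apply an additive offset to the stored values, which this sketch ignores, so check the product documentation before relying on absolute numbers. The small <code>raw</code> array reuses a few values from the output above plus one made-up 0 to stand in for a nodata pixel.</p>

```python
import numpy as np

# A few raw B08 values copied from the printed output above, plus one
# made-up 0 (last entry) to stand in for a "nodata" pixel.
raw = np.array([[1830, 1852, 1842],
                [1973, 1890, 0]], dtype=np.uint16)

SCALE = 10000.0  # assumed scale factor, based on the 1 to 10000 range described above

# Replace nodata pixels with NaN and rescale everything else
reflectance = np.where(raw == 0, np.nan, raw / SCALE)

print(reflectance)
```

<p>Using <code>NaN</code> for the nodata pixels keeps them from dragging down summary statistics, as long as you use the NaN-aware functions such as <code>np.nanmean</code>.</p>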
<p>Here are some good resources for digging deeper into Sentinel data and reflectance values:</p>
<ul>
<li><a href="https://docs.digitalearthafrica.org/en/latest/data_specs/Sentinel-2_Level-2A_specs.html">Sentinel-2 Level-2A Surface Reflectance</a></li>
<li><a href="https://sentinels.copernicus.eu/web/sentinel/technical-guides/sentinel-2-msi">Sentinel-2 MSI Technical Guide</a></li>
<li><a href="https://sentinels.copernicus.eu/web/sentinel/technical-guides/sentinel-2-msi/level-2a/algorithm">Level-2A Algorithm Overview</a></li>
<li><a href="https://semiautomaticclassificationmanual-v5.readthedocs.io/en/latest/remote_sensing.html#image-conversion-to-reflectance">Image conversion to reflectance</a></li>
</ul>
</section>
<section id="a-good-place-to-stop-for-part-2" class="level2">
<h2 class="anchored" data-anchor-id="a-good-place-to-stop-for-part-2">A good place to stop for part 2</h2>
<p>In this part we:</p>
<ul>
<li>Used the MPC explorer to find an image item (a COG)</li>
<li>Used the STAC API to access the image and used rioxarray to load it into a <code>DataArray</code> (from xarray package)</li>
<li>Learned how to download a Sentinel-2 image item</li>
<li>Explored the structure of a Sentinel-2 item</li>
<li>Used geopy to create a bounding box</li>
<li>Cropped an image with a bounding box and rioxarray</li>
<li>Dug into the spectral band data that can be found in the item assets collection</li>
<li>Viewed images with matplotlib</li>
</ul>
<p>In the next part, we’ll use Python to find a bunch of images that intersect a point of interest and that were created during a specific date range. We’ll also see how the GeoPandas package can be used for working with multiple image items in a query-like way.</p>


</section>
</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div quarto-reuse="quarto-reuse" class="quarto-appendix-contents"><a rel="license" href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</a></div></section><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{isken2023,
  author = {Mark Isken},
  title = {Algal Bloom Detection Extended Tutorial - {Part} 2:
    {Planetary} {Computer} Intro},
  date = {2023-01-28},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-isken2023" class="csl-entry quarto-appendix-citeas">
Mark Isken. 2023. <span>“Algal Bloom Detection Extended Tutorial - Part
2: Planetary Computer Intro.”</span> January 28, 2023.
</div></div></section></div> ]]></description>
  <category>geonewb</category>
  <category>geospatial</category>
  <category>python</category>
  <category>sentinel-2</category>
  <category>planetary-computer</category>
  <guid>https://bitsofanalytics.org/posts/algaebloom-part2/</guid>
  <pubDate>Sat, 28 Jan 2023 05:00:00 GMT</pubDate>
</item>
<item>
  <title>Geospatial podcasts</title>
  <dc:creator>Mark Isken</dc:creator>
  <link>https://bitsofanalytics.org/posts/geo-podcasts/</link>
  <description><![CDATA[ 



<section id="mapscaping-podcast" class="level2">
<h2 class="anchored" data-anchor-id="mapscaping-podcast"><a href="https://mapscaping.com/podcasts/">Mapscaping Podcast</a></h2>
<p>I’ve been listening a lot to this podcast. It’s a great resource and covers a wide range of geospatial topics. One thing that makes it somewhat unique is that there are numerous episodes focusing on geospatial career related topics such as <a href="https://mapscaping.com/podcast/mid-career-change/">mid-career changes</a>, <a href="https://mapscaping.com/podcast/geospatial-side-hustles/">side hustles</a> and <a href="https://mapscaping.com/podcast/getting-your-geospatial-start-up-started/">advice for geospatial startups</a>. While listening to the <a href="">episode on finding your dream job in earth observation</a> the subject of podcasting came up and several other geospatial podcasts were recommended. So, I figured I’d list a few of them here so that I remember to start checking them out.</p>
</section>
<section id="minds-behind-maps" class="level2">
<h2 class="anchored" data-anchor-id="minds-behind-maps"><a href="https://www.mindsbehindmaps.com/">Minds Behind Maps</a></h2>
<blockquote class="blockquote">
<p>These are conversations with the people behind the maps powering the modern world.</p>
</blockquote>
<p>I’m looking forward to checking these out as they are long-form podcasts, and I really like long-form podcasts (e.g.&nbsp;<a href="https://www.acquired.fm/">Acquired</a>).</p>
</section>
<section id="the-scene-from-above-podcast" class="level2">
<h2 class="anchored" data-anchor-id="the-scene-from-above-podcast"><a href="https://scenefromabove.podbean.com/">The Scene From Above Podcast</a></h2>
<blockquote class="blockquote">
<p>A show that brings you informal discussions about the cool things in the world of Earth observation.</p>
</blockquote>
</section>
<section id="geo-mob" class="level2">
<h2 class="anchored" data-anchor-id="geo-mob"><a href="https://thegeomob.com/podcast">Geo Mob</a></h2>
<blockquote class="blockquote">
<p>Geomob provides a relaxed forum to learn about and discuss geoinnovation in any and all forms. We do this via regular in-person events and our weekly podcast.</p>
</blockquote>


</section>

 ]]></description>
  <category>geonewb</category>
  <category>geospatial</category>
  <guid>https://bitsofanalytics.org/posts/geo-podcasts/</guid>
  <pubDate>Sat, 28 Jan 2023 05:00:00 GMT</pubDate>
</item>
<item>
  <title>Algal bloom detection extended tutorial - Part 1: geospatial libraries</title>
  <dc:creator>Mark Isken</dc:creator>
  <link>https://bitsofanalytics.org/posts/algaebloom-part1/</link>
  <description><![CDATA[ 



<p>This is part of the <a href="https://bitsofanalytics.org/posts/geonewb-intro/">geonewb</a> series of posts.</p>
<p>I’m working through the <a href="https://drivendata.co/blog/tick-tick-bloom-benchmark">Getting Started Tutorial</a> that’s part of the <a href="https://www.drivendata.org/competitions/143/tick-tick-bloom/page/649/">Detecting harmful algal bloom challenge</a> run by the <a href="https://www.drivendata.org/">Driven Data</a> folks. Since I’ve worked just a bit with geographic and even less with image data, I’m using this challenge as a vehicle for quickly learning more about these topics.</p>
<p>It’s a terrific tutorial and as I was working through it I would branch off to learn more about some of the underlying concepts and technologies. This led to a mess of Jupyter notebooks and now I’m trying to bring all of that together into a somewhat extended version of the original tutorial. I’m also going to break this up into a few parts to keep the notebook sizes reasonable. In this first part we’ll focus on the following:</p>
<ul>
<li>an overview of the <strong>algae bloom detection challenge</strong>,</li>
<li>getting a high level <strong>understanding of the myriad of geospatial Python packages</strong> we’ll be using as well as of Microsoft’s Planetary Computer,</li>
<li>creating a <strong>Conda virtual environment</strong> with the necessary geospatial packages,</li>
<li>exploring the <strong>project data</strong> from DrivenData (metadata and training labels).</li>
</ul>
<p>In Part 2 (and maybe 3 depending on length), we’ll cover:</p>
<ul>
<li><strong>acquiring satellite image data</strong> (both Sentinel-2 and Landsat) from <strong>Microsoft’s Planetary Computer</strong>,</li>
<li>building a basic understanding of the <strong>structure and data content</strong> of these satellite created images,</li>
<li>basic <strong>viewing and manipulation of satellite imagery</strong>.</li>
</ul>
<p>In subsequent parts we’ll tackle the <strong>feature engineering</strong> and <strong>predictive modeling</strong> sections of the <em>original tutorial</em>.</p>
<p>When I quote directly from the original tutorial or from other online documentation, I’ll format it as a block quote. For example, from the <a href="https://www.drivendata.org/competitions/143/tick-tick-bloom/page/649/">main project page</a>:</p>
<blockquote class="blockquote">
<p>Inland water bodies like lakes and reservoirs provide critical drinking water and recreation for communities, and habitats for marine life. A significant challenge that water quality managers face is the formation of harmful algal blooms (HABs) such as cyanobacteria. HABs produce toxins that are poisonous to humans and their pets, and threaten marine ecosystems by blocking sunlight and oxygen. Manual water sampling, or “in situ” sampling, is generally used to monitor cyanobacteria in inland water bodies. In situ sampling is accurate, but time intensive and difficult to perform continuously.</p>
</blockquote>
<blockquote class="blockquote">
<p>Your goal in this challenge is to use satellite imagery to detect and classify the severity of cyanobacteria blooms in small, inland water bodies like reservoirs. Ultimately, better awareness of algal blooms helps keep both the human and marine life that relies on these water bodies safe and healthy.</p>
</blockquote>
<p>I highly recommend having the <a href="https://drivendata.co/blog/tick-tick-bloom-benchmark">original tutorial</a> open as you read or work through this notebook.</p>
<section id="python-and-other-libraries-for-geospatial-analysis" class="level2">
<h2 class="anchored" data-anchor-id="python-and-other-libraries-for-geospatial-analysis">Python (and other) libraries for geospatial analysis</h2>
<p>This tutorial (and geospatial analysis, in general) uses a whole bunch of libraries, each with many dependencies. Just wrapping your head around the main geospatial libraries and their roles is non-trivial. Getting them all installed into one or more conda virtual environments is also non-trivial. For example, here is a list of all of the <code>import</code> statements from the original tutorial (a blank line indicates the start of a different Jupyter code cell). Actually, these are only the imports for the sections of the tutorial that come before the machine learning modeling.</p>
<pre><code>import cv2
from datetime import timedelta
import matplotlib.pyplot as plt
import numpy as np
import odc.stac
import pandas as pd
from pathlib import Path

import geopandas as gpd
from shapely.geometry import Point

import planetary_computer as pc
from pystac_client import Client

import geopy.distance as distance

import rioxarray</code></pre>
<p>While some of these are familiar to anyone doing data science work in Python (matplotlib, numpy, pandas, datetime), there are a number of geospatial specific libraries. I started by exploring the documentation for and playing around with each of these just enough to figure out their major role in geospatial analysis. Many of these packages have important geospatial related dependencies and I’ll describe them as we run into them. After getting a high level look at these, we will try to get them installed within a <a href="https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html">conda virtual environment</a>.</p>
<section id="odc-stac---load-stac-items-into-xarray-dataset" class="level3">
<h3 class="anchored" data-anchor-id="odc-stac---load-stac-items-into-xarray-dataset"><a href="https://odc-stac.readthedocs.io/en/latest/">odc-stac</a> - load STAC items into <code>xarray</code> <code>Dataset</code></h3>
<p>A bunch to unpack here. <a href="https://stacspec.org/en">STAC</a> stands for SpatioTemporal Asset Catalogs. Microsoft’s Planetary Computer API builds on STAC.</p>
<blockquote class="blockquote">
<p>The STAC specification is a common language to describe geospatial information, so it can more easily be worked with, indexed, and discovered.</p>
</blockquote>
<p>What about xarray?</p>
</section>
<section id="xarray---labelled-multidimension-arrays" class="level3">
<h3 class="anchored" data-anchor-id="xarray---labelled-multidimension-arrays"><a href="https://docs.xarray.dev/en/stable/">xarray</a> - labelled multidimension arrays</h3>
<p>Xarray builds on top of NumPy N-d arrays and adds the ability to create and work with labels for the dimensions.</p>
<blockquote class="blockquote">
<p>Xarray makes working with labelled multi-dimensional arrays in Python simple, efficient, and fun!</p>
</blockquote>
<p>The two main data structures are <code>DataArray</code> (an N-d generalization of a <code>pandas.Series</code>) and <code>Dataset</code> (an N-d generalization of a <code>pandas.DataFrame</code>). The <a href="https://docs.xarray.dev/en/stable/getting-started-guide/why-xarray.html">Overview: Why xarray?</a> page has a nice level of detail on the case for xarray and its link to geospatial analysis.</p>
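<p>To make the idea of labelled dimensions a bit more concrete, here is a minimal sketch. The tiny 3-band "image", the band names and the variable name <code>reflectance</code> are all made up for illustration; none of this comes from the tutorial data.</p>

```python
import numpy as np
import xarray as xr

# Hypothetical 3-band, 2x2 pixel image; the values are made up for illustration.
data = np.arange(12).reshape(3, 2, 2)

# A DataArray is a NumPy array plus named dimensions and coordinate labels.
image = xr.DataArray(
    data,
    dims=("band", "y", "x"),
    coords={"band": ["red", "green", "blue"]},
)

# Select a band by its label rather than by integer position.
red = image.sel(band="red")

# Reduce over named dimensions: one mean value per band.
band_means = image.mean(dim=("y", "x"))

# A Dataset is a dict-like collection of DataArrays that share dimensions.
dataset = xr.Dataset({"reflectance": image})
```

<p>Selecting by label (<code>band="red"</code>) and reducing over named dimensions, instead of keeping track of axis numbers, is the main convenience xarray adds over a plain NumPy array.</p>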
</section>
<section id="open-data-cube---an-open-source-geospatial-data-management-and-analysis-platform" class="level3">
<h3 class="anchored" data-anchor-id="open-data-cube---an-open-source-geospatial-data-management-and-analysis-platform"><a href="https://www.opendatacube.org/">Open Data Cube</a> - An Open Source Geospatial Data Management and Analysis Platform</h3>
<p>While digging into odc-stac, it became clear that it was part of a much bigger project called Open Data Cube. The ODC project was initiated in Australia. From the main web site:</p>
<blockquote class="blockquote">
<p>The Open Data Cube (ODC) is an Open Source Geospatial Data Management and Analysis Software project that helps you harness the power of Satellite data. At its core, the ODC is a set of Python libraries and PostgreSQL database that helps you work with geospatial raster data. See our GitHub repository <a href="https://github.com/opendatacube">here&gt;&gt;</a>.</p>
<p>The ODC seeks to increase the value and impact of global Earth observation satellite data by providing an open and freely accessible exploitation architecture. The ODC project seeks to foster a community to develop, sustain, and grow the technology and the breadth and depth of its applications for societal benefit.</p>
</blockquote>
<p>One key feature seems to be the ability to work with images over time in order to visualize the dynamics of landscape change. These time based stacks of images form the “cube”. Like the Planetary Computer and Google Earth Engine, ODC provides not just catalogued data but also a compute environment and apps for exploring data. It, again like the Planetary Computer, is based on free and open source software and is visibly committed to this for the long term.</p>
<p>Some key tools that are part of the ODC ecosystem are <a href="https://github.com/opendatacube/odc-stac">odc-stac</a> and <a href="https://github.com/opendatacube/odc-geo">odc-geo</a>. The former provides an easy way to load STAC items into xarray data structures and the latter provides:</p>
<blockquote class="blockquote">
<p>This library combines geometry shape classes from shapely with CRS from pyproj to provide projection aware <code>Geometry</code></p>
</blockquote>
<p>It seems as if these libraries were part of the ODC core but have been extracted as utilities to facilitate wider use without requiring installation of the full ODC package.</p>
<p>What is unclear to me is the status of the ODC project. The Latest News section’s most recent item is about its 2021 conference. There is recent activity in the <a href="https://github.com/opendatacube/datacube-core">opendatacube-core</a> repo. A <a href="https://www.dea.ga.gov.au/developers/sandbox">sandbox</a> is available but I haven’t had time to try it out.</p>
</section>
<section id="geopandas---add-geospatial-functionality-to-pandas" class="level3">
<h3 class="anchored" data-anchor-id="geopandas---add-geospatial-functionality-to-pandas"><a href="https://geopandas.org/en/stable/index.html">GeoPandas</a> - add geospatial functionality to pandas</h3>
<p>The basic idea is to combine the capabilities of pandas with the shapely library to allow you to work with geospatial data in a pandas-like way.</p>
<blockquote class="blockquote">
<p>GeoPandas is an open source project to make working with geospatial data in python easier. GeoPandas extends the datatypes used by pandas to allow spatial operations on geometric types. Geometric operations are performed by <a href="https://shapely.readthedocs.io/en/stable/index.html">shapely</a>. GeoPandas further depends on <a href="https://fiona.readthedocs.io/">fiona</a> for file access and matplotlib for plotting.</p>
</blockquote>
</section>
<section id="shapely---manipulation-and-analysis-of-geometric-objects-in-the-cartesian-plane" class="level3">
<h3 class="anchored" data-anchor-id="shapely---manipulation-and-analysis-of-geometric-objects-in-the-cartesian-plane"><a href="https://shapely.readthedocs.io/en/stable/index.html">shapely</a> - manipulation and analysis of geometric objects in the Cartesian plane</h3>
<p>Shapely makes it easy to work with points, curves, and surfaces with Python. Shapely objects will end up as datatypes in GeoPandas dataframes (a <code>GeoDataFrame</code>).</p>
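<p>As a small illustration of the kinds of objects shapely provides, here is a minimal sketch. The coordinates are hypothetical and are not taken from the challenge data. Note that shapely works in the Cartesian plane, so it treats longitude and latitude simply as x and y values.</p>

```python
from shapely.geometry import Point, box

# Hypothetical sample location; shapely expects (x, y), i.e. (longitude, latitude).
sample_point = Point(-83.22, 42.67)

# Hypothetical rectangular area of interest: box(minx, miny, maxx, maxy).
area = box(-83.3, 42.6, -83.1, 42.7)

# Planar geometric predicate: is the point inside the rectangle?
inside = area.contains(sample_point)

# The bounds of a geometry come back as (minx, miny, maxx, maxy).
bounds = area.bounds
```

<p>Objects like <code>sample_point</code> are what end up in the geometry column of a <code>GeoDataFrame</code>.</p>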
</section>
<section id="fiona---pythonic-reading-and-writing-of-geospatial-data" class="level3">
<h3 class="anchored" data-anchor-id="fiona---pythonic-reading-and-writing-of-geospatial-data"><a href="https://fiona.readthedocs.io/en/latest/README.html#usage">fiona</a> - Pythonic reading and writing of geospatial data</h3>
<blockquote class="blockquote">
<p>Fiona reads and writes geographic data files and thereby helps Python programmers integrate geographic information systems with other computer systems. Fiona contains extension modules that link the Geospatial Data Abstraction Library (GDAL).</p>
</blockquote>
<p>So, <a href="https://github.com/Toblerity/Fiona">Fiona</a> reads and writes geographic data files in a Pythonic way using GDAL to do the heavy lifting.</p>
<p>So we finally get to meet the venerable <a href="https://gdal.org/">GDAL library</a>.</p>
</section>
<section id="gdal---the-geospatial-data-abstraction-library" class="level3">
<h3 class="anchored" data-anchor-id="gdal---the-geospatial-data-abstraction-library"><a href="https://www.osgeo.org/projects/gdal/">GDAL</a> - the Geospatial Data Abstraction Library</h3>
<p>GDAL is an indispensable part of computational geospatial work. What is it?</p>
<ul>
<li>a translator library for raster and vector geospatial data formats (a few hundred) written in C, C++ and Python,</li>
<li>an open source package (MIT License) released by <a href="https://www.osgeo.org/">The Open Source Geospatial Foundation (OSGeo)</a>,</li>
<li>in addition to being used as a callable library, it includes a set of command line tools,</li>
<li>is used as a core resource in countless GIS and geospatial analysis tools (e.g., free and open-source packages such as <a href="https://www.qgis.org/en/site/">QGIS</a> and <a href="https://grass.osgeo.org/">GRASS</a>; even <a href="https://www.esri.com/en-us/home">ESRI</a> appears to use GDAL to <a href="https://www.esri.com/en-us/home">deal with custom raster formats</a>).</li>
</ul>
<p>Another related library, OGR, is part of the GDAL source code and focuses on “simple features vector data”. This <a href="https://trac.osgeo.org/gdal/wiki/FAQGeneral">GDAL FAQ page</a> gives more detail on the GDAL/OGR relationship. When people say GDAL, it includes OGR. Speaking of saying, both “gee-doll” and “goo-dle” are used.</p>
<p>GDAL/OGR also relies on the <a href="https://proj.org/">PROJ</a> library for <a href="https://en.wikipedia.org/wiki/Map_projection">projections</a> and transformations.</p>
<p>Given the importance of GDAL and its use by so many geospatial software packages, it is somewhat surprising that for many years it was maintained by a single person. Check out this <a href="https://mapscaping.com/podcast/gdal-geospatial-data-abstraction-library/">Mapscaping podcast on GDAL</a> for a fascinating telling of the GDAL story.</p>
<p>As we’ll see, we usually don’t have to install GDAL as it will get installed when we install higher level packages such as GeoPandas.</p>
</section>
<section id="proj---transform-geospatial-coordinates-between-different-coordinate-reference-systems" class="level3">
<h3 class="anchored" data-anchor-id="proj---transform-geospatial-coordinates-between-different-coordinate-reference-systems"><a href="https://proj.org/">PROJ</a> - transform geospatial coordinates between different coordinate reference systems</h3>
<p>If you are going to work with geospatial data, you are going to have to learn about coordinate reference systems (CRS) and map projections. The world isn’t flat and it’s not a perfect sphere. However, most maps are flat. Projections are a way of translating our non-flat earth to a flat representation for mapping. The CRS is a specific type of “grid system” so that numeric X-Y coordinates can be associated with any point on the map.</p>
<p>A great source of Python flavored introductory material on things like CRS and projections and general GIS concepts is the open source book - <a href="https://pygis.io/docs/a_intro.html">PyGIS: Python Open Source Spatial Programming &amp; Remote Sensing</a>. The <a href="https://docs.qgis.org/3.22/en/docs/gentle_gis_introduction/coordinate_reference_systems.html">QGIS documentation</a> also has a good overview of these topics. Then you’ll want to learn about the <a href="https://en.wikipedia.org/wiki/World_Geodetic_System">World Geodetic System (WGS)</a> and things like <a href="https://en.wikipedia.org/wiki/EPSG_Geodetic_Parameter_Dataset">EPSG codes</a>. For example, EPSG:4326 - WGS 84 is the latitude/longitude coordinate system based on the Earth’s center of mass, used by <a href="https://www.gps.gov/">GPS</a> and in many mapping applications.</p>
<p>The PROJ library does the heavy lifting of translating between different CRS and projections. Much like GDAL, it is</p>
<ul>
<li>open source,</li>
<li>used both as a library and command line tool,</li>
<li>foundational software to geospatial analysis,</li>
<li>now maintained by OSGeo.</li>
</ul>
<p>While the underlying library is C/C++, you can use PROJ from Python via the <a href="https://pyproj4.github.io/pyproj/stable/api/proj.html">pyproj package</a>. Like GDAL, pyproj will get installed when we install a higher level package such as GeoPandas. However, there are <a href="https://proj.org/install.html">several different ways to install PROJ</a> and early on I seem to have found one that resulted in multiple (and conflicting) PROJ installations on my Linux machine. Eventually I fixed it - more on this later.</p>
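<p>Here is a minimal sketch of translating a point between two coordinate reference systems with pyproj, assuming pyproj is installed. The target CRS (EPSG:3857, the Web Mercator projection used by many web maps) and the sample coordinates are chosen purely for illustration.</p>

```python
from pyproj import Transformer

# Build a transformer from WGS 84 lat/lon (EPSG:4326) to Web Mercator (EPSG:3857).
# always_xy=True means inputs and outputs are in (x, y) = (longitude, latitude) order.
transformer = Transformer.from_crs("EPSG:4326", "EPSG:3857", always_xy=True)

# Hypothetical point given as longitude, latitude in degrees.
lon, lat = -83.22, 42.67

# The result is in projected coordinates (meters in the Web Mercator plane).
x, y = transformer.transform(lon, lat)
```

<p>The <code>always_xy=True</code> argument is worth setting explicitly since axis order (latitude first or longitude first) is a common source of confusion when working with EPSG:4326.</p>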
</section>
<section id="planetary_computer---data-api-compute-ecosystem-for-earth-observation" class="level3">
<h3 class="anchored" data-anchor-id="planetary_computer---data-api-compute-ecosystem-for-earth-observation"><a href="https://planetarycomputer.microsoft.com/">planetary_computer</a> - data, API, compute, ecosystem for earth observation</h3>
<p>Microsoft’s fairly new <a href="https://planetarycomputer.microsoft.com/docs/overview/about">Planetary Computer</a> is quite an amazing project. It has several major components:</p>
<ul>
<li><a href="https://planetarycomputer.microsoft.com/catalog">Data Catalog</a> - It is a ginormous repository of well cataloged data all about Earth’s various systems and includes a web based interface that allows users to find relevant data - for free.</li>
<li><a href="https://planetarycomputer.microsoft.com/docs/quickstarts/reading-stac/">API</a> - It has an API that leverages open source tools to make it easy to do data searches by time and location. Focuses on Python.</li>
<li><a href="https://planetarycomputer.microsoft.com/docs/overview/environment/">Hub</a> - A managed compute environment for doing cloud based geospatial analysis at scale. For this part you need to apply for access.</li>
<li><a href="https://planetarycomputer.microsoft.com/applications">Applications</a> - an ecosystem of people doing meaningful work with the Planetary Computer.</li>
</ul>
<p>There’s a really good Mapscaping podcast featuring two of the main developers of the Planetary Computer - see <a href="https://mapscaping.com/podcast/the-planetary-computer/">https://mapscaping.com/podcast/the-planetary-computer/</a>. The <a href="https://mapscaping.com/">Mapscaping.com</a> site is a great place to learn about geospatial analysis through its terrific podcasts. For example, when I started going through the algae bloom tutorial, it mentioned two primary sources for obtaining free satellite imagery data - Microsoft’s Planetary Computer and the Google Earth Engine. Which should I use? I did a bunch of reading, during which I also learned about this thing called the Sentinel Hub, but I really benefited from this series of podcasts by Daniel O’Donohue (the podcaster behind Mapscaping.com):</p>
<ul>
<li>The Planetary Computer - <a href="https://mapscaping.com/podcast/the-planetary-computer/">https://mapscaping.com/podcast/the-planetary-computer/</a></li>
<li>Introducing Google Earth Engine - <a href="https://mapscaping.com/podcast/introducing-google-earth-engine/">https://mapscaping.com/podcast/introducing-google-earth-engine/</a></li>
<li>Sentinel Hub - <a href="https://mapscaping.com/podcast/sentinel-hub/">https://mapscaping.com/podcast/sentinel-hub/</a></li>
</ul>
<p>We’ll mostly be using the Planetary Computer to search for and acquire satellite image data from the Sentinel-2 and Landsat catalogs. We can use pystac-client to do this freely and without any authentication. As mentioned above, authentication is required to use the Hub computing cluster.</p>
</section>
<section id="pystac---a-python-library-for-working-with-stacs" class="level3">
<h3 class="anchored" data-anchor-id="pystac---a-python-library-for-working-with-stacs"><a href="https://pystac.readthedocs.io/en/stable/">PySTAC</a> - A Python library for working with STACs</h3>
<p>The PySTAC package provides a Python based way of creating and working with the fundamental building blocks of STACs - catalogs, collections, and items. <a href="https://stacspec.org/en">STAC</a> itself is a specification. PySTAC provides tools to enable STAC compliant development. There are a few related projects (taken directly from the PySTAC docs):</p>
<ul>
<li><a href="https://github.com/stac-utils/pystac-client">pystac-client</a>: A Python client for working with STAC Catalogs and APIs.</li>
<li><a href="https://github.com/stac-utils/stactools">stactools</a>: A command line tool and library for working with STAC.</li>
<li><a href="https://github.com/sat-utils/sat-stac">sat-stac</a>: A Python 3 library for reading and working with existing Spatio-Temporal Asset Catalogs (STAC). Much of PySTAC builds on the code and concepts of sat-stac.</li>
</ul>
</section>
<section id="pystac-client---a-python-client-for-working-with-stac-catalogs-and-apis" class="level3">
<h3 class="anchored" data-anchor-id="pystac-client---a-python-client-for-working-with-stac-catalogs-and-apis"><a href="https://github.com/stac-utils/pystac-client">pystac-client</a> - A Python client for working with STAC Catalogs and APIs</h3>
<p>Microsoft’s Planetary Computer is heavily based on the STAC specification. The pystac-client package provides the tools we need to interact with the Planetary Computer API. In particular, it provides search related functionality.</p>
</section>
<section id="geopy---python-client-for-several-popular-geocoding-web-services" class="level3">
<h3 class="anchored" data-anchor-id="geopy---python-client-for-several-popular-geocoding-web-services"><a href="https://geopy.readthedocs.io/en/stable/">geopy</a> - Python client for several popular geocoding web services</h3>
<p>The geopy library makes it easier to interact with different geocoding services (e.g.&nbsp;Google Maps) by avoiding having to work with their individual APIs.</p>
<p>While geopy’s primary role is as a way to use Python with web-based geocoding services, it has some functions that are useful for more basic geospatial analysis needs. In particular, the geopy library includes a <a href="https://geopy.readthedocs.io/en/stable/#module-geopy.distance">geodesic distance function</a> that we can use to find a bounding box of a specific size. A bounding box is a rectangular area defined by two latitude values and two longitude values - it’s often abbreviated as a bbox.</p>
<blockquote class="blockquote">
<p>The geodesic distance is the shortest distance on the surface of an ellipsoidal model of the earth. The default algorithm uses the method is given by <a href="https://link.springer.com/article/10.1007/s00190-012-0578-z">Karney (2013)</a> (geodesic); this is accurate to round-off and always converges.</p>
</blockquote>
<p>The default ellipsoid is WGS-84, but this can be changed.</p>
</section>
<section id="rioxarray---read-raster-data-into-xarray-objects" class="level3">
<h3 class="anchored" data-anchor-id="rioxarray---read-raster-data-into-xarray-objects"><a href="https://corteva.github.io/rioxarray/html/readme.html">rioxarray</a> - read raster data into xarray objects</h3>
<p>The rioxarray package extends the xarray package to facilitate reading raster data into xarray objects. The actual reading of the raster file is done using another Python package known as <a href="https://rasterio.readthedocs.io/en/latest/">rasterio</a>. From the rasterio docs:</p>
<blockquote class="blockquote">
<p>Geographic information systems use GeoTIFF and other formats to organize and store gridded raster datasets such as satellite imagery and terrain models. Rasterio reads and writes these formats and provides a Python API based on Numpy N-dimensional arrays and GeoJSON.</p>
</blockquote>
<p>Before rasterio came along, we had to use Python bindings to GDAL.</p>
<blockquote class="blockquote">
<p>Before Rasterio there was one Python option for accessing the many different kind of raster data files used in the GIS field: the Python bindings distributed with the Geospatial Data Abstraction Library, GDAL. These bindings extend Python, but provide little abstraction for GDAL’s C API. This means that Python programs using them tend to read and run like C programs.</p>
</blockquote>
<p>This <a href="https://gis.stackexchange.com/questions/358036/extracting-data-from-a-raster/358058#358058">GIS Stack Exchange post</a> shows a simple example of using rioxarray to read a raster file and get at the data values in the underlying xarray <code>DataArray</code>.</p>
<p>And, <a href="https://gis.stackexchange.com/questions/333886/what-data-structure-to-use-for-raster-data-when-using-rasterio">this GIS Stack Exchange post</a> explains how rioxarray combines rasterio with xarray to make it easier to work with raster data in Python. It’s similar to how GeoPandas combines functionality from pandas, shapely and fiona.</p>
<p><a href="https://gis.stackexchange.com/">GIS Stack Exchange</a> is like StackOverflow for geospatial questions. Very helpful.</p>
</section>
<section id="cv2---computer-vision" class="level3">
<h3 class="anchored" data-anchor-id="cv2---computer-vision"><a href="https://docs.opencv.org/4.x/d6/d00/tutorial_py_root.html">cv2</a> - computer vision</h3>
<p>The <code>cv2</code> module is actually part of the <a href="https://github.com/opencv/opencv-python">opencv library which supplies Python bindings</a> for the underlying C++ based <a href="https://docs.opencv.org/4.x/d1/dfb/intro.html">OpenCV library</a>. For <a href="https://stackoverflow.com/questions/50092954/import-opencv-vs-import-cv2/50093247">historical reasons</a>, they’ve stayed with <code>import cv2</code>. This library contains hundreds of computer vision related functions and tools.</p>
</section>
</section>
<section id="creating-a-conda-virtual-environment-for-this-geospatial-tutorial" class="level2">
<h2 class="anchored" data-anchor-id="creating-a-conda-virtual-environment-for-this-geospatial-tutorial">Creating a Conda virtual environment for this geospatial tutorial</h2>
<p>Welp, that’s a lot of packages. My first inclination was to create a conda environment YAML file with everything in there and just let it rip. That turned out like you might think - it hung forever at the “solving environment” stage and needed to be interrupted. Then I tried creating a minimal conda environment and installing the geospatial packages as needed as I worked through the tutorial. I always tried to use conda to do the install, but had to resort to pip for a few things. That also turned out like you might think - things ran for a while but eventually I must have created a dependency conflict and the environment was gorked. Time to be more strategic. I noticed that GeoPandas has a ton of dependencies and figured I’d start by installing just it and seeing which of the other packages I needed ended up being installed as dependencies.</p>
<p>Here’s the conda <code>gp_39.yml</code> env file for a minimal GeoPandas install.</p>
<div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb2-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Minimal install of GeoPandas</span></span>
<span id="cb2-2"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">name:</span> gp_39</span>
<span id="cb2-3"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">channels:</span></span>
<span id="cb2-4">  <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">-</span> defaults</span>
<span id="cb2-5">  <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">-</span> conda-forge</span>
<span id="cb2-6"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">dependencies:</span></span>
<span id="cb2-7">  <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">-</span> python=3.9</span>
<span id="cb2-8">  <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">-</span> geopandas</span>
<span id="cb2-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Make environment discoverable by JupyterLab</span></span>
<span id="cb2-10">  <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">-</span> ipykernel</span>
<span id="cb2-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Environment level pip for eventual pip installs</span></span>
<span id="cb2-12">  <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">-</span> pip</span></code></pre></div>
<p>I created the environment:</p>
<div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb3-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">conda</span> env create <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-f</span> gp_39.yml</span></code></pre></div>
<p>and then listed the installed packages after it was done:</p>
<div class="sourceCode" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb4-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">conda</span> activate gp_39</span>
<span id="cb4-2"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">conda</span> list</span></code></pre></div>
<p>Here are the relevant geospatial (and data science) packages that got installed when the <code>gp_39</code> conda environment was created.</p>
<div class="sourceCode" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb5-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">fiona</span>                     1.8.22           py39h417a72b_0</span>
<span id="cb5-2"></span>
<span id="cb5-3"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">gdal</span>                      3.0.2            py39h40f10ac_6  </span>
<span id="cb5-4"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">geopandas</span>                 0.9.0                      py_1  </span>
<span id="cb5-5"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">geopandas-base</span>            0.9.0                      py_1  </span>
<span id="cb5-6"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">geos</span>                      3.8.0                he6710b0_0  </span>
<span id="cb5-7"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">geotiff</span>                   1.7.0                hd69d5b1_0 </span>
<span id="cb5-8"></span>
<span id="cb5-9"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">numpy</span>                     1.23.5           py39h14f4228_0  </span>
<span id="cb5-10"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">numpy-base</span>                1.23.5           py39h31eccc5_0  </span>
<span id="cb5-11"></span>
<span id="cb5-12"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">pandas</span>                    1.5.2            py39h417a72b_0  </span>
<span id="cb5-13"></span>
<span id="cb5-14"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">pillow</span>                    9.3.0            py39hace64e9_1  </span>
<span id="cb5-15"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">pip</span>                       22.3.1           py39h06a4308_0  </span>
<span id="cb5-16"></span>
<span id="cb5-17"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">proj</span>                      6.2.1                h05a3930_0 </span>
<span id="cb5-18"></span>
<span id="cb5-19"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">pyproj</span>                    2.6.1.post1      py39hb3025e9_1 </span>
<span id="cb5-20"></span>
<span id="cb5-21"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">scikit-learn</span>              1.2.0            py39h6a678d5_0  </span>
<span id="cb5-22"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">scipy</span>                     1.9.3            py39h14f4228_0  </span>
<span id="cb5-23"></span>
<span id="cb5-24"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">shapely</span>                   1.8.4            py39h81ba7c5_0</span></code></pre></div>
<p>Some of the things that did NOT get installed, but that we’ll need (maybe), are:</p>
<ul>
<li>xarray</li>
<li>rioxarray</li>
<li>planetary_computer</li>
<li>pystac-client</li>
<li>geopy</li>
<li>opencv</li>
<li>odc-stac</li>
</ul>
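<p>If you’d rather check this from inside Python than eyeball the <code>conda list</code> output, here’s a minimal sketch using the standard library’s <code>importlib.metadata</code>. The helper name <code>installed_versions</code> is my own, and the distribution names in <code>wanted</code> are assumptions - the name a package is registered under doesn’t always match its conda package name (opencv is a good example), so adjust as needed.</p>

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names are assumptions; adjust to match what pip/conda report.
wanted = ["xarray", "rioxarray", "planetary-computer", "pystac-client",
          "geopy", "odc-stac"]
for name, version in installed_versions(wanted).items():
    print(f"{name:20s} {version or 'NOT INSTALLED'}")
```

<p>Run it with the <code>gp_39</code> environment activated so that it reports on the right set of packages.</p>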
<p>Mixing conda installs with pip installs is a tricky business and it’s really important to install everything you can via conda before mixing in pip installs. Of the packages above, the following appear to support a conda install, or even recommend one:</p>
<ul>
<li>rioxarray</li>
<li>xarray (this will get installed by rioxarray)</li>
<li>geopy</li>
<li>planetary-computer</li>
<li>pystac-client</li>
</ul>
<p>Next I installed planetary-computer from the conda-forge channel.</p>
<div class="sourceCode" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb6-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">conda</span> install <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-c</span> conda-forge planetary-computer</span></code></pre></div>
<p>Installing planetary-computer also installed pystac-client.</p>
<p>Next was geopy and then rioxarray. I looked at the dependencies for <a href="https://github.com/conda-forge/rioxarray-feedstock">rioxarray on conda-forge</a> and saw the following:</p>
<pre><code>- python &gt;=3.8
- rasterio &gt;=1.1.1
- scipy
- xarray &gt;=0.17
- pyproj &gt;=2.2
- packaging</code></pre>
<p>The pyproj dependency is already met. The main conda channel has version 1.2.10 of rasterio and version 0.20.1 of xarray - so we should be good there.</p>
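<p>That kind of “is this version new enough?” check is easy to script. Here’s a rough sketch that compares plain dotted version strings against the minimums listed above. The version numbers are the ones quoted above (with 0.20.1 taken to be the xarray version, and the <code>.post1</code> suffix dropped from pyproj). The helper names <code>parse_version</code> and <code>meets_minimum</code> are hypothetical, and the parser is deliberately naive - it won’t handle suffixes such as <code>.post1</code> or <code>rc1</code>. For anything serious, the <code>packaging</code> library’s <code>Version</code> class is the better tool.</p>

```python
def parse_version(text):
    """Turn a plain dotted version such as '1.2.10' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(available, minimum):
    """True when the available version is at least the required minimum."""
    return parse_version(available) >= parse_version(minimum)


# (available, minimum) pairs: versions quoted in the text, minimums from
# the rioxarray feedstock listing. pyproj's ".post1" suffix is dropped.
checks = {
    "rasterio": ("1.2.10", "1.1.1"),
    "xarray": ("0.20.1", "0.17"),
    "pyproj": ("2.6.1", "2.2"),
}
for name, (available, minimum) in checks.items():
    print(name, meets_minimum(available, minimum))
```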
<div class="sourceCode" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb8-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">conda</span> install <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-c</span> conda-forge geopy</span>
<span id="cb8-2"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">conda</span> install <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-c</span> conda-forge rioxarray</span></code></pre></div>
<p>Both installed with no issues. I think that my early PROJ conflict was due to improperly installing rioxarray.</p>
<p>At this point, the only two packages that are not part of our virtual environment are opencv and odc-stac. In reading through the tutorial, these were only used for processing Landsat data. In particular, opencv was used to normalize Landsat imagery values to a 0-255 scale and odc-stac seemed like it was doing a job that pystac-client could do. Given the issues I had with installing these two packages, I decided to just not install them and either find another way or implement my own normalization function - we’ll see this in a later part of this series of posts.</p>
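<p>To give a sense of how little code a 0-255 normalization takes, here’s a minimal NumPy sketch of plain min-max scaling. This is just an illustration under my own assumptions (no missing values in the array, the hypothetical function name <code>scale_to_uint8</code>) and is not necessarily the function used later in this series, nor a drop-in replacement for what opencv does.</p>

```python
import numpy as np


def scale_to_uint8(values):
    """Min-max scale an array to the 0-255 range and return it as uint8."""
    arr = np.asarray(values, dtype=float)
    lo, hi = arr.min(), arr.max()
    if hi == lo:
        # A constant image has no range to stretch; return all zeros.
        return np.zeros(arr.shape, dtype=np.uint8)
    scaled = (arr - lo) / (hi - lo) * 255.0
    return np.round(scaled).astype(np.uint8)


# Made-up pixel values, just to exercise the function.
print(scale_to_uint8([[0, 5000], [10000, 2500]]))
```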
<p><strong>UPDATE 2023-01-29</strong> I realized my odc-stac install problems were due to my own ill-advised mixing of pip and conda installs. I did the following and now odc-stac is installed and working just fine.</p>
<div class="sourceCode" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb9-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">conda</span> install <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-c</span> conda-forge odc-stac</span></code></pre></div>
<p>With our conda environment in place, we are ready to explore the project data.</p>
</section>
<section id="project-data-from-drivendata" class="level2">
<h2 class="anchored" data-anchor-id="project-data-from-drivendata">Project data from DrivenData</h2>
<p>The three main data files are available from <a href="https://www.drivendata.org/competitions/143/tick-tick-bloom/data/">https://www.drivendata.org/competitions/143/tick-tick-bloom/data/</a>.</p>
<ul>
<li>Metadata - Metadata for both train and test set</li>
<li>Train labels - Cyanobacteria labels for the train set</li>
<li>Submission format - Format for submissions, with a placeholder for severity</li>
</ul>
<blockquote class="blockquote">
<p>Labels in this competition are based on “in situ” samples that were collected manually and then analyzed for cyanobacteria density. Each measurement is a unique combination of date and location (latitude and longitude).</p>
<p>There are three data files available in the download. Acquisition of the actual image data that is needed to create modeling features will be the primary focus of this part of the tutorial.</p>
</blockquote>
<pre><code>.
├── metadata.csv
├── submission_format.csv
└── train_labels.csv</code></pre>
<p>I just followed the data folder structure used in the tutorial.</p>
<section id="library-imports-and-data-directory-setup" class="level3">
<h3 class="anchored" data-anchor-id="library-imports-and-data-directory-setup">Library imports and data directory setup</h3>
<p>We’ll load a few libraries up front. For the rest, we will load as needed.</p>
<div class="cell" data-execution_count="1">
<div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Need to do some date math and need to work with file paths</span></span>
<span id="cb11-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> datetime <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> timedelta</span>
<span id="cb11-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> pathlib <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Path</span></code></pre></div>
</div>
<div class="cell" data-execution_count="2">
<div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb12-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Big 3 - come on, of course you'll need these</span></span>
<span id="cb12-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> matplotlib.pyplot <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> plt</span>
<span id="cb12-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> np</span>
<span id="cb12-4"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pandas <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> pd</span></code></pre></div>
</div>
<div class="cell" data-execution_count="3">
<div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb13-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>matplotlib inline</span></code></pre></div>
</div>
<div class="cell" data-tags="[]" data-execution_count="4">
<div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb14-1">DATA_DIR <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Path.cwd().resolve() <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"data/final/public"</span></span>
<span id="cb14-2"><span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">assert</span> DATA_DIR.exists()</span></code></pre></div>
</div>
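<p>The <code>assert</code> only confirms that the folder exists. If you also want to know whether the three downloaded files actually landed in it, something like the following sketch would do. The helper name <code>missing_files</code> is hypothetical; the file names come from the listing above.</p>

```python
from pathlib import Path

# File names taken from the directory listing shown earlier.
EXPECTED_FILES = ("metadata.csv", "submission_format.csv", "train_labels.csv")


def missing_files(data_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in data_dir."""
    data_dir = Path(data_dir)
    return [name for name in expected if not (data_dir / name).is_file()]


data_dir = Path.cwd() / "data" / "final" / "public"
print(missing_files(data_dir) or "all data files found")
```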
<p>Let’s explore each of the data files.</p>
</section>
<section id="explore-the-metadata-metadata.csv" class="level3">
<h3 class="anchored" data-anchor-id="explore-the-metadata-metadata.csv">Explore the metadata (<code>metadata.csv</code>)</h3>
<p>The metadata tells us where each unique sample was taken and whether it is part of the train or test data.</p>
<div class="cell" data-execution_count="5">
<div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb15-1">metadata <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.read_csv(DATA_DIR <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"metadata.csv"</span>, parse_dates<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'date'</span>])</span>
<span id="cb15-2">metadata.head()</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="5">

<div>

<table class="dataframe table table-sm table-striped">
  <thead>
    <tr>
      <th></th>
      <th>uid</th>
      <th>latitude</th>
      <th>longitude</th>
      <th>date</th>
      <th>split</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>aabm</td>
      <td>39.080319</td>
      <td>-86.430867</td>
      <td>2018-05-14</td>
      <td>train</td>
    </tr>
    <tr>
      <th>1</th>
      <td>aabn</td>
      <td>36.559700</td>
      <td>-121.510000</td>
      <td>2016-08-31</td>
      <td>test</td>
    </tr>
    <tr>
      <th>2</th>
      <td>aacd</td>
      <td>35.875083</td>
      <td>-78.878434</td>
      <td>2020-11-19</td>
      <td>train</td>
    </tr>
    <tr>
      <th>3</th>
      <td>aaee</td>
      <td>35.487000</td>
      <td>-79.062133</td>
      <td>2016-08-24</td>
      <td>train</td>
    </tr>
    <tr>
      <th>4</th>
      <td>aaff</td>
      <td>38.049471</td>
      <td>-99.827001</td>
      <td>2019-07-23</td>
      <td>train</td>
    </tr>
  </tbody>
</table>
</div>
</div>
</div>
<div class="cell" data-execution_count="6">
<div class="sourceCode cell-code" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb16-1">metadata.info()</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>&lt;class 'pandas.core.frame.DataFrame'&gt;
RangeIndex: 23570 entries, 0 to 23569
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   uid        23570 non-null  object        
 1   latitude   23570 non-null  float64       
 2   longitude  23570 non-null  float64       
 3   date       23570 non-null  datetime64[ns]
 4   split      23570 non-null  object        
dtypes: datetime64[ns](1), float64(2), object(2)
memory usage: 920.8+ KB</code></pre>
</div>
</div>
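<p>The field definitions are below. Note that <code>region</code> is listed but does not show up as a column in the <code>info()</code> output above.</p>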
<ul>
<li><code>uid (str)</code>: unique ID for each row. Each row is a unique combination of date and location (latitude and longitude).</li>
<li><code>date (pd.datetime)</code>: date when the sample was collected, in the format YYYY-MM-DD</li>
<li><code>latitude (float)</code>: latitude of the location where the sample was collected</li>
<li><code>longitude (float)</code>: longitude of the location where the sample was collected</li>
<li><code>region (str)</code>: region of the US. This will be used in scoring to calculate region-specific RMSEs. Final score will be the average across the four US regions.</li>
<li><code>split (str)</code>: indicates whether the row is part of the train set or the test set. Metadata is provided for all points in the train and test sets.</li>
</ul>
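<p>The scoring description for <code>region</code> is worth making concrete: compute an RMSE within each region and then average those regional values. Here’s a small standard-library sketch of that idea. It reflects my reading of the description above, not the competition’s actual scoring code, and the function name, region names and severity values are all made up for illustration.</p>

```python
import math
from collections import defaultdict


def region_averaged_rmse(regions, actual, predicted):
    """Compute RMSE within each region, then average the per-region values."""
    squared_errors = defaultdict(list)
    for region, y_true, y_pred in zip(regions, actual, predicted):
        squared_errors[region].append((y_true - y_pred) ** 2)
    per_region = {
        region: math.sqrt(sum(errs) / len(errs))
        for region, errs in squared_errors.items()
    }
    return sum(per_region.values()) / len(per_region), per_region


# Made-up regions and severity values purely for illustration.
regions = ["west", "west", "south", "south"]
actual = [1, 3, 2, 4]
predicted = [1, 1, 2, 2]
score, per_region = region_averaged_rmse(regions, actual, predicted)
print(score, per_region)
```

<p>Averaging the regional RMSEs weights every region equally, so a region with few samples counts just as much as one with many.</p>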
<p>Let’s look at the size of the training and test data.</p>
<div class="cell" data-execution_count="7">
<div class="sourceCode cell-code" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb18-1">metadata.groupby([<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'split'</span>]).count()</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="7">

<div>

<table class="dataframe table table-sm table-striped">
  <thead>
    <tr>
      <th></th>
      <th>uid</th>
      <th>latitude</th>
      <th>longitude</th>
      <th>date</th>
    </tr>
    <tr>
      <th>split</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>test</th>
      <td>6510</td>
      <td>6510</td>
      <td>6510</td>
      <td>6510</td>
    </tr>
    <tr>
      <th>train</th>
      <td>17060</td>
      <td>17060</td>
      <td>17060</td>
      <td>17060</td>
    </tr>
  </tbody>
</table>
</div>
</div>
</div>
<div class="cell" data-execution_count="8">
<div class="sourceCode cell-code" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb19-1">metadata.split.value_counts(dropna<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="8">
<pre><code>train    17060
test      6510
Name: split, dtype: int64</code></pre>
</div>
</div>
<blockquote class="blockquote">
<p>The geographic points in the train and test sets are distinct. This means that none of the test set points are also in the train set, so your model’s performance will be measured on unseen locations.</p>
</blockquote>
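<p>That claim is easy to check for yourself. The sketch below collects the (latitude, longitude) pairs in each split and looks for any overlap. It runs on a tiny stand-in <code>DataFrame</code> built from the first few metadata rows shown earlier; the function name <code>shared_locations</code> is hypothetical, and it relies on exact floating point equality of the coordinates, which is an assumption about how the locations were recorded.</p>

```python
import pandas as pd


def shared_locations(df):
    """Return the (latitude, longitude) pairs that occur in both splits."""
    train = df.loc[df["split"] == "train", ["latitude", "longitude"]]
    test = df.loc[df["split"] == "test", ["latitude", "longitude"]]
    train_points = set(train.itertuples(index=False, name=None))
    test_points = set(test.itertuples(index=False, name=None))
    return train_points.intersection(test_points)


# Tiny stand-in for the metadata DataFrame (first rows shown earlier).
sample = pd.DataFrame(
    {
        "latitude": [39.080319, 36.559700, 35.875083],
        "longitude": [-86.430867, -121.510000, -78.878434],
        "split": ["train", "test", "train"],
    }
)
print(len(shared_locations(sample)))
```

<p>Passing the full <code>metadata</code> frame instead of <code>sample</code> should return an empty set if the splits really are geographically distinct.</p>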
<blockquote class="blockquote">
<p>The main feature data for this competition is satellite imagery from Sentinel-2 and Landsat. Participants will access all feature data through external, publicly available APIs. Relevant imagery can be identified using the location and date of each sample from the metadata.</p>
</blockquote>
<div class="cell" data-execution_count="9">
<div class="sourceCode cell-code" id="cb21" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb21-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Confirm all uid are unique</span></span>
<span id="cb21-2">metadata[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'uid'</span>].nunique()</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="9">
<pre><code>23570</code></pre>
</div>
</div>
<div class="cell" data-execution_count="10">
<div class="sourceCode cell-code" id="cb23" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb23-1">metadata.groupby([<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'split'</span>])[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'uid'</span>].nunique()</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="10">
<pre><code>split
test      6510
train    17060
Name: uid, dtype: int64</code></pre>
</div>
</div>
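<p>Comparing <code>nunique()</code> to the row count by eye works, but pandas can also state the answer directly. A minimal sketch on a toy frame (the <code>sample</code> data is made up; swap in <code>metadata</code> for the real check):</p>

```python
import pandas as pd

# Toy frame with the same uid column as the metadata.
sample = pd.DataFrame({"uid": ["aabm", "aabn", "aacd"]})

# Series.is_unique is True only when no value repeats.
assert sample["uid"].is_unique

# duplicated(keep=False) flags every row involved in a repeat, if any.
repeats = sample[sample["uid"].duplicated(keep=False)]
print(len(repeats))
```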
<section id="distribution-of-samples-by-location" class="level4">
<h4 class="anchored" data-anchor-id="distribution-of-samples-by-location">Distribution of samples by location</h4>
<p>Ok, time to start learning more about location data. Here is where GeoPandas and shapely will come in handy.</p>
<div class="cell" data-execution_count="11">
<div class="sourceCode cell-code" id="cb25" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb25-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> geopandas <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> gpd</span>
<span id="cb25-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> shapely.geometry <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Point</span></code></pre></div>
</div>
<p>First we load a base map upon which to plot the points.</p>
<div class="cell" data-execution_count="12">
<div class="sourceCode cell-code" id="cb26" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb26-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># load the default geopandas base map file to plot points on</span></span>
<span id="cb26-2">world <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> gpd.read_file(gpd.datasets.get_path(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"naturalearth_lowres"</span>))</span></code></pre></div>
</div>
<div class="cell" data-execution_count="13">
<div class="sourceCode cell-code" id="cb27" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb27-1">world.plot()<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span></code></pre></div>
<div class="cell-output cell-output-display">
<p><img src="https://bitsofanalytics.org/posts/algaebloom-part1/index_files/figure-html/cell-14-output-1.png" class="img-fluid"></p>
</div>
</div>
<p>Now, let’s just plot the USA.</p>
<div class="cell" data-execution_count="14">
<div class="sourceCode cell-code" id="cb28" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb28-1">fig, ax <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> plt.subplots(figsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">9</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>))</span>
<span id="cb28-2"></span>
<span id="cb28-3">base <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> world[world.name <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"United States of America"</span>].plot(</span>
<span id="cb28-4">    edgecolor<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gray"</span>, color<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ghostwhite"</span>, figsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">9</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>), alpha<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.3</span>, ax<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>ax</span>
<span id="cb28-5">)</span></code></pre></div>
<div class="cell-output cell-output-display">
<p><img src="https://bitsofanalytics.org/posts/algaebloom-part1/index_files/figure-html/cell-15-output-1.png" class="img-fluid"></p>
</div>
</div>
<p>Before going any further, let’s explore the underlying <code>GeoDataFrame</code>.</p>
<div class="cell" data-execution_count="15">
<div class="sourceCode cell-code" id="cb29" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb29-1">world.info()</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>&lt;class 'geopandas.geodataframe.GeoDataFrame'&gt;
RangeIndex: 177 entries, 0 to 176
Data columns (total 6 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   pop_est     177 non-null    int64   
 1   continent   177 non-null    object  
 2   name        177 non-null    object  
 3   iso_a3      177 non-null    object  
 4   gdp_md_est  177 non-null    float64 
 5   geometry    177 non-null    geometry
dtypes: float64(1), geometry(1), int64(1), object(3)
memory usage: 8.4+ KB</code></pre>
</div>
</div>
<div class="cell" data-execution_count="16">
<div class="sourceCode cell-code" id="cb31" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb31-1">world.head(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">25</span>)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="16">

<div>

<table class="dataframe table table-sm table-striped">
  <thead>
    <tr>
      <th></th>
      <th>pop_est</th>
      <th>continent</th>
      <th>name</th>
      <th>iso_a3</th>
      <th>gdp_md_est</th>
      <th>geometry</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>920938</td>
      <td>Oceania</td>
      <td>Fiji</td>
      <td>FJI</td>
      <td>8374.0</td>
      <td>MULTIPOLYGON (((180.00000 -16.06713, 180.00000...</td>
    </tr>
    <tr>
      <th>1</th>
      <td>53950935</td>
      <td>Africa</td>
      <td>Tanzania</td>
      <td>TZA</td>
      <td>150600.0</td>
      <td>POLYGON ((33.90371 -0.95000, 34.07262 -1.05982...</td>
    </tr>
    <tr>
      <th>2</th>
      <td>603253</td>
      <td>Africa</td>
      <td>W. Sahara</td>
      <td>ESH</td>
      <td>906.5</td>
      <td>POLYGON ((-8.66559 27.65643, -8.66512 27.58948...</td>
    </tr>
    <tr>
      <th>3</th>
      <td>35623680</td>
      <td>North America</td>
      <td>Canada</td>
      <td>CAN</td>
      <td>1674000.0</td>
      <td>MULTIPOLYGON (((-122.84000 49.00000, -122.9742...</td>
    </tr>
    <tr>
      <th>4</th>
      <td>326625791</td>
      <td>North America</td>
      <td>United States of America</td>
      <td>USA</td>
      <td>18560000.0</td>
      <td>MULTIPOLYGON (((-122.84000 49.00000, -120.0000...</td>
    </tr>
    <tr>
      <th>5</th>
      <td>18556698</td>
      <td>Asia</td>
      <td>Kazakhstan</td>
      <td>KAZ</td>
      <td>460700.0</td>
      <td>POLYGON ((87.35997 49.21498, 86.59878 48.54918...</td>
    </tr>
    <tr>
      <th>6</th>
      <td>29748859</td>
      <td>Asia</td>
      <td>Uzbekistan</td>
      <td>UZB</td>
      <td>202300.0</td>
      <td>POLYGON ((55.96819 41.30864, 55.92892 44.99586...</td>
    </tr>
    <tr>
      <th>7</th>
      <td>6909701</td>
      <td>Oceania</td>
      <td>Papua New Guinea</td>
      <td>PNG</td>
      <td>28020.0</td>
      <td>MULTIPOLYGON (((141.00021 -2.60015, 142.73525 ...</td>
    </tr>
    <tr>
      <th>8</th>
      <td>260580739</td>
      <td>Asia</td>
      <td>Indonesia</td>
      <td>IDN</td>
      <td>3028000.0</td>
      <td>MULTIPOLYGON (((141.00021 -2.60015, 141.01706 ...</td>
    </tr>
    <tr>
      <th>9</th>
      <td>44293293</td>
      <td>South America</td>
      <td>Argentina</td>
      <td>ARG</td>
      <td>879400.0</td>
      <td>MULTIPOLYGON (((-68.63401 -52.63637, -68.25000...</td>
    </tr>
    <tr>
      <th>10</th>
      <td>17789267</td>
      <td>South America</td>
      <td>Chile</td>
      <td>CHL</td>
      <td>436100.0</td>
      <td>MULTIPOLYGON (((-68.63401 -52.63637, -68.63335...</td>
    </tr>
    <tr>
      <th>11</th>
      <td>83301151</td>
      <td>Africa</td>
      <td>Dem. Rep. Congo</td>
      <td>COD</td>
      <td>66010.0</td>
      <td>POLYGON ((29.34000 -4.49998, 29.51999 -5.41998...</td>
    </tr>
    <tr>
      <th>12</th>
      <td>7531386</td>
      <td>Africa</td>
      <td>Somalia</td>
      <td>SOM</td>
      <td>4719.0</td>
      <td>POLYGON ((41.58513 -1.68325, 40.99300 -0.85829...</td>
    </tr>
    <tr>
      <th>13</th>
      <td>47615739</td>
      <td>Africa</td>
      <td>Kenya</td>
      <td>KEN</td>
      <td>152700.0</td>
      <td>POLYGON ((39.20222 -4.67677, 37.76690 -3.67712...</td>
    </tr>
    <tr>
      <th>14</th>
      <td>37345935</td>
      <td>Africa</td>
      <td>Sudan</td>
      <td>SDN</td>
      <td>176300.0</td>
      <td>POLYGON ((24.56737 8.22919, 23.80581 8.66632, ...</td>
    </tr>
    <tr>
      <th>15</th>
      <td>12075985</td>
      <td>Africa</td>
      <td>Chad</td>
      <td>TCD</td>
      <td>30590.0</td>
      <td>POLYGON ((23.83766 19.58047, 23.88689 15.61084...</td>
    </tr>
    <tr>
      <th>16</th>
      <td>10646714</td>
      <td>North America</td>
      <td>Haiti</td>
      <td>HTI</td>
      <td>19340.0</td>
      <td>POLYGON ((-71.71236 19.71446, -71.62487 19.169...</td>
    </tr>
    <tr>
      <th>17</th>
      <td>10734247</td>
      <td>North America</td>
      <td>Dominican Rep.</td>
      <td>DOM</td>
      <td>161900.0</td>
      <td>POLYGON ((-71.70830 18.04500, -71.68774 18.316...</td>
    </tr>
    <tr>
      <th>18</th>
      <td>142257519</td>
      <td>Europe</td>
      <td>Russia</td>
      <td>RUS</td>
      <td>3745000.0</td>
      <td>MULTIPOLYGON (((178.72530 71.09880, 180.00000 ...</td>
    </tr>
    <tr>
      <th>19</th>
      <td>329988</td>
      <td>North America</td>
      <td>Bahamas</td>
      <td>BHS</td>
      <td>9066.0</td>
      <td>MULTIPOLYGON (((-78.98000 26.79000, -78.51000 ...</td>
    </tr>
    <tr>
      <th>20</th>
      <td>2931</td>
      <td>South America</td>
      <td>Falkland Is.</td>
      <td>FLK</td>
      <td>281.8</td>
      <td>POLYGON ((-61.20000 -51.85000, -60.00000 -51.2...</td>
    </tr>
    <tr>
      <th>21</th>
      <td>5320045</td>
      <td>Europe</td>
      <td>Norway</td>
      <td>-99</td>
      <td>364700.0</td>
      <td>MULTIPOLYGON (((15.14282 79.67431, 15.52255 80...</td>
    </tr>
    <tr>
      <th>22</th>
      <td>57713</td>
      <td>North America</td>
      <td>Greenland</td>
      <td>GRL</td>
      <td>2173.0</td>
      <td>POLYGON ((-46.76379 82.62796, -43.40644 83.225...</td>
    </tr>
    <tr>
      <th>23</th>
      <td>140</td>
      <td>Seven seas (open ocean)</td>
      <td>Fr. S. Antarctic Lands</td>
      <td>ATF</td>
      <td>16.0</td>
      <td>POLYGON ((68.93500 -48.62500, 69.58000 -48.940...</td>
    </tr>
    <tr>
      <th>24</th>
      <td>1291358</td>
      <td>Asia</td>
      <td>Timor-Leste</td>
      <td>TLS</td>
      <td>4975.0</td>
      <td>POLYGON ((124.96868 -8.89279, 125.08625 -8.656...</td>
    </tr>
  </tbody>
</table>
</div>
</div>
</div>
<p>The <code>world</code> object is a GeoPandas <code>GeoDataFrame</code>, a subclass of the pandas <code>DataFrame</code>, in which one of the columns is a <code>GeoSeries</code> with a <code>geometry</code> data type. You can have multiple such columns, but only one geometry is <em>active</em> at a time. Each row is a country.</p>
<div class="cell" data-execution_count="17">
<div class="sourceCode cell-code" id="cb32" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb32-1">world[world[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'iso_a3'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'USA'</span>]</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="17">

<div>

<table class="dataframe table table-sm table-striped">
  <thead>
    <tr>
      <th></th>
      <th>pop_est</th>
      <th>continent</th>
      <th>name</th>
      <th>iso_a3</th>
      <th>gdp_md_est</th>
      <th>geometry</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>4</th>
      <td>326625791</td>
      <td>North America</td>
      <td>United States of America</td>
      <td>USA</td>
      <td>18560000.0</td>
      <td>MULTIPOLYGON (((-122.84000 49.00000, -120.0000...</td>
    </tr>
  </tbody>
</table>
</div>
</div>
</div>
<p>The <code>MULTIPOLYGON</code> object is from the shapely library.</p>
<div class="cell" data-execution_count="18">
<div class="sourceCode cell-code" id="cb33" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb33-1">world[world[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'iso_a3'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'USA'</span>][<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'geometry'</span>]</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="18">
<pre><code>4    MULTIPOLYGON (((-122.84000 49.00000, -120.0000...
Name: geometry, dtype: geometry</code></pre>
</div>
</div>
<p>Let’s recreate the plot but overlay the sample points.</p>
<div class="cell" data-execution_count="19">
<div class="sourceCode cell-code" id="cb35" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb35-1">fig, ax <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> plt.subplots(figsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">9</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>))</span>
<span id="cb35-2"></span>
<span id="cb35-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># map the training data</span></span>
<span id="cb35-4">base <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> world[world.name <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"United States of America"</span>].plot(</span>
<span id="cb35-5">    edgecolor<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gray"</span>, color<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ghostwhite"</span>, figsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">9</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>), alpha<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.3</span>, ax<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>ax</span>
<span id="cb35-6">)</span>
<span id="cb35-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Subset the `metadata` dataframe</span></span>
<span id="cb35-8">train_meta <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> metadata[metadata[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"split"</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"train"</span>]</span>
<span id="cb35-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create a list of `Point` (a shapely data type) objects based on the lat-long columns</span></span>
<span id="cb35-10">geometry <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [Point(xy) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> xy <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">zip</span>(train_meta[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"longitude"</span>], train_meta[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"latitude"</span>])]</span>
<span id="cb35-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create a new GeoDataFrame from the training data and the geometry (list of shapely Point objects) we just created</span></span>
<span id="cb35-12">gdf <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> gpd.GeoDataFrame(train_meta, geometry<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>geometry)</span>
<span id="cb35-13"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Plot the training data in blue</span></span>
<span id="cb35-14">gdf.plot(ax<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>base, marker<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"."</span>, markersize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>, color<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"blue"</span>, label<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Train"</span>, alpha<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.6</span>)</span>
<span id="cb35-15"></span>
<span id="cb35-16"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># map the test data - same steps as above, but plot in orange</span></span>
<span id="cb35-17">test_meta <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> metadata[metadata[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"split"</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"test"</span>]</span>
<span id="cb35-18">geometry <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [Point(xy) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> xy <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">zip</span>(test_meta[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"longitude"</span>], test_meta[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"latitude"</span>])]</span>
<span id="cb35-19">gdf <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> gpd.GeoDataFrame(test_meta, geometry<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>geometry)</span>
<span id="cb35-20">gdf.plot(ax<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>base, marker<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"."</span>, markersize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>, color<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"orange"</span>, label<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Test"</span>, alpha<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.6</span>)</span>
<span id="cb35-21"></span>
<span id="cb35-22">plt.xlabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Longitude"</span>)</span>
<span id="cb35-23">plt.ylabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Latitude"</span>)</span>
<span id="cb35-24"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Set x and y axis limits to just show the continental USA</span></span>
<span id="cb35-25">plt.xlim([<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">125</span>, <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">65</span>])</span>
<span id="cb35-26">plt.ylim([<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">25</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span>])</span>
<span id="cb35-27">plt.legend(loc<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>, markerscale<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="19">
<pre><code>&lt;matplotlib.legend.Legend at 0x7fbd95fab940&gt;</code></pre>
</div>
<div class="cell-output cell-output-display">
<p><img src="https://bitsofanalytics.org/posts/algaebloom-part1/index_files/figure-html/cell-20-output-2.png" class="img-fluid"></p>
</div>
</div>
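<p>The axis limits used above amount to a longitude-latitude bounding box around the continental USA. As a rough sketch (not part of the original analysis), the same box can be written as a plain Python check. The helper name <code>in_view</code> and the sample coordinates are made up for illustration. Note the (longitude, latitude) ordering, which matches the x-y order used when building the <code>Point</code> objects.</p>

```python
# Bounding box taken from the plt.xlim / plt.ylim calls above
LON_RANGE = (-125, -65)
LAT_RANGE = (25, 50)


def in_view(lon, lat):
    """Return True if a (longitude, latitude) pair falls inside the plotted box."""
    return (LON_RANGE[0] <= lon <= LON_RANGE[1]
            and LAT_RANGE[0] <= lat <= LAT_RANGE[1])


# Hypothetical (longitude, latitude) pairs, not taken from metadata.csv
points = [(-100.0, 40.0), (-150.0, 64.0)]
visible = [p for p in points if in_view(*p)]
print(visible)
```

<p>Anything outside the box (Alaska, for example) is still in the data; it is simply cropped from the view.</p>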
<p>Before moving on, let’s look at the <code>geometry</code> list of <code>Point</code> data.</p>
<div class="cell" data-execution_count="20">
<div class="sourceCode cell-code" id="cb37" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb37-1">geometry[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>:<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>]</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="20">
<pre><code>[&lt;shapely.geometry.point.Point at 0x7fbd961c5d60&gt;,
 &lt;shapely.geometry.point.Point at 0x7fbd964c3d90&gt;,
 &lt;shapely.geometry.point.Point at 0x7fbd964a9760&gt;,
 &lt;shapely.geometry.point.Point at 0x7fbd961eb910&gt;,
 &lt;shapely.geometry.point.Point at 0x7fbd961eb790&gt;]</code></pre>
</div>
</div>
<div class="cell" data-execution_count="21">
<div class="sourceCode cell-code" id="cb39" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb39-1">first_point <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> geometry[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb39-2"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(first_point)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>POINT (-121.51 36.5597)</code></pre>
</div>
</div>
<p>Points aren’t going to have too many attributes beyond their location.</p>
<div class="cell" data-execution_count="22">
<div class="sourceCode cell-code" id="cb41" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb41-1"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'A point has an area of </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>first_point<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>area<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb41-2"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'This point has a centroid at </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>first_point<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>centroid<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>A point has an area of 0.0
This point has a centroid at POINT (-121.51 36.5597)</code></pre>
</div>
</div>
<p>This was my first time using GeoPandas + matplotlib to create maps and I found it pretty simple to pick up. I had used R’s <a href="https://journal.r-project.org/archive/2013-1/kahle-wickham.pdf">ggmap package</a> a bit and this felt kind of similar (yes, ggplot2 is amazing, but R users should have little problem doing mapping in Python). I’m planning on doing some similar geonewb posts on geospatial analysis in R.</p>
</section>
<section id="distribution-of-samples-over-time" class="level4">
<h4 class="anchored" data-anchor-id="distribution-of-samples-over-time">Distribution of samples over time</h4>
<div class="cell" data-execution_count="23">
<div class="sourceCode cell-code" id="cb43" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb43-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Let's get min and max values for all the fields.</span></span>
<span id="cb43-2">metadata.groupby([<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'split'</span>]).agg([<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'min'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'max'</span>])</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="23">

<div>

<table class="dataframe table table-sm table-striped">
  <thead>
    <tr>
      <th></th>
      <th colspan="2" halign="left">uid</th>
      <th colspan="2" halign="left">latitude</th>
      <th colspan="2" halign="left">longitude</th>
      <th colspan="2" halign="left">date</th>
    </tr>
    <tr>
      <th></th>
      <th>min</th>
      <th>max</th>
      <th>min</th>
      <th>max</th>
      <th>min</th>
      <th>max</th>
      <th>min</th>
      <th>max</th>
    </tr>
    <tr>
      <th>split</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>test</th>
      <td>aabn</td>
      <td>zzzi</td>
      <td>26.94453</td>
      <td>48.97325</td>
      <td>-124.0352</td>
      <td>-67.69865</td>
      <td>2013-01-08</td>
      <td>2021-12-29</td>
    </tr>
    <tr>
      <th>train</th>
      <td>aabm</td>
      <td>zzyb</td>
      <td>26.38943</td>
      <td>48.90706</td>
      <td>-124.1792</td>
      <td>-68.06507</td>
      <td>2013-01-04</td>
      <td>2021-12-14</td>
    </tr>
  </tbody>
</table>
</div>
</div>
</div>
<div class="cell" data-execution_count="24">
<div class="sourceCode cell-code" id="cb44" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb44-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Another way to check just the date range using "named aggregations"</span></span>
<span id="cb44-2">metadata.groupby(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"split"</span>).agg(min_date<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"date"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"min"</span>), max_date<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"date"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"max"</span>))</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="24">

<div>

<table class="dataframe table table-sm table-striped">
  <thead>
    <tr>
      <th></th>
      <th>min_date</th>
      <th>max_date</th>
    </tr>
    <tr>
      <th>split</th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>test</th>
      <td>2013-01-08</td>
      <td>2021-12-29</td>
    </tr>
    <tr>
      <th>train</th>
      <td>2013-01-04</td>
      <td>2021-12-14</td>
    </tr>
  </tbody>
</table>
</div>
</div>
</div>
<p>Let’s plot the number of records by year. We can use the pandas <code>crosstab</code> function to count records for each combination of year and split, which gives us a plottable dataframe.</p>
<div class="cell" data-execution_count="25">
<div class="sourceCode cell-code" id="cb45" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb45-1">recs_by_split_year <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.crosstab(metadata.date.dt.year, metadata.split)</span>
<span id="cb45-2">recs_by_split_year</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="25">

<div>

<table class="dataframe table table-sm table-striped">
  <thead>
    <tr>
      <th>split</th>
      <th>test</th>
      <th>train</th>
    </tr>
    <tr>
      <th>date</th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>2013</th>
      <td>521</td>
      <td>1362</td>
    </tr>
    <tr>
      <th>2014</th>
      <td>602</td>
      <td>1504</td>
    </tr>
    <tr>
      <th>2015</th>
      <td>527</td>
      <td>2299</td>
    </tr>
    <tr>
      <th>2016</th>
      <td>625</td>
      <td>2428</td>
    </tr>
    <tr>
      <th>2017</th>
      <td>1031</td>
      <td>2435</td>
    </tr>
    <tr>
      <th>2018</th>
      <td>777</td>
      <td>1940</td>
    </tr>
    <tr>
      <th>2019</th>
      <td>1199</td>
      <td>1788</td>
    </tr>
    <tr>
      <th>2020</th>
      <td>546</td>
      <td>1706</td>
    </tr>
    <tr>
      <th>2021</th>
      <td>682</td>
      <td>1598</td>
    </tr>
  </tbody>
</table>
</div>
</div>
</div>
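<p>To make it concrete what a cross-tabulation computes, here is a minimal plain-Python sketch of the same counting logic. It is not how pandas implements <code>crosstab</code>, and the sample records are hypothetical rather than rows from <code>metadata.csv</code>.</p>

```python
from collections import Counter
from datetime import date

# Hypothetical (sample date, split) records, for illustration only
records = [
    (date(2013, 1, 4), "train"),
    (date(2013, 6, 2), "train"),
    (date(2013, 1, 8), "test"),
    (date(2014, 7, 15), "train"),
    (date(2014, 8, 1), "test"),
    (date(2014, 9, 30), "test"),
]

# Count each (year, split) pair; this is the core of a cross-tabulation
counts = Counter((d.year, split) for d, split in records)

years = sorted({year for year, _ in counts})
splits = sorted({split for _, split in counts})

# One row per year, one column per split; a Counter returns 0 for missing pairs
table = {year: {split: counts[(year, split)] for split in splits} for year in years}

for year, row in table.items():
    print(year, row)
```

<p>Each row of the resulting table corresponds to a year and each column to a split, which is the shape that plots nicely as a grouped bar chart.</p>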
<div class="cell" data-execution_count="26">
<div class="sourceCode cell-code" id="cb46" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb46-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Quick plot with pandas plot()</span></span>
<span id="cb46-2">recs_by_split_year.plot(kind<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"bar"</span>)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span></code></pre></div>
<div class="cell-output cell-output-display">
<p><img src="https://bitsofanalytics.org/posts/algaebloom-part1/index_files/figure-html/cell-27-output-1.png" class="img-fluid"></p>
</div>
</div>
<div class="cell" data-execution_count="27">
<div class="sourceCode cell-code" id="cb47" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb47-1">metadata.groupby([metadata.date.dt.year, metadata.date.dt.month])[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'uid'</span>].count().plot(kind<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"line"</span>)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span></code></pre></div>
<div class="cell-output cell-output-display">
<p><img src="https://bitsofanalytics.org/posts/algaebloom-part1/index_files/figure-html/cell-28-output-1.png" class="img-fluid"></p>
</div>
</div>
<div class="cell" data-execution_count="28">
<div class="sourceCode cell-code" id="cb48" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb48-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># what seasons are the data points from?</span></span>
<span id="cb48-2">metadata[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"season"</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (</span>
<span id="cb48-3">    metadata.date.dt.month.replace([<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>], <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"winter"</span>)</span>
<span id="cb48-4">    .replace([<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>], <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"spring"</span>)</span>
<span id="cb48-5">    .replace([<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">8</span>], <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"summer"</span>)</span>
<span id="cb48-6">    .replace([<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">9</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">11</span>], <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"fall"</span>)</span>
<span id="cb48-7">)</span>
<span id="cb48-8">metadata.season.value_counts()</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="28">
<pre><code>summer    10813
spring     5045
fall       4758
winter     2954
Name: season, dtype: int64</code></pre>
</div>
</div>
<p>Since summer is usually associated with algal blooms, it’s not surprising that it’s sampled more frequently.</p>
<blockquote class="blockquote">
<p>Harmful algal blooms are more likely to be dangerous during the summer because more individuals are taking advantage of water bodies like lakes for recreation.</p>
</blockquote>
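<p>As an aside, the month-to-season labeling done earlier with chained <code>replace</code> calls can also be expressed as a single lookup table. The sketch below uses plain Python; the names <code>SEASON_BY_MONTH</code> and <code>season_of</code> and the sample months are made up for illustration.</p>

```python
# Lookup table from month number to season label (same grouping as the replace() chain)
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "fall", 10: "fall", 11: "fall",
}


def season_of(month):
    """Return the season label for a month number from 1 to 12."""
    return SEASON_BY_MONTH[month]


# Hypothetical month values standing in for metadata.date.dt.month
months = [1, 4, 7, 7, 10, 12]
seasons = [season_of(m) for m in months]
print(seasons)
```

<p>With pandas, the same dictionary could be passed to <code>Series.map</code>, for example <code>metadata.date.dt.month.map(SEASON_BY_MONTH)</code>, which keeps the whole mapping in one place.</p>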
<p>Create a separate map for each season.</p>
<div class="cell" data-execution_count="29">
<div class="sourceCode cell-code" id="cb50" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb50-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># where is data from for each season?</span></span>
<span id="cb50-2">fig, axes <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> plt.subplots(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, figsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>))</span>
<span id="cb50-3"></span>
<span id="cb50-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Since axes is a 2x2, we'll flatten it to a vector with four elements</span></span>
<span id="cb50-5"><span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> season, ax <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">zip</span>(metadata.season.unique(), axes.flatten()):</span>
<span id="cb50-6">    base <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> world[world.name <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"United States of America"</span>].plot(</span>
<span id="cb50-7">        edgecolor<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gray"</span>, color<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ghostwhite"</span>, alpha<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.3</span>, ax<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>ax</span>
<span id="cb50-8">    )</span>
<span id="cb50-9"></span>
<span id="cb50-10">    sub <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> metadata[metadata.season <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> season]</span>
<span id="cb50-11">    geometry <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [Point(xy) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> xy <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">zip</span>(sub[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"longitude"</span>], sub[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"latitude"</span>])]</span>
<span id="cb50-12">    gdf <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> gpd.GeoDataFrame(sub, geometry<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>geometry)</span>
<span id="cb50-13">    gdf.plot(ax<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>base, marker<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"."</span>, markersize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.5</span>)</span>
<span id="cb50-14">    ax.set_xlim([<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">125</span>, <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">66</span>])</span>
<span id="cb50-15">    ax.set_ylim([<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">25</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span>])</span>
<span id="cb50-16">    ax.set_title(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>season<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>capitalize()<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> data points"</span>)</span>
<span id="cb50-17">    ax.axis(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"off"</span>)</span></code></pre></div>
<div class="cell-output cell-output-display">
<p><img src="https://bitsofanalytics.org/posts/algaebloom-part1/index_files/figure-html/cell-30-output-1.png" class="img-fluid"></p>
</div>
</div>
<p>That’s a good start in exploring <code>metadata.csv</code>. On to the training data.</p>
</section>
</section>
<section id="explore-the-training-data-train_labels.csv" class="level3">
<h3 class="anchored" data-anchor-id="explore-the-training-data-train_labels.csv">Explore the training data (<code>train_labels.csv</code>)</h3>
<div class="cell" data-execution_count="30">
<div class="sourceCode cell-code" id="cb51" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb51-1">train_labels <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.read_csv(DATA_DIR <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"train_labels.csv"</span>)</span>
<span id="cb51-2">train_labels.head()</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="30">

<div>

<table class="dataframe table table-sm table-striped">
  <thead>
    <tr>
      <th></th>
      <th>uid</th>
      <th>region</th>
      <th>severity</th>
      <th>density</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>aabm</td>
      <td>midwest</td>
      <td>1</td>
      <td>585.0</td>
    </tr>
    <tr>
      <th>1</th>
      <td>aacd</td>
      <td>south</td>
      <td>1</td>
      <td>290.0</td>
    </tr>
    <tr>
      <th>2</th>
      <td>aaee</td>
      <td>south</td>
      <td>1</td>
      <td>1614.0</td>
    </tr>
    <tr>
      <th>3</th>
      <td>aaff</td>
      <td>midwest</td>
      <td>3</td>
      <td>111825.0</td>
    </tr>
    <tr>
      <th>4</th>
      <td>aafl</td>
      <td>midwest</td>
      <td>4</td>
      <td>2017313.0</td>
    </tr>
  </tbody>
</table>
</div>
</div>
</div>
<div class="cell" data-execution_count="31">
<div class="sourceCode cell-code" id="cb52" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb52-1">train_labels.shape</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="31">
<pre><code>(17060, 4)</code></pre>
</div>
</div>
<p>We have one row per in situ sample. Each row is a unique combination of date and location (latitude + longitude). There are columns for:</p>
<ul>
<li><code>uid (str)</code>: unique ID for each row. The <code>uid</code> links each row in <code>train_labels.csv</code> to its row in <code>metadata.csv</code>.</li>
<li><code>region (str)</code>: US region in which the sample was taken. Scores are calculated separately for each of these regions, and then averaged. See the Problem Description page for details.</li>
<li><code>severity (int)</code>: severity level based on the cyanobacteria density. This is what you’ll be predicting.</li>
<li><code>density (float)</code>: raw measurement of total cyanobacteria density in cells per milliliter (mL)</li>
</ul>
<p>Join the training labels with the metadata on <code>uid</code> so that we have access to the latitude, longitude, and date fields.</p>
<div class="cell" data-tags="[]" data-execution_count="32">
<div class="sourceCode cell-code" id="cb54" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb54-1">train_labels_and_metadata <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> train_labels.merge(</span>
<span id="cb54-2">    metadata, how<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"left"</span>, left_on<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"uid"</span>, right_on<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"uid"</span>, validate<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"1:1"</span></span>
<span id="cb54-3">)</span>
<span id="cb54-4">train_labels_and_metadata</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="32">

<div>

<table class="dataframe table table-sm table-striped">
  <thead>
    <tr>
      <th></th>
      <th>uid</th>
      <th>region</th>
      <th>severity</th>
      <th>density</th>
      <th>latitude</th>
      <th>longitude</th>
      <th>date</th>
      <th>split</th>
      <th>season</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>aabm</td>
      <td>midwest</td>
      <td>1</td>
      <td>585.0</td>
      <td>39.080319</td>
      <td>-86.430867</td>
      <td>2018-05-14</td>
      <td>train</td>
      <td>spring</td>
    </tr>
    <tr>
      <th>1</th>
      <td>aacd</td>
      <td>south</td>
      <td>1</td>
      <td>290.0</td>
      <td>35.875083</td>
      <td>-78.878434</td>
      <td>2020-11-19</td>
      <td>train</td>
      <td>fall</td>
    </tr>
    <tr>
      <th>2</th>
      <td>aaee</td>
      <td>south</td>
      <td>1</td>
      <td>1614.0</td>
      <td>35.487000</td>
      <td>-79.062133</td>
      <td>2016-08-24</td>
      <td>train</td>
      <td>summer</td>
    </tr>
    <tr>
      <th>3</th>
      <td>aaff</td>
      <td>midwest</td>
      <td>3</td>
      <td>111825.0</td>
      <td>38.049471</td>
      <td>-99.827001</td>
      <td>2019-07-23</td>
      <td>train</td>
      <td>summer</td>
    </tr>
    <tr>
      <th>4</th>
      <td>aafl</td>
      <td>midwest</td>
      <td>4</td>
      <td>2017313.0</td>
      <td>39.474744</td>
      <td>-86.898353</td>
      <td>2021-08-23</td>
      <td>train</td>
      <td>summer</td>
    </tr>
    <tr>
      <th>...</th>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
    </tr>
    <tr>
      <th>17055</th>
      <td>zzsv</td>
      <td>south</td>
      <td>3</td>
      <td>113125.0</td>
      <td>38.707825</td>
      <td>-75.080867</td>
      <td>2018-06-27</td>
      <td>train</td>
      <td>summer</td>
    </tr>
    <tr>
      <th>17056</th>
      <td>zzuq</td>
      <td>south</td>
      <td>3</td>
      <td>175726.0</td>
      <td>35.794000</td>
      <td>-79.015368</td>
      <td>2015-08-06</td>
      <td>train</td>
      <td>summer</td>
    </tr>
    <tr>
      <th>17057</th>
      <td>zzwo</td>
      <td>midwest</td>
      <td>2</td>
      <td>48510.0</td>
      <td>39.792190</td>
      <td>-99.971050</td>
      <td>2017-06-19</td>
      <td>train</td>
      <td>summer</td>
    </tr>
    <tr>
      <th>17058</th>
      <td>zzwq</td>
      <td>south</td>
      <td>1</td>
      <td>1271.0</td>
      <td>35.794000</td>
      <td>-79.012551</td>
      <td>2015-03-24</td>
      <td>train</td>
      <td>spring</td>
    </tr>
    <tr>
      <th>17059</th>
      <td>zzyb</td>
      <td>south</td>
      <td>1</td>
      <td>9682.0</td>
      <td>35.742000</td>
      <td>-79.238600</td>
      <td>2016-11-21</td>
      <td>train</td>
      <td>fall</td>
    </tr>
  </tbody>
</table>
<p>17060 rows × 9 columns</p>
</div>
</div>
</div>
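<p>The <code>validate="1:1"</code> argument in the merge above asks pandas to check that <code>uid</code> is unique in both tables. Here is a minimal sketch of what that check does, using small made-up frames (<code>labels</code>, <code>meta_ok</code> and <code>meta_dup</code> are hypothetical stand-ins, not the project data):</p>
<div class="sourceCode"><pre class="sourceCode python code-with-copy"><code class="sourceCode python">import pandas as pd

# Toy stand-ins for train_labels and metadata; the values are made up.
labels = pd.DataFrame({"uid": ["aabm", "aacd"], "severity": [1, 1]})
meta_ok = pd.DataFrame({"uid": ["aabm", "aacd"], "season": ["spring", "fall"]})
meta_dup = pd.DataFrame({"uid": ["aabm", "aabm"], "season": ["spring", "fall"]})

# Unique keys on both sides: the merge goes through.
merged = labels.merge(meta_ok, how="left", on="uid", validate="1:1")
print(merged.shape)

# A repeated uid on the right side makes pandas raise MergeError
# instead of silently duplicating rows in the result.
try:
    labels.merge(meta_dup, how="left", on="uid", validate="1:1")
except pd.errors.MergeError as err:
    print("merge rejected:", err)</code></pre></div>
<p>Without <code>validate</code>, the second merge would succeed and quietly return more rows than <code>labels</code> has, which is easy to miss in a larger table.</p>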
<div class="cell" data-execution_count="33">
<div class="sourceCode cell-code" id="cb55" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb55-1">severity_counts <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (</span>
<span id="cb55-2">    train_labels.replace(</span>
<span id="cb55-3">        {</span>
<span id="cb55-4">            <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"severity"</span>: {</span>
<span id="cb55-5">                <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>: <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"1 (&lt;20,000)"</span>,</span>
<span id="cb55-6">                <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>: <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"2 (20,000-100,000)"</span>,</span>
<span id="cb55-7">                <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>: <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"3 (100,000 - 1,000,000)"</span>,</span>
<span id="cb55-8">                <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>: <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"4 (1,000,000 - 10,000,000)"</span>,</span>
<span id="cb55-9">                <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>: <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"5 (&gt;10,000,000)"</span>,</span>
<span id="cb55-10">            }</span>
<span id="cb55-11">        }</span>
<span id="cb55-12">    )</span>
<span id="cb55-13">    .severity.value_counts()</span>
<span id="cb55-14">    .sort_index(ascending<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>)</span>
<span id="cb55-15">)</span>
<span id="cb55-16"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(severity_counts)</span>
<span id="cb55-17">plt.barh(severity_counts.index, severity_counts.values)</span>
<span id="cb55-18">plt.xlabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Number of samples"</span>)</span>
<span id="cb55-19">plt.ylabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Severity (range in cells/mL)"</span>)</span>
<span id="cb55-20">plt.title(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Train labels severity level counts"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>5 (&gt;10,000,000)                 58
4 (1,000,000 - 10,000,000)    3547
3 (100,000 - 1,000,000)       2719
2 (20,000-100,000)            3239
1 (&lt;20,000)                   7497
Name: severity, dtype: int64</code></pre>
</div>
<div class="cell-output cell-output-display" data-execution_count="33">
<pre><code>Text(0.5, 1.0, 'Train labels severity level counts')</code></pre>
</div>
<div class="cell-output cell-output-display">
<p><img src="https://bitsofanalytics.org/posts/algaebloom-part1/index_files/figure-html/cell-34-output-3.png" class="img-fluid"></p>
</div>
</div>
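<p>The ranges in the bar labels above suggest how <code>severity</code> relates to <code>density</code>. As a rough sketch, assuming bin edges of 20,000, 100,000, 1,000,000 and 10,000,000 cells/mL and assuming each range includes its lower edge (the helper name <code>density_to_severity</code> and the edge handling are my own choices, not something taken from the competition materials), the mapping could be reproduced with <code>pd.cut</code>:</p>
<div class="sourceCode"><pre class="sourceCode python code-with-copy"><code class="sourceCode python">import numpy as np
import pandas as pd

# Assumed bin edges in cells/mL, read off the severity ranges in this post.
# Which side of each edge is inclusive is an assumption.
SEVERITY_EDGES = [0, 20_000, 100_000, 1_000_000, 10_000_000, np.inf]


def density_to_severity(density):
    """Map raw densities (cells/mL) to severity levels 1 through 5."""
    levels = pd.cut(
        density,
        bins=SEVERITY_EDGES,
        labels=[1, 2, 3, 4, 5],
        right=False,  # half-open bins that include the lower edge
    )
    return levels.astype(int)


# The five densities shown in train_labels.head() earlier in the post
sample = pd.Series([585.0, 290.0, 1614.0, 111825.0, 2017313.0])
print(density_to_severity(sample).tolist())</code></pre></div>
<p>For those five densities the sketch gives severities 1, 1, 1, 3 and 4, which agrees with the <code>severity</code> column in the first rows of <code>train_labels</code>.</p>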
<div class="cell" data-execution_count="34">
<div class="sourceCode cell-code" id="cb58" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb58-1">train_labels.density.describe()</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="34">
<pre><code>count    1.706000e+04
mean     1.074537e+06
std      6.836693e+06
min      0.000000e+00
25%      4.066000e+03
50%      3.270975e+04
75%      4.849192e+05
max      8.046675e+08
Name: density, dtype: float64</code></pre>
</div>
</div>
<div class="cell" data-execution_count="35">
<div class="sourceCode cell-code" id="cb60" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb60-1">plt.hist(train_labels.density, bins<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">30</span>, <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000000</span>))</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="35">
<pre><code>(array([8570., 1394.,  772.,  465.,  342.,  244.,  194.,  163.,  132.,
         140.,  101.,   85.,   78.,   73.,   77.,   65.,   55.,   59.,
          55.,   49.,   38.,   59.,   39.,   33.,   24.,   31.,   24.,
          34.,   28.,   32.]),
 array([      0.        ,   33333.33333333,   66666.66666667,
         100000.        ,  133333.33333333,  166666.66666667,
         200000.        ,  233333.33333333,  266666.66666667,
         300000.        ,  333333.33333333,  366666.66666667,
         400000.        ,  433333.33333333,  466666.66666667,
         500000.        ,  533333.33333333,  566666.66666667,
         600000.        ,  633333.33333333,  666666.66666667,
         700000.        ,  733333.33333333,  766666.66666667,
         800000.        ,  833333.33333333,  866666.66666667,
         900000.        ,  933333.33333333,  966666.66666667,
        1000000.        ]),
 &lt;BarContainer object of 30 artists&gt;)</code></pre>
</div>
<div class="cell-output cell-output-display">
<p><img src="https://bitsofanalytics.org/posts/algaebloom-part1/index_files/figure-html/cell-36-output-2.png" class="img-fluid"></p>
</div>
</div>
<p>How many records in each region have a density value of 0?</p>
<div class="cell" data-execution_count="36">
<div class="sourceCode cell-code" id="cb62" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb62-1">train_labels[train_labels[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'density'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>].groupby([<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'region'</span>])[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'uid'</span>].count()</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="36">
<pre><code>region
midwest      12
northeast    77
south         1
west          1
Name: uid, dtype: int64</code></pre>
</div>
</div>
</section>
<section id="explore-the-submission-data-submission_format.csv" class="level3">
<h3 class="anchored" data-anchor-id="explore-the-submission-data-submission_format.csv">Explore the submission data (<code>submission_format.csv</code>)</h3>
<div class="cell" data-execution_count="37">
<div class="sourceCode cell-code" id="cb64" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb64-1">submission_format <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.read_csv(DATA_DIR <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"submission_format.csv"</span>, index_col<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)</span>
<span id="cb64-2">submission_format.head()</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="37">

<div>

<table class="dataframe table table-sm table-striped">
  <thead>
    <tr>
      <th></th>
      <th>region</th>
      <th>severity</th>
    </tr>
    <tr>
      <th>uid</th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>aabn</th>
      <td>west</td>
      <td>1</td>
    </tr>
    <tr>
      <th>aair</th>
      <td>west</td>
      <td>1</td>
    </tr>
    <tr>
      <th>aajw</th>
      <td>northeast</td>
      <td>1</td>
    </tr>
    <tr>
      <th>aalr</th>
      <td>midwest</td>
      <td>1</td>
    </tr>
    <tr>
      <th>aalw</th>
      <td>west</td>
      <td>1</td>
    </tr>
  </tbody>
</table>
</div>
</div>
</div>
<ul>
<li><code>uid (str)</code>: unique ID for each row. The <code>uid</code> links each row in <code>submission_format.csv</code> to its row in <code>metadata.csv</code>.</li>
<li><code>region (str)</code>: US region in which the sample was taken. Scores are calculated separately for each of these regions, and then averaged. See the Problem Description page for details.</li>
<li><code>severity (int)</code>: placeholder for the severity level based on the cyanobacteria density (1 in every row shown above). This is the column that you will replace with your own predictions to create a submission. Submit predictions for the severity level, NOT for the raw cell density in cells per mL.</li>
</ul>
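<p>To make the "calculated separately for each region, and then averaged" idea concrete, here is a small sketch. It assumes RMSE as the per-region score, which is my assumption for illustration (the Problem Description page is the authority on the actual metric). The function name <code>region_averaged_rmse</code> and the <code>predicted</code> column are hypothetical, and the toy values are made up.</p>
<div class="sourceCode"><pre class="sourceCode python code-with-copy"><code class="sourceCode python">import numpy as np
import pandas as pd


def region_averaged_rmse(df):
    """Average of per-region RMSE values.

    Expects columns region, severity (actual) and predicted (hypothetical name).
    Using RMSE as the per-region score is an assumption made for illustration.
    """
    squared_error = (df["severity"] - df["predicted"]) ** 2
    per_region = np.sqrt(squared_error.groupby(df["region"]).mean())
    return per_region.mean()


# Made-up example: one miss in the west, perfect predictions in the midwest
toy = pd.DataFrame(
    {
        "region": ["west", "west", "midwest", "midwest"],
        "severity": [1, 3, 2, 4],
        "predicted": [1, 1, 2, 4],
    }
)
print(region_averaged_rmse(toy))</code></pre></div>
<p>Because each region contributes equally to the average, a region with few samples influences the final score as much as a region with many.</p>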
<div class="cell" data-execution_count="38">
<div class="sourceCode cell-code" id="cb65" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb65-1">submission_format.shape</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="38">
<pre><code>(6510, 2)</code></pre>
</div>
</div>
</section>
</section>
<section id="a-good-place-to-stop-for-part-1" class="level2">
<h2 class="anchored" data-anchor-id="a-good-place-to-stop-for-part-1">A good place to stop for part 1</h2>
<p>We’ve got a Python conda environment for geospatial analysis and have used some of the libraries as we explored the project data files. Now we’re ready to tackle the satellite imagery data in part 2.</p>


</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div quarto-reuse="quarto-reuse" class="quarto-appendix-contents"><a rel="license" href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</a></div></section><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{isken2023,
  author = {Mark Isken},
  title = {Algal Bloom Detection Extended Tutorial - {Part} 1:
    Geospatial Libraries},
  date = {2023-01-25},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-isken2023" class="csl-entry quarto-appendix-citeas">
Mark Isken. 2023. <span>“Algal Bloom Detection Extended Tutorial - Part
1: Geospatial Libraries.”</span> January 25, 2023.
</div></div></section></div> ]]></description>
  <category>geonewb</category>
  <category>geospatial</category>
  <category>python</category>
  <category>conda</category>
  <guid>https://bitsofanalytics.org/posts/algaebloom-part1/</guid>
  <pubDate>Wed, 25 Jan 2023 05:00:00 GMT</pubDate>
</item>
</channel>
</rss>
