Merged
File renamed without changes.
6 changes: 3 additions & 3 deletions 01-intro/jupyter-notebook.qmd
@@ -25,7 +25,7 @@ In order to run computer programs, we need a way to execute code written in a pr
The environment we will use is **Jupyter Notebook**, which allows us to write and run code within a single `.ipynb` document (i.e., **notebook**). They also allow us to embedded text and code.

:::{style="text-align: center"}
![An example of a Jupyter Notebook.](images/jupyter-oski.png){#fig-inflation fig-align=center width=90%}
![An example of a Jupyter Notebook.](images/jupyter-oski.png){#fig-inflation fig-align=center width=90% fig-alt=""}
:::

There's a lot going on in the above Jupyter Notebook screenshot: there is code, there is output from running code, there are pictures, and there is (non-code) text. We'll get to understanding all of these components in due time.
@@ -78,7 +78,7 @@ Jupyter Notebooks are made up of **cells**. There are two main types of cells:
When run, Python code cells are evaluated as a Python code snippet, one line at a time. The cell output displayed is the value of the _last_ evaluated expression:

:::{style="text-align: center"}
![Both expressions are evaluated, but the result of the last expression's evaluation is considered the output of the code cell.](images/jupyter-code-cell.png){#fig-inflation fig-align=center width=70%}
![Both expressions are evaluated, but the result of the last expression's evaluation is considered the output of the code cell.](images/jupyter-code-cell.png){#fig-inflation fig-align=center width=70% fig-alt=""}
:::

We will discuss this output/display phenomenon more in future notes.
@@ -88,7 +88,7 @@ To run a code cell, you can either hit the "Run" button in the Toolbar, or you c
**Markdown cells.** This is where you write text and images that aren’t Python code. Markdown is a language used for formatting text. A Markdown cell will always display its formatting when it is not in edit mode.

:::{style="text-align: center"}
![Left screenshot shows un-evaluated code cell and raw Markdown cell; right screenshot shows evaluated code cell and formatted text. To render formatted text for a selected markdown cell, exit editing mode for that cell. This screenshot starts with the code cell selected, then runs both that code cell and "runs" the markdown cell below.](images/jupyter-md-cell.png){#fig-inflation fig-align=center width=100%}
![Left screenshot shows un-evaluated code cell and raw Markdown cell; right screenshot shows evaluated code cell and formatted text. To render formatted text for a selected markdown cell, exit editing mode for that cell. This screenshot starts with the code cell selected, then runs both that code cell and "runs" the markdown cell below.](images/jupyter-md-cell.png){#fig-inflation fig-align=center width=100% fig-alt=""}
:::

Here is a [guide to Markdown formatting](https://www.markdownguide.org/cheat-sheet/). You’ll explore Markdown more in lab.
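The second hunk's context says a code cell's displayed output is the value of its _last_ evaluated expression. As a rough mental model only (this is not how Jupyter or IPython is actually implemented), the sketch below uses Python's standard `ast` module to run every statement in a cell and then evaluate a trailing expression. The helper name `run_cell` is hypothetical.

```python
import ast


def run_cell(source, namespace):
    """Simplified model of a notebook cell: run all statements, and if the
    final statement is a bare expression, return its value as the "output"."""
    tree = ast.parse(source)
    if tree.body and isinstance(tree.body[-1], ast.Expr):
        # Split off the trailing expression so it can be evaluated separately
        last = ast.Expression(body=tree.body.pop().value)
        exec(compile(tree, "<cell>", "exec"), namespace)
        return eval(compile(last, "<cell>", "eval"), namespace)
    # No trailing expression: the cell runs but produces no output value
    exec(compile(tree, "<cell>", "exec"), namespace)
    return None


# Both expressions are evaluated, but only the last one's value is returned
print(run_cell("3 + 4\n10 * 2", {}))  # 20
```

In this model an assignment-only cell returns `None`, which matches a notebook showing no output for such a cell.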
4 changes: 2 additions & 2 deletions 05-variables/index.qmd
@@ -30,7 +30,7 @@ It is challenging to use another person's data! The concepts have already been o
For now, we focus on variables as they exist in tabular data. In most of the tabular datasets we will examine, variables correspond to **columns** of features. Each row is a **record** of a datapoint, with different values of variables measured for that datapoint.

:::{style="text-align: center"}
![Variables as columns.](images/variable.png){#fig-inflation fig-align=center width=60%}
![Variables as columns.](images/variable.png){#fig-inflation fig-align=center width=60% fig-alt=""}
:::


@@ -49,7 +49,7 @@ Figure 2 has examples of each variable type.


:::{style="text-align: center"}
![Variable Types.](images/variable_types.png){#fig-inflation fig-align=center width=90%}
![Variable Types.](images/variable_types.png){#fig-inflation fig-align=center width=90% fig-alt=""}
:::

_What do we mean by "meaningful" arithmetic?_ From [Stat 20](https://www.stat20.org/1-questions-and-data/02-taxonomy-of-data/notes):
4 changes: 2 additions & 2 deletions 05-variables/units-of-analysis.qmd
@@ -53,13 +53,13 @@ Let's return to our American Community Survey (ACS) 2020 data. It shows educatio
From the [ACS webpage](https://www.census.gov/programs-surveys/acs/methodology/design-and-methodology.html), the American Community Survey (ACS) is an ongoing monthly survey that collects detailed housing and socioeconomic data.

:::{style="text-align: center"}
![ACS Household survey, which collects data on individual households.](images/acs_screenshot.png){#fig-inflation fig-align=center width=90%}
![ACS Household survey, which collects data on individual households.](images/acs_screenshot.png){#fig-inflation fig-align=center width=90% fig-alt=""}
:::

There are (at least) two datasets collected by the ACS: A private dataset of survey responses by household (Figure 1), and a public-facing dataset of responses by geographic region. The variables for the geographic region, a larger unit of analysis, are constructed via aggregation and estimation (Figure 2):

:::{style="text-align: center"}
![ACS reported public data, which reports aggregated data of households across a geographic region.](images/acs_aggregate.png){#fig-inflation fig-align=center width=90%}
![ACS reported public data, which reports aggregated data of households across a geographic region.](images/acs_aggregate.png){#fig-inflation fig-align=center width=90% fig-alt=""}
:::

Simple forms of aggregation are straightforward and involve counting and averaging---methods that are very possible using our limited Data Science toolkit thus far. However, disaggregation cannot be done without individual datapoints! There are various methods of estimating individuals from averages using statistics and distributions; we discuss this briefly in a few weeks, but you can take a statistics course for more information.
File renamed without changes.
2 changes: 1 addition & 1 deletion 06-variables-ii/sample-population.qmd
@@ -16,7 +16,7 @@ The set of individuals we actually draw our sample from is the **sampling frame*
## Examples

:::{style="text-align: center"}
![A sampling frame may include individuals not in our population.](images/sampling-frame.png){#fig-inflation fig-align=center width=80%}
![A sampling frame may include individuals not in our population.](images/sampling-frame.png){#fig-inflation fig-align=center width=80% fig-alt=""}
:::

| Target Population | Collected sample |
8 changes: 4 additions & 4 deletions 07-visualizations/encoding.qmd
@@ -16,7 +16,7 @@ Think of encoding as the bridge between your data and what people see on the scr
In bar charts, **length** can visually encode a numerical variable.

:::{style="text-align: center"}
![Bar Chart Example](images/barchart.png){#fig-barchart fig-align=center width=70%}
![Bar Chart Example](images/barchart.png){#fig-barchart fig-align=center width=70% fig-alt=""}
:::

This creates an intuitive mapping where the visual property (bar length) directly corresponds to the data value (average age).
@@ -27,7 +27,7 @@ This creates an intuitive mapping where the visual property (bar length) directl
Other visualizations can include multiple variables encoded simultaneously.

:::{style="text-align: center"}
![Multiple Encodings in a Scatter Plot](images/scatter.png){#fig-scatter fig-align=center width=80%}
![Multiple Encodings in a Scatter Plot](images/scatter.png){#fig-scatter fig-align=center width=80% fig-alt=""}
:::

### Quick Check: How Many Variables?
@@ -48,7 +48,7 @@ Look at the scatter plot above. How many different variables are being encoded?
As we learned when studying variables, different variable types (numerical vs. categorical, discrete vs. continuous, ordinal vs. nominal) have different properties. When creating visualizations, we need to match our encoding choices to these variable types.

:::{style="text-align: center"}
![Recall: Variable Types](images/variable_types.png){#fig-variable-types fig-align=center width=90%}
![Recall: Variable Types](images/variable_types.png){#fig-variable-types fig-align=center width=90% fig-alt=""}
:::

::: {.callout-important title="Key Principle"}
@@ -69,7 +69,7 @@ The table below summarizes which visual encodings work best for different types
### What's Wrong with This?

:::{style="text-align: center"}
![Problematic Car Manufacturer Chart](images/cars-graph.png){#fig-cars-graph fig-align=center width=70%}
![Problematic Car Manufacturer Chart](images/cars-graph.png){#fig-cars-graph fig-align=center width=70% fig-alt=""}
:::

**Problem**: This graph implies that Swedish cars are "greater" than cars from other countries in some sense, when they're not. If the variable is just "country of origin" (nominal categorical), using length encoding suggests an ordering that doesn't exist.
10 changes: 5 additions & 5 deletions 07-visualizations/index.qmd
@@ -25,7 +25,7 @@ To better understand these principles in action, let's examine how humans have u
What do you see when you look at this ancient artifact?

:::{style="text-align: center"}
![The World's First Map](images/world-map.jpg){#fig-ancient-map fig-align=center width=60%}
![The World's First Map](images/world-map.jpg){#fig-ancient-map fig-align=center width=60% fig-alt=""}
:::

This is a map depicting the town of Konya, Turkey - supposedly the world's first map, dating back to approximately 6200 BC. Even in prehistoric times, humans recognized the power of visual representation to communicate spatial relationships and important information.
@@ -47,7 +47,7 @@ One of the most famous examples of data visualization directly saving human live
**The Solution**: Dr. John Snow was skeptical of the miasma theory and suspected contaminated water. He created a revolutionary approach that became standard in epidemiology: **he drew a map**.

:::{style="text-align: center"}
![John Snow's Cholera Map](images/john-snow-cholera-map.png){#fig-cholera-map fig-align=center width=70%}
![John Snow's Cholera Map](images/john-snow-cholera-map.png){#fig-cholera-map fig-align=center width=70% fig-alt=""}
:::

**What the map revealed**:
@@ -64,7 +64,7 @@ One of the most famous examples of data visualization directly saving human live
Florence Nightingale wasn't just a pioneering nurse, she was also an innovative data visualizer. During the Crimean War, she created what's now called a "rose diagram" or "coxcomb chart" to visualize the causes of death among British soldiers.

:::{style="text-align: center"}
![Florence Nightingale's Rose Diagram](images/florence-nightingale-rose.png){#fig-rose-diagram fig-align=center width=60%}
![Florence Nightingale's Rose Diagram](images/florence-nightingale-rose.png){#fig-rose-diagram fig-align=center width=60% fig-alt=""}
:::

Her visualization revealed a shocking truth: more soldiers were dying from preventable diseases than from battle wounds. This wasn't just a pretty chart, it was a powerful argument that drove major reforms in military medical care. Nightingale understood that abstract statistics about mortality rates couldn't compete with the visual impact of her rose petals, where the size of each segment made the disparity impossible to ignore.
@@ -76,7 +76,7 @@ Her visualization revealed a shocking truth: more soldiers were dying from preve
Not all data visualization involves charts and graphs. Maya Lin's Vietnam War Memorial in Washington DC proves that data can be deeply emotional and memorial, not just analytical.

:::{style="text-align: center"}
![Vietnam War Memorial](images/veitnam-war-memorial.png){#fig-vietnam-memorial fig-align=center width=70%}
![Vietnam War Memorial](images/veitnam-war-memorial.png){#fig-vietnam-memorial fig-align=center width=70% fig-alt=""}
:::

Each of the 58,000+ names etched into the black granite represents one life lost. The chronological arrangement tells the story of the war's progression through time, while the reflective surface creates an intimate connection between viewers and the data you literally see yourself reflected among the names. This memorial demonstrates that the most powerful visualizations don't just inform us; they transform how we feel about the information.
@@ -86,7 +86,7 @@ Each of the 58,000+ names etched into the black granite represents one life lost
During the COVID-19 pandemic, data visualization became part of daily life. Suddenly, everyone from epidemiologists to elementary school students was reading line charts showing case trends and interpreting what those curves meant for their communities.

:::{style="text-align: center"}
![COVID-19 Case Tracking](images/coivd.png){#fig-covid-dashboard fig-align=center width=80%}
![COVID-19 Case Tracking](images/coivd.png){#fig-covid-dashboard fig-align=center width=80% fig-alt=""}
:::

Google's COVID tracking dashboard exemplified how modern visualization must be both accessible and updateable in real-time. The time series charts showed trends over months with clear visual indicators of peaks and valleys, but more importantly, they translated complex epidemiological data into something any concerned citizen could understand.
10 changes: 10 additions & 0 deletions 08-histograms/exercises.qmd
@@ -121,12 +121,14 @@ studio_distribution.show(6)
```

```{python}
#| fig-alt: "Distribution of studios responsible for the highest grossing movies as of 2017"
studio_distribution.barh('Studio')
```

Let's revisualize this barchart to display just the top five studios. In the below code, note how `.take` is used with `np.arange`:

```{python}
#| fig-alt: "Distribution of studios responsible for the top five highest grossing movies as of 2017"
studio_distribution.sort('count', descending=True).take(np.arange(5)).barh('Studio')
print("Five studios are largely responsible for the highest grossing movies")
```
@@ -157,12 +159,15 @@ min(ages), max(ages)

If you want to make equally sized bins, `np.arange()` is a great tool to help you.
```{python}
#| fig-alt: "Histogram of the age of the top grossing movies as of 2017 with equally sized bins and count on the y-axis"
top_movies.hist('Age', bins = np.arange(0, 110, 10), unit = 'Year', density=False)
```

## Histograms: Density

```{python}
#| fig-alt: "Histogram of the age of the top grossing movies as of 2017 with equally sized bins and 'Percent per Year' on the y-axis"

# default is density=True
top_movies.hist('Age', bins = np.arange(0, 110, 10), unit = 'Year')
```
@@ -196,6 +201,7 @@ binned_data
### Now, plot the histogram!

```{python}
#| fig-alt: "Histogram of the age of the top grossing movies as of 2017 using custom bins"
top_movies.hist('Age', bins = my_bins, unit = 'Year')
```

@@ -276,6 +282,7 @@ To check our work one last time, let's see if the numbers in the last column mat


```{python}
#| fig-alt: "Histogram of the age of the top grossing movies as of 2017 using custom bins"
top_movies.hist('Age', bins = my_bins, unit = 'Year')
```

@@ -296,6 +303,7 @@ flavor_table


```{python}
#| fig-alt: "Distribution of ice cream flavors"
flavor_table.barh('Flavor')
```

@@ -309,6 +317,7 @@ cone_average_price_table


```{python}
#| fig-alt: "Plot with one categorical attribute and one numerical attribute."
cone_average_price_table.barh('Flavor')
```

@@ -324,5 +333,6 @@ cones_pivot_table


```{python}
#| fig-alt: "Plot with two categorical attributes."
cones_pivot_table.barh('Color')
```
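The histogram hunks in `08-histograms/exercises.qmd` build equal-width bins with `np.arange(0, 110, 10)`, and note that `density=True` is the default, which labels the y-axis "Percent per Year". The sketch below reproduces that arithmetic with plain NumPy. It assumes that a density-scale bar's height is the percent of values in the bin divided by the bin width. The `ages` values are hypothetical (not the `top_movies` data), and this is not the `datascience` library's own code.

```python
import numpy as np

# Equal-width bin edges as in the hunks above: the stop value 110 is excluded,
# so the edges are 0, 10, ..., 100 (11 edges, i.e. 10 bins of width 10)
bins = np.arange(0, 110, 10)

# Hypothetical movie ages in years (illustration only)
ages = np.array([2, 5, 8, 13, 17, 24, 31, 46, 58, 77])

# Count how many ages fall into each bin
counts, edges = np.histogram(ages, bins=bins)

# Density scale: percent of all values in a bin, divided by that bin's width
widths = np.diff(edges)
percent = counts / counts.sum() * 100
percent_per_year = percent / widths

print("bin edges:       ", edges)
print("counts:          ", counts)
print("percent per year:", percent_per_year)

# Bar areas (height * width) add back up to 100 percent
print("total area:", np.sum(percent_per_year * widths))
```

Because each height is divided by its bin's width, the total area stays 100 percent even when bins have different widths, as with the custom `my_bins` in the same file.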
4 changes: 2 additions & 2 deletions 17-dictionaries/file-formats.qmd
@@ -16,7 +16,7 @@ We can use files to generate tables, or other useful data structures
Files are often stored in **folders**.

:::{style="text-align: center"}
![Within files are folders. A file can be loaded into Python.](images/directory_structure.png){#fig-inflation fig-align=center width=90%}
![Within files are folders. A file can be loaded into Python.](images/directory_structure.png){#fig-inflation fig-align=center width=90% fig-alt=""}
:::

We can categorize data as being in one of two broad categories:
@@ -71,7 +71,7 @@ In our example `pups` case, the `pups.csv` file is located in the `data` directo
What kinds of data can’t be stored in a tabular format? Lots of things: music, videos, maps, etc. Graph data and hierarchical data, like family trees, might also be non-tabular.

:::{style="text-align: center"}
![A family tree graph structure. At the root is Grandma, who has children Dad and Aunt. Dad has children Brother and Me, and Aunt has children Cousin 1 and Cousin 2. Cousin 2 has a one child, Cousin 2 Jr.](images/trees.png){#fig-inflation fig-align=center width=75%}
![A family tree graph structure. At the root is Grandma, who has children Dad and Aunt. Dad has children Brother and Me, and Aunt has children Cousin 1 and Cousin 2. Cousin 2 has a one child, Cousin 2 Jr.](images/trees.png){#fig-inflation fig-align=center width=75% fig-alt=""}
:::

### JSON
2 changes: 1 addition & 1 deletion 17-dictionaries/index.qmd
@@ -116,7 +116,7 @@ dog
dog.pop(4, None)
```

Why? Find out the answer in the official [Python documentation on `pop`](https://docs.python.org/3/library/stdtypes.html#dict.pop)!
Why? Find out the answer in the official [Python documentation](https://docs.python.org/3/library/stdtypes.html#dict.pop) on `pop`!

## Dictionary Properties

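The `17-dictionaries/index.qmd` hunk refers readers to the Python documentation to explain `dog.pop(4, None)`. Below is a minimal sketch of the documented `dict.pop` behavior. It uses a hypothetical `dog` dictionary that has no key `4`; the `dog` defined in the notes may have different contents.

```python
# Hypothetical dictionary; the `dog` in the notes may have different keys
dog = {"name": "Fido", "breed": "corgi"}

# With no default, popping a key that is not present raises KeyError
try:
    dog.pop(4)
except KeyError as err:
    print("KeyError:", err)

# With a default, pop returns the default instead of raising
result = dog.pop(4, None)
print(result)

# Neither call removed anything, because 4 was never a key
print(dog)
```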
4 changes: 2 additions & 2 deletions 18-html/genius.qmd
@@ -29,7 +29,7 @@ To use the [Genius Lyrics](http://genius.com/) API, you need a special API key,


:::{style="text-align: center"}
![Genius API webpage.](images/Genius-API.png){#fig-inflation fig-align=center width=90%}
![Genius API webpage.](images/Genius-API.png){#fig-inflation fig-align=center width=90% fig-alt=""}
:::

You'll be prompted to sign up for [a Genius account](https://genius.com/signup_or_login), which is required to gain API access. Signing up for a Genius account is free and easy. You just need a Genius nickname (which must be one word), an email address, and a password.
@@ -38,7 +38,7 @@ Once you're signed in, you should be taken to [https://genius.com/api-clients](h


:::{style="text-align: center"}
![New API Client button.](images/Genius-New-API.png){#fig-inflation fig-align=center width=90%}
![New API Client button.](images/Genius-New-API.png){#fig-inflation fig-align=center width=90% fig-alt=""}
:::

After clicking "New API Client," you'll be prompted to fill out a short form about the "App" that you need the Genius API for. You only need to fill out "App Name" and "App Website URL."
2 changes: 1 addition & 1 deletion 18-html/index.qmd
@@ -28,7 +28,7 @@ Your screen should look (something) like this:


:::{style="text-align: center"}
![Kittens Dev Tools.](http://static.decontextualize.com/snaps/kittens-dev-tools.png){#fig-inflation fig-align=center width=100%}
![Kittens Dev Tools.](http://static.decontextualize.com/snaps/kittens-dev-tools.png){#fig-inflation fig-align=center width=100% fig-alt=""}
:::

In the upper panel, you see the web page you're inspecting. In the lower panel, you see a version of the HTML source code, with little arrows next to some of the lines. (The little arrows allow you to collapse parts of the HTML source that are hierarchically related.) As you move your mouse over the elements in the top panel, different parts of the source code will be highlighted. Chrome is showing you which parts of the source code are causing which parts of the page to show up. Pretty spiffy!
2 changes: 1 addition & 1 deletion 21-genai/gemini.qmd
@@ -69,7 +69,7 @@ Consider the chat prompt shown in the screenshot, as well as (the start of) the


:::{style="text-align: center"}
![A screenshot of a Google Gemini chat conversation. Prompt is 'Explain how AI works in a few words.'. Response from Gemini chat is long but gets at the idea..](images/prompt-chat.png){#fig-inflation fig-align=center width=90%}
![A screenshot of a Google Gemini chat conversation. Prompt is 'Explain how AI works in a few words.'. Response from Gemini chat is long but gets at the idea..](images/prompt-chat.png){#fig-inflation fig-align=center width=90% fig-alt=""}
:::

We define three pieces of terminology to describe what is happening in the above screenshot:
8 changes: 6 additions & 2 deletions _quarto.yml
@@ -173,10 +173,14 @@ website:

format:
html:
theme: lumen
theme: cosmos
fontsize: 1em
css: assets/styles.css
css:
- assets/styles.css
- assets/custom-error-colors.css
toc: true
include-in-header:
file: siteimprove.html
include-after-body:
- assets/a11y-fixes.html
