IELE 2025 1
Prof: Leo Ferres
Overview
Curious how scientists predict outbreaks before they
happen? Wonder what it takes to build models that inform real-world
public health decisions? Want to use Bayesian statistics, geospatial
tools, and real data to tackle an actual crisis? Like the idea of
working in teams to solve a fast-moving, high-stakes epidemiological
puzzle?
In this course, you'll take on the role of data scientists racing to
model the spread of dengue in Chile. Dengue is a disease on the move,
driven by climate change, urbanization, and global travel. You'll
collect messy, real-world data, learn modern tools like Python, Stan,
and QGIS, and build predictive models that aim to forecast where the
next outbreak will hit. This is not a toy problem: it's the kind of
challenge public health agencies face every day. By the end, you'll
not only understand the science of prediction but also have practiced it.
Teaching philosophy
This is not a typical course, and that's by design.
Rather than relying on traditional lectures, this project-driven,
skill-based course is designed to immerse you in the actual practice
of scientific modeling and data analysis. You won't sit through
lectures while I walk you through the material; you'll learn by
doing, tackling real-world problems with tools and methods drawn from
contemporary data science and epidemiology.
Every week (or every two weeks, depending on its complexity), you'll
receive an assignment that introduces a specific skill or analytical
technique. These are not isolated exercises: they build toward a
larger goal, constructing a functioning predictive model of dengue
spread in Chile. Think of each assignment as a piece of that puzzle.
The structure is intentionally inverted: I won't be lecturing
on each topic in advance. Instead, I'll guide you to high-quality
external tutorials, documentation, and video content (e.g., from
YouTube or official project pages). You'll study the material
independently before class. Then, during class, we'll use our
time together to dive deeper, discuss the content, troubleshoot your
work, and collaboratively address any roadblocks.
To make this work, your engagement is essential. You're
expected to come to class having reviewed the assigned materials,
attempted the assignment, and, critically, prepared real
questions. The learning will happen through interaction,
collaboration, and problem-solving. My role is not to lecture but to
help you think, guide you through complexity, and learn alongside you
as a team.
This course is designed to feel closer to a research lab or
collaborative project environment than a lecture hall. If you're
curious, self-directed, and ready to take on challenging, meaningful
problems, you'll thrive here.
Topics
This is a data-driven course on modeling spatial, environmental, and
epidemiological phenomena using Bayesian methods. Emphasis is on
integrating diverse datasets and implementing models in Stan and
Python. Topics include:
- Tools for data science workflows
- Jupyter, VSCode, QGIS
- Python: pandas, geopandas, seaborn
- Obsidian, Zotero, Overleaf, GitHub
- GIS and spatial data
- Administrative boundaries vs. grid systems
- Population estimation
- Census vs. satellite-based proxies (e.g., nighttime lights)
- Environmental monitoring
- Ground sensors vs. satellite data
- Air quality
- Data fusion from sensors and satellites
- Epidemiological and socio-economic datasets
- Bayesian data analysis
- Introduction to Bayesian inference
- Prior specification and prior predictive checks
- Bayesian workflow in Stan
- Regression modeling
- Linear regression in Stan
- Multilevel regression and post-stratification (MRP)
- Negative binomial regression for count data (e.g., dengue cases)
- Compartmental disease models
- SIR and SEIR models in Stan
- Model building and validation workflows
- Applications
- Mobility and transportation data
- Health, pollution, and socio-economic integration
- Modeling dengue dynamics in Chile
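The compartmental models in the list above are eventually fit in Stan, but their dynamics can be previewed in a few lines of Python. The sketch below integrates a deterministic SEIR model with a forward-Euler step; the parameter values (transmission rate `beta`, incubation rate `sigma`, recovery rate `gamma`), the population size, and the step size are illustrative assumptions, not estimates for dengue in Chile.

```python
from dataclasses import dataclass


@dataclass
class SEIRState:
    s: float  # susceptible
    e: float  # exposed (infected, not yet infectious)
    i: float  # infectious
    r: float  # recovered


def seir_step(state, beta, sigma, gamma, dt):
    """Advance the SEIR equations by one forward-Euler step of length dt (days)."""
    n = state.s + state.e + state.i + state.r
    new_exposed = beta * state.s * state.i / n * dt  # flow S -> E
    new_infectious = sigma * state.e * dt            # flow E -> I
    new_recovered = gamma * state.i * dt             # flow I -> R
    return SEIRState(
        s=state.s - new_exposed,
        e=state.e + new_exposed - new_infectious,
        i=state.i + new_infectious - new_recovered,
        r=state.r + new_recovered,
    )


def simulate_seir(n=10_000, i0=10, beta=0.5, sigma=1 / 5, gamma=1 / 7,
                  days=200, dt=0.1):
    """Return (day, state) pairs, one per whole day; parameters are illustrative."""
    state = SEIRState(s=n - i0, e=0.0, i=float(i0), r=0.0)
    steps_per_day = round(1 / dt)
    history = [(0, state)]
    for day in range(1, days + 1):
        for _ in range(steps_per_day):
            state = seir_step(state, beta, sigma, gamma, dt)
        history.append((day, state))
    return history


if __name__ == "__main__":
    history = simulate_seir()
    peak_day, peak = max(history, key=lambda pair: pair[1].i)
    print(f"peak infectious ~{peak.i:.0f} on day {peak_day}")
```

In Stan, the same system would normally be written as an ODE right-hand-side function and solved with one of the built-in solvers (e.g., `ode_rk45`) rather than a hand-rolled Euler loop.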
Assignments
General Description: This course is structured around five
cumulative assignments that progressively build technical and
analytical skills in geospatial analysis, environmental modeling, and
Bayesian inference, with a focus on public health and
socio-environmental data. Each task introduces new tools and concepts
while reinforcing prior work, culminating in a final project involving
Bayesian disease modeling with real-world data.
- Assignment 1: QGIS Project Setup and Spatial Grids [markdown, pdf, in Spanish]
Students install QGIS, download administrative and infrastructure
data for a South American country, and create two spatial grids
(500 m and 250 m resolution). This forms the foundation for all
subsequent analyses.
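In QGIS, grids like these are usually produced with the Processing tool *Create grid*. The plain-Python sketch below only illustrates the arithmetic behind a square grid. It assumes the bounding box is already in a projected CRS measured in metres (e.g., a UTM zone), since a "500 m" cell has no fixed size in degrees. The bounding-box coordinates are hypothetical.

```python
import math


def square_grid(xmin, ymin, xmax, ymax, cell_size):
    """Return square cells as (x0, y0, x1, y1) tuples covering the bounding box.

    Coordinates are assumed to be in a projected CRS in metres; the last
    row/column may extend past xmax/ymax so that the whole box is covered.
    """
    n_cols = math.ceil((xmax - xmin) / cell_size)
    n_rows = math.ceil((ymax - ymin) / cell_size)
    cells = []
    for row in range(n_rows):
        y0 = ymin + row * cell_size
        for col in range(n_cols):
            x0 = xmin + col * cell_size
            cells.append((x0, y0, x0 + cell_size, y0 + cell_size))
    return cells


# Hypothetical 2 km x 1 km bounding box in a metric CRS.
bbox = (340_000, 6_290_000, 342_000, 6_291_000)
grid_500 = square_grid(*bbox, cell_size=500)
grid_250 = square_grid(*bbox, cell_size=250)
print(len(grid_500), len(grid_250))  # → 8 32
```

Halving the cell size from 500 m to 250 m multiplies the number of cells by four, which quickly adds up at country scale.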
- Assignment 2: Population, Airports, and Nightlights [markdown, pdf, in Spanish]
Using Python, students build choropleth maps of population, overlay
airport data, and correlate official population statistics with
satellite nightlight data to explore proxies for human activity.
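For Assignment 2, the correlation step might look like the sketch below. It assumes population counts and mean nightlight radiance have already been aggregated to the same spatial units (administrative areas or grid cells). Both variables tend to be heavily right-skewed, so the sketch correlates `log1p`-transformed values; that transformation is a design choice, not part of the assignment, and the five data points are invented.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Invented values for five spatial units: residents and mean radiance.
population = [120_000, 45_000, 8_000, 300, 950_000]
radiance = [35.2, 14.8, 3.1, 0.0, 61.7]

# log1p tolerates zero radiance in unlit units.
log_pop = [math.log1p(p) for p in population]
log_rad = [math.log1p(r) for r in radiance]
print(f"r (log-log) = {pearson(log_pop, log_rad):.3f}")
```

With the data in a pandas DataFrame, the same idea is closer to a one-liner such as `df["pop"].corr(df["radiance"], method="spearman")` (column names hypothetical); a rank correlation is another way to sidestep the skewness.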
- Assignment 3: Climate Data and Temporal Trends [markdown, pdf, in Spanish]
Students analyze multiyear climatic variables (temperature, rainfall,
humidity, wind) using netCDF data. They map spatial patterns, identify
climatic zones, and detect long-term trends and extremes through
temporal analysis.
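For Assignment 3, a long-term trend can be summarized as an ordinary least-squares slope of annual means against year, and extreme years flagged with a percentile rule. The sketch assumes the series has already been extracted from the netCDF file for one location (for instance via `xarray.open_dataset`). The temperatures, the 90th-percentile cutoff, and the nearest-rank percentile method are illustrative choices.

```python
import math


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def flag_extremes(years, values, quantile=0.9):
    """Return years whose value is at or above the nearest-rank quantile."""
    ranked = sorted(values)
    k = max(0, math.ceil(quantile * len(ranked)) - 1)
    threshold = ranked[k]
    return [y for y, v in zip(years, values) if v >= threshold]


# Invented annual mean temperatures (°C) for one grid cell.
years = list(range(2014, 2024))
temps = [14.1, 14.3, 14.0, 14.6, 14.4, 14.8, 14.7, 15.0, 14.9, 15.3]

trend_per_decade = 10 * ols_slope(years, temps)
print(f"trend: {trend_per_decade:+.2f} °C per decade")  # → trend: +1.25 °C per decade
print("extreme years:", flag_extremes(years, temps))    # → extreme years: [2021, 2023]
```

A plain least-squares slope ignores serial correlation between years; nonparametric alternatives such as the Mann-Kendall test with Sen's slope are common in climatology.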
- Assignment 4: Air Quality and Pollution Modeling [markdown, pdf, in Spanish]
This task focuses on analyzing air pollution data (PM2.5, PM10, NO₂, SO₂,
O₃). Students examine temporal dynamics, model pollution episodes,
and correlate pollution with demographic and meteorological factors,
culminating in predictive models.
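For Assignment 4, one simple way to operationalize "pollution episodes" is as runs of consecutive days on which the daily mean exceeds a threshold. The sketch assumes a gap-free daily PM2.5 series; the 50 µg/m³ threshold and the two-day minimum length are illustrative assumptions, to be replaced by whatever definition the assignment or the applicable air-quality standard specifies.

```python
def find_episodes(values, threshold, min_length=1):
    """Return inclusive (start, end) index pairs of runs where value > threshold.

    Only runs of at least `min_length` consecutive observations are kept.
    """
    episodes = []
    start = None
    for idx, value in enumerate(values):
        if value > threshold:
            if start is None:
                start = idx  # a new run begins
        elif start is not None:
            if idx - start >= min_length:
                episodes.append((start, idx - 1))
            start = None  # the run has ended
    # Close a run that lasts until the end of the series.
    if start is not None and len(values) - start >= min_length:
        episodes.append((start, len(values) - 1))
    return episodes


# Invented daily PM2.5 means (µg/m³) for two weeks.
pm25 = [22, 31, 54, 67, 58, 40, 35, 51, 29, 33, 61, 72, 80, 55]
print(find_episodes(pm25, threshold=50, min_length=2))  # → [(2, 4), (10, 13)]
```

With `min_length=1`, the isolated exceedance on day 7 would also count as an episode; choosing that minimum is part of defining what an "episode" means.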
- Assignment 5: Bayesian Modeling of Disease and Inequality [markdown, pdf, in Spanish]
Students integrate socioeconomic indicators and epidemiological data
(ideally dengue) to fit Bayesian regression models using Stan. The goal
is to identify key predictors of disease presence and quantify spatial
risk and uncertainty.
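For count outcomes such as dengue cases, a negative binomial likelihood with a log link is a common choice; in Stan it is often written with `neg_binomial_2_log`, whose mean is exp(eta) and whose variance is mu + mu²/phi. To build intuition before fitting, the Python sketch below evaluates the same NB2 log-probability outside Stan. The case counts, the covariate, the coefficients, and the dispersion are all hypothetical, and this is not a substitute for fitting the model in Stan.

```python
import math


def nb2_logpmf(y, mu, phi):
    """Log-probability of count y under NB2 with mean mu and dispersion phi.

    Same mean/variance parameterization as Stan's neg_binomial_2:
    E[y] = mu, Var[y] = mu + mu**2 / phi.
    """
    return (
        math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
        + phi * math.log(phi / (mu + phi))
        + y * math.log(mu / (mu + phi))
    )


def log_likelihood(cases, covariate, alpha, beta, phi):
    """Sum of NB2 log-probabilities with a log link: mu_i = exp(alpha + beta * x_i)."""
    total = 0.0
    for y, x in zip(cases, covariate):
        mu = math.exp(alpha + beta * x)
        total += nb2_logpmf(y, mu, phi)
    return total


# Hypothetical weekly case counts and a standardized covariate (e.g., temperature).
cases = [0, 2, 1, 7, 15, 4]
temp_z = [-1.2, -0.5, -0.8, 0.6, 1.4, 0.1]
print(log_likelihood(cases, temp_z, alpha=0.5, beta=1.0, phi=2.0))
```

In a Stan model block, the equivalent likelihood could be a single vectorized statement along the lines of `cases ~ neg_binomial_2_log(alpha + beta * temp_z, phi);`, with priors on `alpha`, `beta`, and `phi` chosen via prior predictive checks.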