IELE 2025 1

Prof: Leo Ferres

Overview

Curious how scientists predict outbreaks before they happen? Wonder what it takes to build models that inform real-world public health decisions? Want to use Bayesian statistics, geospatial tools, and real data to tackle an actual crisis? Like the idea of working in teams to solve a fast-moving, high-stakes epidemiological puzzle?

In this course, you'll take on the role of data scientists racing to model the spread of dengue in Chile, a disease on the move thanks to climate change, urbanization, and global travel. You'll collect messy, real-world data, learn modern tools like Python, Stan, and QGIS, and build predictive models that aim to forecast where the next outbreak will hit. This is not a toy problem. It's the kind of challenge public health agencies face every day. By the end, you'll not only understand the science of prediction, you'll have practiced it.

Teaching philosophy

This is not a typical course, and that's by design. Rather than traditional lectures, this is a project-driven, skill-based course designed to immerse you in the actual practice of scientific modeling and data analysis. You'll learn by doing, tackling real-world problems using tools and methods drawn from contemporary data science and epidemiology.

Every week (or every two weeks, depending on the complexity), you'll receive an assignment that introduces a specific skill or analytical technique. These are not isolated exercises: they build toward a larger goal, constructing a functioning predictive model of dengue spread in Chile. Think of each assignment as a piece of that puzzle.

The structure is intentionally inverted. Instead of lecturing on each topic in advance, I'll guide you to high-quality external tutorials, documentation, and video content (e.g., from YouTube or official project pages). You'll study the material independently before class. Then, during class, we'll use our time together to dive deeper, discuss the content, troubleshoot your work, and collaboratively address any roadblocks.

To make this work, your engagement is essential. You're expected to come to class having reviewed the assigned materials, having tried the assignment, and, critically, having real questions. The learning will happen through interaction, collaboration, and problem-solving. My role is to help you think, guide you through complexity, and learn alongside you as a team.

This course is designed to feel closer to a research lab or collaborative project environment than a lecture hall. If you're curious, self-directed, and ready to take on challenging, meaningful problems, you'll thrive here.

Topics

This is a data-driven course on modeling spatial, environmental, and epidemiological phenomena using Bayesian methods. Emphasis is on integrating diverse datasets and implementing models in Stan and Python. Topics include:

Tools for data science workflows
- Jupyter, VSCode, QGIS
- Python: pandas, geopandas, seaborn
- Obsidian, Zotero, Overleaf, GitHub
GIS and spatial data
- Administrative boundaries vs. grid systems
Population estimation
- Census vs. satellite-based proxies (e.g., nighttime lights)
Environmental monitoring
- Ground sensors vs. satellite data
Air quality
- Data fusion from sensors and satellites
Epidemiological and socio-economic datasets
Bayesian data analysis
- Introduction to Bayesian inference
- Prior specification and prior predictive checks
- Bayesian workflow in Stan
Regression modeling
- Linear regression in Stan
- Multilevel regression with post-stratification (MRP)
- Negative binomial regression for count data (e.g., dengue cases)
Compartmental disease models
- SIR and SEIR models in Stan
- Model building and validation workflows
Applications
- Mobility and transportation data
- Health, pollution, and socio-economic integration
- Modeling dengue dynamics in Chile
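
To give a sense of the compartmental models listed above, here is a minimal SIR sketch. The course fits these models in Stan; this version uses plain Python with forward Euler steps only to show the mechanics. The function name `simulate_sir` and all parameter values are illustrative assumptions, not estimates for dengue, and the model ignores the mosquito vector entirely.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Integrate the SIR equations with forward Euler steps.

    beta: transmission rate per day; gamma: recovery rate per day.
    Returns a list of (S, I, R) tuples, one per whole day (day 0 included).
    """
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    steps_per_day = round(1 / dt)
    trajectory = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt  # S -> I flow
            new_recoveries = gamma * i * dt         # I -> R flow
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        trajectory.append((s, i, r))
    return trajectory


# Illustrative values only (assumptions, not fitted to any data).
trajectory = simulate_sir(beta=0.4, gamma=0.1, s0=9990, i0=10, r0=0, days=160)

# Day on which the number of infectious people is largest.
peak_day = max(range(len(trajectory)), key=lambda d: trajectory[d][1])
print(f"Infections peak on day {peak_day}")
```

Because every person leaving one compartment enters another, the total S + I + R stays constant, which is a quick sanity check on any implementation.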

Assignments

General Description: This course is structured around five cumulative assignments that progressively build technical and analytical skills in geospatial analysis, environmental modeling, and Bayesian inference, with a focus on public health and socio-environmental data. Each task introduces new tools and concepts while reinforcing prior work, culminating in a final project involving Bayesian disease modeling with real-world data.

Assignment 1: QGIS Project Setup and Spatial Grids [markdown, pdf, in Spanish]
Students install QGIS, download administrative and infrastructure data for a South American country, and create two spatial grids (500 m and 250 m resolution). This forms the foundation for all subsequent analyses.
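
The assignment itself is done in QGIS. The plain-Python sketch below only illustrates what a regular grid is: square cells tiled over a bounding box. The function `build_grid` and the bounding box are hypothetical, and coordinates are assumed to be in a projected coordinate system measured in metres.

```python
import math


def build_grid(min_x, min_y, max_x, max_y, cell_size):
    """Return square cells as (x_min, y_min, x_max, y_max) tuples.

    The cells cover the bounding box row by row; cells on the upper and
    right edges may extend past the box if its size is not a multiple
    of cell_size.
    """
    n_cols = math.ceil((max_x - min_x) / cell_size)
    n_rows = math.ceil((max_y - min_y) / cell_size)
    cells = []
    for row in range(n_rows):
        for col in range(n_cols):
            x0 = min_x + col * cell_size
            y0 = min_y + row * cell_size
            cells.append((x0, y0, x0 + cell_size, y0 + cell_size))
    return cells


# Hypothetical 2 km x 1 km bounding box, in metres.
grid_500 = build_grid(0, 0, 2000, 1000, 500)
grid_250 = build_grid(0, 0, 2000, 1000, 250)
print(len(grid_500), len(grid_250))
```

Halving the cell size quadruples the number of cells, which matters for storage and compute once the grid covers a whole country.
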
Assignment 2: Population, Airports, and Nightlights [markdown, pdf, in Spanish]
Using Python, students build choropleth maps of population, overlay airport data, and correlate official population statistics with satellite nightlight data to explore proxies for human activity.
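
The correlation step can be sketched with the standard library alone. In practice it would likely be done with pandas, but the computation is the same. The numbers below are made-up toy values for five hypothetical districts, not real data, and the helper `pearson` is written out for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up toy values for five hypothetical districts (not real data).
population = [1200, 5400, 23000, 88000, 310000]
nightlight_radiance = [0.8, 2.1, 9.5, 30.2, 95.0]

# Correlation on the raw scale and on the log scale.
r_raw = pearson(population, nightlight_radiance)
r_log = pearson(
    [math.log(p) for p in population],
    [math.log(v) for v in nightlight_radiance],
)
print(f"raw: {r_raw:.3f}, log-log: {r_log:.3f}")
```

Both variables typically span several orders of magnitude, so comparing them on the log scale keeps one very large city from dominating the coefficient.
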
Assignment 3: Climate Data and Temporal Trends [markdown, pdf, in Spanish]
Students analyze multiyear climatic variables (temperature, rainfall, humidity, wind) using netCDF data. They map spatial patterns, identify climatic zones, and detect long-term trends and extremes through temporal analysis.
Assignment 4: Air Quality and Pollution Modeling [markdown, pdf, in Spanish]
This task focuses on analyzing air pollution data (PM2.5, PM10, NO₂, SO₂, O₃). Students examine temporal dynamics, model pollution episodes, and correlate pollution with demographic and meteorological factors, culminating in predictive models.
Assignment 5: Bayesian Modeling of Disease and Inequality [markdown, pdf, in Spanish]
Students integrate socioeconomic indicators and epidemiological data (ideally dengue) to fit Bayesian regression models using Stan. The goal is to identify key predictors of disease presence and quantify spatial risk and uncertainty.
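
For count outcomes such as case numbers, the topics list names negative binomial regression. The sketch below evaluates the likelihood such a model uses, in the mean and dispersion parameterization that Stan exposes as `neg_binomial_2`. It is plain Python rather than Stan, and the coefficients, covariate value, and helper `expected_cases` are hypothetical.

```python
import math


def neg_binomial_2_logpmf(y, mu, phi):
    """Log-probability of count y given mean mu and dispersion phi.

    The variance is mu + mu**2 / phi, so a smaller phi means more
    overdispersion relative to a Poisson with the same mean.
    """
    return (
        math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
        + y * math.log(mu / (mu + phi))
        + phi * math.log(phi / (mu + phi))
    )


def expected_cases(intercept, slope, covariate):
    """Log-link regression: expected count = exp(intercept + slope * covariate)."""
    return math.exp(intercept + slope * covariate)


# Hypothetical coefficients and covariate value, for illustration only.
mu = expected_cases(intercept=1.0, slope=0.5, covariate=2.0)
log_p = neg_binomial_2_logpmf(y=5, mu=mu, phi=3.0)
print(f"expected cases: {mu:.2f}, log-probability of 5 cases: {log_p:.3f}")
```

A Bayesian fit places priors on the intercept, slope, and dispersion, then sums this log-probability over all observations to form the log-likelihood.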