Citizen Science and STEM Education with R
Teaching and Research Companion Notebook — Open Urban Air Data from Madrid (2020–2024)
1 💡 Purpose of the Notebook
This Quarto Notebook serves as a teaching and research companion to the article
Citizen Science and STEM Education with R: Reproducible Learning from Open Urban Air Quality Data (Applied Sciences, 2025).
It reproduces the main analytical workflow implemented in the study and illustrates how R and Quarto can be integrated into STEM education to foster data literacy, environmental awareness, and methodological transparency.
2 🔁 Reproducible Data Workflow
The complete workflow integrates open data, computational reproducibility, and STEM learning.
It can be applied to other urban contexts or courses focusing on environmental informatics, statistical modelling, or sustainability transitions.
3 🗂️️ Data Sources and Structure
3.1 Air Quality Data
Air quality datasets were retrieved from the Madrid Open Data Portal (Portal de Datos Abiertos del Ayuntamiento de Madrid).
Measurements include nitrogen dioxide (NO₂), ozone (O₃), particulate matter (PM₁₀, PM₂.₅), sulphur dioxide (SO₂), and carbon monoxide (CO) recorded hourly across 24 urban stations (2020–2024).
3.2 Pollutant Coverage
Each station has different pollutant coverage and measurement frequency, which provides an excellent example for students to explore data completeness and measurement uncertainty in open environmental datasets.
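A completeness check is a natural first exercise here. The sketch below is illustrative only: the station codes, column names, and values are invented, and missing measurements are assumed to be coded as NA. It computes the percentage of non-missing records per station and pollutant.

```r
# Toy hourly records (hypothetical station codes and values).
# NA marks a missing measurement.
obs <- data.frame(
  station   = rep(c("S01", "S02"), each = 6),
  pollutant = rep(c("NO2", "O3"), times = 6),
  value     = c(40, 12, 41, 15, 38, NA,
                35, NA, NA, NA, 30, NA)
)

# Share of non-missing values per station and pollutant, in percent.
# na.action = na.pass keeps the NA rows so they count towards the total.
completeness <- aggregate(
  value ~ station + pollutant,
  data = obs,
  FUN = function(x) 100 * mean(!is.na(x)),
  na.action = na.pass
)
names(completeness)[3] <- "pct_complete"
print(completeness)
```

A pollutant that a station never reports shows up as 0 % rather than disappearing from the table, which makes uneven coverage easy to discuss in class.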
3.3 Data Processing Workflow
Data from both sources (the air quality records and the AEMET meteorological records) were processed in R in three main stages:
- Reading and cleaning monthly CSVs (removing redundant columns and correcting data types).
- Validating records with confirmed measurements (VAL flag).
- Pivoting and compressing results into parquet format for efficiency and consistency.
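The three stages can be sketched in base R as follows. This is a minimal illustration, not the notebook's actual pipeline. The file layout, the semicolon separator, the column names, and the use of "V" as the validated value of the VAL flag are all assumptions made for the example. Writing parquet relies on the arrow package and is skipped if it is not installed.

```r
# Stage 1: read one monthly CSV and fix data types.
# A temporary file with hypothetical columns stands in for a real monthly file.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "station;date;hour;pollutant;value;VAL;unused",
  "S01;2020-01-01;1;NO2;41;V;x",
  "S01;2020-01-01;1;O3;12;V;x",
  "S01;2020-01-01;2;NO2;38;N;x",
  "S01;2020-01-01;2;O3;15;V;x"
), csv_path)

raw <- read.csv(csv_path, sep = ";", stringsAsFactors = FALSE)
raw$unused <- NULL                 # drop a redundant column
raw$date   <- as.Date(raw$date)    # correct the data type
raw$value  <- as.numeric(raw$value)

# Stage 2: keep only records flagged as validated (assumed flag value "V").
valid <- raw[raw$VAL == "V", c("station", "date", "hour", "pollutant", "value")]

# Stage 3: pivot to one column per pollutant.
wide <- reshape(valid, idvar = c("station", "date", "hour"),
                timevar = "pollutant", direction = "wide")
names(wide) <- sub("^value\\.", "", names(wide))
rownames(wide) <- NULL
attr(wide, "reshapeWide") <- NULL  # drop reshape() bookkeeping

# Compress to parquet if the arrow package is available.
if (requireNamespace("arrow", quietly = TRUE)) {
  arrow::write_parquet(wide, tempfile(fileext = ".parquet"))
}
print(wide)
```

The unvalidated NO2 reading at hour 2 becomes NA in the wide table, so students can see directly how validation affects completeness.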
4 📊 Exploratory Analysis
Exploratory analysis introduces students to descriptive statistics and visual analytics in R using open environmental data.
The focus is on NO₂ (a primary traffic-related pollutant) and O₃ (a secondary pollutant formed photochemically), both key indicators of urban air quality dynamics.
4.1 Annual and Seasonal Variability
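One simple way to examine annual and seasonal variability is to aggregate daily values by year and by calendar month. The sketch below uses a synthetic NO2 series with an invented winter peak; the numbers are not measurements from Madrid, and the column names are chosen for the example.

```r
set.seed(1)

# Synthetic daily NO2 series (ug/m3), purely illustrative: winter peak plus noise.
dates <- seq(as.Date("2020-01-01"), as.Date("2021-12-31"), by = "day")
doy   <- as.numeric(format(dates, "%j"))
no2   <- 35 + 10 * cos(2 * pi * doy / 365.25) + rnorm(length(dates), sd = 4)
daily <- data.frame(date = dates, no2 = no2)

daily$year  <- as.integer(format(daily$date, "%Y"))
daily$month <- as.integer(format(daily$date, "%m"))

# Annual means: one value per year.
annual <- aggregate(no2 ~ year, data = daily, FUN = mean)

# Seasonal profile: mean for each calendar month, pooled across years.
monthly <- aggregate(no2 ~ month, data = daily, FUN = mean)

print(annual)
print(monthly)
```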
4.2 Distribution Analysis
Boxplots give a compact visual basis for discussing dispersion, central tendency, and outliers across pollutants.
In this context, students learn how descriptive statistics translate into environmental interpretation, reinforcing quantitative reasoning with real data.
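To connect the plot with the underlying statistics, the following sketch extracts the numbers behind each box before drawing it. The concentrations are synthetic log-normal draws chosen for illustration, not observed data.

```r
set.seed(2)

# Synthetic hourly concentrations (ug/m3); values are illustrative only.
conc <- data.frame(
  pollutant = rep(c("NO2", "O3"), each = 200),
  value     = c(rlnorm(200, meanlog = 3.4, sdlog = 0.5),
                rlnorm(200, meanlog = 3.9, sdlog = 0.4))
)

# Statistics behind each box: whisker ends, hinges, and median.
bp <- boxplot(value ~ pollutant, data = conc, plot = FALSE)
stats <- bp$stats
colnames(stats) <- bp$names
rownames(stats) <- c("lower_whisker", "lower_hinge", "median",
                     "upper_hinge", "upper_whisker")
print(round(stats, 1))

# Number of points drawn as outliers (beyond 1.5 * IQR from the hinges).
print(table(bp$names[bp$group]))

# Draw the plot itself.
boxplot(value ~ pollutant, data = conc, ylab = "Concentration (ug/m3)")
```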
🔮 Forecasting with Prophet
Time-series forecasting introduces students to predictive modelling using open environmental data.
The Prophet model (Taylor & Letham, 2018) was selected for its interpretability, decomposition structure, and robustness to missing values — key features for teaching reproducible forecasting in R.
4.3 Model for NO₂
Students can visualise how additive components — trend, seasonality, and residuals — reveal the influence of human activity and meteorological cycles on pollutant evolution. This exercise supports reproducible experimentation with forecasting horizons, cross-validation, and performance metrics such as RMSE or MAE.
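A hold-out experiment of this kind might look like the sketch below. It is not the model configuration used in the study: the series is synthetic, the 90-day horizon is arbitrary, and the helper functions rmse and mae are defined here for the example. The Prophet step runs only if the prophet package is installed.

```r
set.seed(3)

# Synthetic daily NO2 series in Prophet's expected layout: ds (date) and y (value).
ds  <- seq(as.Date("2020-01-01"), as.Date("2023-12-31"), by = "day")
doy <- as.numeric(format(ds, "%j"))
y   <- 35 + 10 * cos(2 * pi * doy / 365.25) +
  ifelse(format(ds, "%u") %in% c("6", "7"), -5, 0) +  # weekend dip
  rnorm(length(ds), sd = 3)
series <- data.frame(ds = ds, y = y)

# Hold out the last 90 days; moving this cut-off changes the training period.
horizon <- 90
train <- head(series, -horizon)
test  <- tail(series, horizon)

# Error metrics for comparing observed and predicted values.
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))
mae  <- function(obs, pred) mean(abs(obs - pred))

if (requireNamespace("prophet", quietly = TRUE)) {
  m <- prophet::prophet(train, yearly.seasonality = TRUE,
                        weekly.seasonality = TRUE, daily.seasonality = FALSE)
  future   <- prophet::make_future_dataframe(m, periods = horizon)
  forecast <- predict(m, future)
  pred     <- tail(forecast$yhat, horizon)
  cat("RMSE:", rmse(test$y, pred), " MAE:", mae(test$y, pred), "\n")
  # prophet::prophet_plot_components(m, forecast) draws the trend and
  # seasonal components discussed above.
}
```

For rolling-origin evaluation, the prophet package also provides cross_validation() and performance_metrics(), which students can compare against this single hold-out split.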
5 🌦 Meteorological Integration
Meteorological factors shape pollutant behaviour and are fundamental in understanding atmospheric processes.
By integrating temperature, solar radiation, and wind speed data from AEMET, learners can explore multivariate relationships within an urban ecosystem.
5.1 Integration Workflow
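A minimal version of the integration step is a join on date followed by a correlation summary. In the sketch below both tables are synthetic, and the column names (temperature, radiation, wind_speed, o3) are assumptions for the example rather than the field names of the AEMET files.

```r
set.seed(4)
dates <- seq(as.Date("2022-06-01"), by = "day", length.out = 60)

# Synthetic daily meteorology (hypothetical column names and units).
meteo <- data.frame(
  date        = dates,
  temperature = 24 + rnorm(60, sd = 4),      # degrees Celsius
  radiation   = 250 + rnorm(60, sd = 40),    # W/m2
  wind_speed  = abs(3 + rnorm(60, sd = 1))   # m/s
)

# Synthetic O3 that rises with temperature and radiation.
air <- data.frame(
  date = dates,
  o3   = 40 + 1.5 * (meteo$temperature - 24) +
    0.1 * (meteo$radiation - 250) + rnorm(60, sd = 3)
)
air <- air[-c(5, 17, 42), ]  # drop a few days to mimic gaps

# Inner join on date keeps only days present in both sources.
merged <- merge(air, meteo, by = "date")

# Pearson correlation between O3 and each meteorological variable.
cors <- cor(merged$o3, merged[, c("temperature", "radiation", "wind_speed")])
print(round(cors, 2))
```

Because merge() performs an inner join by default, days missing from either source are dropped. Comparing nrow(merged) with nrow(meteo) shows how many were lost.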
6 🎓 Learning and Reproducibility Framework
Reproducibility is both a scientific and pedagogical value.
This framework unifies open data, transparent computation, and educational innovation, reinforcing the culture of open science.
7 🧑‍🏫 Educational Applications
This Notebook can be directly incorporated into undergraduate or postgraduate STEM courses focused on data analysis, environmental informatics, or sustainability.
Suggested learning activities:
1. Reproduce pollutant forecasts with modified training periods.
2. Explore correlations between additional meteorological variables.
3. Design inquiry-based projects connecting data to local environmental policies.
4. Document and publish reproducible reports using Quarto and GitHub.
Through these exercises, students not only practise coding but also embrace scientific integrity and civic engagement through data.
8 🌐 Repository and Citation
All code, figures, and harmonised datasets are openly available at:
https://github.com/jcaceres-academico/OpenUrbanAirandMeteorological
When citing this educational resource, please use:
Cáceres-Tello, J.; Galán-Hernández, J.J. (2025). Citizen Science and STEM Education with R: Reproducible Learning from Open Urban Air Quality Data. Applied Sciences, 15(x), xxxx.
DOI: [placeholder]
Citing the article ensures traceability and recognition for open-source academic contributions.