Thursday, October 16, 2025

Disaggregating Aggregated [Socioeconomic] Data into Grid-Level Representations Using PyInterpolate and Complementary Approaches for Humanitarian Applications

Suggested By: Khizer Zakir, Lorenz Wendt
 

Objective:
To investigate how geostatistical interpolation methods, as implemented in PyInterpolate, can be combined with complementary disaggregation approaches (dasymetric mapping, population weighting, machine learning-based downscaling) to transform aggregated socioeconomic data, such as demographics, education, health, or disease prevalence, into fine-grained, grid-level representations. These representations may take the form of regular pixels or hexagonal H3 cells, depending on which resolution and structure best suit the downstream machine learning models. The thesis will quantitatively evaluate how such disaggregation improves the applicability and performance of ML models in humanitarian scenarios. The work could also address the uncertainty and explainability of these approaches.
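To make "population weighting" and "machine learning-based downscaling" concrete, here is a minimal sketch of covariate-driven dasymetric disaggregation. It assumes each grid cell is already assigned to one administrative unit and carries one or more ancillary covariates. A log-density model is fitted at the administrative level and its predictions are used as within-unit weights, and each unit's total is then redistributed so that the cells sum back to the reported figure. Stevens et al. (2015) used a random forest for the fitting step; this sketch swaps in ordinary least squares to stay dependency-free. The function names and all numbers are hypothetical, and none of this is PyInterpolate's API.

```python
import numpy as np


def dasymetric_disaggregate(admin_totals, cell_admin, cell_weights):
    """Split each admin unit's total across its grid cells in proportion to cell_weights."""
    totals = np.asarray(admin_totals, dtype=float)
    cell_admin = np.asarray(cell_admin)
    w = np.asarray(cell_weights, dtype=float)
    # Total weight inside each admin unit
    weight_sums = np.bincount(cell_admin, weights=w, minlength=totals.size)
    if np.any(weight_sums[cell_admin] <= 0):
        raise ValueError("each admin unit needs at least one cell with positive weight")
    # Each cell's share of its parent unit's total; unit sums are preserved
    return totals[cell_admin] * w / weight_sums[cell_admin]


def covariate_weights(cell_admin, cell_covariates, admin_totals, admin_areas):
    """Fit log(density) ~ covariates at admin level, then predict a relative weight per cell.

    Ordinary least squares stands in for the random forest of Stevens et al. (2015).
    """
    cell_admin = np.asarray(cell_admin)
    X_cells = np.asarray(cell_covariates, dtype=float)
    n_admin = len(admin_totals)
    counts = np.bincount(cell_admin, minlength=n_admin)
    # Mean of each covariate over the cells of each admin unit
    X_admin = np.column_stack([
        np.bincount(cell_admin, weights=X_cells[:, j], minlength=n_admin) / counts
        for j in range(X_cells.shape[1])
    ])
    # Response: log density of the aggregated variable (totals must be positive)
    y = np.log(np.asarray(admin_totals, dtype=float) / np.asarray(admin_areas, dtype=float))
    A = np.column_stack([np.ones(n_admin), X_admin])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    # Back-transform the per-cell prediction into a positive weight
    return np.exp(np.column_stack([np.ones(len(X_cells)), X_cells]) @ coef)


# Usage with made-up numbers: two districts, five cells, one covariate (e.g. night-time lights)
totals = [1000.0, 500.0]
cell_admin = [0, 0, 0, 1, 1]
lights = [[1.0], [2.0], [3.0], [0.5], [1.5]]
areas = [3.0, 2.0]  # district areas expressed in cell units
weights = covariate_weights(cell_admin, lights, totals, areas)
cell_estimates = dasymetric_disaggregate(totals, cell_admin, weights)
```

Plain population weighting is the special case where `cell_weights` is an existing population layer rather than a model prediction. Keeping the redistribution step separate from the weighting model makes it easy to swap in other weighting models, such as a random forest or a gradient-boosted model.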
 

Short Description:
Many critical datasets relevant to humanitarian decision-making, such as population demographics, education indicators, healthcare access, or disease spread, are typically available only at coarse administrative levels (country, province, district). However, state-of-the-art machine learning models for spatial analysis generally operate on high-resolution gridded data, especially when they integrate environmental or remote sensing datasets. This mismatch in spatial resolution is a barrier to building comprehensive, data-driven humanitarian models.

This thesis proposes to bridge this gap by studying the use of PyInterpolate and related interpolation and disaggregation methods to generate grid-level approximations of aggregated socioeconomic data. Both pixel grids and H3 hexagonal grids will be evaluated for how well they support the integration of heterogeneous datasets. The study will further assess the uncertainty of the disaggregation outputs and its downstream impact on ML-based predictions.
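One simple way to measure how disaggregation uncertainty reaches a downstream model is Monte Carlo propagation: draw many plausible grid-level realizations, run each one through the model, and look at the spread of the predictions. The sketch below assumes the uncertainty lives in the ancillary weights and models it as multiplicative log-normal noise with a hypothetical log-scale standard deviation `sigma`. Every draw is renormalized so that the administrative totals still hold. The `downstream` argument stands in for any ML model that maps cell values to predictions. This is a plain simulation, not the Bayesian treatment proposed in the methodology, and the function name is hypothetical.

```python
import numpy as np


def mc_propagate(admin_totals, cell_admin, cell_weights, downstream,
                 n_draws=500, sigma=0.3, seed=0):
    """Return per-cell mean and std of downstream predictions over perturbed disaggregations.

    sigma: assumed log-scale std of the ancillary weights (hypothetical noise model).
    downstream: any function mapping a vector of cell values to a vector of predictions.
    """
    rng = np.random.default_rng(seed)
    totals = np.asarray(admin_totals, dtype=float)
    cell_admin = np.asarray(cell_admin)
    w = np.asarray(cell_weights, dtype=float)
    preds = []
    for _ in range(n_draws):
        # Multiplicative log-normal noise keeps the weights positive
        w_draw = w * rng.lognormal(mean=0.0, sigma=sigma, size=w.size)
        # Mass-preserving redistribution of each unit's total for this draw
        sums = np.bincount(cell_admin, weights=w_draw, minlength=totals.size)
        cells = totals[cell_admin] * w_draw / sums[cell_admin]
        preds.append(downstream(cells))
    preds = np.array(preds)
    return preds.mean(axis=0), preds.std(axis=0)


# Usage: a hypothetical downstream "model" that converts population into expected cases
expected_cases = lambda pop: 0.01 * pop
mean, sd = mc_propagate([1000.0, 500.0], [0, 0, 0, 1, 1],
                        [1.0, 2.0, 3.0, 1.0, 1.0], expected_cases)
```

Cells with a large `sd` relative to `mean` are the places where downstream predictions rely most on the disaggregation assumptions, which is the kind of signal the uncertainty assessment is meant to surface.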

Such an approach can be particularly important in humanitarian applications, where access to high-resolution socioeconomic data is scarce or delayed. Potential applications include:

  • Disease spread modeling, where fine-scale integration of demographic and health data can improve outbreak prediction.
  • Migration and human mobility studies, where disaggregated socioeconomic data can be compared and integrated with environmental drivers (floods, droughts, land degradation) at grid level to better understand displacement dynamics and population movements during crises.
  • Disaster preparedness and response, where combining socioeconomic vulnerability layers with hazard data enables better risk assessments.
  • Resource allocation and crisis monitoring, where timely, high-resolution information supports more equitable and effective humanitarian interventions.

 

Suggested Methodology:

  • Geostatistical interpolation with PyInterpolate (kriging-based techniques).
  • Covariate-driven disaggregation, using auxiliary layers such as land use, night-time lights, road networks, or population density as predictors of within-unit variation.
  • Uncertainty quantification, with a particular focus on Bayesian approaches (Bayesian kriging, Bayesian hierarchical models, or probabilistic ML) to explicitly model uncertainty in disaggregation outputs and evaluate their downstream impact on ML-based predictions.
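The kriging step in the first bullet can be illustrated with ordinary kriging. This method predicts each grid cell as a weighted sum of observations and also returns a kriging variance, which is a natural input to the uncertainty analysis. The sketch below is a simplified, centroid-based version: it treats each district as a point at its centroid and uses an exponential semivariogram with fixed, hypothetical parameters (`sill`, `corr_range`) rather than fitted ones. It is not PyInterpolate's API. PyInterpolate provides its own kriging routines, including Poisson kriging for areal data, and these account for the size and shape of the aggregation units, which this sketch ignores.

```python
import numpy as np


def exp_semivariogram(h, sill=1.0, corr_range=1.0):
    # Exponential model without nugget: gamma(0) = 0, rising towards the sill with distance
    return sill * (1.0 - np.exp(-h / corr_range))


def ordinary_kriging(xy, z, targets, sill=1.0, corr_range=1.0):
    """Ordinary kriging predictions and kriging variances at the target locations."""
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    targets = np.asarray(targets, dtype=float)
    n = len(z)
    # Kriging matrix [[Gamma, 1], [1^T, 0]]; the extra row/column enforces sum(weights) = 1
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = exp_semivariogram(d, sill, corr_range)
    A[n, n] = 0.0
    preds = np.empty(len(targets))
    variances = np.empty(len(targets))
    for k, t in enumerate(targets):
        g0 = exp_semivariogram(np.linalg.norm(xy - t, axis=1), sill, corr_range)
        sol = np.linalg.solve(A, np.append(g0, 1.0))
        lam, mu = sol[:n], sol[n]  # kriging weights and Lagrange multiplier
        preds[k] = lam @ z
        variances[k] = lam @ g0 + mu  # ordinary kriging variance
    return preds, variances


# Usage: hypothetical district centroids (km) and rates, predicted on a 0.5 km grid of cell centres
centroids = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
rates = [10.0, 12.0, 9.0, 15.0]
cx, cy = np.meshgrid(np.arange(0.25, 1.0, 0.5), np.arange(0.25, 1.0, 0.5))
grid = np.column_stack([cx.ravel(), cy.ravel()])
pred, kvar = ordinary_kriging(centroids, rates, grid, sill=4.0, corr_range=0.8)
```

Because there is no nugget, the predictor reproduces the observed value exactly at each centroid, and the variance there is zero. In practice, the semivariogram would be fitted to the data, and for areal data it would be adjusted for the support of the units. Replacing the fixed parameters with such a fitted model is where the geostatistical part of the thesis would come in.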

Start: Anytime

Relevant Studies:

  1. Moliński, S. (2022). Pyinterpolate: Spatial interpolation in Python for point measurements and aggregated datasets. Journal of Open Source Software, 7(70), 2869. https://doi.org/10.21105/joss.02869
  2. Stevens, F. R., Gaughan, A. E., Linard, C., & Tatem, A. J. (2015). Disaggregating census data for population mapping using random forests with remotely-sensed and ancillary data. PLoS ONE, 10(2), e0107042. https://doi.org/10.1371/journal.pone.0107042
  3. Wardrop, N. A., et al. (2018). Spatially disaggregated population estimates in the absence of national population and housing census data. Proceedings of the National Academy of Sciences, 115(14), 3529–3537. https://doi.org/10.1073/pnas.1715305115

