ProDiG: Progressive Diffusion-Guided Gaussian Splatting for Aerial to Ground Reconstruction

CVPR Findings 2026

Sirshapan Mitra¹, Yogesh S. Rawat¹

¹Center for Research in Computer Vision, University of Central Florida

Abstract

Figure: Overview of ProDiG. Site reconstruction using only aerial images.

ProDiG addresses aerial-to-ground reconstruction when only aerial imagery is available. Rather than jumping directly from aerial views to ground-level rendering, it progressively synthesizes views at intermediate altitudes and refines the Gaussian scene representation at each stage. The method combines a geometry-aware causal attention module for diffusion-based refinement with a distance-adaptive Gaussian module that adjusts scale and opacity based on camera distance. Together, these components enable more stable, coherent, and realistic ground-level reconstructions under extreme viewpoint gaps.

Method Overview

Figure placeholder: overview of ProDiG (use the overview figure from page 2 of the paper).
Key components: Progressive Altitude Refinement, Causal Attention Mixing, Epipolar Geometry Conditioning, Distance-Adaptive Gaussian Module

ProDiG progressively lowers the viewpoint altitude, renders noisy novel views, refines them with aeroFix, and adds the fixed views back into training. The diffusion module uses pose-aware conditioning, Plücker ray embeddings, and epipolar-constrained causal attention to preserve structural consistency across large viewpoint changes.
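
To make the Plücker ray embedding mentioned above concrete, here is a minimal sketch of one common way to compute it for a pinhole camera. The function name `plucker_embedding`, the inputs `K` and `cam_to_world`, and the channel ordering (direction, then moment) are assumptions for illustration; the page does not specify how ProDiG computes or orders these channels.

```python
import numpy as np

def plucker_embedding(K, cam_to_world, height, width):
    """Per-pixel 6-D Plucker ray embedding (d, o x d) for a pinhole camera.

    K: (3, 3) intrinsics, cam_to_world: (4, 4) pose. Both are assumed inputs;
    the (direction, moment) ordering is a convention chosen for this sketch.
    """
    # Pixel centers in homogeneous image coordinates, shape (H, W, 3).
    u, v = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)

    # Back-project to camera-frame directions, then rotate into the world frame.
    dirs_cam = pix @ np.linalg.inv(K).T
    R, origin = cam_to_world[:3, :3], cam_to_world[:3, 3]
    dirs = dirs_cam @ R.T
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)  # unit directions

    # The moment o x d does not depend on which point of the ray is used.
    moment = np.cross(np.broadcast_to(origin, dirs.shape), dirs)
    return np.concatenate([dirs, moment], axis=-1)  # (H, W, 6)
```

Each pixel then carries a pose-dependent 6-vector that can be concatenated to image features as conditioning. By construction the direction and moment are orthogonal, which gives a quick sanity check on any implementation.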

aeroFix: Geometry-Aware Diffusion Refinement

Figure placeholder: aeroFix block diagram (use the aeroFix diagram from page 4 of the paper).

aeroFix refines noisy novel renders using a reference view while explicitly constraining cross-view attention. The model masks novel-query/reference-key interactions using epipolar lines, blocks reference-query/novel-key attention to preserve causality, and injects pose difference information into diffusion conditioning. Multi-scale Sobel weighting and DSSIM further preserve edges and perceptual consistency.
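
The masking rules described above can be sketched as a boolean attention mask. This is an illustrative sketch, not the paper's implementation: it assumes tokens are ordered as reference tokens followed by novel tokens, that each token has a 2-D pixel location, that attention within a view is unrestricted, and that a novel query may attend to a reference key only when the key lies within `thresh` pixels of the query's epipolar line. The helpers `skew`, `fundamental`, and `attention_mask` are hypothetical names.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x."""
    return np.array([[0.0, -t[2], t[1]], [t[2], 0.0, -t[0]], [-t[1], t[0], 0.0]])

def fundamental(K, R, t):
    """F with p_ref^T F p_novel = 0, assuming X_ref = R @ X_novel + t."""
    K_inv = np.linalg.inv(K)
    return K_inv.T @ skew(t) @ R @ K_inv

def homogeneous(px):
    return np.concatenate([px, np.ones((len(px), 1))], axis=1)

def attention_mask(F, ref_px, novel_px, thresh=1.0):
    """Boolean mask (True = may attend) over tokens ordered [reference, novel]."""
    n_ref, n_nov = len(ref_px), len(novel_px)

    # Epipolar line in the reference image for every novel-view pixel.
    lines = homogeneous(novel_px) @ F.T                      # (n_nov, 3)
    # Point-to-line distance of every reference pixel to every line.
    num = np.abs(lines @ homogeneous(ref_px).T)              # (n_nov, n_ref)
    den = np.linalg.norm(lines[:, :2], axis=1, keepdims=True)
    dist = num / np.maximum(den, 1e-12)

    mask = np.zeros((n_ref + n_nov, n_ref + n_nov), dtype=bool)
    mask[:n_ref, :n_ref] = True            # reference -> reference (assumed open)
    mask[n_ref:, n_ref:] = True            # novel -> novel (assumed open)
    mask[n_ref:, :n_ref] = dist <= thresh  # novel query -> reference key, epipolar band
    # mask[:n_ref, n_ref:] stays False: reference queries never see novel keys.
    return mask
```

In an attention layer, entries where the mask is False would typically be set to a large negative value before the softmax. The one-directional block is what keeps the clean reference view from being influenced by the noisy novel render.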

Qualitative Results

Figure placeholder: qualitative aeroFix results (use the refinement comparison from page 6).

Compared with Difix3D+, aeroFix better preserves structure under large viewpoint differences and reduces reference-copying artifacts.

Aerial-to-Ground Reconstruction

Figure placeholder: qualitative ProDiG reconstruction results (use the main qualitative comparison from page 6).

ProDiG produces more coherent geometry and more realistic ground-level renderings than 3DGS and Difix3D+ in challenging aerial-to-ground settings.

aeroFix Comparison

Method                    DreamSim ↓  PSNR ↑  SSIM ↑  LPIPS ↓
Difix3D+                  0.15        20.47   0.54    0.42
Difix (LoRA)              0.07        21.45   0.59    0.30
Pose + Plücker            0.06        22.30   0.64    0.27
Pose + Plücker + Causal   0.03        23.35   0.68    0.24
aeroFix (Ours)            0.03        23.68   0.69    0.24

WRIVA Results

Site  DreamSim ↓  PSNR ↑  SSIM ↑  LPIPS ↓
S06   0.50        11.26   0.33    0.67
S01   0.29        13.10   0.45    0.58

On WRIVA, ProDiG improves both structural and perceptual quality over strong Gaussian Splatting and diffusion-guided baselines.

Matrix City

Method       DreamSim ↓  PSNR ↑  SSIM ↑  LPIPS ↓
3DGS         0.51        10.71   0.40    0.77
2DGS         0.62        9.29    0.28    0.81
3DGS-MCMC    0.49        10.84   0.41    0.77
Scaffold-GS  0.54        10.19   0.35    0.75
Difix3D+     0.48        11.38   0.38    0.63
Ours         0.39        12.39   0.41    0.50

Generalization and Ablations

Figure placeholder: generalization and ablation figures (use page 8 figures for varying-altitude generalization and ablations).

The paper reports that the "Original Noisy Closer" progressive strategy is the most stable overall, and that the distance-adaptive Gaussian module improves PSNR and SSIM, especially when camera distances vary widely.

Key Contributions

1. Causal Attention Mixing

Epipolar-constrained attention makes diffusion refinement more geometrically grounded under large viewpoint changes.

2. Distance-Adaptive Gaussian Module

Gaussian scale and opacity are modulated using per-Gaussian features and camera distance for stable refinement across altitudes.
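
As a rough illustration of this kind of modulation, the sketch below uses a tiny two-layer network that maps per-Gaussian features and log camera distance to a multiplicative scale gain and an additive opacity-logit offset. Everything here is an assumption for illustration: the function name `distance_adaptive`, the network shape, the `log1p` distance encoding, and the gain/offset parameterization are not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def distance_adaptive(scales, opacities, feats, centers, cam_pos, params):
    """Modulate Gaussian scale and opacity from features and camera distance.

    scales: (N, 3), opacities: (N, 1) in (0, 1), feats: (N, F),
    centers: (N, 3), cam_pos: (3,), params: (W1, b1, W2, b2) of a small MLP.
    """
    W1, b1, W2, b2 = params

    # Distance from the camera to each Gaussian center, log-compressed.
    dist = np.linalg.norm(centers - cam_pos, axis=1, keepdims=True)
    x = np.concatenate([feats, np.log1p(dist)], axis=1)      # (N, F + 1)

    hidden = np.tanh(x @ W1 + b1)
    delta = hidden @ W2 + b2                                 # (N, 2)

    # Scale: positive multiplicative gain, so scales stay positive.
    new_scales = scales * np.exp(delta[:, :1])

    # Opacity: offset applied in logit space, so the result stays in (0, 1).
    o = np.clip(opacities, 1e-6, 1.0 - 1e-6)
    logit = np.log(o) - np.log1p(-o)
    new_opacities = sigmoid(logit + delta[:, 1:])
    return new_scales, new_opacities
```

With the last layer initialized to zero, the module leaves the Gaussians unchanged, which is one simple way to let a distance-dependent correction be learned on top of an already trained scene.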

3. Progressive Altitude Refinement

Intermediate-altitude synthesis gradually bridges the aerial-to-ground distribution gap instead of making a single large leap.
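
The progressive loop can be sketched as follows. This is a schematic under stated assumptions: `fit`, `render`, and `refine` are hypothetical callables standing in for Gaussian Splatting training, novel-view rendering at a given altitude, and aeroFix refinement, and the geometric spacing of altitudes is a choice made for this sketch, not something the page specifies.

```python
import numpy as np

def altitude_schedule(aerial_alt, ground_alt, num_stages):
    """Geometrically spaced altitudes, from below aerial_alt down to ground_alt.

    Both altitudes must be positive; the spacing rule is an assumption.
    """
    return np.geomspace(aerial_alt, ground_alt, num_stages + 1)[1:]

def progressive_refinement(train_views, aerial_alt, ground_alt, num_stages,
                           fit, render, refine):
    """Lower the viewpoint stage by stage, feeding refined views back in."""
    views = list(train_views)
    model = fit(views)                      # initial scene from aerial views only
    for alt in altitude_schedule(aerial_alt, ground_alt, num_stages):
        noisy = render(model, alt)          # noisy novel views at a lower altitude
        fixed = [refine(img) for img in noisy]
        views.extend(fixed)                 # add the fixed views to the training set
        model = fit(views)                  # refit the scene with the enlarged set
    return model, views
```

Because each stage only moves a short way from views the model has already seen, the renders handed to the refiner are less degraded than they would be after a single jump from aerial to ground level.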

BibTeX

@inproceedings{mitra2026prodig,
  title={ProDiG: Progressive Diffusion-Guided Gaussian Splatting for Aerial to Ground Reconstruction},
  author={Mitra, Sirshapan and Rawat, Yogesh S},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={22--32},
  year={2026}
}