ProDiG: Progressive Diffusion-Guided Gaussian Splatting for Aerial to Ground Reconstruction
Center for Research in Computer Vision, University of Central Florida
Abstract
ProDiG addresses aerial-to-ground reconstruction when only aerial imagery is available. Rather than directly jumping from aerial views to ground-level rendering, it progressively synthesizes intermediate altitudes and refines the Gaussian scene representation at each stage. The method combines a geometry-aware causal attention module for diffusion-based refinement with a distance-adaptive Gaussian module that adjusts scale and opacity based on camera distance. This enables more stable, coherent, and realistic ground-level reconstructions under extreme viewpoint gaps.
Method Overview
ProDiG progressively lowers the viewpoint altitude. At each stage it renders noisy novel views, refines them with aeroFix, and adds the refined views back into the training set. The diffusion module uses pose-aware conditioning, Plücker ray embeddings, and epipolar-constrained causal attention to preserve structural consistency across large viewpoint changes.
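Plücker ray embeddings encode each pixel's viewing ray as a 6-D vector (d, o × d): the unit direction d and the moment of the ray about the origin. The NumPy sketch below computes this standard parameterization for a pinhole camera. The function name, the camera conventions (intrinsics `K`, camera-to-world rotation `R_c2w`, world-space `cam_center`), and the half-pixel offset are assumptions; how aeroFix normalizes these embeddings or feeds them to the diffusion model is not specified here.

```python
import numpy as np


def plucker_embedding(K, R_c2w, cam_center, height, width):
    """Per-pixel Plücker coordinates (d, o x d) as an (H, W, 6) array.

    Assumes a pinhole camera: K is the 3x3 intrinsic matrix, R_c2w rotates
    camera-space directions into world space, cam_center is the (3,) world position.
    """
    # Pixel centers (u, v) in homogeneous image coordinates.
    v, u = np.meshgrid(np.arange(height) + 0.5, np.arange(width) + 0.5, indexing="ij")
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)  # (H, W, 3)
    # Back-project to camera-space ray directions, rotate to world space, normalize.
    dirs = pix @ np.linalg.inv(K).T @ R_c2w.T
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)
    # Moment o x d; it is identical for every point o chosen on the ray.
    origin = np.broadcast_to(cam_center, dirs.shape)
    moment = np.cross(origin, dirs)
    return np.concatenate([dirs, moment], axis=-1)
```

Because the moment does not depend on which point of the ray is used, the embedding identifies the ray itself, which makes it a convenient dense pose signal for comparing a reference view with a much lower novel view.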
aeroFix: Geometry-Aware Diffusion Refinement
aeroFix refines noisy novel renders using a reference view while explicitly constraining cross-view attention. Novel-view queries may attend to reference-view keys only along the corresponding epipolar lines, reference-view queries are blocked from attending to novel-view keys to preserve causality, and the pose difference between the two views is injected into the diffusion conditioning. Multi-scale Sobel edge weighting and a DSSIM term further preserve edges and perceptual consistency.
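One way to read "multi-scale Sobel weighting and DSSIM" is as a reconstruction loss whose per-pixel L1 term is up-weighted on edges detected at several resolutions, plus a DSSIM = (1 − SSIM)/2 term. The NumPy sketch below assumes exactly that. The three pyramid levels, the edge gain `alpha`, the uniform 7×7 SSIM window, the 0.2 DSSIM weight (borrowed from common 3DGS practice), and all function names are hypothetical choices, not values from the paper. It operates on single-channel images in [0, 1].

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sobel_magnitude(img):
    """Gradient magnitude of a 2-D image from 3x3 Sobel kernels (edge-padded)."""
    p = np.pad(img, 1, mode="edge")
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    return np.hypot(gx, gy)


def edge_weight(target, num_scales=3, alpha=1.0):
    """Per-pixel weight in [1, 1 + alpha]: Sobel magnitudes summed over a 2x image pyramid."""
    h, w = target.shape
    step = 2 ** (num_scales - 1)
    assert h % step == 0 and w % step == 0, "size must be divisible by 2**(num_scales-1)"
    level = target.astype(float)
    total = np.zeros((h, w))
    for s in range(num_scales):
        f = 2 ** s
        # Upsample this level's edge map back to full resolution (nearest neighbour).
        total += np.repeat(np.repeat(sobel_magnitude(level), f, axis=0), f, axis=1)
        if s + 1 < num_scales:
            hh, ww = level.shape
            level = level.reshape(hh // 2, 2, ww // 2, 2).mean(axis=(1, 3))  # 2x2 average pool
    return 1.0 + alpha * total / (total.max() + 1e-8)


def ssim_map(x, y, win=7, c1=0.01 ** 2, c2=0.03 ** 2):
    """Local SSIM with a uniform win x win window, evaluated on the valid region."""
    def local_mean(a):
        return sliding_window_view(a, (win, win)).mean(axis=(-2, -1))

    mx, my = local_mean(x), local_mean(y)
    vx = local_mean(x * x) - mx ** 2
    vy = local_mean(y * y) - my ** 2
    cxy = local_mean(x * y) - mx * my
    return ((2 * mx * my + c1) * (2 * cxy + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))


def refinement_loss(pred, target, lambda_dssim=0.2):
    """Edge-weighted L1 plus DSSIM; lambda_dssim is a hypothetical weight."""
    weighted_l1 = np.mean(edge_weight(target) * np.abs(pred - target))
    dssim = np.mean((1.0 - ssim_map(pred, target)) / 2.0)
    return (1 - lambda_dssim) * weighted_l1 + lambda_dssim * dssim
```

Flat regions keep weight 1, so the edge term only adds emphasis and never suppresses the plain L1 signal; summing over pyramid levels lets coarse building outlines and fine texture edges both contribute.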
Qualitative Results
Compared with Difix3D+, aeroFix better preserves structure under large viewpoint differences and reduces reference-copying artifacts.
Aerial-to-Ground Reconstruction
ProDiG produces more coherent geometry and more realistic ground-level renderings than 3DGS and Difix3D+ in challenging aerial-to-ground settings.
aeroFix Comparison
| Method | DreamSim ↓ | PSNR ↑ | SSIM ↑ | LPIPS ↓ |
|---|---|---|---|---|
| Difix3D+ | 0.15 | 20.47 | 0.54 | 0.42 |
| Difix (LoRA) | 0.07 | 21.45 | 0.59 | 0.30 |
| Pose + Plücker | 0.06 | 22.30 | 0.64 | 0.27 |
| Pose + Plücker + Causal | 0.03 | 23.35 | 0.68 | 0.24 |
| aeroFix (Ours) | 0.03 | 23.68 | 0.69 | 0.24 |
WRIVA Results
| Site | DreamSim ↓ | PSNR ↑ | SSIM ↑ | LPIPS ↓ |
|---|---|---|---|---|
| S06 | 0.50 | 11.26 | 0.33 | 0.67 |
| S01 | 0.29 | 13.10 | 0.45 | 0.58 |
On WRIVA, ProDiG improves both structural and perceptual quality over strong Gaussian Splatting and diffusion-guided baselines.
Matrix City
| Method | DreamSim ↓ | PSNR ↑ | SSIM ↑ | LPIPS ↓ |
|---|---|---|---|---|
| 3DGS | 0.51 | 10.71 | 0.40 | 0.77 |
| 2DGS | 0.62 | 9.29 | 0.28 | 0.81 |
| 3DGS-MCMC | 0.49 | 10.84 | 0.41 | 0.77 |
| Scaffold-GS | 0.54 | 10.19 | 0.35 | 0.75 |
| Difix3D+ | 0.48 | 11.38 | 0.38 | 0.63 |
| Ours | 0.39 | 12.39 | 0.41 | 0.50 |
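To read the Matrix City table at a glance, the snippet below compares the last row against the strongest baseline for each metric, respecting each metric's direction (↓ or ↑). The numbers are copied from the table; the helper name is illustrative.

```python
# Matrix City results copied from the table above.
METRICS = ("DreamSim", "PSNR", "SSIM", "LPIPS")
HIGHER_IS_BETTER = (False, True, True, False)
RESULTS = {
    "3DGS": (0.51, 10.71, 0.40, 0.77),
    "2DGS": (0.62, 9.29, 0.28, 0.81),
    "3DGS-MCMC": (0.49, 10.84, 0.41, 0.77),
    "Scaffold-GS": (0.54, 10.19, 0.35, 0.75),
    "Difix3D+": (0.48, 11.38, 0.38, 0.63),
    "Ours": (0.39, 12.39, 0.41, 0.50),
}


def margin_over_best_baseline(results, ours="Ours"):
    """For each metric: (best baseline name, signed difference ours - best baseline)."""
    margins = {}
    for i, metric in enumerate(METRICS):
        baselines = {name: row[i] for name, row in results.items() if name != ours}
        pick = max if HIGHER_IS_BETTER[i] else min
        best = pick(baselines, key=baselines.get)
        margins[metric] = (best, round(results[ours][i] - baselines[best], 2))
    return margins


for metric, (best, delta) in margin_over_best_baseline(RESULTS).items():
    print(f"{metric}: vs {best}: {delta:+.2f}")
```

Running it prints `DreamSim: vs Difix3D+: -0.09`, `PSNR: vs Difix3D+: +1.01`, `SSIM: vs 3DGS-MCMC: +0.00`, and `LPIPS: vs Difix3D+: -0.13`: the margins come mostly from PSNR and the perceptual metrics, while SSIM matches 3DGS-MCMC at two decimals.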
Generalization and Ablations
The paper reports that the "Original Noisy Closer" progressive strategy is the most stable overall, and that the distance-adaptive Gaussian module improves PSNR and SSIM, especially when camera distances vary widely.
Key Contributions
1. Causal Attention Mixing
Epipolar-constrained attention makes diffusion refinement more geometrically grounded under large viewpoint changes.
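A minimal sketch of such a mask, assuming reference and novel tokens are concatenated into one sequence and each token has a pixel center. Reference queries see only reference keys; novel queries see all novel keys, and see a reference key only if it lies within `thresh` pixels of the query's epipolar line, obtained from the fundamental matrix F = K_ref^{-T} [t]_× R K_nov^{-1}. The function names, the pixel threshold, and the hard boolean mask (rather than a soft attention bias) are assumptions; the relative pose (R, t) needs a non-zero baseline for F to be defined.

```python
import numpy as np


def skew(t):
    """Cross-product matrix [t]_x such that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]], [t[2], 0.0, -t[0]], [-t[1], t[0], 0.0]])


def fundamental(K_ref, K_nov, R, t):
    """F with x_ref^T F x_nov = 0, assuming X_ref = R @ X_nov + t in camera coordinates."""
    return np.linalg.inv(K_ref).T @ skew(t) @ R @ np.linalg.inv(K_nov)


def _homogeneous(p):
    return np.concatenate([p, np.ones((len(p), 1))], axis=1)


def causal_epipolar_mask(pix_ref, pix_nov, F, thresh=2.0):
    """Boolean (N, N) mask over [ref tokens; novel tokens]; rows are queries, True = may attend."""
    nr, nn = len(pix_ref), len(pix_nov)
    lines = _homogeneous(pix_nov) @ F.T  # (Nn, 3): epipolar line in the reference image
    dist = np.abs(lines @ _homogeneous(pix_ref).T) / np.linalg.norm(lines[:, :2], axis=1, keepdims=True)
    mask = np.zeros((nr + nn, nr + nn), dtype=bool)
    mask[:nr, :nr] = True             # reference queries -> reference keys
    # mask[:nr, nr:] stays False:      reference queries never see novel keys (causality)
    mask[nr:, nr:] = True             # novel queries -> novel keys
    mask[nr:, :nr] = dist <= thresh   # novel queries -> reference keys near the epipolar line
    return mask


def masked_attention(q, k, v, mask):
    """Single-head scaled dot-product attention with disallowed pairs set to -inf."""
    scores = np.where(mask, q @ k.T / np.sqrt(q.shape[-1]), -np.inf)
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Because every novel query keeps its novel keys, no attention row is ever fully masked, so the softmax stays well defined even when an epipolar line misses every reference token. Blocking the reference-to-novel direction also means the reference tokens' outputs cannot be contaminated by the noisy render.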
2. Distance-Adaptive Gaussian Module
Gaussian scale and opacity are modulated using per-Gaussian features and camera distance for stable refinement across altitudes.
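A forward-pass sketch of one possible distance-adaptive module: a small MLP reads each Gaussian's feature vector and its log distance to the current camera, then outputs a bounded multiplicative scale factor and an additive opacity-logit offset. The class name, the two-layer architecture, the log-distance input, and the tanh bound `max_log_scale` are all hypothetical; in training the weights would be learned rather than randomly initialized as they are here.

```python
import numpy as np


class DistanceAdaptiveModulator:
    """Hypothetical MLP mapping (per-Gaussian feature, log camera distance) to scale/opacity."""

    def __init__(self, feat_dim, hidden=32, max_log_scale=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.1, size=(feat_dim + 1, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, size=(hidden, 2))
        self.b2 = np.zeros(2)
        self.max_log_scale = max_log_scale

    def __call__(self, means, scales, opacity_logits, feats, cam_center):
        """means, scales: (N, 3); opacity_logits: (N,); feats: (N, feat_dim)."""
        dist = np.linalg.norm(means - cam_center, axis=1, keepdims=True)  # (N, 1)
        x = np.concatenate([feats, np.log(dist + 1e-6)], axis=1)
        h = np.maximum(x @ self.w1 + self.b1, 0.0)  # ReLU hidden layer
        out = h @ self.w2 + self.b2  # (N, 2)
        # Bounded, isotropic scale change: factor lies in [e^-m, e^m].
        scale_factor = np.exp(self.max_log_scale * np.tanh(out[:, :1]))
        new_scales = scales * scale_factor
        # Shift the opacity logit, then squash to (0, 1).
        new_opacity = 1.0 / (1.0 + np.exp(-(opacity_logits + out[:, 1])))
        return new_scales, new_opacity
```

Bounding the scale change is one way to keep a very distant or very close camera from inflating or collapsing Gaussians, which is a plausible reading of "stable refinement across altitudes."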
3. Progressive Altitude Refinement
Intermediate-altitude synthesis gradually bridges the aerial-to-ground distribution gap instead of making a single large leap.
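The progressive loop from the Method Overview can be outlined as follows. The sketch assumes altitudes are spaced geometrically between the aerial and ground heights (so each step changes apparent scale by a similar factor), and that `render`, `refine`, and `train` are caller-supplied stand-ins for Gaussian rendering, aeroFix, and Gaussian optimization. Using the most recently added view as the reference is a hypothetical choice; the paper's schedule and reference selection may differ.

```python
import numpy as np


def altitude_schedule(h_aerial, h_ground, num_stages, geometric=True):
    """Altitudes for each stage, descending from just below h_aerial down to h_ground."""
    t = np.linspace(0.0, 1.0, num_stages + 1)[1:]
    if geometric:  # requires h_ground > 0
        return h_aerial * (h_ground / h_aerial) ** t
    return h_aerial + (h_ground - h_aerial) * t


def progressive_refinement(scene, aerial_views, altitudes, render, refine, train):
    """Render lower, refine, add back, retrain; one iteration per altitude."""
    training_set = list(aerial_views)
    for h in altitudes:
        noisy = render(scene, h)              # novel view at the next, lower altitude
        reference = training_set[-1]          # hypothetical: most recently added view
        fixed = refine(noisy, reference)      # diffusion refinement (aeroFix's role)
        training_set.append(fixed)            # the refined view becomes training data
        scene = train(scene, training_set)    # update the Gaussian scene
    return scene, training_set
```

For example, `altitude_schedule(200.0, 2.0, 4)` yields roughly 63.2, 20, 6.3, and 2 meters, so each stage only has to bridge a fixed ratio in altitude rather than the full aerial-to-ground gap.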
BibTeX
@article{mitra2026prodig,
title={ProDiG: Progressive Diffusion-Guided Gaussian Splatting for Aerial to Ground Reconstruction},
author={Mitra, Sirshapan and Rawat, Yogesh S},
journal={CVPR Findings 2026},
year={2026}
}