ProDiG: Progressive Diffusion-Guided Gaussian Splatting for Aerial to Ground Reconstruction

CVPR Findings 2026

Sirshapan Mitra¹, Yogesh S. Rawat¹

¹Center for Research in Computer Vision, University of Central Florida

Abstract

ProDiG addresses aerial-to-ground reconstruction when only aerial imagery is available. Rather than directly jumping from aerial views to ground-level rendering, it progressively synthesizes intermediate altitudes and refines the Gaussian scene representation at each stage. The method combines a geometry-aware causal attention module for diffusion-based refinement with a distance-adaptive Gaussian module that adjusts scale and opacity based on camera distance. This enables more stable, coherent, and realistic ground-level reconstructions under extreme viewpoint gaps.

Method Overview

Overview of ProDiG
Figure placeholder: use the overview figure from page 2 of the paper.
Progressive Altitude Refinement · Causal Attention Mixing · Epipolar Geometry Conditioning · Distance-Adaptive Gaussian Module

ProDiG progressively lowers the viewpoint altitude, renders noisy novel views, refines them with aeroFix, and adds the fixed views back into training. The diffusion module uses pose-aware conditioning, Plücker ray embeddings, and epipolar-constrained causal attention to preserve structural consistency across large viewpoint changes.
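
For readers unfamiliar with Plücker ray embeddings, the following is a minimal sketch of the standard six-dimensional Plücker representation of a camera ray (unit direction plus moment). It is not taken from the paper: the function names `plucker_ray`, `normalize`, and `cross` are hypothetical, and the ordering (direction first, then moment) is an assumption, since some works use the opposite order.

```python
import math

def normalize(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    if n == 0.0:
        raise ValueError("ray direction must be non-zero")
    return [c / n for c in v]

def cross(a, b):
    """3D cross product a x b."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def plucker_ray(origin, direction):
    """Plücker coordinates (d, o x d) of the ray through `origin` along `direction`.

    Assumption: direction-first ordering; the paper's exact layout is not given here.
    """
    d = normalize(direction)
    m = cross(origin, d)  # moment: unchanged if the origin slides along the ray
    return d + m          # 6-vector as a flat list

# A camera at (1, 2, 3) looking straight down.
ray = plucker_ray((1.0, 2.0, 3.0), (0.0, 0.0, -2.0))
```

Because the moment does not depend on where along the ray the origin sits, the embedding describes the ray itself rather than a particular camera position on it, which is one reason it is a common choice for pose-aware conditioning.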

aeroFix: Geometry-Aware Diffusion Refinement

aeroFix block diagram
Figure placeholder: use the aeroFix diagram from page 4 of the paper.

aeroFix refines noisy novel renders using a reference view while explicitly constraining cross-view attention. The model masks novel-query/reference-key interactions using epipolar lines, blocks reference-query/novel-key attention to preserve causality, and injects pose difference information into diffusion conditioning. Multi-scale Sobel weighting and DSSIM further preserve edges and perceptual consistency.
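
The masking rules described above can be sketched as a boolean attention mask. This is an illustrative sketch under stated assumptions, not the authors' implementation: tokens are assumed to be ordered as reference tokens followed by novel tokens, each token is assumed to carry a pixel coordinate, the fundamental matrix `F` is assumed to map a novel-view pixel to its epipolar line in the reference view, within-view attention is assumed to be unrestricted, and the names `epipolar_distance`, `build_mask`, and the threshold `tau` are hypothetical.

```python
import math

def epipolar_distance(F, x_nov, x_ref):
    """Distance (in pixels) from x_ref to the epipolar line F @ [u, v, 1] of x_nov."""
    u, v = x_nov
    a, b, c = (row[0] * u + row[1] * v + row[2] for row in F)
    norm = math.hypot(a, b)
    if norm < 1e-12:
        return 0.0  # degenerate line (assumption: do not restrict attention here)
    return abs(a * x_ref[0] + b * x_ref[1] + c) / norm

def build_mask(F, ref_px, nov_px, tau=1.0):
    """mask[q][k] is True where query token q may attend to key token k."""
    n_ref = len(ref_px)
    n = n_ref + len(nov_px)
    mask = [[False] * n for _ in range(n)]
    for q in range(n):
        for k in range(n):
            q_is_ref, k_is_ref = q < n_ref, k < n_ref
            if q_is_ref == k_is_ref:
                mask[q][k] = True   # same view (assumed unrestricted)
            elif q_is_ref:
                mask[q][k] = False  # reference query, novel key: blocked for causality
            else:
                # novel query, reference key: keep only keys near the epipolar line
                d = epipolar_distance(F, nov_px[q - n_ref], ref_px[k])
                mask[q][k] = d <= tau
    return mask

# Toy setup: pure horizontal camera shift, so epipolar lines are image rows.
F = [[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
grid = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
mask = build_mask(F, grid, grid, tau=0.5)
```

In the toy setup, a novel token in row 0 may attend only to reference tokens in row 0, while no reference token attends to any novel token, so information flows one way from the reference view into the novel view.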

Qualitative Results

Qualitative aeroFix results
Figure placeholder: use the refinement comparison from page 6.

Compared with Difix3D+, aeroFix better preserves structure under large viewpoint differences and reduces reference-copying artifacts.

Aerial-to-Ground Reconstruction

Qualitative ProDiG reconstruction results
Figure placeholder: use the main qualitative comparison from page 6.

ProDiG produces more coherent geometry and more realistic ground-level renderings than 3DGS and Difix3D+ in challenging aerial-to-ground settings.

aeroFix Comparison

Method                     DreamSim ↓   PSNR ↑   SSIM ↑   LPIPS ↓
Difix3D+                   0.15         20.47    0.54     0.42
Difix (LoRA)               0.07         21.45    0.59     0.30
Pose + Plücker             0.06         22.30    0.64     0.27
Pose + Plücker + Causal    0.03         23.35    0.68     0.24
aeroFix (Ours)             0.03         23.68    0.69     0.24

WRIVA Results

Site   DreamSim ↓   PSNR ↑   SSIM ↑   LPIPS ↓
S06    0.50         11.26    0.33     0.67
S01    0.29         13.10    0.45     0.58

On WRIVA, ProDiG improves both structural and perceptual quality over strong Gaussian Splatting and diffusion-guided baselines.

Matrix City

Method        DreamSim ↓   PSNR ↑   SSIM ↑   LPIPS ↓
3DGS          0.51         10.71    0.40     0.77
2DGS          0.62         9.29     0.28     0.81
3DGS-MCMC     0.49         10.84    0.41     0.77
Scaffold-GS   0.54         10.19    0.35     0.75
Difix3D+      0.48         11.38    0.38     0.63
Ours          0.39         12.39    0.41     0.50

Generalization and Ablations

Generalization and ablation figures
Figure placeholder: use page 8 figures for varying-altitude generalization and ablations.

The paper reports that the Original Noisy Closer progressive strategy is the most stable overall, and that the distance-adaptive Gaussian module improves PSNR and SSIM, especially when camera distances vary widely.

Key Contributions

1. Causal Attention Mixing

Epipolar-constrained attention makes diffusion refinement more geometrically grounded under large viewpoint changes.

2. Distance-Adaptive Gaussian Module

Gaussian scale and opacity are modulated using per-Gaussian features and camera distance for stable refinement across altitudes.
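
One way such a modulation could look is sketched below. This is a hedged illustration only: the linear heads `w_scale` and `w_opacity`, the use of log-distance as an input, the multiplicative scale update, and the logit-space opacity update are all assumptions made for the example, and `adapt_gaussian` is a hypothetical name rather than the paper's module.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    return math.log(p / (1.0 - p))

def adapt_gaussian(scale, opacity, feature, cam_pos, center, w_scale, w_opacity):
    """Modulate one Gaussian's scale and opacity from its feature and camera distance.

    scale:    per-axis scales (positive floats)
    opacity:  value strictly between 0 and 1
    feature:  per-Gaussian feature vector
    w_scale, w_opacity: hypothetical linear weights over [feature..., log distance]
    """
    dist = math.dist(cam_pos, center)
    x = list(feature) + [math.log(dist + 1e-8)]
    delta_s = sum(w * xi for w, xi in zip(w_scale, x))
    delta_o = sum(w * xi for w, xi in zip(w_opacity, x))
    new_scale = [s * math.exp(delta_s) for s in scale]  # stays positive
    new_opacity = sigmoid(logit(opacity) + delta_o)     # stays in (0, 1)
    return new_scale, new_opacity

# Same Gaussian seen from a near and a far camera, with a weight on log distance only.
args = dict(scale=[0.1, 0.2, 0.3], opacity=0.8, feature=[0.5, -0.5],
            center=(0.0, 0.0, 0.0), w_scale=[0.0, 0.0, 0.5], w_opacity=[0.0, 0.0, -0.1])
near_scale, near_opacity = adapt_gaussian(cam_pos=(0.0, 0.0, 1.0), **args)
far_scale, far_opacity = adapt_gaussian(cam_pos=(0.0, 0.0, 100.0), **args)
```

With all weights set to zero the update is the identity, so under this parameterization the module can fall back to the unmodified Gaussians when distance carries no useful signal.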

3. Progressive Altitude Refinement

Intermediate-altitude synthesis gradually bridges the aerial-to-ground distribution gap instead of making a single large leap.
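
The progressive loop can be sketched as follows. This is a simplified illustration, not the paper's training code: the linear altitude spacing, the inclusion of ground level as the final stage, and the callables `render`, `refine`, and `fit` (standing in for Gaussian rendering, diffusion refinement, and Gaussian optimization) are all assumptions introduced for the example.

```python
def altitude_schedule(start_alt, ground_alt, num_stages):
    """Evenly spaced altitudes from just below start_alt down to ground_alt."""
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    return [start_alt + (ground_alt - start_alt) * (i + 1) / num_stages
            for i in range(num_stages)]

def progressive_refinement(train_views, altitudes, render, refine, fit):
    """Lower the viewpoint stage by stage, adding refined views to the training set."""
    for alt in altitudes:
        noisy_views = render(alt)                     # novel views at this altitude
        fixed_views = [refine(v) for v in noisy_views]  # diffusion-refined views
        train_views = train_views + fixed_views       # grow the training set
        fit(train_views)                              # re-optimize the scene
    return train_views

# Toy run with stand-in callables (no real rendering or diffusion involved).
fit_calls = []
views = progressive_refinement(
    train_views=["aerial"],
    altitudes=altitude_schedule(100.0, 0.0, 4),
    render=lambda alt: [("noisy", alt)],
    refine=lambda v: ("fixed", v[1]),
    fit=lambda vs: fit_calls.append(len(vs)),
)
```

Each stage only asks the refinement model to correct renders that are slightly outside the current training distribution, which is the intuition behind bridging the gap in small steps.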

BibTeX

@article{mitra2026prodig,
  title={ProDiG: Progressive Diffusion-Guided Gaussian Splatting for Aerial to Ground Reconstruction},
  author={Mitra, Sirshapan and Rawat, Yogesh S},
  journal={CVPR Findings 2026},
  year={2026}
}