This week was mostly about evaluating the current model and naively comparing it with Marigold, always scale&shifting the relative depth output of both models so that it is comparable with the metric ground truth.
Caveat: All models were scale&shifted w.r.t. the valid sky region. Even though the datasets are synthetic, there were remnants of "sky pixels" which threw off the scale&shift fitting, especially in the case of the "real-drive-sim" dataset.
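For reference, the scale&shift alignment can be sketched as an ordinary least-squares fit of a scale `s` and a shift `t` over the pixels used for fitting. This is only a minimal sketch, not the evaluation code itself: the function names, the boolean `mask` argument, and the choice to fit directly in depth space (rather than, say, disparity space) are all assumptions.

```python
import numpy as np

def fit_scale_shift(pred, gt, mask):
    """Least-squares s, t such that s * pred + t approximates gt on masked pixels.

    pred, gt: arrays of the same shape; mask: boolean array selecting the
    pixels used for the fit (hypothetical; e.g. leftover sky pixels excluded).
    """
    p = pred[mask].astype(np.float64)
    g = gt[mask].astype(np.float64)
    # Design matrix [p, 1] so that A @ [s, t] = s * p + t
    A = np.stack([p, np.ones_like(p)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, g, rcond=None)
    return float(s), float(t)

def align(pred, gt, mask):
    """Apply the fitted scale and shift to the full prediction."""
    s, t = fit_scale_shift(pred, gt, mask)
    return s * pred + t
```

Because the fit is a plain least-squares problem, a handful of pixels with extreme ground-truth depth (such as sky remnants) can dominate the solution, which is consistent with the caveat above.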
For VKITTI2 (evaluated on 1.4k samples) we have the following scores:
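| Method | | Uncert | Marigold |
|---|---|---|---|
| RGB | FID↓ | 11.31 | − |
| RGB | PSNR↑ | 25.30 | − |
| RGB | SSIM↑ | 0.77 | − |
| Depth | AbsRel↓ | 0.51 | 0.72 |
| Depth | RMSE↓ | 11.98 | 24.37 |
| Depth | RMSEp90↓ | 157.23 | 226.37 |
| Depth | PSNR↑ | 19.86 | 17.77 |
| Depth | SSIM↑ | 0.70 | 0.64 |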
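As a reminder of what the two simplest depth rows measure, here is a minimal sketch of AbsRel and RMSE using their standard definitions, computed on aligned predictions over a validity mask. The function names and the `mask` argument are illustrative assumptions; RMSEp90 and the depth PSNR/SSIM variants are left out because their exact definitions are not given here.

```python
import numpy as np

def abs_rel(pred, gt, mask):
    """Mean absolute relative error: mean(|pred - gt| / gt) over masked pixels."""
    p, g = pred[mask], gt[mask]
    return float(np.mean(np.abs(p - g) / g))

def rmse(pred, gt, mask):
    """Root mean squared error over masked pixels."""
    p, g = pred[mask], gt[mask]
    return float(np.sqrt(np.mean((p - g) ** 2)))
```

AbsRel normalises each error by the ground-truth depth, so it weights near-range errors more heavily, whereas RMSE is dominated by large absolute errors at far range.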
Comparison of how the error delta between the Uncert and Marigold models behaves (green: improvement, red: worse).
For the real-drive-sim dataset (evaluated on just 800 samples) we get:
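| Method | | Uncert | Marigold |
|---|---|---|---|
| RGB | FID↓ | 9.48 | − |
| RGB | PSNR↑ | 24.66 | − |
| RGB | SSIM↑ | 0.86 | − |
| Depth | AbsRel↓ | 0.99 | 5.00 |
| Depth | RMSE↓ | 110.38 | 295.36 |
| Depth | RMSEp90↓ | 5019 | 4756 |
| Depth | PSNR↑ | 19.11 | 18.99 |
| Depth | SSIM↑ | 0.74 | 0.73 |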
Comparison of how the error delta between the Uncert and Marigold models behaves (green: improvement, red: worse).