This week was mostly about evaluating the current model and comparing it naively with Marigold. For both models, the relative depth output is always scale&shifted so that it is comparable with the metric ground truth.

Caveat: All models were scale&shifted w.r.t. the valid sky region. Even though the datasets are synthetic, there were remnants of "sky pixels" which threw off the scale&shift fitting, especially in the case of the "real-drive-sim" dataset.
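For reference, a minimal sketch of a masked least-squares scale&shift fit. This is not necessarily the exact procedure used for the numbers below: the function names, the boolean `valid` mask (assumed here to mark the pixels that take part in the fit), the choice to fit in depth space rather than disparity space, and the toy values are all assumptions for illustration.

```python
import numpy as np

def fit_scale_shift(pred, gt, valid):
    """Least-squares s, t such that s * pred + t ~= gt on the valid pixels."""
    p = pred[valid].astype(np.float64)
    g = gt[valid].astype(np.float64)
    # Design matrix [pred, 1]; solving A @ [s, t] = g in the least-squares sense.
    A = np.stack([p, np.ones_like(p)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, g, rcond=None)
    return float(s), float(t)

def align(pred, gt, valid):
    """Apply the fitted scale and shift to the whole prediction."""
    s, t = fit_scale_shift(pred, gt, valid)
    return s * pred + t

# Toy example (hypothetical values): the ground truth is an exact affine
# function of the prediction, except for one row of leftover "sky" pixels
# with a huge depth value that is masked out of the fit.
rng = np.random.default_rng(0)
pred = rng.uniform(0.0, 1.0, size=(4, 6))
gt = 3.0 * pred + 2.0
gt[0, :] = 655.35                      # leftover sky row (made-up far depth)
valid = np.ones_like(gt, dtype=bool)
valid[0, :] = False                    # exclude the sky row from the fit

s, t = fit_scale_shift(pred, gt, valid)
aligned = align(pred, gt, valid)
```

If the leftover sky row is not masked out, its extreme depth values enter the least-squares problem and pull `s` and `t` away from the values that fit the rest of the scene, which is the failure mode described in the caveat.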

For VKITTI2 (evaluated on 1.4k samples) we have the following scores:

$$
\def\arraystretch{1.3}
\begin{array}{l|ccc|ccccc}
\hline
 & \textbf{RGB} & & & \textbf{Depth} & & & & \\
\textbf{Method} & \text{FID}{\downarrow} & \text{PSNR}{\uparrow} & \text{SSIM}{\uparrow} & \text{AbsRel}{\downarrow} & \text{RMSE}{\downarrow} & \text{RMSE}_{p90}{\downarrow} & \text{PSNR}{\uparrow} & \text{SSIM}{\uparrow} \\
\hline
\text{Uncert} & 11.31 & 25.30 & 0.77 & 0.51 & 11.98 & 157.23 & 19.86 & 0.70 \\
\text{Marigold} & - & - & - & 0.72 & 24.37 & 226.37 & 17.77 & 0.64 \\
\hline
\end{array}
$$
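As a reminder of what the depth columns measure, here is a small sketch of AbsRel and RMSE using their standard definitions, computed on aligned predictions over valid pixels. The function names, the `valid` mask, and the toy values are assumptions for illustration. The exact definition of RMSE p90 is not spelled out in these notes, so it is left out.

```python
import numpy as np

def abs_rel(pred, gt, valid):
    """Mean absolute relative error: mean(|pred - gt| / gt) over valid pixels."""
    p, g = pred[valid], gt[valid]
    return float(np.mean(np.abs(p - g) / g))

def rmse(pred, gt, valid):
    """Root mean squared error over valid pixels."""
    p, g = pred[valid], gt[valid]
    return float(np.sqrt(np.mean((p - g) ** 2)))

# Toy example (hypothetical values); the bottom-right pixel is masked out.
gt = np.array([[2.0, 4.0], [5.0, 10.0]])
pred = np.array([[1.0, 5.0], [5.0, 0.0]])
valid = np.array([[True, True], [True, False]])

abs_rel_value = abs_rel(pred, gt, valid)
rmse_value = rmse(pred, gt, valid)
```

AbsRel divides by the ground truth, so it assumes strictly positive depth on every valid pixel.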

Figure (image.png): delta error between the Uncert and Marigold models on VKITTI2 (green: improvement, red: worse).

For the real-drive-sim dataset (evaluated on just 800 samples) we get:

$$
\def\arraystretch{1.3}
\begin{array}{l|ccc|ccccc}
\hline
 & \textbf{RGB} & & & \textbf{Depth} & & & & \\
\textbf{Method} & \text{FID}{\downarrow} & \text{PSNR}{\uparrow} & \text{SSIM}{\uparrow} & \text{AbsRel}{\downarrow} & \text{RMSE}{\downarrow} & \text{RMSE}_{p90}{\downarrow} & \text{PSNR}{\uparrow} & \text{SSIM}{\uparrow} \\
\hline
\text{Uncert} & 9.48 & 24.66 & 0.86 & 0.99 & 110.38 & 5019 & 19.11 & 0.74 \\
\text{Marigold} & - & - & - & 5.00 & 295.36 & 4756 & 18.99 & 0.73 \\
\hline
\end{array}
$$

Figure (image.png): delta error between the Uncert and Marigold models on real-drive-sim (green: improvement, red: worse).