Run Dossier

Training run for SD 2.1. Mari

21 metrics · 35,404 train · 78 val

crashed · Apr 19, 18:54 · Duration: 15h 42m · Python 3.11.7
train_joint.py --config /cluster/home/drothenpiele/models/stable_diffusion_mari/mari/.euler-launches/euler_launch_03f0378a-f172-4372-afe2-a8f002ff06cd/joint_vkitti.yaml --resume /cluster/scratch/drothenpiele/euler_train/mari/checkpoints/joint-dehaze-metric-2026-04-18-08-37-01-02cc/checkpoint-9
lr 4e-5 · batch 14 · epochs 200 · precision fp16 · model stable-diffusion-2-1 · seed 42 · wd 0 · grad accum 1 · warmup 0.05000 · ema no · grad clip 1
ID 64070686
mari/runs/2026-04-19_18-54-29_b4e6
Error
BadZipFile: Caught BadZipFile in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/cluster/home/drothenpiele/.cache/venv/mari/lib/python3.11/site-packages/torch/utils/d...
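
BadZipFile is raised by Python's zipfile module (which numpy.load also relies on for .npz files) when an archive is truncated or otherwise corrupt. Below is a minimal sketch for scanning a dataset directory for unreadable archives before relaunching. It assumes the samples are stored as .zip or .npz files; the function name find_bad_archives and the directory layout are hypothetical and are not taken from train_joint.py.

```python
import zipfile
from pathlib import Path


def find_bad_archives(root, suffixes=(".zip", ".npz")):
    """Return (path, reason) pairs for archives under root that fail to open or verify."""
    bad = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        try:
            with zipfile.ZipFile(path) as archive:
                # testzip() reads every member and returns the first corrupt name, or None.
                corrupt_member = archive.testzip()
        except (zipfile.BadZipFile, OSError) as exc:
            bad.append((path, f"{type(exc).__name__}: {exc}"))
            continue
        if corrupt_member is not None:
            bad.append((path, f"corrupt member: {corrupt_member}"))
    return bad
```

Running this once over the dataset root outside the DataLoader gives the full path of any offending file, which the truncated worker traceback above does not show.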
Metrics 21/21
depth.train 4
  diag 3
    depth_abs_rel
    depth_delta1
    depth_mae
  loss 1
    depth
rgb.train 11
  diag 2
    image_psnr
    image_ssim
  loss 7
    dehaze
    preserve
    total
    visibility
    visibility_aux
    visibility_rank
    visibility_tv
  stat 2
    visibility_mean
    visibility_std
sys.train 6
  gpu_mem_total_gb
  gpu_mem_used_gb
  gpu_mem_util_pct
  gpu_util_pct
  lr
  visibility_aux_scale
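
The depth diagnostics in the list above (depth_abs_rel, depth_delta1, depth_mae) have conventional definitions in the monocular depth literature. The sketch below uses those conventional definitions. Whether this run masks invalid pixels, aligns scale before evaluation, or uses a different threshold is not stated on this page, so the function name depth_metrics, the positive-depth filter, and the 1.25 threshold are assumptions.

```python
def depth_metrics(pred, gt, delta_threshold=1.25):
    """Conventional depth errors over paired depth values given as flat sequences."""
    # Assumption: only pairs where both values are positive count as valid.
    pairs = [(p, g) for p, g in zip(pred, gt) if p > 0 and g > 0]
    if not pairs:
        raise ValueError("no valid depth pairs")
    n = len(pairs)
    # Mean absolute error relative to the ground-truth depth.
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    # Mean absolute error in the depth unit itself.
    mae = sum(abs(p - g) for p, g in pairs) / n
    # Fraction of pairs whose ratio max(p/g, g/p) is below the threshold.
    delta1 = sum(1 for p, g in pairs if max(p / g, g / p) < delta_threshold) / n
    return {"depth_abs_rel": abs_rel, "depth_mae": mae, "depth_delta1": delta1}
```

Lower is better for depth_abs_rel and depth_mae, while depth_delta1 is an accuracy in [0, 1] where higher is better.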

Charts for run 2026-04-19_18-54-29_b4e6 (each panel plots smoothed and raw traces; T = train, V = val)

Depth Abs Rel · kind=diag · V
Depth Delta1 · kind=diag · V
Depth Mae · kind=diag · V
Depth · kind=loss · T
Image Psnr · dB · kind=diag · V
Image Ssim · kind=diag · V
Dehaze · kind=loss · T
Preserve · kind=loss · T
Total · kind=loss · T
Visibility · kind=loss · T
Visibility Aux · kind=loss · T
Visibility Rank · kind=loss · T
Visibility Tv · kind=loss · T
Visibility Mean · kind=stat · T V
Visibility Std · kind=stat · T V
Gpu Mem Total Gb · T V
Gpu Mem Used Gb · T V
Gpu Mem Util Pct · T V
Gpu Util Pct · T V
Lr · T
Visibility Aux Scale · T

No output snapshots found for this run.

Outputs are generated during training and saved to outputs/epoch_N_step_M/ directories.
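
A small sketch of how snapshot directories following the outputs/epoch_N_step_M/ pattern could be discovered and ordered. The helper name list_snapshots is hypothetical, and it assumes N and M are plain non-negative integers.

```python
import re
from pathlib import Path

# Matches directory names such as "epoch_3_step_4185".
SNAPSHOT_DIR = re.compile(r"^epoch_(\d+)_step_(\d+)$")


def list_snapshots(outputs_root):
    """Return (epoch, step, path) tuples for snapshot directories, sorted by step."""
    root = Path(outputs_root)
    if not root.is_dir():
        return []  # Nothing written yet, as in the empty state above.
    found = []
    for child in root.iterdir():
        match = SNAPSHOT_DIR.match(child.name)
        if child.is_dir() and match:
            found.append((int(match.group(1)), int(match.group(2)), child))
    return sorted(found, key=lambda item: (item[1], item[0]))
```

An empty result for a run that trained for many epochs suggests that the outputs directory was never written or lives somewhere other than the path being scanned.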

epoch 11 / step 19070 Checkpoint #3009
4/20/2026, 7:23:38 AM
/cluster/scratch/drothenpiele/euler_train/mari/checkpoints/joint-dehaze-metric-2026-04-19-18-54-29-b4e6/checkpoint-11
epoch 13 / step 21860 Checkpoint #3010
4/20/2026, 7:23:38 AM
/cluster/scratch/drothenpiele/euler_train/mari/checkpoints/joint-dehaze-metric-2026-04-19-18-54-29-b4e6/checkpoint-13
epoch 15 / step 24650 Checkpoint #3011
4/20/2026, 7:23:38 AM
/cluster/scratch/drothenpiele/euler_train/mari/checkpoints/joint-dehaze-metric-2026-04-19-18-54-29-b4e6/checkpoint-15
epoch 17 / step 27440 Checkpoint #3012
4/20/2026, 7:23:38 AM
/cluster/scratch/drothenpiele/euler_train/mari/checkpoints/joint-dehaze-metric-2026-04-19-18-54-29-b4e6/checkpoint-17
epoch 19 / step 30230 Checkpoint #3013
4/20/2026, 7:23:38 AM
/cluster/scratch/drothenpiele/euler_train/mari/checkpoints/joint-dehaze-metric-2026-04-19-18-54-29-b4e6/checkpoint-19
epoch 21 / step 33020 Checkpoint #3014
4/20/2026, 7:23:38 AM
/cluster/scratch/drothenpiele/euler_train/mari/checkpoints/joint-dehaze-metric-2026-04-19-18-54-29-b4e6/checkpoint-21
epoch 23 / step 35810 Checkpoint #3015
4/20/2026, 7:23:38 AM
/cluster/scratch/drothenpiele/euler_train/mari/checkpoints/joint-dehaze-metric-2026-04-19-18-54-29-b4e6/checkpoint-23
epoch 25 / step 38600 Checkpoint #3016
4/20/2026, 7:23:38 AM
/cluster/scratch/drothenpiele/euler_train/mari/checkpoints/joint-dehaze-metric-2026-04-19-18-54-29-b4e6/checkpoint-25
epoch 27 / step 41390 Checkpoint #3017
4/20/2026, 7:23:38 AM
/cluster/scratch/drothenpiele/euler_train/mari/checkpoints/joint-dehaze-metric-2026-04-19-18-54-29-b4e6/checkpoint-27
epoch 29 / step 44180 Checkpoint #3027
4/20/2026, 8:17:58 AM
/cluster/scratch/drothenpiele/euler_train/mari/checkpoints/joint-dehaze-metric-2026-04-19-18-54-29-b4e6/checkpoint-29
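
The checkpoint entries above are evenly spaced, which can be checked directly from the listed epoch and step pairs. The sketch below hard-codes the ten pairs from the list and computes the per-epoch step stride between consecutive checkpoints. The function name steps_per_epoch is illustrative only.

```python
# (epoch, step) pairs copied from the checkpoint list above.
CHECKPOINTS = [
    (11, 19070), (13, 21860), (15, 24650), (17, 27440), (19, 30230),
    (21, 33020), (23, 35810), (25, 38600), (27, 41390), (29, 44180),
]


def steps_per_epoch(checkpoints):
    """Return the set of per-epoch step strides between consecutive checkpoints."""
    strides = set()
    for (e0, s0), (e1, s1) in zip(checkpoints, checkpoints[1:]):
        strides.add((s1 - s0) / (e1 - e0))
    return strides


print(steps_per_epoch(CHECKPOINTS))  # {1395.0}
```

A single stride of 1,395 steps per epoch means a checkpoint was written every two epochs, that is, every 2,790 steps.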
Producer launch exports are available. Manage launch-owned exports here for quick reference.

Inherited Launch Exports

These exports are published by the run's producer launch.

Published

The producer launch does not publish any exports yet.

Run-Owned Exports

Publish direct filesystem paths here.

Published

No run-owned exports are published yet.

Raw Artifacts

Run-owned exports are typically direct paths, so there are no captured artifacts to publish from here.

Euler View - ML Experiment Monitor