
Mari.v1 686eeea4

training failed

Output

pending 686eeea4
Launch Output
Waiting for SLURM job to start...

Failure Metadata

Status: failed
Reason: slurm.monitor SBATCH_RUNTIME_FAILED (SLURM job failed)

Context JSON
{
  "slurmJobId": 58152507,
  "monitorMode": "slurm",
  "interactiveAllocationJobId": null
}
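The page reports only `SBATCH_RUNTIME_FAILED`. The underlying reason (non-zero exit, out-of-memory, node failure, and so on) is normally recorded in SLURM's accounting database. The sketch below assumes `sacct` is installed and job accounting is enabled on the cluster. It queries job 58152507 and keeps the steps whose state is not `COMPLETED`. The helper names `query_sacct` and `failed_steps` are illustrative and are not part of Euler View.

```python
import subprocess

# Columns requested from sacct; all are standard sacct format fields.
SACCT_FIELDS = ["JobID", "JobName", "State", "ExitCode", "Elapsed", "MaxRSS"]


def query_sacct(job_id: int) -> str:
    """Run sacct for one job; --parsable2 separates fields with '|' and keeps a header row."""
    cmd = [
        "sacct",
        "-j", str(job_id),
        "--format=" + ",".join(SACCT_FIELDS),
        "--parsable2",
    ]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def failed_steps(sacct_output: str) -> list[dict[str, str]]:
    """Parse `sacct --parsable2` output and keep rows whose State is not COMPLETED."""
    lines = [ln for ln in sacct_output.strip().splitlines() if ln]
    if not lines:
        return []
    header = lines[0].split("|")
    rows = [dict(zip(header, ln.split("|"))) for ln in lines[1:]]
    return [r for r in rows if not r.get("State", "").startswith("COMPLETED")]
```

For example, call `failed_steps(query_sacct(58152507))` on a login node. If the failure happened after the Python script started, the job's stdout/stderr files usually hold the traceback. Their location depends on `run.sh`, assuming that is the submitted batch script.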

Datasets (4)

clear_root
  Dataset: VKITTI2
  Type: rgb
  Split: full
  Path: /cluster/work/igp_psr/drothenpiele/data/vkitti_2.0.3_rgb.zip

depth_root
  Dataset: VKITTI2
  Type: depth
  Split: full
  Path: /cluster/work/igp_psr/drothenpiele/data/vkitti_2.0.3_depth.zip

train_hazy_root
  Type: rgb
  Split: train
  Path: /cluster/work/igp_psr/drothenpiele/data/out/VKITTI_NO_06/VKITTI_NO_06.zip

val_hazy_root
  Type: rgb
  Split: val
  Path: /cluster/work/igp_psr/drothenpiele/data/out/VKITTI_ONLY_06/VKITTI_ONLY_06.zip
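A job can also fail at runtime because an input archive is missing or truncated. The sketch below checks that each path listed above exists and is a readable zip archive. The optional CRC pass (`deep=True`) reads every member, so it can take a long time on the full VKITTI2 archives. The function `check_archives` and its use as a pre-launch check are assumptions for illustration, not a feature of this monitor.

```python
import os
import zipfile

# Paths copied from the Datasets section of this launch.
DATASET_PATHS = {
    "clear_root": "/cluster/work/igp_psr/drothenpiele/data/vkitti_2.0.3_rgb.zip",
    "depth_root": "/cluster/work/igp_psr/drothenpiele/data/vkitti_2.0.3_depth.zip",
    "train_hazy_root": "/cluster/work/igp_psr/drothenpiele/data/out/VKITTI_NO_06/VKITTI_NO_06.zip",
    "val_hazy_root": "/cluster/work/igp_psr/drothenpiele/data/out/VKITTI_ONLY_06/VKITTI_ONLY_06.zip",
}


def check_archives(paths: dict[str, str], deep: bool = False) -> dict[str, str]:
    """Return {role: problem} for every unusable archive; an empty dict means all passed."""
    problems = {}
    for role, path in paths.items():
        if not os.path.isfile(path):
            problems[role] = "missing"
        elif not zipfile.is_zipfile(path):  # no valid end-of-central-directory record
            problems[role] = "not a zip archive"
        elif deep:
            with zipfile.ZipFile(path) as zf:
                bad = zf.testzip()  # CRC-checks every member; first bad name or None
            if bad is not None:
                problems[role] = f"corrupt member: {bad}"
    return problems


if __name__ == "__main__":
    print(check_archives(DATASET_PATHS) or "all archives OK")
```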

Execution Artifacts (2)

run.sh

Published Exports

Launch-Owned Exports

Downstream pipelines should resolve against these semantic handles. Auto-published exports are derived from the config-template metadata captured when this launch was created.

Published

This launch does not publish any exports yet.

Raw Artifacts

Parameters (84)

Typed Parameters

clear_root (Dataset): VKITTI2 / full (1)
depth_root (Dataset): VKITTI2 / full (2)
train_hazy_root (Dataset): VKITTI2 / train (8)
val_hazy_root (Dataset): VKITTI2 / val (9)

Simple Parameters

clip: true
seed: 42
tracker: wandb
use_ema: false
gpu_type: rtx_4090:1
job_name: sd_train_concat
norm_max: 1
norm_min: -1
run_name: joint-dehaze-depth-vkitti-mari
tmp_size: 10G
data_kind: vkitti2
ema_decay: 0.9999
ema_dtype: float16
log_every: 50
max_depth: 300
min_depth: 0.00001
adam_beta1: 0.9
adam_beta2: 0.999
batch_size: 1
image_size: [375, 1242]
log_images: true
num_epochs: 200
output_dir: /cluster/scratch/drothenpiele/SD21/mari1
time_limit: 2-00:00:00
mem_per_cpu: 8G
num_workers: 4
adam_epsilon: 1e-8
conditioning: concat
lambda_depth: 1
min_lr_ratio: 0.01
project_name: joint-dehazing
warmup_ratio: 0.05
weight_decay: 0
cpus_per_task: 8
lambda_dehaze: 1
learning_rate: 0.00003
max_grad_norm: 1
freeze_encoder: false
num_log_images: 4
val_batch_size: 1
enable_xformers: true
euler_train_dir: /cluster/work/igp_psr/drothenpiele/data/out/train/mari
mixed_precision: fp16
prediction_type: v_prediction
depth_noise_type: annealed_multires
lr_schedule_type: constant_with_warmup
min_max_quantile: 0.02
no_decay_enabled: false
pretrained_model: sd2-community/stable-diffusion-2-1
conv_in_init_mode: marigold
joint_weight_psnr: 1
joint_weight_ssim: 10
depth_ensemble_tol: 0.001
val_every_n_epochs: 10
depth_ensemble_size: 4
num_inference_steps: 25
save_every_n_epochs: 10
warmup_start_factor: 0.001
depth_multires_levels: 4
encoder_learning_rate: 0.000015
weight_decay_backbone: 0
weight_decay_no_decay: 0
depth_ensemble_max_res: 1024
gradient_checkpointing: true
depth_ensemble_max_iter: 2
depth_multires_strength: 0.9
joint_weight_delta1_pct: 0.5
keep_last_n_checkpoints: 3
depth_ensemble_reduction: median
depth_normalization_type: scale_shift_depth
joint_weight_abs_rel_pct: 0.5
weight_decay_depth_decoder: 0
checkpoint_selection_metric: "joint"  # "joint" or single metric: psnr/ssim/delta1/abs_rel/val/...
depth_decoder_learning_rate: 0.000036
gradient_accumulation_steps: 16
weight_decay_dehaze_decoder: 0
dehaze_decoder_learning_rate: 0.00003
checkpoint_selection_direction: "auto"  # used for single-metric selection: auto|max|min
depth_multires_downscale_factor: 2
depth_ensemble_regularizer_strength: 0.02
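With `batch_size: 1` and `gradient_accumulation_steps: 16` on a single RTX 4090, each optimizer step sees 16 samples. The sketch below shows one plausible reading of `lr_schedule_type: constant_with_warmup` together with `warmup_ratio: 0.05` and `warmup_start_factor: 0.001`. Under that reading, the learning rate ramps linearly from `0.001 × learning_rate` to `learning_rate` over the first 5% of optimizer steps and then stays flat. The training code is not shown on this page, so the ramp shape and the step counting are assumptions. `num_samples` is a hypothetical placeholder because the training-set size is not listed.

```python
import math


def optimizer_steps(num_samples: int, batch_size: int, grad_accum: int, num_epochs: int) -> int:
    """Optimizer updates over training, assuming a partial last accumulation window still steps."""
    micro_batches = math.ceil(num_samples / batch_size)
    steps_per_epoch = math.ceil(micro_batches / grad_accum)
    return steps_per_epoch * num_epochs


def lr_at(step: int, base_lr: float, total_steps: int, warmup_ratio: float, start_factor: float) -> float:
    """Assumed constant_with_warmup: linear ramp from start_factor * base_lr, then constant."""
    warmup_steps = max(1, round(warmup_ratio * total_steps))
    if step >= warmup_steps:
        return base_lr
    return base_lr * (start_factor + (1.0 - start_factor) * step / warmup_steps)


if __name__ == "__main__":
    batch_size, grad_accum = 1, 16  # from this launch's parameters
    num_samples = 1600              # HYPOTHETICAL: the real training-set size is not shown
    total = optimizer_steps(num_samples, batch_size, grad_accum, num_epochs=200)
    print("effective batch size:", batch_size * grad_accum)
    print("total optimizer steps:", total)
    for step in (0, total // 40, total // 20, total):
        print(step, lr_at(step, 3e-5, total, warmup_ratio=0.05, start_factor=0.001))
```

The per-group rates (`encoder_learning_rate`, `depth_decoder_learning_rate`, `dehaze_decoder_learning_rate`) may share the same scheduler. If so, each would follow this ramp with its own `base_lr`. That is an assumption about the training code.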
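The combination `depth_normalization_type: scale_shift_depth`, `min_max_quantile: 0.02`, `norm_min: -1`, `norm_max: 1` and `clip: true` appears to correspond to the affine-invariant normalization used by Marigold. In that scheme, depth is scaled and shifted so that its 2nd and 98th percentiles land at -1 and 1. The minimal sketch below works on a flat list of depth values. It assumes that `min_depth`/`max_depth` select the pixels used for the percentile statistics. How the actual training code treats invalid pixels is not visible here.

```python
import math


def quantile(values: list[float], q: float) -> float:
    """Linear-interpolation quantile (the same rule as NumPy's default method)."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def normalize_depth(
    depth: list[float],
    min_depth: float = 1e-5,   # min_depth (assumed validity threshold)
    max_depth: float = 300.0,  # max_depth (assumed validity threshold)
    q: float = 0.02,           # min_max_quantile
    norm_min: float = -1.0,
    norm_max: float = 1.0,
    clip: bool = True,
) -> list[float]:
    """Map the q and 1-q quantiles of the valid depths to norm_min and norm_max."""
    valid = [d for d in depth if min_depth <= d <= max_depth]
    if not valid:
        raise ValueError("no valid depth values")
    d_lo, d_hi = quantile(valid, q), quantile(valid, 1.0 - q)
    scale = (norm_max - norm_min) / max(d_hi - d_lo, 1e-8)  # guard against flat depth maps
    out = [(d - d_lo) * scale + norm_min for d in depth]
    if clip:
        out = [min(max(v, norm_min), norm_max) for v in out]
    return out
```

All pixels are normalized with statistics from the valid ones. Values outside the 2%/98% band are clipped to the range boundaries.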
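`depth_noise_type: annealed_multires` suggests the annealed multi-resolution (pyramid) noise used in Marigold's training. In that scheme, coarse noise layers are added to the standard Gaussian noise, and their weight shrinks as the timestep approaches zero. The simplified pure-Python sketch below works on a 2D grid and rests on several assumed mappings:

- `depth_multires_levels` is the number of coarse layers.
- `depth_multires_downscale_factor` is the per-level downscale.
- `depth_multires_strength` is the base weight, scaled by `t / T`.

Nearest-neighbour upsampling stands in for the bilinear upsampling in Marigold's reference code.

```python
import random
from typing import Optional


def _randn_grid(h: int, w: int, rng: random.Random) -> list[list[float]]:
    """h x w grid of standard-normal samples."""
    return [[rng.gauss(0.0, 1.0) for _ in range(w)] for _ in range(h)]


def _upsample_nearest(grid: list[list[float]], h: int, w: int) -> list[list[float]]:
    """Nearest-neighbour resize of `grid` to h x w (stand-in for bilinear)."""
    gh, gw = len(grid), len(grid[0])
    return [[grid[i * gh // h][j * gw // w] for j in range(w)] for i in range(h)]


def annealed_multires_noise(
    h: int,
    w: int,
    timestep: int,
    num_train_timesteps: int = 1000,
    strength: float = 0.9,      # depth_multires_strength (assumed mapping)
    levels: int = 4,            # depth_multires_levels (assumed mapping)
    downscale_factor: int = 2,  # depth_multires_downscale_factor (assumed mapping)
    rng: Optional[random.Random] = None,
) -> list[list[float]]:
    """Gaussian noise plus coarse noise layers whose weight is annealed by t / T."""
    rng = rng if rng is not None else random.Random()
    s = strength * timestep / num_train_timesteps  # s -> 0 as t -> 0: plain Gaussian noise
    noise = _randn_grid(h, w, rng)
    for i in range(1, levels + 1):
        lh = max(1, h // downscale_factor**i)
        lw = max(1, w // downscale_factor**i)
        coarse = _upsample_nearest(_randn_grid(lh, lw, rng), h, w)
        weight = s**i  # coarser levels get geometrically smaller weight
        for r in range(h):
            for c in range(w):
                noise[r][c] += weight * coarse[r][c]
    # Summing layers changes the variance, so rescale to unit standard deviation.
    flat = [v for row in noise for v in row]
    mean = sum(flat) / len(flat)
    std = (sum((v - mean) ** 2 for v in flat) / len(flat)) ** 0.5
    return [[v / std for v in row] for row in noise]
```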
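`image_size` is `[375, 1242]`, the native VKITTI2 resolution. Stable Diffusion's VAE maps an image to a latent grid 8× smaller per side, and the SD 2.1 UNet halves that grid three more times. Neither 375 nor 1242 divides evenly by 8 (375 / 8 = 46.875), so the images must be padded, cropped, or resized somewhere. Which of these the training code does is not visible on this page. The sketch below assumes padding to the next multiple of 64 (8 for the VAE × 8 for the UNet) and computes the padding amounts. `multiple=64` is a conservative assumption; 8 is the minimum the VAE needs.

```python
def pad_to_multiple(height: int, width: int, multiple: int = 64) -> tuple[int, int, int, int]:
    """Return (top, bottom, left, right) padding that makes both sides divisible by `multiple`.

    Extra rows/columns are split as evenly as possible; an odd pixel goes to bottom/right.
    """
    pad_h = (-height) % multiple
    pad_w = (-width) % multiple
    top, left = pad_h // 2, pad_w // 2
    return top, pad_h - top, left, pad_w - left


if __name__ == "__main__":
    h, w = 375, 1242  # image_size from this launch
    top, bottom, left, right = pad_to_multiple(h, w)
    ph, pw = h + top + bottom, w + left + right
    print("padding (top, bottom, left, right):", (top, bottom, left, right))
    print("padded size:", (ph, pw), "latent size:", (ph // 8, pw // 8))
```

For 375 × 1242, this adds 9 rows and 38 columns, giving a 384 × 1280 image and a 48 × 160 latent. The padded region would then need to be masked out of the losses and cropped away from predictions.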
Raw JSON
{
  "clip": "true",
  "seed": 42,
  "tracker": "wandb",
  "use_ema": "false",
  "gpu_type": "rtx_4090:1",
  "job_name": "sd_train_concat",
  "norm_max": 1,
  "norm_min": -1,
  "run_name": "joint-dehaze-depth-vkitti-mari",
  "tmp_size": "10G...
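The raw JSON stores `clip` and `use_ema` as the strings `"true"` and `"false"` rather than as JSON booleans, while numeric fields such as `seed` are real numbers. In Python, `bool("false")` is `True`, so a script that casts these fields naively would enable EMA despite `use_ema: false`. Whether the actual training script is affected depends on how it reads the parameters, which this page does not show. Below is a small sketch of a strict parser; `as_bool` is a hypothetical helper name.

```python
import json

_TRUE = {"true", "1", "yes", "on"}
_FALSE = {"false", "0", "no", "off"}


def as_bool(value: object) -> bool:
    """Strictly interpret a parameter as a boolean; raise instead of guessing."""
    if isinstance(value, bool):  # real JSON true/false
        return value
    if isinstance(value, int) and value in (0, 1):
        return bool(value)
    if isinstance(value, str):
        v = value.strip().lower()
        if v in _TRUE:
            return True
        if v in _FALSE:
            return False
    raise ValueError(f"cannot interpret {value!r} as a boolean")


if __name__ == "__main__":
    params = json.loads('{"clip": "true", "use_ema": "false", "seed": 42}')
    print(bool(params["use_ema"]))     # True: any non-empty string is truthy
    print(as_bool(params["use_ema"]))  # False
```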

Events

Launch Events (0)
No launch events recorded.
Euler View - ML Experiment Monitor