mari-uncert 48986b3d
training failed
Output
pending 48986b3d
Waiting for SLURM job to start...
Failure Metadata
failed slurm.monitor SBATCH_RUNTIME_FAILED
Context JSON
{
  "slurmJobId": 61669783,
  "monitorMode": "slurm",
  "interactiveAllocationJobId": null
}

Datasets: 4

clear_root
Dataset real-drive-sim
Type rgb
Split full
Path /cluster/work/igp_psr/drothenpiele/data/rds/rgb.zip/

depth_root
Dataset real-drive-sim
Type depth
Split full
Path /cluster/work/igp_psr/drothenpiele/data/rds/depth_sky_2k.zip/

train_hazy_root
Type rgb
Split train
Path /cluster/work/igp_psr/drothenpiele/data/rds/foggy_rgb.zip/

val_hazy_root
Type rgb
Split val
Path /cluster/work/igp_psr/drothenpiele/data/rds/foggy_rgb.zip/

Execution Artifacts: 2
Published Exports
Launch-Owned Exports
These semantic handles are what downstream pipelines should resolve against. Auto-published exports come from config-template metadata captured on this launch.
Published
This launch does not publish any exports yet.
Raw Artifacts
Parameters: 100
Typed Parameters
Dataset clear_root real-drive-sim / full (4)
Dataset depth_root real-drive-sim / full (32)
Dataset train_hazy_root real-drive-sim / train (34)
Dataset val_hazy_root real-drive-sim / val (35)
Simple Parameters
clip true
seed 42
n_gpus 1
tracker wandb
use_ema false
gpu_type rtx_4090
job_name sd_train_concat
norm_max 1
norm_min -1
run_name joint-dehaze-depth-vkitti
tmp_size 10G
adam_8bit true
data_kind vkitti2
ema_decay 0.9999
ema_dtype float16
log_every 50
max_depth 2000
min_depth 0.00001
adam_beta1 0.9
adam_beta2 0.999
batch_size 1
image_size [512, 512]
log_images true
num_epochs 200
output_dir /cluster/scratch/drothenpiele/SD21/exp_1
time_limit 2-00:00:00
mem_per_cpu 8G
num_workers 4
adam_epsilon 1e-8
conditioning concat
env_activate source ~/.cache/venv/mari/bin/activate
lambda_depth 1
min_lr_ratio 0.01
project_name joint-dehazing
warmup_ratio 0.05
weight_decay 0.01
cpus_per_task 8
lambda_dehaze 1
learning_rate 0.00003
max_grad_norm 1
freeze_encoder false
num_log_images 4
val_batch_size 1
enable_xformers true
euler_train_dir /cluster/work/igp_psr/drothenpiele/data/out/train/sd21-joint
mixed_precision fp16
prediction_type v_prediction
depth_noise_type annealed_multires
lr_schedule_type constant_with_warmup
min_max_quantile 0.02
no_decay_enabled false
pretrained_model sd2-community/stable-diffusion-2-1
conv_in_init_mode marigold
joint_weight_psnr 1
joint_weight_ssim 10
lambda_visibility 0.05
depth_ensemble_tol 0.001
val_every_n_epochs 10
depth_ensemble_size 4
num_inference_steps 25
save_every_n_epochs 10
use_visibility_head true
warmup_start_factor 0.001
lambda_visibility_tv 0.005
depth_multires_levels 4
encoder_learning_rate 0.000015
visibility_rank_pairs 128
weight_decay_backbone 0.01
weight_decay_no_decay 0
depth_ensemble_max_res 1024
gradient_checkpointing true
lambda_visibility_rank 0.02
visibility_rank_margin 0.05
depth_ensemble_max_iter 2
depth_multires_strength 0.9
joint_weight_delta1_pct 0.5
keep_last_n_checkpoints 3
visibility_target_gamma 4
visibility_warmup_steps 1000
depth_ensemble_reduction median
depth_normalization_type scale_shift_depth
joint_weight_abs_rel_pct 0.5
num_inference_steps_depth 5
enable_efficient_attention true
lambda_visibility_preserve 0.05
visibility_hidden_channels 32
visibility_target_quantile 0.9
weight_decay_depth_decoder 0.01
checkpoint_selection_metric "joint" # "joint" or single metric: psnr/ssim/delta1/abs_rel/val/...
depth_decoder_learning_rate 0.000036
gradient_accumulation_steps 16
weight_decay_dehaze_decoder 0.01
dehaze_decoder_learning_rate 0.00003
checkpoint_selection_direction "auto" # used for single-metric selection: auto|max|min
depth_multires_downscale_factor 2
depth_ensemble_regularizer_strength 0.02
Raw JSON
{
  "clip": "true",
  "seed": 42,
  "n_gpus": "1",
  "tracker": "wandb",
  "use_ema": "false",
  "gpu_type": "rtx_4090",
  "job_name": "sd_train_concat",
  "norm_max": 1,
  "norm_min": -1,
  "run_name": "joint-dehaze-depth-vkitti",
  "tmp_s...

Events
Launch Events: 0
No launch events recorded.
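
The page reports only SBATCH_RUNTIME_FAILED and no launch events, so the SLURM accounting record for job 61669783 may be the next place to look. The following is a minimal sketch, assuming `sacct` is installed on the cluster login node and job accounting is enabled there. `build_sacct_command` is a hypothetical helper for illustration and is not part of the launcher.

```python
import json
import shlex

# Context JSON copied from the Failure Metadata panel on this page.
CONTEXT_JSON = """
{
  "slurmJobId": 61669783,
  "monitorMode": "slurm",
  "interactiveAllocationJobId": null
}
"""


def build_sacct_command(context: dict) -> list:
    """Build an sacct invocation for the job recorded in the launch context.

    Hypothetical helper: it only assembles the argument list and does not
    run anything, so it is safe to call off-cluster.
    """
    job_id = context.get("slurmJobId")
    if job_id is None:
        raise ValueError("launch context has no slurmJobId")
    return [
        "sacct",
        "-j", str(job_id),
        # Standard sacct fields: job state, exit code, and elapsed time.
        "--format=JobID,JobName,State,ExitCode,Elapsed",
        # Pipe-delimited output without a trailing delimiter.
        "--parsable2",
    ]


if __name__ == "__main__":
    context = json.loads(CONTEXT_JSON)
    # Print the command so it can be pasted into a shell on the cluster.
    print(shlex.join(build_sacct_command(context)))
```

Running the printed command on the cluster should show the state and exit code that SLURM recorded for each job step, which can help separate a scheduler-side failure from an error in the training script.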