Launch Train: V2 f81d5ea9
training failed
Output
pending f81d5ea9
Waiting for SLURM job to start...
Failure Metadata
failed interactive.monitor.poll INT_EXIT_CODE_NONZERO
Context JSON
{
"reason": null,
"exitCode": 255,
"remotePid": 616963,
"allocationJobId": 63568990
}
Datasets (4)
clear_root
Dataset real-drive-sim
Type rgb
Split full
Path /cluster/work/igp_psr/drothenpiele/data/rds/rgb.zip/

depth_root
Dataset real-drive-sim
Type depth
Split full
Path /cluster/work/igp_psr/drothenpiele/data/rds/depth.zip/

train_hazy_root
Type rgb
Split train
Path /cluster/work/igp_psr/drothenpiele/data/rds/foggy_rgb.zip/

val_hazy_root
Type rgb
Split val
Path /cluster/work/igp_psr/drothenpiele/data/rds/foggy_rgb.zip/

Execution Artifacts (2)
Published Exports
Launch-Owned Exports
These semantic handles are what downstream pipelines should resolve against. Auto-published exports come from config-template metadata captured on this launch.
Published
This launch does not publish any exports yet.
Raw Artifacts
Parameters (100)
Typed Parameters
Dataset clear_root real-drive-sim / full (4)
Dataset depth_root real-drive-sim / full (5)
Dataset train_hazy_root real-drive-sim / train (34)
Dataset val_hazy_root real-drive-sim / val (35)

Simple Parameters
clip true
seed 42
tracker wandb
use_ema false
gpu_type rtx_4090:1
job_name sd_train_concat
norm_max 1
norm_min -1
run_name joint-dehaze-metric
tmp_size 10G
adam_8bit false
data_kind real_drive_sim
ema_decay 0.9999
ema_dtype float32
log_every 10
max_depth 800
min_depth 0.00001
adam_beta1 0.9
adam_beta2 0.999
batch_size 20
depth_mode metric_log
image_size [384, 768]
log_images true
num_epochs 200
output_dir /cluster/scratch/drothenpiele/SD21/exp_1
time_limit 2-00:00:00
mem_per_cpu 8G
num_workers 4
adam_epsilon 1e-8
conditioning concat
lambda_depth 1
min_lr_ratio 0.01
project_name joint-dehazing
warmup_ratio 0.05
weight_decay 0
cpus_per_task 8
lambda_dehaze 1
learning_rate 0.00004
max_grad_norm 1
freeze_encoder false
num_log_images 4
val_batch_size 16
enable_xformers true
euler_train_dir /cluster/work/igp_psr/drothenpiele/data/out/train/sd21-joint
mixed_precision fp16
prediction_type v_prediction
depth_noise_type annealed_multires
lr_schedule_type constant_with_warmup
min_max_quantile 0.02
no_decay_enabled false
pretrained_model sd2-community/stable-diffusion-2-1
conv_in_init_mode marigold
joint_weight_psnr 1
joint_weight_ssim 10
lambda_visibility 0.05
depth_ensemble_tol 0.001
val_every_n_epochs 2
depth_ensemble_size 4
num_inference_steps 25
save_every_n_epochs 2
use_visibility_head true
warmup_start_factor 0.001
lambda_visibility_tv 0.005
depth_multires_levels 4
encoder_learning_rate 0.000015
visibility_rank_pairs 128
weight_decay_backbone 0
weight_decay_no_decay 0
depth_ensemble_max_res 1024
gradient_checkpointing false
lambda_visibility_rank 0.02
visibility_rank_margin 0.05
depth_ensemble_max_iter 2
depth_multires_strength 0.9
joint_weight_delta1_pct 0.5
keep_last_n_checkpoints 3
visibility_target_gamma 4
visibility_warmup_steps 1000
depth_ensemble_reduction median
depth_normalization_type log_depth
joint_weight_abs_rel_pct 0.5
num_inference_steps_depth 5
enable_efficient_attention true
lambda_visibility_preserve 0.05
visibility_hidden_channels 32
visibility_target_quantile 0.9
weight_decay_depth_decoder 0
checkpoint_selection_metric "joint" # "joint" or single metric: psnr/ssim/delta1/abs_rel/val/...
depth_decoder_learning_rate 0.00004
gradient_accumulation_steps 1
weight_decay_dehaze_decoder 0
dehaze_decoder_learning_rate 0.00004
visibility_decoder_grad_scale 1.0
checkpoint_selection_direction "auto" # used for single-metric selection: auto|max|min
depth_multires_downscale_factor 2
depth_ensemble_regularizer_strength 0.02

Raw JSON
{
"clip": "true",
"seed": 42,
"tracker": "wandb",
"use_ema": "false",
"gpu_type": "rtx_4090:1",
"job_name": "sd_train_concat",
"norm_max": 1,
"norm_min": -1,
"run_name": "joint-dehaze-metric",
"tmp_size": "10G",
"adam_...
Events
Launch Events (0)
No launch events recorded.