Category | Metric | Definition | Direction
Timbre similarity | Resemblyzer similarity | Cosine similarity between target and generated Resemblyzer embeddings. | Higher is better
Melody preservation | F0 RMSE | Root-mean-square error between aligned source and generated F0 contours. | Lower is better
Melody preservation | F0 correlation | Pearson correlation between aligned source and generated F0 contours. | Higher is better
Naturalness | SingMOS naturalness | Mean SingMOS-Pro score across 5-second generated-audio chunks. | Higher is better
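The four metric definitions can be sketched in NumPy. This is a minimal sketch, not the evaluation code that produced the run results. It assumes the speaker embeddings are already computed (for example with Resemblyzer) and that the F0 contours are already time-aligned frame by frame with 0 marking unvoiced frames. Restricting RMSE and correlation to frames voiced in both contours is also an assumption. `score_chunk` is a hypothetical stand-in for a SingMOS-Pro predictor, and scoring the trailing partial chunk is likewise an assumption.

```python
from typing import Callable

import numpy as np


def cosine_similarity(target_emb, generated_emb) -> float:
    """Timbre similarity: cosine of the angle between two speaker embeddings."""
    a = np.asarray(target_emb, dtype=np.float64)
    b = np.asarray(generated_emb, dtype=np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def _voiced_pair(src_f0, gen_f0):
    """Return the frames where both aligned contours are voiced (F0 > 0).

    Assumption: alignment happened upstream, so both contours have equal length
    and 0 marks an unvoiced frame.
    """
    src = np.asarray(src_f0, dtype=np.float64)
    gen = np.asarray(gen_f0, dtype=np.float64)
    if src.shape != gen.shape:
        raise ValueError("F0 contours must be aligned to the same length")
    mask = (src > 0) & (gen > 0)
    return src[mask], gen[mask]


def f0_rmse(src_f0, gen_f0) -> float:
    """Root-mean-square error over jointly voiced frames, in the input's units."""
    src, gen = _voiced_pair(src_f0, gen_f0)
    return float(np.sqrt(np.mean((src - gen) ** 2)))


def f0_correlation(src_f0, gen_f0) -> float:
    """Pearson correlation over jointly voiced frames."""
    src, gen = _voiced_pair(src_f0, gen_f0)
    return float(np.corrcoef(src, gen)[0, 1])


def chunked_mean_score(
    audio, sample_rate: int, score_chunk: Callable, chunk_seconds: float = 5.0
) -> float:
    """Mean of a per-chunk score over consecutive fixed-length chunks.

    `score_chunk` is hypothetical (a placeholder for a SingMOS-Pro model).
    Assumption: the trailing partial chunk is scored as well.
    """
    audio = np.asarray(audio)
    step = int(round(chunk_seconds * sample_rate))
    chunks = [audio[i:i + step] for i in range(0, len(audio), step)]
    if not chunks:
        raise ValueError("empty audio")
    return float(np.mean([score_chunk(c) for c in chunks]))


if __name__ == "__main__":
    src = [0, 200, 210, 220, 0]
    gen = [0, 202, 208, 0, 230]
    print(f0_rmse(src, gen))  # only frames 1 and 2 are voiced in both
    print(cosine_similarity([1.0, 0.0, 1.0], [1.0, 0.0, 0.0]))
```

With `score_chunk=len`, a 12-sample signal at a 1 Hz sample rate splits into chunks of 5, 5 and 2 samples, so `chunked_mean_score` returns 4.0; that only exercises the chunking, it is not a quality score.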
Run A
  Checkpoint: ./checkpoints/models--Plachta--Seed-VC/snapshots/257283f9f41585055e8f858fba4fd044e5caed6e/DiT_seed_v2_uvit_whisper_base_f0_44k_bigvgan_pruned_ft_ema.pth
  Cache: 20260331-110657
  Generated: 37 of 37
  Manifests: metrics_manifest.json, results_manifest.json
  Timbre similarity: 0.7829 (rows scored: 37)
  F0 RMSE: 21.764 (rows scored: 37)
  F0 correlation: 0.9336 (rows scored: 37)
  SingMOS naturalness: 3.7580 (rows scored: 36)
Run B
  Checkpoint: runs/my_run-hydra-nfdsnflksdjflkds_2026-03-31_09-33-42/DiT_epoch_00055_step_09500.pth
  Cache: 20260331-105848
  Generated: 37 of 37
  Manifests: metrics_manifest.json, results_manifest.json
  Timbre similarity: 0.8903 (rows scored: 37)
  F0 RMSE: 23.197 (rows scored: 37)
  F0 correlation: 0.9220 (rows scored: 37)
  SingMOS naturalness: 3.5421 (rows scored: 36)
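Because the four metrics point in different directions, a small helper can make the Run A vs. Run B comparison explicit. This is a minimal sketch: the values are copied from the two run summaries, the direction flags come from the metric definitions, and the names `RUNS`, `HIGHER_IS_BETTER` and `compare` are illustrative only. It reports raw differences and says nothing about statistical significance.

```python
# Reported metric values for the two runs (copied from the run summaries).
RUNS = {
    "A": {"timbre": 0.7829, "f0_rmse": 21.764, "f0_corr": 0.9336, "singmos": 3.7580},
    "B": {"timbre": 0.8903, "f0_rmse": 23.197, "f0_corr": 0.9220, "singmos": 3.5421},
}

# Direction of improvement for each metric, from the metric definitions.
HIGHER_IS_BETTER = {"timbre": True, "f0_rmse": False, "f0_corr": True, "singmos": True}


def compare(a: dict, b: dict) -> dict:
    """Return {metric: (b - a, winner)} where winner is "A", "B" or "tie"."""
    result = {}
    for metric, higher in HIGHER_IS_BETTER.items():
        delta = b[metric] - a[metric]
        if delta == 0:
            winner = "tie"
        elif (delta > 0) == higher:
            winner = "B"  # B moved in the "better" direction
        else:
            winner = "A"
        result[metric] = (delta, winner)
    return result


if __name__ == "__main__":
    for metric, (delta, winner) in compare(RUNS["A"], RUNS["B"]).items():
        print(f"{metric}: B - A = {delta:+.4f}, better run: {winner}")
```

On these numbers Run B is ahead only on timbre similarity (+0.1074), while Run A is ahead on F0 RMSE, F0 correlation and SingMOS naturalness.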