How to prototype a token-level confidence-weighted LLM ensemble
A step-by-step prototype that runs multiple LLMs in parallel, uses token-level confidence (logprobs/entropy) to weight and stitch their outputs, and aims to reproduce Sup AI's HLE gain (52.15% vs. 44.74%).
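The core idea in the summary above can be sketched in a few lines: each model's answer is scored by a confidence derived from its token logprobs (the mean token probability), and answers are aggregated by summed confidence. This is a minimal sketch, not Sup AI's actual method; the model names and stubbed outputs below are hypothetical stand-ins for parallel API calls that return logprobs.

```python
import math

# Hypothetical per-model outputs: (answer_text, per-token logprobs).
# In a real prototype these would come from parallel API calls to
# different providers, each requesting logprobs alongside the text.
candidates = {
    "model_a": ("Paris", [-0.05, -0.10]),
    "model_b": ("Lyon", [-1.20, -0.90]),
    "model_c": ("Paris", [-0.30, -0.20]),
}

def confidence(logprobs):
    """Mean token probability: exp of the average token logprob."""
    return math.exp(sum(logprobs) / len(logprobs))

# Weight each distinct answer by the summed confidence of the models
# that produced it, then select the highest-weighted answer.
weights = {}
for model, (answer, lps) in candidates.items():
    weights[answer] = weights.get(answer, 0.0) + confidence(lps)

best = max(weights, key=weights.get)
print(best)  # → Paris
```

Token-level stitching (splicing spans from different models mid-answer) adds alignment complexity on top of this; whole-answer voting weighted by confidence, as above, is the simplest starting point.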