Interactive visualization of the Heteroscedastic GP Surrogate model
What this model does
A Gaussian Process (GP) is a prior over functions. Given a few observations, it infers a distribution over all possible functions consistent with those observations — giving both a mean prediction and uncertainty.
This model is heteroscedastic: the noise level varies across inputs rather than being a fixed constant. A second GP models the log-noise variance.
Five independent GPs run in parallel, one per Pareto objective: tokens τ, cost c, scaling s, quality q, and subjective score r.
Feature space
A package p is encoded as:
x = φ(p) ∈ ℝᵈ
φ concatenates one-hot model slot, binary skill presence, and one-hot prompt/template variants.
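As a rough illustration of the encoding, the sketch below builds φ(p) by concatenating the three parts named above. The vocabularies (MODELS, SKILLS, PROMPTS) and the dictionary layout of a package are hypothetical, since the actual slots and variants are not listed here.

```python
import numpy as np

# Hypothetical vocabularies; the real model slots, skills, and prompt
# variants are not given in the text.
MODELS = ["model-a", "model-b", "model-c"]
SKILLS = ["search", "code", "math"]
PROMPTS = ["terse", "verbose"]


def one_hot(value, vocab):
    """Return a vector with a single 1.0 at the index of `value`."""
    vec = np.zeros(len(vocab))
    vec[vocab.index(value)] = 1.0
    return vec


def phi(package):
    """Encode a package (assumed to be a dict) as x = phi(p) in R^d."""
    model_part = one_hot(package["model"], MODELS)
    skill_part = np.array(
        [1.0 if s in package["skills"] else 0.0 for s in SKILLS]
    )
    prompt_part = one_hot(package["prompt"], PROMPTS)
    return np.concatenate([model_part, skill_part, prompt_part])


x = phi({"model": "model-b", "skills": {"search", "math"}, "prompt": "terse"})
print(x)  # d = 3 + 3 + 2 = 8 under these assumed vocabularies
```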
Observation structure
yᵢ = (τ̄ᵢ, c̄ᵢ, sᵢ, q̄ᵢ, rᵢ)
Trials without a subjective score rᵢ contribute only the first four components, so the GP for r sees fewer training points than the other four.
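One way to handle the missing scores, sketched under the assumption that an absent rᵢ is stored as NaN, is to build a separate training set per objective. The trial values and the helper name training_set are made up for illustration.

```python
import numpy as np

# Hypothetical trials: columns are (tokens, cost, scaling, quality, subjective).
# NaN marks a trial with no subjective score (an assumed storage convention).
Y = np.array([
    [1200.0, 0.04, 0.9, 0.71, 4.0],
    [800.0, 0.02, 0.7, 0.64, np.nan],
    [1500.0, 0.06, 1.1, 0.80, 3.5],
])
X = np.arange(6.0).reshape(3, 2)  # placeholder feature vectors


def training_set(X, Y, objective):
    """Keep only the trials where the given objective column was observed."""
    observed = ~np.isnan(Y[:, objective])
    return X[observed], Y[observed, objective]


X_q, y_q = training_set(X, Y, 3)  # quality GP: all three trials
X_r, y_r = training_set(X, Y, 4)  # subjective GP: two trials
```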
RBF-ARD kernel
k(x,x') = σ²_f · exp(−½ (x−x')ᵀ Λ⁻¹ (x−x'))
Λ = diag(ℓ₁², …, ℓ_d²) gives each input dimension its own length-scale — automatic relevance determination. A short ℓ means that dimension strongly influences the output.
Kernel value k(0, x') as a function of distance x'. Wider ℓ → smoother functions.
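The kernel formula can be written in a few lines of NumPy. This is a minimal sketch; the function name rbf_ard and its argument layout are illustrative choices, not taken from the model's code.

```python
import numpy as np


def rbf_ard(A, B, lengthscales, sigma_f=1.0):
    """k(x, x') = sigma_f^2 * exp(-0.5 * sum_j ((x_j - x'_j) / l_j)^2).

    A is (n, d), B is (m, d), lengthscales is (d,). Returns an (n, m) matrix.
    """
    A = np.atleast_2d(A) / lengthscales
    B = np.atleast_2d(B) / lengthscales
    # Squared Euclidean distances between the rescaled rows.
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return sigma_f**2 * np.exp(-0.5 * np.maximum(sq, 0.0))


origin = np.array([[0.0, 0.0]])
steps = np.array([[1.0, 0.0], [0.0, 1.0]])  # unit step along each dimension
K = rbf_ard(origin, steps, lengthscales=np.array([1.0, 100.0]))
print(K)
```

With ℓ₁ = 1 and ℓ₂ = 100, a unit step along the first dimension drops the kernel value to exp(−½), while the same step along the second dimension leaves it close to 1. That is the sense in which a short length-scale marks a relevant dimension.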
Input-dependent noise (heteroscedastic)
g(x) ~ GP(0, k_ε(x,x'))
σ²(x) = exp(g(x))
Instead of a fixed noise floor, a second GP models log-variance. This lets the model be confident in low-noise regions and uncertain where observations are noisy.
Illustrative heteroscedastic noise: σ²(x) varies continuously across x.
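To make the log-variance construction concrete, the sketch below draws one sample g from a GP prior and exponentiates it. The kernel k_ε is assumed to be a one-dimensional RBF with unit length-scale, and the grid, seed, and jitter are arbitrary; in the actual model g would be inferred from data rather than sampled from the prior.

```python
import numpy as np


def rbf(a, b, ell=1.0, sigma_f=1.0):
    """One-dimensional RBF kernel, used here as an assumed form of k_eps."""
    diff = a[:, None] - b[None, :]
    return sigma_f**2 * np.exp(-0.5 * (diff / ell) ** 2)


rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 50)

# Small jitter keeps the Cholesky factorisation numerically stable.
K_eps = rbf(x, x) + 1e-6 * np.eye(len(x))
L = np.linalg.cholesky(K_eps)

g = L @ rng.standard_normal(len(x))  # one draw g ~ GP(0, k_eps)
noise_var = np.exp(g)                # sigma^2(x) = exp(g(x)), always positive
```

Modelling the logarithm means the variance stays positive for any value of g, which is why the second GP does not need a positivity constraint.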
Posterior mean & variance
μ_N(x*) = k*ᵀ [K + Σ]⁻¹ y
σ²_N(x*) = k(x*,x*) − k*ᵀ [K + Σ]⁻¹ k*
K is the N×N kernel matrix of training points. Σ = diag(σ²(x₁),…,σ²(x_N)) is the heteroscedastic noise matrix. k* is the vector of kernel values between x* and all training points.
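A minimal NumPy sketch of these two equations follows, using a Cholesky factorisation instead of an explicit inverse. The one-dimensional RBF kernel, the function names, and the toy data are assumptions for illustration; the noise variances are passed in directly rather than produced by the second GP.

```python
import numpy as np


def rbf(a, b, ell=1.0, sigma_f=1.0):
    """One-dimensional RBF kernel (assumed for this sketch)."""
    diff = a[:, None] - b[None, :]
    return sigma_f**2 * np.exp(-0.5 * (diff / ell) ** 2)


def gp_posterior(x_train, y, noise_var, x_star, ell=1.0, sigma_f=1.0):
    """Posterior mean and variance of the latent function at x_star."""
    K = rbf(x_train, x_train, ell, sigma_f)       # N x N
    k_star = rbf(x_train, x_star, ell, sigma_f)   # N x M
    L = np.linalg.cholesky(K + np.diag(noise_var))  # K + Sigma = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # [K + Sigma]^-1 y
    v = np.linalg.solve(L, k_star)
    mean = k_star.T @ alpha
    var = sigma_f**2 - np.sum(v**2, axis=0)  # k(x*,x*) - k*^T [K+Sigma]^-1 k*
    return mean, var


x_train = np.array([0.0, 1.0, 2.0])
y = np.array([0.5, -0.2, 0.3])
noise_var = np.array([1e-6, 1e-6, 1.0])  # the third observation is noisy
x_star = np.array([0.0, 2.0, 10.0])

mean, var = gp_posterior(x_train, y, noise_var, x_star)
lower = mean - 1.96 * np.sqrt(var)  # 95% credible band for the latent function
upper = mean + 1.96 * np.sqrt(var)
```

In this toy setup the posterior variance is close to zero at x = 0, where the observation is nearly noise-free, noticeably larger at x = 2, where the observation is noisy, and close to the prior variance σ²_f at x = 10, far from all data.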
Figure: GP posterior, showing the posterior mean, the 95% credible band, and the observations.
Proposer regime
With fewer than N₀ ≈ 10 trials the GP doesn't have enough data to be reliable. The proposer switches strategy based on trial count.
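The switch itself can be sketched as a simple threshold test. The text does not say what the proposer does below N₀, so uniform random sampling is used here purely as an assumed cold-start strategy; the function name propose and the acquisition callback are likewise hypothetical.

```python
import random

N0 = 10  # approximate threshold from the text


def propose(candidates, n_trials, acquisition, rng=random):
    """Pick the next package to evaluate.

    Below N0 trials: uniform random choice (an assumption, not from the source).
    From N0 onward: the candidate with the highest acquisition value.
    """
    if n_trials < N0:
        return rng.choice(candidates)
    return max(candidates, key=acquisition)


scores = {"pkg-a": 0.1, "pkg-b": 0.7, "pkg-c": 0.3}  # made-up acquisition values
names = list(scores)

early = propose(names, n_trials=3, acquisition=scores.get, rng=random.Random(0))
late = propose(names, n_trials=12, acquisition=scores.get)
```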
EHVI acquisition function
α_EHVI(x; D_N)
Expected Hypervolume Improvement measures how much a new candidate x is expected to increase the hypervolume dominated by the 5-dimensional Pareto frontier of (tokens, cost, scaling, quality, subjective) scores.
Once N ≥ N₀, the proposer picks whichever unevaluated package maximises EHVI — balancing exploitation (known good regions) with exploration (high uncertainty).
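The idea behind EHVI can be illustrated with a Monte Carlo estimate. The sketch below makes several simplifying assumptions that are not from the source: two objectives instead of five, both minimised, a fixed reference point, and independent Gaussian posteriors per objective (which matches the independent GPs described earlier). It is not necessarily how the proposer computes the quantity.

```python
import numpy as np


def hypervolume_2d(points, ref):
    """Area dominated by `points` and bounded by `ref` (both objectives minimised)."""
    pts = points[np.all(points < ref, axis=1)]
    if len(pts) == 0:
        return 0.0
    pts = pts[np.argsort(pts[:, 0])]  # sweep left to right
    hv, best_y = 0.0, ref[1]
    for px, py in pts:
        if py < best_y:  # dominated points add no area
            hv += (ref[0] - px) * (best_y - py)
            best_y = py
    return hv


def ehvi_mc(mean, std, front, ref, n_samples=2000, seed=0):
    """Monte Carlo estimate of expected hypervolume improvement for one candidate."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(mean, std, size=(n_samples, 2))
    base = hypervolume_2d(front, ref)
    gains = [
        max(hypervolume_2d(np.vstack([front, s]), ref) - base, 0.0)
        for s in samples
    ]
    return float(np.mean(gains))


front = np.array([[1.0, 3.0], [2.0, 2.0], [3.0, 1.0]])  # made-up Pareto front
ref = np.array([4.0, 4.0])

base_hv = hypervolume_2d(front, ref)
certain_gain = ehvi_mc(np.array([1.0, 1.0]), np.array([1e-9, 1e-9]), front, ref)
no_gain = ehvi_mc(np.array([3.5, 3.5]), np.array([1e-9, 1e-9]), front, ref)
uncertain = ehvi_mc(np.array([3.5, 3.5]), np.array([1.0, 1.0]), front, ref)
```

The last two calls use the same posterior mean. With almost no uncertainty the dominated candidate has zero expected improvement, while with a wider posterior some samples land beyond the front and the estimate becomes positive. This is how EHVI rewards exploration as well as exploitation.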