Michel wrote:
It would be nice to understand this for evaluation functions depending on more parameters. The model would be a "true" (unknown) evaluation function E(x1,...,xn) that predicts the correct w/l ratios (through the logistic function) from measurable parameters x1,...,xn, and a heuristic evaluation function H(x1,...,xn).
I guess that to a first approximation E and H can be assumed to be linear in x1,...,xn. The challenge is to match up the coefficients of H with those of E.
In a realistic analysis, draws should also be incorporated. The presence of draws may smooth out the effect of evaluation errors.
OK, I did some more thorough analyses. Below is a code snippet in R (Works On My Machine).
What I do is generate a 4-term evaluation function. Two terms are 0/1 variables generated by a Bernoulli (binomial size-1) process; the other two are standard normally distributed variables.
I fix the "true" eval parameters to be -0.6190392 0.4054651 0.1000000 -0.1500000 respectively. I then generate linear predictors (called "eta", as in the literature on generalized models). I encode this as 0, 0.5 and 1 depending into which section between the theta thresholds the linear predictor falls. I also fix the "true" theta parameters to -1.386294 1.386294 (this corresponds to a 60% draw rate).
After that, I fit two models: 1) Nonlinear Least Squares, using the continuous LHS (y1 in my program) that takes the values 0, 0.5 and 1, regressed on the eval features; 2) a Cumulative Link Model, using the three-level outcome as an ordered factor (y3 in my program is the corresponding one-hot encoding; clm() takes factor(y1) directly), regressed on the eval features while also fitting the draw threshold parameters. I extract the fitted coefficients and compute the linear predictors (the evals used during search).
It turns out that there is a perfect linear correlation between the two evals!! The NLS estimates are about 65% of the CLM estimates (roughly a third smaller), in order to "squeeze" the eval score towards the value of a draw. The CLM estimates need no such shrinkage because that model already has the theta parameters as generalized intercepts.
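Where the ~0.65 factor comes from can be seen with a quick back-of-the-envelope check (my own sketch, reusing the theta values above). Under the threshold model the expected score is E[y1 | eta] = 0.5*(plogis(eta - theta1) + plogis(eta - theta2)), which NLS approximates by plogis(c * eta); matching the slopes of the two curves at eta = 0 gives c = 4 * dlogis(theta):
Code:
theta = qlogis(.8)   # = 1.386294, the upper threshold; the lower one is -theta
# E[y1 | eta] has slope dlogis(theta) = .8 * .2 = 0.16 at eta = 0,
# while plogis(c * eta) has slope c * dlogis(0) = c / 4 there
4 * dlogis(theta)    # c = 0.64 from the local slope match
# averaged over a realistic range of eta the best-fitting c is slightly larger
eta_grid = seq(-1, 1, by = 0.01)
escore = 0.5 * (plogis(eta_grid + theta) + plogis(eta_grid - theta))
coef(lm(qlogis(escore) ~ 0 + eta_grid))   # ~0.65, matching the regression output below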
Code:
library(ordinal)
# make this reproducible
set.seed(47110815)
# generate N observations of 4 eval features:
# two Bernoulli 0/1 terms and two standard normal terms
N = 1e5
x = cbind(rbinom(N, 1, .5), rbinom(N, 1, .5), rnorm(N), rnorm(N))
(w_star = c(qlogis(.35), qlogis(.6), .1, -.15))
theta_star = c(qlogis(.2), qlogis(.8))
eta = as.vector(x %*% w_star)
# latent score = linear predictor + standard logistic noise
y_latent = eta + rlogis(N, 0, 1)
# 0/0.5/1 encoded
y1 = ifelse(y_latent > theta_star[2], 1,
            ifelse(y_latent > theta_star[1], 1/2, 0))
# one-hot encoding of length 3 (for reference; clm() below takes the factor directly)
y3 = apply(cbind(y1 == 0, y1 == 1/2, y1 == 1), c(1, 2), as.integer)
# Use R library functions to get the most accurate results
w0 = rep(0, length(w_star))
(nls1.est = nls(y1 ~ plogis(x %*% w), start = list(w = w0)))
(-logLik(nls1.est) / N)
(clm1.est = clm(factor(y1) ~ x))
(-logLik(clm1.est) / N)
nls1.eta = x %*% coef(nls1.est)
clm1.eta = x %*% coef(clm1.est)[-(1:2)]  # drop the two threshold coefficients
plot(clm1.eta, nls1.eta)
summary(lm(nls1.eta ~ clm1.eta))
Output:
Code:
> (nls1.est = nls(y1 ~ plogis(x %*% w), start = list(w = w0)))
Nonlinear regression model
model: y1 ~ plogis(x %*% w)
data: parent.frame()
      w1       w2       w3       w4 
-0.39761  0.25896  0.06154 -0.10136 
residual sum-of-squares: 9977
Number of iterations to convergence: 2
Achieved convergence tolerance: 5.337e-06
> (-logLik(nls1.est) / N)
'log Lik.' 0.2664951 (df=5)
> (clm1.est = clm(factor(y1) ~ x))
formula: factor(y1) ~ x
 link  threshold nobs  logLik    AIC       niter max.grad cond.H 
 logit flexible  1e+05 -94636.09 189284.17 6(0)  1.47e-12 1.1e+01
Coefficients:
      x1       x2       x3       x4 
-0.60832  0.40026  0.09369 -0.15488 
Threshold coefficients:
 0|0.5  0.5|1 
-1.380  1.387 
> (-logLik(clm1.est) / N)
'log Lik.' 0.9463609 (df=6)
>
> nls1.eta = x %*% coef(nls1.est)
> clm1.eta = x %*% coef(clm1.est)[-(1:2)]
>
> plot(clm1.eta, nls1.eta)
> summary(lm(nls1.eta ~ clm1.eta))
Call:
lm(formula = nls1.eta ~ clm1.eta)
Residuals:
       Min         1Q     Median         3Q        Max 
-0.0041136 -0.0010737  0.0000032  0.0010759  0.0034799 
Coefficients:
              Estimate Std. Error  t value Pr(>|t|)    
(Intercept) -1.467e-03  4.136e-06   -354.7   <2e-16 ***
clm1.eta     6.523e-01  9.861e-06  66150.5   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.001266 on 99998 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 4.376e+09 on 1 and 99998 DF, p-value: < 2.2e-16
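Note (my observation, not in the run above): the CLM estimates also land close to the true parameters, which can be checked directly against w_star and theta_star from the script:
Code:
rbind(true = w_star, clm = coef(clm1.est)[-(1:2)])
rbind(true = theta_star, clm = coef(clm1.est)[1:2])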
For completeness (code not shown), I also reproduced the results of the above R library functions with hand-written mean-squared-error and log-likelihood formulas, and the outcomes agreed perfectly.
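For readers who want to redo that cross-check, here is a minimal sketch of what it could look like (my reconstruction, not the original code; it reuses x, y1 and N from the script above):
Code:
# hand-written objectives, minimized with optim(); up to optimizer
# tolerance the results should match nls() and clm()
# 1) mean squared error of the NLS model
mse = function(w) mean((y1 - plogis(x %*% w))^2)
optim(rep(0, 4), mse, method = "BFGS")$par   # compare with coef(nls1.est)
# 2) negative log-likelihood of the cumulative link model
nll = function(p) {
  theta = p[1:2]; w = p[-(1:2)]
  eta = as.vector(x %*% w)
  p_loss = plogis(theta[1] - eta)
  p_win  = plogis(eta - theta[2])
  # guard against the thresholds crossing during the search
  p_draw = pmax(1 - p_loss - p_win, 1e-12)
  -sum(log(ifelse(y1 == 0, p_loss, ifelse(y1 == 1, p_win, p_draw))))
}
fit = optim(c(-1, 1, rep(0, 4)), nll, method = "BFGS")
fit$par        # compare with coef(clm1.est)
fit$value / N  # compare with -logLik(clm1.est) / N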