I've tried this a couple of times. Then I compared my evaluation with the two engines I used, and this was my conclusion: this works fine, but your evaluation needs to be similar to the mentor engine in philosophy and/or implementation. How about taking a strong engine and scoring all of the positions, then converting the scores into probabilities using the sigmoid function?
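The score-to-probability conversion could look like this (a minimal sketch; the scaling constant K is an assumption you'd fit to your own data, and 400 is the usual Elo-style scale):

```python
import math

def score_to_probability(cp_score, k=1.0):
    """Map a centipawn score to an expected win probability with a
    logistic (sigmoid) curve. k is a tunable scaling constant."""
    return 1.0 / (1.0 + 10.0 ** (-k * cp_score / 400.0))
```

With k = 1.0 a score of 0 maps to 0.5, and larger positive scores approach 1.0; you'd pick k so the curve best matches your mentor engine's scores against actual game results.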
Surprisingly, a run with Don D.'s similarity tool doesn't put these mentor engines at the top in similarity.
So the current implementation of the evaluation function will determine which mentor is best, and I'm starting to wonder about the maximum Elo this evaluation can reach (tuning parameters only). Improving an evaluation must be a cycle: modify the evaluation, then tune, and so on.
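The tune step of that cycle can be sketched as fitting the evaluation's weights so the sigmoid of its score matches the mentor's probabilities, e.g. by simple coordinate descent over a mean-squared error (all names and the toy linear evaluation here are hypothetical, not any particular engine's code):

```python
def sigmoid(cp, k=1.0):
    # Same centipawn-to-probability mapping discussed above.
    return 1.0 / (1.0 + 10.0 ** (-k * cp / 400.0))

def evaluate(features, weights):
    # Toy linear evaluation in centipawns.
    return sum(f * w for f, w in zip(features, weights))

def mse(positions, weights):
    # positions: list of (feature_vector, mentor_probability) pairs.
    return sum((sigmoid(evaluate(f, weights)) - p) ** 2
               for f, p in positions) / len(positions)

def tune(positions, weights, step=16.0, passes=40):
    """Coordinate descent: nudge each weight up/down, keep improvements,
    and halve the step size when a full pass yields no improvement."""
    best = mse(positions, weights)
    for _ in range(passes):
        improved = False
        for i in range(len(weights)):
            for delta in (step, -step):
                weights[i] += delta
                err = mse(positions, weights)
                if err < best:
                    best, improved = err, True
                else:
                    weights[i] -= delta  # revert a non-improving nudge
        if not improved:
            step /= 2.0
    return weights, best
```

After a tuning run like this converges, you modify the evaluation terms themselves and tune again, which is exactly the modify-then-tune loop described above.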