Symbolic: Status Report 2007.04.25

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Symbolic: Status Report 2007.04.25

Post by sje »

Symbolic: Status Report 2007.04.25

The problem with repetition detection in the Lisp source has been fixed. There were actually three separate bugs, each caused by too much of a desire for efficiency.

--------

I should have the first cognitive only version of Symbolic running on FICS by the end of the month. The FICS finger notes will identify the change in search. The changeover will be obvious as the program's rating will quickly drop into the 800 elo range. The ICC version with continue to use the modified traditional A/B searcher for the present.

Weak? Yes, at least for now. Late in coming? Agreed. But after its first cognitive search game and a nearly sure loss, it will have played more chess than did Bovinnik's Pioneer.

--------

There was further refinement of the TTState (tree traversal state) object definition that supports iterative (vs recursive) depth first traversal of a subtree. This work is motivated by the earlier reported failure of the ChessLisp interpreter's recursive descent evaluator when applied to trees that were a couple hundred of ply in depth. Most routines that did do recursive tree walks have been converted to use the TTState mechanism. An exception is the somewhat ornate function that generates the HTML text for the abbreviated variation map of the entire search tree for the (optional) tree/node report web page. I can live with this for now.

There's also a plain text version of the tree/node dump report. As the HTML version makes this somewhat redundant, I'll probably rip out the plain version.

--------

Tests show that moving the search tree cursor from one node to an adjacent node takes about 200 microseconds, a reasonable time given the extent of the cursor's internal data structures that need updating. Node creation takes much longer, about two milliseconds each with rather higher times for tactically complex positions. As mentioned earlier, the *average* node processing time is budgeted at 200 milliseconds and this is a sum that includes all visits to a node, not just the first one.

--------

In addition to the tree/node HTML report, Symbolic's cognitive search also (optionally) produces a story web page. This output is generated as the search progresses and is intended to be easily understandable by skilled human chess players. This in part to better enable feedback for diagnostic and improvement purposes. It is also a defense against those whom might make accusations of cloning or other skullduggery. One idea is to have the program produce the story web page after each search and then automatically copy it over to my web host for a near real time display for spectators of ICS play. Each story generated during an ICS game could have its URL output as a kibitz so that any observers would be able to quickly and easily review the particulars of the search. Ideally, the story HTML would be produced by a spare processor core so as not to impede the successive search.

At this time, the story output is a bit limited but is becoming more detailed as other work progresses. (I'm trying to re-use older code, at least the code that's worth the effort.) My thought is that every new node produces at a minimum a well labeled diagram output that includes, among other items, the reason for the creation of the node along with the plans and goals in effect. Later visits to the same node would replace the space consuming diagram with a local hyperlink. At the conclusion of the game, a master web page could be generated that has links to all of the separate search pages.

So the story HTML output or a single search could be many megabytes in size, even with each search limited to a few hundred nodes. Perhaps I might have to pay extra to the web host for an increased bandwidth allocation.

--------

To assist with testing and debugging the pattern instance production system and the plan/goal elaboration system, I'm working on a specialized mate attack/defense search. Using the results of an earlier exploration of genetic algorithms for mate threat move metrics, I've come up with a whole tree search that should fairly effectively produce proof solutions for mate problems. Here's an outline on how it works:

1) [Initialize] The search tree is initialized with a single node (the root, ply = zero) with the given problem position. The node, like all nodes at even ply, is marked as "IsFM" (is first mover active). Odd ply nodes are "IsSM" (second mover). The first mover player is given the challenge of finding a mate while the second mover is assigned the role of defender.

2) [Loop control] While search time remains and there's not too many nodes and there's no mate found and there's at least one move available for the attacker to try, step 3 (attack) and step 4 (defend) are repeated.

3) [Attack generation] The tree search cursor is set to the root position. The attacker scans the entire tree looking at each IsFM node and ensures that the mate attack metrics generation productions are run at these nodes. The metrics are somewhat complex and include many factors; I'll likely describe them at some later date. The attacker determines the most threatening untried move; if (rarely) none are available, the loop exits with attack fail status. Otherwise, a new IsSM node is created by executing the move and control passes to step 4 (defend) with the search tree cursor positioned at the newly created node.

4) [Defense generation] The defense tries to find a defense for threat variation. If there are no moves available, then there is either a draw or a checkmate. If it's a draw, the defense simply exits and control resumes at step 2. If it's a checkmate, then the defense works its way up the threat variation two ply at a time trying to find refutation moves at prior IsSM nodes. If the defense can't find these at any level, then a mate is recorded and control returns to step 2 with the whole search ending. Otherwise, if the defense can find a refuting move at any level in the threat variation, then it plays that move, generates a new IsFM node, and control returns to step 2 for further processing. Selection of defense moves is based on mate defense metrics that are complex but somewhat simpler than the attack metric productions.

In summary, the attacker looks at *all* the first mover nodes each iteration to produce a threat variation while the defender looks *only* at second mover nodes along the given threat variation. The attacker's result, if successful, is a proven mate solution.

How well should this search perform? We'll find out soon, I hope. I'm sure that the node count for a successful search will be far, far less than that of a traditional A/B searcher. Time to solution is a different question, though.

Some possible refinements:

* Use the transposition repository to eliminate redundant subtrees.

* Have the defense pick moves based on a simultaneous comparison of all defending nodes along the threat variation instead of working its way up towards the root.

* Have the attack pick moves with a positive bias towards recently explored regions of the tree.

* Have both sides generate more than a single node at a time; instead, try a single variation of an odd number of moves based on an earlier successful variation grown from a sibling or cousin node. (I call this the "tendril technique".)

* Employ a global history heuristic matrix along with reply tables to assist with move ordering.

* Use productions to construct a tactical mesh instance at each node to help with move selection. (A tactical mesh is a list of lists that represents attack/defend/pin/xray/trap/flight properties of a position along with a probable score and a move/variation suggestion. It's kind of like a cache entry for a static exchange evaluation or a quiescence search result and it's the first step towards a causality facility.) Refutation candidate moves that do not seriously perturb a tactical mesh instance can be either given lesser priority or ignored entirely. Additionally, counter refutation productions can use a tactical mesh to suggest moves to avoid previously successful refutations.

* Use a meta analysis feature that measures the ongoing success rate of the attack at different nodes by looking at the subtree sizes and relative progress. These measurements can then be used to apply the appropriate bias in threat variation selection.

* Get the plan/goal elaboration system working along with a library of mating plan templates. This is a lot of work but it has to be done sooner or later. I can take a source like BWTC and slog through the more complicated mates to come up with patterns and the corresponding plan templates. This involves many other issues but it's worth the effort as this approach best models human search behavior, at least as I understand it.

* Adjust the attack success criteria to include "significant material gain" as a secondary and alternative goal to "defender's swift and certain death". This would provide a transition mechanism to a more general cognitive search.

* Implement a simple learning facility. This requires that there be weights for different attack/defend productions along with a post search feedback scheme. The feedback can be done by using a traditional A/B searcher to supply an optimal proof PV and then using this to measure and adjust the relative merit of those productions used at the corresponding nodes in the cognitive search tree. Another approach towards a simple learner is to assign a weight to every knowledge source and optimize these via Monte Carlo techniques and a whole lot of processor time.

* Implement a general learning facility. The idea here is to use a feedback mechanism to help write new productions. This is a somewhat daunting problem as currently all the productions are handwritten and it's hard to see exactly how such design and coding tasks can be done by a program.

--------

Compared to David Wilkins' program Paradise, Symbolic's mate search will almost certainly produce node counts higher by an order of magnitude or more. This is the price of providing a proof vs providing a nearly sure result. However, Symbolic can be tuned to incorporate plausibility measurements in its productions than can reduce the node count -- at the costs of lesser certainty, lower node frequency, and added complexity. As such tuning probably represents a more human like approach, it's likely a good idea to try.

A further idea is to incorporate the above specialized mate search into an adjunct task to the main cognitive search and run this task on a spare processor core. The main obstacle here is the availability of sufficient memory; or more specifically, the availability of funds for pricey high speed FB-DIMM chips.

--------

Compelled by honesty, here are some comments on Symbolic's departures from a strictly cognitive model:

1) Symbolic's cognitive search, like a human player, uses an opening book. Unlike a human player, its memory is nearly unlimited and is perfect in its recall. Could an intentionally impaired opening book subsystem be used to better simulate human performance and method?

2) The program's opening book provides moves, but not suggested ideas or themes. How could these be implemented in a strictly cognitive model?

3) Tablebases are used by the program. A more human approach would involve implementing many, many endgame specific productions. I note with no surprise that Paradise was exempted from endgame problem attempts.

4) Some pattern recognizers and other functions are implemented as ChessLisp primitives instead of as more general explicit Lisp routines. (Perhaps the most complicated of these is the built in mate in three predicate.) Do all or even most strong human chess players have similar mental capabilities that operate reliably at the subconscious level?

--------

Trivia:

1) Symbolic's source has been entered entirely via the Dvorak keyboard layout. I re-learned typing five years ago after glucose control issues caused hand tremor problems that made Qwerty keyboarding impractical. In retrospect, I should have tried Dvorak decades earlier.

2) Symbolic is regularly tested on an Macintosh that dates from the mid 1990s. The machine is slow but still useful as its run time environment sometimes produces faults due to portability bugs. For some time I was also testing with a similar vintage AMD box running Linux. I had to junk it when processor cooling fan failed and the cooked CPU, even with a fan replacement, would only run for an hour or so at a time before departing for the Twilight Zone.

3) PGN (Portable Game Notation) was originally called "Paragraph Game Notation" when it appeared in my old program Spector. One of the motivations for its development was that its predecessor "Column Game Notation" (one full move per line) couldn't display games longer than 25 moves on the nine inch B/W display of my 1986 Macintosh. Using a paragraph format for the move text allowed much longer games to fit on the porthole screen. So the name starts with "Portable" instead of some word not starting with a "P" because I was too lazy to change all the "PGN" identifiers in the program source.

4) Symbolic's toolkit keeps track of all time values with a one microsecond resolution; Spector did only centisecond resolution due to the typical host hardware of the mid 1980s. Both programs will need fixes to continue to operate after the end of the Thompson Era in early 2037. Somehow I'm not too worried about this.

--------

More to come.
nczempin

Re: Symbolic: Status Report 2007.04.25

Post by nczempin »

sje wrote:Symbolic: Status Report 2007.04.25
...
Thanks for all that, it makes an interesting read.
Compared to David Wilkins' program Paradise, Symbolic's mate search will almost certainly produce node counts higher by an order of magnitude or more. This is the price of providing a proof vs providing a nearly sure result. However, Symbolic can be tuned to incorporate plausibility measurements in its productions than can reduce the node count -- at the costs of lesser certainty, lower node frequency, and added complexity. As such tuning probably represents a more human like approach, it's likely a good idea to try.
Finally someone mentions Paradise. Finally I can reveal the reason I named my program "Eden".
User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: Symbolic: Status Report 2007.04.25

Post by sje »

nczempin wrote:Finally someone mentions Paradise. Finally I can reveal the reason I named my program "Eden".
It what manner does Eden reference any of the ideas in Paradise?

I'm curious as although there have been references to Paradise in the bridge and go domains, there hasn't been much in the chess domain.

I've been a bit surprised at this as Wilkins' 1980 paper has seen a lot of readers, particularly after its inclusion in the second edition of _Chess Skill in Man and Machine_. I did read somewhere that some of the Deep Thought team once looked at incorporating some of the concepts from Paradise but decided against this. There have been a couple of chess programs written in Lisp, but from what I've seen, they're not much more than simple A/B searchers.

Of course it's possible to write a plans and patterns searcher in a language other than Lisp. In fact, Symbolic started out this way with a host of specialized C++ templates and classes for handling Lisp like data structures. This soon became too unwieldy for what I ultimately wanted to do and it was simpler to write a Lisp interpreter instead.
nczempin

Re: Symbolic: Status Report 2007.04.25

Post by nczempin »

sje wrote:
nczempin wrote:Finally someone mentions Paradise. Finally I can reveal the reason I named my program "Eden".
It what manner does Eden reference any of the ideas in Paradise?
In name only. I wanted to look at some of Paradise's ideas, but so far I have so much to learn in traditional chess programming, so I decided to concentrate on that first.
I'm curious as although there have been references to Paradise in the bridge and go domains, there hasn't been much in the chess domain.

I've been a bit surprised at this as Wilkins' 1980 paper has seen a lot of readers, particularly after its inclusion in the second edition of _Chess Skill in Man and Machine_. I did read somewhere that some of the Deep Thought team once looked at incorporating some of the concepts from Paradise but decided against this. There have been a couple of chess programs written in Lisp, but from what I've seen, they're not much more than simple A/B searchers.

Of course it's possible to write a plans and patterns searcher in a language other than Lisp. In fact, Symbolic started out this way with a host of specialized C++ templates and classes for handling Lisp like data structures. This soon became too unwieldy for what I ultimately wanted to do and it was simpler to write a Lisp interpreter instead.
Although my naivete regarding PhD etc. makes me blush (never worked out, didn't even get an undergraduate degree for various reasons), here's a life-changing post on usenet from 1989:

http://groups.google.de/group/rec.games ... a48169724b

Remember, I was a first-year undergraduate, so when the next day I had an email from Professor Wilkins in my inbox, I was extremely impressed with the internet, to say the least.

He wrote to the effect of "yes, it's sad that no-one picked up on it" etc.

Interestingly, I have checked CiteSeer recently, and there are far more references to the Paradise paper than there were in 1989 (okay, not hard considering there were 0 back then).

So, Paradise has always had a special place in my heart...
User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: Symbolic: Status Report 2007.04.25

Post by sje »

I have a suspicion that at least a few chess programmers have tried an advanced plans and patterns approach based in part on the ideas in Paradise. The trouble is simply that it's a lot of hard work with little in the way of quick results. It's just not feasible to graft a costly and relatively slow pattern recognition system and a companion production system into a traditional A/B searcher without giving up the high node count rate that the A/B searcher needs.

Also, it's just too easy to code a traditional A/B searcher. The basic negamax algorithm will fit on a single page and there's plenty of samples available. There's no easy equivalent for plans and pattern systems that are suitable for the chess domain. Decent C and C++ compilers are easily available, but that's not always been true of Lisp or Prolog interpreters.

--------

Perhaps a good starting point for adding symbolic processing to a traditional A/B searcher would be to implement a causality facility and then use it to augment and partially replace an old style quiescence search. (As I recall, Wilkins wrote and tested a Lisp causality facility before writing Paradise which used the earlier work.) Only a few programs (Kaissa, CAPS, Paradise) have publicly claimed an operating causality facility; perhaps there are several commercial chess programs that secretly have this feature.