Last Christmas, my father-in-law gave me a copy of Mancala (also known as Kalah or Kalaha). I've played it a few dozen times, but my 7-year-old daughter still beats me regularly!
So I wondered about coding up the rules in GDL to see how Sancho plays it. As it transpires, a quick look in the base repository revealed that somebody had already done this back in 2009. (This game isn't on Tiltyard rotation because it doesn't contain explicit definitions of the base and input propositions.)
Enthusiastically, I fired up Sancho and got it playing. Sadly, it only managed ~100 iterations/turn. (We've made some significant general improvements since then, but Kalaha is still an order of magnitude slower even than a complicated game like Chess.) At the time, I made some tweaks to the GDL which improved the speed approximately threefold, but that still wasn't enough for a good game.
So, it has sat on the back burner for nearly a year. Over the last couple of days, I've been digging into it in a bit more detail because I have some moderately revolutionary ideas for dramatic improvements. (Hopefully more in a further blog post.)
For now, I've produced an annotated version of the GDL which you may like to peruse. (You'll definitely want to understand the basic rules of the game first though; see the link above.)
Tuesday, December 9, 2014
Sunday, September 28, 2014
Caught in the tar pit
I have been a bit remiss in making regular posts recently, and as some may have noticed, Sancho has also been absent from Tiltyard for nearly a month.
The reason for this is that about 4 weeks ago I undertook a reworking of the MCTS tree's internal representation in Sancho, to reduce memory usage and remove some role asymmetry that didn't seem right. The catalyst for doing this was to lay the groundwork for an enhanced move-sequence search which will be the subject of a future post (at least if it works!). I expected it to take 3 or 4 days, but it soon turned out to be a sticky morass, from which escape was more difficult than envisaged!
The core of the change was to eliminate non-decision nodes in the tree (forced moves, forced noops, etc.), leaving only the nodes at which actual decisions have to be made. Since Sancho's tree considers choices for a single role at any one node (not joint moves), this meant that in most games (non-simultaneous-move games) decision layers were interposed with non-decision (forced noop) layers (at most one in N layers would have a decision, where there are N roles in the game). This was both wasteful of space (tree node allocation), and introduced role asymmetry because the state only changes every Nth layer of the tree (where a full joint move is implied by the choices in the preceding N layers).
In the new structure all non-decision nodes are eliminated, and (as a side-effect) an edge always leads between nodes of different game states. The semantics of the representation, however, should be equivalent.
Stabilizing this change took roughly the expected time, but unexpectedly it turned out to have a distinctly damaging impact on the quality of play in games that employed heuristics (primarily games with a piece heuristic active, such as breakthrough, or chess). The reason for this is not totally clear, but almost certainly has to do with the role asymmetry, which the original heuristic implementation was 'comfortable' with.
Since then I have been fiddling with the mechanisms by which heuristic values are blended into the MCTS tree, to find a modification that works well in the new structure. Over the past 3 weeks or so I have found various approaches that initially worked well in my tests with one game, only to find that they performed badly in another. The three games I have mostly been using to test (as they display somewhat different characteristics) were:
 Breakthrough (simple equal-value piece heuristic, fixed sum)
 Speed chess (variable-value piece heuristic, fixed sum)
 Skirmish (variable-value piece heuristic, non-fixed sum)
In particular I found that what worked well in Breakthrough (where material exchanges are typically somewhat binary in nature) didn't work well in the other two (and vice versa).
However, as of yesterday I am now getting positive test results (but much more testing remains to be done) in all of the above games, so hopefully this time the light at the end of the tunnel is not another oncoming train!
A ton of regression testing now needs to be done, for which I plan to use the following games:
 Reversi (uses a heuristic other than piece-based)
 Connect4 (go-to non-heuristic fixed-sum 2-player game)
 Max knights (puzzle)
 Sudoku (very different puzzle)
 Three player free for all (more than 2 players)
 Blocker (simultaneous turn)
 Pentago (non-alternating role choice sequence)
 Breakthrough with walls (heuristic and factored)
This will take a few more days, even if no problems are found. After that (I hope), normal service will be resumed...
Wednesday, August 20, 2014
Balanced diet
(Who ate all the pies?)
In the International General Game Playing Championship 2014, it struck me that many of the games that were played had a significant bias for one player (usually, but not always, the first). In an attempt to compensate, many games were played both ways round, often with the predictable outcome. This was unfortunate because it increased the amount of time required to play games, reduced the excitement (by having predictable outcomes) and effectively reduced the best-of-3 rounds to a best-of-one (two matches of a game with a predictable outcome, and then the decider, usually a game where the bias is unknown because it isn't played on Tiltyard and therefore statistics aren't available).
Whilst reading up on various abstract strategy games, I was introduced to the concept of the Pie rule. Based on the traditional method for children to divide a cake ("you cut, I'll choose"), it aims to balance two-player games by giving the second player the opportunity to swap places with the first player after the first move. This way, the first player has an incentive to play the most balanced initial move that he possibly can. (If he makes an opening move that is too strong, the second player will swap with him and get the strong position. If he plays an opening move that is too weak, the second player will leave him with it.)
In the last few days, I have created Pie-rule variants of 9-board Tic-Tac-Toe and Hex. (I've also created a visualization for regular Hex so that it can be played on Tiltyard.) It's early days yet, but I notice that the pie rule seems to be doing the trick for 9BTTT. In the first 16 matches played on the Tiltyard, 9 went with the first player and 7 with the second. Whilst it's still a small sample size, that's looking substantially more balanced than the regular version.
So, for the GDL authors out there (by which I suppose I mean Alex!), consider the Pie rule for creating balance in your game.
For everybody else, what games would you like to see a Pie rule variant of? Whilst I'm certainly not making any promises, if you post in the comments, I'll consider doing them.
Sunday, August 17, 2014
Goal Latches
Latches are properties of a game's state which, once they become true, are guaranteed to remain true throughout the remainder of the game (positively latched); or which, once they become false, are guaranteed to remain false (negatively latched). A 'property' in this sense can be any logical function of the game's base state; most typically we talk about the simplest case, which is the state of an individual base proposition.
Identifying latches can be useful in several ways:
 It may allow reduced computational effort in calculating state transitions (e.g. if a part of the propnet is effectively 'dark' due to a latch condition, then once that latch is detected, that part of the propnet can be left uncalculated in future state transitions).
 Similarly it may allow reduced computational effort in heuristic calculation
 If the latch constrains the achievable goals then it can be used to optimize search
This post is about the use Sancho makes of the last case, i.e. situations in which it detects that the achievable score range has been reduced by latching of one or more of the goal propositions (at most one for each role can become positively latched, in which case the score for that role is then fully determined in all possible paths, but several may become independently negatively latched, which serves to reduce the size of the set of achievable scores).
Latch detection might be analytic or heuristic in nature. I'll come back to heuristic latch detection in a future post; for this one I'm only considering analytic detection, though the use to which the result is put can be the same in either case.
Analytic latch detection is based on logical reasoning about the structure of the game. In Sancho's case this is done by analysis of the propnet, and currently we only identify fairly simple cases in which a base proposition feeds back to itself via logic of one of the following forms (where X is the base prop, and X' is its next value on state transition):
X' = X OR <other condition> (X is positively latched)
or
X' = X AND <other condition> (X is negatively latched, since once X is false it can never become true again)
Furthermore, if a goal proposition is of the form
G = X AND <other condition>, and X is negatively latched, then G is said to be negatively latched by X.
Similarly for the positive latch case:
G = X OR <other condition>
If we can identify such goal latches, we know that the set of achievable scores in any state of the game reachable from a state with latched goal propositions is either fully determined (positively latched goal), or is strictly reduced from the set of all possible goal values to some subset.
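As an illustration, the two self-feedback patterns can be recognized by a trivial scan over a toy transition encoding. This is just a sketch of the idea, not Sancho's actual analysis: the expression representation here is hypothetical and far simpler than a real propnet.

```python
def find_simple_latches(transitions):
    """transitions maps each base prop to a toy expression tree:
       ("or", a, b), ("and", a, b), ("not", e) or ("base", name)."""
    latches = {}
    for prop, expr in transitions.items():
        # X' = X OR <cond>: once true, X stays true (positive latch)
        if expr[0] == "or" and ("base", prop) in expr[1:]:
            latches[prop] = "positive"
        # X' = X AND <cond>: once false, X stays false (negative latch)
        elif expr[0] == "and" and ("base", prop) in expr[1:]:
            latches[prop] = "negative"
    return latches

# Tiny example: a failure flag that latches on, and a cell-empty flag
# that latches off once the cell is marked.
example = {
    "failed": ("or", ("base", "failed"), ("base", "stepped_off_path")),
    "cell_empty": ("and", ("base", "cell_empty"), ("not", ("base", "marked"))),
    "control": ("base", "other_control"),
}
```

A real implementation would of course work over the propnet's component graph rather than expression tuples, but the structural test is the same.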
In Sancho this surfaces as a method on the state machine which returns the achievable goal range for any given state, as [min,max]. This range can be used in the MCTS search tree processing in several ways:
 If min == max, then all possible paths result in the same score. If this occurs for our role then whatever we play from that point on can have no impact on the eventual outcome. Consequently we can treat any tree node in which this occurs as if it were terminal, and not search below it (the one exception being that if this occurs on the root node we expand one level just so that we have legal moves to choose from when asked to return a move)
 If a terminal state is encountered with a score of the max achievable, we can invoke tree trimming by propagating upwards when the role deciding on the move at this branch is the one with the max-achievable score (i.e. this amounts to a 'win' relative to the parent state, and no further searching is needed since it cannot be improved upon)
 If a move reduces the achievable range (either at the top or at the bottom) then we can trivially define a heuristic to favor search on paths that increase the min achievable score, and disfavor those that decrease the max achievable.
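The first of these uses can be sketched as follows; the class and function names here (Node, prune_if_determined, achievable_range) are illustrative stand-ins, not Sancho's actual API:

```python
class Node:
    """Minimal stand-in for an MCTS tree node."""
    def __init__(self, state, is_root=False):
        self.state = state
        self.is_root = is_root
        self.fixed_score = None

def prune_if_determined(node, achievable_range):
    """achievable_range(state) -> (min, max) from the latch analysis.
       If the range has collapsed to a single value, nothing played below
       this node can change the outcome, so treat it as terminal, except
       at the root, where one level of expansion is still needed so there
       are legal moves to return when asked."""
    lo, hi = achievable_range(node.state)
    if lo == hi and not node.is_root:
        node.fixed_score = lo   # outcome fully determined below this node
        return True             # caller should not search beneath it
    return False
```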
Because latch detection is typically very cheap (at search time, once the analysis has been done during metagaming) we can also use the change in achievable score as a heuristic during playouts, to guide otherwise random playouts along paths that tend to maximize the expected goal value, on the assumption (within the playout) that all achievable scores are equally likely. This can be done most simply by downweighting moves that decrease the choosing role's max score, and upweighting those that increase the min. In zero-sum games, upweighting moves that decrease the opponent roles' max scores, and downweighting those that increase their min scores, can also be applied. Currently we do this VERY crudely and handle the 'decrease weight' cases by preventing those moves being selected at all in the presence of any other choice!
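The crude scheme just described might look something like this; range_after() is a stand-in for the latch-based range query, and the function names are mine:

```python
import random

def choose_playout_move(moves, range_after, current_range, rng=random):
    """Pick a playout move: moves that lower the mover's max achievable
       score are excluded entirely when any alternative exists; moves that
       raise the min achievable score are preferred."""
    cur_min, cur_max = current_range
    keep, preferred = [], []
    for m in moves:
        lo, hi = range_after(m)
        if hi < cur_max:
            continue                 # gives up achievable score: exclude
        keep.append(m)
        if lo > cur_min:
            preferred.append(m)      # locks in a better guaranteed minimum
    return rng.choice(preferred or keep or moves)

# Hypothetical example: move "b" secures at least 50, "c" gives up the win.
demo_ranges = {"a": (0, 100), "b": (50, 100), "c": (0, 50)}
```

Note the double fallback: if every move reduces the max, the playout still has to play something, so we fall back to the full move list.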
Some examples help illustrate where these techniques give benefit, so I'll discuss a few in the following paragraphs.
The Untwisty Corridor Family
This family of games is basically just a set of test cases for latch detection. They are puzzles involving traversing a maze (logically speaking anyway; none of them actually has a visualization currently, so far as I am aware). Any step onto the wrong path sets a 'failure' proposition in the game's state, and the game terminates after a fixed number of moves. Consequently you must make the 'correct' choice for each of a fixed number of steps (in 'untwisty corridor' you need to go straight every move, hence the name). Because a wrong choice sets a latched proposition, which ultimately determines the goal values, this is a simple latched-goal case, where any wrong move immediately fully determines the final score (as 0). Because all such states are immediately treated as if they were terminal, and not searched, the effect is that the MCTS search only ever visits nodes one step off the correct branch, which reduces the size of the search space from exponential in the number of steps to linear.
In fact, the basic untwisty corridor is somewhat flawed, because all false choices lead to the SAME state (apart from the step counter), so it is trivially solved by transposition, provided the player maintains a transposition table of some sort. Furthermore, at only 6 steps, it is small enough to solve by brute force anyway! The game 'untwistycomplex2' is an attempt to address these issues, and a better test.
Escort Latch Breakthrough
The game Escort Latch Breakthrough is a variant of Breakthrough where each role has a king and 8 pawns. The goal is to get the King to the far end of the board. If both kings are captured the game is a draw. A typical MCTS player, without latch analysis, will usually move a pawn from in front of their king and then just advance the king right up to one rank before it can be captured by the opponent's pawns. At that point it's usually fairly trivial for the opponent to force capture it (kings cannot move backwards) and games between such players tend to end up as mutual king exchanges and therefore draws (for example, this game). The reason this happens is that the vulnerability of the king has very little impact on the ensemble of playout scores, because almost all playouts fail to bother capturing the kings, and MCTS convergence is slowed by the need to achieve convergence in subtrees following king capture.
With latch detection a king capture is seen as an achievable score range change (from [0,100] to [50,100] or [0,50] depending on which king is captured). This impacts the search in several ways:
 In states where one king has been captured, capturing the other is terminal, and can be propagated as if it were a win (even though it scores only 50), because it is the best achievable result
 Moves that capture/avoid capture of a king are heuristically selected more frequently, which speeds convergence
 Playouts are much more reasonable, since they do not include games in which a king is NOT captured when it could be (i.e. where a king just marches through a defending line of pawns without being molested). This dramatically improves the quality of the Monte Carlo samples generated by the playouts. This gain far outweighs the cost in reduced playout count that the evaluation requires (which is actually quite substantial, since all moves must be examined at each stage)
The result is that rather than optimistically advancing their king into danger, the player tends to hold it back, and work to capture its opponent's king first (see this game for example)
Friday, August 1, 2014
Puzzling it out
During the recent competition I promised to spill the beans on our Sudoku-enabling analysis, and other (mostly) puzzle-specific enhancements. Accordingly, this post will go through the more interesting techniques Sancho uses in puzzle solving (i.e. 1-player games, or pseudo-1-player games, which I will define later).
Plans
The first observation that can be made universally about 1-player (deterministic, complete-information) games is that the player is entirely in charge of its own fate, and doesn't have to worry about what some other entity might do. This means that any solution path seen will always remain valid, if followed from that point onward. We therefore trap any sight of a solution state and commit the path we saw to it as a 'plan', after which, when asked for a move, we simply read the next move off the plan (hence the very fast response that Sancho exhibits in puzzles once it solves them, since no further searching is required).
Opportunities for caching plans arise both during tree node expansion (when a child turns out to be terminal and a solution), and during Monte Carlo playouts (from which we return the full path played in 1-player games).
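The plan-caching idea can be sketched in a few lines; the class and method names here are hypothetical, and a real player would hook offer() into both of the opportunities just mentioned:

```python
class PlanCache:
    """Once any search path reaches a maximal-score terminal state in a
       deterministic 1-player game, record the move sequence and simply
       replay it thereafter, with no further search."""
    def __init__(self, max_score=100):
        self.plan = None          # queue of moves once a solution is seen
        self.max_score = max_score

    def offer(self, path, score):
        """Called whenever an expansion or playout reaches a terminal state."""
        if self.plan is None and score == self.max_score:
            self.plan = list(path)

    def next_move(self):
        """If a plan is cached, answer instantly; otherwise keep searching."""
        if self.plan:
            return self.plan.pop(0)
        return None
```

Note the subtlety the post goes on to discuss: offer() only caches a plan whose score is known to be maximal, which is exactly why Hunter (max achievable 87, not 100) defeats this scheme.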
For many of the typical puzzles encountered, a solution will be found during metagaming time, so as soon as the game manager starts asking for moves, we will be able to feed them back as fast as it can consume them. In the recent tournament we were able to find and cache plans during metagaming for several of the puzzles played.
We were unable to find plans for Hunter or HunterBig, though for entirely different reasons. In Hunter the maximum achievable score is actually 87, though this is not (readily) inferable from the rules; so although we find a maximum-scoring solution fast, we cannot be sure it is maximal, and so do not cache it (and instead continue to search normally each move). In HunterBig, repeat visits to a previously visited square are allowed (not terminal), which dramatically increases the size of the search space (and prevents greedy rollout cutoffs), so we simply fail to find an optimal solution at all.
Target States
In some games it is possible to identify the state you need to reach to achieve a solution by analysis of the propnet. Specifically, it is possible to generate the set of all such states by fixing the winning goal-value proposition to true and flowing backward through the propnet feeding that goal proposition, until base propositions are reached on all paths (encountering an OR [or an AND when the output is required to be false] causes the search to branch, generally leading to N solutions for an N-input OR).
For most games this is impractical since either the size of the set of solution states is extremely large (there are many possible solution states, only some of which may be reachable from the initial state of course), or the size of the set of interim propnet states encountered during the backward search blows up.
However, for some games it allows a state, or a small set of states, to be identified fairly easily. 8-puzzle (and the family of sliding block puzzles generally) is one such, where there is exactly one solution configuration by construction.
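The backward flow can be illustrated on a toy AND/OR expression; this is only a sketch of the branching behaviour described above (the tuple encoding is hypothetical, and a real propnet search would also have to stop at transitions):

```python
def solutions(expr, value=True):
    """Enumerate the partial base-prop assignments that force expr to take
       the required value. expr: ("and"|"or", a, b), ("not", e) or
       ("base", name). Yields dicts {base_name: required_value}."""
    op = expr[0]
    if op == "base":
        yield {expr[1]: value}
    elif op == "not":
        yield from solutions(expr[1], not value)
    elif (op == "and" and value) or (op == "or" and not value):
        # both inputs are forced: combine compatible pairs of sub-solutions
        for left in solutions(expr[1], value):
            for right in solutions(expr[2], value):
                if all(left.get(k, v) == v for k, v in right.items()):
                    merged = dict(left)
                    merged.update(right)
                    yield merged
    else:
        # an OR that must be true (or an AND that must be false) branches,
        # generally giving N solution families for an N-input OR
        yield from solutions(expr[1], value)
        yield from solutions(expr[2], value)

# "win if a is set, or if both b and c are set"
goal = ("or", ("base", "a"), ("and", ("base", "b"), ("base", "c")))
```

The blow-up mentioned in the next paragraph is visible here: every OR on a true path multiplies the number of branches.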
Given a target state and a measure of distance between states, simple A* search can be used to identify the optimal path (strictly, the distance measure must be what is known as an 'admissible heuristic', which basically means it never overestimates the number of moves needed to get between the two states). Using A* with a non-admissible heuristic will still work, but may examine many more intermediate states during the search, and may not find the shortest possible path if multiple paths exist.
In situations where our metagaming Monte Carlo simulation shows no differentiation in the scores of playouts (everything scores 0, or at least the same), and we can identify a target state (and a couple of other minor conditions hold, necessary for there to be a decent chance of having a half-decent distance-between-states metric), we attempt to solve using A* during metagaming. If we fail, we fall back on regular search. If we succeed, we cache the plan found.
The state distance heuristic we use is just the Hamming distance of the base proposition set (i.e. how many of the base propositions differ between the states). This is an admissible heuristic for 8-puzzle.
To reduce the search space further in such cases, we also strip out any base propositions which are independent of the moves made (i.e. the control logic, and in particular the step counter), which means that moves which return to a previous state (apart from the step counter, so a previous configuration of the tiles in 8-puzzle) are not considered. Strictly speaking, this leaves us solving only an approximation of the 'real' game, but it would require a fairly perverse goal specification for this to be a bad approximation. Note that we already had analysis to identify the control logic, as something similar is done to take those aspects out of the propnet prior to factorization attempts.
With the above, it typically takes us a little under a second to solve 8-puzzle.
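For concreteness, here is a minimal stand-alone A* for 8-puzzle using the Hamming heuristic (counting misplaced tiles rather than differing base propositions, which is the tile-level equivalent). The tuple state representation is mine, not the propnet encoding:

```python
import heapq

GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)   # 0 is the blank

def hamming(state):
    # number of misplaced tiles (blank not counted): admissible for 8-puzzle
    return sum(1 for i, v in enumerate(state) if v != 0 and v != GOAL[i])

def neighbours(state):
    b = state.index(0)                      # blank position
    r, c = divmod(b, 3)
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < 3 and 0 <= nc < 3:
            n = nr * 3 + nc
            s = list(state)
            s[b], s[n] = s[n], s[b]         # slide the adjacent tile
            yield tuple(s)

def astar(start):
    frontier = [(hamming(start), 0, start, [start])]
    best_g = {start: 0}
    while frontier:
        _, g, state, path = heapq.heappop(frontier)
        if state == GOAL:
            return path                     # optimal, as hamming is admissible
        for nxt in neighbours(state):
            if g + 1 < best_g.get(nxt, 1 << 30):
                best_g[nxt] = g + 1
                heapq.heappush(frontier,
                               (g + 1 + hamming(nxt), g + 1, nxt, path + [nxt]))
    return None
```

The same skeleton works for any game once a target state and an admissible distance metric are available; only hamming() and neighbours() are puzzle-specific here.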
Extending this to 15-puzzle is somewhat more of a challenge (and on my list of things to look into). In particular this will require:
 Switching from A* to Iterative Deepening A* (IDA*), almost certainly (to keep memory usage within reasonable bounds)
 Identification of a much better heuristic than Hamming distance. If anyone is interested in sliding block puzzles generally, a good summary of solution methods may be found here. Identifying Manhattan distance from propnet analysis (or statistical analysis of simplified games) should be pretty easy, but although it's considerably better than Hamming distance, Manhattan distance will not be good enough to solve 15-puzzle in reasonable time. I plan to look into attempts to create small pattern databases on the fly from solutions to subgames, but that's currently on the future-experiments list.
As an aside, I plan to experiment with removal of the control logic (for state comparison purposes) in puzzles generally. I suspect it would make the difference to finding a perfect solution in hunterbig for example.
Goal Latches
A goal proposition, g is said to be positively latched by a base proposition, b, if g can never become false once b becomes true. Similarly a goal proposition, g', is said to be negatively latched by a base proposition, b', if g' can never become true once b' becomes true.
Analysis of the propnet can often reveal such latches, in some cases via very simplistic analysis. If latched goals are identified, then any state in which a goal becomes positively latched can be considered terminal for search purposes (the goal value is fixed in all successor states, so there is no point in searching further). The same applies if the 'winning' goal becomes negatively latched (and you're only interested in 'wins').
Very simple analysis of this nature yields immediate solutions for simple games like a couple of those that featured (or tried to in one case) in the recent tournament:
 untwisty corridor (this has no visualization, but basically it's a 'maze' where you seal your fate if you step off a single safe path)
 untwistycomplex2, a much more subtle version of the above, which addresses some issues with the simple version (it is trivially solvable by transposition analysis, even if you know nothing about latches). Unfortunately the GDL they tried to use was broken. I have provided the Stanford guys with a corrected version (which Sancho duly solves), but as of this writing the Stanford repository has not been updated with it to my knowledge, so I cannot provide a link.
Move choice partitioning
Sudoku presents an interesting problem. No target state is identifiable (finding one would amount to knowing a solution, in fact!), the branching factor is huge, and no score signals are generated except by the (usually unique, but certainly very sparse) solution itself. For these reasons, any statistical search that doesn't somehow radically trim the search space is doomed from the outset.
Fortunately, the encoding of Sudoku makes the constraints implicit in the definition of the legals; that is, it is only legal to play moves that do not immediately break one of the no-duplicates-in-a-row-column-or-square constraints. Having these constraints implicit in the encoding makes the problem a lot easier than it would otherwise be.
Several things become apparent from fairly simple analysis of the propnet:
 The 100-score goal is only achievable if no cell is marked with an X in the terminal state (that is, you've managed to apply legal numeric moves to every cell). That is to say, some set X' of base props disables the solution, so in any solution state S, S ∩ X' = ∅
 All cell state base propositions apart from those that mark the cells with an X are themselves latches (that is to say, once a cell is numerically marked it can never become unmarked)
 All X-markings are negative latches (once unset they cannot become set)
 The moves upon which X-markings of cells are directly dependent (i.e. those moves that can be reached from an X-marking proposition by backward chaining through the propnet, without crossing a transition apart from that of the base prop concerned) form a partitioning of the set of all moves. The equivalence classes of this partitioning are exactly the moves which change a single cell. Because of the negative-latch nature of the X-markings, and the requirement that no X-marked cells remain in the solution state, at least one move must be played from each partition.
In any game which exhibits the above properties, in any given turn it is only necessary to consider choices from one of the partitions of the move space. The ordering is arbitrary, but search is reduced by choosing the most constrained (smallest intersection of partition and current legal moves) at any given tree node. In Sudoku this means we'll consider the possibilities for the cell with the fewest legal numeric options still available to mark it with (often a single choice) in each state encountered during the search. The result is actually a very small search tree!
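The most-constrained selection is simple to express; the partition structure and move names below are illustrative:

```python
def most_constrained_partition(partitions, legal_moves):
    """partitions: iterable of sets of moves (one per cell in the Sudoku
       case). Returns the non-empty intersection with the current legal
       moves that has the fewest remaining choices; search then considers
       only those moves at this node."""
    slices = [p & legal_moves for p in partitions]
    slices = [s for s in slices if s]       # ignore cells already decided
    return min(slices, key=len)

# Hypothetical move names: two cells, one of which has only a single
# legal marking left, so search considers that cell first.
parts = [{"mark_a1_1", "mark_a1_2", "mark_a1_3"},
         {"mark_b2_1", "mark_b2_2"}]
legal = {"mark_a1_2", "mark_a1_3", "mark_b2_1"}
```

With one cell down to a single legal marking, the branching factor at that node drops to 1, which is exactly why the resulting search tree is so small.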
We implement all of the above with what we call state machine filters, which act as indirection layers through which isTerminal(), getGoal() and getLegalMoves() calls pass (along with some latch-related queries which are not directly relevant here). Such a filter can reduce the set of legal moves that the underlying state machine generates and provide a subset to the search algorithm running above it (the most constrained partition in the above analysis, in the case of the move-partitioning filter). It can also modify terminality and goal values as seen by the search algorithm, which is necessary for other uses (which will be the subject of a future post) not relevant to the Sudoku solve.
In general, given game G with generated state machine S, the concept of state machine filters allows us to apply the results of analysis (typically during metagaming) to the state machine to transform it from S to S' for the purposes of search. Some filters preserve game semantics accurately (the move choice partitioning one does, for example), so G' = G even though S' != S. However, filters which transform the game itself (and thus have G' != G) are also of interest. Such a filter would cause you to solve an approximation of the actual game; this may be a good enough heuristic in itself, or may be part of a mechanism to apply 'strategic' guidance to a full search in G. By architecting the filters as a separate layer we can isolate the analysis and its use without requiring either the underlying state machine implementation to change, or the overlying search algorithm to change (once they are enabled to cope with the insertion of filters generically in the first place).
Pseudo Puzzles
Some (contrived!) N-player games are just N 1-player games bolted together (Dual Hunter, for example). In such cases it is not necessary to search the combined game tree, which includes the moves of the other players, since their board states are irrelevant to our goals.
Sancho detects this by a dependency analysis on the goal propositions of its role. If it finds that other roles are irrelevant (that is to say, their moves cannot impact our score), it entirely replaces the subgame in which such a role is involved with a subgame that has exactly one move, which has no impact on any state. If only its own role is left after replacing the irrelevant ones, it also flags the game as a pseudo-puzzle, which enables plan-caching. This is safe because the planned sequence cannot be impacted by anything the other roles do, and hence can be replayed regardless of their moves.
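The dependency analysis can be sketched as a reachability computation (hypothetical data shapes; Sancho works over its propositional network rather than dictionaries like these): collect everything our goal propositions transitively depend on, then see which roles' move inputs appear in that closure.

```python
def relevant_roles(dependencies, our_goals):
    """Return the roles whose moves can (transitively) affect our goals.

    dependencies: proposition -> iterable of things it depends on;
                  a role's move inputs are tagged as ("input", role).
    our_goals:    the goal propositions of our role.
    """
    seen, stack = set(), list(our_goals)
    while stack:
        p = stack.pop()
        if p in seen:
            continue
        seen.add(p)
        stack.extend(dependencies.get(p, ()))
    return {p[1] for p in seen if isinstance(p, tuple) and p[0] == "input"}

# Two 1-player games bolted together: each goal depends only on its own
# role's inputs, so the other role is irrelevant and its subgame can be
# replaced by a single no-op move.
deps = {
    "goal_alice": {"board_alice"},
    "board_alice": {("input", "alice")},
    "goal_bob": {"board_bob"},
    "board_bob": {("input", "bob")},
}
print(relevant_roles(deps, {"goal_alice"}))  # only 'alice' is relevant
```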
The net effect is that Sancho searches a greatly reduced space. Multi-knights-tour, which is two 10x10 knight's-tour puzzles bolted together in this way, was played on the second day of the 2014 championships, but sadly not in Sancho's bracket; it would have benefited significantly from this analysis.
A related case occurs in Chinook, which is essentially two games of checkers played simultaneously on the odd and even squares. In the Stanford version of Chinook (which is different to the base version found on Tiltyard), one player scores for captures they make on the even board, and the other for captures they make on the odd board. This actually makes the board your role does NOT score on totally irrelevant to your own goals. Since Sancho successfully factors Chinook into two independent games, it goes further in this case and doesn't bother searching one tree at all. Instead it just no-ops in that tree (which is rather unfortunate for its opponent, given the way they score in Chinook!). I should note that I think Stanford Chinook is NOT a good game for future tournaments unless the elective no-op is removed, and even then I think that base.Chinook is a better factorized-game test for competition purposes.
Thursday, July 31, 2014
Sancho is GGP Champion 2014!
At approximately 1:30am this morning (BST), Sancho was crowned General Game Playing Champion 2014. Sadly I wasn’t there to witness it because I was feeling unwell and had gone to bed. However, I was pleased to wake up to the good news this morning.
The final twelve players competed in the double-elimination phase last night. Sancho played against 3 of them, winning 2-1, 2-0 and 3-1.
[The links below are to visualizations of the matches. Several of them work best in Chrome. Use the right/left arrows to see how the match progressed.]
· Round 1 (best of 3) vs Galvanise, written by Richard, a fellow Edinburgher who won the 2nd Coursera GGP course
  - Loss in 9-board tic-tac-toe (not the best start), playing against the bias
  - Win in 9-board tic-tac-toe, playing with the bias
  - Win in Connect 4 (which is actually a provable win for the second player on this size of board, but the current crop of players show a small bias for the first player), playing first
· Round 2 (best of 3) vs QFWFQ, a competitor of ours in the 1st Coursera GGP course last autumn
  - Walkover in Hex
  - Win in Joint Connect 4 (two simultaneous games of Connect 4, where you have to get a line of 4 on the top board or force your opponent to make a line of 4 on the bottom board)
· Grand Final (best of 5) vs LeJoueur, a long-time GGP player written by Jean-Noël Vittaut, a researcher in machine learning at a university in Paris (I think)
  - Win in Hex after the opponent blundered a very strong position. (By my reckoning, he shouldn't have connected to the bottom at move 14 because there were still two options open. By connecting early, he lost a move at the top and never recovered.)
  - Loss in Breakthrough – our first in many months. I’m told this was a good game. (The aim is to get a pawn through to the other side. Pawns can move straight or diagonally, and can take diagonally only.) I haven't analysed this yet.
  - Win in a contrived combination of tic-tac-toe, chess and Connect 4, which we usually play badly, playing against the bias. I haven't looked at this yet.
  - Win in Joint Connect 4. This is a strong game for us because Sancho factors it into the two individual games (thereby massively reducing the branching factor).
We were disappointed not to get to play the reigning champion (TurboTurtle by Sam), who was eliminated in his first round, although we often get to play GreenShell (which we strongly suspect is the same player as TurboTurtle, though Sam has declined several opportunities to confirm) on the Tiltyard.
With many thanks to Bertrand, Ben, Eric & Todor for organising this event and thanks to all the other competitors for a good game.
We plan to be back in January 2015 (the AAAI conference is moving forward 6 months) and Steve hopes to be there in person. Polish up those players and come and show us how it should really be done!
More blog posts to follow in the coming week or so about some of what made Sancho the champion this time round.
Wednesday, July 30, 2014
Finals Day 1 roundup
[This article was really pitched at colleagues / family, so the GGP regulars will know most of the detail.]
Over 8 hours yesterday, our player, “Sancho”, played 18 matches and finished at the top of the table, more than a complete game clear of the next nearest opponent. We lost 1 game (where we were playing second in a game against a significant first-player bias) and dropped a few points here and there on some of the other games.
Here are the games we played (where you can see us in action, turn-by-turn, by pressing the left/right arrows).
· Alquerque – a non-zero-sum game. Capturing an opponent’s piece gets you 10 points. You play 10 moves each. Trading pieces is good in this stage of the tournament because everybody is involved in a match of this game, so a 90/100 loss is better than a 10/0 win.
· Breakthrough – try to get a pawn to the other side of the board. This ought to have been a real showcase game for us – but a technical glitch means our match wasn't saved, so the sample shown is some of our competitors playing.
· Pentago is like noughts and crosses, with a twist (literally). There are 4 boards in a grid. On each turn you make your mark and then twist a board. 5 in a row to win. We won this playing first and second (which is much harder).
· 9-board tic-tac-toe. The position you play on your turn defines the board that your opponent must play on in the next turn. Another game with a first-player bias; we won playing first, but lost playing second.
· Hex – connect the edges of the board. Your opponent is trying to do the same at right angles to you. This was played whilst I was sleeping – looks like a complete walkover.
· Chinook is two games of draughts played simultaneously (on the white and black squares). Points are scored for taking opponent pieces – but only on the board that’s the same colour as your opponent’s pieces. Not the best game, because you’re allowed to pass – meaning that you can force your opponent to score 0 (by never moving on the board he scores on) – which we duly did. The visualization only works in Chrome.
· Duidoku – a two-player game played on a Sudoku grid. But you aren’t trying to complete the grid. Instead, you’re trying to make it such that your opponent doesn’t have a legal play (i.e. can’t put any number in any cell without violating one of the Sudoku constraints). We hadn’t seen this before and, by the look of the results, it has a strong first-player bias. Since it was only played one way round, I suspect we got lucky here.
· And a contrived tic-tac-toe based game that isn’t worth describing (or looking at).
And here are the puzzles (1-player games) that we solved.
· A Hamiltonian cycle.
· Hunter – which is a Knight’s Tour on a board of a size where a complete tour isn’t possible.
· And the same on a much bigger board – where we didn’t manage an optimal solution, dropping 7 points.
· A sliding tiles puzzle, which we solved in the smallest possible number of moves.
· Sudoku, where we were one of just 2 players to solve it.
· Also three contrived puzzles with no useful visualization.
Player         Games   Points   Average
Sancho           18     1612      90
General          18     1478      82
QFWFQ            18     1472      82
Galvanise        18     1429      79
LeJoueur         18     1412      78
TurboTurtle      18     1356      75
Gamer            18     1325      74
Alloy            18     1300      72
Dumalion         18     1272      71
MINIPlayer       18     1230      68
Knower           18     1038      58
LICAgent         18     1027      57
-------- (top-twelve qualification cutoff) --------
QuorumPlayer     18     1017      57
MonkSaki         18      964      54
Valor            18      790      44
Ary              18      740      41
AIRush           18      491      27
This evening and into the early hours of tomorrow (UK time), the top 12 teams in the table above will compete in a double-elimination tournament to declare the 2014 champion.
Tuesday, July 29, 2014
Tired but happy with the first day
The first day of the 2014 competition consisted mostly of puzzles and non-fixed-sum 2-player games (with a couple of fixed-sum ones thrown in to leaven the mix).
These were a mixture of factorizable and non-factorizable games, with a few latch-optimizable examples thrown in for good measure.
The day was hard-fought, and we started off slowly, with most of our worst games happening early on – notably a game of 9-board tic-tac-toe as second player (it has a fairly strong first-player bias), which was our only complete loss of the day. (This and Pentago were both played with roles reversed to remove the bias.)
Things then started playing to our strengths – most significantly a Sudoku to solve, which leveraged some significant latch and dependency analysis we perform (the subject of a future post), and which only we and one other player were able to solve. By the time we came to a match of Hex, our direct competitor on the leaderboard was a player called 'General' (whose author has promised to post some descriptive information after the tournament, to which I'll edit in a link when I get it), and General was also our opponent at Hex. Fortunately for us, we were able to generate a much better sampling rate to feed our Monte-Carlo tree search (about 70K simulated games per turn, against only a couple of thousand), which enabled us to win reasonably easily. I should note that 70K playouts per 20-second turn would ordinarily be considered a pretty poor sampling rate, but Hex is notorious for being hard to simulate quickly.
At the end of the day we had scored a total of 1612 points over 19 games (from a possible total of 1887 given the games concerned, as one puzzle had a maximum possible score of 87), putting us comfortably in the top spot. This means that tomorrow we'll have the benefit of playing in the top bracket of the double-elimination phase, where it takes two match losses to eliminate us (a significant advantage for the top 4 players, who are seeded to that bracket).