DAG-MCTS
NonArchimedeanMachineLearning.DAGMCTSConfig — Type
DAGMCTSConfig

Configuration parameters for the DAG-MCTS optimizer.
Fields
- num_simulations::Int: Number of MCTS simulations to run per step
- exploration_constant::Float64: UCT exploration constant c (usually √2 ≈ 1.41)
- degree::Int: Degree for child polydisc generation (passed to the children function)
- value_transform::Function: Transform from loss to value (default: sigmoid_transform())
- persist_table::Bool: Whether to persist the transposition table across optimization steps
- selection_mode::SelectionMode: Strategy for selecting the next step (VisitCount, BestValue, or BestLoss)
- track_parents::Bool: Whether to retain parent pointers for debugging and verification
Design Decision (recorded for future experimentation)
The persist_table option allows experimenting with:
- persist_table=true: Reuse learned information across steps (may grow large)
- persist_table=false: Fresh search each step (like standard MCTS)
TODO: Add support for the following features from standard MCTS:
- strict mode for single-branch descent
- max_children limit
NonArchimedeanMachineLearning.DAGMCTSConfig — Method
DAGMCTSConfig(; kwargs...)

Create a DAG-MCTS configuration with sensible defaults.
Keyword Arguments
- num_simulations::Int=100: Number of simulations per step
- exploration_constant::Float64=1.41: UCT exploration constant
- degree::Int=1: Child generation degree
- value_transform::Function=sigmoid_transform(): Loss-to-value transformation (see sigmoid_transform, tanh_transform, negation_transform)
- persist_table::Bool=true: Whether to persist the transposition table across steps
- selection_mode::SelectionMode=VisitCount: Child selection strategy (VisitCount, BestValue, or BestLoss)
- track_parents::Bool=false: Whether to track parent pointers (needed for debug verification; off by default for performance)
NonArchimedeanMachineLearning.DAGMCTSNode — Type
DAGMCTSNode{S,T,N}

A node in the DAG-MCTS search graph.
Unlike standard MCTS nodes, DAG nodes can have multiple parents since the same polydisc state can be reached via different action sequences.
Fields
- polydisc::ValuationPolydisc{S,T,N}: The polydisc at this node
- parents::Vector{DAGMCTSNode{S,T,N}}: All parent nodes (can be multiple in a DAG)
- children::Vector{DAGMCTSNode{S,T,N}}: Child nodes indexed by action (1-based)
- visits::Int: Total visit count N(s) aggregated from all paths
- total_value::Float64: Sum of all values Q(s) backpropagated through this node
- is_expanded::Bool: Whether this node's children have been generated
- is_terminal::Bool: True if expansion produces no children (precision limit reached)
- is_solved::Bool: True if terminal, or expanded with all children solved
- proven_value::Float64: Exact value once solved (NaN if unsolved)
- unsolved_children_count::Int: Number of children not yet marked solved
Design Decision (recorded for future experimentation)
We track all parents in a vector rather than storing no parent information at all. This is more expensive but useful for analysis and debugging. The alternative would be to store no parent information and rely purely on the explicit path maintained during traversal.
NonArchimedeanMachineLearning.DAGMCTSNode — Method
DAGMCTSNode(polydisc::ValuationPolydisc{S,T,N}) where {S,T,N}

Create a new DAG-MCTS node with the given polydisc and no parents.
NonArchimedeanMachineLearning.DAGMCTSState — Type
DAGMCTSState{S,T,N}

State maintained across DAG-MCTS optimization steps.
Fields
- root::DAGMCTSNode{S,T,N}: The current root node of the search graph
- transposition_table::Dict{HashedPolydisc{S,T,N}, DAGMCTSNode{S,T,N}}: Global table mapping hashed polydisc states to node instances
- step_count::Int: Number of optimization steps taken
- best_node::Union{DAGMCTSNode{S,T,N}, Nothing}: Running tracker of the best node by average value
- best_value::Float64: Best average value seen so far
- best_root_child::Union{DAGMCTSNode{S,T,N}, Nothing}: Which direct child of root leads to best_node
- best_root_action::Int: Action index of best_root_child
- min_loss_node::Union{DAGMCTSNode{S,T,N}, Nothing}: Node with minimum raw loss evaluation
- min_loss::Float64: Minimum raw loss seen so far
- min_loss_root_child::Union{DAGMCTSNode{S,T,N}, Nothing}: Which direct child of root leads to min_loss_node
NonArchimedeanMachineLearning.average_value — Method
average_value(node::DAGMCTSNode)

Compute the average value Q(s)/N(s) of a node. Returns 0.0 if the node has not been visited.
NonArchimedeanMachineLearning.backpropagate! — Function
backpropagate!(path::Vector{<:DAGMCTSNode}, value::Float64)

Backpropagate a value through all nodes in the explicit path.
This is the "Stack Method" for DAG backpropagation - we iterate through the specific path taken during this traversal rather than following parent pointers (which would be ambiguous in a DAG).
Arguments
- path: Vector of nodes from root to leaf representing this traversal's path
- value: The transformed value to backpropagate
- eval_node: The leaf node where the loss was evaluated
- loss_value: The raw loss value at the leaf (before value_transform)
Note
This updates the global N(s) and Q(s) statistics in each shared node instance.
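The "Stack Method" can be sketched in a few lines. The struct below is a hypothetical simplification of DAGMCTSNode, keeping only the two statistics that backpropagation touches:

```julia
# Hypothetical simplified node: only the statistics backpropagation updates.
mutable struct PathNode
    visits::Int          # N(s)
    total_value::Float64 # Q(s)
end

# Walk the explicit path recorded during this traversal; parent pointers
# are never followed, since they would be ambiguous in a DAG.
function backprop_sketch!(path::Vector{PathNode}, value::Float64)
    for node in path
        node.visits += 1
        node.total_value += value
    end
end

root, mid, leaf = PathNode(0, 0.0), PathNode(0, 0.0), PathNode(0, 0.0)
backprop_sketch!([root, mid, leaf], 0.8)
```

Because shared node instances are updated in place, a node reached through two different action sequences accumulates visits and value from both paths.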
NonArchimedeanMachineLearning.check_solved! — Method
check_solved!(node::DAGMCTSNode)

Check whether a DAG node should be marked as solved. A node is solved when it is expanded AND all of its children are solved. (DAG-MCTS always generates all children, so is_expanded implies fully expanded.)
Sets proven_value to the max of children's proven_values (single-agent optimization).
Returns true if the node was newly marked as solved.
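The solved rule above can be illustrated with a minimal sketch; SolvedInfo is a hypothetical simplified record, not the real node type:

```julia
# A solved node carries an exact value; NaN marks "not yet solved".
struct SolvedInfo
    is_solved::Bool
    proven_value::Float64
end

# A node is solved once ALL children are solved; its proven value is the
# max over children (single-agent maximization).
function solve_rule(children::Vector{SolvedInfo})
    all(c -> c.is_solved, children) || return SolvedInfo(false, NaN)
    SolvedInfo(true, maximum(c.proven_value for c in children))
end
```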
NonArchimedeanMachineLearning.dag_mcts_descent — Method
dag_mcts_descent(loss::Loss, param::ValuationPolydisc{S,T,N}, state::DAGMCTSState{S,T,N}, config::DAGMCTSConfig) where {S,T,N}

Perform one step of DAG-MCTS optimization.
This function follows the same interface as the other optimizers (greedy_descent, mcts_descent, etc.), making it compatible with OptimSetup.
Arguments
- loss::Loss: The loss function structure
- param::ValuationPolydisc{S,T,N}: Current parameter values
- state::DAGMCTSState{S,T,N}: DAG-MCTS state (includes the transposition table)
- config::DAGMCTSConfig: Configuration parameters
Returns
Tuple{ValuationPolydisc{S,T,N}, DAGMCTSState{S,T,N}, Bool}: New parameters, updated state, and convergence status
NonArchimedeanMachineLearning.dag_mcts_descent_init — Method
dag_mcts_descent_init(param::ValuationPolydisc{S,T,N}, loss::Loss, config::DAGMCTSConfig=DAGMCTSConfig()) where {S,T,N}

Initialize an optimization setup for DAG-MCTS.
Arguments
- param::ValuationPolydisc{S,T,N}: Initial parameter values
- loss::Loss: The loss function structure
- config::DAGMCTSConfig: DAG-MCTS configuration (uses defaults if not provided)
Returns
OptimSetup: Configured optimization setup for DAG-MCTS
Example
# Set up DAG-MCTS optimizer with persistent transposition table
config = DAGMCTSConfig(num_simulations=200, persist_table=true)
optim = dag_mcts_descent_init(param, loss, config)

# Run optimization
for i in 1:100
    step!(optim)
    if i % 10 == 0
        println("Step $i: Loss = ", eval_loss(optim))
        println("  Unique nodes: ", length(optim.state.transposition_table))
    end
end

NonArchimedeanMachineLearning.dag_mcts_search — Method
dag_mcts_search(root::DAGMCTSNode{S,T,N}, table::Dict, loss::Loss, config::DAGMCTSConfig) where {S,T,N}

Run DAG-MCTS from a root node and return the best child.
Performs config.num_simulations iterations of DAG-MCTS. Returns the best child according to the configured selection mode.
Returns
Tuple of (best_polydisc, best_node, converged).
NonArchimedeanMachineLearning.dag_mcts_simulation! — Method
dag_mcts_simulation!(root::DAGMCTSNode{S,T,N}, table::Dict, loss::Loss, config::DAGMCTSConfig) where {S,T,N}

Perform one complete DAG-MCTS simulation.
The four phases of MCTS adapted for DAG:
- Selection: Follow UCT to a leaf, maintaining explicit path stack
- Expansion: Expand leaf using transposition table for deduplication
- Evaluation: Compute value at the leaf (or a child)
- Backpropagation: Update statistics along the explicit path
Returns
Float64: The value obtained from this simulation
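The four phases compose as sketched below on a toy state space (integer states, children 2k and 2k+1, depth-limited). Everything here is a hypothetical stand-in for the real polydisc machinery, and the toy omits the transposition table:

```julia
mutable struct ToyNode
    state::Int
    visits::Int
    total_value::Float64
    children::Vector{ToyNode}
    expanded::Bool
end
ToyNode(s::Int) = ToyNode(s, 0, 0.0, ToyNode[], false)

toy_evaluate(state::Int) = 1.0 / state  # stand-in for loss + value_transform

function simulate!(root::ToyNode, c::Float64)
    # 1. Selection: UCT descent, recording an explicit path stack.
    path = ToyNode[root]
    node = root
    while node.expanded && !isempty(node.children)
        node = argmax(ch -> ch.visits == 0 ? Inf :
                            ch.total_value / ch.visits +
                            c * sqrt(log(node.visits) / ch.visits),
                      node.children)
        push!(path, node)
    end
    # 2. Expansion: generate all children (depth-limited for the toy).
    if !node.expanded
        node.expanded = true
        node.state < 8 && push!(node.children,
                                ToyNode(2node.state), ToyNode(2node.state + 1))
    end
    # 3. Evaluation at the leaf.
    value = toy_evaluate(node.state)
    # 4. Backpropagation along the explicit path only.
    for n in path
        n.visits += 1
        n.total_value += value
    end
    value
end

root = ToyNode(1)
for _ in 1:20
    simulate!(root, 1.41)
end
```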
NonArchimedeanMachineLearning.evaluate_node — Method
evaluate_node(node::DAGMCTSNode{S,T,N}, loss::Loss, config::DAGMCTSConfig) where {S,T,N}

Evaluate a node using the loss function and transform to value.
Returns
Float64: The transformed value (higher is better)
NonArchimedeanMachineLearning.expand_node! — Method
expand_node!(node::DAGMCTSNode{S,T,N}, table::Dict, config::DAGMCTSConfig) where {S,T,N}

Expand a node by generating its child polydiscs and linking them in the DAG.
Uses the transposition table to detect if any child polydisc already exists, linking to the existing node if so (the "Lookup & Link" step).
Arguments
- node: The node to expand
- table: The transposition table
- config: Configuration parameters
Returns
Nothing (modifies node.children in place)
NonArchimedeanMachineLearning.find_best_node_in_dag — Method
find_best_node_in_dag(root::DAGMCTSNode, table::Dict)

Recursively find the node with the best average value in the entire DAG.
Returns the node with highest average_value, considering only visited nodes.
NonArchimedeanMachineLearning.get_dag_stats — Method
get_dag_stats(state::DAGMCTSState)

Get statistics about the DAG structure.
Returns
NamedTuple with:
- unique_nodes: Number of unique nodes in the transposition table
- total_visits: Sum of all visit counts
- multi_parent_nodes: Number of nodes with multiple parents (true transpositions)
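Computing these statistics is a single pass over the table values; a sketch with a hypothetical minimal node type (integer keys stand in for HashedPolydisc):

```julia
mutable struct StatNode
    visits::Int
    parents::Vector{StatNode}
end

function dag_stats_sketch(table::Dict{Int,StatNode})
    nodes = collect(values(table))
    (unique_nodes = length(nodes),
     total_visits = sum(n -> n.visits, nodes; init = 0),
     # A node with more than one parent was reached via more than one
     # action sequence: a true transposition.
     multi_parent_nodes = count(n -> length(n.parents) > 1, nodes))
end
```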
NonArchimedeanMachineLearning.get_or_create_node! — Method
get_or_create_node!(table::Dict, polydisc::ValuationPolydisc{S,T,N}, parent::Union{DAGMCTSNode{S,T,N}, Nothing}=nothing) where {S,T,N}

Look up a polydisc in the transposition table, creating a new node if not found.
This is the core "Lookup & Link" operation for DAG-MCTS:
- Wrap the polydisc in HashedPolydisc (computes the hash once)
- Check whether the hashed polydisc is in the transposition table via get! (a single hash probe)
- If found: retrieve the existing node instance; if not found: create a new node
- Link the parent if provided
Arguments
- table: The transposition table (keyed by HashedPolydisc)
- polydisc: The polydisc state to look up
- parent: Optional parent node to link
Returns
DAGMCTSNode{S,T,N}: The node for this polydisc (existing or newly created)
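The "Lookup & Link" steps can be sketched with Julia's get!, whose do-block form performs exactly one hash probe. Integer keys stand in for HashedPolydisc, and LinkNode is a hypothetical simplified node type:

```julia
mutable struct LinkNode
    key::Int
    parents::Vector{LinkNode}
end

function get_or_create_sketch!(table::Dict{Int,LinkNode}, key::Int,
                               parent::Union{LinkNode,Nothing}=nothing)
    # get! returns the existing node, or inserts (and returns) a fresh one
    # built by the do-block: a single probe covers lookup and creation.
    node = get!(table, key) do
        LinkNode(key, LinkNode[])
    end
    # Link the parent; a second distinct parent marks a true transposition.
    if parent !== nothing && !any(p -> p === parent, node.parents)
        push!(node.parents, parent)
    end
    node
end
```

Looking up the same key twice returns the identical node instance, which is what lets statistics from different action sequences accumulate in one place.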
NonArchimedeanMachineLearning.print_dag_stats — Function
print_dag_stats(state::DAGMCTSState, max_depth::Int=3)

Print statistics about the DAG-MCTS structure for debugging.
NonArchimedeanMachineLearning.propagate_solved_up_dag! — Method
propagate_solved_up_dag!(node::DAGMCTSNode, path::Vector{<:DAGMCTSNode})

Propagate solved status upward in the DAG after a node has been marked solved.
When parent pointers are available (track_parents=true), uses BFS to notify ALL parents across the DAG. When parent pointers are unavailable, propagates only along the explicit simulation path (pragmatic fallback — some parents may not be notified immediately, but will catch up in future simulations).
NonArchimedeanMachineLearning.rebuild_table_from_subtree! — Method
rebuild_table_from_subtree!(state::DAGMCTSState{S,T,N}, config::DAGMCTSConfig) where {S,T,N}

Rebuild the transposition table to contain only nodes reachable from state.root.
Performs a BFS traversal down from the root, collecting all reachable nodes into a fresh dictionary. The old table is replaced, allowing Julia's GC to reclaim any unreachable nodes. If config.track_parents is enabled, parent arrays of surviving nodes are filtered to remove references to pruned ancestors.
This is the core of the "DAG-first" architecture: the DAG structure is maintained by node.children pointers, and the transposition table is merely an ephemeral index rebuilt each step.
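The BFS rebuild can be sketched as follows. GNode is a hypothetical simplified node type and integer keys stand in for HashedPolydisc; the visited check matters because a DAG node may be reachable along several paths:

```julia
mutable struct GNode
    key::Int
    children::Vector{GNode}
end

function rebuild_sketch(root::GNode)
    table = Dict{Int,GNode}()
    queue = GNode[root]
    while !isempty(queue)
        node = popfirst!(queue)
        haskey(table, node.key) && continue  # shared node already visited (DAG)
        table[node.key] = node
        append!(queue, node.children)
    end
    table
end
```

Nodes not reachable from the root are simply never inserted, so replacing the old table lets the GC reclaim them.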
NonArchimedeanMachineLearning.select_best_child_dag — Method
select_best_child_dag(root::DAGMCTSNode, table::Dict, config::DAGMCTSConfig)

Select the best child of root according to the configured selection mode.
Selection Modes
- VisitCount: Returns the child with the highest visit count (standard MCTS, robust)
- BestValue: Finds the node with the best average value in the DAG and returns the root's child leading to it (greedy)
- BestLoss: Finds the leaf with the minimum raw loss evaluation and returns the root's child leading to it
Returns
The selected child node.
NonArchimedeanMachineLearning.select_child — Method
select_child(node::DAGMCTSNode, exploration_constant::Float64)

Select the child with the highest UCT score.
Returns
Tuple of (action_index, child_node) for the best child
NonArchimedeanMachineLearning.select_path — Method
select_path(root::DAGMCTSNode, exploration_constant::Float64)

Select a path from root to a leaf node using UCT.
Unlike standard MCTS, we maintain an explicit path stack since nodes don't have single parent pointers.
Arguments
- root: The root node to start from
- exploration_constant: UCT exploration constant
Returns
Vector{DAGMCTSNode}: The path from root to leaf (inclusive)
NonArchimedeanMachineLearning.trace_to_root_child — Method
trace_to_root_child(target::DAGMCTSNode, root::DAGMCTSNode, table::Dict)

Trace back from a node to find which direct child of root lies on a path to it.
Uses parent pointers to trace upward from the target to root, which is O(depth) instead of O(nodes). In a DAG, nodes can have multiple parents, so the trace is a BFS upward over parent pointers until a direct child of root is reached.
Returns
The direct child of root that can reach the target, or nothing if not found.
NonArchimedeanMachineLearning.uct_score — Method
uct_score(node::DAGMCTSNode, parent_visits::Int, exploration_constant::Float64)

Compute the UCT score for a node.
UCT(s) = Q(s)/N(s) + c * √(ln(N(parent)) / N(s))
where Q(s) and N(s) are the global statistics from the shared node instance, and N(parent) is the visit count of the parent in the current traversal.
Arguments
- node: The child node being scored
- parent_visits: Visit count of the parent node (N(parent))
- exploration_constant: The exploration constant c
Returns
Float64: The UCT score (higher = should explore)
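A direct transcription of the formula on plain floats and ints. Returning Inf for unvisited children, so that they are always tried before any visited sibling, is a common MCTS convention assumed here rather than taken from the source:

```julia
# UCT(s) = Q(s)/N(s) + c * sqrt(ln(N(parent)) / N(s))
function uct_sketch(total_value::Float64, visits::Int,
                    parent_visits::Int, c::Float64)
    visits == 0 && return Inf  # assumed convention: unvisited first
    total_value / visits + c * sqrt(log(parent_visits) / visits)
end
```

Note that with parent_visits == 1 the exploration term vanishes (log(1) = 0), leaving pure exploitation.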
NonArchimedeanMachineLearning.verify_transposition_table — Method
verify_transposition_table(state::DAGMCTSState)

Verify the integrity of the transposition table.
Checks that:
- All nodes in the table have polydiscs equal to their keys (via isequal)
- All children of expanded nodes are in the table
- Parent relationships are consistent
Returns
Bool: true if table is consistent, false otherwise