Transforming Sorted Arrays to Binary Search Trees using Reinforcement Learning
Published:
This report presents a reinforcement learning approach to constructing binary search trees (BSTs) from sorted arrays, challenging the classical “midpoint” method. We formulate the task as a custom environment in which a PPO agent sequentially programs a computational machine to position values in a BST-encoded array. A specially tailored reward function, using incremental scoring and distance-based penalties, guides the agent to valid BST placements even when partial solutions are encountered. We constrain the search space by restrictive action masking to prohibit invalid or nonsensical moves. Experimental results highlight the importance of reward shaping and action masking in stabilizing training. Our results demonstrate that a highly optimized learning environment is more important than the agent’s parameters. Although performance can be improved by using a transformer encoder. Our approach faces challenges when attempting to use more sophisticated control flow and longer input lengths.
