Abstract
Globally scalable route optimization based on human preferences remains an open problem. Although past work created increasingly general solutions for the inverse reinforcement learning (IRL) formulation, these have not been successfully scaled to world-sized MDPs (200M states), large datasets (110M samples), and nearly foundation-sized models (360M parameters). In this work, we surpass this scale through a series of advancements focused on graph compression, parallelization, and problem initialization based on dominant eigenvectors. We introduce Receding Horizon Inverse Planning (RHIP), an approximate IRL algorithm that generalizes existing work and enables control of key performance trade-offs via a planning horizon parameter. Our policy achieves an 18% improvement in global route quality and, to our knowledge, is the largest instance of IRL in a real-world setting to date. We include insightful negative results on state-of-the-art eigenvalue solvers and identify future opportunities to further improve performance via IRL-specific batching strategies. Our results show critical benefits to more sustainable modes of transportation (e.g., two-wheelers), where factors beyond journey time (e.g., route safety) play an outsized role.
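As a rough illustration of the dominant-eigenvector initialization mentioned above, the sketch below estimates a dominant eigenvector with plain power iteration. It is a minimal, hedged example only: the toy sparse matrix, the function name, and all sizes are assumptions for demonstration, not the paper's implementation, which operates on world-scale road graphs (and, as noted in the abstract, found off-the-shelf eigenvalue solvers unsatisfactory).

```python
import numpy as np
from scipy import sparse


def dominant_eigenvector(A, num_iters=200, tol=1e-8):
    """Estimate the dominant eigenvector of a nonnegative matrix A by power iteration."""
    v = np.full(A.shape[0], 1.0 / np.sqrt(A.shape[0]))
    for _ in range(num_iters):
        w = A @ v                        # one matrix-vector product per iteration
        norm = np.linalg.norm(w)
        if norm == 0.0:                  # degenerate case: A annihilated v
            return v
        w = w / norm
        if np.linalg.norm(w - v) < tol:  # converged (up to normalization)
            return w
        v = w
    return v


# Toy usage: a small random sparse nonnegative matrix stands in for the
# exponentiated-reward edge weights of a road graph (hypothetical stand-in).
A = sparse.random(1_000, 1_000, density=0.01, format="csr", random_state=0)
v0 = dominant_eigenvector(A)
```

Power iteration requires only repeated sparse matrix-vector products, which is what makes this kind of initialization plausible at the graph sizes the abstract describes; the details of how the paper builds and uses the eigenvector are beyond this sketch.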
Authors
Matt Barnes, Matthew Abueg, Oliver F. Lange, Matt Deeds, Jason Trader, Denali Molitor, Markus Wulfmeier*, Shawn O'Banion*
Venue
arXiv