How does gradient descent iteratively discover local minimas? How exactly was this algorithm invented from the ground up? What are the limitations of traditional gradient descent?
All dynamic programming problems may be modeled as a DAG of sub problems. Once you topologically sort that DAG, even tricky challenges, like finding the longest increasing subsequence or stacking cuboids, become as simple as tracing the longest path through your map.