I trace the history of complexity measures: entropy, Kolmogorov complexity, Levin complexity, minimum description length, epiplexity, logical depth, and multiscale logical depth. I compare these theories along three axes: program length, runtime, and precision. I then relate them to neural network training dynamics and conjecture that logical depth is the most useful of them.