Complex Dynamics of Microprocessor Performances During Program Execution
Regularity, Chaos, and Others
Hugues Berry, Daniel Garcia Perez, Olivier Temam
INRIA, CEA, INRIA
Modern computer architectures are the product of a rapid evolution that can be traced back to the 1960s, when Moore observed that the number of transistors per integrated circuit grew exponentially and predicted that this trend would continue [1]. The so-called Moore's law has indeed held for the last 40 years, with transistor density doubling approximately every 18 months. Consequently, today's processors rely on remarkably high numbers of transistors: the widespread Intel® Pentium® 4 contains 42 million transistors, while the more recent Itanium® 2 possesses 410 million. Furthermore, a constant of this evolution is that processor speed (in particular, clock rate) has far outpaced the speed of memory operations. Hence, most recent advances in the field have mainly aimed at hiding memory latencies through engineering solutions (parallel execution, pipelining, cache memory hierarchies), but these solutions have necessarily come at the cost of further increases in processor complexity.
As a consequence, predicting the precise performance of a microprocessor (the number of instructions executed per second) during the execution of programs on modern computer architectures has become increasingly difficult. For instance, one efficient way to optimize computer performance for a given program consists of fine-tuning the compiler options so as to adapt the compiler's output to the target architecture. Yet the complexity of modern architectures is such that rational optimizations, guided by a thorough knowledge of the architecture, are now less efficient, to the point that more systematic automated search methods based on machine learning, genetic algorithms, or iterative trial-and-error techniques are being investigated as possible replacements. Hence, given the large number of components dedicated to performance improvement and the intricacy of their interactions, the instantaneous performance of a modern microprocessor may be viewed as a complex system.
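The iterative trial-and-error flag search mentioned above can be sketched as follows. This is a minimal illustration, not the method of any particular tool: the flag names are real GCC options, but their runtime effects here are mocked by a toy cost function (a real search would compile the program with each candidate flag set and time the resulting binary).

```python
import random

# Real GCC optimization flags; their "effects" below are invented for
# illustration only (a real search would compile and measure).
FLAGS = ["-funroll-loops", "-fomit-frame-pointer", "-ftree-vectorize",
         "-finline-functions", "-fprefetch-loop-arrays"]

def mock_runtime(flag_subset):
    """Mock cost function: baseline runtime plus per-flag shifts.

    Stand-in for an actual compile-and-run cycle; flag interactions
    are ignored here, although in practice they matter a great deal.
    """
    base = 10.0
    effect = {"-funroll-loops": -0.8, "-fomit-frame-pointer": -0.1,
              "-ftree-vectorize": -1.5, "-finline-functions": -0.4,
              "-fprefetch-loop-arrays": 0.3}
    return base + sum(effect[f] for f in flag_subset)

def random_search(evaluate, flags, iters=200, seed=0):
    """Keep the best of `iters` randomly drawn flag subsets."""
    rng = random.Random(seed)
    best_set, best_t = [], evaluate([])
    for _ in range(iters):
        cand = [f for f in flags if rng.random() < 0.5]
        t = evaluate(cand)
        if t < best_t:
            best_set, best_t = cand, t
    return best_set, best_t
```

In practice, such searches replace random sampling with genetic operators or learned models, precisely because the flag-interaction landscape on a complex architecture is too intricate to optimize by hand.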
Here, we study the time evolution of performance during program execution on current microprocessors. During the execution of archetypal programs (benchmarks), we simultaneously record several variables characteristic of performance and memory-operation efficiency (cache misses). These traces are then analyzed using a variety of methods for nonlinear time series analysis [2] (fluctuation analysis, characterization of the attractors obtained after embedding, surrogate data tests, etc.). Depending on the executed program, these analyses reveal three dynamical categories. For several programs, the performance dynamics are regularly periodic, as would intuitively be expected for microprocessor operations. But for many programs, the dynamics are much richer and highly variable. For several of these programs, the variability seems to stem from pseudo-stochastic sources. Interestingly, however, the performance dynamics during the execution of several others display clear evidence of deterministic chaos, with sensitivities to initial conditions comparable to those of textbook chaotic systems [3].
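The delay-embedding step used in such analyses can be sketched as follows. This is a toy illustration, not the paper's pipeline: the logistic map (a textbook chaotic system) stands in for a recorded performance trace, and the final lines exhibit the sensitivity to initial conditions that the surrogate and attractor analyses are designed to detect.

```python
def delay_embed(x, dim, tau):
    """Time-delay embedding: map a scalar series x to dim-dimensional
    vectors (x[i], x[i+tau], ..., x[i+(dim-1)*tau])."""
    n = len(x) - (dim - 1) * tau
    return [tuple(x[i + j * tau] for j in range(dim)) for i in range(n)]

def logistic_series(x0, n, r=4.0):
    """Toy chaotic signal: logistic map, chaotic at r = 4."""
    out, x = [], x0
    for _ in range(n):
        out.append(x)
        x = r * x * (1.0 - x)
    return out

series = logistic_series(0.123, 2000)
emb = delay_embed(series, dim=3, tau=1)   # points on the reconstructed attractor

# Sensitivity to initial conditions: two trajectories whose seeds differ
# by 1e-10 diverge to order-one separation within a few dozen iterations.
a = logistic_series(0.123, 60)
b = logistic_series(0.123 + 1e-10, 60)
gap = [abs(u - v) for u, v in zip(a, b)]
```

On a real trace, the embedding dimension and delay would be chosen with standard criteria (e.g. false nearest neighbors and mutual information), and the divergence rate of nearby embedded points would be quantified as a Lyapunov exponent.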
These results imply that, for several programs, predicting performance on the basis of short samples of the total performance trace may be difficult. More generally, they reveal that the instantaneous performance of modern microprocessors constitutes a complex (or at least complicated) system whose characterization would benefit from analysis with modern tools of nonlinear and complexity science.
[1] G. E. Moore (1965) Cramming more components onto integrated circuits, Electronics 38: 114–117.
[2] T. Schreiber (1999) Interdisciplinary application of nonlinear time series methods, Physics Reports 308: 2–64.
[3] H. Berry, D. Garcia Perez, O. Temam (2006) Chaos in computer performance, CHAOS 16: 013110.