CCCP: Architecture Design Framework

Chip multiprocessors are being designed and produced with ever-increasing numbers of cores. Each core executes a single thread, so multiple threads are needed to take advantage of multiple cores. This works well when many applications run at once, or when applications are explicitly multi-threaded. However, there will always be a need to run traditional, single-threaded applications, which cannot benefit from multi-core systems without assistance.

Our research in automatic thread extraction enables these programs to finally benefit from a multi-core system. Two difficulties have traditionally stood in the way:

C/C++ can obfuscate the true nature of the program
Communication/synchronization overhead can easily dominate execution time

We attack these problems by using a profiling, optimizing compiler that targets the Voltron architecture. Specifically, the compiler:

focuses on statistical analysis (rather than provable analysis)
uses directed communication to reduce overhead (rather than communication via memory)
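
To make the contrast between statistical and provable analysis concrete, here is a minimal sketch of a profile-driven independence check. It is an illustration only, not the compiler's actual algorithm: the trace format (one pair of read and write address sets per loop iteration) and the function names are assumptions introduced for this example. A loop is treated as statistically independent if no profiled run shows a dependence between iterations, so the result holds only for the inputs that were profiled.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical trace format: one (reads, writes) pair of address sets per iteration.
IterationTrace = Tuple[Set[int], Set[int]]


def observed_cross_iteration_dependence(trace: List[IterationTrace]) -> bool:
    """Return True if any iteration conflicts with an earlier iteration."""
    written_so_far: Set[int] = set()
    read_so_far: Set[int] = set()
    for reads, writes in trace:
        # Read-after-write or write-after-write against an earlier iteration.
        if (reads & written_so_far) or (writes & written_so_far):
            return True
        # Write-after-read against an earlier iteration.
        if writes & read_so_far:
            return True
        written_so_far |= writes
        read_so_far |= reads
    return False


def statistically_independent(profile_runs: Iterable[List[IterationTrace]]) -> bool:
    """A loop counts as independent only if no profiled run showed a dependence."""
    return not any(observed_cross_iteration_dependence(t) for t in profile_runs)
```

For example, a trace of a loop such as `a[i] = b[i] + 1` touches a different address in every iteration and passes the check, while a trace of `a[i] = a[i-1] + 1` reads an address written by the previous iteration and fails it.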

There are three types of parallelism that the compiler extracts:

Statistical Loop-Level Parallelism (LLP): iterations are divided between cores in large chunks; loops are identified as statistically independent.
Fine-Grain Thread-Level Parallelism (TLP): overlaps cache misses for memory parallelism; separate control flow.
Instruction-Level Parallelism (ILP): cores execute in lockstep; exploits ILP across multiple cores; directed communication and lockstep execution allow the lowest communication latency.
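
The "large chunks" division used for loop-level parallelism can be sketched as follows. This is a simplified illustration under assumed names (`chunk_bounds`, `num_iterations`, `num_cores`), not the compiler's actual partitioning scheme: each core receives one contiguous block of iterations, and any remainder is spread over the first few cores.

```python
from typing import List


def chunk_bounds(num_iterations: int, num_cores: int) -> List[range]:
    """Split the iteration space [0, num_iterations) into one contiguous chunk per core."""
    if num_cores <= 0:
        raise ValueError("num_cores must be positive")
    base, extra = divmod(num_iterations, num_cores)
    chunks: List[range] = []
    start = 0
    for core in range(num_cores):
        # The first `extra` cores take one additional iteration each.
        size = base + (1 if core < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks
```

Assigning contiguous chunks rather than interleaving single iterations means each core works through its block without exchanging data with the others, which is one way to keep communication and synchronization overhead from dominating execution time.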

Relevant Publications

None.

Page last modified January 22, 2016.