Automatic Thread Extraction

Chip multiprocessors are being designed and produced with ever-increasing numbers of cores. Each core executes a single thread, so multiple threads are needed to take advantage of multiple cores. This works well when many applications run at once, or when applications are explicitly multi-threaded. However, there will always be a need to run traditional single-threaded applications, which cannot benefit from multi-core systems without assistance.

Our research in automatic thread extraction enables these programs to finally benefit from a multi-core system. There are two traditional difficulties:

We attack these problems by using a profiling, optimizing compiler that targets the Voltron architecture. Specifically, the compiler:

The compiler extracts three types of parallelism:

Statistical Loop-Level Parallelism (LLP)
  • iterations divided between cores in large chunks
  • loops are identified as statistically independent
Fine-Grain Thread-Level Parallelism (TLP)
  • overlap cache misses for memory parallelism
  • separate control flow
Instruction-Level Parallelism (ILP)
  • cores execute in lockstep
  • exploits ILP across multiple cores
  • directed communication and lockstep execution allow the lowest communication latency
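
To make the statistical LLP case above more concrete, here is a minimal sketch in Python. It is an illustration only, not the Voltron compiler's implementation. The function names (`chunk_ranges`, `looks_independent`, `run_chunked`) and the profile format (one pair of read and write address sets per iteration) are assumptions made for this example. The sketch checks a profiled run for cross-iteration conflicts, then hands each worker one large contiguous chunk of iterations.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_ranges(n_iters, n_cores):
    """Split range(n_iters) into at most n_cores contiguous, near-equal chunks."""
    base, extra = divmod(n_iters, n_cores)
    ranges, start = [], 0
    for core in range(n_cores):
        # The first `extra` cores take one additional iteration each.
        size = base + (1 if core < extra else 0)
        if size:
            ranges.append(range(start, start + size))
        start += size
    return ranges


def looks_independent(profiled_iterations):
    """Return True if no profiled iteration conflicts with another one.

    Each entry is a (reads, writes) pair of addresses recorded for one
    iteration during a profiling run (a hypothetical profile format).
    Passing this check is evidence from the observed inputs only,
    not a proof that the loop is independent.
    """
    writers = {}  # address -> index of the iteration that wrote it
    for i, (_, writes) in enumerate(profiled_iterations):
        for addr in writes:
            if writers.setdefault(addr, i) != i:
                return False  # two different iterations wrote this address
    for i, (reads, _) in enumerate(profiled_iterations):
        for addr in reads:
            if writers.get(addr, i) != i:
                return False  # read of an address written by another iteration
    return True


def run_chunked(body, n_iters, n_cores):
    """Run body(i) for every i, giving each worker one contiguous chunk."""
    def run_chunk(chunk):
        return [body(i) for i in chunk]

    with ThreadPoolExecutor(max_workers=n_cores) as pool:
        per_chunk = list(pool.map(run_chunk, chunk_ranges(n_iters, n_cores)))
    # Flatten the per-chunk results back into iteration order.
    return [result for chunk in per_chunk for result in chunk]
```

Python threads are used here only to show the partitioning. A real system would also need a way to detect and recover from the rare run in which a loop judged independent by profiling turns out to carry a dependence.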


Page last modified January 22, 2016.