With the advent of programmable GPUs, throughput-oriented workloads, where throughput matters more than latency, are quickly moving onto these architectures and have shown tremendous performance on them. The applications that benefit most are data-parallel, floating-point intensive, dense-matrix-style applications. But these architectures are also very energy-inefficient: the power budget for such systems is already approaching 1 kW per card. The plot below shows the performance per watt of architectures that have been used for throughput processing; all of them lie between 0.1 Mops/mW and 10 Mops/mW. Though power is not yet a serious drawback for GPUs, future applications that demand more computation may find power a serious impediment. Many projections already suggest that a petaflop of performance will require thousands of kilowatts.
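The projection above can be sanity-checked with a little arithmetic. The sketch below, which assumes "Mops/mW" means 10^6 operations per second per milliwatt, computes the power needed to sustain one petaflop (10^15 ops/s) across the efficiency range shown in the plot; the specific efficiency values are illustrative, not measurements of any particular card.

```python
PFLOP = 1e15  # target throughput, ops/s

def power_watts(mops_per_mw: float) -> float:
    """Power (W) needed to sustain 1 PFLOP/s at a given efficiency."""
    ops_per_watt = mops_per_mw * 1e9  # 1 Mops/mW = 1e6 ops / 1e-3 W = 1e9 ops/W
    return PFLOP / ops_per_watt

# Sweep the 0.1-10 Mops/mW range from the plot.
for eff in (0.1, 1.0, 10.0):
    print(f"{eff:5.1f} Mops/mW -> {power_watts(eff) / 1e3:8.0f} kW")
```

At 1 Mops/mW the answer is exactly 1000 kW, so the middle of the plotted range already lands in the "thousands of kilowatts" regime that the projections describe.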

Efficiency of High Performance Computing


At the opposite end of the spectrum are smartphones. Their appetite for computational power has been increasing dramatically, but the nature of the device makes power and energy the biggest challenge. With newer and richer "apps" coming out every day, the architectures of the processors in these devices are changing at a rapid pace. What we are observing, then, is a convergence between scientific processing and smartphone processing, with power as the first-class challenge in both. The goal of this work is to come up with architecture designs that achieve higher efficiency than current architectures.

Our work on throughput processor design focuses on architectural designs and techniques that provide higher energy efficiency, through better utilization of the hardware, than the current best GPU solutions. We target workloads with the characteristics mentioned above, including both scientific and graphics applications. One solution is to develop ASICs or hardwired accelerators for common computation patterns; however, we believe that approach is orthogonal to ours, and we focus on a fully programmable processor design for throughput computing.

Below is a list of some of the research we are (or have been) working on in the area of processors.

Page last modified January 22, 2016.