Publications
2008
DVFS in Loop Accelerators using BLADES
(paper: pdf; slides: ppt)
Ganesh Dasika, Shidhartha Das, Kevin Fan, Scott Mahlke and David Bull
Design Automation Conference 2008
Jun. 2008, to appear.
Orchestrating the Execution of Stream Programs on Multicore Platforms
(paper: pdf; slides: ppt)
Manjunath Kudlur and Scott Mahlke
Proc. ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation (PLDI)
Jun. 2008, to appear.
VEAL: Virtualized Execution Accelerator for Loops
(paper: pdf; slides: ppt)
Nathan Clark, Amir Hormati, and Scott Mahlke
Proc. Intl. Symposium on Computer Architecture (ISCA)
Jun. 2008, to appear.
Modulo Scheduling for Highly Customized Datapaths to Increase Hardware Reusability
(paper: pdf; slides: ppt)
Kevin Fan, Hyunchul Park, Manjunath Kudlur, and Scott Mahlke.
Proc. 2008 International Symposium on Code Generation and Optimization (CGO)
Apr. 2008, pp. 124-133.
Uncovering Hidden Loop Level Parallelism in Sequential Applications
(paper: pdf; slides: pdf)
Hongtao Zhong, Mojtaba Mehrara, Steve Lieberman, and Scott Mahlke.
Proc. 14th International Symposium on High-Performance Computer Architecture (HPCA)
Feb. 2008, pp. 290-301.

2007
StageNet: A Reconfigurable CMP Fabric for Resilient Systems
(paper: pdf; slides: ppt)
Shantanu Gupta, Shuguang Feng, Jason Blome, and Scott Mahlke.
2nd Reconfigurable and Adaptive Architecture Workshop (RAAW)
Dec. 2007.
Self-calibrating Online Wearout Detection
(paper: pdf; slides: ppt)
Jason Blome, Shuguang Feng, Shantanu Gupta, and Scott Mahlke.
Proc. 40th Intl. Symposium on Microarchitecture (MICRO)
Dec. 2007, pp. 109-120.
Data Access Partitioning for Fine-grain Parallelism on Multicore Architectures
(paper: pdf; slides: ppt)
Michael Chu, Rajiv Ravindran, and Scott Mahlke.
Proc. 40th Intl. Symposium on Microarchitecture (MICRO)
Dec. 2007, pp. 369-378.
Hierarchical Coarse-grained Stream Compilation for Software Defined Radio
(paper: pdf; slides: ppt)
Yuan Lin, Manjunath Kudlur, Scott Mahlke, and Trevor Mudge
Proc. 2007 Intl. Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES)
Oct. 2007, pp. 115-124.
The Next Generation Challenge for Software Defined Radio
(paper: pdf; slides: ppt)
Mark Woh, Sangwon Seo, Hyunseok Lee, Yuan Lin, Scott Mahlke, Trevor Mudge, Chaitali Chakrabarti, and Krisztian Flautner
Proc. 7th Intl. Workshop on Systems, Architectures, Modeling, and Simulation (SAMOS)
Jul. 2007, pp. 343-354.
Code and Data Partitioning for Fine-grain Parallelism
(paper: pdf; slides: ppt)
Michael Chu and Scott Mahlke
Proc. ACM SIGPLAN/SIGBED 2007 Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES)
Jun. 2007, pp. 161-164.
Compiler-Managed Partitioned Data Caches for Low Power
(paper: pdf; slides: ppt)
Rajiv Ravindran, Michael Chu, and Scott Mahlke
Proc. ACM SIGPLAN/SIGBED 2007 Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES)
Jun. 2007, pp. 237-247.
Architecting a Reliable CMP Switch Architecture
(paper: pdf)
Kypros Constantinides, Stephen Plaza, Jason Blome, Valeria Bertacco, Scott Mahlke, Todd Austin, Bin Zhang, and Michael Orshansky
ACM Transactions on Architecture and Code Optimization
Vol. 4, No. 1, Mar. 2007, pp. 1-37.
Exploiting Narrow Accelerators with Data-Centric Subgraph Mapping
(paper: pdf; slides: ppt)
Amir Hormati, Nathan Clark, and Scott Mahlke
Proc. 2007 International Symposium on Code Generation and Optimization (CGO)
Mar. 2007, pp. 147-157.
Liquid SIMD: Abstracting SIMD Hardware using Lightweight Dynamic Mapping
(paper: pdf; slides: ppt)
Nathan Clark, Amir Hormati, Sami Yehia, Scott Mahlke, and Krisztian Flautner
Proc. 2007 Intl. Symposium on High Performance Computer Architecture (HPCA)
Feb. 2007, pp. 216-227.
Extending Multicore Architectures to Exploit Hybrid Parallelism in Single-thread Applications
(paper: pdf; slides: ppt)
Hongtao Zhong, Steven A. Lieberman, and Scott A. Mahlke
Proc. 2007 Intl. Symposium on High Performance Computer Architecture (HPCA)
Feb. 2007, pp. 25-36.
SODA: A High-Performance DSP Architecture for Software-Defined Radio
(paper: pdf)
Yuan Lin, Hyunseok Lee, Mark Woh, Yoav Harel, Scott Mahlke, Trevor Mudge, Chaitali Chakrabarti, and Krisztián Flautner
IEEE Micro (Micro's Top Picks in Computer Architecture for 2006)
Vol. 27, No. 1, Jan./Feb. 2007, pp. 114-123.

2006
Online Timing Analysis for Wearout Detection
(paper: pdf; slides: ppt)
Jason Blome, Shuguang Feng, Shantanu Gupta, and Scott Mahlke.
2nd Workshop on Architectural Reliability (WAR)
Dec. 2006.
SPEX: A Programming Language for Software Defined Radio
(paper: pdf; slides: ppt)
Yuan Lin, Robert Mullenix, Mark Woh, Scott Mahlke, Trevor Mudge, Alastair Reid, and Krisztian Flautner.
2006 Software Defined Radio Technical Conference and Product Exposition
Nov. 2006.
Streamroller: Compiler Orchestrated Synthesis of Accelerator Pipelines
(slides: ppt)
Manjunath Kudlur, Kevin Fan, Ganesh Dasika, and Scott Mahlke.
Workshop on Compiler Assisted SoC Assembly (CASA)
Oct. 2006.
Increasing Hardware Efficiency with Multifunction Loop Accelerators
(paper: pdf; slides: ppt)
Kevin Fan, Manjunath Kudlur, Hyunchul Park, and Scott Mahlke.
Proc. 2006 Intl. Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)
Oct. 2006, pp. 276-281.
Streamroller: Automatic Synthesis of Prescribed Throughput Accelerator Pipelines
(paper: pdf; slides: ppt)
Manjunath Kudlur, Kevin Fan, and Scott Mahlke.
Proc. 2006 Intl. Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)
Oct. 2006, pp. 270-275.
Modulo Graph Embedding: Mapping Applications onto Coarse-Grained Reconfigurable Architectures
(paper: pdf; slides: ppt)
Hyunchul Park, Kevin Fan, Manjunath Kudlur, and Scott Mahlke.
Proc. 2006 Intl. Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES)
Oct. 2006, pp. 136-146.
Scalable Subgraph Mapping for Acyclic Computation Accelerators
(paper: pdf; slides: ppt)
Nathan Clark, Amir Hormati, Scott Mahlke, and Sami Yehia.
Proc. 2006 Intl. Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES)
Oct. 2006, pp. 147-157.
Cost-Efficient Soft Error Protection for Embedded Microprocessors
(paper: pdf; slides: ppt)
Jason A. Blome, Shantanu Gupta, Shuguang Feng, Scott Mahlke, and Daryl Bradley.
Proc. 2006 Intl. Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES)
Oct. 2006, pp. 421-431.
Design and Implementation of Turbo Decoders for Software Defined Radio
(paper: pdf; slides: ppt)
Yuan Lin, Scott Mahlke, Trevor Mudge, Chaitali Chakrabarti, Alastair Reid, and Krisztian Flautner.
Proc. IEEE 2006 Workshop on Signal Processing Systems (SiPS)
Oct. 2006.
A Scalable Low-power Architecture For Software Radio
(slides: ppt)
Scott Mahlke
6th International Forum on Application-Specific Multi-Processor SoC (MPSoC)
Aug. 2006.
SODA: A Low-power Architecture For Software Radio
(paper: pdf; slides: ppt)
Yuan Lin, Hyunseok Lee, Mark Woh, Yoav Harel, Scott Mahlke, Trevor Mudge, Chaitali Chakrabarti, and Krisztian Flautner
Proc. 33rd Intl. Symposium on Computer Architecture (ISCA)
Jun. 2006, pp. 89-100.
Compiler-directed Data Partitioning for Multicluster Processors
(paper: pdf; slides: ppt)
Michael Chu and Scott Mahlke
Proc. 4th Intl. Symposium on Code Generation and Optimization (CGO)
Mar. 2006, pp. 208-218.
BulletProof: A Defect-Tolerant CMP Switch Architecture
(paper: pdf; slides: ppt)
Kypros Constantinides, Stephen Plaza, Jason Blome, Bin Zhang, Valeria Bertacco, Scott Mahlke, Todd Austin, and Michael Orshansky
Proc. 12th Intl. Symposium on High-Performance Computer Architecture (HPCA)
Feb. 2006, pp. 3-14.

2005
Software Defined Radio - A High Performance Embedded Challenge
(paper: pdf; slides: ppt)
Hyunseok Lee, Yuan Lin, Yoav Harel, Mark Woh, Scott Mahlke, Trevor Mudge, and Krisztian Flautner
Proc. 2005 Intl. Conference on High Performance Embedded Architectures and Compilers (HiPEAC)
Nov. 2005, pp. 6-26.
Cost Sensitive Modulo Scheduling in a Loop Accelerator Synthesis System
(paper: pdf; slides: ppt)
Kevin Fan, Manjunath Kudlur, Hyunchul Park, and Scott Mahlke.
Proc. 38th Intl. Symposium on Microarchitecture (MICRO)
Nov. 2005, pp. 219-230.
A Microarchitectural Analysis of Soft Error Propagation in a Production-level Embedded Microprocessor
(paper: pdf; slides: ppt)
Jason Blome, Scott Mahlke, Daryl Bradley, and Krisztian Flautner.
1st Workshop on Architectural Reliability (WAR)
Nov. 2005.
Assessing SEU Vulnerability via Circuit-level Timing Analysis
(paper: pdf; slides: ppt)
Kypros Constantinides, Stephen Plaza, Jason Blome, Bin Zhang, Valeria Bertacco, Scott Mahlke, Todd Austin, and Michael Orshansky.
1st Workshop on Architectural Reliability (WAR)
Nov. 2005.
A System Solution for High-Performance, Low Power SDR
(paper: pdf; slides: ppt)
Yuan Lin, Hyunseok Lee, Yoav Harel, Mark Woh, Scott Mahlke, Trevor Mudge, and Krisztian Flautner
2005 Software Defined Radio Technical Conference and Product Exposition
Nov. 2005.
Automated Custom Instruction Generation for Domain-Specific Processor Acceleration
(paper: pdf)
Nathan Clark, Hongtao Zhong, and Scott Mahlke.
IEEE Transactions on Computers
Vol. 54, No. 10, Oct. 2005, pp. 1258-1270.
Exploring the Design Space of LUT-based Transparent Accelerators
(paper: pdf; slides: ppt)
Sami Yehia, Nathan Clark, Scott Mahlke, and Krisztian Flautner.
Proc. 2005 Intl. Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES)
Sep. 2005, pp. 11-21.
Compiler-directed Synthesis of Multifunction Loop Accelerators
(paper: pdf; slides: ppt)
Kevin Fan, Manjunath Kudlur, Hyunchul Park, and Scott Mahlke.
Workshop on Application Specific Processors (WASP)
Sep. 2005, pp. 91-98.
A Distributed Control Path Architecture for VLIW Processors
(paper: pdf; slides: ppt)
Hongtao Zhong, Kevin Fan, Scott Mahlke, and Michael Schlansker.
Proc. 14th Intl. Conference on Parallel Architectures and Compilation Techniques (PACT)
Sep. 2005, pp. 197-206.
Partitioning Variables across Multiple Register Windows to Reduce Spill Code in a Low-power Processor
(paper: pdf)
Rajiv Ravindran, Robert Senger, Eric Marsman, Ganesh Dasika, Matthew Guthaus, Scott Mahlke, and Richard Brown.
IEEE Transactions on Computers
Vol. 54, No. 8, Aug. 2005, pp. 998-1012.
Trimaran: An Infrastructure for Research in Instruction-Level Parallelism
(paper: pdf)
Lakshmi Chakrapani, John Gyllenhaal, Wen-mei Hwu, Scott Mahlke, Krishna Palem, and Rodric Rabbah.
Lecture Notes in Computer Science
Springer-Verlag, Vol. 3602, Aug. 2005, pp. 32-41.
An Architecture Framework for Transparent Instruction Set Customization in Embedded Processors
(paper: pdf; slides: ppt)
Nathan Clark, Jason Blome, Michael Chu, Scott Mahlke, Stuart Biles, and Krisztian Flautner.
Proc. 32nd Intl. Symposium on Computer Architecture (ISCA)
Jun. 2005, pp. 272-283.
A 16-bit, Low-Power Microcontroller with Monolithic MEMS-LC Clocking
(paper: pdf; slides: ppt)
Eric Marsman, Robert Senger, Michael McCorquodale, Matthew Guthaus, Rajiv Ravindran, Ganesh Dasika, Scott Mahlke, and Richard Brown.
Proc. Intl. Symposium on Circuits and Systems (ISCAS)
May 2005, pp. 624-627.
Compiler Managed Dynamic Instruction Placement in a Low-Power Code Cache
(paper: pdf; slides: ppt)
Rajiv Ravindran, Pracheeti Nagarkar, Ganesh Dasika, Eric Marsman, Robert Senger, Scott Mahlke, and Richard Brown.
Proc. 3rd Intl. Symposium on Code Generation and Optimization (CGO)
Mar. 2005, pp. 179-190.

2004
Application Specific Processing on a General Purpose Core via Transparent Instruction Set Customization
(paper: pdf; slides: ppt)
Nathan Clark, Manjunath Kudlur, Hyunchul Park, Scott Mahlke, and Krisztian Flautner.
Proc. 37th Intl. Symposium on Microarchitecture (MICRO)
Dec. 2004, pp. 30-40.
Automatic Synthesis of Customized Local Memories for Multicluster Application Accelerators
(paper: pdf; slides: ppt)
Manjunath Kudlur, Kevin Fan, Michael Chu, and Scott Mahlke.
Proc. IEEE 15th Intl. Conference on Application-Specific Systems, Architectures and Processors (ASAP)
Sep. 2004, pp. 304-314.
Compiler-directed Synthesis of Programmable Loop Accelerators
(slides: ppt)
Kevin Fan, Hyunchul Park, and Scott Mahlke.
2004 Workshop on Emerging Directions in Electronic Design Automation: Accelerating Time-to-market through Compiler-driven Optimization of Embedded Platforms
Sep. 2004.
Memory System Design Space Exploration for Low-Power, Real-time Speech Recognition
(paper: pdf; slides: ppt)
Rajeev Krishna, Scott Mahlke, and Todd Austin.
Proc. 2004 Intl. Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)
Sep. 2004, pp. 140-145.
A Programmable Vector Coprocessor Architecture for Wireless Applications
(paper: pdf; slides: ppt)
Yuan Lin, Nadav Baron, Hyunseok Lee, Scott Mahlke, and Trevor Mudge.
Proc. 3rd Workshop on Application Specific Processors (WASP)
Sep. 2004.
OptimoDE: Programmable Accelerator Engines Through Retargetable Customization
(slides: ppt)
Nathan Clark, Hongtao Zhong, Kevin Fan, Scott Mahlke, Krisztian Flautner, and Koen Van Nieuwenhove.
Proc. Hot Chips 16
Aug. 2004.
Cost-Sensitive Partitioning in an Architecture Synthesis System for Multicluster Processors
(paper: pdf)
Michael L. Chu, Kevin C. Fan, Rajiv A. Ravindran, and Scott A. Mahlke.
IEEE Micro
Vol. 24, No. 3, May/Jun. 2004, pp. 10-20.
Mobile Supercomputers
(paper: pdf)
Todd Austin, David Blaauw, Scott Mahlke, Trevor Mudge, Chaitali Chakrabarti, and Wayne Wolf.
IEEE Computer
Vol. 37, No. 5, May 2004, pp. 82-84.
FLASH: Foresighted Latency-Aware Scheduling Heuristic for Processors with Customized Datapaths
(paper: pdf; slides: ppt)
Manjunath Kudlur, Kevin Fan, Michael Chu, Rajiv Ravindran, Nathan Clark, and Scott Mahlke.
Proc. 2nd Intl. Symposium on Code Generation and Optimization (CGO)
Mar. 2004, pp. 201-212.
Probabilistic Predicate-Aware Modulo Scheduling
(paper: pdf; slides: ppt)
Mikhail Smelyanskiy, Scott Mahlke, and Edward Davidson
Proc. 2nd Intl. Symposium on Code Generation and Optimization (CGO)
Mar. 2004, pp. 151-162.

2003
Automatic Design of Application Specific Instruction Set Extensions Through Dataflow Graph Exploration
Nathan Clark, Hongtao Zhong, Wilkin Tang and Scott Mahlke
International Journal of Parallel Programming
Vol. 31, No. 6, Dec. 2003, pp. 429-449.
Cost-Sensitive Operation Partitioning for Synthesizing Custom Multicluster Datapath Architectures
(paper: pdf; slides: ppt)
Michael L. Chu, Kevin C. Fan, Rajiv A. Ravindran and Scott A. Mahlke
Proc. 2nd Workshop on Application Specific Processors (WASP)
Dec. 2003, pp. 40-47.
Processor Acceleration Through Automated Instruction Set Customization
(paper: pdf; slides: ppt)
Nathan Clark, Hongtao Zhong, and Scott Mahlke
Proc. 36th Intl. Symposium on Microarchitecture (MICRO)
Dec. 2003, pp. 129-140.
Increasing the Number of Effective Registers in a Low-Power Processor Using a Windowed Register File
(paper: pdf; slides: ppt)
Rajiv A. Ravindran, Robert M. Senger, Eric D. Marsman, Ganesh S. Dasika, Matthew R. Guthaus, Scott A. Mahlke, and Richard B. Brown
Proc. 2003 Intl. Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES)
Oct. 2003, pp. 125-136.
Architectural Optimizations for Low-Power, Real-Time Speech Recognition
(paper: pdf; slides: ppt)
Rajeev Krishna, Scott Mahlke, and Todd Austin
Proc. 2003 Intl. Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES)
Oct. 2003, pp. 220-231.
Systematic Register Bypass Customization for Application-Specific Processors
(paper: pdf; slides: ppt)
Kevin Fan, Nathan Clark, Michael Chu, K.V. Manjunath, Rajiv Ravindran, Mikhail Smelyanskiy, and Scott Mahlke.
Proc. IEEE 14th Intl. Conference on Application-Specific Systems, Architectures and Processors (ASAP)
Jun. 2003, pp. 64-74.
Region-based Hierarchical Operation Partitioning for Multicluster Processors
(paper: pdf; slides: ppt)
Michael Chu, Kevin Fan, and Scott Mahlke.
Proc. ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation (PLDI)
Jun. 2003, pp. 300-311.
Predicate-Aware Scheduling: A Technique for Reducing Resource Constraints
(paper: pdf; slides: ppt)
Mikhail Smelyanskiy, Scott A. Mahlke, Edward S. Davidson, and Hsien-Hsin S. Lee.
Proc. 1st Intl. Symposium on Code Generation and Optimization (CGO)
Mar. 2003, pp. 169-178.

2002
Automatically Generating Custom Instruction Set Extensions
(paper: pdf; slides: ppt)
Nathan Clark, Wilkin Tang, and Scott Mahlke.
Proc. 1st Workshop on Application Specific Processors (WASP)
Nov. 2002, pp. 94-101.
Insights into the Memory Demands of Speech Recognition Algorithms
(paper: pdf)
Rajeev Krishna, Scott Mahlke, and Todd Austin.
Proc. ACM/IEEE 2nd Workshop on Memory Performance Issues (WMPI)
May 2002.

Theses
Architectural and Compiler Mechanisms for Accelerating Single Thread Applications on Multicore Processors
(paper: pdf)
Hongtao Zhong. 2008.
Cooperative Data and Computation Partitioning for Decentralized Architectures
(paper: pdf)
Michael Chu. 2007.
Customizing the Computation Capabilities of Microprocessors
(paper: pdf)
Nathan Clark. 2007.
Hardware/Software Techniques for Memory Power Optimizations in Embedded Processors
(paper: pdf)
Rajiv Ravindran. 2007.
Hardware/Software Mechanisms for Increasing Resource Utilization on VLIW/EPIC Processors
(paper: pdf)
Mikhail Smelyanskiy. 2004.

Disclaimer: The documents contained on this page have been provided by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a noncommercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

Page last modified April 20, 2008.