Publications
2020
SIEVE: Speculative Inference on the Edge with Versatile Exportation
(paper: pdf; slides: pptx)
Babak Zamirai, Salar Latifi, Pedram Zamirai, Scott Mahlke
Proc. 57th Design Automation Conference (DAC)
Jul. 2020.
Path Sensitive Signatures for Control Flow Error Detection
(paper: pdf; slides: pptx)
Ze Zhang, Sunghyun Park, Scott Mahlke
21st International Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES)
Jun. 2020.
PolygraphMR: Enhancing the Reliability and Dependability of CNNs
(paper: pdf; slides: pptx)
Salar Latifi, Babak Zamirai, Scott Mahlke
50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)
Jun. 2020.
Low-Cost Prediction-Based Fault Protection Strategy
(paper: pdf; slides: pptx)
Sunghyun Park, Shikai Li, Ze Zhang, Scott Mahlke
Proceedings of the 18th ACM/IEEE International Symposium on Code Generation and Optimization (CGO)
Feb. 2020.

2019
Multi-objective Exploration for Practical Optimization Decisions in Binary Translation
(paper: pdf; slides: pptx)
Sunghyun Park, Youfeng Wu, Janghaeng Lee, Amir Aupov, Scott Mahlke
ESWEEK-TECS special issue / the International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES)
Oct. 2019.
TF-Net: Deploying Sub-Byte Deep Neural Networks on Microcontrollers
(paper: pdf; slides: pptx)
Jiecao Yu, Andrew Lukefahr, Reetuparna Das, Scott Mahlke
ESWEEK-TECS special issue / the International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES)
Oct. 2019.
POSTER: Pairing Up CNNs for High Throughput Deep Learning
(paper: pdf; slides: pptx)
Babak Zamirai, Salar Latifi, Scott Mahlke
2019 28th International Conference on Parallel Architectures and Compilation Techniques (PACT)
Sep. 2019.
Characterization of Unnecessary Computations in Web Applications
(paper: pdf; slides: pptx)
Hossein Golestani, Scott Mahlke, Satish Narayanasamy
2019 Intl. Symposium on Performance Analysis of Systems and Software (ISPASS)
Mar. 2019.

2018
Scratch That (But Cache This): A Hybrid Register Cache/Scratchpad for GPUs
(paper: pdf; slides: pptx)
Jonathan Bailey, John Kloosterman, and Scott Mahlke
2018 Intl. Conference on Compilers, Architectures and Synthesis for Embedded Systems (CASES)
Oct. 2018.
Sculptor: Flexible Approximation with Selective Dynamic Loop Perforation
(paper: pdf; slides: pptx)
Shikai Li, Sunghyun Park, Scott Mahlke
Proc. 32nd International Conference on Supercomputing (ICS)
Jun. 2018.
Low Cost Transient Fault Protection Using Loop Output Prediction
(paper: pdf; slides: pptx)
Sunghyun Park, Shikai Li, Scott Mahlke
48th IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W)
Jun. 2018.
Low Cost Transient Fault Protection Using Loop Output Protection
(paper: pdf; slides: pptx)
Sunghyun Park, Shikai Li, Scott Mahlke
14th IEEE workshop on Silicon Errors in Logic - System Effects (SELSE)
Apr. 2018.
In-Memory Data Parallel Processor
(paper: pdf; slides: pptx)
Daichi Fujiki, Scott Mahlke, Reetuparna Das
Proc. 23rd International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)
Mar. 2018.

2017
Mirage Cores: The Illusion of Many Out-of-order Cores Using In-order Hardware
(paper: pdf; slides: pptx)
Shruti Padmanabha, Andrew Lukefahr, Reetuparna Das, Scott Mahlke
Proc. 50th IEEE/ACM International Symposium on Microarchitecture (MICRO)
Oct. 2017.
RegLess: Just-in-Time Operand Staging for GPUs
(paper: pdf; slides: pptx)
John Kloosterman, Jonathan Beaumont, D. Anoushe Jamshidi, Jonathan Bailey, Trevor Mudge, Scott Mahlke
Proc. 50th IEEE/ACM International Symposium on Microarchitecture (MICRO)
Oct. 2017.
DeftNN: addressing bottlenecks for DNN execution on GPUs via synapse vector elimination and near-compute data fission
(paper: pdf)
Parker Hill, Animesh Jain, Mason Hill, Babak Zamirai, Chang-Hong Hsu, Michael A. Laurenzano, Scott Mahlke, Lingjia Tang, Jason Mars
Proc. 50th IEEE/ACM International Symposium on Microarchitecture (MICRO)
Oct. 2017.
Scalpel: Customizing DNN Pruning to the Underlying Hardware Parallelism
(paper: pdf; slides: pptx)
Jiecao Yu, Andrew Lukefahr, David Palframan, Ganesh Dasika, Reetuparna Das, Scott Mahlke
Proc. The 44th International Symposium on Computer Architecture (ISCA)
Jun. 2017.
Dynamic Resource Management for Efficient Utilization of Multitasking GPUs
(paper: pdf; slides: pptx)
Jason Jong Kyu Park, Yongjun Park, and Scott Mahlke
Proc. 22nd International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)
Apr. 2017.

2016
BugMD: Automatic Mismatch Diagnosis for Bug Triaging
(paper: pdf)
Biruk Mammo, Milind Furia, Valeria Bertacco, Scott Mahlke, Daya S Khudia
Proc. 35th Intl. Conference on Computer-Aided Design (ICCAD)
Nov. 2016.
Concise loads and stores: The case for an asymmetric compute-memory architecture for approximation
(paper: pdf)
Animesh Jain, Parker Hill, Shih-Chieh Lin, Muneeb Khan, Md E. Haque, Michael A. Laurenzano, Scott Mahlke, Lingjia Tang, Jason Mars
Proc. 49th IEEE/ACM Intl. Symposium on Microarchitecture (MICRO)
Oct. 2016.
A Bypass First Policy for Energy-Efficient Last Level Caches
(paper: pdf; slides: pptx)
Jason Jong Kyu Park, Yongjun Park, and Scott Mahlke
Proc. International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS)
Jul. 2016.
Input responsiveness: using canary inputs to dynamically steer approximation
(paper: pdf)
Michael A. Laurenzano, Parker Hill, Mehrzad Samadi, Scott Mahlke, Jason Mars and Lingjia Tang
Proc. 37th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)
Jun. 2016, pp. 161-176.
Statistical Error Bounds for Data Parallel Applications
(paper: pdf)
Parker Hill, Michael Laurenzano, Babak Zamirai, Mehrzad Samadi, Scott Mahlke, Jason Mars, Lingjia Tang
The 2016 Workshop on Approximate Computing Across the Stack (WAX)
Apr. 2016.
Exploring Fine-Grained Heterogeneity with Composite Cores
(paper: pdf)
Andrew Lukefahr, Shruti Padmanabha, Reetuparna Das, Faissal M. Sleiman, Ronald G. Dreslinski, Thomas F. Wenisch, and Scott Mahlke
IEEE Transactions on Computers (TC)
Vol. 65, No. 2, Feb. 2016, pp. 535-547.
Quality Control for Approximate Accelerators by Error Prediction
(paper: pdf)
Daya S Khudia, Babak Zamirai, Mehrzad Samadi and Scott Mahlke
IEEE Design and Test
Vol. 33, No. 1, Jan. 2016, pp. 43-50.

2015
WarpPool: Sharing Requests with Inter-Warp Coalescing for Throughput Processors
(paper: pdf; slides: pptx)
John Kloosterman, Jonathan Beaumont, Mick Wollman, Ankit Sethia, Ron Dreslinski, Trevor Mudge, and Scott Mahlke
Proc. 48th IEEE/ACM International Symposium on Microarchitecture (MICRO)
Dec. 2015.
DynaMOS: Dynamic Schedule Migration for Heterogeneous Cores
(paper: pdf; slides: pptx)
Shruti Padmanabha, Andrew Lukefahr, Reetuparna Das, and Scott Mahlke
Proc. 48th IEEE/ACM International Symposium on Microarchitecture (MICRO)
Dec. 2015.
ELF: Maximizing Memory-level Parallelism for GPUs with Coordinated Warp and Fetch Scheduling
(paper: pdf; slides: pptx)
Jason Jong Kyu Park, Yongjun Park, and Scott Mahlke
Proc. International Conference for High Performance Computing, Networking, Storage and Analysis (SC)
Nov. 2015.
Orchestrating Multiple Data-Parallel Kernels on Multiple Devices
(paper: pdf; slides: pptx)
Janghaeng Lee, Mehrzad Samadi, and Scott Mahlke
Proc. 24th Intl. Conference on Parallel Architectures and Compilation Techniques (PACT)
Oct. 2015.
Fine Grain Cache Partitioning using Per-Instruction Working Blocks
(paper: pdf; slides: pptx)
Jason Jong Kyu Park, Yongjun Park, and Scott Mahlke
Proc. 24th International Conference on Parallel Architectures and Compilation Techniques (PACT)
Oct. 2015.
SKMD: Single Kernel on Multiple Devices for Transparent CPU-GPU Collaboration
(paper: pdf)
Janghaeng Lee, Mehrzad Samadi, Yongjun Park, and Scott Mahlke
ACM Transactions on Computer Systems (TOCS)
Aug. 2015.
Rumba: An Online Quality Management System for Approximate Computing
(paper: pdf; slides: pptx)
Daya S Khudia, Babak Zamirai, Mehrzad Samadi and Scott Mahlke
Proc. The 42nd International Symposium on Computer Architecture (ISCA)
Jun. 2015.
Colony of NPUs: Scaling the Efficiency of Neural Accelerators
(paper: pdf; slides: key)
Babak Zamirai, Daya S Khudia, Mehrzad Samadi, and Scott Mahlke
Proc. The 2015 Workshop on Approximate Computing Across the Stack (WAX)
Jun. 2015.
Approximating with Input Level Granularity
(paper: pdf; slides: pdf)
Parker Hill, Michael Laurenzano, Mehrzad Samadi, Scott Mahlke, Jason Mars, and Lingjia Tang
Proc. The 2015 Workshop on Approximate Computing Across the Stack (WAX)
Jun. 2015.
Adaptive Cache Partitioning on a Composite Core
(paper: pdf; slides: pptx)
Jiecao Yu, Andrew Lukefahr, Shruti Padmanabha, Reetuparna Das, Scott Mahlke
The 3rd Annual Workshop on Parallelism in Mobile Platforms (PRISM-3)
Jun. 2015.
Accelerating Asynchronous Programs through Event Sneak Peek
(paper: pdf; slides: pptx)
Gaurav Chadha, Scott Mahlke, Satish Narayanasamy
Proc. The 42nd International Symposium on Computer Architecture (ISCA)
Jun. 2015.
Accelerating Mobile Applications through Flip-Flop Replication
(paper: pdf)
Mark Gordon, David Ke Hong, Peter M. Chen, Jason Flinn, Scott Mahlke, and Z. Morley Mao
Proc. 13th Intl. Conference on Mobile Systems, Applications, and Services
May 2015, pp. 137-150.
Chimera: Collaborative Preemption for Multitasking on a Shared GPU
(paper: pdf; slides: pptx)
Jason Jong Kyu Park, Yongjun Park, and Scott Mahlke
Proc. 20th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)
Mar. 2015.
Mascar: Speeding up GPU Warps by Reducing Memory Pitstops
(paper: pdf; slides: pptx)
Ankit Sethia, D. Anoushe Jamshidi and Scott Mahlke
The 21st IEEE Symposium on High Performance Computer Architecture (HPCA)
Feb. 2015.
Using Graphics Processing Units in an LTE Base Station
(paper: pdf)
Qi Zheng, Yajing Chen, Hyunseok Lee, Ronald Dreslinski, Chaitali Chakrabarti, Achilleas Anastasopoulos, Scott Mahlke, Trevor Mudge
Journal of Signal Processing Systems
Vol. 78, No. 1, Jan. 2015, pp. 35-47.

2014
Equalizer: Dynamic Tuning of GPU Resources for Efficient Execution
(paper: pdf; slides: pptx)
Ankit Sethia and Scott Mahlke
The 47th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)
Dec. 2014.
Harnessing Soft Computations for Low-budget Fault Tolerance
(paper: pdf; slides: pptx)
Daya Shanker Khudia and Scott Mahlke
The 47th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)
Dec. 2014.
Scaling Performance via Self-Tuning Approximation for Graphics Engines
(paper: pdf)
Mehrzad Samadi, Janghaeng Lee, D. Anoushe Jamshidi, Amir Hormati, and Scott Mahlke
ACM Transactions on Computer Systems (TOCS)
Aug. 2014.
EFetch: Optimizing Instruction Fetch for Event-Driven Web Applications
(paper: pdf; slides: pptx)
Gaurav Chadha, Scott Mahlke, Satish Narayanasamy
Proc. 23rd Intl. Conference on Parallel Architectures and Compilation Techniques (PACT)
Sep. 2014.
D2MA: Accelerating Coarse-Grained Data Transfer for GPUs
(paper: pdf; slides: pptx)
D. Anoushe Jamshidi, Mehrzad Samadi, and Scott Mahlke
Proc. 23rd Intl. Conference on Parallel Architectures and Compilation Techniques (PACT)
Sep. 2014.
VAST: The Illusion of a Large Memory Space for GPUs
(paper: pdf; slides: pptx)
Janghaeng Lee, Mehrzad Samadi, and Scott Mahlke
Proc. 23rd Intl. Conference on Parallel Architectures and Compilation Techniques (PACT)
Sep. 2014.
Heterogeneous Microarchitectures Trump Voltage Scaling for Low-Power Cores
(paper: pdf; slides: pptx)
Andrew Lukefahr, Shruti Padmanabha, Reetuparna Das, Ronald Dreslinski Jr., Thomas F. Wenisch, and Scott Mahlke
Proc. 23rd Intl. Conference on Parallel Architectures and Compilation Techniques (PACT)
Sep. 2014.
Embracing Heterogeneity with Dynamic Core Boosting
(paper: pdf; slides: pptx)
Hyoun Kyu Cho and Scott Mahlke
Proc. 2014 ACM International Conference on Computing Frontiers (CF)
May 2014.
CPU-GPU Collaboration for Output Quality Monitoring
(paper: pdf; slides: pptx)
Mehrzad Samadi and Scott Mahlke
First Workshop on Approximate Computing Across the System Stack (WACAS)
Mar. 2014.
Paraprox: Pattern-Based Approximation for Data Parallel Applications
(paper: pdf; slides: pptx)
Mehrzad Samadi, D. Anoushe Jamshidi, Janghaeng Lee, and Scott Mahlke
Proc. 19th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)
Mar. 2014.
Leveraging GPUs Using Cooperative Loop Speculation
(paper: pdf)
Mehrzad Samadi, Amir Hormati, Janghaeng Lee, and Scott Mahlke
ACM Transactions on Architecture and Code Optimization (TACO)
Feb. 2014.

2013
Trace Based Phase Prediction For Tightly-Coupled Heterogeneous Cores
(paper: pdf; slides: pdf)
Shruti Padmanabha, Andrew Lukefahr, Reetuparna Das, and Scott Mahlke
Proc. 46th IEEE/ACM International Symposium on Microarchitecture (MICRO)
Dec. 2013.
SAGE: Self-Tuning Approximation for Graphics Engines
(paper: pdf; slides: pptx)
Mehrzad Samadi, Janghaeng Lee, D. Anoushe Jamshidi, Amir Hormati, and Scott Mahlke
Proc. 46th IEEE/ACM International Symposium on Microarchitecture (MICRO)
Dec. 2013.
Efficient Execution of Augmented Reality Applications on Mobile Programmable Accelerators
(paper: pdf; slides: pptx)
Jason Jong Kyu Park, Yongjun Park, and Scott Mahlke
Proc. The 2013 International Conference on Field-Programmable Technology (ICFPT)
Dec. 2013.
APOGEE: Adaptive Prefetching On GPUs for Energy Efficiency
(paper: pdf; slides: pptx)
Ankit Sethia, Ganesh Dasika, Mehrzad Samadi, and Scott Mahlke
Proc. 22nd Intl. Conference on Parallel Architectures and Compilation Techniques (PACT)
Sep. 2013.
Transparent CPU-GPU Collaboration for Data-Parallel Kernels on Heterogeneous Systems
(paper: pdf; slides: pptx)
Janghaeng Lee, Mehrzad Samadi, Yongjun Park, and Scott Mahlke
Proc. 22nd Intl. Conference on Parallel Architectures and Compilation Techniques (PACT)
Sep. 2013.
Low Cost Control Flow Protection Using Abstract Control Signatures
(paper: pdf; slides: pptx)
Daya Shanker Khudia and Scott Mahlke
Proc. ACM SIGPLAN 2013 Conference on Languages, Compilers, Tools and Theory for Embedded Systems (LCTES)
Jun. 2013.
Concurrency Bugs in Multithreaded Software: Modeling and Analysis Using Petri Nets
(paper: pdf)
Hongwei Liao, Yin Wang, Hyoun Kyu Cho, Jason Stanley, Terence Kelly, Stéphane Lafortune, Scott Mahlke, and Spyros Reveliotis
Journal of Discrete Event Dynamic Systems
Vol. 23, Issue 2, Jun. 2013, pp. 157-195.
Practical Lock/Unlock Pairing for Concurrent Programs
(paper: pdf; slides: pptx)
Hyoun Kyu Cho, Yin Wang, Hongwei Liao, Terence Kelly, Stephane Lafortune, and Scott Mahlke
Proc. 2013 Intl. Symposium on Code Generation and Optimization
Feb. 2013.
Instant Profiling: Instrumentation Sampling for Profiling Datacenter Applications
(paper: pdf; slides: pptx)
Hyoun Kyu Cho, Tipp Moseley, Richard Hank, Derek Bruening, and Scott Mahlke
Proc. 2013 Intl. Symposium on Code Generation and Optimization
Feb. 2013.
Illusionist: Transforming Lightweight Cores into Aggressive Cores on Demand
(paper: pdf; slides: pdf)
Amin Ansari, Shuguang Feng, Shantanu Gupta, Josep Torrellas, and Scott Mahlke
Proc. 19th IEEE Intl. Symposium on High Performance Computer Architecture (HPCA)
Feb. 2013.

2012
A Customized Processor for Energy Efficient Scientific Computing
(paper: pdf)
Ankit Sethia, Ganesh Dasika, Trevor Mudge, and Scott Mahlke
IEEE Transactions on Computers
Vol. 61, No. 12, Dec. 2012, pp. 1711-1723.
Efficient Performance Scaling of Future CGRAs for Mobile Applications
(paper: pdf; slides: ppt)
Yongjun Park, Jason Jong Kyu Park, and Scott Mahlke
Proc. The 2012 International Conference on Field-Programmable Technology (FPT)
Dec. 2012.
Composite Cores: Pushing Heterogeneity into a Core
(paper: pdf; slides: pptx)
Andrew Lukefahr, Shruti Padmanabha, Reetuparna Das, Faissal M. Sleiman, Ronald Dreslinski, Thomas F. Wenisch, and Scott Mahlke
Proc. 45th IEEE/ACM International Symposium on Microarchitecture (MICRO)
Dec. 2012.
Libra: Tailoring SIMD Execution using Heterogeneous Hardware and Dynamic Configurability
(paper: pdf; slides: ppt)
Yongjun Park, Jason Jong Kyu Park, Hyunchul Park, and Scott Mahlke
Proc. 45th IEEE/ACM International Symposium on Microarchitecture (MICRO)
Dec. 2012.
Dynamic Acceleration of Multithreaded Program Critical Paths in Near-Threshold Systems
(paper: pdf; slides: ppt)
Hyoun Kyu Cho and Scott Mahlke
2012 Workshop on Near-threshold Computing
Dec. 2012.
When Less Is MOre (LIMO): Controlled Parallelism for Improved Efficiency
(paper: pdf; slides: pptx)
Gaurav Chadha, Scott Mahlke and Satish Narayanasamy
Proc. 2012 Intl. Conference on Compilers, Architectures and Synthesis for Embedded Systems (CASES)
Oct. 2012, pp. 141-150.
COMET: Code Offload by Migrating Execution Transparently
(paper: pdf; slides: ppt)
Mark S. Gordon, D. Anoushe Jamshidi, Scott Mahlke, Z. Morley Mao and Xu Chen
Proc. 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI)
Oct. 2012, pp. 93-106.
Efficient Soft Error Protection for Commodity Embedded Microprocessors using Profile Information
(paper: pdf; slides: pptx)
Daya Shanker Khudia, Griffin Wright, and Scott Mahlke
Proc. ACM SIGPLAN 2012 Conference on Languages, Compilers, Tools and Theory for Embedded Systems (LCTES)
Jun. 2012.
Adaptive Input-aware Compilation for Graphics Engines
(paper: pdf; slides: pptx)
Mehrzad Samadi, Amir Hormati, Mojtaba Mehrara, Janghaeng Lee, and Scott Mahlke
Proc. ACM SIGPLAN 2012 Conference on Programming Languages Design and Implementation (PLDI)
Jun. 2012.
Process Variation in Near-Threshold Wide SIMD Architecture
(paper: pdf; slides: ppt)
Sangwon Seo, Ronald Dreslinski, Mark Woh, Yongjun Park, Scott Mahlke, David Blaauw, Chaitali Chakrabarti, and Trevor Mudge
Proc. 49th Design Automation Conference (DAC)
Jun. 2012.
Runtime Asynchronous Fault Tolerance via Speculation
(paper: pdf)
Yun Zhang, Soumyadeep Ghosh, Jialu Huang, Jae W. Lee, Scott A. Mahlke, and David I. August
Proc. 2012 Intl. Symposium on Code Generation and Optimization (CGO)
Apr. 2012.
Automatic Speculative DOALL for Clusters
(paper: pdf)
Hanjun Kim, Nick P. Johnson, Jae W. Lee, Scott A. Mahlke, and David I. August
Proc. 2012 Intl. Symposium on Code Generation and Optimization (CGO)
Apr. 2012.
Reducing the Cost of Protection Against Soft Errors using Profile Based Analysis
(paper: pdf; slides: pptx)
Daya Shanker Khudia, Griffin Wright, and Scott Mahlke
8th IEEE workshop on Silicon Errors in Logic - System Effects (SELSE)
Mar. 2012.
SIMD Defragmenter: Efficient ILP Realization on Data-parallel Architectures
(paper: pdf; slides: ppt)
Yongjun Park, Sangwon Seo, Hyunchul Park, Hyoun Kyu Cho, and Scott Mahlke
Proc. 17th Intl. Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)
Mar. 2012.
Paragon: Collaborative Speculative Loop Execution on GPU and CPU
(paper: pdf; slides: ppt)
Mehrzad Samadi, Amir Hormati, Janghaeng Lee, and Scott Mahlke
Fifth Workshop on General-Purpose Computation on Graphics Processing Units (GPGPU)
Mar. 2012.

2011
Encore: Low-Cost, Fine-Grained Transient Fault Recovery
(paper: pdf; slides: ppt)
Shuguang Feng, Shantanu Gupta, Amin Ansari, Scott Mahlke, and David August
Proc. 44th IEEE/ACM International Symposium on Microarchitecture (MICRO)
Dec. 2011.
Bundled Execution of Recurring Traces for Energy-Efficient General Purpose Processing
(paper: pdf; slides: ppt)
Shantanu Gupta, Shuguang Feng, Amin Ansari, Scott Mahlke, and David August
Proc. 44th IEEE/ACM International Symposium on Microarchitecture (MICRO)
Dec. 2011.
PEPSC: A Power Efficient Processor for Scientific Computing
(paper: pdf; slides: ppt)
Ganesh Dasika, Ankit Sethia, Trevor Mudge, and Scott Mahlke
Proc. 20th Intl. Conference on Parallel Architectures and Compilation Techniques (PACT)
Oct. 2011.
Dynamically Accelerating Client-side Web Applications through Decoupled Execution
(paper: pdf; slides: ppt)
Mojtaba Mehrara and Scott Mahlke
Proc. 2011 Intl. Symposium on Code Generation and Optimization (CGO)
Apr. 2011.
Sponge: Portable Stream Programming on Graphics Engines
(paper: pdf; slides: pptx)
Amir Hormati, Mehrzad Samadi, Mark Woh, Trevor Mudge and Scott Mahlke
Proc. 16th Intl. Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)
Mar. 2011, pp. 381-392.
Archipelago: A Polymorphic Cache Design for Enabling Robust Near-Threshold Operation
(paper: pdf; slides: pptx)
Amin Ansari, Shuguang Feng, Shantanu Gupta, and Scott Mahlke
Proc. 17th IEEE Intl. Symposium on High Performance Computer Architecture (HPCA)
Feb. 2011.
Dynamic Parallelization of JavaScript Applications Using an Ultra-lightweight Speculation Mechanism
(paper: pdf; slides: pptx)
Mojtaba Mehrara, Po-Chun Hsu, Mehrzad Samadi, and Scott Mahlke
Proc. 17th IEEE Intl. Symposium on High Performance Computer Architecture (HPCA)
Feb. 2011, pp. 87-98.
A Power-Efficient 32b ARM ISA Processor Using Timing-error Detection and Correction for Transient-error Tolerance and Adaptation to PVT Variation
(paper: pdf)
David Bull, Shidhartha Das, Karthik Shivshankar, Ganesh Dasika, Krisztian Flautner, and David Blaauw
IEEE Journal of Solid-State Circuits (JSSC)
Vol. 46, No. 1, Jan. 2011, pp. 18-31.
Maximizing Spare Utilization by Virtually Reorganizing Faulty Cache Lines
(paper: pdf)
Amin Ansari, Shantanu Gupta, Shuguang Feng, and Scott Mahlke
IEEE Transactions on Computers
Vol. 60, No. 1, Jan. 2011, pp. 35-49.
StageNet: A Reconfigurable Fabric for Constructing Dependable CMPs
(paper: pdf)
Shantanu Gupta, Shuguang Feng, Amin Ansari and Scott Mahlke
IEEE Transactions on Computers
Vol. 60, No. 1, Jan. 2011, pp. 5-19.

2010
Erasing Core Boundaries for Robust and Configurable Performance
(paper: pdf; slides: pptx)
Shantanu Gupta, Shuguang Feng, Amin Ansari, and Scott Mahlke
Proc. 43rd Intl. Symposium on Microarchitecture (MICRO)
Dec. 2010.
Putting Faulty Cores to Work
(paper: pdf)
Amin Ansari, Shuguang Feng, Shantanu Gupta, and Scott Mahlke
IEEE Micro
Vol. 30, No. 6, Nov. 2010, pp. 36-45.
Mighty Morphing Power-SIMD
Ganesh Dasika, Mark Woh, Sangwon Seo, Nathan Clark, Trevor Mudge, and Scott Mahlke
Proc. 2010 Intl. Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES)
Oct. 2010.
Resource Recycling: Putting Idle Resources to Work on a Composable Accelerator
(paper: pdf; slides: pptx)
Yongjun Park, Hyunchul Park, Scott Mahlke and Sukjin Kim
Proc. 2010 Intl. Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES)
Oct. 2010.
MEDICS: Ultra-Portable Processing for Medical Image Reconstruction
Ganesh Dasika, Ankit Sethia, Vincentius Robby, Trevor Mudge, and Scott Mahlke
Proc. 19th Intl. Conference on Parallel Architectures and Compilation Techniques (PACT)
Sep. 2010.
StageWeb: Interweaving Pipeline Stages into a Wearout and Variation Tolerant CMP Fabric
(paper: pdf; slides: pptx)
Shantanu Gupta, Amin Ansari, Shuguang Feng, and Scott Mahlke
Proc. 40th Intl. Conference on Dependable Systems and Networks (DSN)
Jun. 2010.
Necromancer: Enhancing System Throughput by Animating Dead Cores
(paper: pdf; slides: pptx)
Amin Ansari, Shuguang Feng, Shantanu Gupta, and Scott Mahlke
Proc. 37th Intl. Symposium on Computer Architecture (ISCA)
Jun. 2010.
Shoestring: Probabilistic Soft Error Reliability on the Cheap
(paper: pdf; slides: pptx)
Shuguang Feng, Shantanu Gupta, Amin Ansari, and Scott Mahlke
Proc. 15th Intl. Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)
Mar. 2010, pp. 385-396.
MacroSS: Macro-SIMDization of Streaming Applications
(paper: pdf; slides: pptx)
Amir Hormati, Yoonseo Choi, Mark Woh, Manjunath Kudlur, Rodric Rabbah, Trevor Mudge and Scott Mahlke
Proc. 15th Intl. Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)
Mar. 2010, pp. 285-296.
A Power-Efficient 32b ARM ISA Processor Using Timing-error Detection and Correction for Transient-error Tolerance and Adaptation to PVT Variation
(paper: pdf; slides: pdf)
David Bull, Shidhartha Das, Karthik Shivshankar, Ganesh Dasika, Krisztian Flautner, and David Blaauw
Proc. 2010 Intl. Solid-State Circuits Conference (ISSCC)
Feb. 2010, pp. 284-286.
Maestro: Orchestrating Lifetime Reliability in Chip Multiprocessors
(paper: pdf; slides: pptx)
Shuguang Feng, Shantanu Gupta, Amin Ansari, and Scott Mahlke
Proc. 2010 Intl. Conference on High-Performance Embedded Architectures and Compilers (HiPEAC)
Jan. 2010, pp. 186-200.
AnySP: Anytime Anywhere Anyway Signal Processing
(paper: pdf)
Mark Woh, Sangwon Seo, Scott Mahlke, Trevor Mudge, Chaitali Chakrabarti, and Krisztian Flautner
IEEE Micro (2009 Top Picks in Computer Architecture)
Vol. 30, No. 1, Jan. 2010, pp. 81-91.
Mobile Computers for the Next-Generation Cell Phone
(paper: pdf)
Mark Woh, Scott Mahlke, Trevor Mudge, and Chaitali Chakrabarti
IEEE Computer
Vol. 43, No. 1, Jan. 2010, pp. 81-85.

2009
Eliminating Concurrency Bugs with Control Engineering
(paper: pdf)
Terence Kelly, Yin Wang, Stephane Lafortune, and Scott Mahlke
IEEE Computer
Vol. 42, No. 12, Dec. 2009, pp. 52-60.
Polymorphic Pipeline Array: A Flexible Multicore Accelerator with Virtualized Execution for Mobile Multimedia Applications
(paper: pdf; slides: pptx)
Hyunchul Park, Yongjun Park, and Scott Mahlke
Proc. 42nd Intl. Symposium on Microarchitecture (MICRO)
Dec. 2009, pp. 370-380.
ZerehCache: Armoring Cache Architectures in High Defect Density Technologies
(paper: pdf; slides: pptx)
Amin Ansari, Shantanu Gupta, Shuguang Feng, and Scott Mahlke
Proc. 42nd Intl. Symposium on Microarchitecture (MICRO)
Dec. 2009, pp. 100-110.
Low-Power Scientific Computing
(paper: pdf; slides: pptx)
Ganesh Dasika, Ankit Sethia, Trevor Mudge, and Scott Mahlke
1st Workshop on New Directions in Computer Architecture
Dec. 2009.
Multicore Compilation Strategies and Challenges
(paper: pdf)
Mojtaba Mehrara, Thomas Jablin, Dan Upton, David August, Kim Hazelwood, and Scott Mahlke
IEEE Signal Processing Magazine
Vol. 26, No. 6, Nov. 2009, pp. 55-63.
CGRA Express: Accelerating Execution using Dynamic Operation Fusion
(paper: pdf; slides: ppt)
Yongjun Park, Hyunchul Park and Scott Mahlke
Proc. 2009 Intl. Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES)
Oct. 2009, pp. 271-280.
Adaptive Online Testing for Efficient Hard Fault Detection
(paper: pdf; slides: ppt)
Shantanu Gupta, Amin Ansari, Shuguang Feng and Scott Mahlke
Proc. 27th Intl. Conference on Computer Design (ICCD)
Oct. 2009, pp. 343-349.
Flextream: Adaptive Compilation of Streaming Applications for Heterogeneous Architectures
(paper: pdf; slides: pptx)
Amir Hormati, Yoonseo Choi, Manjunath Kudlur, Rodric Rabbah, Trevor Mudge and Scott Mahlke
Proc. 18th Intl. Conference on Parallel Architectures and Compilation Techniques (PACT)
Sep. 2009, pp. 214-223.
Enabling Ultra Low Voltage System Operation by Tolerating On-Chip Cache Failures
(paper: pdf; slides: pptx)
Amin Ansari, Shuguang Feng, Shantanu Gupta, and Scott Mahlke
Proc. 2009 Intl. Symposium on Low Power Electronics and Design (ISLPED)
Aug. 2009, pp. 307-310.
High Performance Mobile Computing Using Flexible Wide SIMD Processors
(slides: ppt)
Scott Mahlke
9th Intl. Forum on Embedded MPSoC and Multicore (MPSOC)
Aug. 2009.
Parade: A Versatile Parallel Architecture for Accelerating Pulse-Train Clustering
(paper: pdf; slides: pptx)
Amin Ansari, Dan Zhang, and Scott Mahlke
Proc. 7th IEEE Symposium on Application Specific Processors (SASP)
Jul. 2009, pp. 88-93.
Power-Efficient Medical Image Processing using PUMA
(paper: pdf; slides: pptx)
Ganesh Dasika, Kevin Fan and Scott Mahlke
Proc. 7th IEEE Symposium on Application Specific Processors (SASP)
Jul. 2009, pp. 29-34.
A Dataflow-centric Approach to Design Low Power Control Paths in CGRAs
(paper: pdf; slides: ppt)
Hyunchul Park, Yongjun Park, and Scott Mahlke
Proc. 7th IEEE Symposium on Application Specific Processors (SASP)
Jul. 2009, pp. 15-20.
Liquid Metal's OPTIMUS: Synthesis of Efficient Streaming Hardware
(slides: pptx)
Scott Mahlke and Rodric Rabbah
Tutorial at 46th Design Automation Conference, High-Level Synthesis for ESL Design: Fundamentals and Case Studies
Jul. 2009.
Customizing Wide-SIMD Architectures for H.264
(paper: pdf; slides: ppt)
Sangwon Seo, Mark Woh, Scott Mahlke, Trevor Mudge, Sundaram Vijay, and Chaitali Chakrabarti
Proc. 9th Intl. Symposium on Systems, Architectures, Modeling and Simulation (SAMOS)
Jul. 2009, pp. 172-179.
AnySP: Anytime Anywhere Anyway Signal Processing
(paper: pdf; slides: ppt)
Mark Woh, Sangwon Seo, Scott Mahlke, Trevor Mudge, Chaitali Chakrabarti, and Krisztian Flautner
Proc. 36th Intl. Symposium on Computer Architecture (ISCA)
Jun. 2009, pp. 128-139.
Parallelizing Sequential Applications on Commodity Hardware using a Low-cost Software Transactional Memory
(paper: pdf; slides: ppt)
Mojtaba Mehrara, Jeff Hao, Po-Chun Hsu, and Scott Mahlke
Proc. ACM SIGPLAN 2009 Conference on Programming Languages Design and Implementation (PLDI)
Jun. 2009, pp. 166-176.
Reducing Control Power in CGRAs with Token Flow
(paper: pdf; slides: ppt)
Hyunchul Park, Yongjun Park, and Scott Mahlke
Workshop on Optimizations for DSP and Embedded Systems (ODES)
Mar. 2009.
Stream Compilation for Real-time Embedded Multicore Systems
(paper: pdf; slides: ppt)
Yoonseo Choi, Yuan Lin, Nathan Chong, Scott Mahlke and Trevor Mudge
Proc. 2009 International Symposium on Code Generation and Optimization (CGO)
Mar. 2009, pp. 210-220.
Bridging the Computation Gap Between Programmable Processors and Hardwired Accelerators
(paper: pdf; slides: ppt)
Kevin Fan, Manjunath Kudlur, Ganesh Dasika, and Scott Mahlke
Proc. 15th Intl. Symposium on High-Performance Computer Architecture (HPCA)
Feb. 2009, pp. 313-322.
The Theory of Deadlock Avoidance via Discrete Control
(paper: pdf; slides: ppt)
Yin Wang, Stephane Lafortune, Terence Kelly, Manjunath Kudlur, and Scott Mahlke
Proc. 36th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL)
Jan. 2009, pp. 252-263.

2008
Gadara: Dynamic Deadlock Avoidance for Multithreaded Programs
(paper: pdf; slides: ppt)
Yin Wang, Terence Kelly, Manjunath Kudlur, Stephane Lafortune, and Scott Mahlke
Proc. 8th USENIX Symposium on Operating Systems Design and Implementation (OSDI)
Dec. 2008, pp. 281-294.
From SODA to Scotch: The Evolution of a Wireless Baseband Processor
(paper: pdf; slides: ppt)
Mark Woh, Yuan Lin, Sangwon Seo, Scott Mahlke, Trevor Mudge, Chaitali Chakrabarti, Richard Bruce, Danny Kershaw, Alastair Reid, Mladen Wilder, and Krisztian Flautner
Proc. 41st Intl. Symposium on Microarchitecture (MICRO)
Nov. 2008, pp. 152-163.
The StageNet Fabric for Constructing Resilient Multicore Systems
(paper: pdf; slides: ppt)
Shantanu Gupta, Shuguang Feng, Amin Ansari, Jason Blome, and Scott Mahlke
Proc. 41st Intl. Symposium on Microarchitecture (MICRO)
Nov. 2008, pp. 141-151.
Adaptive Streaming for Dealing with Dynamic Heterogeneity
(slides: ppt)
Amir Hormati and Scott Mahlke
Workshop on Streaming Systems: From Web and Enterprise to Multicore
Nov. 2008.
A Reconfigurable Microarchitecture Building Block for Resilient CMP Systems
(paper: pdf; slides: ppt)
Shantanu Gupta, Shuguang Feng, Amin Ansari, Jason Blome, and Scott Mahlke
Proc. 2008 Intl. Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES)
Oct. 2008, pp. 1-10.
Optimus: Efficient Realization of Streaming Applications on FPGAs
(paper: pdf; slides: ppt)
Amir Hormati, Manjunath Kudlur, David Bacon, Scott Mahlke, and Rodric Rabbah
Proc. 2008 Intl. Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES)
Oct. 2008, pp. 41-50.
Edge-centric Modulo Scheduling for Coarse-Grained Reconfigurable Architectures
(paper: pdf; slides: ppt)
Hyunchul Park, Kevin Fan, Scott Mahlke, Taewook Oh, Heeseok Kim, and Hong-seok Kim
Proc. 17th Intl. Conference on Parallel Architectures and Compilation Techniques (PACT)
Oct. 2008, pp. 166-176.
Reliable Systems on Unreliable Fabrics
(paper: pdf)
Todd Austin, Valeria Bertacco, Scott Mahlke, and Yu Cao
IEEE Design and Test of Computers
Vol. 25, No. 4, Jul. 2008, pp. 322-332.
A Parameterized Dataflow Language Extension for Embedded Streaming Systems
(paper: pdf ; slides: ppt)
Yuan Lin, Yoonseo Choi, Scott Mahlke, Trevor Mudge, and Chaitali Chakrabarti
Proc. Intl. Symposium on Systems, Architectures, Modeling and Simulation (SAMOS)
Jul. 2008, pp. 10-17.
Olay: Combat the Signs of Aging with Introspective Reliability Management
(paper: pdf ; slides: ppt)
Shuguang Feng, Shantanu Gupta, and Scott Mahlke
The Workshop on Quality-Aware Design (W-QUAD)
Jun. 2008.
VEAL: Virtualized Execution Accelerator for Loops
(paper: pdf ; slides: ppt)
Nathan Clark, Amir Hormati, and Scott Mahlke
Proc. 35th Intl. Symposium on Computer Architecture (ISCA)
Jun. 2008, pp. 389-400.
DVFS in Loop Accelerators using BLADES
(paper: pdf ; slides: ppt)
Ganesh Dasika, Shidhartha Das, Kevin Fan, Scott Mahlke and David Bull
Proc. 45th Design Automation Conference (DAC)
Jun. 2008, pp. 894-897.
Orchestrating the Execution of Stream Programs on Multicore Platforms
(paper: pdf; slides: ppt)
Manjunath Kudlur and Scott Mahlke
Proc. ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation (PLDI)
Jun. 2008, pp. 114-124.
Integrating Post-programmability Into the High-level Synthesis Equation
(slides: ppt)
Scott Mahlke
Tutorial at 45th Design Automation Conference, High-level Synthesis: Back to the Future Workshop
Jun. 2008.
The Application of Supervisory Control to Deadlock Avoidance in Concurrent Software
(paper: pdf ; slides: ppt)
Yin Wang, Terence Kelly, Manjunath Kudlur, Scott Mahlke, and Stephane Lafortune.
9th Intl. Workshop on Discrete Event Systems (WODES)
May 2008.
Modulo Scheduling for Highly Customized Datapaths to Increase Hardware Reusability
(paper: pdf; slides: ppt)
Kevin Fan, Hyunchul Park, Manjunath Kudlur, and Scott Mahlke.
Proc. 2008 Intl. Symposium on Code Generation and Optimization (CGO)
Apr. 2008, pp. 124-133.
Analyzing the Scalability of SIMD for the Next Generation Software Defined Radio
(paper: pdf; slides: ppt)
Mark Woh, Yuan Lin, Sangwon Seo, Trevor Mudge and Scott Mahlke.
Proc. 2008 IEEE Intl. Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Mar. 2008, pp. 5388-5391.
Uncovering Hidden Loop Level Parallelism in Sequential Applications
(paper: pdf; slides: pdf)
Hongtao Zhong, Mojtaba Mehrara, Steve Lieberman, and Scott Mahlke.
Proc. 14th Intl. Symposium on High-Performance Computer Architecture (HPCA)
Feb. 2008, pp. 290-301.

2007
StageNet: A Reconfigurable CMP Fabric for Resilient Systems
(paper: pdf; slides: ppt)
Shantanu Gupta, Shuguang Feng, Jason Blome, and Scott Mahlke.
2nd Reconfigurable and Adaptive Architecture Workshop (RAAW)
Dec. 2007.
Self-calibrating Online Wearout Detection
(paper: pdf; slides: ppt)
Jason Blome, Shuguang Feng, Shantanu Gupta, and Scott Mahlke.
Proc. 40th Intl. Symposium on Microarchitecture (MICRO)
Dec. 2007, pp. 109-120.
Data Access Partitioning for Fine-grain Parallelism on Multicore Architectures
(paper: pdf; slides: ppt)
Michael Chu, Rajiv Ravindran, and Scott Mahlke.
Proc. 40th Intl. Symposium on Microarchitecture (MICRO)
Dec. 2007, pp. 369-378.
Hierarchical Coarse-grained Stream Compilation for Software Defined Radio
(paper: pdf; slides: ppt)
Yuan Lin, Manjunath Kudlur, Scott Mahlke, and Trevor Mudge
Proc. 2007 Intl. Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES)
Oct. 2007, pp. 115-124.
The Next Generation Challenge for Software Defined Radio
(paper: pdf; slides: ppt)
Mark Woh, Sangwon Seo, Hyunseok Lee, Yuan Lin, Scott Mahlke, Trevor Mudge, Chaitali Chakrabarti, and Krisztian Flautner
Proc. 7th Intl. Workshop on Systems, Architectures, Modeling, and Simulation (SAMOS)
Jul. 2007, pp. 343-354.
Code and Data Partitioning for Fine-grain Parallelism
(paper: pdf ; slides: ppt)
Michael Chu and Scott Mahlke
Proc. ACM SIGPLAN/SIGBED 2007 Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES)
Jun. 2007, pp. 161-164.
Compiler-Managed Partitioned Data Caches for Low Power
(paper: pdf ; slides: ppt)
Rajiv Ravindran, Michael Chu, and Scott Mahlke
Proc. ACM SIGPLAN/SIGBED 2007 Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES)
Jun. 2007, pp. 237-247.
Architecting a Reliable CMP Switch Architecture
(paper: pdf)
Kypros Constantinides, Stephen Plaza, Jason Blome, Valeria Bertacco, Scott Mahlke, Todd Austin, Bin Zhang, and Michael Orshansky
ACM Transactions on Architecture and Code Optimization
Vol. 4, No. 1, Mar. 2007, pp. 1-37.
Exploiting Narrow Accelerators with Data-Centric Subgraph Mapping
(paper: pdf ; slides: ppt)
Amir Hormati, Nathan Clark, and Scott Mahlke
Proc. 2007 Intl. Symposium on Code Generation and Optimization (CGO)
Mar. 2007, pp. 147-157.
Liquid SIMD: Abstracting SIMD Hardware using Lightweight Dynamic Mapping
(paper: pdf ; slides: ppt)
Nathan Clark, Amir Hormati, Sami Yehia, Scott Mahlke, and Krisztian Flautner
Proc. 2007 Intl. Symposium on High Performance Computer Architecture (HPCA)
Feb. 2007, pp. 216-227.
Extending Multicore Architectures to Exploit Hybrid Parallelism in Single-thread Applications
(paper: pdf ; slides: ppt)
Hongtao Zhong, Steven A. Lieberman, and Scott A. Mahlke
Proc. 2007 Intl. Symposium on High Performance Computer Architecture (HPCA)
Feb. 2007, pp. 25-36.
SODA: A High-Performance DSP Architecture for Software-Defined Radio
(paper: pdf)
Yuan Lin, Hyunseok Lee, Mark Woh, Yoav Harel, Scott Mahlke, Trevor Mudge, Chaitali Chakrabarti, and Krisztián Flautner
IEEE Micro (Micro's Top Picks in Computer Architecture for 2006)
Vol. 27, No. 1, Jan./Feb. 2007, pp. 114-123.

2006
Online Timing Analysis for Wearout Detection
(paper: pdf ; slides: ppt)
Jason Blome, Shuguang Feng, Shantanu Gupta, and Scott Mahlke.
2nd Workshop on Architectural Reliability (WAR)
Dec. 2006.
SPEX: A Programming Language for Software Defined Radio
(paper: pdf ; slides: ppt)
Yuan Lin, Robert Mullenix, Mark Woh, Scott Mahlke, Trevor Mudge, Alastair Reid, and Krisztian Flautner.
2006 Software Defined Radio Technical Conference and Product Exposition
Nov. 2006.
Streamroller: Compiler Orchestrated Synthesis of Accelerator Pipelines
(slides: ppt)
Manjunath Kudlur, Kevin Fan, Ganesh Dasika, and Scott Mahlke.
Workshop on Compiler Assisted SoC Assembly (CASA)
Oct. 2006.
Increasing Hardware Efficiency with Multifunction Loop Accelerators
(paper: pdf ; slides: ppt)
Kevin Fan, Manjunath Kudlur, Hyunchul Park, and Scott Mahlke.
Proc. 2006 Intl. Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)
Oct. 2006, pp. 276-281.
Streamroller: Automatic Synthesis of Prescribed Throughput Accelerator Pipelines
(paper: pdf ; slides: ppt)
Manjunath Kudlur, Kevin Fan, and Scott Mahlke.
Proc. 2006 Intl. Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)
Oct. 2006, pp. 270-275.
Modulo Graph Embedding: Mapping Applications onto Coarse-Grained Reconfigurable Architectures
(paper: pdf ; slides: ppt)
Hyunchul Park, Kevin Fan, Manjunath Kudlur, and Scott Mahlke.
Proc. 2006 Intl. Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES)
Oct. 2006, pp. 136-146.
Scalable Subgraph Mapping for Acyclic Computation Accelerators
(paper: pdf ; slides: ppt)
Nathan Clark, Amir Hormati, Scott Mahlke, and Sami Yehia.
Proc. 2006 Intl. Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES)
Oct. 2006, pp. 147-157.
Cost-Efficient Soft Error Protection for Embedded Microprocessors
(paper: pdf; slides: ppt)
Jason A. Blome, Shantanu Gupta, Shuguang Feng, Scott Mahlke, and Daryl Bradley.
Proc. 2006 Intl. Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES)
Oct. 2006, pp. 421-431.
Design and Implementation of Turbo Decoders for Software Defined Radio
(paper: pdf ; slides: ppt)
Yuan Lin, Scott Mahlke, Trevor Mudge, Chaitali Chakrabarti, Alastair Reid, and Krisztian Flautner.
Proc. IEEE 2006 Workshop on Signal Processing Systems (SiPS)
Oct. 2006.
A Scalable Low-power Architecture For Software Radio
(slides: ppt)
Scott Mahlke
6th Intl. Forum on Application-Specific Multi-Processor SoC (MPSoC)
Aug. 2006.
SODA: A Low-power Architecture For Software Radio
(paper: pdf ; slides: ppt)
Yuan Lin, Hyunseok Lee, Mark Woh, Yoav Harel, Scott Mahlke, Trevor Mudge, Chaitali Chakrabarti, and Krisztian Flautner
Proc. 33rd Intl. Symposium on Computer Architecture (ISCA)
Jun. 2006, pp. 89-100.
Compiler-directed Data Partitioning for Multicluster Processors
(paper: pdf ; slides: ppt)
Michael Chu and Scott Mahlke
Proc. 4th Intl. Symposium on Code Generation and Optimization (CGO)
Mar. 2006, pp. 208-218.
BulletProof: A Defect-Tolerant CMP Switch Architecture
(paper: pdf ; slides: ppt)
Kypros Constantinides, Stephen Plaza, Jason Blome, Bin Zhang, Valeria Bertacco, Scott Mahlke, Todd Austin, and Michael Orshansky
Proc. 12th Intl. Symposium on High-Performance Computer Architecture (HPCA)
Feb. 2006, pp. 3-14.

2005
Software Defined Radio - A High Performance Embedded Challenge
(paper: pdf ; slides: ppt)
Hyunseok Lee, Yuan Lin, Yoav Harel, Mark Woh, Scott Mahlke, Trevor Mudge, and Krisztian Flautner
Proc. 2005 Intl. Conference on High Performance Embedded Architectures and Compilers (HiPEAC)
Nov. 2005, pp. 6-26.
Cost Sensitive Modulo Scheduling in a Loop Accelerator Synthesis System
(paper: pdf ; slides: ppt)
Kevin Fan, Manjunath Kudlur, Hyunchul Park, and Scott Mahlke.
Proc. 38th Intl. Symposium on Microarchitecture (MICRO)
Nov. 2005, pp. 219-230.
A Microarchitectural Analysis of Soft Error Propagation in a Production-level Embedded Microprocessor
(paper: pdf ; slides: ppt)
Jason Blome, Scott Mahlke, Daryl Bradley, and Krisztian Flautner.
1st Workshop on Architectural Reliability (WAR)
Nov. 2005.
Assessing SEU Vulnerability via Circuit-level Timing Analysis
(paper: pdf ; slides: ppt)
Kypros Constantinides, Stephen Plaza, Jason Blome, Bin Zhang, Valeria Bertacco, Scott Mahlke, Todd Austin, and Michael Orshansky.
1st Workshop on Architectural Reliability (WAR)
Nov. 2005.
A System Solution for High-Performance, Low Power SDR
(paper: pdf ; slides: ppt)
Yuan Lin, Hyunseok Lee, Yoav Harel, Mark Woh, Scott Mahlke, Trevor Mudge, and Krisztian Flautner
2005 Software Defined Radio Technical Conference and Product Exposition
Nov. 2005.
Automated Custom Instruction Generation for Domain-Specific Processor Acceleration
(paper: pdf)
Nathan Clark, Hongtao Zhong, and Scott Mahlke.
IEEE Transactions on Computers
Vol. 54, No. 10, Oct. 2005, pp. 1258-1270.
Exploring the Design Space of LUT-based Transparent Accelerators
(paper: pdf ; slides: ppt)
Sami Yehia, Nathan Clark, Scott Mahlke, and Krisztian Flautner.
Proc. 2005 Intl. Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES)
Sep. 2005, pp. 11-21.
Compiler-directed Synthesis of Multifunction Loop Accelerators
(paper: pdf; slides: ppt)
Kevin Fan, Manjunath Kudlur, Hyunchul Park, and Scott Mahlke.
Workshop on Application Specific Processors (WASP)
Sep. 2005, pp. 91-98.
A Distributed Control Path Architecture for VLIW Processors
(paper: pdf; slides: ppt)
Hongtao Zhong, Kevin Fan, Scott Mahlke, and Michael Schlansker.
Proc. 14th Intl. Conference on Parallel Architectures and Compilation Techniques (PACT)
Sep. 2005, pp. 197-206.
Partitioning Variables across Multiple Register Windows to Reduce Spill Code in a Low-power Processor
(paper: pdf)
Rajiv Ravindran, Robert Senger, Eric Marsman, Ganesh Dasika, Matthew Guthaus, Scott Mahlke, and Richard Brown.
IEEE Transactions on Computers
Vol. 54, No. 8, Aug. 2005, pp. 998-1012.
Trimaran: An Infrastructure for Research in Instruction-Level Parallelism
(paper: pdf)
Lakshmi Chakrapani, John Gyllenhaal, Wen-mei Hwu, Scott Mahlke, Krishna Palem, and Rodric Rabbah.
Lecture Notes in Computer Science
Springer-Verlag, Vol. 3602, Aug. 2005, pp. 32-41.
An Architecture Framework for Transparent Instruction Set Customization in Embedded Processors
(paper: pdf ; slides: ppt)
Nathan Clark, Jason Blome, Michael Chu, Scott Mahlke, Stuart Biles, and Krisztian Flautner.
Proc. 32nd Intl. Symposium on Computer Architecture (ISCA)
Jun. 2005, pp. 272-283.
A 16-bit, Low-Power Microcontroller with Monolithic MEMS-LC Clocking
(paper: pdf ; slides: ppt)
Eric Marsman, Robert Senger, Michael McCorquodale, Matthew Guthaus, Rajiv Ravindran, Ganesh Dasika, Scott Mahlke, and Richard Brown.
Proc. Intl. Symposium on Circuits and Systems (ISCAS)
May 2005, pp. 624-627.
Compiler Managed Dynamic Instruction Placement in a Low-Power Code Cache
(paper: pdf ; slides: ppt)
Rajiv Ravindran, Pracheeti Nagarkar, Ganesh Dasika, Eric Marsman, Robert Senger, Scott Mahlke, and Richard Brown.
Proc. 3rd Intl. Symposium on Code Generation and Optimization (CGO)
Mar. 2005, pp. 179-190.

2004
Application Specific Processing on a General Purpose Core via Transparent Instruction Set Customization
(paper: pdf ; slides: ppt)
Nathan Clark, Manjunath Kudlur, Hyunchul Park, Scott Mahlke, and Krisztian Flautner.
Proc. 37th Intl. Symposium on Microarchitecture (MICRO)
Dec. 2004, pp. 30-40.
Automatic Synthesis of Customized Local Memories for Multicluster Application Accelerators
(paper: pdf ; slides: ppt)
Manjunath Kudlur, Kevin Fan, Michael Chu, and Scott Mahlke.
Proc. IEEE 15th Intl. Conference on Application-Specific Systems, Architectures and Processors (ASAP)
Sep. 2004, pp. 304-314.
Compiler-directed Synthesis of Programmable Loop Accelerators
(slides: ppt)
Kevin Fan, Hyunchul Park, and Scott Mahlke.
2004 Workshop on Emerging Directions in Electronic Design Automation: Accelerating Time-to-market through Compiler-driven Optimization of Embedded Platforms
Sep. 2004.
Memory System Design Space Exploration for Low-Power, Real-time Speech Recognition
(paper: pdf ; slides: ppt)
Rajeev Krishna, Scott Mahlke, and Todd Austin.
Proc. 2004 Intl. Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)
Sep. 2004, pp. 140-145.
A Programmable Vector Coprocessor Architecture for Wireless Applications
(paper: pdf ; slides: ppt)
Yuan Lin, Nadav Baron, Hyunseok Lee, Scott Mahlke, and Trevor Mudge.
Proc. 3rd Workshop on Application Specific Processors (WASP)
Sep. 2004.
OptimoDE: Programmable Accelerator Engines Through Retargetable Customization
(slides: ppt)
Nathan Clark, Hongtao Zhong, Kevin Fan, Scott Mahlke, Krisztian Flautner, and Koen Van Nieuwenhove.
Proc. Hot Chips 16
Aug. 2004.
Cost-Sensitive Partitioning in an Architecture Synthesis System for Multicluster Processors
(paper: pdf)
Michael L. Chu, Kevin C. Fan, Rajiv A. Ravindran, and Scott A. Mahlke.
IEEE Micro
Vol. 24, No. 3, May/Jun. 2004, pp. 10-20.
Mobile Supercomputers
(paper: pdf)
Todd Austin, David Blaauw, Scott Mahlke, Trevor Mudge, Chaitali Chakrabarti, and Wayne Wolf.
IEEE Computer
Vol. 37, No. 5, May 2004, pp. 82-84.
FLASH: Foresighted Latency-Aware Scheduling Heuristic for Processors with Customized Datapaths
(paper: pdf ; slides: ppt)
Manjunath Kudlur, Kevin Fan, Michael Chu, Rajiv Ravindran, Nathan Clark, and Scott Mahlke.
Proc. 2nd Intl. Symposium on Code Generation and Optimization (CGO)
Mar. 2004, pp. 201-212.
Probabilistic Predicate-Aware Modulo Scheduling
(paper: pdf ; slides: ppt)
Mikhail Smelyanskiy, Scott Mahlke, and Edward Davidson
Proc. 2nd Intl. Symposium on Code Generation and Optimization (CGO)
Mar. 2004, pp. 151-162.

2003
Automatic Design of Application Specific Instruction Set Extensions Through Dataflow Graph Exploration
Nathan Clark, Hongtao Zhong, Wilkin Tang and Scott Mahlke
Intl. Journal of Parallel Programming
Vol. 31, No. 6, Dec. 2003, pp. 429-449.
Cost-Sensitive Operation Partitioning for Synthesizing Custom Multicluster Datapath Architectures
(paper: pdf ; slides: ppt)
Michael L. Chu, Kevin C. Fan, Rajiv A. Ravindran and Scott A. Mahlke
Proc. 2nd Workshop on Application Specific Processors (WASP)
Dec. 2003, pp. 40-47.
Processor Acceleration Through Automated Instruction Set Customization
(paper: pdf ; slides: ppt)
Nathan Clark, Hongtao Zhong, and Scott Mahlke
Proc. 36th Intl. Symposium on Microarchitecture (MICRO)
Dec. 2003, pp. 129-140.
Increasing the Number of Effective Registers in a Low-Power Processor Using a Windowed Register File
(paper: pdf ; slides: ppt)
Rajiv A. Ravindran, Robert M. Senger, Eric D. Marsman, Ganesh S. Dasika, Matthew R. Guthaus, Scott A. Mahlke, and Richard B. Brown
Proc. 2003 Intl. Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES)
Oct. 2003, pp. 125-136.
Architectural Optimizations for Low-Power, Real-Time Speech Recognition
(paper: pdf ; slides: ppt)
Rajeev Krishna, Scott Mahlke, and Todd Austin
Proc. 2003 Intl. Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES)
Oct. 2003, pp. 220-231.
Systematic Register Bypass Customization for Application-Specific Processors
(paper: pdf ; slides: ppt)
Kevin Fan, Nathan Clark, Michael Chu, K.V. Manjunath, Rajiv Ravindran, Mikhail Smelyanskiy, and Scott Mahlke.
Proc. IEEE 14th Intl. Conference on Application-Specific Systems, Architectures and Processors (ASAP)
Jun. 2003, pp. 64-74.
Region-based Hierarchical Operation Partitioning for Multicluster Processors
(paper: pdf ; slides: ppt)
Michael Chu, Kevin Fan, and Scott Mahlke.
Proc. ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation (PLDI)
Jun. 2003, pp. 300-311.
Predicate-Aware Scheduling: A Technique for Reducing Resource Constraints
(paper: pdf ; slides: ppt)
Mikhail Smelyanskiy, Scott A. Mahlke, Edward S. Davidson, and Hsien-Hsin S. Lee.
Proc. 1st Intl. Symposium on Code Generation and Optimization (CGO)
Mar. 2003, pp. 169-178.

2002
Automatically Generating Custom Instruction Set Extensions
(paper: pdf ; slides: ppt)
Nathan Clark, Wilkin Tang, and Scott Mahlke.
Proc. 1st Workshop on Application Specific Processors (WASP)
Nov. 2002, pp. 94-101.
Insights into the Memory Demands of Speech Recognition Algorithms
(paper: pdf)
Rajeev Krishna, Scott Mahlke, and Todd Austin.
Proc. ACM/IEEE 2nd Workshop on Memory Performance Issues (WMPI)
May 2002.

Theses
Efficient Deep Neural Network Computation on Processors
(paper: pdf)
Jiecao Yu, 2019.
Data Resource Management in Throughput Processors
(paper: pdf)
John Kloosterman, 2018.
Composite Cores: Improving Energy Efficiency Through Fine-Grained Heterogeneity
(paper: pdf)
Andrew Lukefahr, 2016.
Exploiting Fine-Grain Heterogeneity to Build Energy-Efficient Processors
(paper: pdf)
Shruti Padmanabha, 2016.
Enabling Efficient Resource Utilization on Multitasking Throughput Processors
(paper: pdf)
Jason Jong Kyu Park, 2016.
Virtualizing Data Parallel Systems for Portability, Productivity, and Performance
(paper: pdf)
Janghaeng Lee, 2015.
Dependable Computing On Inexact Hardware Through Anomaly Detection
(paper: pdf)
Daya S Khudia, 2015.
Dynamic Hardware Resource Management for Efficient Throughput Processing
(paper: pdf)
Ankit Sethia, 2015.
Intelligent Management of Inter-Thread Synchronization Dependencies for Concurrent Programs
(paper: pdf)
Hyoun Kyu Cho, 2014.
Dynamic Orchestration of Massively Data Parallel Execution
(paper: pdf)
Mehrzad Samadi, 2014.
Libra: Achieving Efficient Instruction- and Data-Parallel Execution for Mobile Applications
(paper: pdf)
Yongjun Park, 2013.
Overcoming Hard-Faults in High-Performance Microprocessors
(paper: pdf)
Amin Ansari, 2011.
Power-Efficient Accelerators for High-Performance Applications
(paper: pdf)
Ganesh Dasika, 2011.
Delivering Affordable Fault-tolerance to Commodity Computer Systems
(paper: pdf)
Shuguang Feng, 2011.
Adaptive Architectures for Robust and Efficient Computing
(paper: pdf)
Shantanu Gupta, 2011.
Compiling Stream Applications for Heterogeneous Architectures
(paper: pdf)
Amir Hormati, 2011.
Compiler and Runtime Techniques For Automatic Parallelization of Sequential Applications
(paper: pdf)
Mojtaba Mehrara, 2011.
Polymorphic Pipeline Array: A Flexible Multicore Accelerator for Mobile Multimedia Applications
(paper: pdf)
Hyunchul Park, 2009.
Realizing Software Defined Radio - A Study in Designing Mobile Supercomputers
(paper: pdf)
Yuan Lin, 2008.
Automatic Design of Efficient Application-centric Architectures
(paper: pdf)
Kevin Fan, 2008.
Streamroller: A Unified Compilation and Synthesis Framework for Streaming Applications
(paper: pdf)
Manjunath Kudlur, 2008.
Architectural and Compiler Mechanisms for Accelerating Single Thread Applications on Multicore Processors
(paper: pdf)
Hongtao Zhong, 2008.
Cooperative Data and Computation Partitioning for Decentralized Architectures
(paper: pdf)
Michael Chu, 2007.
Customizing the Computation Capabilities of Microprocessors
(paper: pdf)
Nathan Clark, 2007.
Hardware/Software Techniques for Memory Power Optimizations in Embedded Processors
(paper: pdf)
Rajiv Ravindran, 2007.
Hardware/Software Mechanisms for Increasing Resource Utilization on VLIW/EPIC Processors
(paper: pdf)
Mikhail Smelyanskiy, 2004.

Disclaimer: The documents contained on this page have been provided by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a noncommercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

Page last modified August 13, 2020.