About Me
I am a Principal Engineer in the Machine Learning research group at Arm Research in Austin, TX. I have been working on model and runtime optimization of Generative AI networks, such as large language models (LLMs) and vision transformer networks, for on-device, hardware-aware deployment. I have worked on developing resource-efficient computer vision (CV) and natural language processing (NLP/NLU) models, neural network compression, and neural architecture search techniques for already constrained CV and NLP networks executing on highly constrained platforms, such as microcontrollers and personal mobile devices. In that context, I have developed novel compact model architectures as well as sub-byte quantization, low-rank matrix factorization, dynamic execution, and pruning techniques. In addition, I contribute to the design of a next-generation neural hardware accelerator and work on developing new instruction definitions and kernel optimizations for high-throughput matrix multiplication. My research work has resulted in multiple high-impact publications in top-tier machine learning conferences and workshops (ICCV, MLSys, CVPR, NeurIPS).
Before joining Arm in July 2017, I received a Ph.D. in Electrical Engineering, specializing in Computer Architecture, Machine Learning, and AI-Assisted Systems Design, from the University of Wisconsin-Madison in 2017.
During the course of my Ph.D., I conducted original research in designing highly accurate machine learning-guided neural branch prediction for CPU microarchitecture, improving the execution efficiency of the PHP scripting language through hardware accelerators and compiler optimizations, and developing an efficient memory consistency model for modern processor architectures. My research work has resulted in multiple high-impact publications in top-tier computer architecture conferences (ISCA, MICRO, HPCA, PACT).
I received my Master's degree (M.S.) in Computer Engineering from Texas A&M University in 2011. During the course of my Master's degree, I conducted original research in design for testing and dynamic CMOS circuits: I designed and implemented a path delay test generator that maximizes crosstalk-induced slowdown in modern digital circuits, and performed a detailed evaluation of a circuit and layout fabric based on dynamic circuits. This research work resulted in publications in top-tier circuit/VLSI design conferences (ICCD, MWSCAS).
I received my Bachelor's degree (B.E.) in Electrical and Electronics Engineering from Birla Institute of Technology & Science (Pilani), India, in 2008.
Research Interests
Machine learning for constrained systems, theory and design of deep neural networks for CV and NLP/NLU applications, neural network compression techniques, neural architecture search, computer vision, natural language understanding, neural network kernel optimizations, AI-optimized processor and system architecture, computer architecture
Education
-
Ph.D., Electrical Engineering, University of Wisconsin-Madison, 2017
Minor: Computer Sciences
-
M.S., Computer Engineering, Texas A&M University, College Station, 2011
-
B.E. (Hons.), Electrical and Electronics Engineering, Birla Institute of Technology & Science, Pilani, India, 2008
Industrial Experience
-
Principal Engineer, Machine Learning & AI, Arm, Apr. 2024 - Present
Staff Research Engineer, Machine Learning & AI, Arm Research, Apr. 2021 - Mar. 2024
Senior Research Engineer, Machine Learning & AI, Arm Research, Jul. 2017 - Mar. 2021
-
Co-Op Engineer, AMD Research, Jun. 2015 - Dec. 2015
-
Co-Op Engineer, AMD, May 2010 - Aug. 2010
-
Design Engineer, Freescale Semiconductor, Jul. 2008 - Jul. 2009
-
Project Intern, Texas Instruments, Jan. 2008 - Jun. 2008
Publications / Patents
-
Jumping through Local Minima: Quantization in the Loss Landscape of Vision Transformers [Paper]
Natalia Frumkin, Dibakar Gope, Diana Marculescu
International Conference on Computer Vision (ICCV), Oct. 2023.
-
PerfSAGE: Generalized Inference Performance Predictor for Arbitrary Deep Learning Models on Edge Devices [Paper]
Yuji Chai, Devashree Tripathy, Chuteng Zhou, Dibakar Gope, Igor Fedorov, Ramon Matas, David Brooks, Gu-Yeon Wei, Paul Whatmough
ArXiv, 2023.
-
A Neural Processing Unit for Attention-Based Inference
Shounak Datta, Dibakar Gope, Jesse Beu, and Mark O’Connor
US Patent Application, 2022.
-
Restructurable Activation Networks [Paper]
Kartikeya Bhardwaj, James Ward, Caleb Tung*, Dibakar Gope*, Lingchuan Meng, Igor Fedorov, Alex Chalfin, Paul Whatmough, Danny Loh (* Equal Contribution)
ArXiv, 2022.
-
Collapsible Linear Blocks for Super-Efficient Super Resolution [Paper]
Kartikeya Bhardwaj, Milos Milosavljevic, Liam O'Neil, Dibakar Gope, Ramon Matas, Alex Chalfin, Naveen Suda, Lingchuan Meng, Danny Loh
Fifth Conference on Machine Learning and Systems (MLSys), Aug. 2022.
-
Super-Efficient Super Resolution for Fast Adversarial Defense at the Edge [Paper]
Kartikeya Bhardwaj, Dibakar Gope, James Ward, Paul N. Whatmough, Danny Loh
Special Initiative on Autonomous Systems Design (ASD) in conjunction with Design, Automation & Test in Europe (DATE), Mar. 2022.
-
System and Method for Accelerating Neural Networks
Dibakar Gope, Jesse Beu, and Milos Milosavljevic
US Patent Application, 2021.
-
MicroNets: Neural Network Architectures for Deploying TinyML Applications on Commodity Microcontrollers [Paper]
Colby Banbury, Chuteng Zhou, Igor Fedorov, Ramon Matas Navarro, Urmish Thakker, Dibakar Gope, Vijay Janapa Reddi, Matthew Mattina, Paul N. Whatmough
Fourth Conference on Machine Learning and Systems (MLSys), Apr. 2021.
-
System, Devices and/or Processes for Adapting Neural Network Processing Devices
Urmish Thakker, Jesse Beu, Dibakar Gope, and Mark O’Connor
US Patent Application, 2021.
-
Rank and Run-time aware compression of NLP Applications [Paper]
Urmish Thakker, Jesse Beu, Dibakar Gope, Ganesh Dasika, Matthew Mattina
First Workshop on Simple and Efficient Natural Language Processing in conjunction with Empirical Methods in Natural Language Processing (EMNLP), Nov. 2020.
-
Compressing RNNs to Kilobyte budget for IoT devices using Kronecker Products [Paper]
Urmish Thakker, Jesse Beu, Dibakar Gope, Chu Zhou, Igor Fedorov, Ganesh Dasika, and Matthew Mattina
ACM Journal on Emerging Technologies in Computing Systems, 2021.
-
High Throughput Matrix-Matrix Multiplication between Asymmetric Bit-Width Operands [arXiv]
Dibakar Gope, Jesse Beu, and Matthew Mattina
ArXiv, 2020.
-
Understanding the Impact of Dynamic Channel Pruning on Conditionally Parameterized Convolutions [Paper]
Ravi Raju*, Dibakar Gope*, Urmish Thakker, and Jesse Beu (* Equal Contribution)
2nd International Workshop on Challenges in Artificial Intelligence and Machine Learning for Internet of Things (AIChallengeIoT) in conjunction with ACM (SenSys), Nov. 2020.
-
Pushing the Envelope of Dynamic Spatial Gating technologies [Paper]
Xueqin Huang, Urmish Thakker, Dibakar Gope, and Jesse Beu
2nd International Workshop on Challenges in Artificial Intelligence and Machine Learning for Internet of Things (AIChallengeIoT) in conjunction with ACM (SenSys), Nov. 2020.
-
Ternary MobileNets via Per-Layer Hybrid Filter Banks [Paper] [Supplemental] [arXiv]
Dibakar Gope, Jesse Beu, Urmish Thakker, and Matthew Mattina
Joint Workshop on Efficient Deep Learning in Computer Vision, in conjunction with CVPR 2020, Jun. 2020.
-
Aggressive Compression of MobileNets Using Hybrid Ternary Layers [Paper] [Poster]
Dibakar Gope, Jesse Beu, Urmish Thakker, and Matthew Mattina
tinyML Summit 2020, Feb. 2020.
-
Mixed-Element-Size Instruction
Jesse Beu, Dibakar Gope, and David Mansell
US Patent Application, 2020.
-
Mixed-Precision Computation Unit
Dibakar Gope, Jesse Beu, Paul Whatmough, and Matthew Mattina
US Patent Application, 2020.
-
Hybrid Filter Banks for Artificial Neural Networks
Dibakar Gope, Jesse Beu, Paul Whatmough, and Matthew Mattina
US Patent Application, 2020.
-
Ternary Hybrid Neural-Tree Networks for Highly Constrained IoT Applications (Wake Word Detection) [Paper] [Poster]
Dibakar Gope, Ganesh Dasika, and Matthew Mattina
Second Conference on Machine Learning and Systems (MLSys), Mar. 2019.
-
Pushing the Limits of RNN Compression [Paper]
Urmish Thakker, Igor Fedorov, Jesse Beu, Dibakar Gope, Chu Zhou, Ganesh Dasika, and Matthew Mattina
5th Workshop on Energy Efficient Machine Learning and Cognitive Computing, Co-located with the 33rd Conference on Neural Information Processing Systems (NeurIPS), Dec. 2019.
-
Run-Time Efficient RNN Compression for Inference on Edge Devices [Paper]
Urmish Thakker, Jesse Beu, Dibakar Gope, Ganesh Dasika, and Matthew Mattina
4th Workshop on Energy Efficient Machine Learning and Cognitive Computing for Embedded Applications, Co-located with the 46th Int. Symp on Computer Architecture (ISCA), Jun. 2019.
-
RNN Compression using Hybrid Matrix Decomposition
Urmish Thakker, Ganesh Dasika, Jesse Beu, Dibakar Gope, and Matthew Mattina
tinyML Summit, Mar. 2019.
-
Scoped Persistence Barriers for Non-Volatile Memories
Arkaprava Basu, Mitesh Meswani, Dibakar Gope, and Sooraj Puthoor
US Patent, 2019.
-
A Case for Scoped Persist Barriers in GPUs [Paper]
Dibakar Gope, Arkaprava Basu, Sooraj Puthoor, and Mitesh Meswani
11th Workshop on General Purpose Processing using GPU (GPGPU), In conjunction with Symp. on Principles and Practice of Parallel Programming (PPoPP), Feb. 2018.
-
Apparatus and Method for Bias-Free Branch Prediction
Mikko Lipasti, and Dibakar Gope
US Patent, 2018.
-
The CURE: Cluster Communication Using Registers [Paper]
Vignyan Reddy Kothinti Naresh, Dibakar Gope, and Mikko H. Lipasti
Proceedings of the Int. Conf. on Compilers, Architecture, and Synthesis for Embedded Systems (CASES), Oct. 2017.
-
Architectural Support for Server-Side PHP Processing [Paper]
Dibakar Gope, David J. Schlais, and Mikko H. Lipasti
Proceedings of the 44th Int. Symp. on Computer Architecture (ISCA), Jun. 2017.
-
Hash Map Inlining [Paper]
Dibakar Gope, and Mikko H. Lipasti
Proceedings of the 25th Int. Conf. on Parallel Architectures and Compilation Techniques (PACT), Sep. 2016.
-
Statement-Level Parallelism for Scripting Languages [Paper]
Dibakar Gope, and Mikko H. Lipasti
1st Workshop on the High Performance Scripting Languages, In conjunction with Symp. on Principles and Practice of Parallel Programming (PPoPP), Feb. 2015.
-
Bias-Free Branch Predictor [Paper]
Dibakar Gope, and Mikko H. Lipasti
Proceedings of the 47th IEEE/ACM Int. Symp. on Microarchitecture (MICRO), Dec. 2014.
-
Bias-Free Neural Predictor [Paper] [Code]
Dibakar Gope, and Mikko H. Lipasti
Proceedings of the 4th JILP Workshop on Computer Architecture Competitions (JWAC-4): Championship Branch Prediction (CBP), Jun. 2014.
-
Atomic SC for Simple In-order Processors [Paper]
Dibakar Gope, and Mikko H. Lipasti
Proceedings of the 20th IEEE Int. Symp. on High Performance Computer Architecture (HPCA), Feb. 2014.
*Nominated for best paper award
-
Maximizing Crosstalk-Induced Slowdown during Path Delay Test [Paper]
Dibakar Gope, and Duncan M. (Hank) Walker
Proceedings of the 30th IEEE Int. Conf. on Computer Design (ICCD), Sep. 2012.
-
Exploring a Circuit Design Approach Based on One-Hot Multi-Valued Domino Logic [Paper]
Dibakar Gope, Kent Lin, and Sunil P. Khatri
Proceedings of the 53rd IEEE Int. Midwest Symp. on Circuits & Systems (MWSCAS), Aug. 2010.
-
Detection of High Resistance Bridge Defects using Slack Based Dynamic Bridging Fault Model [Paper]
Dibakar Gope, Srinivasulu Alampally, Srinivas Kumar Vooka, and Rubin A. Parekhji
Proceedings of the Synopsys Users Group India (SNUG), 2008.
arXiv Preprints/Technical Reports
-
The gem5 Simulator: Version 20.0+ [arXiv]
Jason Lowe-Power, Abdul Mutaal Ahmad, Ayaz Akram, Mohammad Alian, Rico Amslinger, Matteo Andreozzi, Adrià Armejach, Nils Asmussen, Srikant Bharadwaj, Gabe Black, Gedare Bloom, Bobby R. Bruce, Daniel Rodrigues Carvalho, Jeronimo Castrillon, Lizhong Chen, Nicolas Derumigny, Stephan Diestelhorst, Wendy Elsasser, Marjan Fariborz, Amin Farmahini-Farahani, Pouya Fotouhi, Ryan Gambord, Jayneel Gandhi, Dibakar Gope, Thomas Grass, Bagus Hanindhito, Andreas Hansson, Swapnil Haria, Austin Harris, Timothy Hayes, Adrian Herrera, Matthew Horsnell, Syed Ali Raza Jafri, Radhika Jagtap, Hanhwi Jang, Reiley Jeyapaul, Timothy M. Jones, Matthias Jung, Subash Kannoth, Hamidreza Khaleghzadeh, Yuetsu Kodama, Tushar Krishna, Tommaso Marinelli, Christian Menard, Andrea Mondelli, Tiago Mück, Omar Naji, Krishnendra Nathella, Hoa Nguyen, Nikos Nikoleris, Lena E. Olson, Marc Orr, Binh Pham, Pablo Prieto, Trivikram Reddy, Alec Roelke, Mahyar Samani, Andreas Sandberg, Javier Setoain, Boris Shingarov, Matthew D. Sinclair, Tuan Ta, Rahul Thakur, Giacomo Travaglini, Michael Upton, Nilay Vaish, Ilias Vougioukas, Zhengrong Wang, Norbert Wehn, Christian Weis, David A. Wood, Hongil Yoon, Éder F. Zulian.
Theses
-
Architectural Support for Scripting Languages [PDF]
Ph.D. Dissertation: School of Electrical and Computer Engineering, University of Wisconsin-Madison, Jun. 2017
Advisor: Professor Mikko H. Lipasti
-
Maximizing Crosstalk-Induced Slowdown During Path Delay Test [PDF]
Master's Thesis: School of Electrical and Computer Engineering, Texas A&M University, College Station, Jun. 2011
Advisor: Professor Duncan M. (Hank) Walker
Courses (Machine Learning)
Introduction to Deep Learning; Bayesian Methods for Machine Learning; Practical Reinforcement Learning; Natural Language Processing; Deep Learning in Computer Vision; Neural Networks and Deep Learning; Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization; Convolutional Neural Networks; Sequence Models; Linear Algebra; Multivariate Calculus; Principal Component Analysis
Professional Services
-
Program Committee Member
-
Conference on Machine Learning and Systems (MLSys) 2022
-
IEEE International Symposium on High-Performance Computer Architecture (HPCA) 2023
-
IEEE International Parallel & Distributed Processing Symposium (IPDPS) 2022
-
IEEE International Conference on Computer Design (ICCD) 2021
-
IEEE International Conference on Parallel Architectures and Compilation Techniques (PACT) 2021
-
International Conference on Compilers, Architecture, and Synthesis of Embedded Systems (CASES) 2021
-
ACM International Conference on Computing Frontiers (CF) 2021
-
IEEE International Conference on Computer Design (ICCD) 2020
-
ACM International Conference on Computing Frontiers (CF) 2020
-
IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC) 2020
-
IEEE International Conference on Computer Design (ICCD) 2019
-
ACM International Conference on Computing Frontiers (CF) 2019
-
International Conference on Compilers, Architecture, and Synthesis of Embedded Systems (CASES) 2019
-
IEEE International Symposium on Workload Characterization (IISWC) 2019
-
Arm Research Summit 2019
-
External Review Committee Member
-
ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 2023
-
ACM/IEEE International Symposium on Computer Architecture (ISCA) 2022
-
ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 2022
-
IEEE International Symposium on High-Performance Computer Architecture (HPCA) 2022
-
IEEE/ACM International Symposium on Microarchitecture (MICRO) 2021
-
ACM/IEEE International Symposium on Computer Architecture (ISCA) 2021
-
IEEE/ACM International Symposium on Microarchitecture (MICRO) 2020
-
IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) 2020
-
ACM/IEEE International Symposium on Computer Architecture (ISCA) 2020
-
ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 2020
-
ACM/IEEE International Symposium on Computer Architecture (ISCA) 2019
-
IEEE International Conference on Parallel Architectures and Compilation Techniques (PACT) 2019
-
Journal Reviewer
-
ACM Transactions on Architecture and Code Optimization (TACO) 2020
-
IEEE Computer Architecture Letters (CAL) 2019
-
IEEE Transactions on Computers (TC) 2019
Contact
LinkedIn: www.linkedin.com/in/dibakar-gope-89060119