About Me
I am a Principal Engineer and Technical Lead in the Machine Learning group at Arm Inc. I work on model optimization, algorithm research, and runtime optimization for generative AI networks such as large language models (LLMs), text-to-image diffusion models, and vision transformers, targeting on-device, hardware-aware deployment. I have developed resource-efficient computer vision (CV) and natural language processing (NLP/NLU) models, neural network compression methods, and neural architecture search techniques for networks running on highly constrained platforms such as microcontrollers and personal mobile devices. In that context, I have developed novel compact model architectures, sub-byte quantization, low-rank matrix factorization, dynamic execution, and pruning techniques. In addition, I contribute to the design of a next-generation neural hardware accelerator and work on new instruction definitions and kernel optimizations for high-throughput matrix multiplication. My research has resulted in multiple high-impact publications in top-tier machine learning conferences and workshops (CVPR, ICCV, MLSys, NeurIPS). My recent work on generative AI algorithms, runtime optimizations, and the associated software demonstrates the full potential of Arm CPUs and other Arm IP for LLMs and text-to-image generation models.
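To illustrate one of the compression techniques mentioned above, the sketch below shows how low-rank matrix factorization can shrink a fully connected layer: a truncated SVD replaces one large weight matrix with the product of two much smaller ones. This is a minimal, self-contained NumPy example; the layer shapes and target rank are arbitrary assumptions chosen for illustration, not values from any particular model.

```python
import numpy as np

# Hypothetical fully connected layer weight: 1024 inputs -> 512 outputs.
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 1024)).astype(np.float32)

# Truncated SVD: keep only the top-r singular values/vectors.
r = 64  # target rank (an assumption; chosen per accuracy/size trade-off)
U, S, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :r] * S[:r]          # (512, r)
B = Vt[:r, :]                 # (r, 1024)

# The layer y = W @ x is approximated by two smaller multiplies: y ~= A @ (B @ x).
x = rng.standard_normal((1024,)).astype(np.float32)
y_full = W @ x
y_lowrank = A @ (B @ x)

params_full = W.size              # 524,288 weights
params_lowrank = A.size + B.size  # 98,304 weights at rank 64
rel_err = np.linalg.norm(y_full - y_lowrank) / np.linalg.norm(y_full)
print(f"params: {params_full} -> {params_lowrank}, relative output error: {rel_err:.3f}")
```

The rank r controls the trade-off between model size and approximation quality; in practice it would be tuned per layer against an accuracy target rather than fixed as above.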
Before joining Arm in July 2017, I received a Ph.D. in Electrical Engineering from the University of Wisconsin-Madison in 2017, specializing in Machine Learning, Computer Architecture, and AI-Assisted Systems Design.
During my Ph.D., I conducted original research on designing highly accurate machine learning-guided neural branch predictors for CPU microarchitectures, improving the execution efficiency of the PHP scripting language through hardware accelerators and compiler optimizations, and developing efficient memory consistency models for modern processor architectures. This work resulted in multiple high-impact publications in top-tier conferences.
I received my Master's degree (M.S.) in Computer Engineering from Texas A&M University, where I conducted original research on design-for-test algorithms and dynamic CMOS circuits. This work resulted in publications in top-tier conferences.
I received my Bachelor's Degree (B.E.) in Electrical and Electronics Engineering from the Birla Institute of Technology & Science (Pilani), India.
Research Interests
Generative AI, machine learning for constrained systems, theory and design of deep neural networks for CV and NLP/NLU applications, neural network compression techniques, neural architecture search, computer vision, natural language understanding, neural network kernel optimizations, AI-optimized processor and system architecture, computer architecture
Education
-
Ph.D., Electrical Engineering, University of Wisconsin-Madison, 2017
Minor: Computer Sciences
-
M.S., Computer Engineering, Texas A&M University, College Station, 2011
-
B.E. (Hons.), Electrical and Electronics Engineering, Birla Institute of Technology & Science, Pilani, India, 2008
Industrial Experience
-
Principal Engineer, Machine Learning & AI, Arm, Apr. 2024 - Present
Staff Research Engineer, Machine Learning & AI, Arm Research, Apr. 2021 - Mar. 2024
Senior Research Engineer, Machine Learning & AI, Arm Research, Jul. 2017 - Mar. 2021
-
Co-Op Engineer, AMD Research, Jun. 2015 - Dec. 2015
-
Co-Op Engineer, AMD, May 2010 - Aug. 2010
-
Design Engineer, Freescale Semiconductor, Jul. 2008 - Jul. 2009
-
Project Intern, Texas Instruments, Jan. 2008 - Jun. 2008
Publications / Patents
-
Data-Free Group-Wise Fully Quantized Winograd Convolution via Learnable Scales (for Text-to-Image Diffusion Models) [Paper]
Shuokai Pan, Gerti Tuzi, Sudarshan Sreeram, and Dibakar Gope
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2025.
-
Highly Optimized Kernels and Fine-Grained Codebooks for LLM Inference on Arm CPUs [Paper]
Dibakar Gope, David Mansell, Danny Loh, and Ian Bratt
arXiv, 2024.
-
Methods and Processing Elements for Compressing and Decompressing Neural Network Weights
Dibakar Gope, David Mansell, Danny Loh, and Ian Bratt
US Patent Application, 2024.
-
Quantized Winograd Convolution
Shuokai Pan, Gerti Tuzi, and Dibakar Gope
US Patent Application, 2024.
-
Jumping through Local Minima: Quantization in the Loss Landscape of Vision Transformers [Paper]
Natalia Frumkin, Dibakar Gope, Diana Marculescu
International Conference on Computer Vision (ICCV), Oct. 2023.
-
System, Devices and/or Processes for Executing A Neural Network Architecture Search
Gerti Tuzi, and Dibakar Gope
US Patent Application, 2023.
-
PerfSAGE: Generalized Inference Performance Predictor for Arbitrary Deep Learning Models on Edge Devices [Paper]
Yuji Chai, Devashree Tripathy, Chuteng Zhou, Dibakar Gope, Igor Fedorov, Ramon Matas, David Brooks, Gu-Yeon Wei, Paul Whatmough
arXiv, 2023.
-
A Neural Processing Unit for Attention-Based Inference
Shounak Datta, Dibakar Gope, Jesse Beu, and Mark O’Connor
US Patent Application, 2022.
-
Restructurable Activation Networks [Paper]
Kartikeya Bhardwaj, James Ward, Caleb Tung*, Dibakar Gope*, Lingchuan Meng, Igor Fedorov, Alex Chalfin, Paul Whatmough, Danny Loh (* Equal Contribution)
arXiv, 2022.
-
Collapsible Linear Blocks for Super-Efficient Super Resolution [Paper]
Kartikeya Bhardwaj, Milos Milosavljevic, Liam O'Neil, Dibakar Gope, Ramon Matas, Alex Chalfin, Naveen Suda, Lingchuan Meng, Danny Loh
Fifth Conference on Machine Learning and Systems (MLSys), Aug. 2022.
-
Super-Efficient Super Resolution for Fast Adversarial Defense at the Edge [Paper]
Kartikeya Bhardwaj, Dibakar Gope, James Ward, Paul N. Whatmough, Danny Loh
Special Initiative on Autonomous Systems Design (ASD) in conjunction with Design, Automation & Test in Europe (DATE), Mar. 2022.
-
System and Method for Accelerating Neural Networks
Dibakar Gope, Jesse Beu, and Milos Milosavljevic
US Patent Application, 2021.
-
MicroNets: Neural Network Architectures for Deploying TinyML Applications on Commodity Microcontrollers [Paper]
Colby Banbury, Chuteng Zhou, Igor Fedorov, Ramon Matas Navarro, Urmish Thakker, Dibakar Gope, Vijay Janapa Reddi, Matthew Mattina, Paul N. Whatmough
Fourth Conference on Machine Learning and Systems (MLSys), Apr. 2021.
-
System, Devices and/or Processes for Adapting Neural Network Processing Devices
Urmish Thakker, Jesse Beu, Dibakar Gope, and Mark O’Connor
US Patent Application, 2021.
-
Compressing RNNs to Kilobyte budget for IoT devices using Kronecker Products [Paper]
Urmish Thakker, Jesse Beu, Dibakar Gope, Chu Zhou, Igor Fedorov, Ganesh Dasika, and Matthew Mattina
ACM Journal on Emerging Technologies in Computing Systems, 2021.
-
Rank and Run-time aware compression of NLP Applications [Paper]
Urmish Thakker, Jesse Beu, Dibakar Gope, Ganesh Dasika, Matthew Mattina
First Workshop on Simple and Efficient Natural Language Processing in conjunction with Empirical Methods in Natural Language Processing (EMNLP), Nov. 2020.
-
Ternary MobileNets via Per-Layer Hybrid Filter Banks [Paper] [Supplemental] [arXiv]
Dibakar Gope, Jesse Beu, Urmish Thakker, and Matthew Mattina
Joint Workshop on Efficient Deep Learning in Computer Vision, in conjunction with CVPR, Jun. 2020.
-
Understanding the Impact of Dynamic Channel Pruning on Conditionally Parameterized Convolutions [Paper]
Ravi Raju*, Dibakar Gope*, Urmish Thakker, and Jesse Beu (* Equal Contribution)
2nd International Workshop on Challenges in Artificial Intelligence and Machine Learning for Internet of Things (AIChallengeIoT), in conjunction with ACM SenSys, Nov. 2020.
-
Pushing the Envelope of Dynamic Spatial Gating technologies [Paper]
Xueqin Huang, Urmish Thakker, Dibakar Gope, and Jesse Beu
2nd International Workshop on Challenges in Artificial Intelligence and Machine Learning for Internet of Things (AIChallengeIoT), in conjunction with ACM SenSys, Nov. 2020.
-
High Throughput Matrix-Matrix Multiplication between Asymmetric Bit-Width Operands [arXiv]
Dibakar Gope, Jesse Beu, and Matthew Mattina
arXiv, 2020.
-
Aggressive Compression of MobileNets Using Hybrid Ternary Layers [Paper] [Poster]
Dibakar Gope, Jesse Beu, Urmish Thakker, and Matthew Mattina
tinyML Summit 2020, Feb. 2020.
-
Mixed-Element-Size Instruction
Jesse Beu, Dibakar Gope, and David Mansell
US Patent Application, 2020.
-
Mixed-Precision Computation Unit
Dibakar Gope, Jesse Beu, Paul Whatmough, and Matthew Mattina
US Patent Application, 2020.
-
Hybrid Filter Banks for Artificial Neural Networks
Dibakar Gope, Jesse Beu, Paul Whatmough, and Matthew Mattina
US Patent Application, 2020.
-
Ternary Hybrid Neural-Tree Networks for Highly Constrained IoT Applications (Wake Word Detection) [Paper] [Poster]
Dibakar Gope, Ganesh Dasika, and Matthew Mattina
Second Conference on Machine Learning and Systems (MLSys), Mar. 2019.
-
Pushing the Limits of RNN Compression [Paper]
Urmish Thakker, Igor Fedorov, Jesse Beu, Dibakar Gope, Chu Zhou, Ganesh Dasika, and Matthew Mattina
5th Workshop on Energy Efficient Machine Learning and Cognitive Computing, Co-located with the 33rd Conference on Neural Information Processing Systems (NeurIPS), Dec. 2019.
-
Run-Time Efficient RNN Compression for Inference on Edge Devices [Paper]
Urmish Thakker, Jesse Beu, Dibakar Gope, Ganesh Dasika, and Matthew Mattina
4th Workshop on Energy Efficient Machine Learning and Cognitive Computing for Embedded Applications, Co-located with the 46th Int. Symp on Computer Architecture (ISCA), Jun. 2019.
-
RNN Compression using Hybrid Matrix Decomposition
Urmish Thakker, Ganesh Dasika, Jesse Beu, Dibakar Gope, and Matthew Mattina
tinyML Summit, Mar. 2019.
-
Scoped Persistence Barriers for Non-Volatile Memories
Arkaprava Basu, Mitesh Meswani, Dibakar Gope, and Sooraj Puthoor
US Patent, 2019.
-
A Case for Scoped Persist Barriers in GPUs [Paper]
Dibakar Gope, Arkaprava Basu, Sooraj Puthoor, and Mitesh Meswani
11th Workshop on General Purpose Processing using GPU (GPGPU), in conjunction with the Symp. on Principles and Practice of Parallel Programming (PPoPP), Feb. 2018.
-
Apparatus and Method for Bias-Free Branch Prediction
Mikko Lipasti, and Dibakar Gope
US Patent, 2018.
-
The CURE: Cluster Communication Using Registers [Paper]
Vignyan Reddy Kothinti Naresh, Dibakar Gope, and Mikko H. Lipasti
Proceedings of the Int. Conf. on Compilers, Architecture, and Synthesis for Embedded Systems (CASES), Oct. 2017.
-
Architectural Support for Server-Side PHP Processing [Paper]
Dibakar Gope, David J. Schlais, and Mikko H. Lipasti
Proceedings of the 44th Int. Symp. on Computer Architecture (ISCA), Jun. 2017.
-
Hash Map Inlining [Paper]
Dibakar Gope, and Mikko H. Lipasti
Proceedings of the 25th Int. Conf. on Parallel Architectures and Compilation Techniques (PACT), Sep. 2016.
-
Statement-Level Parallelism for Scripting Languages [Paper]
Dibakar Gope, and Mikko H. Lipasti
1st Workshop on High Performance Scripting Languages, in conjunction with the Symp. on Principles and Practice of Parallel Programming (PPoPP), Feb. 2015.
-
Bias-Free Branch Predictor [Paper]
Dibakar Gope, and Mikko H. Lipasti
Proceedings of the 47th IEEE/ACM Int. Symp. on Microarchitecture (MICRO), Dec. 2014.
-
Bias-Free Neural Predictor [Paper] [Code]
Dibakar Gope, and Mikko H. Lipasti
Proceedings of the 4th JILP Workshop on Computer Architecture Competitions (JWAC-4): Championship Branch Prediction (CBP), Jun. 2014.
-
Atomic SC for Simple In-order Processors [Paper]
Dibakar Gope, and Mikko H. Lipasti
Proceedings of the 20th IEEE Int. Symp. on High Performance Computer Architecture (HPCA), Feb. 2014.
*Nominated for best paper award
-
Maximizing Crosstalk-Induced Slowdown during Path Delay Test [Paper]
Dibakar Gope, and Duncan M. (Hank) Walker
Proceedings of the 30th IEEE Int. Conf. on Computer Design (ICCD), Sep. 2012.
-
Exploring a Circuit Design Approach Based on One-Hot Multi-Valued Domino Logic [Paper]
Dibakar Gope, Kent Lin, and Sunil P. Khatri
Proceedings of the 53rd IEEE Int. Midwest Symp. on Circuits & Systems (MWSCAS), Aug. 2010.
-
Detection of High Resistance Bridge Defects using Slack Based Dynamic Bridging Fault Model [Paper]
Dibakar Gope, Srinivasulu Alampally, Srinivas Kumar Vooka, and Rubin A. Parekhji
Proceedings of the Synopsys Users Group India (SNUG), 2008.
-
The gem5 Simulator: Version 20.0+ [ArXiv]
Jason Lowe-Power, Abdul Mutaal Ahmad, Ayaz Akram, Mohammad Alian, Rico Amslinger, Matteo Andreozzi, Adrià Armejach, Nils Asmussen, Srikant Bharadwaj, Gabe Black, Gedare Bloom, Bobby R. Bruce, Daniel Rodrigues Carvalho, Jeronimo Castrillon, Lizhong Chen, Nicolas Derumigny, Stephan Diestelhorst, Wendy Elsasser, Marjan Fariborz, Amin Farmahini-Farahani, Pouya Fotouhi, Ryan Gambord, Jayneel Gandhi, Dibakar Gope, Thomas Grass, Bagus Hanindhito, Andreas Hansson, Swapnil Haria, Austin Harris, Timothy Hayes, Adrian Herrera, Matthew Horsnell, Syed Ali Raza Jafri, Radhika Jagtap, Hanhwi Jang, Reiley Jeyapaul, Timothy M. Jones, Matthias Jung, Subash Kannoth, Hamidreza Khaleghzadeh, Yuetsu Kodama, Tushar Krishna, Tommaso Marinelli, Christian Menard, Andrea Mondelli, Tiago Mück, Omar Naji, Krishnendra Nathella, Hoa Nguyen, Nikos Nikoleris, Lena E. Olson, Marc Orr, Binh Pham, Pablo Prieto, Trivikram Reddy, Alec Roelke, Mahyar Samani, Andreas Sandberg, Javier Setoain, Boris Shingarov, Matthew D. Sinclair, Tuan Ta, Rahul Thakur, Giacomo Travaglini, Michael Upton, Nilay Vaish, Ilias Vougioukas, Zhengrong Wang, Norbert Wehn, Christian Weis, David A. Wood, Hongil Yoon, Éder F. Zulian.
arXiv, 2020.
Courses (Machine Learning)
Introduction to Deep Learning, Bayesian Methods for Machine Learning, Practical Reinforcement Learning, Natural Language Processing, Deep Learning in Computer Vision, Neural Networks and Deep Learning, Convolutional Neural Networks, Sequence Models, Linear Algebra, Multivariate Calculus, Principal Component Analysis
Contact
LinkedIn: https://www.linkedin.com/in/dibakar-gope-89060119