Hojoon Leo Kim
Undergraduate Student, Electrical and Computer Engineering, Seoul National University
hojoon.kim@snu.ac.kr
37, Samseong-ro 51-gil
Gangnam-gu, Seoul 06280
Republic of Korea
Hello, my name is Hojoon Kim, and I also go by Leo.
As an undergraduate researcher passionate about advancing ML systems through hardware-software co-design, I explore how traditional computer architecture principles such as caching, branch prediction, and multiprocessing can address inefficiencies in modern ML workloads. My work spans low-bit quantization, storage-assisted inference, and cache-driven planning for embodied AI agents, with publications at OSDI'25 and ICML'25 (Spotlight) and submissions to MLSys'26. By rethinking system abstractions across the entire computing stack, I aim to develop practical system architectures that make next-generation ML applications both efficient and deployable at scale.
When I’m not deep in code, you can probably find me on the tennis court🎾!
News
| Date | News |
|---|---|
| Nov 2025 | Our team won the Grand Prize (1st Place) at the 2025 AI Chip Contest (NPU Optimization Track) hosted by the Ministry of Science and ICT (MSIT), Republic of Korea, receiving a prize of KRW 10,000,000. |
| Nov 2025 | Our work on AgenticCache, a cache-driven asynchronous planning system for embodied AI agents, has been submitted to MLSys'26! |
| Nov 2025 | Our work on QUESO, a storage-assisted quantization error compensation method for on-device LLM inference, has been submitted to MLSys'26! |
| May 2025 | I am working with Seonghoon Seo on optimizing the prefill stage of large language model (LLM) inference, focusing on reducing memory and compute overhead during initial context processing. Our work is targeted for submission to NeurIPS 2025. |
| May 2025 | Our work DecDEC has been accepted to OSDI'25! |
| May 2025 | Excited to share that our ICML'25 submission, FlashTP, is under consideration for a Spotlight or Oral presentation. It has also already been deployed in real-world industrial applications. |
Selected Publications
* indicates equal contribution
- MLSys'26 (In Submission)
  AgenticCache: Cache-Driven Asynchronous Planning for Embodied AI Agents. 2026. In submission.
- MLSys'26 (In Submission)
  QUESO: Storage-Assisted Quantization Error Compensation for On-Device LLM Inference. 2026. In submission.