Qi Zhu

Applied Scientist
Bedrock Core Science
AWS AI/ML Services & Infrastructure
Email: qi.zhu.ckc@gmail.com

Bio

Currently, I am an Applied Scientist at AWS Bedrock, working on data-driven optimization of large language models (LLMs). Over the last two years, I have contributed to pioneering AI systems for structured data (GraphStorm, GraphRAG) that leverage structured knowledge for applications in retrieval-augmented generation (RAG), graph machine learning, and beyond.

I obtained my Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign, advised by Prof. Jiawei Han. There, I was a member of the Data and Information Systems Laboratory (DAIS) and the Data Mining Group. Here is my CV (outdated).

Research

My current and past work focuses on the following themes:

  1. Data-Driven LLM Optimization – Mining massive logs of LLM invocations to guide strategic deployment decisions, such as when to leverage multimodal representations for long-context processing and how to efficiently select training data based on real-world usage patterns.
  2. LLMs with Structured Knowledge – Harnessing explicit and implicit data structures to enhance LLM reasoning capabilities and mitigate hallucinations.
  3. Graph Representation Learning – Representing objects in heterogeneous text-attributed graphs and making the learned representations robust to distribution shift.

I. LLMs with Structured Knowledge

We aim to make LLMs more efficient and resilient against hallucinations by harnessing structured knowledge. A key challenge lies in making the language model structure-aware while mitigating performance bottlenecks, such as the lost-in-the-middle phenomenon. To address this, we explore post-training, fine-tuning, and pre-training techniques on graph-structured data. [Structured Knowledge for LLMs, KDD Workshop]

  • Graph Retrieval Augmented Generation: We propose HYBGRAG, an agentic system for hybrid question answering over semi-structured knowledge bases. Unlike prior RAG systems that handle only textual or relational information, HYBGRAG synergizes both through a retriever bank with adaptive module selection and a critic module that provides corrective feedback for iterative refinement. This structure-aware approach addresses the lost-in-the-middle phenomenon by precisely routing questions to appropriate retrieval modules and self-correcting extraction errors.
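The control flow described above can be sketched roughly as follows. This is a toy illustration, not the HYBGRAG implementation: the router, critic, and both retrievers are hypothetical stand-ins, and the real system uses LLM-based components rather than string heuristics.

```python
from typing import Callable

# Hypothetical retriever bank: one module per information modality.
def text_retriever(question: str) -> str:
    return "passage about " + question  # stand-in for dense text retrieval

def relational_retriever(question: str) -> str:
    return "table rows matching " + question  # stand-in for KB/graph lookup

RETRIEVER_BANK: dict[str, Callable[[str], str]] = {
    "text": text_retriever,
    "relational": relational_retriever,
}

def route(question: str) -> str:
    # Toy router: send entity-centric questions to the relational module.
    return "relational" if "who" in question.lower() else "text"

def critic(question: str, answer: str) -> bool:
    # Toy critic: accept any non-empty answer; the real critic gives
    # corrective feedback on both routing and extraction errors.
    return bool(answer)

def hybrid_qa(question: str, max_rounds: int = 3) -> str:
    module = route(question)
    answer = ""
    for _ in range(max_rounds):
        answer = RETRIEVER_BANK[module](question)
        if critic(question, answer):
            return answer
        # Corrective feedback: switch retrieval modules and retry.
        module = "text" if module == "relational" else "relational"
    return answer
```

The key design point is the loop: retrieval is not a single shot but an iterative refinement in which the critic's feedback re-routes the question until an acceptable answer is extracted.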

II. Graph Representation Learning

My research aims to make graph representation learning adapt to distribution shift and data heterogeneity.

Awards

  • 2020 Amazon AWS Machine Learning Research Award
  • 2018 ACM WWW Best Poster Honorable Mention