Vic Shihang Li

Research

NEMO: Flexible, High-Fidelity Memory Telemetry

Modern servers use complex memory hierarchies that require active OS management, but the OS lacks low-overhead and accurate visibility into how memory is actually used, resulting in poor policy decisions. Existing approaches rely either on software, which is flexible but has high overhead, or on hardware counters, which are efficient but too coarse and inflexible. NEMO breaks this trade-off with HW/SW co-design. It introduces a small, programmable monitoring mechanism inside memory controllers that lets the OS directly track the memory activity of interest, achieving both accuracy and flexibility with low overhead.

Self-Defining Systems

Self-Defining Systems (SDS) address a fundamental mismatch in modern infrastructure: hardware, workloads, and models evolve rapidly, but systems software still takes months or years to design, integrate, and optimize, limiting our ability to fully exploit datacenter investments. SDS envision AI-native systems that continuously define, build, deploy, and evolve themselves end-to-end, using coordinated LLM agents to explore and validate designs across specification, architecture, algorithms, and code while closing the loop with production feedback. As initial evidence, our agent-driven prototypes have autonomously built and optimized complex systems, including LLM inference runtimes and applications with 40+ microservices, achieving order-of-magnitude speedups and near-expert-level performance while reducing integration time from weeks to days.

FlowGuard: Information Flow Control for Coding Agents

Coding agents powered by large language models are increasingly being used to automate software development tasks. However, these agents can inadvertently leak sensitive information or execute unauthorized operations when interacting with external systems. FlowGuard addresses these security concerns by applying information flow control to coding agents, ensuring that sensitive data and operations are properly isolated and controlled throughout the agent's execution.
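
To make the idea of information flow control concrete, here is a minimal sketch of label propagation with a check at an external sink. It is an illustration only and not FlowGuard's actual design: the names `Labeled`, `concat`, `send_external`, and `FlowViolation` are hypothetical, and a single boolean `secret` label stands in for whatever label model the real system uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Labeled:
    """A string value tagged with a confidentiality label (hypothetical model)."""
    value: str
    secret: bool = False


class FlowViolation(Exception):
    """Raised when secret-labeled data would reach an external sink."""


def concat(*parts: Labeled) -> Labeled:
    # The result is secret if any input is secret: labels propagate with the data.
    return Labeled("".join(p.value for p in parts), any(p.secret for p in parts))


def send_external(data: Labeled) -> str:
    # Hypothetical external sink (for example, an outbound HTTP request).
    # The check happens at the sink, not where the secret was first read.
    if data.secret:
        raise FlowViolation("secret-labeled data may not reach an external sink")
    return data.value


if __name__ == "__main__":
    api_key = Labeled("sk-123", secret=True)   # made-up secret for the demo
    message = concat(Labeled("key="), api_key)
    try:
        send_external(message)
    except FlowViolation as exc:
        print("blocked:", exc)
    print("sent:", send_external(Labeled("hello")))
```

The point of the sketch is that the derived value `message` inherits the label of `api_key`, so the leak is caught even though the agent never passes the key to the sink directly.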

Masa: Microservice SLO Management

Microservice applications face challenges in maintaining consistent performance and meeting service-level objectives (SLOs) across services. Traditional approaches often optimize individual services in isolation, leading to suboptimal end-to-end performance. Masa improves microservice goodput by coordinating SLO enforcement across service boundaries, enabling end-to-end SLO guarantees that account for the complex interactions between microservices.

Loom: Efficient Telemetry for Production Systems

Debugging production systems today requires a difficult trade-off between how much data is collected and how quickly it can be queried: indexing for fast queries often slows ingestion and forces data to be dropped. Furthermore, data collection must not slow down the application being monitored. Loom breaks this trade-off by showing that fast queries do not require indexing every record: coarse summaries over small chunks are sufficient. With a small CPU and memory footprint in production, Loom minimizes application interference.
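
The claim that coarse per-chunk summaries can replace per-record indexes can be sketched as follows. This is a simplified illustration under stated assumptions, not Loom's implementation: the `ChunkedLog` class, the fixed `chunk_size`, and the choice of min/max as the summary are all hypothetical. Ingestion only appends a record and updates two numbers; a query skips any chunk whose summary rules it out and scans the rest.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Record = Tuple[int, float]  # (timestamp, value)


@dataclass
class Chunk:
    """A small run of records plus a coarse min/max summary of their values."""
    records: List[Record] = field(default_factory=list)
    min_value: float = float("inf")
    max_value: float = float("-inf")

    def append(self, rec: Record) -> None:
        self.records.append(rec)
        self.min_value = min(self.min_value, rec[1])
        self.max_value = max(self.max_value, rec[1])


class ChunkedLog:
    def __init__(self, chunk_size: int = 4) -> None:
        self.chunk_size = chunk_size
        self.chunks: List[Chunk] = [Chunk()]

    def append(self, ts: int, value: float) -> None:
        # Ingestion cost is constant per record: no per-record index is maintained.
        if len(self.chunks[-1].records) >= self.chunk_size:
            self.chunks.append(Chunk())
        self.chunks[-1].append((ts, value))

    def query_at_least(self, threshold: float) -> Tuple[List[Record], int]:
        """Return records with value >= threshold and the number of chunks scanned."""
        hits: List[Record] = []
        scanned = 0
        for chunk in self.chunks:
            if chunk.max_value < threshold:
                continue  # the summary proves no record in this chunk can match
            scanned += 1
            hits.extend(r for r in chunk.records if r[1] >= threshold)
        return hits, scanned


if __name__ == "__main__":
    log = ChunkedLog(chunk_size=4)
    for ts, v in enumerate([1, 2, 3, 4, 5, 50, 6, 7, 1, 1, 1, 1]):
        log.append(ts, float(v))
    hits, scanned = log.query_at_least(40.0)
    print(hits, "chunks scanned:", scanned, "of", len(log.chunks))
```

In the demo, only the one chunk containing the outlier value is scanned; the other two are skipped using their summaries alone.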

Quicksand: Unstrand Resources with Granular Computing

Resource stranding causes substantial underutilization in datacenters and is commonly addressed through hardware-based disaggregation on emerging hardware. Quicksand argues that stranded resources can be reclaimed on today's hardware via a new programming framework layer. Its key insight is to decompose applications into fine-grained, resource-specific units that can be scheduled onto stranded resources. Quicksand performs this decomposition transparently and dynamically adjusts it at millisecond timescales to adapt to changing resource availability.