Research

kv-compression-1

KV Cache Compression

  • Institution: The Hong Kong Polytechnic University
  • Category: Machine Learning
  • Highlights: Compression
  • Term: Ongoing

Description:

Attempting to integrate a new lossy compressor written in CUDA to compress cached Key Value data within the GPU architecture and reduce its size within the vllm library.
Link to vllm library: vllm - LLM interfacing & servicing