Home

Eviction Policies

Contents


Predict what future queries will need. Eviction - similar to an online coreset problem Compaction - eviction but with very large chunks Attention matching / STILL – inducing points

Supervised vs RL approach

LoRA 16, Static queries 64

All eviction policies

  • Cache size: 512
  • Chunk size: 16
  • Eviction layers: 11, 12

Attention sinks are a result of the positional encodings. If we remove RoPE perhaps we won’t need the protected chunks for attention sinks.