Reading list

Papers and posts I’m circling—links open in a new tab.

by Shubham Rasal

Deep Delta / Residual Geometry / Grokking

How networks learn, and when generalization “clicks.”

  • Deep Delta Learning

    Reframes learning dynamics through a delta lens—if you like optimization geometry, start here.

  • The Delta Rule (Background)

    The classic update rule behind a lot of what follows—quick Wikipedia grounding.

  • Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets

    The paper that named the phenomenon: memorization first, generalization later.

  • Why Neural Networks Suddenly Start Generalizing

    OpenAI’s readable take on the grokking story—good intuition before the heavy proofs.
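If the delta rule entry above is unfamiliar, the update is small enough to sketch in a few lines. This is toy code for a single linear unit, not taken from any of the papers:

```python
# Toy sketch of the classic delta (Widrow-Hoff) rule: nudge each weight by
# lr * error * input, where error = target - prediction.
def delta_rule_step(w, x, target, lr=0.1):
    y = sum(wi * xi for wi, xi in zip(w, x))    # linear prediction
    error = target - y                           # the "delta"
    return [wi + lr * error * xi for wi, xi in zip(w, x)]

# Learn weights so the unit maps x -> 3.0.
w, x = [0.0, 0.0], [1.0, 2.0]
for _ in range(100):
    w = delta_rule_step(w, x, target=3.0)
pred = sum(wi * xi for wi, xi in zip(w, x))      # converges toward 3.0
```

The "delta lens" reframings above start from exactly this kind of error-proportional update.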

Nested / Multi-Timescale / Meta Learning

Learning to learn, and systems that update at more than one clock speed.

  • Introducing Nested Learning – Google Research Blog

    Google’s framing of nested optimization—useful mental model for continual learning.

  • Learning to Learn by Gradient Descent by Gradient Descent

    Meta-learning classic: an optimizer learned by gradient descent—still cited everywhere.

  • Meta-Learning in Neural Networks: A Survey

    Wide-angle map of the field if you want references more than a single trick.
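The "more than one clock speed" idea is easier to hold onto with a toy in hand. Here's a minimal two-timescale sketch (names and setup are mine, purely illustrative): fast weights adapt every step, while a slow outer loop tunes the learning rate itself based on how each episode went.

```python
# Toy nested loop: the inner loop runs plain gradient descent on f(w) = w^2;
# the outer (slow) loop treats the learning rate as the learned quantity,
# growing it while episodes keep improving and backing off otherwise.
def inner_episode(lr, w0=5.0, steps=20):
    w = w0
    for _ in range(steps):
        grad = 2 * w           # gradient of f(w) = w**2
        w -= lr * grad
    return w * w               # final loss for this episode

lr, best = 0.01, float("inf")
for _ in range(30):            # outer loop over episodes
    loss = inner_episode(lr)
    if loss < best:
        best, lr = loss, lr * 1.5   # improving: take bigger steps
    else:
        lr /= 2                      # got worse: back off
```

Real nested/meta-learning systems differentiate through the inner loop rather than using this crude grow-or-halve heuristic, but the two-clock structure is the same.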

Speculative Decoding / Draft Models

Faster generation by guessing ahead—core ideas behind modern fast inference.

  • Accelerating Large Language Model Decoding with Speculative Sampling

    Foundational speculative sampling paper—how a small draft model speeds up a large one.

  • Speculative Decoding with Draft Models (Original Paper)

    Pairs draft and target models explicitly—read after the sampling paper above.

  • SpecExtend: Scaling Speculative Decoding to Long Contexts

    Pushes speculative ideas into long-context regimes where latency really hurts.

  • Dynamic Depth Decoding for Efficient LLM Inference

    Adaptive depth during decoding—another lever beyond pure draft models.

EAGLE / Advanced Speculative Heads

Learned drafting and bringing research into production stacks.

  • EAGLE-3: Efficient Accelerated Generation via Learned Drafting

    State-of-the-art learned drafting—dense, but the figures repay the time.

  • From Research to Production: Accelerate OSS LLMs with EAGLE-3 on Vertex AI

    How Google ships these ideas—good if you care about serving, not just theory.

Long Context / Memory / RLM-Style Ideas

When attention windows need to stretch beyond the usual.

  • Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

    Segment-level recurrence for longer dependencies—still a useful baseline to know.

  • LongNet: Scaling Transformers to 1,000,000,000 Tokens

    Dilated attention at extreme lengths—skim the method, stare at the scaling plots.

N-grams / DeepSeek / Classical Foundations

From count-based LMs to modern reasoning-focused training.

  • A Tutorial on N-gram Language Models

    Jurafsky & Martin’s PDF—if you only read one classical LM chapter, make it this.

  • DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

    How RL shaped a model people actually talk about—worth it for the training story.
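Before the RL-shaped giants, the count-based models in the Jurafsky & Martin chapter fit in a dozen lines. A minimal bigram sketch (toy code, not from the tutorial):

```python
from collections import defaultdict, Counter

# Tiny count-based bigram model:
# P(w_i | w_{i-1}) = count(w_{i-1}, w_i) / count(w_{i-1}).
def train_bigram(corpus):
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            counts[prev][cur] += 1
    return counts

def bigram_prob(counts, prev, cur):
    total = sum(counts[prev].values())
    return counts[prev][cur] / total if total else 0.0

counts = train_bigram(["the cat sat", "the dog sat"])
p = bigram_prob(counts, "the", "cat")   # "the" is followed by cat or dog
```

Everything after this in the chapter (smoothing, backoff, perplexity) is about fixing what happens when a count is zero.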

Optimizer Discovery / RL for Training Rules

When the optimizer itself becomes the learned artifact.

  • Learning to Optimize

    Learns optimization algorithms as policies—foundational for “optimizer as network” work.

  • Discovering Optimization Algorithms via Reinforcement Learning

    RL search over update rules—wild to see hand-designed optimizers emerge from scratch.

Data / Datasets / Web Corpora

What goes into large models before the architecture even shows up.

  • FineWeb: A New Large-Scale Web Dataset

    High-quality web data at scale—useful context for “what’s in the pile.”

  • The Common Crawl Dataset

    The raw web snapshot pipeline everyone builds on—skim for scope, not polish.

  • The Pile: An 800GB Dataset of Diverse Text

    Landmark heterogeneous text dump—still a reference for data diversity arguments.

  • NVIDIA NeMo Data Curation Overview

    Practical curation patterns from NVIDIA—good when you’re building pipelines, not papers.

Flash / Systems / Acceleration

When decoding speed is the product feature.

  • dFlash: Fast and Accurate LLM Decoding

    Z-Lab’s project page—tight write-up if you’re chasing latency wins.

Articles from the archive you won’t regret reading

Non-ML essays that stuck—agency, luck, and how people grow.

  • How to Time Travel

    Brian Chesky on learning from the future—short, memorable, oddly practical.

  • High Agency

    George Mack’s lens on people who bend reality—pair with any career essay.

  • How to Get Lucky

    Taylor Pearson on increasing surface area for luck—better than another hustle thread.

  • Childhoods of Exceptional People

    Henrik Karlsson’s deep dive—slow read, big payoff if you like biographical patterns.

Earlier readings

Old favorites still on the stack.

  • Attention Is All You Need

    The transformer paper—still the right place to start if you’ve only used the API.

  • The Curse of Dimensionality

    Hinton’s IJCNN piece—short PDF if you want classical intuition for high-D geometry.