Papers
I've been collecting and reading academic papers over the years, and I've tried to pick out a few of the highlights. Here they are. This isn't a random internet search: these are the ideas that are actually shaping AI, the efficiency breakthroughs, and the moments where someone said "what if we did this completely differently?" and it worked.
There is an idea that runs through all of them. One paper introduced the architecture that powers every major AI today. Another proved that training smarter matters more than building bigger. A team cut off from the best hardware built one of the world's strongest models for a fraction of the cost — because they had to. A model learned to reason on its own, with no one showing it how. And someone asked whether the dominant architecture is even the right one.
The pattern isn't brute force. It's elegance — doing more with less, rethinking assumptions, and finding leverage where others see limitations. That's the story these papers tell.
Academic papers can be brutal to get through if you're not directly in the field — dense notation, assumed context, and writing that seems designed to keep people out rather than invite them in. Each summary here breaks down the big idea in plain language, walks through the technical architecture so you actually understand what they built, covers the key results and why they matter, and gives you an honest look at the limitations.
These are a solid way to get up to speed, but nothing beats sitting down with the source paper itself. Every summary includes a direct link to the original — I'd encourage you to read the ones that grab you.
2026
The Dot Product — A One-Page Primer
A plain-English explanation of the dot product, the single most important operation in modern AI.
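The primer linked above explains it in prose; as a quick illustration (my own sketch, not taken from the primer), the dot product is just "multiply paired elements, add them up" — and that one operation is how a query vector scores its match against a key vector in attention:

```python
# Dot product: multiply corresponding elements, then sum the products.
def dot(a, b):
    assert len(a) == len(b), "vectors must be the same length"
    return sum(x * y for x, y in zip(a, b))

# Illustrative vectors only -- in a real model these would be learned.
query = [1.0, 2.0, 3.0]
key = [4.0, 5.0, 6.0]
print(dot(query, key))  # 1*4 + 2*5 + 3*6 = 32.0
```

A bigger result means the two vectors point in more similar directions, which is why it works as a similarity score.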
Softmax — A One-Page Primer
A plain-English explanation of the softmax function, the operation that turns raw scores into probabilities in AI.
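For a taste of what the primer covers (again my own sketch, not code from the primer): softmax exponentiates each score and normalizes, so the outputs are all positive and sum to 1 — a probability distribution. Subtracting the maximum score first is a standard numerical-stability trick:

```python
import math

# Softmax: exponentiate each score, then divide by the total so the
# results sum to 1. Subtracting the max avoids overflow in exp().
def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs)       # the largest score gets the largest probability
print(sum(probs))  # sums to 1.0 (up to floating-point rounding)
```

Note that softmax preserves the ranking of the scores while exaggerating the gaps between them, which is exactly what you want when turning raw attention scores into weights.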
Why GPUs? A Primer on the Hardware That Makes AI Possible
Why your graphics card is the most important piece of hardware in AI — and what to look for if you want to run models yourself.
Conditional Memory via Scalable Lookup (Engram)
What if a model could remember patterns it's seen before — instantly, without thinking about it?