Implement classic papers from scratch
Pick any paper below and PaperNova generates a guided workbook: an outline, exercise-by-exercise explanations that ramp from beginner to advanced, and a downloadable Jupyter notebook you can run locally.
LoRA: Low-Rank Adaptation of Large Language Models
Edward J. Hu, Yelong Shen, Phillip Wallis, +5 more
LoRA. Inject low-rank matrices into frozen pretrained weights for cheap, effective fine-tuning — the backbone of most open-source LLM adaptation today.
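A taste of what the workbook builds toward: a minimal NumPy sketch of a LoRA-style linear layer, in which the pretrained weight W stays frozen and only the low-rank factors A and B would be trained. The scaling follows the paper's alpha/r convention, but the toy sizes below are purely illustrative.

    import numpy as np

    rng = np.random.default_rng(0)

    d_in, d_out, r, alpha = 64, 64, 8, 16           # toy sizes; the point is r << d
    W = rng.normal(size=(d_out, d_in))              # frozen pretrained weight
    A = rng.normal(scale=0.01, size=(r, d_in))      # trainable, initialised small
    B = np.zeros((d_out, r))                        # trainable, zero init so the update starts at 0

    def lora_forward(x):
        """y = x W^T + (alpha / r) * x A^T B^T, with W kept frozen."""
        base = x @ W.T                              # frozen pretrained path
        update = (x @ A.T) @ B.T                    # trainable rank-r path
        return base + (alpha / r) * update

    x = rng.normal(size=(4, d_in))                  # a batch of 4 activations
    print(lora_forward(x).shape)                    # (4, 64)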
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford, Jong Wook Kim, Chris Hallacy, +1 more
CLIP. Contrastive pretraining on 400M image-text pairs produces a zero-shot classifier that rivals supervised ImageNet models — a cornerstone of multimodal learning.
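The training signal itself is compact: a symmetric cross-entropy over the similarity matrix of matched image and text embeddings. In this NumPy sketch, random vectors stand in for the paper's image and text encoders, and the temperature value is illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    N, d, temp = 8, 32, 0.07                        # batch size, embed dim, temperature (illustrative)

    img = rng.normal(size=(N, d))                   # stand-in image embeddings
    txt = rng.normal(size=(N, d))                   # stand-in text embeddings
    img /= np.linalg.norm(img, axis=1, keepdims=True)
    txt /= np.linalg.norm(txt, axis=1, keepdims=True)

    logits = img @ txt.T / temp                     # similarity of every image with every caption

    def cross_entropy(logits, targets):
        logits = logits - logits.max(axis=1, keepdims=True)
        log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(targets)), targets].mean()

    labels = np.arange(N)                           # the matching pair sits on the diagonal
    loss = 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))
    print(loss)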
Diffusion Models Beat GANs on Image Synthesis
Prafulla Dhariwal, Alex Nichol
Diffusion models surpass GANs on high-fidelity image synthesis — the bridge to Stable Diffusion, DALL·E and the modern image generation era.
Language Models are Few-Shot Learners
Tom B. Brown, Benjamin Mann, Nick Ryder, +3 more
GPT-3. Shows that scale alone unlocks in-context learning — a 175B parameter LM can tackle new tasks from a handful of examples in the prompt.
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, +3 more
Vision Transformer (ViT). Applies a pure Transformer to image patches and matches CNNs on ImageNet at scale — the paper that unified vision and language architectures.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin, Ming-Wei Chang, Kenton Lee, +1 more
Bidirectional masked-language modelling that reshaped NLP benchmarks and set the pretraining-then-finetuning pattern for years to come.
Attention Is All You Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, +5 more
The foundational Transformer paper. Introduces multi-head self-attention and dispenses with recurrence and convolutions — the blueprint behind every modern large language model.
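The whole architecture rests on one small operation. Here is a from-scratch NumPy sketch of scaled dot-product attention, the building block behind every head; the sequence length and dimensions are toy values.

    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def scaled_dot_product_attention(Q, K, V):
        """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
        d_k = Q.shape[-1]
        scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)
        return softmax(scores) @ V

    rng = np.random.default_rng(0)
    seq, d_k, d_v = 5, 16, 16                       # toy sequence length and head dims
    Q = rng.normal(size=(seq, d_k))
    K = rng.normal(size=(seq, d_k))
    V = rng.normal(size=(seq, d_v))
    print(scaled_dot_product_attention(Q, K, V).shape)   # (5, 16)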
Deep Residual Learning for Image Recognition
Kaiming He, Xiangyu Zhang, Shaoqing Ren, +1 more
ResNet. Residual connections made it possible to train networks with hundreds of layers and became standard plumbing for nearly every deep architecture since.
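The core idea fits in a few lines: compute a residual F(x) and add it back to the input, so the identity mapping is trivial to represent and gradients flow through the skip path. The sketch below uses a fully-connected block with toy weights purely for illustration; the paper's blocks are convolutional.

    import numpy as np

    rng = np.random.default_rng(0)
    d = 32
    W1 = rng.normal(scale=0.1, size=(d, d))
    W2 = rng.normal(scale=0.1, size=(d, d))

    def relu(x):
        return np.maximum(x, 0.0)

    def residual_block(x):
        """y = relu(x + F(x)): the block only has to learn the residual F."""
        return relu(x + relu(x @ W1) @ W2)

    x = rng.normal(size=(4, d))
    print(residual_block(x).shape)                  # (4, 32)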
ImageNet Classification with Deep Convolutional Neural Networks
Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton
AlexNet. The paper that kicked off the modern deep-learning era in computer vision by winning ImageNet 2012 with a convolutional neural network trained on GPUs.
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron, Thibaut Lavril, Gautier Izacard, +1 more
LLaMA. Open-weights foundation model family that matched or beat GPT-3 at a fraction of the parameters and catalysed the open-source LLM ecosystem.
Robust Speech Recognition via Large-Scale Weak Supervision
Alec Radford, Jong Wook Kim, Tao Xu, +3 more
Whisper. An encoder-decoder Transformer trained on 680k hours of multilingual weakly-supervised audio — near-human speech recognition, zero-shot.
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason Wei, Xuezhi Wang, Dale Schuurmans, +3 more
Chain-of-Thought. A few worked-example prompts dramatically improve LLM reasoning on arithmetic, commonsense and symbolic tasks — zero training required.
Masked Autoencoders Are Scalable Vision Learners
Kaiming He, Xinlei Chen, Saining Xie, +3 more
MAE. Mask 75% of image patches and reconstruct them — a BERT-style objective that yields strong, scalable vision representations.
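The pretext task is simple to state in code: shuffle the patches, feed only a small visible subset to the encoder, and reconstruct the rest. This NumPy sketch shows just the random masking step on a toy patch grid; the encoder and decoder in the paper are Vision Transformers.

    import numpy as np

    rng = np.random.default_rng(0)
    num_patches, patch_dim, mask_ratio = 16, 48, 0.75    # e.g. a 4x4 grid of flattened patches

    patches = rng.normal(size=(num_patches, patch_dim))  # flattened image patches (toy values)

    # Shuffle patch indices and keep only the visible 25% as the encoder's input
    perm = rng.permutation(num_patches)
    num_keep = int(num_patches * (1 - mask_ratio))
    visible_idx, masked_idx = perm[:num_keep], perm[num_keep:]

    encoder_input = patches[visible_idx]                 # only 4 of 16 patches are ever encoded
    reconstruction_target = patches[masked_idx]          # the decoder must predict these pixels
    print(encoder_input.shape, reconstruction_target.shape)   # (4, 48) (12, 48)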
A Simple Framework for Contrastive Learning of Visual Representations
Ting Chen, Simon Kornblith, Mohammad Norouzi, +1 more
SimCLR. A clean, effective contrastive framework that learns visual representations without labels — closing much of the gap with supervised pretraining.
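At the centre sits the NT-Xent loss: two augmented views of the same image should embed close together, and every other example in the batch acts as a negative. In this NumPy sketch, random vectors stand in for the encoder plus projection head, and the temperature is illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    N, d, temp = 4, 16, 0.5                         # batch size, projection dim, temperature

    z1 = rng.normal(size=(N, d))                    # embeddings of augmented view 1 (stand-ins)
    z2 = rng.normal(size=(N, d))                    # embeddings of view 2 of the same images
    z = np.concatenate([z1, z2])
    z /= np.linalg.norm(z, axis=1, keepdims=True)

    sim = z @ z.T / temp                            # (2N, 2N) cosine similarities
    np.fill_diagonal(sim, -np.inf)                  # never contrast an example with itself

    pos = np.concatenate([np.arange(N, 2 * N), np.arange(N)])   # index of each row's positive
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    loss = -log_prob[np.arange(2 * N), pos].mean()
    print(loss)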
Denoising Diffusion Probabilistic Models
Jonathan Ho, Ajay Jain, Pieter Abbeel
DDPM. The paper that made diffusion models practical — a simple denoising objective that scales to photorealistic generation.
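The objective really is simple: corrupt an image at a random timestep with the closed-form forward process x_t = sqrt(abar_t) x_0 + sqrt(1 - abar_t) eps, then regress a network onto the noise eps. This NumPy sketch builds one training target with the paper's linear beta schedule; the zero-filled prediction is just a placeholder for the denoising network.

    import numpy as np

    rng = np.random.default_rng(0)
    T = 1000
    betas = np.linspace(1e-4, 0.02, T)              # linear noise schedule
    alphas_bar = np.cumprod(1.0 - betas)            # cumulative product \bar{alpha}_t

    x0 = rng.normal(size=(8, 8))                    # a toy "image"
    t = rng.integers(T)                             # random timestep
    eps = rng.normal(size=x0.shape)                 # the noise the network must predict

    # Forward process in closed form: x_t = sqrt(abar_t) x0 + sqrt(1 - abar_t) eps
    x_t = np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps

    eps_pred = np.zeros_like(eps)                   # placeholder for eps_theta(x_t, t)
    loss = np.mean((eps - eps_pred) ** 2)           # the simple denoising MSE objective
    print(x_t.shape, loss)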
Proximal Policy Optimization Algorithms
John Schulman, Filip Wolski, Prafulla Dhariwal, +2 more
PPO. A clipped-surrogate policy-gradient method that balances stability and simplicity — the default RL algorithm behind RLHF and most modern agents.
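The clipped surrogate is the whole trick: weight the advantage by the new-to-old probability ratio, but clip that ratio so no single update moves the policy too far. A NumPy sketch with made-up ratios and advantages, using the paper's epsilon of 0.2.

    import numpy as np

    def ppo_clip_objective(ratio, advantage, eps=0.2):
        """L^CLIP = E[min(r_t * A_t, clip(r_t, 1-eps, 1+eps) * A_t)]."""
        unclipped = ratio * advantage
        clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
        return np.minimum(unclipped, clipped).mean()

    rng = np.random.default_rng(0)
    ratio = np.exp(rng.normal(scale=0.3, size=64))      # pi_new(a|s) / pi_old(a|s), toy batch
    advantage = rng.normal(size=64)                     # advantage estimates (e.g. from GAE)
    print(ppo_clip_objective(ratio, advantage))         # maximise this (or minimise its negative)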
Semi-Supervised Classification with Graph Convolutional Networks
Thomas N. Kipf, Max Welling
GCN. A first-order approximation of spectral graph convolutions that made graph neural networks simple, fast, and widely applicable.
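A single layer reduces to one matrix expression: H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W). The NumPy sketch below runs that layer over a toy four-node graph with illustrative feature sizes.

    import numpy as np

    rng = np.random.default_rng(0)

    A = np.array([[0, 1, 0, 1],                     # adjacency of a tiny 4-node ring graph
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [1, 0, 1, 0]], dtype=float)
    X = rng.normal(size=(4, 8))                     # node features
    W = rng.normal(scale=0.1, size=(8, 16))         # layer weights

    A_hat = A + np.eye(4)                           # add self-loops
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt        # symmetric normalisation

    H = np.maximum(A_norm @ X @ W, 0.0)             # one GCN layer with ReLU
    print(H.shape)                                  # (4, 16)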
XGBoost: A Scalable Tree Boosting System
Tianqi Chen, Carlos Guestrin
XGBoost. A heavily engineered gradient-boosted trees system with a regularised learning objective that dominated Kaggle competitions and production ML for years.
You Only Look Once: Unified, Real-Time Object Detection
Joseph Redmon, Santosh Divvala, Ross Girshick, +1 more
YOLO. Single-shot object detection that frames detection as a regression problem — fast, end-to-end, and the starting point for most real-time detectors since.
Mastering the Game of Go with Deep Neural Networks and Tree Search
David Silver, Aja Huang, Chris J. Maddison, +1 more
AlphaGo. Combines deep policy/value networks with Monte Carlo tree search to beat the world champion — a landmark demonstration of RL at scale.
WaveNet: A Generative Model for Raw Audio
Aaron van den Oord, Sander Dieleman, Heiga Zen, +1 more
WaveNet. An autoregressive convolutional model over raw audio samples that lifted text-to-speech naturalness to near-human levels.
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Sergey Ioffe, Christian Szegedy
Batch Normalization. Normalising activations per-batch stabilises and speeds up training — a now-standard building block in deep networks.
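The training-time forward pass is only a few lines: normalise each feature with the batch mean and variance, then rescale with learned gamma and beta. A NumPy sketch (inference would use running statistics instead):

    import numpy as np

    def batch_norm_train(x, gamma, beta, eps=1e-5):
        """Normalise each feature over the batch, then scale and shift."""
        mean = x.mean(axis=0)
        var = x.var(axis=0)
        x_hat = (x - mean) / np.sqrt(var + eps)
        return gamma * x_hat + beta

    rng = np.random.default_rng(0)
    x = rng.normal(loc=3.0, scale=2.0, size=(32, 8))    # batch of 32 activations, 8 features
    gamma, beta = np.ones(8), np.zeros(8)
    y = batch_norm_train(x, gamma, beta)
    print(y.mean(axis=0).round(3), y.std(axis=0).round(3))  # ~0 mean, ~1 std per feature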
U-Net: Convolutional Networks for Biomedical Image Segmentation
Olaf Ronneberger, Philipp Fischer, Thomas Brox
U-Net. Encoder-decoder with skip connections that became the default architecture for medical imaging and any dense-prediction task on small datasets.
Adam: A Method for Stochastic Optimization
Diederik P. Kingma, Jimmy Ba
Adam. Adaptive first-order optimizer that became the default for almost every deep learning codebase — a must-implement from scratch.
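The update rule itself is short enough to memorise: keep exponential moving averages of the gradient and its square, correct their bias, and take a scaled step. The NumPy sketch below minimises a toy quadratic with the paper's default betas; the learning rate is chosen for illustration.

    import numpy as np

    def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        m = beta1 * m + (1 - beta1) * grad            # first-moment EMA
        v = beta2 * v + (1 - beta2) * grad ** 2       # second-moment EMA
        m_hat = m / (1 - beta1 ** t)                  # bias correction
        v_hat = v / (1 - beta2 ** t)
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
        return theta, m, v

    theta = np.array([5.0, -3.0])                     # minimise f(theta) = ||theta||^2
    m, v = np.zeros_like(theta), np.zeros_like(theta)
    for t in range(1, 5001):
        grad = 2 * theta                              # gradient of the toy objective
        theta, m, v = adam_step(theta, grad, m, v, t, lr=1e-2)
    print(theta.round(4))                             # close to [0, 0]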
Dropout: A Simple Way to Prevent Neural Networks from Overfitting
Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, +2 more
Dropout. Randomly zeroing units during training as an implicit ensemble — one of the simplest and most effective regularizers in deep learning.
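The modern "inverted" formulation takes two lines: zero each unit with probability p during training and scale the survivors by 1/(1-p), so inference needs no change. A NumPy sketch:

    import numpy as np

    rng = np.random.default_rng(0)

    def dropout(x, p=0.5, training=True):
        """Inverted dropout: drop with probability p, rescale survivors by 1/(1-p)."""
        if not training or p == 0.0:
            return x
        mask = rng.random(x.shape) >= p               # keep each unit with probability 1-p
        return x * mask / (1.0 - p)                   # rescale so the expected activation is unchanged

    x = np.ones((2, 10))
    print(dropout(x, p=0.5))            # roughly half the entries zeroed, the rest scaled to 2.0
    print(dropout(x, training=False))   # identity at inference time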
Generative Adversarial Networks
Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, +5 more
The original GAN paper. A generator and a discriminator locked in a minimax game — an elegant framing that opened an entire subfield of generative modelling.
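The game fits in a couple of expressions: the discriminator maximises log D(x) + log(1 - D(G(z))), while the generator, in the paper's practical non-saturating variant, maximises log D(G(z)). In this NumPy sketch, random scores stand in for real generator and discriminator networks.

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Stand-ins for the discriminator's outputs; a real model would compute these from data
    d_real = sigmoid(rng.normal(loc=1.0, size=16))    # D(x) on real samples
    d_fake = sigmoid(rng.normal(loc=-1.0, size=16))   # D(G(z)) on generated samples

    disc_loss = -(np.log(d_real) + np.log(1.0 - d_fake)).mean()   # discriminator's side of the game
    gen_loss = -np.log(d_fake).mean()                 # non-saturating generator loss from the paper
    print(disc_loss, gen_loss)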
Auto-Encoding Variational Bayes
Diederik P. Kingma, Max Welling
VAE. The variational autoencoder — a principled probabilistic generative model with a learned latent space and the reparameterisation trick.
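The reparameterisation trick in one line: instead of sampling z directly from q(z|x), draw eps from a standard normal and set z = mu + sigma * eps, which keeps the sampling step differentiable. The NumPy sketch below also computes the closed-form KL term against a standard normal prior; the encoder outputs are random stand-ins.

    import numpy as np

    rng = np.random.default_rng(0)
    latent_dim = 8

    # Pretend these came out of the encoder network for one input
    mu = rng.normal(size=latent_dim)
    log_var = rng.normal(scale=0.1, size=latent_dim)

    # Reparameterisation: z = mu + sigma * eps keeps the sampling step differentiable
    eps = rng.normal(size=latent_dim)
    z = mu + np.exp(0.5 * log_var) * eps

    # KL(q(z|x) || N(0, I)) in closed form
    kl = -0.5 * np.sum(1 + log_var - mu ** 2 - np.exp(log_var))
    print(z.shape, kl)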
Playing Atari with Deep Reinforcement Learning
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, +2 more
DQN. Deep Q-Networks learn to play Atari games from raw pixels, kickstarting the deep reinforcement learning era.
Efficient Estimation of Word Representations in Vector Space
Tomas Mikolov, Kai Chen, Greg Corrado, +1 more
Word2Vec. Skip-gram and CBOW turned words into dense vectors whose geometry encodes meaning — the bridge between symbolic text and modern deep learning.
Long Short-Term Memory
Sepp Hochreiter, Jürgen Schmidhuber
LSTM. The gated recurrent cell that tamed the vanishing-gradient problem and powered sequence modelling for two decades.
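One timestep of the cell shows where the gradient-friendliness comes from: an additive cell state regulated by input, forget and output gates (the forget gate is a slightly later refinement, but standard in every modern implementation). A NumPy sketch of a single step, with biases omitted and toy sizes:

    import numpy as np

    rng = np.random.default_rng(0)
    d_in, d_hid = 8, 16

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # One weight matrix per gate, each acting on the concatenation [h_{t-1}, x_t]
    Wf, Wi, Wo, Wc = (rng.normal(scale=0.1, size=(d_hid, d_hid + d_in)) for _ in range(4))

    def lstm_step(x_t, h_prev, c_prev):
        z = np.concatenate([h_prev, x_t])
        f = sigmoid(Wf @ z)                       # forget gate: what to erase from the cell state
        i = sigmoid(Wi @ z)                       # input gate: what to write
        o = sigmoid(Wo @ z)                       # output gate: what to expose
        c = f * c_prev + i * np.tanh(Wc @ z)      # additive cell-state update keeps gradients healthy
        h = o * np.tanh(c)
        return h, c

    h, c = np.zeros(d_hid), np.zeros(d_hid)
    for _ in range(5):                            # unroll over a toy sequence of 5 steps
        h, c = lstm_step(rng.normal(size=d_in), h, c)
    print(h.shape, c.shape)                       # (16,) (16,)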
Why implement classic papers?
Reading a paper and implementing it are two very different skills. PaperNova's workbook tool bridges that gap: Gemini turns the paper into a sequence of small, self-contained exercises — from a warm-up reimplementation of the core idea up to advanced extensions — then assembles them into a Jupyter notebook you can run, edit and extend.
Prefer to work from your own paper? Upload a PDF and get the same guided workbook tailored to it.