Implement classic papers from scratch
Pick any paper below and PaperNova generates a guided workbook: an outline, exercise-by-exercise explanations that ramp from beginner to advanced, and a downloadable Jupyter notebook you can run locally.
LoRA: Low-Rank Adaptation of Large Language Models
Edward J. Hu, Yelong Shen, Phillip Wallis, +5 more
LoRA. Inject low-rank matrices into frozen pretrained weights for cheap, effective fine-tuning — the backbone of most open-source LLM adaptation today.
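A taste of what the workbook builds toward: a minimal NumPy sketch of a LoRA-style linear layer, in which the pretrained weight W stays frozen and only the low-rank factors A and B would be trained. The scaling follows the paper's alpha/r convention, but the toy sizes below are purely illustrative.

    import numpy as np

    rng = np.random.default_rng(0)

    d_in, d_out, r, alpha = 64, 64, 8, 16           # toy sizes; the point is r << d
    W = rng.normal(size=(d_out, d_in))              # frozen pretrained weight
    A = rng.normal(scale=0.01, size=(r, d_in))      # trainable, initialised small
    B = np.zeros((d_out, r))                        # trainable, zero init so the update starts at 0

    def lora_forward(x):
        """y = x W^T + (alpha / r) * x A^T B^T, with W kept frozen."""
        base = x @ W.T                              # frozen pretrained path
        update = (x @ A.T) @ B.T                    # trainable rank-r path
        return base + (alpha / r) * update

    x = rng.normal(size=(4, d_in))                  # a batch of 4 activations
    print(lora_forward(x).shape)                    # (4, 64)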
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford, Jong Wook Kim, Chris Hallacy, +1 more
CLIP. Contrastive pretraining on 400M image-text pairs produces a zero-shot classifier that rivals supervised ImageNet models — a cornerstone of multimodal learning.
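The training signal itself is compact: a symmetric cross-entropy over the similarity matrix of matched image and text embeddings. In this NumPy sketch, random vectors stand in for the paper's image and text encoders, and the temperature value is illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    N, d, temp = 8, 32, 0.07                        # batch size, embed dim, temperature (illustrative)

    img = rng.normal(size=(N, d))                   # stand-in image embeddings
    txt = rng.normal(size=(N, d))                   # stand-in text embeddings
    img /= np.linalg.norm(img, axis=1, keepdims=True)
    txt /= np.linalg.norm(txt, axis=1, keepdims=True)

    logits = img @ txt.T / temp                     # similarity of every image with every caption

    def cross_entropy(logits, targets):
        logits = logits - logits.max(axis=1, keepdims=True)
        log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(targets)), targets].mean()

    labels = np.arange(N)                           # the matching pair sits on the diagonal
    loss = 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))
    print(loss)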
Diffusion Models Beat GANs on Image Synthesis
Prafulla Dhariwal, Alex Nichol
Diffusion models surpass GANs on high-fidelity image synthesis — the bridge to Stable Diffusion, DALL·E and the modern image generation era.
Language Models are Few-Shot Learners
Tom B. Brown, Benjamin Mann, Nick Ryder, +3 more
GPT-3. Shows that scale alone unlocks in-context learning — a 175B parameter LM can tackle new tasks from a handful of examples in the prompt.
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, +3 more
Vision Transformer (ViT). Applies a pure Transformer to image patches and matches CNNs on ImageNet at scale — the paper that unified vision and language architectures.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin, Ming-Wei Chang, Kenton Lee, +1 more
Bidirectional masked-language modelling that reshaped NLP benchmarks and set the pretraining-then-finetuning pattern for years to come.
Attention Is All You Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, +5 more
The foundational Transformer paper. Introduces multi-head self-attention and dispenses with recurrence and convolutions — the blueprint behind every modern large language model.
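The whole architecture rests on one small operation. Here is a from-scratch NumPy sketch of scaled dot-product attention, the building block behind every head; the sequence length and dimensions are toy values.

    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def scaled_dot_product_attention(Q, K, V):
        """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
        d_k = Q.shape[-1]
        scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)
        return softmax(scores) @ V

    rng = np.random.default_rng(0)
    seq, d_k, d_v = 5, 16, 16                       # toy sequence length and head dims
    Q = rng.normal(size=(seq, d_k))
    K = rng.normal(size=(seq, d_k))
    V = rng.normal(size=(seq, d_v))
    print(scaled_dot_product_attention(Q, K, V).shape)   # (5, 16)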
Deep Residual Learning for Image Recognition
Kaiming He, Xiangyu Zhang, Shaoqing Ren, +1 more
ResNet. Residual connections made it possible to train networks with hundreds of layers and became standard plumbing for nearly every deep architecture since.
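The core idea fits in a few lines: compute a residual F(x) and add it back to the input, so the identity mapping is trivial to represent and gradients flow through the skip path. The sketch below uses a fully-connected block with toy weights purely for illustration; the paper's blocks are convolutional.

    import numpy as np

    rng = np.random.default_rng(0)
    d = 32
    W1 = rng.normal(scale=0.1, size=(d, d))
    W2 = rng.normal(scale=0.1, size=(d, d))

    def relu(x):
        return np.maximum(x, 0.0)

    def residual_block(x):
        """y = relu(x + F(x)): the block only has to learn the residual F."""
        return relu(x + relu(x @ W1) @ W2)

    x = rng.normal(size=(4, d))
    print(residual_block(x).shape)                  # (4, 32)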
ImageNet Classification with Deep Convolutional Neural Networks
Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton
AlexNet. The paper that kicked off the modern deep-learning era in computer vision by winning ImageNet 2012 with a convolutional neural network trained on GPUs.
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron, Thibaut Lavril, Gautier Izacard, +1 more
LLaMA. Open-weights foundation model family that matched or beat GPT-3 at a fraction of the parameters and catalysed the open-source LLM ecosystem.
Robust Speech Recognition via Large-Scale Weak Supervision
Alec Radford, Jong Wook Kim, Tao Xu, +3 more
Whisper. An encoder-decoder Transformer trained on 680k hours of multilingual weakly-supervised audio — near-human speech recognition, zero-shot.
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason Wei, Xuezhi Wang, Dale Schuurmans, +3 more
Chain-of-Thought. A few worked-example prompts dramatically improve LLM reasoning on arithmetic, commonsense and symbolic tasks — zero training required.
Masked Autoencoders Are Scalable Vision Learners
Kaiming He, Xinlei Chen, Saining Xie, +3 more
MAE. Mask 75% of image patches and reconstruct them — a BERT-style objective that yields strong, scalable vision representations.
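The pretext task is simple to state in code: shuffle the patches, feed only a small visible subset to the encoder, and reconstruct the rest. This NumPy sketch shows just the random masking step on a toy patch grid; the encoder and decoder in the paper are Vision Transformers.

    import numpy as np

    rng = np.random.default_rng(0)
    num_patches, patch_dim, mask_ratio = 16, 48, 0.75    # e.g. a 4x4 grid of flattened patches

    patches = rng.normal(size=(num_patches, patch_dim))  # flattened image patches (toy values)

    # Shuffle patch indices and keep only the visible 25% as the encoder's input
    perm = rng.permutation(num_patches)
    num_keep = int(num_patches * (1 - mask_ratio))
    visible_idx, masked_idx = perm[:num_keep], perm[num_keep:]

    encoder_input = patches[visible_idx]                 # only 4 of 16 patches are ever encoded
    reconstruction_target = patches[masked_idx]          # the decoder must predict these pixels
    print(encoder_input.shape, reconstruction_target.shape)   # (4, 48) (12, 48)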
A Simple Framework for Contrastive Learning of Visual Representations
Ting Chen, Simon Kornblith, Mohammad Norouzi, +1 more
SimCLR. A clean, effective contrastive framework that learns visual representations without labels — closing much of the gap with supervised pretraining.
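At the centre sits the NT-Xent loss: two augmented views of the same image should embed close together, and every other example in the batch acts as a negative. In this NumPy sketch, random vectors stand in for the encoder plus projection head, and the temperature is illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    N, d, temp = 4, 16, 0.5                         # batch size, projection dim, temperature

    z1 = rng.normal(size=(N, d))                    # embeddings of augmented view 1 (stand-ins)
    z2 = rng.normal(size=(N, d))                    # embeddings of view 2 of the same images
    z = np.concatenate([z1, z2])
    z /= np.linalg.norm(z, axis=1, keepdims=True)

    sim = z @ z.T / temp                            # (2N, 2N) cosine similarities
    np.fill_diagonal(sim, -np.inf)                  # never contrast an example with itself

    pos = np.concatenate([np.arange(N, 2 * N), np.arange(N)])   # index of each row's positive
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    loss = -log_prob[np.arange(2 * N), pos].mean()
    print(loss)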
Denoising Diffusion Probabilistic Models
Jonathan Ho, Ajay Jain, Pieter Abbeel
DDPM. The paper that made diffusion models practical — a simple denoising objective that scales to photorealistic generation.
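The objective really is simple: corrupt an image at a random timestep with the closed-form forward process x_t = sqrt(abar_t) x_0 + sqrt(1 - abar_t) eps, then regress a network onto the noise eps. This NumPy sketch builds one training target with the paper's linear beta schedule; the zero-filled prediction is just a placeholder for the denoising network.

    import numpy as np

    rng = np.random.default_rng(0)
    T = 1000
    betas = np.linspace(1e-4, 0.02, T)              # linear noise schedule
    alphas_bar = np.cumprod(1.0 - betas)            # cumulative product \bar{alpha}_t

    x0 = rng.normal(size=(8, 8))                    # a toy "image"
    t = rng.integers(T)                             # random timestep
    eps = rng.normal(size=x0.shape)                 # the noise the network must predict

    # Forward process in closed form: x_t = sqrt(abar_t) x0 + sqrt(1 - abar_t) eps
    x_t = np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps

    eps_pred = np.zeros_like(eps)                   # placeholder for eps_theta(x_t, t)
    loss = np.mean((eps - eps_pred) ** 2)           # the simple denoising MSE objective
    print(x_t.shape, loss)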
Proximal Policy Optimization Algorithms
John Schulman, Filip Wolski, Prafulla Dhariwal, +2 more
PPO. A clipped-surrogate policy-gradient method that balances stability and simplicity — the default RL algorithm behind RLHF and most modern agents.
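The clipped surrogate is the whole trick: weight the advantage by the new-to-old probability ratio, but clip that ratio so no single update moves the policy too far. A NumPy sketch with made-up ratios and advantages, using the paper's epsilon of 0.2.

    import numpy as np

    def ppo_clip_objective(ratio, advantage, eps=0.2):
        """L^CLIP = E[min(r_t * A_t, clip(r_t, 1-eps, 1+eps) * A_t)]."""
        unclipped = ratio * advantage
        clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
        return np.minimum(unclipped, clipped).mean()

    rng = np.random.default_rng(0)
    ratio = np.exp(rng.normal(scale=0.3, size=64))      # pi_new(a|s) / pi_old(a|s), toy batch
    advantage = rng.normal(size=64)                     # advantage estimates (e.g. from GAE)
    print(ppo_clip_objective(ratio, advantage))         # maximise this (or minimise its negative)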
Semi-Supervised Classification with Graph Convolutional Networks
Thomas N. Kipf, Max Welling
GCN. A first-order approximation of spectral graph convolutions that made graph neural networks simple, fast, and widely applicable.
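A single layer reduces to one matrix expression: H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W). The NumPy sketch below runs that layer over a toy four-node graph with illustrative feature sizes.

    import numpy as np

    rng = np.random.default_rng(0)

    A = np.array([[0, 1, 0, 1],                     # adjacency of a tiny 4-node ring graph
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [1, 0, 1, 0]], dtype=float)
    X = rng.normal(size=(4, 8))                     # node features
    W = rng.normal(scale=0.1, size=(8, 16))         # layer weights

    A_hat = A + np.eye(4)                           # add self-loops
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt        # symmetric normalisation

    H = np.maximum(A_norm @ X @ W, 0.0)             # one GCN layer with ReLU
    print(H.shape)                                  # (4, 16)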
XGBoost: A Scalable Tree Boosting System
Tianqi Chen, Carlos Guestrin
XGBoost. A heavily engineered gradient-boosted trees system with a regularised learning objective that dominated Kaggle competitions and production ML for years.
You Only Look Once: Unified, Real-Time Object Detection
Joseph Redmon, Santosh Divvala, Ross Girshick, +1 more
YOLO. Single-shot object detection that frames detection as a regression problem — fast, end-to-end, and the starting point for most real-time detectors since.
Mastering the Game of Go with Deep Neural Networks and Tree Search
David Silver, Aja Huang, Chris J. Maddison, +1 more
AlphaGo. Combines deep policy/value networks with Monte Carlo tree search to beat the world champion — a landmark demonstration of RL at scale.
WaveNet: A Generative Model for Raw Audio
Aaron van den Oord, Sander Dieleman, Heiga Zen, +1 more
WaveNet. An autoregressive convolutional model over raw audio samples that lifted text-to-speech naturalness to near-human levels.
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Sergey Ioffe, Christian Szegedy
Batch Normalization. Normalising activations per-batch stabilises and speeds up training — a now-standard building block in deep networks.
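The training-time forward pass is only a few lines: normalise each feature with the batch mean and variance, then rescale with learned gamma and beta. A NumPy sketch (inference would use running statistics instead):

    import numpy as np

    def batch_norm_train(x, gamma, beta, eps=1e-5):
        """Normalise each feature over the batch, then scale and shift."""
        mean = x.mean(axis=0)
        var = x.var(axis=0)
        x_hat = (x - mean) / np.sqrt(var + eps)
        return gamma * x_hat + beta

    rng = np.random.default_rng(0)
    x = rng.normal(loc=3.0, scale=2.0, size=(32, 8))    # batch of 32 activations, 8 features
    gamma, beta = np.ones(8), np.zeros(8)
    y = batch_norm_train(x, gamma, beta)
    print(y.mean(axis=0).round(3), y.std(axis=0).round(3))  # ~0 mean, ~1 std per feature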
U-Net: Convolutional Networks for Biomedical Image Segmentation
Olaf Ronneberger, Philipp Fischer, Thomas Brox
U-Net. Encoder-decoder with skip connections that became the default architecture for medical imaging and any dense-prediction task on small datasets.
Adam: A Method for Stochastic Optimization
Diederik P. Kingma, Jimmy Ba
Adam. Adaptive first-order optimizer that became the default for almost every deep learning codebase — a must-implement from scratch.
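The update rule itself is short enough to memorise: keep exponential moving averages of the gradient and its square, correct their bias, and take a scaled step. The NumPy sketch below minimises a toy quadratic with the paper's default betas; the learning rate is chosen for illustration.

    import numpy as np

    def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        m = beta1 * m + (1 - beta1) * grad            # first-moment EMA
        v = beta2 * v + (1 - beta2) * grad ** 2       # second-moment EMA
        m_hat = m / (1 - beta1 ** t)                  # bias correction
        v_hat = v / (1 - beta2 ** t)
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
        return theta, m, v

    theta = np.array([5.0, -3.0])                     # minimise f(theta) = ||theta||^2
    m, v = np.zeros_like(theta), np.zeros_like(theta)
    for t in range(1, 5001):
        grad = 2 * theta                              # gradient of the toy objective
        theta, m, v = adam_step(theta, grad, m, v, t, lr=1e-2)
    print(theta.round(4))                             # close to [0, 0]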
Dropout: A Simple Way to Prevent Neural Networks from Overfitting
Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, +2 more
Dropout. Randomly zeroing units during training as an implicit ensemble — one of the simplest and most effective regularizers in deep learning.
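The modern "inverted" formulation takes two lines: zero each unit with probability p during training and scale the survivors by 1/(1-p), so inference needs no change. A NumPy sketch:

    import numpy as np

    rng = np.random.default_rng(0)

    def dropout(x, p=0.5, training=True):
        """Inverted dropout: drop with probability p, rescale survivors by 1/(1-p)."""
        if not training or p == 0.0:
            return x
        mask = rng.random(x.shape) >= p               # keep each unit with probability 1-p
        return x * mask / (1.0 - p)                   # rescale so the expected activation is unchanged

    x = np.ones((2, 10))
    print(dropout(x, p=0.5))            # roughly half the entries zeroed, the rest scaled to 2.0
    print(dropout(x, training=False))   # identity at inference time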
Generative Adversarial Networks
Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, +5 more
The original GAN paper. A generator and a discriminator locked in a minimax game — an elegant framing that opened an entire subfield of generative modelling.
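The game fits in a couple of expressions: the discriminator maximises log D(x) + log(1 - D(G(z))), while the generator, in the paper's practical non-saturating variant, maximises log D(G(z)). In this NumPy sketch, random scores stand in for real generator and discriminator networks.

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Stand-ins for the discriminator's outputs; a real model would compute these from data
    d_real = sigmoid(rng.normal(loc=1.0, size=16))    # D(x) on real samples
    d_fake = sigmoid(rng.normal(loc=-1.0, size=16))   # D(G(z)) on generated samples

    disc_loss = -(np.log(d_real) + np.log(1.0 - d_fake)).mean()   # discriminator's side of the game
    gen_loss = -np.log(d_fake).mean()                 # non-saturating generator loss from the paper
    print(disc_loss, gen_loss)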
Auto-Encoding Variational Bayes
Diederik P. Kingma, Max Welling
VAE. The variational autoencoder — a principled probabilistic generative model with a learned latent space and the reparameterisation trick.
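The reparameterisation trick in one line: instead of sampling z directly from q(z|x), draw eps from a standard normal and set z = mu + sigma * eps, which keeps the sampling step differentiable. The NumPy sketch below also computes the closed-form KL term against a standard normal prior; the encoder outputs are random stand-ins.

    import numpy as np

    rng = np.random.default_rng(0)
    latent_dim = 8

    # Pretend these came out of the encoder network for one input
    mu = rng.normal(size=latent_dim)
    log_var = rng.normal(scale=0.1, size=latent_dim)

    # Reparameterisation: z = mu + sigma * eps keeps the sampling step differentiable
    eps = rng.normal(size=latent_dim)
    z = mu + np.exp(0.5 * log_var) * eps

    # KL(q(z|x) || N(0, I)) in closed form
    kl = -0.5 * np.sum(1 + log_var - mu ** 2 - np.exp(log_var))
    print(z.shape, kl)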
Playing Atari with Deep Reinforcement Learning
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, +2 more
DQN. Deep Q-Networks learn to play Atari games from raw pixels, kickstarting the deep reinforcement learning era.
Efficient Estimation of Word Representations in Vector Space
Tomas Mikolov, Kai Chen, Greg Corrado, +1 more
Word2Vec. Skip-gram and CBOW turned words into dense vectors whose geometry encodes meaning — the bridge between symbolic text and modern deep learning.
Long Short-Term Memory
Sepp Hochreiter, Jürgen Schmidhuber
LSTM. The gated recurrent cell that tamed the vanishing-gradient problem and powered sequence modelling for two decades.
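One timestep of the cell shows where the gradient-friendliness comes from: an additive cell state regulated by input, forget and output gates (the forget gate is a slightly later refinement, but standard in every modern implementation). A NumPy sketch of a single step, with biases omitted and toy sizes:

    import numpy as np

    rng = np.random.default_rng(0)
    d_in, d_hid = 8, 16

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # One weight matrix per gate, each acting on the concatenation [h_{t-1}, x_t]
    Wf, Wi, Wo, Wc = (rng.normal(scale=0.1, size=(d_hid, d_hid + d_in)) for _ in range(4))

    def lstm_step(x_t, h_prev, c_prev):
        z = np.concatenate([h_prev, x_t])
        f = sigmoid(Wf @ z)                       # forget gate: what to erase from the cell state
        i = sigmoid(Wi @ z)                       # input gate: what to write
        o = sigmoid(Wo @ z)                       # output gate: what to expose
        c = f * c_prev + i * np.tanh(Wc @ z)      # additive cell-state update keeps gradients healthy
        h = o * np.tanh(c)
        return h, c

    h, c = np.zeros(d_hid), np.zeros(d_hid)
    for _ in range(5):                            # unroll over a toy sequence of 5 steps
        h, c = lstm_step(rng.normal(size=d_in), h, c)
    print(h.shape, c.shape)                       # (16,) (16,)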
Why implement classic papers?
Reading a paper and implementing it are two very different skills. PaperNova's workbook tool bridges that gap: Gemini turns the paper into a sequence of small, self-contained exercises — from a warm-up reimplementation of the core idea up to advanced extensions — then assembles them into a Jupyter notebook you can run, edit and extend.
Prefer to work from your own paper? Upload a PDF and get the same guided workbook tailored to it.