1. Barret Zoph: Automating AI and RLHF
Zoph’s research helped move the industry from hand-designed models to AI-designed architectures, and he later led the post-training ("alignment") phase for ChatGPT.
Neural Architecture Search with Reinforcement Learning (2016) – A foundational paper for neural architecture search (NAS) and modern AutoML.
Learning Transferable Architectures for Scalable Image Recognition (2017) – Introduced NASNet.
Searching for Activation Functions (2017) – Discovered the Swish activation function (a minimal sketch appears after this list).
Efficient Neural Architecture Search via Parameter Sharing (ENAS) (2018) – Made NAS computationally feasible by sharing weights across candidate architectures.
AutoAugment: Learning Augmentation Policies from Data (2018) – Automated the search for data augmentation policies.
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition (2019) – Set new state-of-the-art results in speech recognition by augmenting spectrograms directly.
Switch Transformers: Scaling to Trillion Parameter Models (2021) – A landmark paper on Mixture of Experts (MoE).
Scaling Instruction-Finetuned Language Models (Flan) (2022) – Key work on instruction tuning.
GPT-4 Technical Report (2023) – Co-author; led the post-training/alignment work described in the report.
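For concreteness, the Swish function from "Searching for Activation Functions" is simply f(x) = x · sigmoid(βx). Below is a minimal sketch in JAX; the β = 1 default (the variant often called SiLU) and the sample inputs are illustrative choices, not taken from the paper.

```python
import jax.numpy as jnp
from jax import nn

def swish(x, beta=1.0):
    # Swish: f(x) = x * sigmoid(beta * x).
    # beta = 1 gives the variant often called SiLU; the paper also
    # considers making beta a trainable parameter.
    return x * nn.sigmoid(beta * x)

x = jnp.linspace(-5.0, 5.0, 11)
print(swish(x))
```

Unlike ReLU, Swish is smooth and slightly non-monotonic near zero, properties the authors highlight when explaining its gains on image classification and translation benchmarks.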
2. Luke Metz: Learned Optimization and Generative Models
Metz’s work operates at the "meta" level: training models that learn how to learn, alongside foundational work on generative models and benchmarking complex reasoning in LLMs.
Unsupervised Representation Learning with Deep Convolutional GANs (DCGAN) (2015) – Introduced the DCGAN architecture; one of the most famous papers in generative AI history.
Unrolled Generative Adversarial Networks (2016) – Mitigated mode collapse in GANs by unrolling the discriminator’s optimization inside the generator’s objective.
Meta-Learning Update Rules for Unsupervised Representation Learning (2018) – Early work on "learning to learn."
Understanding and Correcting Pathologies in the Training of Learned Optimizers (2019) – Diagnosed why learned optimizers become unstable during meta-training and proposed fixes.
VeLO: Training Versatile Learned Optimizers by Scaling Up (2022) – Introduced a general-purpose learned optimizer that, without hyperparameter tuning, matches or beats hand-designed optimizers on many tasks (a toy unrolled-training sketch appears after this list).
Gradients are Not All You Need (2021) – Showed how backpropagating through long unrolled or chaotic systems can produce exploding, uninformative gradients.
Beyond the Imitation Game: Quantifying and Extrapolating the Capabilities of Language Models (BIG-bench) (2023) – A massive collaborative benchmark of more than 200 tasks for testing LLMs.
ChatGPT System Card / GPT-4o (2024) – Technical work on model safety and system behavior.
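The common thread running through the learned-optimizer papers above is differentiating through an unrolled inner training loop. The sketch below meta-learns a single learning rate on a toy quadratic in JAX; the quadratic task, the 5-step unroll, and the scalar "optimizer" are illustrative assumptions, far simpler than the neural-network optimizers in the papers.

```python
import jax
import jax.numpy as jnp

def inner_loss(w):
    # Toy inner task: pull every coordinate of w toward 3.
    return jnp.sum((w - 3.0) ** 2)

def unrolled_loss(log_lr, w0, steps=5):
    # Run `steps` inner SGD updates with learning rate exp(log_lr) and
    # report the final inner loss. The whole unroll is differentiable,
    # so we can take gradients with respect to the learning rate itself.
    lr = jnp.exp(log_lr)
    w = w0
    for _ in range(steps):
        w = w - lr * jax.grad(inner_loss)(w)
    return inner_loss(w)

w0 = jnp.zeros(2)
log_lr = jnp.log(0.1)
print("loss after 5 inner steps, initial lr:", float(unrolled_loss(log_lr, w0)))

# Outer (meta) loop: plain gradient descent on the log learning rate.
for _ in range(100):
    log_lr = log_lr - 0.1 * jax.grad(unrolled_loss)(log_lr, w0)

print("meta-learned lr:", float(jnp.exp(log_lr)))
print("loss after 5 inner steps, meta-learned lr:", float(unrolled_loss(log_lr, w0)))
```

Learned optimizers like VeLO replace the single scalar here with a small neural network that maps gradient statistics to parameter updates; much of the work above is about keeping that outer loop stable, unbiased, and general.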
3. Samuel Schoenholz: The Physics of Deep Learning
Schoenholz provides the mathematical "blueprints" that allow us to build massive models without them breaking.
Deep Information Propagation (2016) – Characterized how signals propagate through deep networks at initialization and when such networks remain trainable.
Deep Neural Networks as Gaussian Processes (2017) – Showed that infinitely wide networks are equivalent to Gaussian processes, connecting deep learning to classical Bayesian statistics.
Resurrecting the Sigmoid in Deep Learning (2017) – Used "Dynamical Isometry" to train massive 10,000-layer networks.
Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent (2019) – Showed that sufficiently wide networks train like linear models governed by the Neural Tangent Kernel (NTK); a linearization sketch appears after this list.
JAX MD: A Framework for Differentiable Physics (2020) – Built a library for differentiable, hardware-accelerated molecular dynamics and physics simulations.
Neural Tangents: Fast and Easy Infinite Neural Networks in Python (2019) – A library for computing the infinite-width (NNGP and NTK) limits of neural networks.
Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer (µP) (2022) – Developed µTransfer, which tunes hyperparameters on small models and transfers them to much larger ones, a method reportedly used to scale GPT-4 efficiently.
Scaling Deep Learning for Materials Discovery (GNoME) (2023) – Published in Nature, detailing the discovery of 2.2 million new crystals using AI.
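To make the "wide networks evolve as linear models" claim concrete, the sketch below builds the tangent (linearized) model f_lin(θ₀ + δ) = f(θ₀) + J(θ₀)δ for a small MLP with a Jacobian-vector product and compares it to the true network after a small parameter displacement. The two-layer tanh network, random data, and 0.01 displacement are illustrative assumptions; the paper's result is that for sufficiently wide networks this linear model describes the entire gradient-descent training trajectory, not just small perturbations.

```python
import jax
import jax.numpy as jnp

def mlp(params, x):
    # A small two-layer tanh MLP; the theory concerns much wider networks.
    (w1, b1), (w2, b2) = params
    h = jnp.tanh(x @ w1 + b1)
    return h @ w2 + b2

key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
params0 = [(0.5 * jax.random.normal(k1, (4, 64)), jnp.zeros(64)),
           (0.125 * jax.random.normal(k2, (64, 1)), jnp.zeros(1))]
x = jax.random.normal(k3, (8, 4))

# A small parameter displacement, standing in for a few steps of training.
delta = jax.tree_util.tree_map(lambda p: 0.01 * jnp.ones_like(p), params0)
params1 = jax.tree_util.tree_map(lambda p, d: p + d, params0, delta)

# Tangent model via a Jacobian-vector product:
# f_lin(theta0 + delta) = f(theta0) + J(theta0) @ delta
f0, jf_delta = jax.jvp(lambda p: mlp(p, x), (params0,), (delta,))
f_lin = f0 + jf_delta
f_true = mlp(params1, x)

print("max |f_true - f_lin|:", float(jnp.max(jnp.abs(f_true - f_lin))))
```

The Neural Tangents library listed above automates constructions like this and also evaluates the corresponding infinite-width NNGP and NTK kernels in closed form.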
