Deprecated. It's been a while since I graduated from Stanford. My main webpage has moved to
karpathy.ai
Bio. I am the Sr. Director of AI at Tesla, where I lead the team responsible for all neural networks on the Autopilot. Previously, I was a Research Scientist at
OpenAI working on Deep Learning in Computer Vision, Generative Modeling and Reinforcement Learning. I received my PhD from Stanford, where I worked with
Fei-Fei Li on Convolutional/Recurrent Neural Network architectures and their applications in Computer Vision, Natural Language Processing and their intersection. Over the course of my PhD I squeezed in two internships at Google where I worked on large-scale feature learning over YouTube videos, and in 2015 I interned at DeepMind on the Deep Reinforcement Learning team. Together with Fei-Fei, I designed and was the primary instructor for a new Stanford class on
Convolutional Neural Networks for Visual Recognition (CS231n). The class was the first Deep Learning course offering at Stanford and has grown from 150 enrolled in 2015 to 330 students in 2016, and
750 students in 2017.
On the side, for fun, I
blog,
blog more, and
tweet. I developed a number of Deep Learning libraries in JavaScript (e.g.
ConvNetJS,
RecurrentJS,
REINFORCEjs,
t-sneJS) because I love the web. I am sometimes jokingly referred to as
the reference human for ImageNet (
post :)). Whenever I can spare the time I maintain
arxiv-sanity.com, which lets you search and sort through almost 100,000 arXiv papers on Machine Learning from the last 6 years.
Timeline.
2017-now:
Sr. Director of AI at Tesla (article)
Neural Networks for the Autopilot
2016-2017:
Research Scientist at OpenAI
Deep Learning, Generative Models, Reinforcement Learning
Summer 2015:
DeepMind Internship
Deep Reinforcement Learning group
Summer 2013:
Google Research Internship
Large-Scale Supervised Deep Learning for Videos
2011-2015:
Stanford Computer Science Ph.D. student
Deep Learning, Computer Vision, Natural Language Processing. Adviser: Fei-Fei Li.
Summer 2011:
Google Research Internship
Large-Scale Unsupervised Deep Learning for Videos
2009-2011:
University of British Columbia: MSc
Learning Controllers for Physically-simulated Figures. Adviser: Michiel van de Panne
2005-2009:
University of Toronto: BSc
Double major in Computer Science and Physics
Talks
AI for Full-Self Driving @ ScaledML 2020
Tesla Autonomy Day, 2019
Multi-Task Learning in the Wilderness @ ICML 2019
Building the Software 2.0 stack @ Spark-AI 2018
2017 "Heroes of Deep Learning" with
Andrew Ng
2016 Bay Area Deep Learning School: Convolutional Neural Networks
CVPR 2016 Deep Learning Workshop
RE•WORK Deep Learning Summit 2016
CVPR 2015 Oral
Publications
[PDF] World of Bits: An Open-Domain Platform for Web-Based Agents,
Tianlin (Tim) Shi, Andrej Karpathy, Linxi (Jim) Fan, Jonathan Hernandez, Percy Liang
ICML 2017
[PDF] PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications,
Tim Salimans, Andrej Karpathy, Xi Chen, Diederik P. Kingma, and Yaroslav Bulatov
ICLR 2017
[PDF] Connecting Images and Natural Language,
Andrej Karpathy,
PhD Thesis, 2016
DenseCap: Fully Convolutional Localization Networks for Dense Captioning
Efficiently identify and caption all the things in an image with a single forward pass of a network. Our model is fully differentiable and trained end-to-end without any pipelines. The model is also very efficient (processes a 720x600 image in only 240ms), and evaluation on a large-scale dataset of 94,000 images and 4,100,000 region captions shows that it outperforms baselines based on previous approaches.
Justin Johnson*, Andrej Karpathy*, Li Fei-Fei
CVPR 2016 (Oral)
Visualizing and Understanding Recurrent Networks
We study, both qualitatively and quantitatively, the performance improvements of Recurrent Networks on Language Modeling tasks compared to finite-horizon models. Our analysis sheds light on the source of the improvements and identifies areas for further potential gains. Among other fun results, we find LSTM cells that keep track of long-range dependencies such as line lengths, quotes and brackets.
Andrej Karpathy*, Justin Johnson*, Li Fei-Fei
ICLR 2016 Workshop
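The cell-tracking analysis above boils down to recording one coordinate of the LSTM cell state as characters stream through the network. Below is a minimal numpy sketch of that trace, not the paper's code: the weights here are random, so the trace is only a mechanical illustration — a trained character model is needed to find the interpretable quote- and length-tracking cells the paper reports.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell_trace(chars, vocab, H=8, unit=0):
    """Run a single-layer character LSTM and record cell unit c_t[unit]
    at every step -- the kind of trace used to spot cells that track
    quotes or line length. Random weights: illustration only."""
    V = len(vocab)
    Wx = rng.standard_normal((4 * H, V)) * 0.1   # input-to-gates weights
    Wh = rng.standard_normal((4 * H, H)) * 0.1   # hidden-to-gates weights
    b = np.zeros(4 * H)
    h, c = np.zeros(H), np.zeros(H)
    trace = []
    for ch in chars:
        x = np.zeros(V)
        x[vocab.index(ch)] = 1.0                     # one-hot input
        gates = Wx @ x + Wh @ h + b
        i, f, o = (sigmoid(gates[k * H:(k + 1) * H]) for k in range(3))
        g = np.tanh(gates[3 * H:])
        c = f * c + i * g                            # cell state update
        h = o * np.tanh(c)
        trace.append(c[unit])                        # record one cell unit
    return trace

vocab = list('abc"')
trace = lstm_cell_trace('ab"ca"b', vocab)
print(len(trace))  # one cell-state value per input character
```

Plotting such a trace over text from a trained model is how the quote-detection cells in the paper were found.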
Deep Visual-Semantic Alignments for Generating Image Descriptions
We present a model that generates natural language descriptions of full images and their regions. For generating sentences about a given image region we describe a Multimodal Recurrent Neural Network architecture. For inferring the latent alignments between segments of sentences and regions of images we describe a model based on a novel combination of Convolutional Neural Networks over image regions, bidirectional Recurrent Neural Networks over sentences, and a structured objective that aligns the two modalities through a multimodal embedding. This work was also featured in a recent
New York Times article.
Andrej Karpathy, Li Fei-Fei
CVPR 2015 (Oral)
ImageNet Large Scale Visual Recognition Challenge
Everything you wanted to know about ILSVRC: data collection, results, trends, current computer vision accuracy, even a stab at computer vision vs. human vision accuracy -- all here! My own contribution to this work was the human accuracy evaluation experiments.
Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, Li Fei-Fei
IJCV 2015
Deep Fragment Embeddings for Bidirectional Image-Sentence Mapping
We train a multi-modal embedding to associate fragments of images (objects) and sentences (noun and verb phrases) with a structured, max-margin objective. Our model enables efficient and interpretable retrieval of images from sentence descriptions (and vice versa).
Andrej Karpathy, Armand Joulin, Li Fei-Fei
NIPS 2014
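The structured max-margin objective mentioned above can be sketched in a few lines. This is a simplified numpy illustration of a bidirectional ranking loss over an image-sentence score matrix, under the assumption that ground-truth pairs sit on the diagonal; it is not the paper's fragment-level implementation.

```python
import numpy as np

def bidirectional_ranking_loss(S, margin=1.0):
    """Max-margin ranking loss over an image-sentence score matrix.

    S[i, j] is the compatibility score between image i and sentence j;
    the diagonal holds the true pairs. Each true pair must outscore
    every mismatched pair by `margin`, in both retrieval directions
    (sentences given an image, and images given a sentence).
    """
    n = S.shape[0]
    diag = np.diag(S)                                     # true-pair scores
    cost_s = np.maximum(0.0, margin + S - diag[:, None])  # rank sentences per image
    cost_i = np.maximum(0.0, margin + S - diag[None, :])  # rank images per sentence
    mask = 1.0 - np.eye(n)                                # don't penalize true pairs
    return float(((cost_s + cost_i) * mask).sum() / n)

# A well-aligned score matrix (large diagonal) incurs zero loss:
S_good = np.eye(3) * 10.0
print(bidirectional_ranking_loss(S_good))  # 0.0
```

Minimizing this loss pushes matched image-sentence pairs together in the embedding while pushing mismatched pairs apart, which is what makes retrieval in both directions possible.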
Large-Scale Video Classification with Convolutional Neural Networks
We introduce Sports-1M: a dataset of 1.1 million YouTube videos with 487 classes of Sport. This dataset allowed us to train large Convolutional Neural Networks that learn spatio-temporal features from video rather than single, static images.
Andrej Karpathy, George Toderici, Sanketh Shetty, Thomas Leung, Rahul Sukthankar, Li Fei-Fei
CVPR 2014 (Oral)
Grounded Compositional Semantics for Finding and Describing Images with Sentences
Our model learns to associate images and sentences in a common multimodal embedding space. We use a Recursive Neural Network to compute representations for sentences and a Convolutional Neural Network for images. We then learn a model that associates images and sentences through a structured, max-margin objective.
Richard Socher, Andrej Karpathy, Quoc V. Le, Christopher D. Manning, Andrew Y. Ng
TACL 2013
Emergence of Object-Selective Features in Unsupervised Feature Learning
We introduce an unsupervised feature learning algorithm that is trained explicitly with k-means for simple cells and a form of agglomerative clustering for complex cells. When trained on a large dataset of YouTube frames, the algorithm automatically discovers semantic concepts, such as faces.
Adam Coates, Andrej Karpathy, Andrew Ng
NIPS 2012
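The "k-means for simple cells" idea above is easy to sketch: cluster normalized image patches so each centroid becomes an edge-like filter. This is a toy spherical k-means in numpy on random data, under my own simplifications (cosine-similarity assignment, unit-norm centroids), not the authors' pipeline, which also whitens patches and builds complex cells by agglomerative clustering.

```python
import numpy as np

rng = np.random.default_rng(0)

def learn_simple_cells(patches, k=16, iters=10):
    """Learn 'simple cell' filters by spherical k-means on flattened patches."""
    # normalize each patch so k-means matches by direction, not brightness
    X = patches / (np.linalg.norm(patches, axis=1, keepdims=True) + 1e-8)
    D = X[rng.choice(len(X), k, replace=False)]          # init centroids from data
    for _ in range(iters):
        sims = X @ D.T                                   # cosine similarity
        assign = sims.argmax(axis=1)                     # hard assignment
        for j in range(k):
            members = X[assign == j]
            if len(members):
                c = members.sum(axis=0)
                D[j] = c / (np.linalg.norm(c) + 1e-8)    # re-normalized centroid
    return D

# toy data: 500 random 8x8 grayscale patches, flattened
patches = rng.standard_normal((500, 64))
filters = learn_simple_cells(patches)
print(filters.shape)  # (16, 64): one learned filter per centroid
```

On real whitened image patches (rather than noise), the centroids come out as oriented edge detectors, which is the starting point for the hierarchy described in the paper.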
Locomotion Skills for Simulated Quadrupeds
We develop an integrated set of gaits and skills for a physics-based simulation of a quadruped. The controllers use a representation based on gait graphs, a dual leg frame model, a flexible spine model, and the extensive use of internal virtual forces applied via the Jacobian transpose.
Stelian Coros, Andrej Karpathy, Benjamin Jones, Lionel Reveret, Michiel van de Panne
SIGGRAPH 2011
Object Discovery in 3D scenes via Shape Analysis
Wouldn't it be great if our robots could drive around our environments and autonomously discover and learn about objects? In this work we introduce a simple object discovery method that takes as input a scene mesh and outputs a ranked set of segments of the mesh that are likely to constitute objects.
Andrej Karpathy, Stephen Miller, Li Fei-Fei
ICRA 2013
Curriculum Learning for Motor Skills
My UBC Master's thesis project on curriculum learning for motor skills. In particular, I was working with a heavily underactuated (single-joint) footed acrobot. The acrobot used a devised curriculum to learn a large variety of parameterized motor skill policies, skill connectivities, and also hierarchical skills that depended on previously acquired skills -- almost all of it from scratch. The project was heavily influenced by intuitions about human development and learning (i.e. trial-and-error learning, the idea of gradually building skill competencies). The ideas in this work were good, but at the time I wasn't savvy enough to formulate them in a mathematically elaborate way. The video is a fun watch!
Andrej Karpathy, Michiel van de Panne
AI 2012