In the past few years, deep convolutional neural networks (CNNs) trained on large image data sets have shown impressive visual object recognition performance. Consequently, these models have attracted the attention of the cognitive science community. Recent studies comparing CNNs with neural data from cortical area IT suggest that CNNs may, in addition to providing good engineering solutions, provide good models of biological visual systems. Here, we report evidence that CNNs are, in fact, not good models of human visual perception. We show that a 3D shape inference model explains human performance on an object shape similarity task better than CNNs. We argue that deep neural networks trained on large amounts of image data to maximize object recognition performance do not provide adequate models of human vision.