Ravi Teja Mullapudi
Dynamic Model Specialization for Efficient Inference, Training and Supervision
Degree Type: Ph.D. in Computer Science
Advisor(s): Kayvon Fatahalian, Deva Ramanan
Graduated: August 2021

Abstract:
Recent supervised learning approaches focus on designing and building models that generalize to a wide range of scenarios. The key ingredients for building these general models are large-scale datasets that capture a diverse set of scenarios and the computational resources to train large models. This large-scale supervised learning approach has well-known scalability challenges: 1) accurate general models are computationally expensive to train and to run at inference time, 2) collecting and labeling large datasets requires extensive human effort, and 3) datasets must be repeatedly curated as the target distribution shifts.

In this thesis, we argue that in many cases, creating a set of highly specialized models that span the domain of interest can reduce inference, training, and supervision costs compared to creating a single monolithic model that generalizes across the entire domain. Specifically, we exploit temporal specialization to build efficient video segmentation models. We show that continuously specializing a compact model to the content of a video stream enables accurate and efficient inference. We leverage specialization to visually similar categories to build efficient image classification architectures. We show that by specializing model features to discriminate between visually similar categories, one can improve inference efficiency by computing only the subset of features necessary to classify a specific image. We exploit specialization to individual categories to reduce the human labeling effort needed to build models for rare categories. We show that models specialized for binary classification of individual rare categories reduce the human effort required to mine large unlabeled data collections for relevant examples.
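The rare-category mining idea can be illustrated with a generic loop: train a binary classifier on the few labeled examples, rank the unlabeled pool by classifier score, ask a human to label only the top-ranked items, and retrain. This is a toy sketch under stated assumptions, not the method developed in the thesis. The features are synthetic 2-D points standing in for image embeddings, the classifier is plain logistic regression fit by gradient descent, and `train_logreg`, `mine_rare_category`, and the `oracle` callback (a stand-in for the human labeler) are hypothetical names.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(xs, ys, epochs=200, lr=0.5):
    """Fit a binary logistic-regression model (weights w, bias b) by batch gradient descent."""
    dim = len(xs[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(dim):
                gw[i] += err * x[i]
            gb += err
        n = len(xs)
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def score(model, x):
    """Raw classifier score; higher means 'more likely the rare category'."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def mine_rare_category(pool, oracle, seed_pos, seed_neg, rounds=3, batch=5):
    """Hypothetical mining loop: retrain, rank unlabeled items, label the top `batch`."""
    labeled = {i: 1 for i in seed_pos}
    labeled.update({i: 0 for i in seed_neg})
    for _ in range(rounds):
        idx = list(labeled)
        model = train_logreg([pool[i] for i in idx], [labeled[i] for i in idx])
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        ranked = sorted(unlabeled, key=lambda i: score(model, pool[i]), reverse=True)
        for i in ranked[:batch]:
            labeled[i] = oracle(i)  # the "human" labels only the top-ranked items
    return labeled


# Toy usage: 10 rare positives near (3, 3) hidden among 190 negatives near (0, 0).
rng = random.Random(0)
truth = [1] * 10 + [0] * 190
pool = [[rng.gauss(3.0 if t else 0.0, 0.5), rng.gauss(3.0 if t else 0.0, 0.5)] for t in truth]
labeled = mine_rare_category(pool, lambda i: truth[i], seed_pos=[0], seed_neg=[10, 11])
found = sum(labeled.values())
print(f"labeled {len(labeled)} items, found {found} of 10 positives")
```

In this toy setting, labeling 15 classifier-ranked items over three rounds should surface most of the hidden positives, whereas 15 uniformly random picks from the remaining pool would be expected to find fewer than one.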
More broadly, we demonstrate that by dynamically specializing to a moment in time, to an input scene, or to a specific object category, it is possible to train accurate models quickly, reduce inference costs, and significantly reduce the amount of supervision required for training.

Thesis Committee:
Kayvon Fatahalian (Co-Chair)
Deva Ramanan (Co-Chair)
David G. Andersen
Ross Girshick (Facebook)
William R. Mark (Google)

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science

Keywords: Model Specialization, Computer Vision, Machine Learning, Deep Learning

CMU-CS-21-128.pdf (55.38 MB, 112 pages)