What constitutes an object? This has been a longstanding question in computer vision. Towards this goal, numerous learning-free and learning-based approaches have been developed to score objectness. However, they generally do not scale well to new domains and unseen objects. In this paper, we argue that existing methods lack a top-down supervision signal governed by human-understandable semantics. To bridge this gap, we explore recent Multi-modal Vision Transformers (MViT) that have been trained with aligned image-text pairs. Our extensive experiments across various domains and novel objects show that MViTs achieve state-of-the-art performance in localizing generic objects in images. Based on these findings, we develop an efficient and flexible MViT architecture using multi-scale feature processing and deformable self-attention that can adaptively generate proposals given a specific language query. We show the significance of MViT proposals in a diverse range of applications incl...
2019 International Conference on Frontiers of Information Technology (FIT), 2019
Traffic in smart cities is getting worse day by day, which calls for optimized control of traffic signals. This paper proposes a complete solution for real-time traffic signal control to reduce traffic congestion. To provide efficient solutions, we develop two machine-learning approaches, namely Edge based Traffic Light Control (ETLC) and Global Traffic Light Control (GTLC). The former controls congestion within a smaller congested area, while the latter provides a solution for the whole city. Both approaches use real-time traffic data, generated by a vehicular traffic simulator, to solve the congestion problem in real time. ETLC identifies the most congested area through K-Means clustering and applies fuzzy logic to remove the congestion. GTLC identifies multiple congested areas using an occupancy threshold and reduces congestion in them. Our model can run either approach individually or integrate both to run sequentially. A comparison of the two approaches shows that ETLC works better than GTLC for small areas, whereas GTLC gives good results for peak congestion spanning larger areas.
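As a rough illustration of the two congestion-detection steps named in this abstract, the sketch below flags intersections whose occupancy exceeds a threshold (GTLC-style) and uses K-Means to pick the most congested cluster of intersections (ETLC-style). It is a minimal sketch under stated assumptions, not the authors' implementation: the per-intersection occupancy ratios in [0, 1], the 2-D positions used as clustering features, the threshold of 0.7, and all function names are hypothetical, and the fuzzy-logic congestion-removal step is not modeled.

```python
from statistics import mean

# Hypothetical sketch: names, inputs, and the 0.7 threshold are assumptions, not from the paper.


def congested_intersections(occupancy, threshold=0.7):
    """GTLC-style step (assumed form): flag every intersection whose
    occupancy ratio in [0, 1] exceeds the threshold."""
    return sorted(i for i, occ in occupancy.items() if occ > threshold)


def kmeans(points, k, iters=20):
    """Plain Lloyd's K-Means on 2-D points; the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2 + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its members; keep it if the cluster is empty.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append((mean(m[0] for m in members), mean(m[1] for m in members)))
            else:
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: assignments no longer move the centroids
        centroids = new_centroids
    return labels


def most_congested_area(positions, occupancy, k=2):
    """ETLC-style step (assumed form): cluster intersections by position and
    return the cluster with the highest mean occupancy."""
    ids = sorted(positions)
    labels = kmeans([positions[i] for i in ids], k)
    clusters = {}
    for i, lab in zip(ids, labels):
        clusters.setdefault(lab, []).append(i)
    worst = max(clusters.values(), key=lambda members: mean(occupancy[m] for m in members))
    return sorted(worst)


positions = {"A": (0, 0), "B": (0, 1), "C": (10, 10), "D": (10, 11)}
occupancy = {"A": 0.2, "B": 0.3, "C": 0.9, "D": 0.8}
print(congested_intersections(occupancy))         # ['C', 'D']
print(most_congested_area(positions, occupancy))  # ['C', 'D']
```

The threshold rule can flag scattered intersections anywhere in the city, while the clustering step returns one spatially coherent group, which loosely mirrors the city-wide versus small-area split the abstract describes.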
Recent research in self-supervised learning (SSL) has shown that it can learn useful semantic representations from images for classification tasks. In this work, we study the usefulness of SSL for Fine-Grained Visual Categorization (FGVC). FGVC aims to distinguish objects of visually similar subcategories within a general category. The small inter-class but large intra-class variations within the dataset make it a challenging task. The limited availability of annotated labels for such fine-grained data motivates SSL, where additional supervision can boost learning without the cost of extra annotations. Our baseline achieves 86.36% top-1 classification accuracy on the CUB-200-2011 dataset by using random crop augmentation during training and center crop augmentation during testing. In this work, we explore the usefulness of various pretext tasks, specifically, rotation, pretext invariant representation learning (PIRL), and deconstruction and constructi...
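The baseline augmentations and the rotation pretext task mentioned in this abstract can be sketched with NumPy. This is an illustrative sketch, not the authors' pipeline: it assumes images are NumPy arrays of shape (H, W, C), uses the common four-way rotation formulation (0°, 90°, 180°, 270°, predicted as a 4-class label), and the function names and crop sizes are hypothetical. PIRL and deconstruction-and-construction are not covered.

```python
import numpy as np

# Hypothetical helpers; images are assumed to be arrays of shape (H, W, C).


def random_crop(img, size, rng):
    """Training-time augmentation: crop a size x size patch at a random location."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size]


def center_crop(img, size):
    """Test-time augmentation: crop the central size x size patch."""
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]


def rotation_pretext_batch(images):
    """Rotation pretext task: each square image yields four copies rotated by
    k * 90 degrees (counterclockwise), labeled k in {0, 1, 2, 3}."""
    rotated, labels = [], []
    for img in images:
        for k in range(4):
            rotated.append(np.rot90(img, k=k, axes=(0, 1)))
            labels.append(k)
    return np.stack(rotated), np.array(labels)


rng = np.random.default_rng(0)
batch = rng.random((2, 32, 32, 3))
train_views = np.stack([random_crop(x, 24, rng) for x in batch])
test_views = np.stack([center_crop(x, 24) for x in batch])
rot_images, rot_labels = rotation_pretext_batch(train_views)
print(rot_images.shape, rot_labels)  # (8, 24, 24, 3) [0 1 2 3 0 1 2 3]
```

Square crops are used so that all four rotations share one shape and can be stacked into a single batch; a network trained to predict `rot_labels` would supply the additional, annotation-free supervision the abstract refers to.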