Search | arXiv e-print repository

arXiv:2410.10592 [pdf, other]

Voltage-Controlled Magnetic Tunnel Junction based ADC-less Global Shutter Processing-in-Pixel for Extreme-Edge Intelligence

Authors: Md Abdullah-Al Kaiser, Gourav Datta, Jordan Athas, Christian Duffee, Ajey P. Jacob, Pedram Khalili Amiri, Peter A. Beerel, Akhilesh R. Jaiswal

Abstract: The vast amount of data generated by camera sensors has prompted the exploration of energy-efficient processing solutions for deploying computer vision tasks on edge devices. Among the various approaches studied, processing-in-pixel integrates massively parallel analog computational capabilities at the extreme edge, i.e., within the pixel array, and exhibits enhanced energy and bandwidth efficiency by generating the output activations of the first neural network layer rather than the raw sensory data. In this article, we propose an energy- and bandwidth-efficient ADC-less processing-in-pixel architecture. This architecture implements an optimized binary activation neural network trained using a Hoyer regularizer for high accuracy on complex vision tasks. In addition, we introduce a global shutter burst memory read scheme that uses fast and disturb-free read operations, leveraging an innovative use of nanoscale voltage-controlled magnetic tunnel junctions (VC-MTJs). Moreover, we develop an algorithmic framework incorporating device and circuit constraints (characteristic device switching behavior and circuit non-linearity) based on state-of-the-art fabricated VC-MTJ characteristics and extensive circuit simulations using commercial GlobalFoundries 22nm FDX technology.
Finally, we evaluate the proposed system's performance on two complex datasets, CIFAR10 and ImageNet, showing improvements in front-end and communication energy efficiency of 8.2x and 8.5x, respectively, and a 6x reduction in bandwidth compared to traditional computer vision systems, without any significant drop in test accuracy.
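The Hoyer-regularized binary activation mentioned above can be illustrated with a minimal sketch. It assumes the formulation common in prior Hoyer-regularized binary-activation work: the regularizer is $(\lVert u \rVert_1 / \lVert u \rVert_2)^2$ and the firing threshold is the "Hoyer extremum" $\lVert u \rVert_2^2 / \lVert u \rVert_1$. The paper's exact formulation, clipping, and scaling may differ, and the function names are hypothetical.

```python
from typing import List


def hoyer_regularizer(u: List[float]) -> float:
    """Squared ratio of L1 to L2 norm (assumed form); smaller means sparser."""
    l1 = sum(abs(x) for x in u)
    l2_sq = sum(x * x for x in u)
    if l2_sq == 0.0:
        return 0.0  # convention for an all-zero input (assumption)
    return l1 * l1 / l2_sq


def hoyer_extremum(u: List[float]) -> float:
    """Threshold ||u||_2^2 / ||u||_1 used to binarize pre-activations (assumed form)."""
    l1 = sum(abs(x) for x in u)
    if l1 == 0.0:
        return 0.0
    return sum(x * x for x in u) / l1


def binary_activation(u: List[float]) -> List[int]:
    """Emit 1 where a positive pre-activation reaches the Hoyer extremum, else 0.

    Requiring x > 0 means an all-zero input produces no activations.
    """
    threshold = hoyer_extremum(u)
    return [1 if x > 0.0 and x >= threshold else 0 for x in u]


if __name__ == "__main__":
    pre = [3.0, 1.0, 0.0, 0.0]
    print(hoyer_regularizer(pre))   # 16 / 10 = 1.6
    print(binary_activation(pre))   # threshold 10 / 4 = 2.5 -> [1, 0, 0, 0]
```

Only one binary value per activation leaves the pixel in this picture, which is what lets such a design avoid a multi-bit ADC.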

Submitted 14 October, 2024; originally announced October 2024.

Comments: 25 pages, 9 figures, 1 table

arXiv:2409.17341 [pdf, other]

Energy-Efficient & Real-Time Computer Vision with Intelligent Skipping via Reconfigurable CMOS Image Sensors

Authors: Md Abdullah-Al Kaiser, Sreetama Sarkar, Peter A. Beerel, Akhilesh R. Jaiswal, Gourav Datta

Abstract: Current video-based computer vision (CV) applications typically suffer from high energy consumption due to reading and processing all pixels in a frame, regardless of their significance. Previous works have attempted to reduce this energy by skipping input patches or pixels and using feedback from the end task to guide the skipping algorithm, but the skipping is not performed during the sensor read phase. As a result, these methods cannot optimize the front-end sensor energy. Moreover, they may not be suitable for real-time applications due to the long latency of modern CV networks deployed in the back-end. To address this challenge, this paper presents a custom-designed reconfigurable CMOS image sensor (CIS) system that improves energy efficiency by selectively skipping uneventful regions or rows within a frame during the sensor's readout phase and the subsequent analog-to-digital conversion (ADC) phase. A novel masking algorithm intelligently directs the skipping process in real time, optimizing both the front-end sensor and back-end neural networks for applications including autonomous driving and augmented/virtual reality (AR/VR). Our system can also operate in a standard mode without skipping, depending on application needs. We evaluate our hardware-algorithm co-design framework on object detection based on BDD100K and ImageNetVID, and on gaze estimation based on OpenEDS, achieving up to 53% reduction in front-end sensor energy while maintaining state-of-the-art (SOTA) accuracy.
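As a rough illustration of row-level skipping during readout, the sketch below marks a row for readout only when it differs enough from the same row in a previous frame. The change metric (mean absolute difference), the threshold, and the function name are hypothetical stand-ins; the paper's masking algorithm is task-guided and may work quite differently.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[float]]


def rows_to_read(prev: Frame, curr: Frame, threshold: float) -> List[int]:
    """Return indices of rows whose mean absolute change exceeds `threshold`.

    In this toy model, rows not returned would be skipped during readout and
    ADC conversion, so only the listed rows cost front-end energy.
    """
    if len(prev) != len(curr):
        raise ValueError("frames must have the same number of rows")
    selected = []
    for r, (row_prev, row_curr) in enumerate(zip(prev, curr)):
        n = max(len(row_curr), 1)  # guard against empty rows
        diff = sum(abs(a - b) for a, b in zip(row_prev, row_curr)) / n
        if diff > threshold:
            selected.append(r)
    return selected


if __name__ == "__main__":
    prev = [[0, 0, 0], [10, 10, 10], [5, 5, 5]]
    curr = [[0, 0, 1], [40, 10, 10], [5, 5, 5]]
    read = rows_to_read(prev, curr, threshold=1.0)
    print(read)                       # [1]: only the row that changed a lot
    print(1 - len(read) / len(curr))  # fraction of rows skipped
```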

Submitted 25 September, 2024; originally announced September 2024.

Comments: Under review

arXiv:2407.13689 [pdf, other]

Shaded Route Planning Using Active Segmentation and Identification of Satellite Images

Authors: Longchao Da, Rohan Chhibba, Rushabh Jaiswal, Ariane Middel, Hua Wei

Abstract: Heatwaves pose significant health risks, particularly due to prolonged exposure to high summer temperatures. Vulnerable groups, especially pedestrians and cyclists on sun-exposed sidewalks, motivate the development of a route planning method that incorporates somatosensory temperature effects through shade ratio consideration. This paper is the first to introduce a pipeline that uses segmentation foundation models to extract shaded areas from high-resolution satellite images. These areas are then integrated into a multi-layered road map, enabling users to customize routes based on a balance between distance and shade exposure, thereby enhancing comfort and health during outdoor activities. Specifically, we construct a graph-based representation of the road map, where links indicate connectivity and are updated with shade ratio data for dynamic route planning. This system is already implemented online, with a video demonstration, and will be specifically adapted to assist travelers during the 2024 Olympic Games in Paris.
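The shade-aware routing idea can be sketched as a shortest-path search in which each link's cost blends its length with its sun exposure. The cost formula `length * (1 + alpha * (1 - shade_ratio))`, the preference parameter `alpha`, and the toy graph are assumptions for illustration, not the system's actual weighting.

```python
import heapq
from typing import Dict, List, Tuple

# graph[node] -> list of (neighbor, length_m, shade_ratio in [0, 1])
Graph = Dict[str, List[Tuple[str, float, float]]]


def link_cost(length: float, shade: float, alpha: float) -> float:
    """Penalize the sun-exposed fraction of a link; alpha = 0 ignores shade."""
    return length * (1.0 + alpha * (1.0 - shade))


def shaded_route(graph: Graph, src: str, dst: str, alpha: float) -> Tuple[float, List[str]]:
    """Dijkstra's algorithm over shade-weighted link costs."""
    best = {src: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, src)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == dst:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, length, shade in graph.get(node, []):
            new_cost = cost + link_cost(length, shade, alpha)
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                prev[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    if dst not in best:
        return float("inf"), []
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return best[dst], path[::-1]


if __name__ == "__main__":
    g = {"A": [("B", 100, 0.0), ("C", 120, 1.0)],
         "B": [("D", 100, 0.0)],
         "C": [("D", 120, 1.0)]}
    print(shaded_route(g, "A", "D", alpha=0.0))  # (200.0, ['A', 'B', 'D']): shortest
    print(shaded_route(g, "A", "D", alpha=1.0))  # (240.0, ['A', 'C', 'D']): shaded
```

Raising `alpha` trades extra distance for shade, which mirrors the user-controlled balance described in the abstract.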

Submitted 18 July, 2024; originally announced July 2024.

Comments: Paper accepted to CIKM24 demo track

MSC Class: 68T45; 68U35 ACM Class: I.2.10; I.4.8

arXiv:2405.13351 [pdf, other]

Quantum (Inspired) $D^2$-sampling with Applications

Authors: Ragesh Jaiswal, Poojan Shah

Abstract: $D^2$-sampling is a fundamental component of sampling-based clustering algorithms such as $k$-means++. Given a dataset $V \subset \mathbb{R}^d$ with $N$ points and a center set $C \subset \mathbb{R}^d$, $D^2$-sampling refers to picking a point from $V$ where the sampling probability of a point is proportional to its squared distance from the nearest center in $C$. Starting with an empty $C$ and iteratively $D^2$-sampling and updating $C$ in $k$ rounds is precisely $k$-means++ seeding, which runs in $O(Nkd)$ time and gives an $O(\log{k})$-approximation in expectation for the $k$-means problem. We give a quantum algorithm for (approximate) $D^2$-sampling in the QRAM model that results in a quantum implementation of $k$-means++ that runs in time $\tilde{O}(\zeta^2 k^2)$. Here $\zeta$ is the aspect ratio (i.e., largest to smallest interpoint distance), and $\tilde{O}$ hides polylogarithmic factors in $N, d, k$. It can be shown through a robust approximation analysis of $k$-means++ that the quantum version preserves its $O(\log{k})$ approximation guarantee. Further, we show that our quantum algorithm for $D^2$-sampling can be 'dequantized' using the sample-query access model of Tang (PhD Thesis, Ewin Tang, University of Washington, 2023). This results in a fast quantum-inspired classical implementation of $k$-means++, which we call QI-$k$-means++, with a running time $O(Nd) + \tilde{O}(\zeta^2 k^2 d)$, where the $O(Nd)$ term is for setting up the sample-query access data structure.
Experimental investigations show promising results for QI-$k$-means++ on large datasets with bounded aspect ratio. Finally, we use our quantum $D^2$-sampling with the known $D^2$-sampling-based classical approximation scheme (i.e., $(1+\varepsilon)$-approximation for any given $\varepsilon>0$) to obtain the first quantum approximation scheme for the $k$-means problem with polylogarithmic running time dependence on $N$.
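For reference, the classical $D^2$-sampling seeding that the quantum and quantum-inspired algorithms accelerate can be sketched in a few lines. This is plain $O(Nkd)$ $k$-means++ seeding, not the QRAM or sample-query-access version. The first center is drawn uniformly because $D^2$ weights are all undefined for an empty $C$, and the function name is hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(p: Sequence[float], q: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seed(points: List[Point], k: int, rng: Optional[random.Random] = None) -> List[Point]:
    """k-means++ seeding via D^2-sampling, O(N k d) time."""
    rng = rng or random.Random()
    centers = [rng.choice(points)]  # empty C: pick the first center uniformly
    # d2[i] = squared distance from points[i] to its nearest chosen center
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0.0:
            nxt = rng.choice(points)  # every point already coincides with a center
        else:
            # sampling probability proportional to squared distance
            nxt = rng.choices(points, weights=d2, k=1)[0]
        centers.append(nxt)
        d2 = [min(old, sq_dist(p, nxt)) for old, p in zip(d2, points)]
    return centers


if __name__ == "__main__":
    data = [(0.0, 0.0)] * 5 + [(10.0, 10.0)] * 5
    print(sorted(kmeanspp_seed(data, 2, random.Random(0))))
    # [(0.0, 0.0), (10.0, 10.0)]: points in the first center's cluster get weight 0
```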

Submitted 22 May, 2024; originally announced May 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2308.08167

arXiv:2405.02718 [pdf, other]

Zak-OTFS: Pulse Shaping and the Tradeoff between Time/Bandwidth Expansion and Predictability

Authors: Jinu Jayachandran, Rahul Kumar Jaiswal, Saif Khan Mohammed, Ronny Hadani, Ananthanarayanan Chockalingam, Robert Calderbank

Abstract: The Zak-OTFS input/output (I/O) relation is predictable and non-fading when the delay and Doppler periods are greater than the effective channel delay and Doppler spreads, a condition which we refer to as the crystallization condition. When the crystallization condition is satisfied, we describe how to integrate sensing and communication within a single Zak-OTFS subframe by transmitting a pilot in the center of the subframe and surrounding the pilot with a pilot region and guard band to mitigate interference between data symbols and pilot. At the receiver, we first read off the effective channel taps within the pilot region, and then use the estimated channel taps to recover the data from the symbols received outside the pilot region. We introduce a framework for filter design in the delay-Doppler (DD) domain where the symplectic Fourier transform connects aliasing in the DD domain (predictability of the I/O relation) with time/bandwidth expansion. The choice of pulse shaping filter determines the fraction of pilot energy that lies outside the pilot region and the degradation in BER performance that results from the interference to data symbols. We demonstrate that Gaussian filters in the DD domain provide significant improvements in BER performance over the sinc and root raised cosine filters considered in previous work. We also demonstrate that, by limiting DD domain aliasing, Gaussian filters extend the region where the crystallization condition is satisfied.
The Gaussian filters considered in this paper are a particular case of factorizable pulse shaping filters in the DD domain, and this family of filters may be of independent interest.
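The link between pulse shape and pilot-energy leakage can be illustrated, in a heavily simplified one-dimensional form, by measuring how much of a sampled pulse's energy falls outside a window of half-width $W$ (a stand-in for the pilot region). This toy deliberately ignores the two-dimensional delay-Doppler structure, quasi-periodicity, and the time/bandwidth expansion cost of the Gaussian; the `width` parameter, the grid settings, and the function names are assumptions.

```python
import math
from typing import Callable


def sinc(x: float) -> float:
    """Normalized sinc, sin(pi x) / (pi x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)


def gaussian(x: float, width: float = 1.0) -> float:
    """Gaussian pulse exp(-pi x^2 / width^2); `width` is a hypothetical spread parameter."""
    return math.exp(-math.pi * x * x / (width * width))


def energy_outside(pulse: Callable[[float], float], half_width: float,
                   span: float = 50.0, dx: float = 0.01) -> float:
    """Fraction of a sampled pulse's energy outside |x| <= half_width.

    Samples lie on a uniform grid over [-span, span]; energy beyond `span`
    is ignored, so the result is only an approximation.
    """
    n = int(round(span / dx))
    total = inside = 0.0
    for i in range(-n, n + 1):
        x = i * dx
        e = pulse(x) ** 2
        total += e
        if abs(x) <= half_width:
            inside += e
    return (total - inside) / total


if __name__ == "__main__":
    for name, p in (("sinc", sinc), ("gaussian", gaussian)):
        print(name, energy_outside(p, half_width=2.0))
```

The sinc's slowly decaying sidelobes leave a few percent of its energy outside the window, while the Gaussian's leakage is negligible; the price, not modeled here, is that the Gaussian is not strictly band-limited.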

Submitted 4 May, 2024; originally announced May 2024.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2404.10305 [pdf, other]

doi 10.1145/3606040.3617444

TC-OCR: TableCraft OCR for Efficient Detection & Recognition of Table Structure & Content

Authors: Avinash Anand, Raj Jaiswal, Pijush Bhuyan, Mohit Gupta, Siddhesh Bangar, Md. Modassir Imam, Rajiv Ratn Shah, Shin'ichi Satoh

Abstract: The automatic recognition of tabular data in document images presents a significant challenge due to the diverse range of table styles and complex structures. Tables offer valuable content representation, enhancing the predictive capabilities of various systems such as search engines and knowledge graphs. The two main problems, namely table detection (TD) and table structure recognition (TSR), have traditionally been approached independently. In this research, we propose an end-to-end pipeline that integrates deep learning models, including DETR, CascadeTabNet, and PP OCR v2, to achieve comprehensive image-based table recognition. This integrated approach effectively handles diverse table styles, complex structures, and image distortions, resulting in improved accuracy and efficiency compared to existing methods like Table Transformers. Our system achieves simultaneous table detection (TD), table structure recognition (TSR), and table content recognition (TCR), preserving table structures and accurately extracting tabular data from document images. The integration of multiple models addresses the intricacies of table recognition, making our approach a promising solution for image-based table understanding, data extraction, and information retrieval applications. Our proposed approach achieves an IoU of 0.96 and an OCR accuracy of 78%, an improvement of approximately 25% in OCR accuracy over the previous Table Transformer approach.
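The IoU figure quoted above is the standard intersection-over-union between predicted and ground-truth regions. A minimal sketch for axis-aligned boxes, assuming `(x1, y1, x2, y2)` corner coordinates (the paper does not state its box format):

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x1 < x2 and y1 < y2


def iou(a: Box, b: Box) -> float:
    """Intersection area divided by union area of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1 / 7, about 0.142857
```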

Submitted 19 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

Comments: 8 pages, 2 figures, Workshop of 1st MMIR Deep Multimodal Learning for Information Retrieval

arXiv:2404.09530 [pdf, other]

doi 10.1145/3595916.3626448

RanLayNet: A Dataset for Document Layout Detection used for Domain Adaptation and Generalization

Authors: Avinash Anand, Raj Jaiswal, Mohit Gupta, Siddhesh S Bangar, Pijush Bhuyan, Naman Lal, Rajeev Singh, Ritika Jha, Rajiv Ratn Shah, Shin'ichi Satoh

Abstract: Large ground-truth datasets and recent advances in deep learning techniques have been useful for layout detection. However, because of the restricted layout diversity of these datasets, training on them requires a sizable number of annotated instances, which is both expensive and time-consuming. As a result, differences between the source and target domains may significantly impact how well these models function. To solve this problem, domain adaptation approaches have been developed that use a small quantity of labeled data to adjust the model to the target domain. In this research, we introduce a synthetic document dataset called RanLayNet, enriched with automatically assigned labels denoting spatial positions, ranges, and types of layout elements. The primary aim of this endeavor is to develop a versatile dataset capable of training models that are robust and adaptable to diverse document formats. Through empirical experimentation, we demonstrate that a deep layout identification model trained on our dataset exhibits enhanced performance compared to a model trained solely on actual documents. Moreover, we conduct a comparative analysis by fine-tuning inference models using both the PubLayNet and IIIT-AR-13K datasets on the DocLayNet dataset. Our findings show that models enriched with our dataset achieve mAP95 scores of 0.398 and 0.588 in the scientific document domain for the TABLE class.

Submitted 19 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

Comments: 8 pages, 6 figures, MMAsia 2023 Proceedings of the 5th ACM International Conference on Multimedia in Asia

Journal ref: In Proceedings of the 5th ACM International Conference on Multimedia in Asia 2023. Association for Computing Machinery, NY, USA, Article 74, pp. 1-6

arXiv:2402.15121 [pdf, other]

Toward High Performance, Programmable Extreme-Edge Intelligence for Neuromorphic Vision Sensors utilizing Magnetic Domain Wall Motion-based MTJ

Authors: Md Abdullah-Al Kaiser, Gourav Datta, Peter A. Beerel, Akhilesh R. Jaiswal

Abstract: The desire to empower resource-limited edge devices with computer vision (CV) must overcome the high energy consumption of collecting and processing vast sensory data. To address this challenge, this work proposes an energy-efficient non-von-Neumann in-pixel processing solution for neuromorphic vision sensors that, for the first time, employs an emerging (X) magnetic domain wall magnetic tunnel junction (MDWMTJ) in conjunction with CMOS-based neuromorphic pixels. Our hybrid CMOS+X approach performs in-situ, massively parallel, asynchronous analog convolution, exhibiting low power consumption and high accuracy across various CV applications by leveraging the non-volatility and programmability of the MDWMTJ. Moreover, our device-circuit-algorithm co-design framework captures device constraints (low tunnel magnetoresistance, low dynamic range) and circuit constraints (non-linearity, process variation, area) based on Monte Carlo simulations and device parameters in GF22nm FD-SOI technology. Our experimental results suggest that we can achieve an average 45.3% reduction in back-end processor energy while keeping front-end energy similar to the state of the art, with high accuracies of 79.17% and 95.99% on the DVS-CIFAR10 and IBM DVS128-Gesture datasets, respectively.

Submitted 23 February, 2024; originally announced February 2024.

Comments: 11 pages, 7 figures, 2 tables

arXiv:2401.06714 [pdf, other]

FPT Approximation for Capacitated Sum of Radii

Authors: Ragesh Jaiswal, Amit Kumar, Jatin Yadav

Abstract: We consider the capacitated clustering problem in general metric spaces where the goal is to identify $k$ clusters and minimize the sum of the radii of the clusters (we call this the Capacitated-$k$-sumRadii problem). We are interested in fixed-parameter tractable (FPT) approximation algorithms where the running time is of the form $f(k) \cdot \text{poly}(n)$, where $f(k)$ can be an exponential function of $k$ and $n$ is the number of points in the input. In the uniform capacity case, Bandyapadhyay et al. recently gave a $4$-approximation algorithm for this problem. Our first result improves this to an FPT $3$-approximation and extends to a constant factor approximation for any $L_p$ norm of the cluster radii. In the general capacities version, Bandyapadhyay et al. gave an FPT $15$-approximation algorithm. We extend their framework to give an FPT $(4 + \sqrt{13})$-approximation algorithm for this problem. Our framework relies on a novel idea of identifying approximations to optimal clusters by carefully pruning points from an initial candidate set of points. This is in contrast to prior results that rely on guessing suitable points and building balls of appropriate radii around them. On the hardness front, we show that, assuming the Exponential Time Hypothesis, there is a constant $c > 1$ such that any $c$-approximation algorithm for the non-uniform capacity version of this problem requires running time $2^{\Omega\left(\frac{k}{\operatorname{polylog}(k)}\right)}$.
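To make the Capacitated-$k$-sumRadii objective concrete, the sketch below solves tiny instances exactly by brute force under a uniform capacity: it tries every choice of $k$ centers among the input points and every assignment of points to centers, discards assignments that exceed the capacity, and scores the rest by the sum of cluster radii. It is exponential in $n$ and only illustrates the objective, not the FPT algorithm; restricting centers to input points, the uniform capacity `cap`, Euclidean distance, and the function names are assumptions.

```python
import math
from itertools import combinations, product
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sum_of_radii(points: Sequence[Point], centers: Sequence[int], assign: Sequence[int]) -> float:
    """Sum over clusters of the largest distance from the center to an assigned point."""
    radii = [0.0] * len(centers)
    for p, j in zip(points, assign):
        radii[j] = max(radii[j], math.dist(points[centers[j]], p))
    return sum(radii)


def brute_force_cap_sum_radii(points: List[Point], k: int, cap: int) -> float:
    """Exact optimum for tiny inputs; returns inf if no feasible assignment exists."""
    n = len(points)
    best = math.inf
    for centers in combinations(range(n), k):
        for assign in product(range(k), repeat=n):
            # hard capacity: each cluster may receive at most `cap` points
            if any(assign.count(j) > cap for j in range(k)):
                continue
            best = min(best, sum_of_radii(points, centers, assign))
    return best


if __name__ == "__main__":
    pts = [(0.0,), (0.1,), (0.2,), (0.3,), (10.0,)]
    print(brute_force_cap_sum_radii(pts, k=2, cap=5))  # about 0.2: no capacity pressure
    print(brute_force_cap_sum_radii(pts, k=2, cap=3))  # about 9.8: a nearby point must join 10.0
```

The second call shows why capacities make the problem harder: the cheapest feasible clustering is forced to stretch one ball across the gap.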

Submitted 12 January, 2024; originally announced January 2024.

arXiv:2310.16844 [pdf, other]

Hardware-Algorithm Co-design Enabling Processing-in-Pixel-in-Memory (P2M) for Neuromorphic Vision Sensors

Authors: Md Abdullah-Al Kaiser, Akhilesh R. Jaiswal

Abstract: The high volume of data transmission between the edge sensor and the cloud processor leads to energy and throughput bottlenecks for resource-constrained edge devices focused on computer vision. Hence, researchers are investigating different approaches (e.g., near-sensor processing, in-sensor processing, in-pixel processing) that execute computations closer to the sensor to reduce the transmission bandwidth. Specifically, in-pixel processing for neuromorphic vision sensors (e.g., dynamic vision sensors (DVS)) involves incorporating asynchronous multiply-accumulate (MAC) operations within the pixel array, resulting in improved energy efficiency. In a CMOS implementation, a low-overhead, energy-efficient analog MAC accumulates charge on a passive capacitor; however, the capacitor's limited charge retention time constrains the choice of algorithmic integration time, impacting algorithmic accuracy, bandwidth, energy, and training efficiency. Consequently, this results in a design trade-off on the hardware side, creating a need for a low-leakage compute unit that maintains the area and energy benefits. In this work, we present a holistic analysis of the hardware-algorithm co-design trade-off arising from the limited integration time imposed by the hardware, along with techniques to improve the leakage performance of the in-pixel analog MAC operations.
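The effect of limited charge retention on an in-pixel analog MAC can be sketched with a first-order model in which the stored value decays exponentially with time constant `tau` between events. The RC-decay model, the event format, the function name, and all parameter values are illustrative assumptions, not the circuit analyzed in the paper.

```python
import math
from typing import Iterable, Optional, Tuple


def leaky_mac(events: Iterable[Tuple[float, float]], t_read: float, tau: float) -> float:
    """Accumulate weighted events on a leaky capacitor; return the value at t_read.

    events: (time, weight * input) pairs; each adds charge instantaneously.
    Between events the stored value decays as exp(-dt / tau).
    """
    v = 0.0
    t_last: Optional[float] = None
    for t, dq in sorted(events):
        if t > t_read:
            break  # events after readout do not contribute
        if t_last is not None:
            v *= math.exp(-(t - t_last) / tau)  # leakage since the previous event
        v += dq
        t_last = t
    if t_last is None:
        return 0.0  # nothing was accumulated
    return v * math.exp(-(t_read - t_last) / tau)


if __name__ == "__main__":
    evts = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
    ideal = sum(dq for _, dq in evts)
    for tau in (1e6, 10.0, 1.0):
        # fraction of the ideal (leak-free) MAC result retained at readout
        print(tau, leaky_mac(evts, t_read=2.0, tau=tau) / ideal)
```

Shrinking `tau` relative to the integration window erodes early contributions first, which is why retention time limits the usable integration time.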

Submitted 7 October, 2023; originally announced October 2023.

Comments: 6 pages, 4 figures, 1 table

arXiv:2308.08167 [pdf, ps, other]

A Quantum Approximation Scheme for k-Means

Authors: Ragesh Jaiswal

Abstract: We give a quantum approximation scheme (i.e., $(1 + \varepsilon)$-approximation for every $\varepsilon > 0$) for the classical $k$-means clustering problem in the QRAM model with a running time that has only polylogarithmic dependence on the number of data points. More specifically, given a dataset $V$ with $N$ points in $\mathbb{R}^d$ stored in a QRAM data structure, our quantum algorithm runs in time $\tilde{O} \left( 2^{\tilde{O}(\frac{k}{\varepsilon})} \eta^2 d\right)$ and with high probability outputs a set $C$ of $k$ centers such that $cost(V, C) \leq (1+\varepsilon) \cdot cost(V, C_{OPT})$. Here $C_{OPT}$ denotes the optimal $k$ centers, $cost(\cdot)$ denotes the standard $k$-means cost function (i.e., the sum of the squared distances of points to their closest centers), and $\eta$ is the aspect ratio (i.e., the ratio of maximum distance to minimum distance). This is the first quantum algorithm with a polylogarithmic running time that gives a provable approximation guarantee of $(1+\varepsilon)$ for the $k$-means problem. Also, unlike previous works on unsupervised learning, our quantum algorithm does not require quantum linear algebra subroutines and has a running time independent of parameters (e.g., condition number) that appear in such procedures.
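The $k$-means cost and the $(1+\varepsilon)$ guarantee stated above can be written out directly. This sketch evaluates $cost(V, C) = \sum_{v \in V} \min_{c \in C} \lVert v - c \rVert^2$ and checks the approximation condition against a reference solution standing in for $C_{OPT}$; it is a classical evaluation helper only, and the function names are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def kmeans_cost(V: Sequence[Point], C: Sequence[Point]) -> float:
    """Sum over points of the squared distance to the closest center."""
    return sum(min(sum((a - b) ** 2 for a, b in zip(v, c)) for c in C) for v in V)


def is_eps_approx(V: Sequence[Point], C: Sequence[Point],
                  C_ref: Sequence[Point], eps: float) -> bool:
    """True if cost(V, C) <= (1 + eps) * cost(V, C_ref); C_ref stands in for C_OPT."""
    return kmeans_cost(V, C) <= (1.0 + eps) * kmeans_cost(V, C_ref)


if __name__ == "__main__":
    V = [(0.0,), (2.0,), (10.0,), (12.0,)]
    opt = [(1.0,), (11.0,)]     # optimal 2-means centers for this toy set
    print(kmeans_cost(V, opt))  # 4.0
    print(is_eps_approx(V, [(1.2,), (11.0,)], opt, eps=0.1))  # True: cost 4.08 <= 4.4
```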

Submitted 24 May, 2024; v1 submitted 16 August, 2023; originally announced August 2023.

Comments: An extended version of this paper can be found at arXiv:2405.13351

arXiv:2305.16890 [pdf, ps, other]

Universal Weak Coreset

Authors: Ragesh Jaiswal, Amit Kumar

Abstract: Coresets for $k$-means and $k$-median problems yield a small summary of the data that preserves the clustering cost with respect to any set of $k$ centers. Recently, coresets have also been constructed for constrained $k$-means and $k$-median problems. However, the notion of coresets has the drawbacks that (i) they can only be applied in settings where the input points are allowed to have weights, and (ii) in general metric spaces, the size of the coresets can depend logarithmically on the number of points. The notion of weak coresets, which have less stringent requirements than coresets, has been studied in the context of classical $k$-means and $k$-median problems. A weak coreset is a pair $(J,S)$ of subsets of points, where $S$ acts as a summary of the point set and $J$ as a set of potential centers. This pair satisfies the properties that (i) $S$ is a good summary of the data as long as the $k$ centers are chosen from $J$ only, and (ii) there is a good choice of $k$ centers in $J$ with cost close to the optimal cost. We develop this framework, which we call universal weak coresets, for constrained clustering settings. In conjunction with recent coreset constructions for constrained settings, our designs give greater data compression, are conceptually simpler, and apply to a wide range of constrained $k$-median and $k$-means problems.

Submitted 26 May, 2023; originally announced May 2023.

arXiv:2305.00175 [pdf, ps, other]

Clustering What Matters in Constrained Settings

Authors: Ragesh Jaiswal, Amit Kumar

Abstract: Constrained clustering problems generalize classical clustering formulations, e.g., $k$-median and $k$-means, by imposing additional constraints on the feasibility of clustering. There has been significant recent progress in obtaining approximation algorithms for these problems, in both the metric and the Euclidean settings. However, the outlier version of these problems, where the solution is allowed to leave out $m$ points from the clustering, is not well understood. In this work, we give a general framework for reducing the outlier version of a constrained $k$-median or $k$-means problem to the corresponding outlier-free version with only a $(1+\varepsilon)$-loss in the approximation ratio. The reduction is obtained by mapping the original instance of the problem to $f(k,m, \varepsilon)$ instances of the outlier-free version, where $f(k, m, \varepsilon) = \left( \frac{k+m}{\varepsilon}\right)^{O(m)}$. As specific applications, we get the following results:
- First FPT (in the parameters $k$ and $m$) $(1+\varepsilon)$-approximation algorithm for the outlier version of capacitated $k$-median and $k$-means in Euclidean spaces with hard capacities.
- First FPT (in the parameters $k$ and $m$) $(3+\varepsilon)$ and $(9+\varepsilon)$ approximation algorithms for the outlier version of capacitated $k$-median and $k$-means, respectively, in general metric spaces with hard capacities.
- First FPT (in the parameters $k$ and $m$) $(2-\delta)$-approximation algorithm for the outlier version of the $k$-median problem under the Ulam metric.
Our work generalizes the known results to a larger class of constrained clustering problems. Further, our reduction works for arbitrary metric spaces and so can extend clustering algorithms for outlier-free versions in both Euclidean and arbitrary metric spaces.
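In the outlier version described above, the solution may discard $m$ points, so the objective charges only the $n - m$ points closest to their nearest centers. A minimal sketch of that objective follows (with `power=1` for $k$-median and `power=2` for $k$-means); Euclidean distance and the function name are assumptions, and this evaluates the objective only, not the reduction.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def cost_with_outliers(V: Sequence[Point], C: Sequence[Point], m: int, power: int = 1) -> float:
    """Clustering cost after dropping the m points farthest from their nearest center."""
    if not 0 <= m <= len(V):
        raise ValueError("m must be between 0 and the number of points")
    dists = sorted(min(math.dist(v, c) for c in C) ** power for v in V)
    return sum(dists[: len(V) - m])


if __name__ == "__main__":
    V = [(0.0,), (1.0,), (2.0,), (100.0,)]
    print(cost_with_outliers(V, [(1.0,)], m=0))  # 101.0: the far point dominates
    print(cost_with_outliers(V, [(1.0,)], m=1))  # 2.0: the far point is discarded
```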

Submitted 29 April, 2023; originally announced May 2023.

arXiv:2304.02968 [pdf, other]

doi 10.1145/3583781.3590235

Technology-Circuit-Algorithm Tri-Design for Processing-in-Pixel-in-Memory (P2M)

Authors: Md Abdullah-Al Kaiser, Gourav Datta, Sreetama Sarkar, Souvik Kundu, Zihan Yin, Manas Garg, Ajey P. Jacob, Peter A. Beerel, Akhilesh R. Jaiswal

Abstract: The massive amounts of data generated by camera sensors motivate data processing inside pixel arrays, i.e., at the extreme edge. Several critical developments have fueled recent interest in the processing-in-pixel-in-memory paradigm for a wide range of visual machine intelligence tasks, including (1) advances in 3D integration technology that enable complex processing inside each pixel in a 3D-integrated manner while maintaining pixel density, (2) analog processing circuit techniques for massively parallel, low-energy in-pixel computations, and (3) algorithmic techniques that mitigate non-idealities associated with analog processing through hardware-aware training schemes. This article presents a comprehensive technology-circuit-algorithm landscape that connects technology capabilities, circuit design strategies, and algorithmic optimizations to power, performance, area, bandwidth reduction, and application-level accuracy metrics. We present our results using a comprehensive co-design framework incorporating hardware and algorithmic optimizations for various complex real-life visual intelligence tasks mapped onto our P2M paradigm.

Submitted 6 April, 2023; originally announced April 2023.

Journal ref: GLSVLSI '23: Great Lakes Symposium on VLSI 2023 Proceedings

arXiv:2304.02908 [pdf, other]

doi 10.1145/3583781.3590279

A Context-Switching/Dual-Context ROM Augmented RAM using Standard 8T SRAM

Authors: Md Abdullah-Al Kaiser, Edwin Tieu, Ajey P. Jacob, Akhilesh R. Jaiswal

Abstract: The landscape of emerging applications has been continually widening, encompassing various data-intensive applications such as artificial intelligence, machine learning, secure encryption, and the Internet of Things. A sustainable approach toward creating dedicated hardware platforms that can cater to multiple applications often requires the underlying hardware to context-switch or to support more than one context simultaneously. This paper presents a context-switching and dual-context memory based on the standard 8T SRAM bit-cell. Specifically, we exploit the availability of multi-VT transistors by selectively choosing the read-port transistors of the 8T SRAM cell to be either high-VT or low-VT. The 8T SRAM cell is thus augmented to store ROM data (represented as the VT of the transistors constituting the read port) while simultaneously storing RAM data. Further, we propose specific sensing methodologies such that the memory array can support RAM-only or ROM-only mode (context-switching (CS) mode) or RAM and ROM mode simultaneously (dual-context (DC) mode). Extensive Monte Carlo simulations have verified the robustness of our proposed ROM-augmented CS/DC memory on the GlobalFoundries 22nm-FDX technology node.

Submitted 6 April, 2023; originally announced April 2023.

Journal ref: GLSVLSI '23: Great Lakes Symposium on VLSI 2023 Proceedings

arXiv:2301.13005 [pdf]

Farm Environmental Data Analyzer using a Decentralised system and R

Authors: Aryan Bagade, Rupesh C. Jaiswal

Abstract: Data/web hosting is a service that lets enterprises or individuals present their data on the internet for users to access. Firms providing such services are called web/data hosts. Such services also require continuous support, and not everyone can afford a dedicated centralized data-hosting service. The peer-to-peer (P2P) protocol InterPlanetary File System (IPFS) is emerging as a legitimate alternative to traditional data and web hosting. This paper puts forward a decentralized, blockchain- and IPFS-based interactive and manageable model, and presents an application and schematic that serve as a proof of concept (PoC) showing that decentralized blockchain technology can be used to create an immutable record of energy and resource usage for further analysis and for estimating yield at scale. IPFS hosts an immutable record that is independently verifiable and available in perpetuity. First, the user connects to the service through an IPFS node and is asked to upload their data. Once the data is uploaded, a CID (Content Identifier) and a QR code are returned to the user, who can then compute and visualize the results through the application. This system enables the novel application of decentralized data storage to capture, add, and visualize yield and environmental data and to track it further down the supply chain.

Submitted 17 December, 2022; originally announced January 2023.

arXiv:2212.10881 [pdf, other]

In-Sensor & Neuromorphic Computing are all you need for Energy Efficient Computer Vision

Authors: Gourav Datta, Zeyu Liu, Md Abdullah-Al Kaiser, Souvik Kundu, Joe Mathai, Zihan Yin, Ajey P. Jacob, Akhilesh R. Jaiswal, Peter A. Beerel

Abstract: Due to the high activation sparsity and use of accumulates (AC) instead of expensive multiply-and-accumulates (MAC), neuromorphic spiking neural networks (SNNs) have emerged as a promising low-power alternative to traditional DNNs for several computer vision (CV) applications. However, most existing SNNs require multiple time steps for acceptable inference accuracy, hindering real-time deployment and increasing spiking activity and, consequently, energy consumption. Recent works proposed direct encoding that directly feeds the analog pixel values into the first layer of the SNN in order to significantly reduce the number of time steps. Although the overhead for the first layer MACs with direct encoding is negligible for deep SNNs and the CV processing is efficient using SNNs, the data transfer between the image sensors and the downstream processing costs significant bandwidth and may dominate the total energy. To mitigate this concern, we propose an in-sensor computing hardware-software co-design framework for SNNs targeting image recognition tasks. Our approach reduces the bandwidth between sensing and processing by 12-96x and the resulting total energy by 2.32x compared to traditional CV processing, with a 3.8% reduction in accuracy on ImageNet.

Submitted 21 December, 2022; originally announced December 2022.

arXiv:2210.05451 [pdf, other]

Enabling ISP-less Low-Power Computer Vision

Authors: Gourav Datta, Zeyu Liu, Zihan Yin, Linyu Sun, Akhilesh R. Jaiswal, Peter A. Beerel

Abstract: In order to deploy current computer vision (CV) models on resource-constrained low-power devices, recent works have proposed in-sensor and in-pixel computing approaches that try to partly/fully bypass the image signal processor (ISP) and yield significant bandwidth reduction between the image sensor and the CV processing unit by downsampling the activation maps in the initial convolutional neural network (CNN) layers. However, direct inference on the raw images degrades the test accuracy due to the difference in covariance of the raw images captured by the image sensors compared to the ISP-processed images used for training. Moreover, it is difficult to train deep CV models on raw images, because most (if not all) large-scale open-source datasets consist of RGB images. To mitigate this concern, we propose to invert the ISP pipeline, which can convert the RGB images of any dataset to their raw counterparts, and enable model training on raw images. We release the raw version of the COCO dataset, a large-scale benchmark for generic high-level vision tasks. For ISP-less CV systems, training on these raw images results in a 7.1% increase in test accuracy on the visual wake words (VWW) dataset compared to relying on training with traditional ISP-processed RGB datasets. To further improve the accuracy of ISP-less CV models and to increase the energy and bandwidth benefits obtained by in-sensor/in-pixel computing, we propose an energy-efficient form of analog in-pixel demosaicing that may be coupled with in-pixel CNN computations.
When evaluated on raw images captured by real sensors from the PASCALRAW dataset, our approach results in an 8.1% increase in mAP. Lastly, we demonstrate a further 20.5% increase in mAP by using a novel application of few-shot learning with thirty shots each for the novel PASCALRAW dataset, constituting 3 classes.
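One step of an inverted ISP pipeline, re-mosaicking, can be illustrated by sampling an RGB image onto a Bayer colour-filter pattern. The NumPy sketch below is a simplification under stated assumptions: an RGGB layout and a plain power-law inverse gamma. A realistic inverse ISP such as the one described above would also undo white balance, colour correction, and tone mapping, which are omitted here; the function name `rgb_to_bayer_rggb` and the default `gamma` are hypothetical.

```python
import numpy as np

def rgb_to_bayer_rggb(rgb, gamma=2.2):
    """Convert an (H, W, 3) RGB image in [0, 1] to a single-channel RGGB mosaic.

    Assumption: a power-law inverse gamma approximates linear intensities.
    Each output pixel keeps one colour sample following the pattern
        R G
        G B
    """
    linear = np.clip(rgb, 0.0, 1.0) ** gamma
    h, w, _ = linear.shape
    raw = np.empty((h, w), dtype=linear.dtype)
    raw[0::2, 0::2] = linear[0::2, 0::2, 0]   # red sites
    raw[0::2, 1::2] = linear[0::2, 1::2, 1]   # green sites on red rows
    raw[1::2, 0::2] = linear[1::2, 0::2, 1]   # green sites on blue rows
    raw[1::2, 1::2] = linear[1::2, 1::2, 2]   # blue sites
    return raw

img = np.zeros((2, 2, 3))
img[..., 0], img[..., 1], img[..., 2] = 1.0, 0.5, 0.25
print(rgb_to_bayer_rggb(img, gamma=1.0))
```

Half of the sites are green, mirroring the Bayer layout; a demosaicing step (in-pixel or otherwise) would interpolate the two missing colours at each site.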

Submitted 11 October, 2022; originally announced October 2022.

Comments: Accepted to WACV 2023

arXiv:2203.05696 [pdf, other]

Toward Efficient Hyperspectral Image Processing inside Camera Pixels

Authors: Gourav Datta, Zihan Yin, Ajey Jacob, Akhilesh R. Jaiswal, Peter A. Beerel

Abstract: Hyperspectral cameras generate a large amount of data due to the presence of hundreds of spectral bands as opposed to only three channels (red, green, and blue) in traditional cameras. This requires a significant amount of data transmission between the hyperspectral image sensor and a processor used to classify/detect/track the images, frame by frame, expending high energy and causing bandwidth and security bottlenecks. To mitigate this problem, we propose a form of processing-in-pixel (PIP) that leverages advanced CMOS technologies to enable the pixel array to perform a wide range of complex operations required by modern convolutional neural networks (CNN) for hyperspectral image recognition (HSI). Consequently, our PIP-optimized custom CNN layers effectively compress the input data, significantly reducing the bandwidth required to transmit the data downstream to the HSI processing unit. This reduces the average energy consumption associated with the camera pixel array and the CNN processing unit by 25.06x and 3.90x respectively, compared to existing hardware implementations. Our custom models yield average test accuracies within 0.56% of the baseline models for the standard HSI benchmarks.

Submitted 10 March, 2022; originally announced March 2022.

Comments: 6 pages, 3 figures

arXiv:2203.04737 [pdf, other]

P2M: A Processing-in-Pixel-in-Memory Paradigm for Resource-Constrained TinyML Applications

Authors: Gourav Datta, Souvik Kundu, Zihan Yin, Ravi Teja Lakkireddy, Joe Mathai, Ajey Jacob, Peter A. Beerel, Akhilesh R. Jaiswal

Abstract: The demand to process vast amounts of data generated from state-of-the-art high resolution cameras has motivated novel energy-efficient on-device AI solutions. Visual data in such cameras are usually captured in the form of analog voltages by a sensor pixel array, and then converted to the digital domain for subsequent AI processing using analog-to-digital converters (ADC). Recent research has tried to take advantage of massively parallel low-power analog/digital computing in the form of near- and in-sensor processing, in which the AI computation is performed partly in the periphery of the pixel array and partly in a separate on-board CPU/accelerator. Unfortunately, high-resolution input images still need to be streamed between the camera and the AI processing unit, frame by frame, causing energy, bandwidth, and security bottlenecks. To mitigate this problem, we propose a novel Processing-in-Pixel-in-memory (P2M) paradigm that customizes the pixel array by adding support for analog multi-channel, multi-bit convolution, batch normalization, and ReLU (Rectified Linear Units). Our solution includes a holistic algorithm-circuit co-design approach and the resulting P2M paradigm can be used as a drop-in replacement for embedding the memory-intensive first few layers of convolutional neural network (CNN) models within foundry-manufacturable CMOS image sensor platforms.
Our experimental results indicate that P2M reduces data transfer bandwidth from sensors and analog to digital conversions by ~21x, and the energy-delay product (EDP) incurred in processing a MobileNetV2 model on a TinyML use case for the visual wake words (VWW) dataset by up to ~11x compared to standard near-processing or in-sensor implementations, without any significant drop in test accuracy.

Submitted 16 March, 2022; v1 submitted 6 March, 2022; originally announced March 2022.

Comments: 15 pages, 8 figures

arXiv:2110.14242 [pdf, other]

Tight FPT Approximation for Constrained k-Center and k-Supplier

Authors: Dishant Goyal, Ragesh Jaiswal

Abstract: In this work, we study a range of constrained versions of the $k$-supplier and $k$-center problems such as: capacitated, fault-tolerant, fair, etc. These problems fall under a broad framework of constrained clustering. A unified framework for constrained clustering was proposed by Ding and Xu [SODA 2015] in the context of the $k$-median and $k$-means objectives. In this work, we extend this framework to the $k$-supplier and $k$-center objectives. This unified framework allows us to obtain results simultaneously for the following constrained versions of the $k$-supplier problem: $r$-gather, $r$-capacity, balanced, chromatic, fault-tolerant, strongly private, $\ell$-diversity, and fair $k$-supplier problems, with and without outliers. We obtain the following results: We give $3$ and $2$ approximation algorithms for the constrained $k$-supplier and $k$-center problems, respectively, with $\mathsf{FPT}$ running time $k^{O(k)} \cdot n^{O(1)}$, where $n = |C \cup L|$. Moreover, these approximation guarantees are tight; that is, for any constant $ε>0$, no algorithm can achieve $(3-ε)$ and $(2-ε)$ approximation guarantees for the constrained $k$-supplier and $k$-center problems in $\mathsf{FPT}$ time, assuming $\mathsf{FPT} \neq \mathsf{W}[2]$. Furthermore, we study these constrained problems in the outlier setting. Our algorithm gives $3$ and $2$ approximation guarantees for the constrained outlier $k$-supplier and $k$-center problems, respectively, with $\mathsf{FPT}$ running time $(k+m)^{O(k)} \cdot n^{O(1)}$, where $n = |C \cup L|$ and $m$ is the number of outliers.
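For orientation, the unconstrained $k$-center problem has a classic 2-approximation: Gonzalez's farthest-first traversal. The sketch below is that textbook algorithm, not the constrained FPT algorithm of the abstract; the function name `gonzalez_k_center` and the use of Euclidean distance are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def gonzalez_k_center(points: Sequence[Point], k: int) -> Tuple[List[Point], float]:
    """Farthest-first traversal, a classic 2-approximation for unconstrained k-center.

    Returns the chosen centers and the covering radius max_x min_c d(x, c).
    """
    if not points or k <= 0:
        raise ValueError("need at least one point and k >= 1")
    centers = [points[0]]  # arbitrary first center
    # dist[i] = distance from points[i] to its nearest chosen center
    dist = [math.dist(p, centers[0]) for p in points]
    while len(centers) < min(k, len(points)):
        # open the point that is currently farthest from every chosen center
        far = max(range(len(points)), key=dist.__getitem__)
        centers.append(points[far])
        dist = [min(d, math.dist(p, points[far])) for d, p in zip(dist, points)]
    return centers, max(dist)

centers, radius = gonzalez_k_center([(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)], 2)
print(centers, radius)  # [(0.0, 0.0), (11.0, 0.0)] 1.0
```

Here centers are restricted to input points, as in the $k$-supplier/$k$-center setting with a finite candidate set; the constrained variants in the abstract additionally require the induced clustering to satisfy capacity, fairness, or similar side constraints, which this greedy rule ignores.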

Submitted 27 October, 2021; originally announced October 2021.

arXiv:2109.14801 [pdf, other]

Benchmarking a Probabilistic Coprocessor

Authors: Jan Kaiser, Risi Jaiswal, Behtash Behin-Aein, Supriyo Datta

Abstract: Computation in the past decades has been driven by deterministic computers based on classical deterministic bits. Recently, alternative computing paradigms and domain-based computing like quantum computing and probabilistic computing have gained traction. While quantum computers based on q-bits utilize quantum effects to advance computation, probabilistic computers based on probabilistic (p-)bits are naturally suited to solve problems that require large amounts of random numbers, such as those used in Monte Carlo and Markov Chain Monte Carlo algorithms. These Monte Carlo techniques are used to solve important problems in the fields of optimization, numerical integration, and sampling from probability distributions. However, fast generation of random numbers is crucial for implementing Monte Carlo algorithms efficiently. In this paper, we present and benchmark a probabilistic coprocessor based on p-bits that are naturally suited to solve these problems. We present multiple examples and project that a nanomagnetic implementation of our probabilistic coprocessor can outperform classical CPU and GPU implementations by multiple orders of magnitude.

Submitted 29 September, 2021; originally announced September 2021.

arXiv:2107.11979 [pdf, other]

HYPER-SNN: Towards Energy-efficient Quantized Deep Spiking Neural Networks for Hyperspectral Image Classification

Authors: Gourav Datta, Souvik Kundu, Akhilesh R. Jaiswal, Peter A. Beerel

Abstract: Hyperspectral images (HSI) provide rich spectral and spatial information across a series of contiguous spectral bands. However, the accurate processing of the spectral and spatial correlation between the bands requires the use of energy-expensive 3-D Convolutional Neural Networks (CNNs). To address this challenge, we propose the use of Spiking Neural Networks (SNNs) that are generated from iso-architecture CNNs and trained with quantization-aware gradient descent to optimize their weights, membrane leak, and firing thresholds. During both training and inference, the analog pixel values of an HSI are directly applied to the input layer of the SNN without the need to convert to a spike-train. The reduced latency of our training technique combined with high activation sparsity yields significant improvements in computational efficiency. We evaluate our proposal using three HSI datasets on a 3-D and a 3-D/2-D hybrid convolutional architecture. We achieve overall accuracy, average accuracy, and kappa coefficient of 98.68%, 98.34%, and 98.20% respectively with 5 time steps (inference latency) and 6-bit weight quantization on the Indian Pines dataset. In particular, our models achieved accuracies similar to state-of-the-art (SOTA) with 560.6 and 44.8 times less compute energy on average over three HSI datasets than an iso-architecture full-precision and 6-bit quantized CNN, respectively.
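Direct encoding (analog pixel values applied to the input layer at every time step) combined with a leaky neuron can be illustrated with a single leaky integrate-and-fire (LIF) layer. This is a minimal NumPy sketch under assumptions: the soft-reset rule, the parameter names `leak` and `threshold`, and the dense weight matrix are illustrative choices, not the paper's exact neuron model or its trained parameters.

```python
import numpy as np

def lif_layer_direct_encoding(x, weights, leak=0.9, threshold=1.0, time_steps=5):
    """Run one LIF layer for `time_steps` steps with direct (analog) input encoding.

    x:       analog input vector (e.g. flattened pixel values), shape (n_in,)
    weights: dense weight matrix, shape (n_out, n_in)
    Returns a (time_steps, n_out) array of binary spikes.
    """
    current = weights @ x                     # same analog input at every step
    membrane = np.zeros(weights.shape[0])
    spikes = np.zeros((time_steps, weights.shape[0]))
    for t in range(time_steps):
        membrane = leak * membrane + current  # leaky integration
        fired = membrane >= threshold
        spikes[t] = fired
        membrane = np.where(fired, membrane - threshold, membrane)  # soft reset
    return spikes

s = lif_layer_direct_encoding(np.array([0.6, 0.0]), np.eye(2), leak=1.0, time_steps=4)
print(s[:, 0])  # [0. 1. 0. 1.]
```

Because the input needs no spike-train conversion, a handful of time steps (the abstract uses 5) already yields a usable firing rate; `leak` and `threshold` are the quantities the abstract says are learned alongside the weights.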

Submitted 28 July, 2021; v1 submitted 26 July, 2021; originally announced July 2021.

arXiv:2107.07342 [pdf, other]

Probabilistic analysis of solar cell optical performance using Gaussian processes

Authors: Rahul Jaiswal, Manel Martínez-Ramón, Tito Busani

Abstract: This work investigates the application of different machine-learning-based prediction methodologies to estimate the performance of silicon-based textured cells. The concept of confidence bound regions is introduced and its advantages are discussed in detail. Results show that reflection profiles and depth-dependent optical generation profiles can be accurately estimated using Gaussian processes with exact knowledge of the uncertainty in the prediction values. It is also shown that cell design parameters can be estimated for a desired performance metric.
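Gaussian-process regression with confidence bounds can be sketched in a few lines of NumPy. This is a generic textbook GP posterior under assumed choices (a zero-mean prior, a squared-exponential kernel, fixed hyperparameters, a small noise term, and a 95% band of ±1.96 standard deviations); it is not the authors' model, and the 1-D inputs stand in for a hypothetical design parameter.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    sq = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq / length_scale**2)

def gp_predict(x_train, y_train, x_test, noise=1e-4, **kern):
    """Posterior mean and 95% confidence band of a zero-mean GP."""
    K = rbf_kernel(x_train, x_train, **kern) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_test, x_train, **kern)
    K_ss = rbf_kernel(x_test, x_test, **kern)
    L = np.linalg.cholesky(K)                              # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    var = np.clip(np.diag(K_ss) - np.sum(v**2, axis=0), 0.0, None)
    std = np.sqrt(var)
    return mean, mean - 1.96 * std, mean + 1.96 * std

x_tr = np.linspace(0.0, 5.0, 6)
mean, lo, hi = gp_predict(x_tr, np.sin(x_tr), np.array([2.5, 20.0]))
print(hi - lo)  # band is narrow near the data, wide far from it
```

The band width is what makes the prediction "uncertainty-aware": it shrinks near measured points and grows toward the prior variance away from them.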

Submitted 26 June, 2021; originally announced July 2021.

arXiv:2106.06755 [pdf, ps, other]

Tight FPT Approximation for Socially Fair Clustering

Authors: Dishant Goyal, Ragesh Jaiswal

Abstract: In this work, we study the socially fair $k$-median/$k$-means problem. We are given a set of points $P$ in a metric space $\mathcal{X}$ with a distance function $d(.,.)$. There are $\ell$ groups: $P_1,\dotsc,P_{\ell} \subseteq P$. We are also given a set $F$ of feasible centers in $\mathcal{X}$. The goal in the socially fair $k$-median problem is to find a set $C \subseteq F$ of $k$ centers that minimizes the maximum average cost over all the groups. That is, find $C$ that minimizes the objective function $Φ(C,P) \equiv \max_{j} \Big\{ \sum_{x \in P_j} d(C,x)/|P_j| \Big\}$, where $d(C,x)$ is the distance of $x$ to the closest center in $C$. The socially fair $k$-means problem is defined similarly by using squared distances, i.e., $d^{2}(.,.)$ instead of $d(.,.)$. The current best approximation guarantee for both the problems is $O\left( \frac{\log \ell}{\log \log \ell} \right)$ due to Makarychev and Vakilian [COLT 2021]. In this work, we study the fixed parameter tractability of the problems with respect to parameter $k$. We design $(3+\varepsilon)$ and $(9 + \varepsilon)$ approximation algorithms for the socially fair $k$-median and $k$-means problems, respectively, in FPT (fixed parameter tractable) time $f(k,\varepsilon) \cdot n^{O(1)}$, where $f(k,\varepsilon) = (k/\varepsilon)^{{O}(k)}$ and $n = |P \cup F|$. Furthermore, we show that if Gap-ETH holds, then better approximation guarantees are not possible in FPT time.
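The socially fair objective $Φ(C,P)$ can be evaluated directly for a given center set. The sketch below assumes Euclidean distance for $d$ and represents each group $P_j$ as a list of points; the function name is hypothetical, and it only scores a candidate $C$ rather than implementing the FPT approximation algorithm. Passing `squared=True` gives the socially fair $k$-means variant.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]

def socially_fair_cost(centers: Sequence[Point],
                       groups: Sequence[Sequence[Point]],
                       squared: bool = False) -> float:
    """Phi(C, P) = max_j (1/|P_j|) * sum_{x in P_j} d(C, x)."""
    def d(x: Point) -> float:
        dist = min(math.dist(x, c) for c in centers)  # distance to the closest center
        return dist * dist if squared else dist
    # average cost per group, then the worst group
    return max(sum(d(x) for x in group) / len(group) for group in groups)

groups = [[(1.0, 0.0), (3.0, 0.0)], [(0.0, 4.0)]]
print(socially_fair_cost([(0.0, 0.0)], groups))              # 4.0
print(socially_fair_cost([(0.0, 0.0), (0.0, 4.0)], groups))  # 2.0
```

Averaging within each group before taking the maximum is what keeps a single small, distant group (here the one-point group) from being ignored, unlike the ordinary total $k$-median cost.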

Submitted 13 September, 2021; v1 submitted 12 June, 2021; originally announced June 2021.

Comments: The new version gives tight approximation results. However, the old version uses techniques that work in the streaming setting albeit at the cost of weaker approximation guarantees. So, readers interested in the streaming setting may want to see the older version

arXiv:2011.04221 [pdf, other]

doi 10.4230/LIPIcs.APPROX/RANDOM.2021.4

Hardness of Approximation of Euclidean $k$-Median

Authors: Anup Bhattacharya, Dishant Goyal, Ragesh Jaiswal

Abstract: The Euclidean $k$-median problem is defined in the following manner: given a set $\mathcal{X}$ of $n$ points in $\mathbb{R}^{d}$, and an integer $k$, find a set $C \subset \mathbb{R}^{d}$ of $k$ points (called centers) such that the cost function $Φ(C,\mathcal{X}) \equiv \sum_{x \in \mathcal{X}} \min_{c \in C} \|x-c\|_{2}$ is minimized. The Euclidean $k$-means problem is defined similarly by replacing the distance with squared distance in the cost function. Various hardness of approximation results are known for the Euclidean $k$-means problem. However, no hardness of approximation results were known for the Euclidean $k$-median problem. In this work, assuming the unique games conjecture (UGC), we provide the first hardness of approximation result for the Euclidean $k$-median problem. Furthermore, we study the hardness of approximation for the Euclidean $k$-means/$k$-median problems in the bi-criteria setting where an algorithm is allowed to choose more than $k$ centers. That is, bi-criteria approximation algorithms are allowed to output $βk$ centers (for constant $β>1$) and the approximation ratio is computed with respect to the optimal $k$-means/$k$-median cost. In this setting, we show the first hardness of approximation result for the Euclidean $k$-median problem for any $β< 1.015$, assuming UGC. We also show a similar bi-criteria hardness of approximation result for the Euclidean $k$-means problem with a stronger bound of $β< 1.28$, again assuming UGC.
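The cost $Φ(C,\mathcal{X})$ and its $k$-means counterpart differ only in whether the nearest-center distance is squared. A minimal NumPy sketch (the function name is hypothetical) that evaluates both for a given set of centers; it places no limit on the number of rows in `C`, so it can also score a bi-criteria solution with $βk$ centers.

```python
import numpy as np

def clustering_cost(X: np.ndarray, C: np.ndarray, squared: bool = False) -> float:
    """Sum over points of the (squared) Euclidean distance to the nearest center.

    X: (n, d) data points; C: (m, d) centers.
    squared=False gives the k-median cost, squared=True the k-means cost.
    """
    # pairwise Euclidean distances, shape (n, m)
    dists = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
    nearest = dists.min(axis=1)
    return float(np.sum(nearest**2 if squared else nearest))

X = np.array([[0.0, 0.0], [3.0, 4.0], [10.0, 0.0]])
C = np.array([[0.0, 0.0], [10.0, 0.0]])
print(clustering_cost(X, C), clustering_cost(X, C, squared=True))  # 5.0 25.0
```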

Submitted 9 November, 2020; originally announced November 2020.

arXiv:2007.11773 [pdf, other]

FPT Approximation for Constrained Metric $k$-Median/Means

Authors: Dishant Goyal, Ragesh Jaiswal, Amit Kumar

Abstract: The Metric $k$-median problem over a metric space $(\mathcal{X}, d)$ is defined as follows: given a set $L \subseteq \mathcal{X}$ of facility locations and a set $C \subseteq \mathcal{X}$ of clients, open a set $F \subseteq L$ of $k$ facilities such that the total service cost, defined as $Φ(F, C) \equiv \sum_{x \in C} \min_{f \in F} d(x, f)$, is minimised. The metric $k$-means problem is defined similarly using squared distances. In many applications there are additional constraints that any solution needs to satisfy. This gives rise to different constrained versions of the problem such as $r$-gather, fault-tolerant, outlier $k$-means/$k$-median problem. Surprisingly, for many of these constrained problems, no constant-approximation algorithm is known. We give FPT algorithms with constant approximation guarantee for a range of constrained $k$-median/means problems. For some of the constrained problems, ours is the first constant factor approximation algorithm whereas for others, we improve or match the approximation guarantee of previous works. We work within the unified framework of Ding and Xu that allows us to simultaneously obtain algorithms for a range of constrained problems. In particular, we obtain a $(3+\varepsilon)$-approximation and $(9+\varepsilon)$-approximation for the constrained versions of the $k$-median and $k$-means problem respectively in FPT time. In many practical settings of the $k$-median/means problem, one is allowed to open a facility at any client location, i.e., $C \subseteq L$.
For this special case, our algorithm gives a $(2+\varepsilon)$-approximation and $(4+\varepsilon)$-approximation for the constrained versions of $k$-median and $k$-means problem respectively in FPT time. Since our algorithm is based on a simple sampling technique, it can also be converted to a constant-pass log-space streaming algorithm.

Submitted 22 July, 2020; originally announced July 2020.

arXiv:1909.11744 [pdf, ps, other]

Streaming PTAS for Binary $\ell_0$-Low Rank Approximation

Authors: Anup Bhattacharya, Dishant Goyal, Ragesh Jaiswal, Amit Kumar

Abstract: We give a 3-pass, polylog-space streaming PTAS for the constrained binary $k$-means problem and a 4-pass, polylog-space streaming PTAS for the binary $\ell_0$-low rank approximation problem. The connection between the above two problems has recently been studied. We design a streaming PTAS for the former and use this connection to obtain a streaming PTAS for the latter. This is the first constant-pass, polylog-space streaming algorithm for either of the two problems.

Submitted 25 September, 2019; originally announced September 2019.

arXiv:1909.07515 [pdf, ps, other]

Multiplicative Rank-1 Approximation using Length-Squared Sampling

Authors: Ragesh Jaiswal, Amit Kumar

Abstract: We show that the span of $Ω(\frac{1}{\varepsilon^4})$ rows of any matrix $A \in \mathbb{R}^{n \times d}$ sampled according to the length-squared distribution contains a rank-$1$ matrix $\tilde{A}$ such that $||A - \tilde{A}||_F^2 \leq (1 + \varepsilon) \cdot ||A - π_1(A)||_F^2$, where $π_1(A)$ denotes the best rank-$1$ approximation of $A$ under the Frobenius norm. Length-squared sampling has previously been used in the context of rank-$k$ approximation. However, the approximation obtained was additive in nature. We obtain a multiplicative approximation albeit only for rank-$1$ approximation.
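Length-squared sampling picks row $i$ with probability $\|A_i\|^2/\|A\|_F^2$. The NumPy sketch below samples rows this way and then computes the best rank-1 matrix whose rows lie in the span of the sample: project every row of $A$ onto that span, then truncate the projection to rank 1 (optimal because the residual of the projection is orthogonal to the span). The function names, the sample size `s`, and the seed are illustrative assumptions; the abstract's guarantee needs $s = Ω(1/\varepsilon^4)$ samples.

```python
import numpy as np

def length_squared_sample(A, s, rng):
    """Sample s row indices i.i.d. with probability ||A_i||^2 / ||A||_F^2."""
    row_norms_sq = np.sum(A**2, axis=1)
    return rng.choice(A.shape[0], size=s, p=row_norms_sq / row_norms_sq.sum())

def rank1_in_span(A, rows):
    """Best rank-1 approximation of A whose rows lie in span(A[rows])."""
    _, sv, Vt_b = np.linalg.svd(A[rows], full_matrices=False)
    basis = Vt_b[sv > 1e-10 * sv[0]]   # orthonormal rows spanning the sampled rows
    AP = A @ basis.T @ basis            # project every row of A onto that span
    U, S, Vt = np.linalg.svd(AP, full_matrices=False)
    return S[0] * np.outer(U[:, 0], Vt[0])

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 8))
opt = np.sum(np.linalg.svd(A, compute_uv=False)[1:] ** 2)   # ||A - pi_1(A)||_F^2
A1 = rank1_in_span(A, length_squared_sample(A, 20, rng))
print(np.linalg.norm(A - A1) ** 2 / opt)   # ratio >= 1; the theorem bounds it by 1 + eps
```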

Submitted 28 October, 2019; v1 submitted 16 September, 2019; originally announced September 2019.

Comments: A section on open problems added in the new version

arXiv:1909.07511 [pdf, other]

Streaming PTAS for Constrained k-Means

Authors: Dishant Goyal, Ragesh Jaiswal, Amit Kumar

Abstract: We generalise the results of Bhattacharya et al. (Theory of Computing Systems, 62(1):93-115, 2018) for the list-$k$-means problem, defined as follows: for an (unknown) partition $X_1, ..., X_k$ of the dataset $X \subseteq \mathbb{R}^d$, find a list of $k$-center sets (each element in the list is a set of $k$ centers) such that at least one of the $k$-center sets $\{c_1, ..., c_k\}$ in the list gives a $(1+\varepsilon)$-approximation with respect to the cost function $\min_{\textrm{permutation } π} \left[ \sum_{i=1}^{k} \sum_{x \in X_i} ||x - c_{π(i)}||^2 \right]$. The list-$k$-means problem is important for the constrained $k$-means problem since algorithms for the former can be converted to PTAS for various versions of the latter. Following are the consequences of our generalisations: - Streaming algorithm: Our $D^2$-sampling based algorithm running in a single iteration allows us to design a 2-pass, logspace streaming algorithm for the list-$k$-means problem. This can be converted to a 4-pass, logspace streaming PTAS for various constrained versions of the $k$-means problem. - Faster PTAS under stability: Our generalisation is also useful in $k$-means clustering scenarios where finding good centers becomes easy once good centers for a few "bad" clusters have been chosen. One such scenario is clustering under stability where the number of such bad clusters is a constant.
Using the above idea, we significantly improve the running time of the known algorithm from $O(dn^3) (k \log{n})^{poly(\frac{1}β, \frac{1}{\varepsilon})}$ to $O \left(dn^3 k^{\tilde{O}_{β\varepsilon}(\frac{1}{β\varepsilon})} \right)$.
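$D^2$-sampling chooses each new center with probability proportional to its squared distance from the nearest center chosen so far (the same rule as k-means++ seeding). The NumPy sketch below shows only that sampling rule; it does not build the list of candidate center sets or the streaming passes from the paper, and the function name and seed handling are assumptions.

```python
import numpy as np

def d2_sample_centers(X, k, rng):
    """Pick up to k centers from X (n, d) by D^2 sampling.

    The first center is uniform; each later center is drawn with probability
    proportional to the squared distance to its nearest already-chosen center.
    """
    n = X.shape[0]
    idx = [rng.integers(n)]
    d2 = np.sum((X - X[idx[0]]) ** 2, axis=1)
    for _ in range(1, k):
        if d2.sum() == 0:            # every point already coincides with a center
            break
        nxt = rng.choice(n, p=d2 / d2.sum())
        idx.append(nxt)
        d2 = np.minimum(d2, np.sum((X - X[nxt]) ** 2, axis=1))
    return X[idx]

rng = np.random.default_rng(1)
X = np.array([[0.0, 0.0], [0.0, 0.1], [100.0, 100.0], [100.0, 100.1]])
print(d2_sample_centers(X, 2, rng))  # with overwhelming probability, one center per cluster
```

Points far from the current centers dominate the distribution, which is why a few rounds of $D^2$-sampling tend to hit every well-separated cluster.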

Submitted 18 February, 2020; v1 submitted 16 September, 2019; originally announced September 2019.

Comments: Changes from previous version: (i) added a discussion on coresets, and (ii) fixed a few typos

arXiv:1907.09664 [pdf, other]

doi 10.1109/ACCESS.2020.3018682

Autonomous Probabilistic Coprocessing with Petaflips per Second

Authors: Brian Sutton, Rafatul Faria, Lakshmi A. Ghantasala, Risi Jaiswal, Kerem Y. Camsari, Supriyo Datta

Abstract: In this paper we present a concrete design for a probabilistic (p-) computer based on a network of p-bits, robust classical entities fluctuating between -1 and +1, with probabilities that are controlled through an input constructed from the outputs of other p-bits. The architecture of this probabilistic computer is similar to a stochastic neural network with the p-bit playing the role of a binary stochastic neuron, but with one key difference: there is no sequencer used to enforce an ordering of p-bit updates, as is typically required. Instead, we explore sequencerless designs where all p-bits are allowed to flip autonomously and demonstrate that such designs can allow ultrafast operation unconstrained by available clock speeds without compromising the solution's fidelity. Based on experimental results from a hardware benchmark of the autonomous design and benchmarked device models, we project that a nanomagnetic implementation can scale to achieve petaflips per second with millions of neurons. A key contribution of this paper is the focus on a hardware metric, flips per second, as a problem- and substrate-independent figure-of-merit for an emerging class of hardware annealers known as Ising Machines. Much like the shrinking feature sizes of transistors that have continually driven Moore's Law, we believe that flips per second can be continually improved in later technology generations of a wide class of probabilistic, domain specific hardware.
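A p-bit is commonly modeled in the p-bit literature as $m_i = \operatorname{sgn}[\tanh(I_i) - r]$ with $r$ uniform on $(-1, 1)$ and an input $I_i = \beta(\sum_j J_{ij} m_j + h_i)$ built from other p-bits' outputs, so that $P(m_i = +1) = (1 + \tanh I_i)/2$. That equation is an assumption here, not taken from the abstract. The NumPy sketch below uses sequential updates in a fixed order, i.e. the conventional sequencer that the paper's autonomous design removes; `J`, `h`, and `beta` are hypothetical parameters.

```python
import numpy as np

def pbit_sweeps(J, h, beta, sweeps, rng):
    """Sequentially update a network of p-bits m_i in {-1, +1}.

    m_i = sgn(tanh(I_i) - r), r ~ Uniform(-1, 1), I_i = beta * (J[i] @ m + h[i]).
    J is assumed symmetric with zero diagonal. Returns the state after each sweep.
    """
    n = len(h)
    m = rng.choice([-1, 1], size=n)
    samples = np.empty((sweeps, n), dtype=int)
    for s in range(sweeps):
        for i in range(n):                  # fixed update order (a sequencer)
            I_i = beta * (J[i] @ m + h[i])
            m[i] = 1 if np.tanh(I_i) > rng.uniform(-1.0, 1.0) else -1
        samples[s] = m
    return samples

rng = np.random.default_rng(0)
J = np.array([[0.0, 2.0], [2.0, 0.0]])      # ferromagnetic coupling
s = pbit_sweeps(J, np.zeros(2), beta=2.0, sweeps=2000, rng=rng)
print(np.mean(s[:, 0] == s[:, 1]))          # close to 1: coupled p-bits agree
```

In this sequential form the throughput is bounded by how fast the loop can visit each p-bit; counting accepted evaluations of the update rule per second is one way to read the abstract's flips-per-second metric.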

Submitted 22 August, 2020; v1 submitted 22 July, 2019; originally announced July 2019.

Comments: 13 pages, 8 figures, 1 table

Journal ref: IEEE Access (2020)

arXiv:1812.03385 [pdf]

Biometric Recognition System (Algorithm)

Authors: Rahul Kumar Jaiswal, Gaurav Saxena

Abstract: Fingerprints are the most widely deployed form of biometric identification. No two individuals share the same fingerprint because they have unique biometric identifiers. This paper presents an efficient fingerprint verification algorithm which improves matching accuracy. Fingerprint images get degraded and corrupted due to variations in skin and impression conditions. Thus, image enhancement techniques are employed prior to singular point detection and minutiae extraction. The singular point is the point of maximum curvature. It is determined from the normal of each fingerprint ridge, following the ridges inward towards the centre. The local ridge features known as minutiae are extracted using the crossing-number method to find ridge endings and ridge bifurcations. The proposed algorithm chooses a radius and draws a circle with the core point as centre, making fingerprint images rotationally invariant and uniform. The radius can be varied according to the accuracy required by the particular application. Morphological techniques such as clean, spur and H-break are employed to remove noise, followed by removal of spurious minutiae. Templates are created based on feature vector extraction, and databases are built for verification and identification from the fingerprint images of the Fingerprint Verification Competition (FVC2002). The minimum Euclidean distance between the saved template and the test fingerprint image template is calculated and compared with a set threshold to make the matching decision.
For the performance evaluation of the proposed algorithm, various measures (equal error rate (EER), Dmin at EER, accuracy and threshold) are evaluated and plotted. The measures demonstrate that the proposed algorithm is more effective and robust.
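The crossing-number step mentioned above can be sketched as follows, assuming a thinned (one-pixel-wide) binary skeleton stored as a list of lists of 0/1 values. The crossing number is half the number of 0/1 transitions around the 8 neighbours taken in circular order; in the usual convention CN = 1 marks a ridge ending and CN = 3 a bifurcation. The helper names are hypothetical, and border pixels are simply skipped.

```python
def crossing_number(skel, r, c):
    """CN = 0.5 * sum |P_i - P_(i+1)| over the 8 neighbours in circular order."""
    # E, NE, N, NW, W, SW, S, SE: one full loop around (r, c).
    offsets = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]
    p = [skel[r + dr][c + dc] for dr, dc in offsets]
    # The transition count around a closed loop is always even.
    return sum(abs(p[i] - p[(i + 1) % 8]) for i in range(8)) // 2


def find_minutiae(skel):
    """Return (row, col, kind) for ridge endings (CN == 1) and bifurcations (CN == 3)."""
    found = []
    for r in range(1, len(skel) - 1):
        for c in range(1, len(skel[0]) - 1):
            if skel[r][c] != 1:
                continue
            cn = crossing_number(skel, r, c)
            if cn == 1:
                found.append((r, c, "ending"))
            elif cn == 3:
                found.append((r, c, "bifurcation"))
    return found


# A horizontal ridge with one downward branch.
skel = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
minutiae = find_minutiae(skel)
```

On this toy skeleton the three free ends are reported as endings and the junction pixel as a bifurcation; real skeletons usually need the spurious-minutiae removal the abstract describes.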

Submitted 8 December, 2018; originally announced December 2018.

Comments: Conference

arXiv:1712.06865 [pdf, ps, other]

Approximate Correlation Clustering Using Same-Cluster Queries

Authors: Nir Ailon, Anup Bhattacharya, Ragesh Jaiswal

Abstract: Ashtiani et al. (NIPS 2016) introduced a semi-supervised framework for clustering (SSAC) where a learner is allowed to make same-cluster queries. More specifically, in their model, there is a query oracle that answers queries of the form "given any two vertices, do they belong to the same optimal cluster?". Ashtiani et al. showed the usefulness of such a query framework by giving a polynomial-time algorithm for the k-means clustering problem where the input dataset satisfies some separation condition. Ailon et al. extended the above work to the approximation setting by giving an efficient $(1+\varepsilon)$-approximation algorithm for k-means for any small $\varepsilon > 0$ and any dataset within the SSAC framework. In this work, we extend this line of study to the correlation clustering problem. Correlation clustering is a graph clustering problem where pairwise similarity (or dissimilarity) information is given for every pair of vertices and the objective is to partition the vertices into clusters that minimize disagreement (or maximize agreement) with the pairwise information given as input. These problems are popularly known as MinDisAgree and MaxAgree, and MinDisAgree[k] and MaxAgree[k] are the versions of these problems where the number of optimal clusters is at most k. There exist Polynomial Time Approximation Schemes (PTAS) for MinDisAgree[k] and MaxAgree[k] where the approximation guarantee is $(1+\varepsilon)$ for any small $\varepsilon$ and the running time is polynomial in the input parameters but exponential in k and $1/\varepsilon$.
We obtain a $(1+\varepsilon)$-approximation algorithm for any small $\varepsilon$ with running time that is polynomial in the input parameters and also in k and $1/\varepsilon$. We also give non-trivial upper and lower bounds on the number of same-cluster queries, the lower bound being based on the Exponential Time Hypothesis (ETH).
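To make the MinDisAgree objective concrete, here is a small sketch that counts disagreements for a given clustering and finds the exact MinDisAgree[k] optimum by exhaustive search. The data layout (a dict from vertex pairs to "+"/"-") and the function names are hypothetical, and the brute force is exponential, so it is only meant for tiny inputs; it is not the paper's algorithm.

```python
from itertools import combinations, product


def disagreements(labels, clustering):
    """Count MinDisAgree cost.

    labels: dict frozenset({u, v}) -> "+" (similar) or "-" (dissimilar), for every pair.
    clustering: dict vertex -> cluster id.
    A "+" pair split across clusters, or a "-" pair inside one cluster, is a disagreement.
    """
    cost = 0
    for u, v in combinations(sorted(clustering), 2):
        same = clustering[u] == clustering[v]
        sign = labels[frozenset((u, v))]
        if (sign == "+" and not same) or (sign == "-" and same):
            cost += 1
    return cost


def min_disagree_bruteforce(vertices, labels, k):
    """Exact MinDisAgree[k] by trying all k^n assignments (tiny inputs only)."""
    return min(
        disagreements(labels, dict(zip(vertices, assignment)))
        for assignment in product(range(k), repeat=len(vertices))
    )


labels = {
    frozenset(("a", "b")): "+",
    frozenset(("b", "c")): "+",
    frozenset(("a", "c")): "-",
}
```

The MaxAgree value of the same clustering is the number of pairs minus its disagreements; this inconsistent triangle forces at least one disagreement for any k.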

Submitted 19 December, 2017; originally announced December 2017.

Comments: To appear in LATIN 2018

arXiv:1704.05232 [pdf, other]

On the k-Means/Median Cost Function

Authors: Anup Bhattacharya, Yoav Freund, Ragesh Jaiswal

Abstract: In this work, we study the $k$-means cost function. Given a dataset $X \subseteq \mathbb{R}^d$ and an integer $k$, the goal of the Euclidean $k$-means problem is to find a set of $k$ centers $C \subseteq \mathbb{R}^d$ such that $\Phi(C, X) \equiv \sum_{x \in X} \min_{c \in C} \|x - c\|^2$ is minimized. Let $\Delta(X,k) \equiv \min_{C \subseteq \mathbb{R}^d, |C| = k} \Phi(C, X)$ denote the cost of the optimal $k$-means solution. For any dataset $X$, $\Delta(X,k)$ does not increase as $k$ increases. In this work, we try to understand this behaviour more precisely. For any dataset $X \subseteq \mathbb{R}^d$, integer $k \geq 1$, and a precision parameter $\varepsilon > 0$, let $L(X, k, \varepsilon)$ denote the smallest integer such that $\Delta(X, L(X, k, \varepsilon)) \leq \varepsilon \cdot \Delta(X,k)$. We show upper and lower bounds on this quantity. Our techniques generalize to the metric $k$-median problem in arbitrary metric spaces, and we give bounds in terms of the doubling dimension of the metric. Finally, we observe that for any dataset $X$, we can compute a set $S$ of size $O\left(L(X, k, \varepsilon/c)\right)$ using $D^2$-sampling such that $\Phi(S,X) \leq \varepsilon \cdot \Delta(X,k)$ for some fixed constant $c$. We also discuss some applications of our bounds.
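The definitions of $\Phi$, $\Delta$ and $L(X, k, \varepsilon)$ can be checked numerically on a toy dataset. This sketch computes $\Delta$ by brute force over all cluster assignments (for a fixed partition the best centers are the centroids), so it only works for a handful of points; the function names are hypothetical and this is not one of the paper's techniques.

```python
from itertools import product


def phi(C, X):
    """k-means cost: sum over x in X of the squared distance to its nearest center in C."""
    return sum(min(sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in C) for x in X)


def centroid(points):
    d = len(points[0])
    return tuple(sum(p[j] for p in points) / len(points) for j in range(d))


def delta(X, k):
    """Optimal k-means cost by exhaustive search over all k^n assignments (tiny X only)."""
    best = float("inf")
    for lab in product(range(k), repeat=len(X)):
        clusters = [[x for x, l in zip(X, lab) if l == j] for j in range(k)]
        centers = [centroid(cl) for cl in clusters if cl]  # skip empty clusters
        best = min(best, phi(centers, X))
    return best


def smallest_L(X, k, eps):
    """Smallest L >= 1 with delta(X, L) <= eps * delta(X, k)."""
    target = eps * delta(X, k)
    # delta(X, len(X)) == 0, so the search always terminates.
    return next(l for l in range(1, len(X) + 1) if delta(X, l) <= target + 1e-12)


X = [(0.0,), (0.0,), (1.0,), (10.0,), (10.0,), (11.0,)]
```

For this two-blob dataset, $\Delta(X,1) \approx 151.3$ and $\Delta(X,2) = 4/3$, so a 100-fold reduction of the 1-means cost is already reached at $L(X, 1, 0.01) = 2$.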

Submitted 9 September, 2021; v1 submitted 18 April, 2017; originally announced April 2017.

Comments: This update includes minor improvements and a new section on Dimension Estimation

ACM Class: I.5.3; H.3.3; F.2

arXiv:1704.01862 [pdf, ps, other]

Approximate Clustering with Same-Cluster Queries

Authors: Nir Ailon, Anup Bhattacharya, Ragesh Jaiswal, Amit Kumar

Abstract: Ashtiani et al. proposed a Semi-Supervised Active Clustering framework (SSAC), where the learner is allowed to make adaptive queries to a domain expert. The queries are of the kind "do two given points belong to the same optimal cluster?" There are many clustering contexts where such same-cluster queries are feasible. Ashtiani et al. exhibited the power of such queries by showing that any instance of the $k$-means clustering problem, with an additional margin assumption, can be solved efficiently if one is allowed $O(k^2 \log{k} + k \log{n})$ same-cluster queries. This is interesting since the $k$-means problem, even with the margin assumption, is $\mathsf{NP}$-hard. In this paper, we extend the work of Ashtiani et al. to the approximation setting, showing that a few such same-cluster queries enable one to get a polynomial-time $(1 + \varepsilon)$-approximation algorithm for the $k$-means problem without any margin assumption on the input dataset. Again, this is interesting since the $k$-means problem is $\mathsf{NP}$-hard to approximate within a factor $(1 + c)$ for a fixed constant $0 < c < 1$. The number of same-cluster queries used is $\textrm{poly}(k/\varepsilon)$, which is independent of the size $n$ of the dataset. Our algorithm is based on the $D^2$-sampling technique. We also give a conditional lower bound on the number of same-cluster queries, showing that if the Exponential Time Hypothesis (ETH) holds, then any such efficient query algorithm needs to make $\Omega\left(\frac{k}{\mathrm{poly}\log k}\right)$ same-cluster queries.
Our algorithm can be extended to the case when the oracle is faulty. Another result we show with respect to the $k$-means++ seeding algorithm is that a small modification to it within the SSAC framework converts it into a constant-factor approximation algorithm instead of the well-known $O(\log{k})$-approximation algorithm.
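As an illustration of how same-cluster queries can be spent (not the paper's algorithm, which combines queries with $D^2$-sampling), the sketch below groups a sample of points by comparing each new point with one representative of every group found so far. With at most $k$ optimal clusters this uses at most $k$ queries per sampled point, so the query count depends on $k$ and the sample size rather than on $n$. The oracle here is simulated, and the function name is hypothetical.

```python
def partition_with_queries(points, same_cluster):
    """Group points with a same-cluster oracle; return (groups, number of queries).

    Each point is compared against the first member of every existing group;
    it opens a new group if every answer is "no".
    """
    groups, queries = [], 0
    for p in points:
        for g in groups:
            queries += 1
            if same_cluster(p, g[0]):
                g.append(p)
                break
        else:  # no existing group matched
            groups.append([p])
    return groups, queries


# Simulated oracle: the hidden optimal clusters are {x < 5} and {x >= 5}.
oracle = lambda a, b: (a < 5) == (b < 5)
groups, queries = partition_with_queries([1, 2, 10, 11, 3], oracle)
```

Here five points are grouped into the two hidden clusters with five oracle calls.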

Submitted 4 October, 2017; v1 submitted 6 April, 2017; originally announced April 2017.

Comments: Updated version has results for faulty queries

arXiv:1504.02564 [pdf, ps, other]

Faster Algorithms for the Constrained k-means Problem

Authors: Anup Bhattacharya, Ragesh Jaiswal, Amit Kumar

Abstract: The classical center-based clustering problems such as $k$-means/median/center assume that the optimal clusters satisfy the locality property that the points in the same cluster are close to each other. A number of clustering problems arise in machine learning where the optimal clusters do not follow such a locality property. Consider a variant of the $k$-means problem that may be regarded as a general version of such problems. Here, the optimal clusters $O_1, \ldots, O_k$ are an arbitrary partition of the dataset and the goal is to output $k$ centers $c_1, \ldots, c_k$ such that the objective function $\sum_{i=1}^{k} \sum_{x \in O_i} \|x - c_i\|^2$ is minimized. It is not difficult to argue that any algorithm (without knowing the optimal clusters) that outputs a single set of $k$ centers will not behave well as far as optimizing the above objective function is concerned. However, this does not rule out the existence of algorithms that output a list of such sets of $k$ centers such that at least one of these sets behaves well. Given an error parameter $\varepsilon > 0$, let $\ell$ denote the size of the smallest list of $k$-center sets such that at least one of them gives a $(1+\varepsilon)$ approximation w.r.t. the objective function above. In this paper, we show an upper bound on $\ell$ by giving a randomized algorithm that outputs a list of $2^{\tilde{O}(k/\varepsilon)}$ $k$-center sets. We also give a closely matching lower bound of $2^{\tilde{\Omega}(k/\sqrt{\varepsilon})}$.
Moreover, our algorithm runs in time $O\left(nd \cdot 2^{\tilde{O}(k/\varepsilon)}\right)$. This is a significant improvement over the previous result of Ding and Xu, who gave an algorithm with running time $O\left(nd \cdot (\log{n})^{k} \cdot 2^{\mathrm{poly}(k/\varepsilon)}\right)$ and output a list of size $O\left((\log{n})^k \cdot 2^{\mathrm{poly}(k/\varepsilon)}\right)$.
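The constrained objective differs from ordinary $k$-means because each point is charged to the center of its own (given) cluster $O_i$, not to its nearest center. The sketch below evaluates that objective and picks the best entry from a list of candidate center sets. Two assumptions: since a list of centers carries no cluster labels, the cost is minimised over all matchings of centers to clusters; and the evaluation needs the partition, which the algorithm itself does not know, so this only illustrates what "at least one list entry is good" means. Function names are hypothetical.

```python
from itertools import permutations


def sq_dist(x, c):
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def constrained_cost(partition, centers):
    """sum_i sum_{x in O_i} ||x - c_i||^2, minimised over which center serves which O_i."""
    return min(
        sum(sum(sq_dist(x, c) for x in O) for O, c in zip(partition, perm))
        for perm in permutations(centers)
    )


def best_in_list(partition, candidate_lists):
    """Return the candidate k-center set with the lowest constrained cost."""
    return min(candidate_lists, key=lambda centers: constrained_cost(partition, centers))


# A non-local partition: the point (1, 0) sits between the two points of O_1.
partition = [[(0.0, 0.0), (2.0, 0.0)], [(1.0, 0.0)]]
candidates = [[(1.0, 0.0), (1.0, 0.0)], [(0.0, 0.0), (2.0, 0.0)]]
```

The first candidate (the two cluster centroids) costs 2 while the second costs 5, even though the second looks like a natural pair of centers for the raw point set.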

Submitted 10 April, 2015; originally announced April 2015.

arXiv:1407.1689 [pdf, other]

Sampling in Space Restricted Settings

Authors: Anup Bhattacharya, Davis Issac, Ragesh Jaiswal, Amit Kumar

Abstract: Space-efficient algorithms play a central role in dealing with large amounts of data. In such settings, one would like to analyse the large data using a small amount of "working space". One of the key steps in many algorithms for analysing large data is to maintain a random sample (or a small number of random samples) from the data points. In this paper, we consider two space-restricted settings: (i) the streaming model, where data arrives over time and one can use only a small amount of storage, and (ii) the query model, where we can structure the data in low space and answer sampling queries. We prove the following results in these two settings:
- In the streaming setting, we would like to maintain a random sample from the elements seen so far. We prove that one can maintain a random sample using $O(\log n)$ random bits and $O(\log n)$ space, where $n$ is the number of elements seen so far. We can extend this to the case when elements have weights as well.
- In the query model, there are $n$ elements with weights $w_1, \ldots, w_n$ (which are $w$-bit integers) and one would like to sample a random element with probability proportional to its weight. Bringmann and Larsen (STOC 2013) showed how to sample such an element using $nw + 1$ space (whereas the information-theoretic lower bound is $nw$). We consider the approximate sampling problem, where we are given an error parameter $\varepsilon$ and the sampling probability of an element can be off by an $\varepsilon$ factor. We give matching upper and lower bounds for this problem.
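For orientation, the streaming task can be illustrated with classical weighted reservoir sampling of size one: when item $i$ arrives it replaces the current sample with probability $w_i / (w_1 + \cdots + w_i)$, which keeps each item seen so far with probability proportional to its weight. This is the textbook baseline, not the paper's method; it draws a fresh floating-point random number per element rather than using only $O(\log n)$ random bits. The function name is hypothetical.

```python
import random


def weighted_stream_sample(stream, rng):
    """Keep one item from a stream of (item, weight) pairs, chosen with
    probability proportional to its weight among all items seen so far.
    With equal weights this is reservoir sampling of size one."""
    sample, total = None, 0
    for item, w in stream:
        total += w
        # Replace with probability w / total (always true for the first positive weight).
        if w > 0 and rng.random() * total < w:
            sample = item
    return sample


rng = random.Random(1)
counts = {"x": 0, "y": 0, "z": 0}
for _ in range(30000):
    counts[weighted_stream_sample([("x", 1), ("y", 1), ("z", 1)], rng)] += 1
```

With equal weights each of the three items should be chosen in roughly a third of the runs; only the running total and the current sample are stored.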

Submitted 15 January, 2015; v1 submitted 7 July, 2014; originally announced July 2014.

arXiv:1404.5169 [pdf, ps, other]

A note on the relation between XOR and Selective XOR Lemmas

Authors: Ragesh Jaiswal

Abstract: Given an unpredictable Boolean function $f: \{0, 1\}^n \rightarrow \{0, 1\}$, Yao's standard XOR lemma is a statement about the unpredictability of computing $\oplus_{i \in [k]}f(x_i)$ given $x_1, \ldots, x_k \in \{0, 1\}^n$, whereas the Selective XOR lemma is a statement about the unpredictability of computing $\oplus_{i \in S}f(x_i)$ given $x_1, \ldots, x_k \in \{0, 1\}^n$ and $S \subseteq \{1, \ldots, k\}$. We give a reduction from the Selective XOR lemma to the standard XOR lemma. Our reduction gives better quantitative bounds for certain choices of parameters and does not require the assumption of being able to sample $(x, f(x))$ pairs.

Submitted 15 August, 2019; v1 submitted 21 April, 2014; originally announced April 2014.

Comments: The previous version has been significantly simplified to highlight the main result

arXiv:1401.3685 [pdf, ps, other]

Improved analysis of D2-sampling based PTAS for k-means and other Clustering problems

Authors: Ragesh Jaiswal, Mehul Kumar, Pulkit Yadav

Abstract: We give an improved analysis of the simple $D^2$-sampling based PTAS for the $k$-means clustering problem given by Jaiswal, Kumar, and Sen (Algorithmica, 2013). The improvement on the running time is from $O\left(nd \cdot 2^{\tilde{O}(k^2/\varepsilon)}\right)$ to $O\left(nd \cdot 2^{\tilde{O}(k/\varepsilon)}\right)$.

Submitted 15 January, 2014; originally announced January 2014.

Comments: arXiv admin note: substantial text overlap with arXiv:1201.4206

arXiv:1401.2912 [pdf, other]

A tight lower bound instance for k-means++ in constant dimension

Authors: Anup Bhattacharya, Ragesh Jaiswal, Nir Ailon

Abstract: The k-means++ seeding algorithm is one of the most popular algorithms used for finding the initial $k$ centers when using the k-means heuristic. The algorithm is a simple sampling procedure and can be described as follows: pick the first center uniformly at random from the given points. For $i > 1$, pick a point to be the $i^{th}$ center with probability proportional to the square of the Euclidean distance of this point to the closest of the $(i-1)$ previously chosen centers. The k-means++ seeding algorithm is not only simple and fast but also gives an $O(\log{k})$ approximation in expectation, as shown by Arthur and Vassilvitskii. There are datasets on which this seeding algorithm gives an approximation factor of $\Omega(\log{k})$ in expectation. However, it is not clear from these results whether the algorithm achieves a good approximation factor with reasonably high probability (say $1/\mathrm{poly}(k)$). Brunsch and Röglin gave a dataset where the k-means++ seeding algorithm achieves an $O(\log{k})$ approximation ratio with probability that is exponentially small in $k$. However, this and all other known lower-bound examples are high dimensional. So, an open problem was to understand the behavior of the algorithm on low-dimensional datasets. In this work, we give a simple two-dimensional dataset on which the seeding algorithm achieves an $O(\log{k})$ approximation ratio with probability exponentially small in $k$. This solves open problems posed by Mahajan et al. and by Brunsch and Röglin.
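The seeding procedure described above ($D^2$-sampling) can be sketched in a few lines. Points are tuples, `rng` is a `random.Random` instance, and the function names are hypothetical. The early exit when every squared distance is zero is a design choice of this sketch: once all points coincide with chosen centers there is nothing left to sample, so fewer than $k$ centers are returned.

```python
import random


def sq_dist(x, c):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def kmeans_pp_seeding(points, k, rng):
    """k-means++ seeding: first center uniform, then each next center drawn with
    probability proportional to its squared distance to the nearest chosen center."""
    centers = [rng.choice(points)]
    # d2[j] = squared distance from points[j] to its closest chosen center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:  # every point already coincides with a center
            break
        new = rng.choices(points, weights=d2)[0]
        centers.append(new)
        d2 = [min(old, sq_dist(p, new)) for p, old in zip(points, d2)]
    return centers


pts = [(0.0, 0.0), (0.0, 0.0), (100.0, 0.0), (100.0, 0.0)]
centers = kmeans_pp_seeding(pts, 2, random.Random(0))
```

On this two-location example the second center always lands in the other location, because points at the already-chosen location have sampling weight zero.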

Submitted 13 January, 2014; v1 submitted 13 January, 2014; originally announced January 2014.

Comments: To appear in TAMC 2014. arXiv admin note: text overlap with arXiv:1306.4207

arXiv:1308.1351

An $O^*(1.0821^n)$-Time Algorithm for Computing Maximum Independent Set in Graphs with Bounded Degree 3

Authors: Davis Issac, Ragesh Jaiswal

Abstract: We give an $O^*(1.0821^n)$-time, polynomial-space algorithm for computing Maximum Independent Set in graphs with bounded degree 3. This improves on all previously known running-time bounds for the problem.

Submitted 17 June, 2022; v1 submitted 6 August, 2013; originally announced August 2013.

Comments: While working on an updated version, we observed a bug in one of the cases of our extensive case analysis. We are withdrawing this paper while we work to fix the bug. We will add an updated version once we manage to fix the bug

arXiv:1306.4207 [pdf, other]

A bad 2-dimensional instance for k-means++

Authors: Ragesh Jaiswal, Prachi Jain, Saumya Yadav

Abstract: The k-means++ seeding algorithm is one of the most popular algorithms used for finding the initial $k$ centers when using the k-means heuristic. The algorithm is a simple sampling procedure and can be described as follows: "Pick the first center randomly from among the given points. For $i > 1$, pick a point to be the $i^{th}$ center with probability proportional to the square of the Euclidean distance of this point to the closest of the $(i-1)$ previously chosen centers." The k-means++ seeding algorithm is not only simple and fast but also gives an $O(\log{k})$ approximation in expectation, as shown by Arthur and Vassilvitskii \cite{av07}. There are datasets \cite{av07,adk09} on which this seeding algorithm gives an approximation factor $\Omega(\log{k})$ in expectation. However, it is not clear from these results whether the algorithm achieves a good approximation factor with reasonably large probability (say $1/\mathrm{poly}(k)$). Brunsch and Röglin \cite{br11} gave a dataset where the k-means++ seeding algorithm achieves an approximation ratio of $(2/3 - \varepsilon)\cdot \log{k}$ only with probability that is exponentially small in $k$. However, this and all other known lower-bound examples \cite{av07,adk09} are high dimensional. So, an open problem is to understand the behavior of the algorithm on low-dimensional datasets. In this work, we give a simple two-dimensional dataset on which the seeding algorithm achieves an approximation ratio $c$ (for some universal constant $c$) only with probability exponentially small in $k$.
This is the first step towards solving open problems posed by Mahajan et al. \cite{mnv12} and by Brunsch and Röglin \cite{br11}.

Submitted 18 June, 2013; originally announced June 2013.

arXiv:1305.5750 [pdf]

doi 10.5121/ijbb.2013.3103

Reconstruction and Analysis of Cancer-specific Gene Regulatory Networks from Gene Expression Profiles

Authors: Khalid Raza, Rajni Jaiswal

Abstract: The main goal of Systems Biology research is to reconstruct biological networks for topological analysis, so that the reconstructed networks can be used for the identification of various kinds of disease. The availability of high-throughput data generated by microarray experiments has led researchers to use whole-genome gene expression profiles to understand cancer and to reconstruct key cancer-specific gene regulatory networks. Researchers are now taking a keen interest in developing algorithms for reconstructing gene regulatory networks from whole-genome expression profiles. In this study, a cancer-specific gene regulatory network (prostate cancer) has been constructed using a simple and novel statistics-based approach. First, genes that are significantly differentially expressed in the disease condition have been identified using a two-stage filtering approach (t-test and fold-change measure). Next, regulatory relationships between the identified genes have been computed using the Pearson correlation coefficient. The obtained results have been validated against the available databases and literature. We obtained a cancer-specific regulatory network of 29 genes with a total of 55 regulatory relations, in which some of the genes have been identified as hub genes that can act as drug targets for cancer diagnosis.
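The pipeline described above (t-test filter, fold-change filter, then Pearson correlation between surviving genes) can be sketched as follows. Several details are assumptions not given in the abstract: Welch's unequal-variance t statistic with a fixed cutoff stands in for a p-value test, fold change is the ratio of mean expressions in either direction, correlations are computed on the disease samples only, and the thresholds `t_min`, `fc_min` and `r_min` are hypothetical.

```python
import math
from itertools import combinations


def welch_t(a, b):
    """Welch's two-sample t statistic (unequal variances)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def build_network(tumor, normal, t_min=3.0, fc_min=2.0, r_min=0.9):
    """tumor/normal: dict gene -> list of positive expression values (hypothetical layout)."""
    kept = []
    for g in tumor:
        if abs(welch_t(tumor[g], normal[g])) < t_min:  # stage 1: t-test filter
            continue
        fc = (sum(tumor[g]) / len(tumor[g])) / (sum(normal[g]) / len(normal[g]))
        if max(fc, 1 / fc) >= fc_min:  # stage 2: fold change, up or down
            kept.append(g)
    # Link pairs of kept genes whose expression is strongly correlated.
    edges = [(g, h) for g, h in combinations(kept, 2)
             if abs(pearson(tumor[g], tumor[h])) >= r_min]
    return kept, edges


tumor = {"G1": [10, 12, 14, 16], "G2": [20, 24, 28, 32], "G3": [5, 5.5, 4.5, 5]}
normal = {"G1": [2, 3, 2, 3], "G2": [5, 6, 5, 6], "G3": [5, 4.5, 5.5, 5]}
kept, edges = build_network(tumor, normal)
```

In this toy data G3 is not differentially expressed and is filtered out, while G1 and G2 survive both filters and are linked because their tumor profiles are perfectly correlated.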

Submitted 30 June, 2013; v1 submitted 23 May, 2013; originally announced May 2013.

Comments: 10 pages, 1 figure, 2 tables

Journal ref: International Journal on Bioinformatics & Biosciences (IJBB), 3(2):25-34, June 2013

arXiv:1202.6680 [pdf, other]

On the Distribution of the Fourier Spectrum of Halfspaces

Authors: Ilias Diakonikolas, Ragesh Jaiswal, Rocco A. Servedio, Li-Yang Tan, Andrew Wan

Abstract: Bourgain showed that any noise stable Boolean function $f$ can be well-approximated by a junta. In this note we give an exponential sharpening of the parameters of Bourgain's result under the additional assumption that $f$ is a halfspace.
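To make "Fourier spectrum of a halfspace" concrete, this sketch computes every Fourier coefficient $\hat{f}(S) = \mathbb{E}_x[f(x) \prod_{i \in S} x_i]$ of a Boolean function on $\{-1,1\}^n$ by brute force under the uniform distribution. The helper names and the sign convention $\mathrm{sign}(0) = +1$ are assumptions of the sketch; it enumerates all $2^n$ inputs, so it is only for small $n$.

```python
from itertools import combinations, product


def fourier_coefficients(f, n):
    """Brute-force Fourier coefficients of f: {-1,1}^n -> {-1,1}, keyed by index tuple S."""
    points = list(product((-1, 1), repeat=n))
    coeffs = {}
    for size in range(n + 1):
        for S in combinations(range(n), size):
            total = 0
            for x in points:
                chi = 1  # character chi_S(x) = prod_{i in S} x_i
                for i in S:
                    chi *= x[i]
                total += f(x) * chi
            coeffs[S] = total / len(points)
    return coeffs


def halfspace(w, theta):
    """h(x) = sign(w . x - theta), with sign(0) taken as +1."""
    return lambda x: 1 if sum(wi * xi for wi, xi in zip(w, x)) - theta >= 0 else -1


coeffs = fourier_coefficients(halfspace([1, 1, 1], 0), 3)  # majority of 3 bits
```

For 3-bit majority this recovers the identity $\mathrm{Maj}_3(x) = \tfrac12(x_1 + x_2 + x_3) - \tfrac12 x_1 x_2 x_3$, and by Parseval the squared coefficients sum to 1.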

Submitted 29 February, 2012; originally announced February 2012.

arXiv:1201.4206 [pdf, ps, other]

A simple D^2-sampling based PTAS for k-means and other Clustering Problems

Authors: Ragesh Jaiswal, Amit Kumar, Sandeep Sen

Abstract: Given a set of points $P \subset \mathbb{R}^d$, the $k$-means clustering problem is to find a set of $k$ centers $C = \{c_1, \ldots, c_k\}$, $c_i \in \mathbb{R}^d$, such that the objective function $\sum_{x \in P} d(x,C)^2$, where $d(x,C)$ denotes the distance between $x$ and the closest center in $C$, is minimized. This is one of the most prominent objective functions that have been studied with respect to clustering. $D^2$-sampling \cite{ArthurV07} is a simple non-uniform sampling technique for choosing points from a set of points. It works as follows: given a set of points $P \subseteq \mathbb{R}^d$, the first point is chosen uniformly at random from $P$. Subsequently, a point from $P$ is chosen as the next sample with probability proportional to the square of the distance of this point to the nearest previously sampled point. $D^2$-sampling has been shown to have nice properties with respect to the $k$-means clustering problem. Arthur and Vassilvitskii \cite{ArthurV07} show that $k$ points chosen as centers from $P$ using $D^2$-sampling give an $O(\log{k})$ approximation in expectation. Ailon et al. \cite{AJMonteleoni09} and Aggarwal et al. \cite{AggarwalDK09} extended the results of \cite{ArthurV07} to show that $O(k)$ points chosen as centers using $D^2$-sampling give an $O(1)$ approximation to the $k$-means objective function with high probability. In this paper, we further demonstrate the power of $D^2$-sampling by giving a simple randomized $(1 + \varepsilon)$-approximation algorithm that uses $D^2$-sampling at its core.

Submitted 20 January, 2012; originally announced January 2012.

ACM Class: I.5.3

arXiv:0902.3757 [pdf, ps, other]

Bounded Independence Fools Halfspaces

Authors: Ilias Diakonikolas, Parikshit Gopalan, Ragesh Jaiswal, Rocco Servedio, Emanuele Viola

Abstract: We show that any distribution on $\{-1,1\}^n$ that is $k$-wise independent fools any halfspace $h$ with error $\varepsilon$ for $k = O(\log^2(1/\varepsilon)/\varepsilon^2)$. Up to logarithmic factors, our result matches a lower bound by Benjamini, Gurel-Gurevich, and Peled (2007) showing that $k = \Omega(1/(\varepsilon^2 \cdot \log(1/\varepsilon)))$ is necessary. Using standard constructions of $k$-wise independent distributions, we obtain the first explicit pseudorandom generators $G: \{-1,1\}^s \to \{-1,1\}^n$ that fool halfspaces. Specifically, we fool halfspaces with error $\varepsilon$ and seed length $s = k \log n = O(\log n \cdot \log^2(1/\varepsilon)/\varepsilon^2)$. Our approach combines classical tools from real approximation theory with structural results on halfspaces by Servedio (Computational Complexity 2007).

Submitted 21 February, 2009; originally announced February 2009.

Showing 1–46 of 46 results for author: Jaiswal, R