
Optimal Workload Placement on Multi-Instance GPUs

Authors: Bekir Turkkan, Pavankumar Murali, Pavithra Harsha, Rohan Arora, Gerard Vanloo, Chandra Narayanaswami

Abstract: There is an urgent and pressing need to optimize usage of Graphical Processing Units (GPUs), which have arguably become one of the most expensive and sought-after IT resources. To help with this goal, several of the current generation of GPUs support a partitioning feature, called Multi-Instance GPU (MIG), to allow multiple workloads to share a GPU, albeit with some constraints. In this paper we investigate how to optimize the placement of Large Language Model (LLM)-based AI Inferencing workloads on GPUs. We first identify and present several use cases that are encountered in practice that require workloads to be efficiently placed or migrated to other GPUs to make room for incoming workloads. The overarching goal is to use as few GPUs as possible and to further minimize memory and compute wastage on GPUs that are utilized. We have developed two approaches to address this problem: an optimization method and a heuristic method. We benchmark these with two workload scheduling heuristics for multiple use cases. Our results show up to 2.85x improvement in the number of GPUs used and up to 70% reduction in GPU wastage over baseline heuristics. We plan to enable the SRE community to leverage our proposed method in production environments.

Submitted 10 September, 2024; originally announced September 2024.

Comments: 14 pages

arXiv:2307.16254 [pdf, other]

Touch if it's transparent! ACTOR: Active Tactile-based Category-Level Transparent Object Reconstruction

Authors: Prajval Kumar Murali, Bernd Porr, Mohsen Kaboli

Abstract: Accurate shape reconstruction of transparent objects is a challenging task due to their non-Lambertian surfaces and yet necessary for robots for accurate pose perception and safe manipulation. As vision-based sensing can produce erroneous measurements for transparent objects, the tactile modality is not sensitive to object transparency and can be used for reconstructing the object's shape. We propose ACTOR, a novel framework for ACtive tactile-based category-level Transparent Object Reconstruction. ACTOR leverages large datasets of synthetic objects with our proposed self-supervised learning approach for object shape reconstruction as the collection of real-world tactile data is prohibitively expensive. ACTOR can be used during inference with tactile data from category-level unknown transparent objects for reconstruction. Furthermore, we propose an active-tactile object exploration strategy as probing every part of the object surface can be sample inefficient. We also demonstrate tactile-based category-level object pose estimation task using ACTOR. We perform an extensive evaluation of our proposed methodology with real-world robotic experiments with comprehensive comparison studies with state-of-the-art approaches. Our proposed method outperforms these approaches in terms of tactile-based object reconstruction and object pose estimation.

Submitted 30 July, 2023; originally announced July 2023.

Comments: Accepted for publication at IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2023)

arXiv:2303.04032 [pdf, other]

doi 10.1109/ICRA48891.2023.10161215

GMCR: Graph-based Maximum Consensus Estimation for Point Cloud Registration

Authors: Michael Gentner, Prajval Kumar Murali, Mohsen Kaboli

Abstract: Point cloud registration is a fundamental and challenging problem for autonomous robots interacting in unstructured environments for applications such as object pose estimation, simultaneous localization and mapping, robot-sensor calibration, and so on. In global correspondence-based point cloud registration, data association is a highly brittle task and commonly produces high amounts of outliers. Failure to reject outliers can lead to errors propagating to downstream perception tasks. Maximum Consensus (MC) is a widely used technique for robust estimation, which is however known to be NP-hard. Exact methods struggle to scale to realistic problem instances, whereas high outlier rates are challenging for approximate methods. To this end, we propose Graph-based Maximum Consensus Registration (GMCR), which is highly robust to outliers and scales to realistic problem instances. We propose novel consensus functions to map the decoupled MC-objective to the graph domain, wherein we find a tight approximation to the maximum consensus set as the maximum clique. The final pose estimate is given in closed-form. We extensively evaluated our proposed GMCR on a synthetic registration benchmark, robotic object localization task, and additionally on a scan matching benchmark. Our proposed method shows high accuracy and time efficiency compared to other state-of-the-art MC methods and compares favorably to other robust registration methods.

Submitted 28 September, 2023; v1 submitted 7 March, 2023; originally announced March 2023.

Comments: Accepted at ICRA 2023

arXiv:2211.07629 [pdf, other]

Assessing requirements to scale to practical quantum advantage

Authors: Michael E. Beverland, Prakash Murali, Matthias Troyer, Krysta M. Svore, Torsten Hoefler, Vadym Kliuchnikov, Guang Hao Low, Mathias Soeken, Aarthi Sundaram, Alexander Vaschillo

Abstract: While quantum computers promise to solve some scientifically and commercially valuable problems thought intractable for classical machines, delivering on this promise will require a large-scale quantum machine. Understanding the impact of architecture design choices for a scaled quantum stack for specific applications, prior to full realization of the quantum system, is an important open challenge. To this end, we develop a framework for quantum resource estimation, abstracting the layers of the stack, to estimate resources required across these layers for large-scale quantum applications. Using a tool that implements this framework, we assess three scaled quantum applications and find that hundreds of thousands to millions of physical qubits are needed to achieve practical quantum advantage. We identify three qubit parameters, namely size, speed, and controllability, that are critical at scale to rendering these applications practical. A goal of our work is to accelerate progress towards practical quantum advantage by enabling the broader community to explore design choices across the stack, from algorithms to qubits.

Submitted 14 November, 2022; originally announced November 2022.

arXiv:2210.12446 [pdf, other]

Learning Classifiers for Imbalanced and Overlapping Data

Authors: Shivaditya Shivganesh, Nitin Narayanan N, Pranav Murali, Ajaykumar M

Abstract: This study is about inducing classifiers using data that is imbalanced, with a minority class being under-represented in relation to the majority classes. The first section of this research focuses on the main characteristics of data that generate this problem. Following a study of previous, relevant research, a variety of artificial, imbalanced data sets influenced by important elements were created. These data sets were used to create decision trees and rule-based classifiers. The second section of this research looks into how to improve classifiers by pre-processing data with resampling approaches. The results of the following trials are compared to the performance of distinct pre-processing re-sampling methods: two variants of random over-sampling and focused under-sampling NCR. This paper further optimises class imbalance with a new method called Sparsity. The data is made more sparse from its class centers, hence making it more homogenous.

Submitted 22 October, 2022; originally announced October 2022.

arXiv:2205.04697 [pdf, other]

An Empirical Evaluation of Various Information Gain Criteria for Active Tactile Action Selection for Pose Estimation

Authors: Prajval Kumar Murali, Ravinder Dahiya, Mohsen Kaboli

Abstract: Accurate object pose estimation using multi-modal perception such as visual and tactile sensing have been used for autonomous robotic manipulators in literature. Due to variation in density of visual and tactile data, we previously proposed a novel probabilistic Bayesian filter-based approach termed translation-invariant Quaternion filter (TIQF) for pose estimation. As tactile data collection is time consuming, active tactile data collection is preferred by reasoning over multiple potential actions for maximal expected information gain. In this paper, we empirically evaluate various information gain criteria for action selection in the context of object pose estimation. We demonstrate the adaptability and effectiveness of our proposed TIQF pose estimation approach with various information gain criteria. We find similar performance in terms of pose accuracy with sparse measurements across all the selected criteria.

Submitted 10 May, 2022; originally announced May 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2109.13540

arXiv:2205.03654 [pdf, other]

Towards Robust 3D Object Recognition with Dense-to-Sparse Deep Domain Adaptation

Authors: Prajval Kumar Murali, Cong Wang, Ravinder Dahiya, Mohsen Kaboli

Abstract: Three-dimensional (3D) object recognition is crucial for intelligent autonomous agents such as autonomous vehicles and robots alike to operate effectively in unstructured environments. Most state-of-art approaches rely on relatively dense point clouds and performance drops significantly for sparse point clouds. Unsupervised domain adaption allows to minimise the discrepancy between dense and sparse point clouds with minimal unlabelled sparse point clouds, thereby saving additional sparse data collection, annotation and retraining costs. In this work, we propose a novel method for point cloud based object recognition with competitive performance with state-of-art methods on dense and sparse point clouds while being trained only with dense point clouds.

Submitted 7 May, 2022; originally announced May 2022.

arXiv:2203.13260 [pdf, other]

Adaptive job and resource management for the growing quantum cloud

Authors: Gokul Subramanian Ravi, Kaitlin N. Smith, Prakash Murali, Frederic T. Chong

Abstract: As the popularity of quantum computing continues to grow, efficient quantum machine access over the cloud is critical to both academic and industry researchers across the globe. And as cloud quantum computing demands increase exponentially, the analysis of resource consumption and execution characteristics are key to efficient management of jobs and resources at both the vendor-end as well as the client-end. While the analysis and optimization of job / resource consumption and management are popular in the classical HPC domain, it is severely lacking for more nascent technology like quantum computing. This paper proposes optimized adaptive job scheduling to the quantum cloud taking note of primary characteristics such as queuing times and fidelity trends across machines, as well as other characteristics such as quality of service guarantees and machine calibration constraints. Key components of the proposal include a) a prediction model which predicts fidelity trends across machines based on compiled circuit features such as circuit depth and different forms of errors, as well as b) queuing time prediction for each machine based on execution time estimations. Overall, this proposal is evaluated on simulated IBM machines across a diverse set of quantum applications and system loading scenarios, and is able to reduce wait times by over 3x and improve fidelity by over 40% on specific use cases, when compared to traditional job schedulers.

Submitted 24 March, 2022; originally announced March 2022.

Comments: Appeared at the 2021 IEEE International Conference on Quantum Computing and Engineering. arXiv admin note: substantial text overlap with arXiv:2203.13121

arXiv:2202.02207 [pdf, other]

doi 10.1109/LRA.2022.3150045

Active Visuo-Tactile Interactive Robotic Perception for Accurate Object Pose Estimation in Dense Clutter

Authors: Prajval Kumar Murali, Anirvan Dutta, Michael Gentner, Etienne Burdet, Ravinder Dahiya, Mohsen Kaboli

Abstract: This work presents a novel active visuo-tactile based framework for robotic systems to accurately estimate pose of objects in dense cluttered environments. The scene representation is derived using a novel declutter graph (DG) which describes the relationship among objects in the scene for decluttering by leveraging semantic segmentation and grasp affordances networks. The graph formulation allows robots to efficiently declutter the workspace by autonomously selecting the next best object to remove and the optimal action (prehensile or non-prehensile) to perform. Furthermore, we propose a novel translation-invariant Quaternion filter (TIQF) for active vision and active tactile based pose estimation. Both active visual and active tactile points are selected by maximizing the expected information gain. We evaluate our proposed framework on a system with two robots coordinating on randomized scenes of dense cluttered objects and perform ablation studies with static vision and active vision based estimation prior and post decluttering as baselines. Our proposed active visuo-tactile interactive perception framework shows up to 36% improvement in pose accuracy compared to the active vision baseline.

Submitted 4 February, 2022; originally announced February 2022.

Comments: Accepted for publication at IEEE Robotics and Automation Letters and IEEE International Conference on Robotics and Automation (ICRA) 2022

arXiv:2109.13540 [pdf, other]

Comparison of Information-Gain Criteria for Action Selection

Authors: Prajval Kumar Murali, Mohsen Kaboli

Abstract: Accurate object pose estimation using multi-modal perception such as visual and tactile sensing have been used for autonomous robotic manipulators in literature. Due to variation in density of visual and tactile data, a novel probabilistic Bayesian filter-based approach termed translation-invariant Quaternion filter (TIQF) is proposed for pose estimation using point cloud registration. Active tactile data collection is preferred by reasoning over multiple potential actions for maximal expected information gain as tactile data collection is time consuming. In this paper, we empirically evaluate various information gain criteria for action selection in the context of object pose estimation. We demonstrate the adaptability and effectiveness of our proposed TIQF pose estimation approach with various information gain criteria. We find similar performance in terms of pose accuracy with sparse measurements (<15 points) across all the selected criteria. Furthermore, we explore the use of uncommon information theoretic criteria in the robotics domain for action selection.

Submitted 7 May, 2022; v1 submitted 28 September, 2021; originally announced September 2021.

Comments: arXiv admin note: substantial text overlap with arXiv:2108.04015

arXiv:2108.04015 [pdf, other]

Active Visuo-Tactile Point Cloud Registration for Accurate Pose Estimation of Objects in an Unknown Workspace

Authors: Prajval Kumar Murali, Michael Gentner, Mohsen Kaboli

Abstract: This paper proposes a novel active visuo-tactile based methodology wherein the accurate estimation of the time-invariant SE(3) pose of objects is considered for autonomous robotic manipulators. The robot equipped with tactile sensors on the gripper is guided by a vision estimate to actively explore and localize the objects in the unknown workspace. The robot is capable of reasoning over multiple potential actions, and execute the action to maximize information gain to update the current belief of the object. We formulate the pose estimation process as a linear translation invariant quaternion filter (TIQF) by decoupling the estimation of translation and rotation and formulating the update and measurement model in linear form. We perform pose estimation sequentially on acquired measurements using very sparse point cloud as acquiring each measurement using tactile sensing is time consuming. Furthermore, our proposed method is computationally efficient to perform an exhaustive uncertainty-based active touch selection strategy in real-time without the need for trading information gain with execution time. We evaluated the performance of our approach extensively in simulation and by a robotic system.

Submitted 9 August, 2021; originally announced August 2021.

Comments: Accepted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2021)

arXiv:2106.15490 [pdf, other]

doi 10.1109/ISCA52012.2021.00071

Designing calibration and expressivity-efficient instruction sets for quantum computing

Authors: Prakash Murali, Lingling Lao, Margaret Martonosi, Dan Browne

Abstract: Near-term quantum computing (QC) systems have limited qubit counts, high gate (instruction) error rates, and typically support a minimal instruction set having one type of two-qubit gate (2Q). To reduce program instruction counts and improve application expressivity, vendors have proposed, and shown proof-of-concept demonstrations of richer instruction sets such as XY gates (Rigetti) and fSim gates (Google). These instruction sets comprise of families of 2Q gate types parameterized by continuous qubit rotation angles. However, having such a large number of gate types is problematic because each gate type has to be calibrated periodically, across the full system, to obtain high fidelity implementations. This results in substantial recurring calibration overheads even on current systems which use only a few gate types. Our work aims to navigate this tradeoff between application expressivity and calibration overhead, and identify what instructions vendors should implement to get the best expressivity with acceptable calibration time. We develop NuOp, a flexible compilation pass based on numerical optimization, to efficiently decompose application operations into arbitrary hardware gate types. Using NuOp and four important quantum applications, we study the instruction set proposals of Rigetti and Google, with realistic noise simulations and a calibration model. Our experiments show that implementing 4-8 types of 2Q gates is sufficient to attain nearly the same expressivity as a full continuous gate family, while reducing the calibration overhead by two orders of magnitude. With several vendors proposing rich gate families as means to higher fidelity, our work has potential to provide valuable instruction set design guidance for near-term QC systems.

Submitted 29 June, 2021; originally announced June 2021.

Comments: 14 pages, 11 figures

Journal ref: 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA):846-859

arXiv:2103.11981 [pdf, other]

doi 10.1109/ICRA48506.2021.9561055

In Situ Translational Hand-Eye Calibration of Laser Profile Sensors using Arbitrary Objects

Authors: Prajval Kumar Murali, Ines Sorrentino, Angelo Rendiniello, Claudio Fantacci, Enrico Villagrossi, Andrea Polo, Alessandro Ardesi, Marco Maggiali, Lorenzo Natale, Daniele Pucci, Silvio Traversaro

Abstract: Hand-eye calibration of laser profile sensors is the process of extracting the homogeneous transformation between the laser profile sensor frame and the end-effector frame of a robot in order to express the data extracted by the sensor in the robot's global coordinate system. For laser profile scanners this is a challenging procedure, as they provide data only in two dimensions and state-of-the-art calibration procedures require the use of specialised calibration targets. This paper presents a novel method to extract the translation-part of the hand-eye calibration matrix with rotation-part known a priori in a target-agnostic way. Our methodology is applicable to any 2D image or 3D object as a calibration target and can also be performed in situ in the final application. The method is experimentally validated on a real robot-sensor setup with 2D and 3D targets.

Submitted 22 March, 2021; originally announced March 2021.

Comments: The first two authors contributed equally to this work. Accepted to the IEEE International Conference on Robotics and Automation (ICRA) 2021

Journal ref: IEEE International Conference on Robotics and Automation, 2021

arXiv:2101.12284 [pdf]

doi 10.1145/3411764.3445235

AffectiveSpotlight: Facilitating the Communication of Affective Responses from Audience Members during Online Presentations

Authors: Prasanth Murali, Javier Hernandez, Daniel McDuff, Kael Rowan, Jina Suh, Mary Czerwinski

Abstract: The ability to monitor audience reactions is critical when delivering presentations. However, current videoconferencing platforms offer limited solutions to support this. This work leverages recent advances in affect sensing to capture and facilitate communication of relevant audience signals. Using an exploratory survey (N = 175), we assessed the most relevant audience responses such as confusion, engagement, and head-nods. We then implemented AffectiveSpotlight, a Microsoft Teams bot that analyzes facial responses and head gestures of audience members and dynamically spotlights the most expressive ones. In a within-subjects study with 14 groups (N = 117), we observed that the system made presenters significantly more aware of their audience, speak for a longer period of time, and self-assess the quality of their talk more similarly to the audience members, compared to two control conditions (randomly-selected spotlight and default platform UI). We provide design recommendations for future affective interfaces for online presentations based on feedback from the study.

Submitted 28 January, 2021; originally announced January 2021.

arXiv:2012.04156 [pdf]

doi 10.1109/RAICS51191.2020.9332470

An Efficient Analyses of the Behavior of One Dimensional Chaotic Maps using 0-1 Test and Three State Test

Authors: Joan S. Muthu, Aditya Jyoti Paul, P. Murali

Abstract: In this paper, a rigorous analysis of the behavior of the standard logistic map, Logistic Tent system (LTS), Logistic-Sine system (LSS) and Tent-Sine system (TSS) is performed using 0-1 test and three state test (3ST). In this work, it has been proved that the strength of the chaotic behavior is not uniform. Through extensive experiment and analysis, the strong and weak chaotic regions of LTS, LSS and TSS have been identified. This would enable researchers using these maps, to have better choices of control parameters as key values, for stronger encryption. In addition, this paper serves as a precursor to stronger testing practices in cryptosystem research, as Lyapunov exponent alone has been shown to fail as a true representation of the chaotic nature of a map.

Submitted 13 February, 2021; v1 submitted 7 December, 2020; originally announced December 2020.

Comments: 6 pages, Published in IEEE RAICS 2020, see https://www.raics.in

MSC Class: 37H20; 34F10; 34H10; 49J15; 49K15; 47J15

ACM Class: G.1.0; G.1.2; G.1.3; G.2.3; G.4; C.3; E.3; I.6.4

Journal ref: 2020 IEEE Recent Advances in Intelligent Computational Systems (RAICS), 2020, pp. 125-130

arXiv:2011.03375 [pdf, other]

A Scalable MIP-based Method for Learning Optimal Multivariate Decision Trees

Authors: Haoran Zhu, Pavankumar Murali, Dzung T. Phan, Lam M. Nguyen, Jayant R. Kalagnanam

Abstract: Several recent publications report advances in training optimal decision trees (ODT) using mixed-integer programs (MIP), due to algorithmic advances in integer programming and a growing interest in addressing the inherent suboptimality of heuristic approaches such as CART. In this paper, we propose a novel MIP formulation, based on a 1-norm support vector machine model, to train a multivariate ODT for classification problems. We provide cutting plane techniques that tighten the linear relaxation of the MIP formulation, in order to improve run times to reach optimality. Using 36 data-sets from the University of California Irvine Machine Learning Repository, we demonstrate that our formulation outperforms its counterparts in the literature by an average of about 10% in terms of mean out-of-sample testing accuracy across the data-sets. We provide a scalable framework to train multivariate ODT on large data-sets by introducing a novel linear programming (LP) based data selection method to choose a subset of the data for training. Our method is able to routinely handle large data-sets with more than 7,000 sample points and outperform heuristics methods and other MIP based techniques. We present results on data-sets containing up to 245,000 samples. Existing MIP-based methods do not scale well on training data-sets beyond 5,500 samples.

Submitted 6 November, 2020; originally announced November 2020.

arXiv:2008.08292 [pdf, other]

End-to-End Predictions-Based Resource Management Framework for Supercomputer Jobs

Authors: Swetha Hariharan, Prakash Murali, Abhishek Pasari, Sathish Vadhiyar

Abstract: Job submissions of parallel applications to production supercomputer systems will have to be carefully tuned in terms of the job submission parameters to obtain minimum response times. In this work, we have developed an end-to-end resource management framework that uses predictions of queue waiting and execution times to minimize response times of user jobs submitted to supercomputer systems. Our method for predicting queue waiting times adaptively chooses a prediction method based on the cluster structure of similar jobs. Our strategy for execution time predictions dynamically learns the impact of load on execution times and uses this to predict a set of execution time ranges for the target job. We have developed two resource management techniques that employ these predictions, one that selects the number of processors for execution and the other that also dynamically changes the job submission time. Using workload simulations of large supercomputer traces, we show large-scale improvements in predictions and reductions in response times over existing techniques and baseline strategies.

Submitted 19 August, 2020; originally announced August 2020.

arXiv:2007.06720 [pdf, other]

doi 10.1007/s11370-020-00332-9

Deployment and Evaluation of a Flexible Human-Robot Collaboration Model Based on AND/OR Graphs in a Manufacturing Environment

Authors: Prajval Kumar Murali, Kourosh Darvish, Fulvio Mastrogiovanni

Abstract: The Industry 4.0 paradigm promises shorter development times, increased ergonomy, higher flexibility, and resource efficiency in manufacturing environments. Collaborative robots are an important tangible technology for implementing such a paradigm. A major bottleneck to effectively deploy collaborative robots to manufacturing industries is developing task planning algorithms that enable them to recognize and naturally adapt to varying and even unpredictable human actions while simultaneously ensuring an overall efficiency in terms of production cycle time. In this context, an architecture encompassing task representation, task planning, sensing, and robot control has been designed, developed and evaluated in a real industrial environment. A pick-and-place palletization task, which requires the collaboration between humans and robots, is investigated. The architecture uses AND/OR graphs for representing and reasoning upon human-robot collaboration models online. Furthermore, objective measures of the overall computational performance and subjective measures of naturalness in human-robot collaboration have been evaluated by performing experiments with production-line operators. The results of this user study demonstrate how human-robot collaboration models like the one we propose can leverage the flexibility and the comfort of operators in the workplace. In this regard, an extensive comparison study among recent models has been carried out.

Submitted 13 July, 2020; originally announced July 2020.

arXiv:2001.02826 [pdf, other]

doi 10.1145/3373376.3378477

Software Mitigation of Crosstalk on Noisy Intermediate-Scale Quantum Computers

Authors: Prakash Murali, David C. McKay, Margaret Martonosi, Ali Javadi-Abhari

Abstract: Crosstalk is a major source of noise in Noisy Intermediate-Scale Quantum (NISQ) systems and is a fundamental challenge for hardware design. When multiple instructions are executed in parallel, crosstalk between the instructions can corrupt the quantum state and lead to incorrect program execution. Our goal is to mitigate the application impact of crosstalk noise through software techniques. This requires (i) accurate characterization of hardware crosstalk, and (ii) intelligent instruction scheduling to serialize the affected operations. Since crosstalk characterization is computationally expensive, we develop optimizations which reduce the characterization overhead. On three 20-qubit IBMQ systems, we demonstrate two orders of magnitude reduction in characterization time (compute time on the QC device) compared to all-pairs crosstalk measurements. Informed by these characterizations, we develop a scheduler that judiciously serializes high crosstalk instructions, balancing the need to mitigate crosstalk and exponential decoherence errors from serialization. On real-system runs on three IBMQ systems, our scheduler improves the error rate of application circuits by up to 5.6x, compared to the IBM instruction scheduler and offers near-optimal crosstalk mitigation in practice. In a broader picture, the difficulty of mitigating crosstalk has recently driven QC vendors to move towards sparser qubit connectivity or disabling nearby operations entirely in hardware, which can be detrimental to performance. Our work makes the case for software mitigation of crosstalk errors.

Submitted 8 January, 2020; originally announced January 2020.

Comments: To appear in ASPLOS 2020

arXiv:1903.03276 [pdf, ps, other]

doi 10.1016/j.micpro.2019.02.005

Formal Constraint-based Compilation for Noisy Intermediate-Scale Quantum Systems

Authors: Prakash Murali, Ali Javadi-Abhari, Frederic T. Chong, Margaret Martonosi

Abstract: Noisy, intermediate-scale quantum (NISQ) systems are expected to have a few hundred qubits, minimal or no error correction, limited connectivity and limits on the number of gates that can be performed within the short coherence window of the machine. The past decade's research on quantum programming languages and compilers is directed towards large systems with thousands of qubits. For near term quantum systems, it is crucial to design tool flows which make efficient use of the hardware resources without sacrificing the ease and portability of a high-level programming environment. In this paper, we present a compiler for the Scaffold quantum programming language in which aggressive optimization specifically targets NISQ machines with hundreds of qubits. Our compiler extracts gates from a Scaffold program, and formulates a constrained optimization problem which considers both program characteristics and machine constraints. Using the Z3 SMT solver, the compiler maps program qubits to hardware qubits, schedules gates, and inserts CNOT routing operations while optimizing the overall execution time. The output of the optimization is used to produce target code in the OpenQASM language, which can be executed on existing quantum hardware such as the 16-qubit IBM machine. Using real and synthetic benchmarks, we show that it is feasible to synthesize near-optimal compiled code for current and small NISQ systems. For large programs and machine sizes, the SMT optimization approach can be used to synthesize compiled code that is guaranteed to finish within the coherence window of the machine.

Submitted 7 March, 2019; originally announced March 2019.

Comments: Invited paper in Special Issue on Quantum Computer Architecture: a full-stack overview, Microprocessors and Microsystems

Journal ref: Microprocessors and Microsystems 2019

arXiv:1901.11054 [pdf, other]

Noise-Adaptive Compiler Mappings for Noisy Intermediate-Scale Quantum Computers

Authors: Prakash Murali, Jonathan M. Baker, Ali Javadi Abhari, Frederic T. Chong, Margaret Martonosi

Abstract: A massive gap exists between current quantum computing (QC) prototypes, and the size and scale required for many proposed QC algorithms. Current QC implementations are prone to noise and variability which affect their reliability, and yet with fewer than 80 quantum bits (qubits) total, they are too resource-constrained to implement error correction. The term Noisy Intermediate-Scale Quantum (NISQ) refers to these current and near-term systems of 1000 qubits or less. Given NISQ's severe resource constraints, low reliability, and high variability in physical characteristics such as coherence time or error rates, it is of pressing importance to map computations onto them in ways that use resources efficiently and maximize the likelihood of successful runs. This paper proposes and evaluates backend compiler approaches to map and optimize high-level QC programs to execute with high reliability on NISQ systems with diverse hardware characteristics. Our techniques all start from an LLVM intermediate representation of the quantum program (such as would be generated from high-level QC languages like Scaffold) and generate QC executables runnable on the IBM Q public QC machine. We then use this framework to implement and evaluate several optimal and heuristic mapping methods. These methods vary in how they account for the availability of dynamic machine calibration data, the relative importance of various noise parameters, the different possible routing strategies, and the relative importance of compile-time scalability versus runtime success. Using real-system measurements, we show that fine-grained spatial and temporal variations in hardware parameters can be exploited to obtain an average 2.9x (and up to 18x) improvement in program success rate over the industry standard IBM Qiskit compiler.

Submitted 30 January, 2019; originally announced January 2019.

Comments: To appear in ASPLOS'19

arXiv:1804.09494 [pdf, other]

On Optimizing Distributed Tucker Decomposition for Sparse Tensors

Authors: Venkatesan T. Chakaravarthy, Jee W. Choi, Douglas J. Joseph, Prakash Murali, Shivmaran S. Pandian, Yogish Sabharwal, Dheeraj Sreedhar

Abstract: The Tucker decomposition generalizes the notion of Singular Value Decomposition (SVD) to tensors, the higher dimensional analogues of matrices. We study the problem of constructing the Tucker decomposition of sparse tensors on distributed memory systems via the HOOI procedure, a popular iterative method. The scheme used for distributing the input tensor among the processors (MPI ranks) critically influences the HOOI execution time. Prior work has proposed different distribution schemes: an offline scheme based on a sophisticated hypergraph partitioning method and simple, lightweight alternatives that can be used in real time. While the hypergraph based scheme typically results in faster HOOI execution time, being complex, the time taken for determining the distribution is an order of magnitude higher than the execution time of a single HOOI iteration. Our main contribution is a lightweight distribution scheme, which achieves the best of both worlds. We show that the scheme is near-optimal on certain fundamental metrics associated with the HOOI procedure and as a result, near-optimal on the computational load (FLOPs). Though the scheme may incur higher communication volume, the computation time is the dominant factor and as a result, the scheme achieves better performance on the overall HOOI execution time. Our experimental evaluation on large real-life tensors (having up to 4 billion elements) shows that the scheme outperforms the prior schemes on the HOOI execution time by a factor of up to 3x. On the other hand, its distribution time is comparable to the prior lightweight schemes and is typically less than the execution time of a single HOOI iteration.

Submitted 18 January, 2020; v1 submitted 25 April, 2018; originally announced April 2018.

Comments: Abridged version of the paper to appear in the proceedings of ICS'18

arXiv:1712.10201 [pdf, other]

doi 10.1109/TPDS.2017.2769082

Metascheduling of HPC Jobs in Day-Ahead Electricity Markets

Authors: Prakash Murali, Sathish Vadhiyar

Abstract: High performance grid computing is a key enabler of large scale collaborative computational science. With the promise of exascale computing, high performance grid systems are expected to incur electricity bills that grow super-linearly over time. In order to achieve cost effectiveness in these systems, it is essential for the scheduling algorithms to exploit electricity price variations, both in space and time, that are prevalent in the dynamic electricity price markets. In this paper, we present a metascheduling algorithm to optimize the placement of jobs in a compute grid which consumes electricity from the day-ahead wholesale market. We formulate the scheduling problem as a Minimum Cost Maximum Flow problem and leverage queue waiting time and electricity price predictions to accurately estimate the cost of job execution at a system. Using trace based simulation with real and synthetic workload traces, and real electricity price data sets, we demonstrate our approach on two currently operational grids, XSEDE and NorduGrid. Our experimental setup collectively constitutes more than 433K processors spread across 58 compute systems in 17 geographically distributed locations. Experiments show that our approach simultaneously optimizes the total electricity cost and the average response time of the grid, without being unfair to users of the local batch systems.

Submitted 29 December, 2017; originally announced December 2017.

Comments: Appears in IEEE Transactions on Parallel and Distributed Systems

arXiv:1707.05594 [pdf, other]

On Optimizing Distributed Tucker Decomposition for Dense Tensors

Authors: Venkatesan T Chakaravarthy, Jee W Choi, Douglas J Joseph, Xing Liu, Prakash Murali, Yogish Sabharwal, Dheeraj Sreedhar

Abstract: The Tucker decomposition expresses a given tensor as the product of a small core tensor and a set of factor matrices. Apart from providing data compression, the construction is useful in performing analysis such as principal component analysis (PCA) and finds applications in diverse domains such as signal processing, computer vision and text analytics. Our objective is to develop an efficient distributed implementation for the case of dense tensors. The implementation is based on the HOOI (Higher Order Orthogonal Iterator) procedure, wherein the tensor-times-matrix product forms the core routine. Prior work has proposed heuristics for reducing the computational load and communication volume incurred by the routine. We study the two metrics in a formal and systematic manner, and design strategies that are optimal under the two fundamental metrics. Our experimental evaluation on a large benchmark of tensors shows that the optimal strategies provide significant reduction in load and volume compared to prior heuristics, and provide up to 7x speed-up in the overall running time.

Submitted 18 July, 2017; originally announced July 2017.

Comments: Preliminary version of the paper appears in the proceedings of IPDPS'17

arXiv:1602.04478 [pdf, other]

Subgraph Counting: Color Coding Beyond Trees

Authors: Venkatesan T. Chakaravarthy, Michael Kapralov, Prakash Murali, Fabrizio Petrini, Xinyu Que, Yogish Sabharwal, Baruch Schieber

Abstract: The problem of counting occurrences of query graphs in a large data graph, known as subgraph counting, is fundamental to several domains such as genomics and social network analysis. Many important special cases (e.g. triangle counting) have received significant attention. Color coding is a very general and powerful algorithmic technique for subgraph counting. Color coding has been shown to be effective in several applications, but scalable implementations are only known for the special case of tree queries (i.e. queries of treewidth one). In this paper we present the first efficient distributed implementation for color coding that goes beyond tree queries: our algorithm applies to any query graph of treewidth 2. Since tree queries can be solved in time linear in the size of the data graph, our contribution is the first step into the realm of color coding for queries that require superlinear running time in the worst case. This superlinear complexity leads to significant load balancing problems on graphs with heavy tailed degree distributions. Our algorithm structures the computation to work around high degree nodes in the data graph, and achieves very good runtime and scalability on a diverse collection of data and query graph pairs as a result. We also provide theoretical analysis of our algorithmic techniques, showing asymptotic improvements in runtime on random graphs with power law degree distributions, a popular model for real world graphs.

Submitted 2 April, 2016; v1 submitted 14 February, 2016; originally announced February 2016.

Showing 1–25 of 25 results for author: Murali, P