Diffeomorphic Transformer-based Abdomen MRI-CT Deformable Image Registration

Authors: Yang Lei, Luke A. Matkovic, Justin Roper, Tonghe Wang, Jun Zhou, Beth Ghavidel, Mark McDonald, Pretesh Patel, Xiaofeng Yang

Abstract: This paper aims to create a deep learning framework that can estimate the deformation vector field (DVF) for directly registering abdominal MRI-CT images. The proposed method assumes a diffeomorphic deformation. By using topology-preserving deformation features extracted from the probabilistic diffeomorphic registration model, abdominal motion can be accurately obtained and utilized for DVF estimation. The model integrates Swin transformers, which have demonstrated superior performance in motion tracking, into a convolutional neural network (CNN) for deformation feature extraction. The model was optimized using a cross-modality image similarity loss and a surface matching loss. To compute the image loss, a modality-independent neighborhood descriptor (MIND) was used between the deformed MRI and CT images. The surface matching loss was determined by measuring the distance between the warped coordinates of the surfaces of contoured structures on the MRI and CT images. The deformed MRI image was assessed against the CT image using the target registration error (TRE), and using the Dice similarity coefficient (DSC) and mean surface distance (MSD) between the deformed contours of the MRI image and the manual contours of the CT image. Compared with rigid registration alone, deformable image registration (DIR) with the proposed method increased the mean DSC values of the liver and portal vein from 0.850 and 0.628 to 0.903 and 0.763, decreased the mean MSD of the liver from 7.216 mm to 3.232 mm, and decreased the TRE from 26.238 mm to 8.492 mm.
The proposed DIR method based on a diffeomorphic transformer provides an effective and efficient way to generate an accurate DVF from an MRI-CT image pair of the abdomen. It could be utilized in the current treatment planning workflow for liver radiotherapy.
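The three evaluation metrics named above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' evaluation code: it assumes binary masks on a shared voxel grid, landmark pairs stored as (N, 3) arrays in millimetres, and one common symmetric definition of MSD (definitions vary, and the abstract does not state which variant was used). All function names are hypothetical.

```python
import numpy as np


def dice(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Dice similarity coefficient between two binary masks: 2|A & B| / (|A| + |B|)."""
    a = mask_a.astype(bool)
    b = mask_b.astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return float(2.0 * np.logical_and(a, b).sum() / denom)


def target_registration_error(moved: np.ndarray, fixed: np.ndarray) -> float:
    """Mean Euclidean distance between paired landmarks of shape (N, 3), in input units (e.g. mm)."""
    return float(np.linalg.norm(moved - fixed, axis=1).mean())


def mean_surface_distance(surf_a: np.ndarray, surf_b: np.ndarray) -> float:
    """Symmetric mean of nearest-neighbour distances between two surface point sets (brute force)."""
    d = np.linalg.norm(surf_a[:, None, :] - surf_b[None, :, :], axis=-1)  # pairwise (Na, Nb)
    return float((d.min(axis=1).mean() + d.min(axis=0).mean()) / 2.0)
```

For realistic contour sizes the brute-force pairwise matrix in `mean_surface_distance` should be replaced by a spatial index or a distance transform; it is kept here only for readability.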

Submitted 4 May, 2024; originally announced May 2024.

Comments: 18 pages and 4 figures

arXiv:2404.16752 [pdf, other]

TokenHMR: Advancing Human Mesh Recovery with a Tokenized Pose Representation

Authors: Sai Kumar Dwivedi, Yu Sun, Priyanka Patel, Yao Feng, Michael J. Black

Abstract: We address the problem of regressing 3D human pose and shape from a single image, with a focus on 3D accuracy. The current best methods leverage large datasets of 3D pseudo-ground-truth (p-GT) and 2D keypoints, leading to robust performance. With such methods, we observe a paradoxical decline in 3D pose accuracy with increasing 2D accuracy. This is caused by biases in the p-GT and the use of an approximate camera projection model. We quantify the error induced by current camera models and show that fitting 2D keypoints and p-GT accurately causes incorrect 3D poses. Our analysis defines the invalid distances within which minimizing 2D and p-GT losses is detrimental. We use this to formulate a new loss, Threshold-Adaptive Loss Scaling (TALS), that penalizes gross 2D and p-GT losses but not smaller ones. With such a loss, there are many 3D poses that could equally explain the 2D evidence. To reduce this ambiguity we need a prior over valid human poses, but such priors can introduce unwanted bias. To address this, we exploit a tokenized representation of human pose and reformulate the problem as token prediction. This restricts the estimated poses to the space of valid poses, effectively providing a uniform prior. Extensive experiments on the EMDB and 3DPW datasets show that our reformulated keypoint loss and tokenization allow us to train on in-the-wild data while improving 3D accuracy over the state-of-the-art. Our models and code are available for research at https://tokenhmr.is.tue.mpg.de.
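The idea of penalizing only gross keypoint errors can be illustrated with a hard-thresholded loss. This is a hypothetical sketch of the general concept, not the paper's TALS formulation (the abstract does not give its equations); the function name and the single scalar `threshold` are assumptions.

```python
import numpy as np


def threshold_gated_l2(pred: np.ndarray, target: np.ndarray, threshold: float) -> float:
    """Hypothetical threshold-gated keypoint loss (illustrative, not TALS itself).

    Per-keypoint Euclidean errors at or below `threshold` are treated as lying in
    the "invalid distance" range where further minimisation is not trusted, so
    they contribute nothing; larger (gross) errors are penalised as usual.
    """
    err = np.linalg.norm(pred - target, axis=-1)  # per-keypoint Euclidean error
    gated = np.where(err > threshold, err, 0.0)   # drop small errors, keep gross ones
    return float(gated.mean())
```

A hard cut-off gives zero gradient for small errors, which matches the stated intent; a smooth ramp around the threshold would be an equally plausible design choice.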

Submitted 25 April, 2024; originally announced April 2024.

Comments: CVPR 2024

arXiv:2403.20199 [pdf]

NeuraLunaDTNet: Feedforward Neural Network-Based Routing Protocol for Delay-Tolerant Lunar Communication Networks

Authors: Parth Patel, Milena Radenkovic

Abstract: Space communication poses challenges such as severe delays, hard-to-predict routes, and communication disruptions. The Delay-Tolerant Network (DTN) architecture, designed specifically with such scenarios in mind, is suited to addressing some of these challenges. However, traditional DTN routing protocols fall short of optimal performance due to the inherent complexities of space communication. Researchers have sought to use recent advances in AI to mitigate some routing challenges [9]. We propose using a feedforward neural network to develop a novel protocol, NeuraLunaDTNet, which enhances the efficiency of the PRoPHET routing protocol for lunar communication by learning contact plans in a dynamically changing spatio-temporal graph.
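NeuraLunaDTNet builds on PRoPHET, whose core is a per-node table of delivery predictabilities updated on encounters, decayed over time, and propagated transitively. The sketch below shows that baseline bookkeeping using the update rules of the original PRoPHET proposal; it does not model the neural network the paper adds. The class name is hypothetical, the parameter values are illustrative defaults, and later specifications (RFC 6693) revise some of these rules.

```python
from collections import defaultdict


class ProphetNode:
    """Hypothetical node holding a PRoPHET-style delivery-predictability table."""

    def __init__(self, name, p_init=0.75, beta=0.25, gamma=0.98):
        self.name = name
        self.p_init = p_init  # boost applied on a direct encounter
        self.beta = beta      # scales the transitive update
        self.gamma = gamma    # aging constant per elapsed time unit
        self.p = defaultdict(float)  # destination -> delivery predictability in [0, 1]

    def age(self, k):
        """Decay every predictability by gamma**k after k time units."""
        for dest in self.p:
            self.p[dest] *= self.gamma ** k

    def encounter(self, other):
        """On meeting `other`: direct update, then transitive update from other's table."""
        b = other.name
        self.p[b] += (1.0 - self.p[b]) * self.p_init
        for c, p_bc in list(other.p.items()):
            if c in (self.name, b):
                continue
            self.p[c] += (1.0 - self.p[c]) * self.p[b] * p_bc * self.beta

    def should_forward(self, other, dest):
        """Forward a bundle for `dest` if `other` is the better carrier."""
        return other.p.get(dest, 0.0) > self.p.get(dest, 0.0)
```

In this scheme a learned component could, for example, replace the fixed constants or the forwarding rule, but how NeuraLunaDTNet does so is not described in the abstract.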

Submitted 7 April, 2024; v1 submitted 29 March, 2024; originally announced March 2024.

arXiv:2403.14770 [pdf, other]

Beehive: A Flexible Network Stack for Direct-Attached Accelerators

Authors: Katie Lim, Matthew Giordano, Theano Stavrinos, Pratyush Patel, Jacob Nelson, Irene Zhang, Baris Kasikci, Tom Anderson

Abstract: Direct-attached accelerators, where application accelerators are directly connected to the datacenter network via a hardware network stack, offer substantial benefits in terms of reduced latency, CPU overhead, and energy use. However, a key challenge is that modern datacenter network stacks are complex, with interleaved protocol layers, network management functions, and virtualization support. To operators, network feature agility, diagnostics, and manageability are often considered just as important as raw performance. By contrast, existing hardware network stacks only support basic protocols and are often difficult to extend since they use fixed processing pipelines. We propose Beehive, a new, open-source FPGA network stack for direct-attached accelerators designed to enable flexible and adaptive construction of complex network functionality in hardware. Application and network protocol elements are modularized as tiles over a network-on-chip substrate. Elements can be added or scaled up/down to match workload characteristics with minimal effort or changes to other elements. Flexible diagnostics and control are integral, with tooling to ensure deadlock safety. Our implementation interoperates with standard Linux TCP and UDP clients, with a 4x improvement in end-to-end remote procedure call tail latency for Linux UDP clients versus a CPU-attached accelerator.

Submitted 30 May, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

arXiv:2401.03267 [pdf]

Autonomous Navigation in Complex Environments

Authors: Andrew Gerstenslager, Jomol Lewis, Liam McKenna, Poorva Patel

Abstract: This paper explores the application of CNN-DNN network fusion to construct a robot navigation controller within a simulated environment. The simulated environment is constructed to model a subterranean rescue situation, in which an autonomous agent is tasked with finding a goal within an unknown cavernous system. Imitation learning is used to train the control algorithm to use LiDAR and camera data to navigate the space and find the goal. The trained model is then tested for robustness using Monte Carlo simulation.

Submitted 6 January, 2024; originally announced January 2024.

Comments: 7 pages, 3 figures, independent paper

arXiv:2312.01059 [pdf, other]

Swarm-GPT: Combining Large Language Models with Safe Motion Planning for Robot Choreography Design

Authors: Aoran Jiao, Tanmay P. Patel, Sanjmi Khurana, Anna-Mariya Korol, Lukas Brunke, Vivek K. Adajania, Utku Culha, Siqi Zhou, Angela P. Schoellig

Abstract: This paper presents Swarm-GPT, a system that integrates large language models (LLMs) with safe swarm motion planning, offering an automated and novel approach to deployable drone swarm choreography. Swarm-GPT enables users to automatically generate synchronized drone performances through natural language instructions. With an emphasis on safety and creativity, Swarm-GPT addresses a critical gap in the field of drone choreography by integrating the creative power of generative models with the effectiveness and safety of model-based planning algorithms. This goal is achieved by prompting the LLM to generate a unique set of waypoints based on extracted audio data. A trajectory planner processes these waypoints to guarantee collision-free and feasible motion. Results can be viewed in simulation prior to execution and modified through dynamic re-prompting. Sim-to-real transfer experiments demonstrate Swarm-GPT's ability to accurately replicate simulated drone trajectories, with a mean sim-to-real root mean square error (RMSE) of 28.7 mm. To date, Swarm-GPT has been successfully showcased at three live events, exemplifying safe real-world deployment of pre-trained models.
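The sim-to-real RMSE quoted above compares a simulated trajectory with the one actually flown. A minimal sketch for a single drone, assuming both trajectories are already time-aligned, sampled at the same timestamps, and expressed in the same frame and units; how the paper aggregates across drones and runs into a mean is not stated in the abstract, and the function name is hypothetical.

```python
import numpy as np


def trajectory_rmse(sim: np.ndarray, real: np.ndarray) -> float:
    """Root mean square of per-sample Euclidean position errors.

    sim, real: arrays of shape (T, 3), assumed time-aligned and in the same
    frame and units (e.g. mm).
    """
    if sim.shape != real.shape:
        raise ValueError("trajectories must have the same shape")
    sq_err = np.sum((sim - real) ** 2, axis=1)  # squared Euclidean error per sample
    return float(np.sqrt(sq_err.mean()))
```

Some works instead report RMSE per axis; the Euclidean form used here folds all three axes into one number.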

Submitted 2 December, 2023; originally announced December 2023.

Comments: 10 pages, 9 figures

arXiv:2311.18836 [pdf, other]

ChatPose: Chatting about 3D Human Pose

Authors: Yao Feng, Jing Lin, Sai Kumar Dwivedi, Yu Sun, Priyanka Patel, Michael J. Black

Abstract: We introduce ChatPose, a framework employing Large Language Models (LLMs) to understand and reason about 3D human poses from images or textual descriptions. Our work is motivated by the human ability to intuitively understand postures from a single image or a brief description, a process that intertwines image interpretation, world knowledge, and an understanding of body language. Traditional human pose estimation and generation methods often operate in isolation, lacking semantic understanding and reasoning abilities. ChatPose addresses these limitations by embedding SMPL poses as distinct signal tokens within a multimodal LLM, enabling the direct generation of 3D body poses from both textual and visual inputs. Leveraging the powerful capabilities of multimodal LLMs, ChatPose unifies classical 3D human pose estimation and generation tasks while offering user interactions. Additionally, ChatPose empowers LLMs to apply their extensive world knowledge in reasoning about human poses, leading to two advanced tasks: speculative pose generation and reasoning about pose estimation. These tasks involve reasoning about humans to generate 3D poses from subtle text queries, possibly accompanied by images. We establish benchmarks for these tasks, moving beyond traditional 3D pose generation and estimation methods. Our results show that ChatPose outperforms existing multimodal LLMs and task-specific methods on these newly proposed tasks.
Furthermore, ChatPose's ability to understand and generate 3D human poses based on complex reasoning opens new directions in human pose analysis.

Submitted 23 April, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

Comments: Home page: https://yfeng95.github.io/ChatPose/

arXiv:2311.18677 [pdf, other]

Splitwise: Efficient generative LLM inference using phase splitting

Authors: Pratyush Patel, Esha Choukse, Chaojie Zhang, Aashaka Shah, Íñigo Goiri, Saeed Maleki, Ricardo Bianchini

Abstract: Recent innovations in generative large language models (LLMs) have made their applications and use-cases ubiquitous. This has led to large-scale deployments of these models, using complex, expensive, and power-hungry AI accelerators, most commonly GPUs. These developments make LLM inference efficiency an important challenge. Based on our extensive characterization, we find that there are two main phases during an LLM inference request: a compute-intensive prompt computation, and a memory-intensive token generation, each with distinct latency, throughput, memory, and power characteristics. Despite state-of-the-art batching and scheduling, the token generation phase underutilizes compute resources. Specifically, unlike compute-intensive prompt computation phases, token generation phases do not require the compute capability of the latest GPUs, and can be run with lower power and cost. With Splitwise, we propose splitting the two phases of an LLM inference request onto separate machines. This allows us to use hardware that is well-suited for each phase, and to provision resources independently per phase. However, splitting an inference request across machines requires state transfer from the machine running prompt computation to the machine generating tokens. We implement and optimize this state transfer using the fast back-plane interconnects available in today's GPU clusters. We use the Splitwise technique to design LLM inference clusters using the same or different types of machines for the prompt computation and token generation phases.
Our clusters are optimized for three key objectives: throughput, cost, and power. In particular, we show that we can achieve 1.4x higher throughput at 20% lower cost than current designs. Alternatively, we can achieve 2.35x more throughput with the same cost and power budgets.
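Combining the first headline result into a single throughput-per-cost figure is a simple ratio. The sketch below assumes both numbers are relative to the same baseline cluster, with $T$ denoting throughput and $C$ cost; this normalisation is our reading, not a figure reported in the abstract.

```latex
\frac{T_{\text{Splitwise}} / C_{\text{Splitwise}}}{T_{\text{base}} / C_{\text{base}}}
  = \frac{1.4\,T_{\text{base}} / \left(0.8\,C_{\text{base}}\right)}{T_{\text{base}} / C_{\text{base}}}
  = \frac{1.4}{0.8}
  = 1.75
```

Under that assumption, the first configuration delivers about 1.75x the throughput per unit cost of the baseline design.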

Submitted 20 May, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

Comments: 12 pages, 19 figures

MSC Class: I.2.0; I.3.1; C.4

arXiv:2311.01500 [pdf, other]

E(2) Equivariant Neural Networks for Robust Galaxy Morphology Classification

Authors: Sneh Pandya, Purvik Patel, Franc O, Jonathan Blazek

Abstract: We propose the use of group convolutional neural network architectures (GCNNs) equivariant to the 2D Euclidean group, $E(2)$, for the task of galaxy morphology classification by utilizing symmetries of the data present in galaxy images as an inductive bias in the architecture. We conduct robustness studies by introducing artificial perturbations via Poisson noise insertion and one-pixel adversarial attacks to simulate the effects of limited observational capabilities. We train, validate, and test GCNNs equivariant to discrete subgroups of $E(2)$ (the cyclic and dihedral groups of order $N$) on the Galaxy10 DECals dataset and find that GCNNs achieve higher classification accuracy and are consistently more robust than their non-equivariant counterparts, with an architecture equivariant to the group $D_{16}$ achieving a $95.52 \pm 0.18\%$ test-set accuracy. We also find that the model loses $<6\%$ accuracy on a $50\%$-noise dataset and that all GCNNs are less susceptible to one-pixel perturbations than an identically constructed CNN. Our code is publicly available at https://github.com/snehjp2/GCNNMorphology.
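The symmetry these GCNNs exploit is that a galaxy's class should not change when the image is rotated or reflected. A full group-equivariant network is beyond a short example, so this sketch only illustrates the underlying idea with group averaging over the 8-element group of 90° rotations and reflections ($D_4$ in the notation where $D_N$ has $N$ rotations), which needs no interpolation on a pixel grid. This is orbit averaging, a different and much simpler technique than the paper's equivariant convolutions; the function names are hypothetical.

```python
import numpy as np


def d4_orbit(img: np.ndarray) -> list:
    """All 8 images obtained from img by the 4 rotations and 4 reflections of D4."""
    rotations = [np.rot90(img, k) for k in range(4)]
    return rotations + [np.fliplr(r) for r in rotations]


def d4_invariant_feature(img: np.ndarray, feature) -> float:
    """Make any scalar image feature D4-invariant by averaging it over the orbit."""
    return float(np.mean([feature(g) for g in d4_orbit(img)]))
```

Because the orbit of a rotated or flipped image is the same set of 8 images, the averaged feature is unchanged under those transformations even when `feature` itself is not.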

Submitted 2 November, 2023; originally announced November 2023.

Comments: 10 pages, 4 figures, 3 tables, Accepted to the Machine Learning and the Physical Sciences Workshop at NeurIPS 2023

arXiv:2308.12908 [pdf, other]

POLCA: Power Oversubscription in LLM Cloud Providers

Authors: Pratyush Patel, Esha Choukse, Chaojie Zhang, Íñigo Goiri, Brijesh Warrier, Nithish Mahalingam, Ricardo Bianchini

Abstract: Recent innovation in large language models (LLMs) and their myriad use-cases have rapidly driven up the compute capacity demand for datacenter GPUs. Several cloud providers and other enterprises have made substantial growth plans for their datacenters to support these new workloads. One of the key bottleneck resources in datacenters is power, and as LLMs grow in size, they are becoming increasingly power intensive. In this paper, we show that there is a significant opportunity to oversubscribe power in LLM clusters. Power oversubscription improves the power efficiency of these datacenters, allowing more deployable servers per datacenter, and reduces the deployment time, since building new datacenters is slow. We extensively characterize the power consumption patterns of a variety of LLMs and their configurations. We identify the differences between the inference and training power consumption patterns. Based on our analysis of these LLMs, we claim that the average and peak power utilization in LLM clusters for inference should not be very high. Our deductions align with the data from production LLM clusters, revealing that inference workloads offer substantial headroom for power oversubscription. However, the stringent set of telemetry and controls that GPUs offer in a virtualized environment makes it challenging to have a reliable and robust power oversubscription mechanism. We propose POLCA, our framework for power oversubscription that is robust, reliable, and readily deployable for GPU clusters.
Using open-source models to replicate the power patterns observed in production, we simulate POLCA and demonstrate that we can deploy 30% more servers in the same GPU cluster for inference, with minimal performance loss.

Submitted 24 August, 2023; originally announced August 2023.

arXiv:2308.07954 [pdf, other]

doi 10.1073/pnas.2311888121

APACE: AlphaFold2 and advanced computing as a service for accelerated discovery in biophysics

Authors: Hyun Park, Parth Patel, Roland Haas, E. A. Huerta

Abstract: The prediction of protein 3D structure from amino acid sequence is a computational grand challenge in biophysics and plays a key role in applications ranging from drug discovery to genome interpretation. The advent of AI models, such as AlphaFold, is revolutionizing applications that depend on robust protein structure prediction algorithms. To maximize the impact, and ease the usability, of these novel AI tools, we introduce APACE, AlphaFold2 and advanced computing as a service, a novel computational framework that effectively handles this AI model and its TB-size database to conduct accelerated protein structure prediction analyses in modern supercomputing environments. We deployed APACE on the Delta and Polaris supercomputers, and quantified its performance for accurate protein structure predictions using four exemplar proteins: 6AWO, 6OAN, 7MEZ, and 6D6U. Using up to 300 ensembles, distributed across 200 NVIDIA A100 GPUs, we found that APACE is up to two orders of magnitude faster than off-the-shelf AlphaFold2 implementations, reducing time-to-solution from weeks to minutes. This computational approach may be readily linked with robotics laboratories to automate and accelerate scientific discovery.

Submitted 1 July, 2024; v1 submitted 15 August, 2023; originally announced August 2023.

Comments: 7 pages, 4 figures, 2 tables

ACM Class: I.2

Journal ref: Proceedings of the National Academy of Sciences, 121, 27, (2024)

arXiv:2306.16940 [pdf, other]

BEDLAM: A Synthetic Dataset of Bodies Exhibiting Detailed Lifelike Animated Motion

Authors: Michael J. Black, Priyanka Patel, Joachim Tesch, Jinlong Yang

Abstract: We show, for the first time, that neural networks trained only on synthetic data achieve state-of-the-art accuracy on the problem of 3D human pose and shape (HPS) estimation from real images. Previous synthetic datasets have been small, unrealistic, or lacked realistic clothing. Achieving sufficient realism is non-trivial and we show how to do this for full bodies in motion. Specifically, our BEDLAM dataset contains monocular RGB videos with ground-truth 3D bodies in SMPL-X format. It includes a diversity of body shapes, motions, skin tones, hair, and clothing. The clothing is realistically simulated on the moving bodies using commercial clothing physics simulation. We render varying numbers of people in realistic scenes with varied lighting and camera motions. We then train various HPS regressors using BEDLAM and achieve state-of-the-art accuracy on real-image benchmarks despite training with synthetic data. We use BEDLAM to gain insights into what model design choices are important for accuracy. With good synthetic training data, we find that a basic method like HMR approaches the accuracy of the current SOTA method (CLIFF). BEDLAM is useful for a variety of tasks and all images, ground truth bodies, 3D clothing, support code, and more are available for research purposes. Additionally, we provide detailed information about our synthetic data generation pipeline, enabling others to generate their own datasets. See the project page: https://bedlam.is.tue.mpg.de/.

Submitted 29 June, 2023; originally announced June 2023.

Journal ref: CVPR 2023

arXiv:2306.03494 [pdf, other]

LegoNet: Alternating Model Blocks for Medical Image Segmentation

Authors: Ikboljon Sobirov, Cheng Xie, Muhammad Siddique, Parijat Patel, Kenneth Chan, Thomas Halborg, Christos Kotanidis, Zarqiash Fatima, Henry West, Keith Channon, Stefan Neubauer, Charalambos Antoniades, Mohammad Yaqub

Abstract: Since the emergence of convolutional neural networks (CNNs), and later vision transformers (ViTs), the common paradigm for model development has always been using a set of identical block types with varying parameters/hyper-parameters. To leverage the benefits of different architectural designs (e.g. CNNs and ViTs), we propose to alternate structurally different types of blocks to generate a new architecture, mimicking how Lego blocks can be assembled together. Using two CNN-based blocks and one SwinViT-based block, we investigate three variations of the so-called LegoNet, which applies the new concept of block alternation to the segmentation task in medical imaging. We also study a new clinical problem that has not been investigated before, namely the segmentation of the right internal mammary artery (RIMA) and perivascular space from computed tomography angiography (CTA), which has demonstrated prognostic value for major cardiovascular outcomes. We compare the model performance against popular CNN and ViT architectures using two large datasets (e.g. achieving a 0.749 Dice similarity coefficient (DSC) on the larger dataset). We also evaluate the performance of the model on three external testing cohorts, where an expert clinician corrected the model's segmentation results (DSC>0.90 for the three cohorts). To assess the suitability of our proposed model for clinical use, we perform intra- and inter-observer variability analysis. Finally, we investigate a joint self-supervised learning approach to assess its impact on model performance.
The code and the pretrained model weights will be available upon acceptance.

Submitted 6 June, 2023; originally announced June 2023.

Comments: 12 pages, 5 figures, 4 tables

arXiv:2305.19467 [pdf]

Synthetic CT Generation from MRI using 3D Transformer-based Denoising Diffusion Model

Authors: Shaoyan Pan, Elham Abouei, Jacob Wynne, Tonghe Wang, Richard L. J. Qiu, Yuheng Li, Chih-Wei Chang, Junbo Peng, Justin Roper, Pretesh Patel, David S. Yu, Hui Mao, Xiaofeng Yang

Abstract: Magnetic resonance imaging (MRI)-based synthetic computed tomography (sCT) simplifies radiation therapy treatment planning by eliminating the need for CT simulation and error-prone image registration, ultimately reducing patient radiation dose and setup uncertainty. We propose an MRI-to-CT transformer-based denoising diffusion probabilistic model (MC-DDPM) to transform MRI into high-quality sCT to facilitate radiation treatment planning. MC-DDPM implements diffusion processes with a shifted-window transformer network to generate sCT from MRI. The proposed model consists of two processes: a forward process, which adds Gaussian noise to real CT scans, and a reverse process, in which a shifted-window transformer V-net (Swin-Vnet) denoises the noisy CT scans conditioned on the MRI from the same patient to produce noise-free CT scans. With an optimally trained Swin-Vnet, the reverse diffusion process was used to generate sCT scans matching MRI anatomy. We evaluated the proposed method by generating sCT from MRI on a brain dataset and a prostate dataset. Quantitative evaluation was performed using the mean absolute error (MAE) of Hounsfield units (HU), peak signal-to-noise ratio (PSNR), multi-scale Structural Similarity index (MS-SSIM), and normalized cross correlation (NCC) between ground truth CTs and sCTs. MC-DDPM generated brain sCTs with state-of-the-art quantitative results: MAE 43.317 HU, PSNR 27.046 dB, SSIM 0.965, and NCC 0.983. For the prostate dataset, MC-DDPM achieved MAE 59.953 HU, PSNR 26.920 dB, SSIM 0.849, and NCC 0.948.
In conclusion, we have developed and validated a novel approach for generating CT images from routine MRIs using a transformer-based DDPM. This model effectively captures the complex relationship between CT and MRI images, allowing robust, high-quality synthetic CT (sCT) images to be generated in minutes.
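The forward process described above (adding Gaussian noise to real CT scans) has a standard closed form in DDPMs: $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$ with $\bar\alpha_t = \prod_{s \le t}(1-\beta_s)$. A minimal NumPy sketch, assuming the linear schedule from the original DDPM work (MC-DDPM's own schedule is not stated in the abstract) and using a random array as a stand-in for a normalised CT volume; the MRI-conditioned Swin-Vnet denoiser of the reverse process is not shown, and the function names are hypothetical.

```python
import numpy as np


def linear_beta_schedule(T: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> np.ndarray:
    """Linear noise schedule (assumed; not necessarily the paper's)."""
    return np.linspace(beta_start, beta_end, T)


def q_sample(x0: np.ndarray, t: int, alpha_bar: np.ndarray, rng: np.random.Generator):
    """Closed-form forward diffusion: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps  # eps is the usual regression target for the denoiser


betas = linear_beta_schedule()
alpha_bar = np.cumprod(1.0 - betas)  # fraction of signal variance kept after t steps

rng = np.random.default_rng(0)
ct_patch = rng.standard_normal((8, 8, 8))  # stand-in for a normalised CT volume
noisy, target_eps = q_sample(ct_patch, 500, alpha_bar, rng)
```

Sampling any timestep directly, rather than iterating $t$ noising steps, is what makes training such a denoiser efficient.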

Submitted 30 May, 2023; originally announced May 2023.

arXiv:2305.02850 [pdf, other]

Impossibility of Depth Reduction in Explainable Clustering

Authors: Chengyuan Deng, Surya Teja Gavva, Karthik C. S., Parth Patel, Adarsh Srinivasan

Abstract: Over the last few years, Explainable Clustering has gathered a lot of attention. Dasgupta et al. [ICML'20] initiated the study of explainable k-means and k-median clustering problems, where the explanation is captured by a threshold decision tree that partitions the space at each node using axis-parallel hyperplanes. Recently, Laber et al. [Pattern Recognition'23] made a case to consider the depth of the decision tree as an additional complexity measure of interest. In this work, we prove that even when the input points are in the Euclidean plane, any depth reduction in the explanation incurs unbounded loss in the k-means and k-median cost. Formally, we show that there exists a data set X in the Euclidean plane for which there is a decision tree of depth k-1 whose k-means/k-median cost matches the optimal clustering cost of X, but every decision tree of depth less than k-1 has unbounded cost with respect to the optimal cost of clustering. We extend our results to the k-center objective as well, albeit with weaker guarantees.
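To make the objects in this result concrete: a threshold decision tree routes each point through axis-parallel cuts to a leaf, the leaves define the clusters, and the tree is judged by the k-means cost of that clustering and by its depth. A minimal sketch; the nested-tuple tree encoding and the function names are hypothetical, and the paper's lower-bound construction is not reproduced.

```python
import numpy as np

# Hypothetical encoding: a node is either a leaf (int cluster id) or a tuple
# (feature_index, threshold, left_subtree, right_subtree); points with
# x[feature_index] <= threshold go left.


def assign(tree, x):
    """Route point x down the axis-parallel cuts to a leaf cluster id."""
    while not isinstance(tree, int):
        feat, thr, left, right = tree
        tree = left if x[feat] <= thr else right
    return tree


def kmeans_cost(points, labels):
    """Sum of squared distances from each point to the mean of its cluster."""
    cost = 0.0
    for c in np.unique(labels):
        members = points[labels == c]
        cost += float(((members - members.mean(axis=0)) ** 2).sum())
    return cost


def tree_depth(tree):
    """Number of cuts on the longest root-to-leaf path."""
    if isinstance(tree, int):
        return 0
    return 1 + max(tree_depth(tree[2]), tree_depth(tree[3]))
```

For k-median, `kmeans_cost` would be replaced by the sum of (unsquared) distances to each cluster's best centre; the tree routing and depth are unchanged.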

Submitted 4 May, 2023; originally announced May 2023.

arXiv:2305.00385 [pdf]

Cross-Shaped Windows Transformer with Self-supervised Pretraining for Clinically Significant Prostate Cancer Detection in Bi-parametric MRI

Authors: Yuheng Li, Jacob Wynne, Jing Wang, Richard L. J. Qiu, Justin Roper, Shaoyan Pan, Ashesh B. Jani, Tian Liu, Pretesh R. Patel, Hui Mao, Xiaofeng Yang

Abstract: Biparametric magnetic resonance imaging (bpMRI) has demonstrated promising results in prostate cancer (PCa) detection using convolutional neural networks (CNNs). Recently, transformers have achieved competitive performance compared to CNNs in computer vision. Large-scale transformers need abundant annotated training data, which are difficult to obtain in medical imaging. Self-supervised learning (SSL) utilizes unlabeled data to generate meaningful semantic representations without the need for costly annotations, enhancing model performance on tasks with limited labeled data. We introduce a novel end-to-end Cross-Shaped windows (CSwin) transformer UNet model, CSwin UNet, to detect clinically significant prostate cancer (csPCa) in prostate bi-parametric MR imaging (bpMRI) and demonstrate the effectiveness of our proposed self-supervised pre-training framework. Using a large prostate bpMRI dataset with 1500 patients, we first pretrain the CSwin transformer using multi-task self-supervised learning to improve data efficiency and network generalizability. We then finetune using lesion annotations to perform csPCa detection. Five-fold cross validation shows that self-supervised CSwin UNet achieves 0.888 AUC and 0.545 Average Precision (AP), significantly outperforming four comparable models (Swin UNETR, DynUNet, Attention UNet, UNet). Using a separate bpMRI dataset with 158 patients, we evaluate our method's robustness to external hold-out data.
Self-supervised CSwin UNet achieves 0.79 AUC and 0.45 AP, still outperforming all other comparable methods and demonstrating good generalization to external data.
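The AUC figures above summarize how well detection scores rank positive cases above negative ones. As a reminder of what the metric measures, here is a minimal sketch using the rank-based (Mann-Whitney) definition: the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counted as one half. The function name and toy scores are hypothetical, and this does not reproduce the study's evaluation protocol (for example, whether AUC is computed per patient or per lesion).

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as P(score of random positive > score of random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for sp in pos:          # O(n_pos * n_neg): written for clarity, not speed
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy per-case scores (hypothetical values); 1 = csPCa present, 0 = absent.
print(roc_auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))  # 0.75
```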

Submitted 17 March, 2024; v1 submitted 30 April, 2023; originally announced May 2023.

arXiv:2304.04488 [pdf, other]

Hybrid Computing for Interactive Datacenter Applications

Authors: Pratyush Patel, Katie Lim, Kushal Jhunjhunwalla, Ashlie Martinez, Max Demoulin, Jacob Nelson, Irene Zhang, Thomas Anderson

Abstract: Field-Programmable Gate Arrays (FPGAs) are more energy efficient and cost effective than CPUs for a wide variety of datacenter applications. Yet, for latency-sensitive and bursty workloads, this advantage can be difficult to harness due to high FPGA spin-up costs. We propose that a hybrid FPGA and CPU computing framework can harness the energy efficiency benefits of FPGAs for such workloads at reasonable cost. Our key insight is to use FPGAs for stable-state workload and CPUs for short-term workload bursts. Using this insight, we design Spork, a lightweight hybrid scheduler that can realize these energy efficiency and cost benefits in practice. Depending on the desired objective, Spork can trade off energy efficiency for cost reduction and vice versa. It is parameterized with key differences between FPGAs and CPUs in terms of power draw, performance, cost, and spin-up latency. We vary this parameter space and analyze various application and worker configurations on production and synthetic traces. Our evaluation of cloud workloads shows that energy-optimized Spork is not only more energy efficient but also cheaper than homogeneous platforms: for short application requests with tight deadlines, it is 1.53x more energy efficient and 2.14x cheaper than using only FPGAs. Relative to an idealized version of an existing cost-optimized hybrid scheduler, energy-optimized Spork provides 1.2-2.4x higher energy efficiency at comparable cost, while cost-optimized Spork provides 1.1-2x higher energy efficiency at 1.06-1.2x lower cost.
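The key insight (FPGAs for the steady-state load, CPUs for short bursts) can be illustrated with a deliberately simple, hypothetical provisioning rule. The sketch below sizes a fixed FPGA pool to cover a chosen quantile of a demand trace and sends any excess to CPU workers. It is not Spork's scheduling algorithm: the `plan_workers` function, the quantile-based sizing, and the per-worker capacities are all assumptions made for illustration, and it ignores power, cost, and deadlines, which the abstract says Spork models explicitly.

```python
import math
from typing import List, Sequence, Tuple


def plan_workers(
    demand: Sequence[float],
    fpga_capacity: float,
    cpu_capacity: float,
    base_quantile: float = 0.5,
) -> List[Tuple[int, int]]:
    """Hypothetical policy: a fixed FPGA pool covers steady-state load; CPUs absorb bursts.

    Returns (fpga_workers, cpu_workers) for each interval of the demand trace.
    """
    ordered = sorted(demand)
    idx = min(len(ordered) - 1, int(base_quantile * len(ordered)))
    steady_load = ordered[idx]                       # demand quantile treated as "steady"
    n_fpga = math.ceil(steady_load / fpga_capacity)  # kept warm to avoid FPGA spin-up cost
    plan = []
    for d in demand:
        overflow = max(0.0, d - n_fpga * fpga_capacity)
        n_cpu = math.ceil(overflow / cpu_capacity)   # CPUs start quickly for the excess
        plan.append((n_fpga, n_cpu))
    return plan


# Requests/s per interval (made-up trace); one FPGA serves 50 req/s, one CPU 20 req/s.
print(plan_workers([100, 100, 120, 400, 100], fpga_capacity=50, cpu_capacity=20))
# [(2, 0), (2, 0), (2, 1), (2, 15), (2, 0)]
```

Keeping the FPGA count fixed across intervals reflects the abstract's point that spin-up latency makes FPGAs a poor fit for short bursts; raising `base_quantile` shifts more load onto FPGAs at the price of idle capacity.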

Submitted 10 April, 2023; originally announced April 2023.

Comments: 13 pages

arXiv:2301.02998 [pdf, other]

InPars-Light: Cost-Effective Unsupervised Training of Efficient Rankers

Authors: Leonid Boytsov, Preksha Patel, Vivek Sourabh, Riddhi Nisar, Sayani Kundu, Ramya Ramanathan, Eric Nyberg

Abstract: We carried out a reproducibility study of InPars, which is a method for unsupervised training of neural rankers (Bonifacio et al., 2022). As a by-product, we developed InPars-light, which is a simple-yet-effective modification of InPars. Unlike InPars, InPars-light uses 7x-100x smaller ranking models and only the freely available language model BLOOM, which, as we found, produced more accurate rankers than a proprietary GPT-3 model. On all five English retrieval collections (used in the original InPars study) we obtained substantial (7%-30%) and statistically significant improvements over BM25 (in nDCG and MRR) using only a 30M parameter six-layer MiniLM-30M ranker and a single three-shot prompt. In contrast, in the InPars study only a 100x larger monoT5-3B model consistently outperformed BM25, whereas their smaller monoT5-220M model (which is still 7x larger than our MiniLM ranker) outperformed BM25 only on MS MARCO and TREC DL 2020. In the same three-shot prompting scenario, our 435M parameter DeBERTA v3 ranker was on par with the 7x larger monoT5-3B (average gain over BM25 of 1.3 vs 1.32); in fact, on three out of five datasets, DeBERTA slightly outperformed monoT5-3B. Finally, these good results were achieved by re-ranking only 100 candidate documents compared to 1000 used by Bonifacio et al. (2022). We believe that InPars-light is the first truly cost-effective prompt-based unsupervised recipe to train and deploy neural ranking models that outperform BM25. Our code and data are publicly available.
https://github.com/searchivarius/inpars_light/
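The improvements above are reported in nDCG and MRR. For readers less familiar with these metrics, the sketch below uses their common textbook definitions: MRR averages 1/rank of the first relevant document per query, and nDCG@k divides the discounted cumulative gain of a ranking by that of the ideal ordering. The function names are hypothetical, and the cutoff `k`, the linear gain (some evaluations use 2^rel - 1 instead), and the toy judgments are assumptions; the abstract does not specify the exact variants used.

```python
import math
from typing import Dict, Sequence, Set


def mean_reciprocal_rank(rankings: Sequence[Sequence[str]], relevant: Sequence[Set[str]]) -> float:
    """Average over queries of 1/rank of the first relevant document (0 if none retrieved)."""
    total = 0.0
    for ranking, rel in zip(rankings, relevant):
        for rank, doc in enumerate(ranking, start=1):
            if doc in rel:
                total += 1.0 / rank
                break
    return total / len(rankings)


def ndcg_at_k(ranking: Sequence[str], gains: Dict[str, float], k: int) -> float:
    """nDCG@k with linear gains and a log2(rank + 1) discount; unjudged docs have gain 0."""
    dcg = sum(gains.get(doc, 0.0) / math.log2(i + 2) for i, doc in enumerate(ranking[:k]))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


# Two toy queries: the first relevant hit is at rank 2 and at rank 1, respectively.
print(mean_reciprocal_rank([["d3", "d1", "d2"], ["d5", "d4"]], [{"d1"}, {"d5"}]))  # 0.75
print(ndcg_at_k(["d2", "d1"], {"d1": 1.0}, k=2))
```

In a re-ranking setup like the one described, these functions would be applied to the reordered top-100 candidate list for each query.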

Submitted 20 February, 2024; v1 submitted 8 January, 2023; originally announced January 2023.

arXiv:2212.04621 [pdf, other]

A systematic literature review on Virtual Reality and Augmented Reality in terms of privacy, authorization and data-leaks

Authors: Parth Dipakkumar Patel, Prem Trivedi

Abstract: In recent years, VR and AR have exploded into a multimillion-dollar market. As this emerging technology spreads to a variety of businesses and its user base grows rapidly, it is critical to address the potential privacy and security concerns these technologies might pose. In this study, we discuss the current status of privacy and security in VR and AR and analyse possible problems and risks. In addition, we look in detail at a few of the major concerns and the related security solutions for AR and VR. Finally, because VR and AR authentication is the most thoroughly studied aspect of the problem, we concentrate on the research that has already been done in this area.

Submitted 8 December, 2022; originally announced December 2022.

Comments: 9 Pages, 4 figures

arXiv:2211.10028 [pdf]

Comparative evaluation of different methods of "Homomorphic Encryption" and "Traditional Encryption" on a dataset with current problems and developments

Authors: Tanvi S. Patel, Srinivasakranthikiran Kolachina, Daxesh P. Patel, Pranav S. Shrivastav

Abstract: A database is a prime target for cyber-attacks as it contains confidential, sensitive, or protected information. With the increasing sophistication of the internet and dependencies on internet data transmission, it has become vital to be aware of various encryption technologies and trends. Encryption can help safeguard private information and sensitive data, as well as improve the security of client-server communication. Database encryption is a procedure that employs an algorithm to convert data contained in a database into "cipher text," which is incomprehensible until decoded. Homomorphic encryption technology, which allows computation directly on encrypted data, can be utilized in both symmetric and asymmetric systems. In this paper, we evaluated homomorphic encryption techniques based on recent highly cited articles, and compared database encryption problems and developments since 2018. This review examines the benefits and drawbacks of homomorphic approaches compared with classic encryption methods, including Transparent Database Encryption, Column Level Encryption, Field Level Encryption, File System Level Encryption, and Encrypting File System Encryption. Popular databases that provide encryption services to protect their customers' data are also examined.
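To show what "working with encrypted data" means in practice, the sketch below implements textbook Paillier, a well-known additively homomorphic public-key scheme: multiplying two ciphertexts yields a ciphertext of the sum of the plaintexts, so a server can add values it cannot read. The abstract does not say which homomorphic schemes the review evaluates, so Paillier is our choice of example. The primes are tiny, the randomness comes from Python's non-cryptographic `random` module, and the function names are hypothetical; this is an illustration only and provides no security.

```python
import math
import random
from typing import Tuple


def keygen(p: int = 17, q: int = 19) -> Tuple[int, Tuple[int, int, int]]:
    """Textbook Paillier keys with g = n + 1. Tiny primes: illustration only, NOT secure."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # with g = n + 1, L(g^lam mod n^2) = lam mod n
    return n, (n, lam, mu)


def encrypt(n: int, m: int) -> int:
    """c = g^m * r^n mod n^2 for a random r coprime to n (requires 0 <= m < n)."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(priv: Tuple[int, int, int], c: int) -> int:
    n, lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n  # L(x) = (x - 1) / n, then multiply by mu


def add_encrypted(n: int, c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts (mod n) without decrypting."""
    return (c1 * c2) % (n * n)


pub, priv = keygen()
c_sum = add_encrypted(pub, encrypt(pub, 42), encrypt(pub, 100))
print(decrypt(priv, c_sum))  # 142
```

Because `encrypt` draws a fresh random `r` each time, the same plaintext encrypts to different ciphertexts, yet decryption is deterministic; `pow(x, -1, n)` requires Python 3.8 or later.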

Submitted 17 November, 2022; originally announced November 2022.

Comments: 20 pages, 4 Figures

arXiv:2209.06952 [pdf]

doi 10.1088/1361-6501/acb5b3

Landmark Tracking in Liver US images Using Cascade Convolutional Neural Networks with Long Short-Term Memory

Authors: Yupei Zhang, Xianjin Dai, Zhen Tian, Yang Lei, Jacob F. Wynne, Pretesh Patel, Yue Chen, Tian Liu, Xiaofeng Yang

Abstract: This study proposed a deep learning-based tracking method for ultrasound (US) image-guided radiation therapy. The proposed cascade deep learning model is composed of an attention network, a mask region-based convolutional neural network (mask R-CNN), and a long short-term memory (LSTM) network. The attention network learns a mapping from a US image to a suspected area of landmark motion in order to reduce the search region. The mask R-CNN then produces multiple region-of-interest (ROI) proposals in the reduced region and identifies the proposed landmark via three network heads: bounding box regression, proposal classification, and landmark segmentation. The LSTM network models the temporal relationship among the successive image frames for bounding box regression and proposal classification. To consolidate the final proposal, a selection method is designed according to the similarities between sequential frames. The proposed method was tested on the liver US tracking datasets used in the Medical Image Computing and Computer Assisted Interventions (MICCAI) 2015 challenges, where the landmarks were annotated by three experienced observers to obtain their mean positions. Five-fold cross-validation on the 24 given US sequences with ground truths shows that the mean tracking error for all landmarks is 0.65+/-0.56 mm, and the errors of all landmarks are within 2 mm. We further tested the proposed model on 69 landmarks from the testing dataset that has a similar image pattern to the training pattern, resulting in a mean tracking error of 0.94+/-0.83 mm.
Our experimental results have demonstrated the feasibility and accuracy of our proposed method in tracking liver anatomic landmarks using US images, providing a potential solution for real-time liver tracking for active motion management during radiation therapy.

Submitted 14 September, 2022; originally announced September 2022.

arXiv:2208.13686 [pdf, other]

doi 10.1088/1361-6560/acc721

Deformable Image Registration using Unsupervised Deep Learning for CBCT-guided Abdominal Radiotherapy

Authors: Huiqiao Xie, Yang Lei, Yabo Fu, Tonghe Wang, Justin Roper, Jeffrey D. Bradley, Pretesh Patel, Tian Liu, Xiaofeng Yang

Abstract: CBCTs in image-guided radiotherapy provide crucial anatomical information for patient setup and plan evaluation. Longitudinal CBCT image registration could quantify the inter-fractional anatomic changes. The purpose of this study is to propose an unsupervised deep learning-based CBCT-CBCT deformable image registration method. The proposed deformable registration workflow consists of training and inference stages that share the same feed-forward path through a spatial transformation-based network (STN). The STN consists of a global generative adversarial network (GlobalGAN) and a local GAN (LocalGAN) to predict the coarse- and fine-scale motions, respectively. The network was trained by minimizing the image similarity loss and the deformable vector field (DVF) regularization loss without the supervision of ground truth DVFs. During the inference stage, patches of local DVF were predicted by the trained LocalGAN and fused to form a whole-image DVF. The local whole-image DVF was subsequently combined with the GlobalGAN-generated DVF to obtain the final DVF. The proposed method was evaluated using 100 fractional CBCTs from 20 abdominal cancer patients in the experiments and 105 fractional CBCTs from a cohort of 21 different abdominal cancer patients in a holdout test. Qualitatively, the registration results show great alignment between the deformed CBCT images and the target CBCT image. Quantitatively, the average target registration error (TRE) calculated on the fiducial markers and manually identified landmarks was 1.91+-1.11 mm.
The average mean absolute error (MAE) and normalized cross correlation (NCC) between the deformed CBCT and target CBCT were 33.42+-7.48 HU and 0.94+-0.04, respectively. This promising registration method could provide fast and accurate longitudinal CBCT alignment to facilitate the analysis and prediction of inter-fractional anatomic changes.
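The three evaluation metrics quoted above (TRE, MAE, NCC) have simple definitions, sketched below under stated assumptions: the deformed and target images are already resampled onto the same voxel grid and flattened into equal-length lists, landmarks are given as corresponding (x, y, z) coordinates in mm, and NCC is computed globally over the whole image (some studies compute it locally in windows). The function names and toy values are hypothetical and do not reproduce the study's pipeline.

```python
import math
from typing import Sequence, Tuple

Point3 = Tuple[float, float, float]


def tre(moved: Sequence[Point3], fixed: Sequence[Point3]) -> float:
    """Target registration error: mean Euclidean distance between corresponding landmarks."""
    return sum(math.dist(a, b) for a, b in zip(moved, fixed)) / len(fixed)


def mae(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute intensity difference (in HU for CT/CBCT voxels)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def ncc(a: Sequence[float], b: Sequence[float]) -> float:
    """Global normalized cross correlation; 1.0 means a perfect positive linear match."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den


deformed = [0.0, 100.0, 200.0, 300.0]  # flattened voxels (toy values, HU)
target = [10.0, 110.0, 210.0, 310.0]
print(mae(deformed, target), ncc(deformed, target))  # 10.0 1.0
print(tre([(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)], [(3.0, 4.0, 0.0), (1.0, 1.0, 1.0)]))  # 2.5
```

The toy pair shows why MAE and NCC are reported together: a constant 10 HU offset leaves NCC at 1.0 while MAE still registers the intensity mismatch.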

Submitted 29 August, 2022; originally announced August 2022.

arXiv:2205.12538 [pdf, other]

Is a Question Decomposition Unit All We Need?

Authors: Pruthvi Patel, Swaroop Mishra, Mihir Parmar, Chitta Baral

Abstract: Large Language Models (LMs) have achieved state-of-the-art performance on many Natural Language Processing (NLP) benchmarks. With the growing number of new benchmarks, we build bigger and more complex LMs. However, building new LMs may not be an ideal option owing to the cost, time and environmental impact associated with it. We explore an alternative route: can we modify data by expressing it in terms of the model's strengths, so that a question becomes easier for models to answer? We investigate whether humans can decompose a hard question into a set of simpler questions that are relatively easier for models to solve. We analyze a range of datasets involving various forms of reasoning and find that it is indeed possible to significantly improve model performance (24% for GPT3 and 29% for RoBERTa-SQuAD along with a symbolic calculator) via decomposition. Our approach provides a viable option to involve people in NLP research in a meaningful way. Our findings indicate that Human-in-the-loop Question Decomposition (HQD) can potentially provide an alternate path to building large LMs. Code and data are available at https://github.com/Pruthvi98/QuestionDecomposition

Submitted 26 October, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

Comments: EMNLP 2022 (17 pages)

arXiv:2204.11835 [pdf, other]

A Novel Scalable Apache Spark Based Feature Extraction Approaches for Huge Protein Sequence and their Clustering Performance Analysis

Authors: Preeti Jha, Aruna Tiwari, Neha Bharill, Milind Ratnaparkhe, Om Prakash Patel, Nilagiri Harshith, Mukkamalla Mounika, Neha Nagendra

Abstract: Genome sequencing projects are rapidly increasing the number of high-dimensional protein sequence datasets. Clustering a high-dimensional protein sequence dataset using traditional machine learning approaches poses many challenges. Many different feature extraction methods exist and are widely used. However, extracting features from millions of protein sequences becomes impractical because existing methods do not scale to datasets of that size. Therefore, there is a need for an efficient feature extraction approach that extracts significant features. We have proposed two scalable feature extraction approaches for extracting features from huge protein sequence datasets using Apache Spark, which are termed 60d-SPF (60-dimensional Scalable Protein Feature) and 6d-SCPSF (6-dimensional Scalable Co-occurrence-based Probability-Specific Feature). The proposed 60d-SPF and 6d-SCPSF approaches capture the statistical properties of amino acids to create a fixed-length numeric feature vector that represents each protein sequence in terms of 60-dimensional and 6-dimensional features, respectively. The preprocessed protein sequences are used as input to two clustering algorithms, i.e., Scalable Random Sampling with Iterative Optimization Fuzzy c-Means (SRSIO-FCM) and Scalable Literal Fuzzy C-Means (SLFCM). We have conducted extensive experiments on various soybean protein datasets to demonstrate the effectiveness of the proposed feature extraction methods, 60d-SPF, 6d-SCPSF, and existing feature extraction methods on SRSIO-FCM and SLFCM clustering algorithms.
The reported results in terms of the Silhouette index and the Davies-Bouldin index show that the proposed 60d-SPF extraction method on SRSIO-FCM and SLFCM clustering algorithms achieves significantly better results than the proposed 6d-SCPSF and existing feature extraction approaches.
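The Davies-Bouldin index mentioned above scores a partition by comparing within-cluster scatter to between-centroid distance; lower values mean more compact, better-separated clusters. The sketch below uses the standard definition: for each cluster, take the worst ratio (s_i + s_j) / d(c_i, c_j) over the other clusters, then average. Because fuzzy c-means yields soft memberships, we assume each feature vector has already been assigned to its highest-membership cluster; the abstract does not say how the authors hardened the memberships, and the function name and toy vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec = Tuple[float, ...]


def davies_bouldin(points: Sequence[Vec], labels: Sequence[int]) -> float:
    """Davies-Bouldin index (lower is better) for a hard partition of feature vectors."""
    clusters: Dict[int, List[Vec]] = {}
    for p, c in zip(points, labels):
        clusters.setdefault(c, []).append(p)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    centroids = {
        c: tuple(sum(coord) / len(ps) for coord in zip(*ps)) for c, ps in clusters.items()
    }
    # s_i: average distance of cluster i's members to its centroid
    scatter = {
        c: sum(math.dist(p, centroids[c]) for p in ps) / len(ps) for c, ps in clusters.items()
    }
    total = 0.0
    for i in clusters:
        # worst-case similarity R_ij = (s_i + s_j) / d(c_i, c_j) over the other clusters
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


# Two well-separated toy clusters of 2-D feature vectors (real inputs would be 60-D or 6-D).
pts = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
print(davies_bouldin(pts, [0, 0, 1, 1]))  # 0.2
```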

Submitted 21 April, 2022; originally announced April 2022.

arXiv:2204.10183 [pdf, other]

Multi-Component Optimization and Efficient Deployment of Neural-Networks on Resource-Constrained IoT Hardware

Authors: Bharath Sudharsan, Dineshkumar Sundaram, Pankesh Patel, John G. Breslin, Muhammad Intizar Ali, Schahram Dustdar, Albert Zomaya, Rajiv Ranjan

Abstract: The majority of IoT devices like smartwatches, smart plugs, HVAC controllers, etc., are powered by hardware with a constrained specification (low memory, low clock speed, and limited processing power) which is insufficient to accommodate and execute large, high-quality models. On such resource-constrained devices, manufacturers still manage to provide attractive functionalities (to boost sales) by following the traditional approach of programming IoT devices/products to collect and transmit data (image, audio, sensor readings, etc.) to their cloud-based ML analytics platforms. For decades, this online approach has been facing issues such as compromised data streams, non-real-time analytics due to latency, bandwidth constraints, costly subscriptions, recent privacy issues raised by users and the GDPR guidelines, etc. In this paper, to enable ultra-fast and accurate AI-based offline analytics on resource-constrained IoT devices, we present an end-to-end multi-component model optimization sequence and open-source its implementation. Researchers and developers can use our optimization sequence to optimize memory- and computation-demanding models in multiple aspects in order to produce small, low-latency, low-power models that can comfortably fit and execute on resource-constrained hardware. The experimental results show that our optimization components can produce models that are (i) 12.06x compressed, (ii) 0.13% to 0.27% more accurate, and (iii) orders of magnitude faster, with a unit inference time of 0.06 ms.
Our optimization sequence is generic and can be applied to any state-of-the-art models trained for anomaly detection, predictive maintenance, robotics, voice recognition, and machine vision.

Submitted 20 April, 2022; originally announced April 2022.

arXiv:2204.09904 [pdf, other]

doi 10.1111/cgf.14527

Infographics Wizard: Flexible Infographics Authoring and Design Exploration

Authors: Anjul Tyagi, Jian Zhao, Pushkar Patel, Swasti Khurana, Klaus Mueller

Abstract: Infographics are an aesthetic visual representation of information following specific design principles of human perception. Designing infographics can be tedious for non-experts and time-consuming even for professional designers. With the help of designers, we propose a semi-automated infographic framework for general structured and flow-based infographic design generation. For novice designers, our framework automatically creates and ranks infographic designs for a user-provided text with no requirement for design input. However, expert designers can still provide custom design inputs to customize the infographics. In this work, we also contribute a dataset of individual visual group (VG) designs (in SVG), along with a dataset of 1k complete infographic images with segmented VGs. Evaluation results confirm that by using our framework, designers from all expertise levels can generate generic infographic designs faster than existing methods while maintaining the same quality as hand-designed infographic templates.

Submitted 8 May, 2022; v1 submitted 21 April, 2022; originally announced April 2022.

Comments: Preprint of the EUROVIS 22 accepted paper. arXiv admin note: substantial text overlap with arXiv:2108.11914

ACM Class: H.5.2; I.4.6; J.5

Journal ref: Computer Graphics Forum, 2022, 41: 121-132

arXiv:2203.05931 [pdf, other]

FedSyn: Synthetic Data Generation using Federated Learning

Authors: Monik Raj Behera, Sudhir Upadhyay, Suresh Shetty, Sudha Priyadarshini, Palka Patel, Ker Farn Lee

Abstract: As Deep Learning algorithms continue to evolve and become more sophisticated, they require massive datasets to train effective models. Some of those data requirements can be met with the help of existing datasets within the organizations. Current Machine Learning practices can be leveraged to generate synthetic data from an existing dataset. Further, it is well established that diversity in generated synthetic data relies on (and is perhaps limited by) the statistical properties of the dataset available within a single organization or entity. The more diverse an existing dataset is, the more expressive and generic the synthetic data can be. However, given the scarcity of underlying data, it is challenging to collate big data in one organization. The diverse, non-overlapping datasets held by distinct organizations provide an opportunity for them to contribute their limited distinct data to a larger pool from which further data can be synthesized. Unfortunately, this raises data privacy concerns that some institutions may not be comfortable with. This paper proposes a novel approach to generate synthetic data: FedSyn. FedSyn is a collaborative, privacy-preserving approach to generate synthetic data among multiple participants in a federated and collaborative network. FedSyn creates a synthetic data generation model, which can generate synthetic data reflecting the statistical distributions of almost all the participants in the network.
FedSyn does not require access to the data of an individual participant, hence protecting the privacy of participants' data. The proposed technique in this paper leverages federated machine learning and a generative adversarial network (GAN) as the neural network architecture for synthetic data generation. The proposed method can be extended to many machine learning problem classes in finance, health, governance, technology and many more.
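Federated learning keeps raw records on each participant and shares only model parameters with an aggregator. The abstract does not specify FedSyn's aggregation rule, so the sketch below assumes the widely used FedAvg-style step, a sample-size-weighted average of client parameter vectors; in a GAN setting those vectors would be the flattened generator (and possibly discriminator) weights. The function name `fed_avg` and the toy weights are hypothetical, and local training rounds are omitted.

```python
from typing import List, Sequence


def fed_avg(client_params: Sequence[Sequence[float]], client_sizes: Sequence[int]) -> List[float]:
    """Sample-size-weighted average of client parameter vectors (FedAvg-style aggregation)."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one size per client and at least one client")
    total = sum(client_sizes)
    dim = len(client_params[0])
    global_params = [0.0] * dim
    for params, n in zip(client_params, client_sizes):
        if len(params) != dim:
            raise ValueError("all clients must share the same model shape")
        for i, value in enumerate(params):
            global_params[i] += value * n / total  # larger local datasets weigh more
    return global_params


# Two participants send (flattened, toy) generator weights; raw records never leave them.
print(fed_avg([[1.0, 2.0], [3.0, 4.0]], client_sizes=[100, 300]))  # [2.5, 3.5]
```

Weighting by local sample count is one design choice; an aggregator could instead weight participants equally, which gives small data holders more influence on the shared generator.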

Submitted 5 April, 2022; v1 submitted 11 March, 2022; originally announced March 2022.

arXiv:2201.02269 [pdf]

Investigating Expectation Violations in Mobile Apps

Authors: Sherlock A. Licorish, Helen E. Owen, Bastin Tony Roy Savarimuthu, Priyanka Patel

Abstract: Information technology and software services are pervasive, occupying the centre of most aspects of contemporary societies. This has given rise to commonly held norms and expectations around how such systems should work, appropriate penalties for violating these expectations, and more importantly, indicators of how to reduce the consequences of violations and sanctions. Evidence for expectation violations and ensuing sanctions exists in a range of portals used by individuals and groups to start new friendships, explore new ideas, and provide feedback for products and services. Therein lie insights that could lead to functional socio-technical systems, and general awareness and anticipations of human actions (and interactions) when using information technology and software services. However, limited previous work has examined such artifacts to provide these understandings. To contribute to such understandings and theoretical advancement, we study expectation violations in mobile apps, considered among the most engaging socio-technical systems. We used content analysis, expectancy violation theory (EVT), and expectation confirmation theory (ECT) to explore the evidence and nature of sanctions in app reviews for a specific domain of apps. Our outcomes show that users respond to expectation violation with sanctions when their app does not work as anticipated, developers seem to target specific market niches when providing services in an app domain, and users within an app domain respond with similar sanctions.
We contribute to the advancement of expectation violation theories, and we provide practical insights for the mobile app community.

Submitted 6 January, 2022; originally announced January 2022.

Comments: 32 pages, 4 figures, 8 tables

ACM Class: D.2.1; D.2.7; H.3.1; H.5.2; I.7.1; J.4; K.4.2; K.4.3

arXiv:2201.01283 [pdf, other]

Self-supervised Learning from 100 Million Medical Images

Authors: Florin C. Ghesu, Bogdan Georgescu, Awais Mansoor, Youngjin Yoo, Dominik Neumann, Pragneshkumar Patel, R. S. Vishwanath, James M. Balter, Yue Cao, Sasa Grbic, Dorin Comaniciu

Abstract: Building accurate and robust artificial intelligence systems for medical image assessment requires not only the research and design of advanced deep learning models but also the creation of large and curated sets of annotated training examples. Constructing such datasets, however, is often very costly -- due to the complex nature of annotation tasks and the high level of expertise required for the interpretation of medical images (e.g., expert radiologists). To counter this limitation, we propose a method for self-supervised learning of rich image features based on contrastive learning and online feature clustering. For this purpose we leverage large training datasets of over 100,000,000 medical images of various modalities, including radiography, computed tomography (CT), magnetic resonance (MR) imaging and ultrasonography. We propose to use these features to guide model training in supervised and hybrid self-supervised/supervised regime on various downstream tasks.
We highlight a number of advantages of this strategy on challenging image assessment problems in radiography, CT and MR: 1) Significant increase in accuracy compared to the state-of-the-art (e.g., AUC boost of 3-7% for detection of abnormalities from chest radiography scans and hemorrhage detection on brain CT); 2) Acceleration of model convergence during training by up to 85% compared to using no pretraining (e.g., 83% when training a model for detection of brain metastases in MR scans); 3) Increase in robustness to various image augmentations, such as intensity variations, rotations or scaling reflective of data variation seen in the field.

Submitted 4 January, 2022; originally announced January 2022.
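The abstract names contrastive learning as one ingredient of the pretraining. As a rough illustration only, the sketch below computes a generic InfoNCE-style contrastive loss for a single anchor embedding in plain Python. The function names, the use of cosine similarity, and the temperature of 0.1 are assumptions; the authors' actual method, which also uses online feature clustering, is not reproduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss: negative log softmax score of the positive among all candidates."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    # Numerically stable log-sum-exp over all candidate logits.
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[0]


# Hypothetical embeddings: two augmented views of one image should agree,
# while embeddings of other images act as negatives.
anchor = [1.0, 0.0, 0.0]
good_view = [0.9, 0.1, 0.0]
negatives = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(info_nce(anchor, good_view, negatives))
```

The loss is small when the positive view is far more similar to the anchor than any negative, and it grows as a negative becomes the closest candidate.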

arXiv:2111.07060

PAMMELA: Policy Administration Methodology using Machine Learning

Authors: Varun Gumma, Barsha Mitra, Soumyadeep Dey, Pratik Shashikantbhai Patel, Sourabh Suman, Saptarshi Das

Abstract: In recent years, Attribute-Based Access Control (ABAC) has become quite popular and effective for enforcing access control in dynamic and collaborative environments. Implementation of ABAC requires the creation of a set of attribute-based rules which cumulatively form a policy. Designing an ABAC policy ab initio demands a substantial amount of effort from the system administrator. Moreover, organizational changes may necessitate the inclusion of new rules in an already deployed policy. In such a case, re-mining the entire ABAC policy will require a considerable amount of time and administrative effort. Instead, it is better to incrementally augment the policy. Keeping these aspects of reducing administrative overhead in mind, in this paper, we propose PAMMELA, a Policy Administration Methodology using Machine Learning to help system administrators in creating new ABAC policies as well as augmenting existing ones. PAMMELA can generate a new policy for an organization by learning the rules of a policy currently enforced in a similar organization. For policy augmentation, PAMMELA can infer new rules based on the knowledge gathered from the existing rules. Experimental results show that our proposed approach provides a reasonably good performance in terms of the various machine learning evaluation metrics as well as execution time.

Submitted 13 November, 2021; originally announced November 2021.

Comments: This work is in progress
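The abstract describes ABAC only in outline: rules built from attribute conditions cumulatively form a policy, and augmentation adds new rules to a deployed policy. The sketch below is a hypothetical representation (the `Rule` class and the `satisfies` and `is_permitted` helpers are illustrative names, not PAMMELA's code) showing how such a policy could be evaluated and how augmentation amounts to appending a rule. The machine-learning step that PAMMELA uses to infer rules is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """One hypothetical ABAC rule: attribute conditions plus the permitted operation."""
    user_conditions: dict = field(default_factory=dict)      # user attribute -> required value
    resource_conditions: dict = field(default_factory=dict)  # resource attribute -> required value
    operation: str = "read"


def satisfies(conditions, attributes):
    """True when every required attribute is present with the required value."""
    return all(attributes.get(key) == value for key, value in conditions.items())


def is_permitted(policy, user, resource, operation):
    """A policy (a list of rules) grants access if any single rule matches."""
    return any(
        rule.operation == operation
        and satisfies(rule.user_conditions, user)
        and satisfies(rule.resource_conditions, resource)
        for rule in policy
    )


policy = [Rule({"role": "doctor"}, {"type": "medical_record"}, "read")]
nurse = {"role": "nurse", "dept": "cardiology"}
record = {"type": "medical_record"}
print(is_permitted(policy, nurse, record, "read"))  # False

# Augmentation: append a newly inferred rule instead of re-mining the whole policy.
policy.append(Rule({"role": "nurse", "dept": "cardiology"}, {"type": "medical_record"}, "read"))
print(is_permitted(policy, nurse, record, "read"))  # True
```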

arXiv:2110.10617

Colosseum: Large-Scale Wireless Experimentation Through Hardware-in-the-Loop Network Emulation

Authors: Leonardo Bonati, Pedram Johari, Michele Polese, Salvatore D'Oro, Subhramoy Mohanti, Miead Tehrani-Moayyed, Davide Villa, Shweta Shrivastava, Chinenye Tassie, Kurt Yoder, Ajeet Bagga, Paresh Patel, Ventz Petkov, Michael Seltser, Francesco Restuccia, Abhimanyu Gosain, Kaushik R. Chowdhury, Stefano Basagni, Tommaso Melodia

Abstract: Colosseum is an open-access and publicly-available large-scale wireless testbed for experimental research via virtualized and softwarized waveforms and protocol stacks on a fully programmable, "white-box" platform. Through 256 state-of-the-art software-defined radios and a massive channel emulator core, Colosseum can model virtually any scenario, enabling the design, development and testing of solutions at scale in a variety of deployments and channel conditions. These Colosseum radio-frequency scenarios are reproduced through high-fidelity FPGA-based emulation with finite-impulse-response filters. Filters model the taps of desired wireless channels and apply them to the signals generated by the radio nodes, faithfully mimicking the conditions of real-world wireless environments. In this paper, we introduce Colosseum as a testbed that is for the first time open to the research community. We describe the architecture of Colosseum and its experimentation and emulation capabilities. We then demonstrate the effectiveness of Colosseum for experimental research at scale through exemplary use cases including prevailing wireless technologies (e.g., cellular and Wi-Fi) in spectrum sharing and unmanned aerial vehicle scenarios. A roadmap for future Colosseum updates concludes the paper.

Submitted 14 December, 2021; v1 submitted 20 October, 2021; originally announced October 2021.
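The abstract says Colosseum reproduces channels by applying finite-impulse-response (FIR) filter taps to the signals generated by the radio nodes. A minimal software analogue, assuming complex baseband samples and a fixed tap vector, is a discrete convolution. The function name and the example taps are hypothetical; the real emulator runs on FPGAs and is far more elaborate.

```python
def apply_fir_channel(samples, taps):
    """Filter complex baseband samples through FIR taps: y[n] = sum_k h[k] * x[n - k].

    The output is truncated to the input length, as a streaming emulator would emit
    one output sample per input sample.
    """
    output = []
    for n in range(len(samples)):
        acc = 0j
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * samples[n - k]
        output.append(acc)
    return output


# Hypothetical two-path channel: a direct path plus a weaker, phase-rotated echo
# arriving one sample later.
taps = [0.8, 0.3j]
impulse = [1, 0, 0, 0]
print(apply_fir_channel(impulse, taps))  # the impulse response reproduces the taps
```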

arXiv:2108.11914

User-Centric Semi-Automated Infographics Authoring and Recommendation

Authors: Anjul Tyagi, Jian Zhao, Pushkar Patel, Swasti Khurana, Klaus Mueller

Abstract: Designing infographics can be a tedious process for non-experts and time-consuming even for professional designers. Based on the literature and a formative study, we propose a flexible framework for automated and semi-automated infographics design. This framework captures the main design components in infographics and streamlines the generation workflow into three steps, allowing users to control and optimize each aspect independently. Based on the framework, we also propose an interactive tool, \name{}, for assisting novice designers with creating high-quality infographics from an input in a markdown format by offering recommendations of different design components of infographics. Simultaneously, more experienced designers can provide custom designs and layout ideas to the tool using a canvas to control the automated generation process partially. As part of our work, we also contribute an individual visual group (VG) and connection designs dataset (in SVG), along with a 1k complete infographic image dataset with segmented VGs. This dataset plays a crucial role in diversifying the infographic designs created by our framework. We evaluate our approach with a comparison against similar tools, a user study with novice and expert designers, and a case study. Results confirm that our framework and \name{} excel in creating customized infographics and exploring a large variety of designs.

Submitted 27 August, 2021; v1 submitted 26 August, 2021; originally announced August 2021.

arXiv:2106.07087

Koios: A Deep Learning Benchmark Suite for FPGA Architecture and CAD Research

Authors: Aman Arora, Andrew Boutros, Daniel Rauch, Aishwarya Rajen, Aatman Borda, Seyed Alireza Damghani, Samidh Mehta, Sangram Kate, Pragnesh Patel, Kenneth B. Kent, Vaughn Betz, Lizy K. John

Abstract: With the prevalence of deep learning (DL) in many applications, researchers are investigating different ways of optimizing FPGA architecture and CAD to achieve better quality-of-results (QoR) on DL-based workloads. In this optimization process, benchmark circuits are an essential component; the QoR achieved on a set of benchmarks is the main driver for architecture and CAD design choices. However, current academic benchmark suites are inadequate, as they do not capture any designs from the DL domain. This work presents a new suite of DL acceleration benchmark circuits for FPGA architecture and CAD research, called Koios. This suite of 19 circuits covers a wide variety of accelerated neural networks, design sizes, implementation styles, abstraction levels, and numerical precisions. These designs are larger, more data parallel, more heterogeneous, more deeply pipelined, and utilize more FPGA architectural features compared to existing open-source benchmarks. This enables researchers to pin-point architectural inefficiencies for this class of workloads and optimize CAD tools on more realistic benchmarks that stress the CAD algorithms in different ways. In this paper, we describe the designs in our benchmark suite, present results of running them through the Verilog-to-Routing (VTR) flow using a recent FPGA architecture model, and identify key insights from the resulting metrics.
On average, our benchmarks have 3.7x more netlist primitives, 1.8x and 4.7x higher DSP and BRAM densities, and 1.7x higher frequency with 1.9x more near-critical paths compared to the widely-used VTR suite. Finally, we present two example case studies showing how architectural exploration for DL-optimized FPGAs can be performed using our new benchmark suite.

Submitted 13 June, 2021; originally announced June 2021.

arXiv:2104.14643

AGORA: Avatars in Geography Optimized for Regression Analysis

Authors: Priyanka Patel, Chun-Hao P. Huang, Joachim Tesch, David T. Hoffmann, Shashank Tripathi, Michael J. Black

Abstract: While the accuracy of 3D human pose estimation from images has steadily improved on benchmark datasets, the best methods still fail in many real-world scenarios. This suggests that there is a domain gap between current datasets and common scenes containing people. To obtain ground-truth 3D pose, current datasets limit the complexity of clothing, environmental conditions, number of subjects, and occlusion. Moreover, current datasets evaluate sparse 3D joint locations corresponding to the major joints of the body, ignoring the hand pose and the face shape. To evaluate the current state-of-the-art methods on more challenging images, and to drive the field to address new problems, we introduce AGORA, a synthetic dataset with high realism and highly accurate ground truth. Here we use 4240 commercially-available, high-quality, textured human scans in diverse poses and natural clothing; this includes 257 scans of children. We create reference 3D poses and body shapes by fitting the SMPL-X body model (with face and hands) to the 3D scans, taking into account clothing. We create around 14K training and 3K test images by rendering between 5 and 15 people per image using either image-based lighting or rendered 3D environments, taking care to make the images physically plausible and photoreal. In total, AGORA consists of 173K individual person crops. We evaluate existing state-of-the-art methods for 3D human pose estimation on this dataset and find that most methods perform poorly on images of children.
Hence, we extend the SMPL-X model to better capture the shape of children. Additionally, we fine-tune methods on AGORA and show improved performance on both AGORA and 3DPW, confirming the realism of the dataset. We provide all the registered 3D reference training data, rendered images, and a web-based evaluation site at https://agora.is.tue.mpg.de/.

Submitted 29 April, 2021; originally announced April 2021.

Journal ref: CVPR 2021

arXiv:2103.06813

COVID-19: Optimal Allocation of Ventilator Supply under Uncertainty and Risk

Authors: Xuecheng Yin, I. Esra Buyuktahtakin, Bhumi P. Patel

Abstract: This study presents a new risk-averse multi-stage stochastic epidemics-ventilator-logistics compartmental model to address the resource allocation challenges of mitigating COVID-19. This epidemiological logistics model involves the uncertainty of untested asymptomatic infections and incorporates short-term human migration. Disease transmission is also forecast through a new formulation of transmission rates that evolve over space and time with respect to various non-pharmaceutical interventions, such as wearing masks, social distancing, and lockdown. The proposed multi-stage stochastic model considers different scenarios for the number of asymptomatic individuals while optimizing the distribution of resources, such as ventilators, to minimize the total expected number of newly infected and deceased people. The Conditional Value at Risk (CVaR) is also incorporated into the multi-stage mean-risk model to allow for a trade-off between the weighted expected loss due to the outbreak and the expected risks associated with experiencing disastrous pandemic scenarios. We apply our multi-stage mean-risk epidemics-ventilator-logistics model to the case of controlling COVID-19 in highly-impacted counties of New York and New Jersey. We calibrate, validate, and test our model using actual infection, population, and migration data. The results indicate that short-term migration influences the transmission of the disease significantly.
The optimal number of ventilators allocated to each region depends on various factors, including the number of initial infections, disease transmission rates, initial ICU capacity, the population of a geographical location, and the availability of ventilator supply. Our data-driven modeling framework can be adapted to study the disease transmission dynamics and logistics of other similar epidemics and pandemics.

Submitted 9 March, 2021; originally announced March 2021.

Comments: 35 pages, 6 figures, 10 tables, Under Review for a Journal
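The mean-risk objective described in the abstract combines an expected loss with the Conditional Value at Risk (CVaR). As an illustrative sketch only, assuming a finite set of weighted scenarios and a weight `lam` that trades expected loss against CVaR (the function names and this particular weighting are assumptions, not the paper's formulation), CVaR at level `alpha < 1` can be computed through the Rockafellar-Uryasev form `CVaR = VaR + E[(L - VaR)+] / (1 - alpha)`, where VaR is the `alpha`-quantile of the loss.

```python
def value_at_risk(losses, probs, alpha):
    """Smallest loss level whose cumulative scenario probability reaches alpha."""
    cumulative = 0.0
    for loss, p in sorted(zip(losses, probs)):
        cumulative += p
        if cumulative >= alpha - 1e-12:  # tolerance for floating-point sums
            return loss
    return max(losses)


def cvar(losses, probs, alpha):
    """Rockafellar-Uryasev CVaR for discrete scenarios (requires alpha < 1)."""
    var = value_at_risk(losses, probs, alpha)
    excess = sum(p * max(loss - var, 0.0) for loss, p in zip(losses, probs))
    return var + excess / (1.0 - alpha)


def mean_cvar(losses, probs, alpha, lam):
    """Hypothetical weighted objective: (1 - lam) * expected loss + lam * CVaR."""
    expected = sum(loss * p for loss, p in zip(losses, probs))
    return (1.0 - lam) * expected + lam * cvar(losses, probs, alpha)


# Hypothetical scenarios: losses (e.g., new infections) under mild, moderate,
# and severe outbreaks, with their probabilities.
losses = [0.0, 10.0, 100.0]
probs = [0.5, 0.3, 0.2]
print(cvar(losses, probs, alpha=0.8))                # about 100: mean of the worst 20%
print(mean_cvar(losses, probs, alpha=0.8, lam=0.5))  # about 61.5
```

With `alpha = 0` the CVaR reduces to the plain expected loss, and larger `lam` shifts the objective toward guarding against disastrous scenarios.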

arXiv:2103.03096

Towards Designing Computer Vision-based Explainable-AI Solution: A Use Case of Livestock Mart Industry

Authors: Devam Dave, Het Naik, Smiti Singhal, Rudresh Dwivedi, Pankesh Patel

Abstract: The objective of an online Mart is to match buyers and sellers, to weigh animals and to oversee their sale. ML models trained on historical sales data can provide a reliable pricing method. However, when AI models suggest or recommend a price, that in itself reveals little about the qualities and the abilities of an animal (i.e., the model acts like a black box). An interested buyer would like to know more about the salient features of an animal before making the right choice based on their requirements. A model capable of explaining the different factors that impact the price point is essential for the needs of the market. It can also inspire confidence in buyers and sellers about the price point offered. To achieve these objectives, we have been working with the team at MartEye, a startup based in Portershed in Galway City, Ireland. Through this paper, we report our work-in-progress research towards building a smart video analytic platform, leveraging Explainable AI techniques.

Submitted 8 February, 2021; originally announced March 2021.

Comments: 8 pages, 5 figures

arXiv:2103.00376

They'll Know It When They See It: Analyzing Post-Release Feedback from the Android Community

Authors: Sherlock A. Licorish, Chan Won Lee, Bastin Tony Roy Savarimuthu, Priyanka Patel, Stephen G. MacDonell

Abstract: It is known that user involvement and user-centered design enhance system acceptance, particularly when end-users' views are considered early in the process. However, the increasingly common method of system deployment, through frequent releases via an online application distribution platform, relies more on post-release feedback from a virtual community. Such feedback may be received from large and diverse communities of users, posing challenges to developers in terms of extracting and identifying the most pressing requests to address. In seeking to tackle these challenges we have used natural language processing techniques to study enhancement requests logged by the Android community. We observe that features associated with a specific subset of topics were most frequently requested for improvement, and that end-users expressed particular discontent with the Jellybean release. End-users also tended to request improvements to specific issues together, potentially posing a prioritization challenge to Google.

Submitted 27 February, 2021; originally announced March 2021.

Comments: Conference Proceeding, 10 pages, 4 figures, 3 tables

Journal ref: Proceedings of the 21st Americas Conference on Information Systems (AMCIS2015). Puerto Rico, AISeL, 1-11. http://aisel.aisnet.org/amcis2015/VirtualComm/GeneralPresentations/7/

arXiv:2012.05141

EMRs with Blockchain : A distributed democratised Electronic Medical Record sharing platform

Authors: Sanket Shevkar, Parthit Patel, Saptarshi Majumder, Harshita Singh, Kshitijaa Jaglan, Hrithwik Shalu

Abstract: Medical data sharing needs to be done with the utmost respect for privacy and security. Such data contains intimate information about the patient, and any access to it must be highly regulated. With the emergence of vertical solutions in healthcare institutions, interoperability across organisations has been hindered. The authors of this paper propose a blockchain-based medical-data sharing solution, utilising Hyperledger Fabric to regulate access to medical data and the InterPlanetary File System for its storage. We believe that the combination of these two distributed solutions can enable patients to access their medical records across healthcare institutions while ensuring non-repudiation and immutability and providing data ownership. It would enable healthcare practitioners to access all previous medical records in a single location, empowering them with the data required for the effective diagnosis and treatment of patients. It would also make it safe and straightforward for patients to share medical data with research institutions, leading to the creation of reliable data sets and laying the groundwork required for the creation of personalised medicine.

Submitted 9 December, 2020; originally announced December 2020.

Comments: 8 pages, 2 figures

arXiv:2012.01930

Learning Explainable Interventions to Mitigate HIV Transmission in Sex Workers Across Five States in India

Authors: Raghav Awasthi, Prachi Patel, Vineet Joshi, Shama Karkal, Tavpritesh Sethi

Abstract: Female sex workers (FSWs) are one of the most vulnerable and stigmatized groups in society. As a result, they often suffer from a lack of quality access to care. Grassroots organizations engaged in improving health services often face the challenge of improving the effectiveness of interventions due to complex influences. This work combines structure learning, discriminative modeling, and grassroots-level expertise in designing interventions across five different Indian states to discover the influence of non-obvious factors for improving safe-sex practices in FSWs. A bootstrapped, ensemble-averaged Bayesian Network structure was learned to quantify the factors that could maximize condom usage as revealed from the model. A discriminative model was then constructed using XGBoost and random forest in order to predict condom use behavior. The best model achieved 83% sensitivity, 99% specificity, and 99% area under the precision-recall curve for the prediction. Both generative and discriminative modeling approaches revealed that financial literacy training was the primary influence and predictor of condom use in FSWs. These insights have led to a currently ongoing field trial for assessing the real-world utility of this approach. Our work highlights the potential of explainable models for transparent discovery and prioritization of anti-HIV interventions in female sex workers in a resource-limited setting.

Submitted 30 November, 2020; originally announced December 2020.

Comments: Presented at NeurIPS 2020 Workshop on Machine Learning for the Developing World
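For readers less familiar with the reported metrics, the sketch below shows how sensitivity and specificity follow from a binary confusion matrix. The labels and predictions are invented toy values, not data from the study, and the helper names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FN, TN, FP for binary labels (1 = positive class, 0 = negative class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp, fn, tn, fp


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    return tp / (tp + fn), tn / (tn + fp)


# Toy example: 4 true positives-class cases and 4 negative-class cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
print(sensitivity_specificity(y_true, y_pred))  # (0.75, 0.75)
```

Sensitivity measures how many actual positives the model catches; specificity measures how many actual negatives it correctly rejects, so a model can score high on one and lower on the other, as in the reported 83% and 99%.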

arXiv:2011.03195

Explainable AI meets Healthcare: A Study on Heart Disease Dataset

Authors: Devam Dave, Het Naik, Smiti Singhal, Pankesh Patel

Abstract: With the increasing availability of structured and unstructured data and the swift progress of analytical techniques, Artificial Intelligence (AI) is bringing a revolution to the healthcare industry. With the increasingly indispensable role of AI in healthcare, there are growing concerns over the lack of transparency and explainability, in addition to potential bias in the model's predictions. This is where Explainable Artificial Intelligence (XAI) comes into the picture. XAI increases the trust placed in an AI system by medical practitioners as well as AI researchers, and thus, eventually, leads to an increasingly widespread deployment of AI in healthcare. In this paper, we present different interpretability techniques. The aim is to enlighten practitioners on the understandability and interpretability of explainable AI systems using a variety of available techniques, which can be very advantageous in the healthcare domain. A medical diagnosis model bears responsibility for human life, and we need to be sufficiently confident before treating a patient as instructed by a black-box model. Our paper contains examples based on the heart disease dataset and elucidates how the explainability techniques should be preferred to create trustworthiness while using AI systems in healthcare.

Submitted 6 November, 2020; originally announced November 2020.

Comments: 23

arXiv:2010.09893

LT-GAN: Self-Supervised GAN with Latent Transformation Detection

Authors: Parth Patel, Nupur Kumari, Mayank Singh, Balaji Krishnamurthy

Abstract: Generative Adversarial Networks (GANs) coupled with self-supervised tasks have shown promising results in unconditional and semi-supervised image generation. We propose a self-supervised approach (LT-GAN) to improve the generation quality and diversity of images by estimating the GAN-induced transformation (i.e., the transformation induced in the generated images by perturbing the latent space of the generator). Specifically, given two pairs of images where each pair comprises a generated image and its transformed version, the self-supervision task aims to identify whether the latent transformation applied in the given pair is the same as that of the other pair. Hence, this auxiliary loss encourages the generator to produce images that are distinguishable by the auxiliary network, which in turn promotes the synthesis of semantically consistent images with respect to latent transformations. We show the efficacy of this pretext task by improving the image generation quality in terms of FID on state-of-the-art models for both conditional and unconditional settings on CIFAR-10, CelebA-HQ and ImageNet datasets. Moreover, we empirically show that LT-GAN improves controlled image editing for CelebA-HQ and ImageNet over baseline models. We experimentally demonstrate that our proposed LT self-supervision task can be effectively combined with other state-of-the-art training techniques for added benefits. Consequently, we show that our approach achieves a new state-of-the-art FID score of 9.8 on conditional CIFAR-10 image generation.

Submitted 19 October, 2020; originally announced October 2020.

Comments: Accepted at WACV2021
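To make the pretext task concrete, the sketch below shows only its data side: sampling two latent codes, applying either the same or a different latent perturbation to each, and attaching a binary label. The additive Gaussian perturbation, the `sigma` value, the helper names, and the toy linear generator are all assumptions for illustration; the auxiliary network that LT-GAN trains to predict the label, and the exact way it compares the pairs, are not shown.

```python
import random


def perturb(z, eps):
    """Apply a latent transformation by adding a perturbation vector (assumed form)."""
    return [a + b for a, b in zip(z, eps)]


def make_pretext_example(generator, dim, same, sigma=0.1, rng=None):
    """Build two (image, transformed image) pairs and a label for the pretext task.

    label = 1 when both pairs share the same latent perturbation, 0 otherwise.
    """
    if rng is None:
        rng = random.Random()
    z1 = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    z2 = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    eps1 = [rng.gauss(0.0, sigma) for _ in range(dim)]
    eps2 = eps1 if same else [rng.gauss(0.0, sigma) for _ in range(dim)]
    pair_a = (generator(z1), generator(perturb(z1, eps1)))
    pair_b = (generator(z2), generator(perturb(z2, eps2)))
    return pair_a, pair_b, int(same)


def toy_generator(z):
    """Hypothetical stand-in for a GAN generator; a real one would output images."""
    return [2.0 * x for x in z]


rng = random.Random(0)
pair_a, pair_b, label = make_pretext_example(toy_generator, dim=4, same=True, rng=rng)
print(label)  # 1
```

Because the toy generator is linear, both pairs in a `same=True` example differ by the same amount, which is the kind of consistency the auxiliary task rewards in the real generator's outputs.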

arXiv:2010.09687

A Demonstration of Smart Doorbell Design Using Federated Deep Learning

Authors: Vatsal Patel, Sarth Kanani, Tapan Pathak, Pankesh Patel, Muhammad Intizar Ali, John Breslin

Abstract: Smart doorbells have been playing an important role in protecting our modern homes. Existing approaches that send video streams to a centralized server (or Cloud) for video analytics face many challenges such as latency, bandwidth cost and, more importantly, users' privacy concerns. To address these challenges, this paper showcases the ability of an intelligent smart doorbell based on Federated Deep Learning, which can deploy and manage video analytics applications such as a smart doorbell across Edge and Cloud resources. This platform can scale, work with multiple devices, and seamlessly manage online orchestration of the application components. The proposed framework is implemented using state-of-the-art technology. We implement the Federated Server using the Flask framework, containerized using Nginx and Gunicorn, which is deployed on AWS EC2 and AWS Serverless architecture.

Submitted 19 October, 2020; originally announced October 2020.

Comments: 6
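The abstract does not say which aggregation rule the Federated Server uses. Purely as an assumption, the sketch below shows federated averaging (FedAvg), a common baseline in which the server averages client model parameters weighted by each client's number of local training samples. Parameters are flattened into plain lists, and the function name and example values are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Average flattened client parameter vectors, weighted by local dataset size."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need at least one client and one size per client")
    total = sum(client_sizes)
    num_params = len(client_weights[0])
    return [
        sum(weights[i] * size for weights, size in zip(client_weights, client_sizes)) / total
        for i in range(num_params)
    ]


# Two hypothetical doorbells: one trained on 100 local frames, the other on 300.
doorbell_a = [1.0, 2.0]
doorbell_b = [3.0, 4.0]
print(federated_average([doorbell_a, doorbell_b], [100, 300]))  # [2.5, 3.5]
```

Only model parameters leave each device in this scheme, which is the property that motivates federated learning for privacy-sensitive video.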

arXiv:2010.07680

Demonstration of a Cloud-based Software Framework for Video Analytics Application using Low-Cost IoT Devices

Authors: Bhavin Joshi, Tapan Pathak, Vatsal Patel, Sarth Kanani, Pankesh Patel, Muhammad Intizar Ali, John Breslin

Abstract: The design of products and services such as a Smart doorbell, demonstrating video analytics software/algorithm functionality, is expected to address new kinds of requirements, such as designing a scalable solution while considering the trade-off between cost and accuracy; a flexible architecture to deploy new AI-based models or update existing models as user requirements evolve; and seamless integration of different kinds of user interfaces and devices. To address these challenges, we propose a smart doorbell that orchestrates video analytics across Edge and Cloud resources. The proposal uses AWS as a base platform for implementation and leverages affordable Commercially Available Off-The-Shelf (COTS) devices such as Raspberry Pi in the form of an Edge device.

Submitted 29 September, 2020; originally announced October 2020.

Comments: arXiv admin note: substantial text overlap with arXiv:2009.09065

arXiv:2009.09065

A Distributed Framework to Orchestrate Video Analytics Applications

Authors: Tapan Pathak, Vatsal Patel, Sarth Kanani, Shailesh Arya, Pankesh Patel, Muhammad Intizar Ali, John Breslin

Abstract: The concept of the Internet of Things (IoT) is a reality now. This paradigm shift has caught everyone's attention in a large class of applications, including IoT-based video analytics using smart doorbells. Due to its growing application segments, various efforts exist in the scientific literature and many video-based doorbell solutions are commercially available in the market. However, contemporary offerings have two shortcomings. First, they are bespoke, offering limited composability and reusability of a smart doorbell framework. Second, they are monolithic and proprietary, which means that the implementation details remain hidden from the users. We believe that a transparent design can greatly aid in the development of a smart doorbell, enabling its use in multiple application domains. To address the above-mentioned challenges, we propose a distributed framework to orchestrate video analytics across Edge and Cloud resources. We investigate trade-offs in the distribution of different software components over a bespoke/full system, where components over Edge and Cloud are treated generically. This paper evaluates the proposed framework as well as the state-of-the-art models and presents a comparative analysis of them on various metrics (such as overall model accuracy, latency, memory, and CPU usage). The evaluation results support our intuition, showing that the AWS-based approach exhibits reasonably high object-detection accuracy and low memory and CPU usage compared to the state-of-the-art approaches, but high latency.

Submitted 17 September, 2020; originally announced September 2020.

Comments: 9

arXiv:2007.08652

doi 10.5121/csit.2020.100906

Prediction of Cancer Microarray and DNA Methylation Data using Non-negative Matrix Factorization

Authors: Parth Patel, Kalpdrum Passi, Chakresh Kumar Jain

Abstract: Over the past few years, microarray technology has spread considerably across many biological applications, particularly those pertaining to cancers such as leukemia, prostate cancer, and colon cancer. The primary bottleneck in properly understanding such datasets is their dimensionality; studying them efficiently and effectively therefore requires a substantial reduction in their dimension. This study proposes different algorithms and approaches for reducing the dimensionality of such microarray datasets. It exploits the matrix-like structure of microarray data and uses a popular technique, Non-Negative Matrix Factorization (NMF), to reduce dimensionality, primarily in the field of biological data. The classification accuracies of these algorithms are then compared. This technique gives an accuracy of 98%.
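To make the NMF-based reduction concrete, here is a minimal sketch using the standard Lee-Seung multiplicative updates for the squared Frobenius loss. The abstract does not say which NMF solver, rank, or preprocessing the authors used, so the function names, the rank `k=2`, the iteration count, and the tiny "expression" matrix below are all illustrative assumptions, not the paper's method or data. Rows are treated as samples and columns as genes, so each row of `W` is a `k`-dimensional feature vector that a downstream classifier could consume.

```python
import random


def matmul(A, B):
    """Multiply matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def reconstruction_error(V, W, H):
    """Squared Frobenius norm ||V - WH||^2."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for vr, xr in zip(V, WH) for v, x in zip(vr, xr))


def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Factor non-negative V (n x m) into W (n x k) and H (k x m)
    with Lee-Seung multiplicative updates (illustrative, not tuned)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    # Strictly positive random initialisation keeps every factor non-negative.
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(hr, ar, br)]
             for hr, ar, br in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(wr, ar, br)]
             for wr, ar, br in zip(W, num, den)]
    return W, H


# Made-up toy matrix: 4 samples (rows) x 4 genes (columns).
V = [[1.0, 2.0, 0.0, 0.5],
     [2.0, 4.0, 0.1, 1.0],
     [0.0, 0.2, 3.0, 1.5],
     [0.1, 0.0, 6.0, 3.0]]
W, H = nmf(V, k=2)
features = W  # each row: a 2-D, non-negative representation of one sample
print(reconstruction_error(V, W, H))
```

For real microarray data, a library implementation such as scikit-learn's `NMF` would be the more practical choice; the hand-written updates are here only to show what the factorization computes.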

Submitted 15 July, 2020; originally announced July 2020.

Comments: 9th International Conference on Data Mining & Knowledge Management Process (CDKP 2020)

ACM Class: I.2.7; J.3

arXiv:2006.15314 [pdf, other]

A Blockchain-based Approach for Assessing Compliance with SLA-guaranteed IoT Services

Authors: A. Alzubaidi, K. Mitra, P. Patel, E. Solaiman

Abstract: Within cloud-based Internet of Things (IoT) applications, cloud providers typically employ Service Level Agreements (SLAs) to ensure the quality of their provisioned services. Like any other contractual method, an SLA is not immune to breaches. Ideally, an SLA stipulates consequences (e.g., penalties) imposed on cloud providers when they fail to conform to SLA terms. Current practice assumes that service providers can be trusted to acknowledge SLA breach incidents and to execute the associated consequences. Recently, the blockchain paradigm has introduced compelling capabilities that may enable SLA enforcement to be addressed more elegantly. This paper proposes and implements a blockchain-based approach for assessing SLA compliance and enforcing consequences. It employs a diagnostic accuracy method for validating the dependability of the proposed solution. The paper also benchmarks Hyperledger Fabric to investigate its feasibility as an underlying blockchain infrastructure with respect to latency and transaction success/failure rates.
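The compliance-assessment and validation steps can be illustrated with a small off-chain sketch. It is plain Python, not Hyperledger Fabric chaincode, and it is not the authors' implementation: the `SLATerm` structure, the metric names, thresholds, per-breach penalties, and the "ground truth" labels are hypothetical values chosen for illustration. It flags each monitoring window as breached or not, totals the penalties, and then scores the flags against independent labels using sensitivity, specificity, and accuracy, which is one common form of diagnostic accuracy evaluation.

```python
from dataclasses import dataclass


@dataclass
class SLATerm:
    metric: str            # hypothetical metric name, e.g. "latency_ms"
    threshold: float       # agreed limit
    higher_is_worse: bool  # True for latency-like metrics, False for availability-like ones
    penalty: float         # hypothetical penalty per breached measurement


def is_breach(term, value):
    """True if a single measured value violates the term."""
    if term.higher_is_worse:
        return value > term.threshold
    return value < term.threshold


def assess_compliance(terms, measurements):
    """Flag each measurement window as breached or not and total the penalties."""
    flags, total_penalty = [], 0.0
    for window in measurements:
        breached = False
        for term in terms:
            if is_breach(term, window[term.metric]):
                breached = True
                total_penalty += term.penalty
        flags.append(breached)
    return flags, total_penalty


def diagnostic_accuracy(predicted, actual):
    """Sensitivity, specificity and accuracy of predicted breach flags vs. ground truth."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        "accuracy": (tp + tn) / len(pairs) if pairs else float("nan"),
    }


terms = [SLATerm("latency_ms", 200.0, True, 10.0),
         SLATerm("availability_pct", 99.0, False, 20.0)]
windows = [{"latency_ms": 150, "availability_pct": 99.5},
           {"latency_ms": 250, "availability_pct": 99.5},
           {"latency_ms": 150, "availability_pct": 98.0},
           {"latency_ms": 300, "availability_pct": 97.0}]
flags, penalty = assess_compliance(terms, windows)
ground_truth = [False, True, False, True]  # made-up labels, e.g. from an independent audit
metrics = diagnostic_accuracy(flags, ground_truth)
print(flags, penalty, metrics)
```

Here the third window is flagged but labelled as compliant, so it counts as a false positive and lowers specificity; in a blockchain deployment the breach-and-penalty logic would live in a smart contract, which this sketch does not attempt to model.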

Submitted 27 June, 2020; originally announced June 2020.

arXiv:2005.09748 [pdf, other]

The Virtual Block Interface: A Flexible Alternative to the Conventional Virtual Memory Framework

Authors: Nastaran Hajinazar, Pratyush Patel, Minesh Patel, Konstantinos Kanellopoulos, Saugata Ghose, Rachata Ausavarungnirun, Geraldo Francisco de Oliveira Jr., Jonathan Appavoo, Vivek Seshadri, Onur Mutlu

Abstract: Computers continue to diversify with respect to system designs, emerging memory technologies, and application memory demands. Unfortunately, continually adapting the conventional virtual memory framework to each possible system configuration is challenging, and often results in performance loss or requires non-trivial workarounds. To address these challenges, we propose a new virtual memory framework, the Virtual Block Interface (VBI). We design VBI based on the key idea that delegating memory management duties to hardware can reduce the overheads and software complexity associated with virtual memory. VBI introduces a set of variable-sized virtual blocks (VBs) to applications. Each VB is a contiguous region of the globally-visible VBI address space, and an application can allocate each semantically meaningful unit of information (e.g., a data structure) in a separate VB. VBI decouples access protection from memory allocation and address translation. While the OS controls which programs have access to which VBs, dedicated hardware in the memory controller manages the physical memory allocation and address translation of the VBs. This approach enables several architectural optimizations to (1) efficiently and flexibly cater to different and increasingly diverse system configurations, and (2) eliminate key inefficiencies of conventional virtual memory.
We demonstrate the benefits of VBI with two important use cases: (1) reducing the overheads of address translation (for both native execution and virtual machine environments), as VBI reduces the number of translation requests and associated memory accesses; and (2) two heterogeneous main memory architectures, where VBI increases the effectiveness of managing fast memory regions. For both cases, VBI significantly improves performance over conventional virtual memory.
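The division of labour described above (the OS decides who may touch which VB; the memory controller alone allocates physical frames and translates addresses) can be pictured with a purely software toy model. This is not the VBI hardware design: the class names, the 4 KiB frame size, the single-level per-VB page dictionary, and the lazy first-touch allocation policy are all assumptions made for illustration.

```python
class MemoryController:
    """Toy model of the memory-controller side: owns physical allocation
    and address translation for every virtual block (VB)."""

    def __init__(self, num_frames, frame_size=4096):
        self.frame_size = frame_size
        self.free_frames = list(range(num_frames))
        self.vb_size = {}     # vb_id -> size in bytes
        self.page_table = {}  # vb_id -> {page index: physical frame}

    def create_vb(self, vb_id, size):
        self.vb_size[vb_id] = size
        self.page_table[vb_id] = {}

    def translate(self, vb_id, offset):
        if not 0 <= offset < self.vb_size[vb_id]:
            raise IndexError("offset outside virtual block")
        page, within = divmod(offset, self.frame_size)
        table = self.page_table[vb_id]
        if page not in table:
            # Lazily back the page with a physical frame on first touch.
            if not self.free_frames:
                raise MemoryError("out of physical frames")
            table[page] = self.free_frames.pop(0)
        return table[page] * self.frame_size + within


class OperatingSystem:
    """Toy OS side: decides only which processes may access which VBs."""

    def __init__(self, controller):
        self.controller = controller
        self.next_vb = 0
        self.acl = {}  # process name -> set of vb_ids it may access

    def allocate_vb(self, process, size):
        vb_id = self.next_vb
        self.next_vb += 1
        self.controller.create_vb(vb_id, size)
        self.acl.setdefault(process, set()).add(vb_id)
        return vb_id

    def grant(self, process, vb_id):
        self.acl.setdefault(process, set()).add(vb_id)

    def access(self, process, vb_id, offset):
        # Protection check (OS policy) is kept separate from translation (controller).
        if vb_id not in self.acl.get(process, set()):
            raise PermissionError(f"{process} may not access VB {vb_id}")
        return self.controller.translate(vb_id, offset)


mc = MemoryController(num_frames=8)
kernel = OperatingSystem(mc)
vb = kernel.allocate_vb("app", size=3 * 4096)
addr_a = kernel.access("app", vb, 5000)  # page 1 -> first free frame (0) -> 904
addr_b = kernel.access("app", vb, 10)    # page 0 -> next free frame (1) -> 4106
```

Note that the OS never sees physical frame numbers: sharing a VB with another process is just an ACL update (`grant`), and that process then reaches the same physical data through the controller's existing mapping.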

Submitted 19 May, 2020; originally announced May 2020.

arXiv:2004.06380 [pdf]

A Survey of Energy Efficient Schemes in Ad-hoc Networks

Authors: Priya P. Patel, Rutvij H. Jhaveri

Abstract: An ad hoc network is a collection of different types of nodes connected in a heterogeneous or homogeneous manner. It is also known as a self-organizing wireless network. The dynamic nature of ad hoc networks makes them attractive for many different applications. However, every coin has two sides: the same nature that makes ad hoc networks attractive also introduces several issues. Energy efficiency is a core factor that affects ad hoc networks in terms of battery life, throughput, message overhead, and transmission errors. To address energy constraints, various researchers have proposed different mechanisms. In this paper, we survey existing schemes that attempt to improve the energy efficiency of different types of ad hoc routing protocols in order to increase network lifetime. Furthermore, we outline the future scope of these schemes, which may help researchers carry out further research in this direction.

Submitted 14 April, 2020; originally announced April 2020.

arXiv:1910.07724 [pdf, other]

doi 10.1109/ICAPR.2017.8593079

Collaborative Filtering with Label Consistent Restricted Boltzmann Machine

Authors: Sagar Verma, Prince Patel, Angshul Majumdar

Abstract: The possibility of employing a restricted Boltzmann machine (RBM) for collaborative filtering has been known for about a decade. However, there has been hardly any work on this topic since 2007. This work revisits the application of RBMs in recommender systems. RBM-based collaborative filtering has used only rating information, making it an unsupervised architecture. This work adds supervision by exploiting user demographic information and item metadata: a network is learned from the representation layer to the labels (metadata). The proposed label-consistent RBM formulation improves significantly on the existing RBM-based approach and yields results on par with state-of-the-art latent-factor-based models.

Submitted 17 October, 2019; originally announced October 2019.

Comments: 6 pages, ICAPR 2017, Code: https://github.com/sagarverma/LC-CFRBM

arXiv:1908.05849 [pdf]

AGDC: Automatic Garbage Detection and Collection

Authors: Siddhant Bansal, Seema Patel, Ishita Shah, Prof. Alpesh Patel, Prof. Jagruti Makwana, Dr. Rajesh Thakker

Abstract: Waste management is a significant problem throughout the world. Contemporary methods struggle to manage the volume of solid waste generated by the growing urban population. In this paper, we propose a hygienic, low-cost system that uses artificial intelligence algorithms to detect garbage. Once garbage is detected, the system calculates its position using only the camera. The proposed system can distinguish between valuables and garbage with more than 95% confidence in real time. Finally, a microcontroller-controlled robotic arm picks up the garbage and places it in the bin. In conclusion, the paper presents a system capable of inspecting and collecting garbage as a human would. The system achieves 3-4 frames per second on the Raspberry Pi and detects garbage in real time with 90%+ confidence.

Submitted 16 August, 2019; originally announced August 2019.

Showing 1–50 of 69 results for author: Patel, P