TMA-Grid: An open-source, zero-footprint web application for FAIR Tissue MicroArray De-arraying

Authors: Aaron Ge, Monjoy Saha, Maire A. Duggan, Petra Lenz, Mustapha Abubakar, Montserrat García-Closas, Jeya Balasubramanian, Jonas S. Almeida, Praphulla MS Bhawsar

Abstract: Background: Tissue Microarrays (TMAs) significantly increase analytical efficiency in histopathology and large-scale epidemiologic studies by allowing multiple tissue cores to be scanned on a single slide. The individual cores can be digitally extracted and then linked to metadata for analysis in a process known as de-arraying. However, TMAs often contain core misalignments and artifacts due to assembly errors, which can adversely affect the reliability of the extracted cores during the de-arraying process. Moreover, conventional approaches for TMA de-arraying rely on desktop solutions. Therefore, a robust yet flexible de-arraying method is crucial to account for these inaccuracies and ensure effective downstream analyses. Results: We developed TMA-Grid, an in-browser, zero-footprint, interactive web application for TMA de-arraying. This web application integrates a convolutional neural network for precise tissue segmentation and a grid estimation algorithm to match each identified core to its expected location. The application emphasizes interactivity, allowing users to easily adjust segmentation and gridding results. Operating entirely in the web browser, TMA-Grid eliminates the need for downloads or installations and ensures data privacy. Adhering to FAIR principles (Findable, Accessible, Interoperable, and Reusable), the application and its components are designed for seamless integration into TMA research workflows. Conclusions: TMA-Grid provides a robust, user-friendly solution for TMA de-arraying on the web. As an open, freely accessible platform, it lays the foundation for collaborative analyses of TMAs and similar histopathology imaging data. Availability: Web application: https://episphere.github.io/tma-grid Code: https://github.com/episphere/tma-grid Tutorial: https://youtu.be/miajqyw4BVk

Submitted 30 July, 2024; originally announced July 2024.

Comments: NA

arXiv:2407.09355 [pdf]

FastImpute: A Baseline for Open-source, Reference-Free Genotype Imputation Methods -- A Case Study in PRS313

Authors: Aaron Ge, Jeya Balasubramanian, Xueyao Wu, Peter Kraft, Jonas S. Almeida

Abstract: Genotype imputation enhances genetic data by predicting missing SNPs using reference haplotype information. Traditional methods leverage linkage disequilibrium (LD) to infer untyped SNP genotypes, relying on the similarity of LD structures between genotyped target sets and fully sequenced reference panels. Recently, reference-free deep learning-based methods have emerged, offering a promising alternative by predicting missing genotypes without external databases, thereby enhancing privacy and accessibility. However, these methods often produce models with tens of millions of parameters, leading to challenges such as the need for substantial computational resources to train and inefficiency for client-side deployment. Our study addresses these limitations by introducing a baseline for a novel genotype imputation pipeline that supports client-side imputation models generalizable across any genotyping chip and genomic region. This approach enhances patient privacy by performing imputation directly on edge devices. As a case study, we focus on PRS313, a polygenic risk score comprising 313 SNPs used for breast cancer risk prediction. Utilizing consumer genetic panels such as 23andMe, our model democratizes access to personalized genetic insights by allowing 23andMe users to obtain their PRS313 score. We demonstrate that simple linear regression can significantly improve the accuracy of PRS313 scores when calculated using SNPs imputed from consumer gene panels, such as 23andMe. Our linear regression model achieved an R^2 of 0.86, compared to 0.33 without imputation and 0.28 with simple imputation (substituting missing SNPs with the minor allele frequency). These findings suggest that popular SNP analysis libraries could benefit from integrating linear regression models for genotype imputation, providing a viable and lightweight alternative to reference-based imputation.

Submitted 12 July, 2024; originally announced July 2024.

Comments: This paper is 16 pages long and contains 7 figures. For more information and to access related resources: * Web application: https://aaronge-2020.github.io/DeepImpute/ * Code repository: https://github.com/aaronge-2020/DeepImpute

arXiv:2401.01818 [pdf, other]

SENS3: Multisensory Database of Finger-Surface Interactions and Corresponding Sensations

Authors: Jagan K. Balasubramanian, Bence L. Kodak, Yasemin Vardar

Abstract: The growing demand for natural interactions with technology underscores the importance of achieving realistic touch sensations in digital environments. Realizing this goal highly depends on comprehensive databases of finger-surface interactions, which need further development. Here, we present SENS3 -- www.sens3.net -- an extensive open-access repository of multisensory data acquired from fifty surfaces when two participants explored them with their fingertips through static contact, pressing, tapping, and sliding. SENS3 encompasses high-fidelity visual, audio, and haptic information recorded during these interactions, including videos, sounds, contact forces, torques, positions, accelerations, skin temperature, heat flux, and surface photographs. Additionally, it incorporates thirteen participants' psychophysical sensation ratings (rough-smooth, flat-bumpy, sticky-slippery, hot-cold, regular-irregular, fine-coarse, hard-soft, and wet-dry) while exploring these surfaces freely. Designed with an open-ended framework, SENS3 has the potential to be expanded with additional textures and participants. We anticipate that SENS3 will be valuable for advancing multisensory texture rendering, user experience development, and touch sensing in robotics.

Submitted 1 July, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

Comments: 15 pages, 3 tables, 3 figures, conference

arXiv:2310.09252 [pdf]

Wasm-iCARE: a portable and privacy-preserving web module to build, validate, and apply absolute risk models

Authors: Jeya Balaji Balasubramanian, Parichoy Pal Choudhury, Srijon Mukhopadhyay, Thomas Ahearn, Nilanjan Chatterjee, Montserrat García-Closas, Jonas S. Almeida

Abstract: Objective: Absolute risk models estimate an individual's future disease risk over a specified time interval. Applications utilizing server-side risk tooling, such as the R-based iCARE (R-iCARE), to build, validate, and apply absolute risk models face serious limitations in portability and privacy because they must circulate user data through remote servers to operate. Our objective was to overcome these limitations. Materials and Methods: We refactored R-iCARE into a Python package (Py-iCARE), then compiled it to WebAssembly (Wasm-iCARE): a portable web module that operates entirely within the privacy of the user's device. Results: We showcase the portability and privacy of Wasm-iCARE through two applications: one for researchers to statistically validate risk models, and one to deliver them to end-users. Both applications run entirely on the client side, require no downloads or installations, and keep user data on-device during risk calculation. Conclusions: Wasm-iCARE fosters accessible and privacy-preserving risk tools, accelerating their validation and delivery.

Submitted 13 October, 2023; originally announced October 2023.

Comments: 10 pages, 2 figures

arXiv:2308.02995 [pdf]

mSigSDK -- private, at scale, computation of mutation signatures

Authors: Aaron Ge, Yasmmin Côrtes Martins, Tongwu Zhang, Kailing Chen, Maria Teresa Landi, Brian Park, Jeya Balasubramanian, Jonas S Almeida

Abstract: In our previous work, we demonstrated that it is feasible to perform analysis on mutation signature data without the need for downloads or installations and analyze individual patient data at scale without compromising privacy. Building on this foundation, we developed a Software Development Kit (SDK) called mSigSDK to facilitate the orchestration of distributed data processing workflows and graphic visualization of mutational signature analysis results. We strictly adhered to modern web computing standards, particularly the modularization standards set by the ECMAScript ES6 framework (JavaScript modules). Our approach allows for computation to be entirely performed by secure delegation to the computational resources of the user's own machine (in-browser), without any downloads or installations. The mSigSDK was developed primarily as a companion library to the mSig Portal resource of the National Cancer Institute Division of Cancer Epidemiology and Genetics (NIH/NCI/DCEG), with a focus on its FAIR extensibility as components of other researchers' computational constructs. Anticipated extensions include the programmatic operation of other mutation signature API ecosystems such as SIGNAL and COSMIC, advancing towards a data commons for mutational signature research (Grossman et al., 2016).

Submitted 19 January, 2024; v1 submitted 5 August, 2023; originally announced August 2023.

arXiv:2306.01634 [pdf]

A FAIR platform for reproducing mutational signature detection on tumor sequencing data

Authors: Aaron Ge, Tongwu Zhang, Clara Bodelon, Montserrat Garcia-Closas, Jonas Almeida, Jeya Balasubramanian

Abstract: This paper presents a portable, privacy-preserving, in-browser platform for the reproducible assessment of mutational signature detection methods from sparse sequencing data generated by targeted gene panels. The platform aims to address the reproducibility challenges in mutational signature research by adhering to the FAIR principles, making it findable, accessible, interoperable, and reusable. Our approach focuses on the detection of specific mutational signatures, such as SBS3, which have been linked to specific mutagenic processes. The platform relies on publicly available data, simulation, downsampling techniques, and machine learning algorithms to generate training data and labels and to train and evaluate models. The key achievement of our platform is its transparency, reusability, and privacy preservation, enabling researchers and clinicians to analyze mutational signatures with the guarantee that no data circulates outside the client machine.

Submitted 2 June, 2023; originally announced June 2023.

Comments: Our proposed in-browser platform is publicly available under the MIT license at https://aaronge-2020.github.io/Sig3-Detection/. No data leaves this privacy-preserving environment, which can be cloned or forked and served from other domains with no restrictions. All the code and relevant data used to create this platform can be found at https://github.com/aaronge-2020/Sig3-Detection

arXiv:2206.06159 [pdf]

Moving towards FAIR practices in epidemiological research

Authors: Montserrat Garcia-Closas, Thomas U. Ahearn, Mia M. Gaudet, Amber N. Hurson, Jeya Balaji Balasubramanian, Parichoy Pal Choudhury, Nicole M. Gerlanc, Bhaumik Patel, Daniel Russ, Mustapha Abubakar, Neal D. Freedman, Wendy S. W. Wong, Stephen J. Chanock, Amy Berrington de Gonzalez, Jonas S Almeida

Abstract: Reproducibility and replicability of research findings are central to the scientific integrity of epidemiology. In addition, many research questions require combining data from multiple sources to achieve adequate statistical power. However, barriers related to confidentiality, costs, and incentives often limit the extent and speed of sharing resources, both data and code. Epidemiological practices that follow FAIR principles can address these barriers by making resources (F)indable with the necessary metadata, (A)ccessible to authorized users, and (I)nteroperable with other data, to optimize the (R)e-use of resources with appropriate credit to their creators. We provide an overview of these principles and describe approaches for implementation in epidemiology. Increasing degrees of FAIRness can be achieved by moving data and code from on-site locations to the Cloud, using machine-readable and non-proprietary files, and developing open-source code. Adoption of these practices will improve daily work and collaborative analyses, and facilitate compliance with data sharing policies from funders and scientific journals. Achieving a high degree of FAIRness will require funding, training, organizational support, recognition, and incentives for sharing resources. But these costs are amply outweighed by the benefits of making research more reproducible, impactful, and equitable by facilitating the re-use of precious research resources by the scientific community.

Submitted 13 June, 2022; originally announced June 2022.

arXiv:1705.05765 [pdf, other]

Online Article Ranking as a Constrained, Dynamic, Multi-Objective Optimization Problem

Authors: Jeya Balaji Balasubramanian, Akshay Soni, Yashar Mehdad, Nikolay Laptev

Abstract: The content ranking problem in a social news website is typically a function that maximizes a scalar metric of interest like dwell-time. However, like in most real-world applications, we are interested in more than one metric (for instance, simultaneously maximizing click-through rate, monetization metrics, and dwell-time) while also satisfying the traffic requirements promised to different publishers. All this needs to be done on online data and under settings where the objective function and the constraints can dynamically change; this could happen if, for instance, new publishers are added, some contracts are adjusted, or some contracts are over. In this paper, we formulate this problem as a constrained, dynamic, multi-objective optimization problem. We propose a novel framework that extends a successful genetic optimization algorithm, NSGA-II, to solve this online, data-driven problem. We design the modules of NSGA-II to suit our problem. We evaluate optimization performance using Hypervolume and introduce a confidence interval metric for assessing the practicality of a solution. We demonstrate the application of this framework on a real-world Article Ranking problem. We observe that we make considerable improvements in both time and performance over a brute-force baseline technique that is currently in production.

Submitted 16 May, 2017; originally announced May 2017.

Comments: 7 pages

Showing 1–8 of 8 results for author: Balasubramanian, J