Search | arXiv e-print repository

LRM-Zero: Training Large Reconstruction Models with Synthesized Data

Authors: Desai Xie, Sai Bi, Zhixin Shu, Kai Zhang, Zexiang Xu, Yi Zhou, Sören Pirk, Arie Kaufman, Xin Sun, Hao Tan

Abstract: We present LRM-Zero, a Large Reconstruction Model (LRM) trained entirely on synthesized 3D data, achieving high-quality sparse-view 3D reconstruction. The core of LRM-Zero is our procedural 3D dataset, Zeroverse, which is automatically synthesized from simple primitive shapes with random texturing and augmentations (e.g., height fields, boolean differences, and wireframes). Unlike previous 3D datasets (e.g., Objaverse) which are often captured or crafted by humans to approximate real 3D data, Zeroverse completely ignores realistic global semantics but is rich in complex geometric and texture details that are locally similar to or even more intricate than real objects. We demonstrate that our LRM-Zero, trained with our fully synthesized Zeroverse, can achieve high visual quality in the reconstruction of real-world objects, competitive with models trained on Objaverse. We also analyze several critical design choices of Zeroverse that contribute to LRM-Zero's capability and training stability. Our work demonstrates that 3D reconstruction, one of the core tasks in 3D vision, can potentially be addressed without the semantics of real-world objects. The Zeroverse's procedural synthesis code and interactive visualization are available at: https://desaixie.github.io/lrm-zero/.

Submitted 13 June, 2024; originally announced June 2024.

Comments: 23 pages, 8 figures. Our code and interactive visualization are available at: https://desaixie.github.io/lrm-zero/

arXiv:2312.13980 [pdf, other]

Carve3D: Improving Multi-view Reconstruction Consistency for Diffusion Models with RL Finetuning

Authors: Desai Xie, Jiahao Li, Hao Tan, Xin Sun, Zhixin Shu, Yi Zhou, Sai Bi, Sören Pirk, Arie E. Kaufman

Abstract: Multi-view diffusion models, obtained by applying Supervised Finetuning (SFT) to text-to-image diffusion models, have driven recent breakthroughs in text-to-3D research. However, due to the limited size and quality of existing 3D datasets, they still suffer from multi-view inconsistencies and Neural Radiance Field (NeRF) reconstruction artifacts. We argue that multi-view diffusion models can benefit from further Reinforcement Learning Finetuning (RLFT), which allows models to learn from the data generated by themselves and improve beyond their dataset limitations during SFT. To this end, we introduce Carve3D, an improved RLFT algorithm coupled with a novel Multi-view Reconstruction Consistency (MRC) metric, to enhance the consistency of multi-view diffusion models. To measure the MRC metric on a set of multi-view images, we compare them with their corresponding NeRF renderings at the same camera viewpoints. The resulting model, which we denote as Carve3DM, demonstrates superior multi-view consistency and NeRF reconstruction quality than existing models. Our results suggest that pairing SFT with Carve3D's RLFT is essential for developing multi-view-consistent diffusion models, mirroring the standard Large Language Model (LLM) alignment pipeline. Our code, training and testing data, and video results are available at: https://desaixie.github.io/carve-3d.

Submitted 9 April, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

Comments: 22 pages, 16 figures. Our code, training and testing data, and video results are available at: https://desaixie.github.io/carve-3d. This paper has been accepted to CVPR 2024. v2: incorporated changes from the CVPR 2024 camera-ready version

arXiv:2310.00868 [pdf, other]

RT-GAN: Recurrent Temporal GAN for Adding Lightweight Temporal Consistency to Frame-Based Domain Translation Approaches

Authors: Shawn Mathew, Saad Nadeem, Alvin C. Goh, Arie Kaufman

Abstract: While developing new unsupervised domain translation methods for endoscopy videos, it is typical to start with approaches that initially work for individual frames without temporal consistency. Once an individual-frame model has been finalized, additional contiguous frames are added with a modified deep learning architecture to train a new model for temporal consistency. This transition to temporally-consistent deep learning models, however, requires significantly more computational and memory resources for training. In this paper, we present a lightweight solution with a tunable temporal parameter, RT-GAN (Recurrent Temporal GAN), for adding temporal consistency to individual frame-based approaches that reduces training requirements by a factor of 5. We demonstrate the effectiveness of our approach on two challenging use cases in colonoscopy: haustral fold segmentation (indicative of missed surface) and realistic colonoscopy simulator video generation. The datasets, accompanying code, and pretrained models will be made available at https://github.com/nadeemlab/CEP.

Submitted 1 October, 2023; originally announced October 2023.

Comments: First two authors contributed equally

arXiv:2307.03687 [pdf, other]

Leveraging text data for causal inference using electronic health records

Authors: Reagan Mozer, Aaron R. Kaufman, Leo A. Celi, Luke Miratrix

Abstract: In studies that rely on data from electronic health records (EHRs), unstructured text data such as clinical progress notes offer a rich source of information about patient characteristics and care that may be missing from structured data. Despite the prevalence of text in clinical research, these data are often ignored for the purposes of quantitative analysis due to their complexity. This paper presents a unified framework for leveraging text data to support causal inference with electronic health data at multiple stages of analysis. In particular, we consider how natural language processing and statistical text analysis can be combined with standard inferential techniques to address common challenges due to missing data, confounding bias, and treatment effect heterogeneity. Through an application to a recent EHR study investigating the effects of a non-randomized medical intervention on patient outcomes, we show how incorporating text data in a traditional matching analysis can help strengthen the validity of an estimated treatment effect and identify patient subgroups that may benefit most from treatment. We believe these methods have the potential to expand the scope of secondary analysis of clinical data to domains where structured EHR data is limited, such as in developing countries. To this end, we provide code and open-source replication materials to encourage adoption and broader exploration of these techniques in clinical research.

Submitted 20 May, 2024; v1 submitted 9 June, 2023; originally announced July 2023.

arXiv:2306.04602 [pdf, other]

Prefix Siphoning: Exploiting LSM-Tree Range Filters For Information Disclosure (Full Version)

Authors: Adi Kaufman, Moshik Hershcovitch, Adam Morrison

Abstract: Key-value stores typically leave access control to the systems for which they act as storage engines. Unfortunately, attackers may circumvent such read access controls via timing attacks on the key-value store, which use differences in query response times to glean information about stored data. To date, key-value store timing attacks have aimed to disclose stored values and have exploited external mechanisms that can be disabled for protection. In this paper, we point out that key disclosure is also a security threat -- and demonstrate key disclosure timing attacks that exploit mechanisms of the key-value store itself. We target LSM-tree based key-value stores utilizing range filters, which have been recently proposed to optimize LSM-tree range queries. We analyze the impact of the range filters SuRF and prefix Bloom filter on LSM-trees through a security lens, and show that they enable a key disclosure timing attack, which we call prefix siphoning. Prefix siphoning successfully leverages benign queries for non-present keys to identify prefixes of actual keys -- and in some cases, full keys -- in scenarios where brute force searching for keys (via exhaustive enumeration or random guesses) is infeasible.

Submitted 8 September, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

Comments: Full version of USENIX ATC'23 paper

arXiv:2305.13934 [pdf, other]

doi 10.1038/s41598-023-38964-3

Perception, performance, and detectability of conversational artificial intelligence across 32 university courses

Authors: Hazem Ibrahim, Fengyuan Liu, Rohail Asim, Balaraju Battu, Sidahmed Benabderrahmane, Bashar Alhafni, Wifag Adnan, Tuka Alhanai, Bedoor AlShebli, Riyadh Baghdadi, Jocelyn J. Bélanger, Elena Beretta, Kemal Celik, Moumena Chaqfeh, Mohammed F. Daqaq, Zaynab El Bernoussi, Daryl Fougnie, Borja Garcia de Soto, Alberto Gandolfi, Andras Gyorgy, Nizar Habash, J. Andrew Harris, Aaron Kaufman, Lefteris Kirousis, Korhan Kocak, et al. (14 additional authors not shown)

Abstract: The emergence of large language models has led to the development of powerful tools such as ChatGPT that can produce text indistinguishable from human-generated work. With the increasing accessibility of such technology, students across the globe may utilize it to help with their school work -- a possibility that has sparked discussions on the integrity of student evaluations in the age of artificial intelligence (AI). To date, it is unclear how such tools perform compared to students on university-level courses. Further, students' perspectives regarding the use of such tools, and educators' perspectives on treating their use as plagiarism, remain unknown. Here, we compare the performance of ChatGPT against students on 32 university-level courses. We also assess the degree to which its use can be detected by two classifiers designed specifically for this purpose. Additionally, we conduct a survey across five countries, as well as a more in-depth survey at the authors' institution, to discern students' and educators' perceptions of ChatGPT's use. We find that ChatGPT's performance is comparable, if not superior, to that of students in many courses. Moreover, current AI-text classifiers cannot reliably detect ChatGPT's use in school work, due to their propensity to classify human-written answers as AI-generated, as well as the ease with which AI-generated text can be edited to evade detection. Finally, we find an emerging consensus among students to use the tool, and among educators to treat this as plagiarism. Our findings offer insights that could guide policy discussions addressing the integration of AI into educational frameworks.

Submitted 7 May, 2023; originally announced May 2023.

Comments: 17 pages, 4 figures

arXiv:2304.06872 [pdf, other]

doi 10.1109/TVCG.2023.3332511

Submerse: Visualizing Storm Surge Flooding Simulations in Immersive Display Ecologies

Authors: Saeed Boorboor, Yoonsang Kim, Ping Hu, Josef M. Moses, Brian A. Colle, Arie E. Kaufman

Abstract: We present Submerse, an end-to-end framework for visualizing flooding scenarios on large and immersive display ecologies. Specifically, we reconstruct a surface mesh from input flood simulation data and generate a to-scale 3D virtual scene by incorporating geographical data such as terrain, textures, buildings, and additional scene objects. To optimize computation and memory performance for large simulation datasets, we discretize the data on an adaptive grid using dynamic quadtrees and support level-of-detail based rendering. Moreover, to provide a perception of flooding direction for a time instance, we animate the surface mesh by synthesizing water waves. As interaction is key for effective decision-making and analysis, we introduce two novel techniques for flood visualization in immersive systems: (1) an automatic scene-navigation method using optimal camera viewpoints generated for marked points-of-interest based on the display layout, and (2) an AR-based focus+context technique using an auxiliary display system. Submerse is developed in collaboration between computer scientists and atmospheric scientists. We evaluate the effectiveness of our system and application by conducting workshops with emergency managers, domain experts, and concerned stakeholders in the Stony Brook Reality Deck, an immersive gigapixel facility, to visualize a superstorm flooding scenario in New York City.

Submitted 13 April, 2023; originally announced April 2023.

arXiv:2212.12141 [pdf, other]

Human Activity Recognition in an Open World

Authors: Derek S. Prijatelj, Samuel Grieggs, Jin Huang, Dawei Du, Ameya Shringi, Christopher Funk, Adam Kaufman, Eric Robertson, Walter J. Scheirer

Abstract: Managing novelty in perception-based human activity recognition (HAR) is critical in realistic settings to improve task performance over time and ensure solution generalization outside of prior seen samples. Novelty manifests in HAR as unseen samples, activities, objects, environments, and sensor changes, among other ways. Novelty may be task-relevant, such as a new class or new features, or task-irrelevant resulting in nuisance novelty, such as never before seen noise, blur, or distorted video recordings. To perform HAR optimally, algorithmic solutions must be tolerant to nuisance novelty, and learn over time in the face of novelty. This paper 1) formalizes the definition of novelty in HAR building upon the prior definition of novelty in classification tasks, 2) proposes an incremental open world learning (OWL) protocol and applies it to the Kinetics datasets to generate a new benchmark KOWL-718, 3) analyzes the performance of current state-of-the-art HAR models when novelty is introduced over time, 4) provides a containerized and packaged pipeline for reproducing the OWL protocol and for modifying for any future updates to Kinetics. The experimental analysis includes an ablation study of how the different models perform under various conditions as annotated by Kinetics-AVA. The protocol as an algorithm for reproducing experiments using the KOWL-718 benchmark will be publicly released with code and containers at https://github.com/prijatelj/human-activity-recognition-in-an-open-world. The code may be used to analyze different annotations and subsets of the Kinetics datasets in an incremental open world fashion, as well as be extended as further updates to Kinetics are released.

Submitted 22 December, 2022; originally announced December 2022.

Comments: 39 pages, 16 figures, 3 tables, Pre-print submitted to JAIR

ACM Class: I.5.4

arXiv:2206.14951 [pdf, other]

CLTS-GAN: Color-Lighting-Texture-Specular Reflection Augmentation for Colonoscopy

Authors: Shawn Mathew, Saad Nadeem, Arie Kaufman

Abstract: Automated analysis of optical colonoscopy (OC) video frames (to assist endoscopists during OC) is challenging due to variations in color, lighting, texture, and specular reflections. Previous methods either remove some of these variations via preprocessing (making pipelines cumbersome) or add diverse training data with annotations (but expensive and time-consuming). We present CLTS-GAN, a new deep learning model that gives fine control over color, lighting, texture, and specular reflection synthesis for OC video frames. We show that adding these colonoscopy-specific augmentations to the training data can improve state-of-the-art polyp detection/segmentation methods as well as drive next generation of OC simulators for training medical students. The code and pre-trained models for CLTS-GAN are available on Computational Endoscopy Platform GitHub (https://github.com/nadeemlab/CEP).

Submitted 29 June, 2022; originally announced June 2022.

Comments: MICCAI 2022. **First two authors contributed equally

arXiv:2202.10551 [pdf, other]

Geometry-Aware Planar Embedding of Treelike Structures

Authors: Ping Hu, Saeed Boorboor, Joseph Marino, Arie E. Kaufman

Abstract: The growing complexity of spatial and structural information in 3D data makes data inspection and visualization a challenging task. We describe a method to create a planar embedding of 3D treelike structures using their skeleton representations. Our method maintains the original geometry, without overlaps, to the best extent possible, allowing exploration of the topology within a single view. We present a novel camera view generation method which maximizes the visible geometric attributes (segment shape and relative placement between segments). Camera views are created for individual segments and are used to determine local bending angles at each node by projecting them to 2D. The final embedding is generated by minimizing an energy function (the weights of which are user adjustable) based on branch length and the 2D angles, while avoiding intersections. The user can also interactively modify segment placement within the 2D embedding, and the overall embedding will update accordingly. A global to local interactive exploration is provided using hierarchical camera views that are created for subtrees within the structure. We evaluate our method both qualitatively and quantitatively and demonstrate our results by constructing planar visualizations of line data (traced neurons) and volume data (CT vascular and bronchial data).

Submitted 21 February, 2022; originally announced February 2022.

arXiv:2202.01115 [pdf, other]

doi 10.1109/TVCG.2021.3127132

NeuRegenerate: A Framework for Visualizing Neurodegeneration

Authors: Saeed Boorboor, Shawn Mathew, Mala Ananth, David Talmage, Lorna W. Role, Arie E. Kaufman

Abstract: Recent advances in high-resolution microscopy have allowed scientists to better understand the underlying brain connectivity. However, due to the limitation that biological specimens can only be imaged at a single timepoint, studying changes to neural projections is limited to general observations using population analysis. In this paper, we introduce NeuRegenerate, a novel end-to-end framework for the prediction and visualization of changes in neural fiber morphology within a subject, for specified age-timepoints. To predict projections, we present neuReGANerator, a deep-learning network based on cycle-consistent generative adversarial network (cycleGAN) that translates features of neuronal structures in a region, across age-timepoints, for large brain microscopy volumes. We improve the reconstruction quality of neuronal structures by implementing a density multiplier and a new loss function, called the hallucination loss. Moreover, to alleviate artifacts that occur due to tiling of large input volumes, we introduce a spatial-consistency module in the training pipeline of neuReGANerator. We show that neuReGANerator has a reconstruction accuracy of 94% in predicting neuronal structures. Finally, to visualize the predicted change in projections, NeuRegenerate offers two modes: (1) neuroCompare to simultaneously visualize the difference in the structures of the neuronal projections, across the age timepoints, and (2) neuroMorph, a vesselness-based morphing technique to interactively visualize the transformation of the structures from one age-timepoint to the other. Our framework is designed specifically for volumes acquired using wide-field microscopy. We demonstrate our framework by visualizing the structural changes in neuronal fibers within the cholinergic system of the mouse brain between a young and old specimen.

Submitted 2 February, 2022; originally announced February 2022.

Comments: Accepted for publication in IEEE Transactions on Visualization and Computer Graphics

arXiv:2108.03799 [pdf, other]

COVID-view: Diagnosis of COVID-19 using Chest CT

Authors: Shreeraj Jadhav, Gaofeng Deng, Marlene Zawin, Arie E. Kaufman

Abstract: Significant work has been done towards deep learning (DL) models for automatic lung and lesion segmentation and classification of COVID-19 on chest CT data. However, comprehensive visualization systems focused on supporting the dual visual+DL diagnosis of COVID-19 are non-existent. We present COVID-view, a visualization application specially tailored for radiologists to diagnose COVID-19 from chest CT data. The system incorporates a complete pipeline of automatic lung segmentation, localization/isolation of lung abnormalities, followed by visualization, visual and DL analysis, and measurement/quantification tools. Our system combines the traditional 2D workflow of radiologists with newer 2D and 3D visualization techniques with DL support for a more comprehensive diagnosis. COVID-view incorporates a novel DL model for classifying the patients into positive/negative COVID-19 cases, which acts as a reading aid for the radiologist using COVID-view and provides the attention heatmap as an explainable DL for the model output. We designed and evaluated COVID-view through suggestions, close feedback and conducting case studies of real-world patient data by expert radiologists who have substantial experience diagnosing chest CT scans for COVID-19, pulmonary embolism, and other forms of lung infections. We present requirements and task analysis for the diagnosis of COVID-19 that motivate our design choices and result in a practical system which is capable of handling real-world patient cases.

Submitted 9 August, 2021; originally announced August 2021.

Comments: 11 pages, 10 figures, accepted to IEEE VIS 2021 conference and IEEE Transactions on Visualization and Computer Graphics

arXiv:2106.15346 [pdf, other]

Estimating Incremental Acquisition of Content Launches in a Subscription Service

Authors: Hamidreza Badri, Alex Kaufman

Abstract: Subscription services face a difficult problem when estimating the causal impact of content launches on acquisition. Customers buy subscriptions, not individual pieces of content, and once subscribed they may consume many pieces of content in addition to the one(s) that drew them to the service. In this paper, we propose a scalable methodology to estimate the incremental acquisition impact of content launches in a subscription business model when randomized experimentation is not feasible. Our approach uses simple assumptions to transform the problem into an equivalent question: what is the expected consumption rate for new subscribers who did not join due to the content launch? We estimate this counterfactual rate using the consumption rate of new subscribers who joined just prior to launch, while making adjustments for variation related to subscriber attributes, the in-product experience, and seasonality. We then compare our counterfactual consumption to the actual rate in order to back out an acquisition estimate. Our methodology provides top-line impact estimates at the content / day / region grain. Additionally, to enable subscriber-level attribution, we present an algorithm that assigns specific individual accounts to add up to the top-line estimate. Subscriber-level attribution is derived by solving an optimization problem to minimize the number of subscribers attributed to more than one piece of content, while maximizing the average propensity to be incremental for subscribers attributed to each piece of content. Finally, in the absence of definitive ground truth, we present several validation methods which can be used to assess the plausibility of impact estimates generated by these methods.

Submitted 24 June, 2021; originally announced June 2021.

MSC Class: 62D20

arXiv:2106.12522 [pdf, other]

FoldIt: Haustral Folds Detection and Segmentation in Colonoscopy Videos

Authors: Shawn Mathew, Saad Nadeem, Arie Kaufman

Abstract: Haustral folds are colon wall protrusions implicated for high polyp miss rate during optical colonoscopy procedures. If segmented accurately, haustral folds can allow for better estimation of missed surface and can also serve as valuable landmarks for registering pre-treatment virtual (CT) and optical colonoscopies, to guide navigation towards the anomalies found in pre-treatment scans. We present a novel generative adversarial network, FoldIt, for feature-consistent image translation of optical colonoscopy videos to virtual colonoscopy renderings with haustral fold overlays. A new transitive loss is introduced in order to leverage ground truth information between haustral fold annotations and virtual colonoscopy renderings. We demonstrate the effectiveness of our model on real challenging optical colonoscopy videos as well as on textured virtual colonoscopy videos with clinician-verified haustral fold annotations. All code and scripts to reproduce the experiments of this paper will be made available via our Computational Endoscopy Platform at https://github.com/nadeemlab/CEP.

Submitted 26 August, 2021; v1 submitted 23 June, 2021; originally announced June 2021.

Comments: MICCAI 2021 (Early Accept) (Oral Presentation), *Saad Nadeem and Shawn Mathew contributed equally

arXiv:2104.01304 [pdf, other]

Diarization of Legal Proceedings. Identifying and Transcribing Judicial Speech from Recorded Court Audio

Authors: Jeffrey Tumminia, Amanda Kuznecov, Sophia Tsilerides, Ilana Weinstein, Brian McFee, Michael Picheny, Aaron R. Kaufman

Abstract: United States Courts make audio recordings of oral arguments available as public record, but these recordings rarely include speaker annotations. This paper addresses the Speech Audio Diarization problem, answering the question of "Who spoke when?" in the domain of judicial oral argument proceedings. We present a workflow for diarizing the speech of judges using audio recordings of oral arguments, a process we call Reference-Dependent Speaker Verification. We utilize a speech embedding network trained with the Generalized End-to-End Loss to encode speech into d-vectors and a pre-defined reference audio library based on annotated data. We find that by encoding reference audio for speakers and full arguments and computing similarity scores we achieve a 13.8% Diarization Error Rate for speakers covered by the reference audio library on a held-out test set. We evaluate our method on the Supreme Court of the United States oral arguments, accessed through the Oyez Project, and outline future work for diarizing legal proceedings. A code repository for this research is available at github.com/JeffT13/rd-diarization

Submitted 2 April, 2021; originally announced April 2021.

Comments: Under review for InterSpeech 2021

arXiv:2101.07280 [pdf, other]

doi 10.1109/ISBI48211.2021.9433982

Visualizing Missing Surfaces In Colonoscopy Videos using Shared Latent Space Representations

Authors: Shawn Mathew, Saad Nadeem, Arie Kaufman

Abstract: Optical colonoscopy (OC), the most prevalent colon cancer screening tool, has a high miss rate due to a number of factors, including the geometry of the colon (haustral fold and sharp bends occlusions), endoscopist inexperience or fatigue, endoscope field of view, etc. We present a framework to visualize the missed regions per-frame during the colonoscopy, providing a workable clinical solution. Specifically, we make use of 3D reconstructed virtual colonoscopy (VC) data and the insight that VC and OC share the same underlying geometry but differ in color, texture and specular reflections, embedded in the OC domain. A lossy unpaired image-to-image translation model is introduced with enforced shared latent space for OC and VC. This shared latent space captures the geometric information while deferring the color, texture, and specular information creation to additional Gaussian noise input. This additional noise input can be utilized to generate one-to-many mappings from VC to OC and OC to OC. The code, data and trained models will be released via our Computational Endoscopy Platform at https://github.com/nadeemlab/CEP.

Submitted 23 June, 2021; v1 submitted 18 January, 2021; originally announced January 2021.

Comments: IEEE International Symposium on Biomedical Imaging (ISBI) 2021, **Shawn Mathew and Saad Nadeem contributed equally

arXiv:2004.02956 [pdf, other]

Deblurring using Analysis-Synthesis Networks Pair

Authors: Adam Kaufman, Raanan Fattal

Abstract: Blind image deblurring remains a challenging problem for modern artificial neural networks. Unlike other image restoration problems, deblurring networks fall behind the performance of existing deblurring algorithms in case of uniform and 3D blur models. This follows from the diverse and profound effect that the unknown blur-kernel has on the deblurring operator. We propose a new architecture which breaks the deblurring network into an analysis network which estimates the blur, and a synthesis network that uses this kernel to deblur the image. Unlike existing deblurring networks, this design allows us to explicitly incorporate the blur-kernel in the network's training. In addition, we introduce new cross-correlation layers that allow better blur estimations, as well as unique components that allow the estimated blur to control the action of the synthesis network. Evaluating the new approach over established benchmark datasets shows its ability to achieve state-of-the-art deblurring accuracy on various tests, as well as offer a major speedup in runtime.

Submitted 6 April, 2020; originally announced April 2020.

Journal ref: Computer Vision and Pattern Recognition (CVPR) 2020

arXiv:2003.12473 [pdf, other]

doi 10.1109/CVPR42600.2020.00475

Augmenting Colonoscopy using Extended and Directional CycleGAN for Lossy Image Translation

Authors: Shawn Mathew, Saad Nadeem, Sruti Kumari, Arie Kaufman

Abstract: Colorectal cancer screening modalities, such as optical colonoscopy (OC) and virtual colonoscopy (VC), are critical for diagnosing and ultimately removing polyps (precursors of colon cancer). The non-invasive VC is normally used to inspect a 3D reconstructed colon (from CT scans) for polyps and if found, the OC procedure is performed to physically traverse the colon via endoscope and remove these polyps. In this paper, we present a deep learning framework, Extended and Directional CycleGAN, for lossy unpaired image-to-image translation between OC and VC to augment OC video sequences with scale-consistent depth information from VC, and augment VC with patient-specific textures, color and specular highlights from OC (e.g., for realistic polyp synthesis). Both OC and VC contain structural information, but it is obscured in OC by additional patient-specific texture and specular highlights, hence making the translation from OC to VC lossy. The existing CycleGAN approaches do not handle lossy transformations. To address this shortcoming, we introduce an extended cycle consistency loss, which compares the geometric structures from OC in the VC domain. This loss removes the need for the CycleGAN to embed OC information in the VC domain. To handle a stronger removal of the textures and lighting, a Directional Discriminator is introduced to differentiate the direction of translation (by creating paired information for the discriminator), as opposed to the standard CycleGAN which is direction-agnostic. Combining the extended cycle consistency loss and the Directional Discriminator, we show state-of-the-art results on scale-consistent depth inference for phantom, textured VC and for real polyp and normal colon video sequences. We also present results for realistic pedunculated and flat polyp synthesis from bumps introduced in 3D VC models. Code/models: https://github.com/nadeemlab/CEP.

Submitted 26 August, 2021; v1 submitted 27 March, 2020; originally announced March 2020.

Comments: CVPR 2020. **First two authors contributed equally to this work

arXiv:1810.09031 [pdf, other]

doi 10.1109/TVCG.2016.2542073

Spherical Parameterization Balancing Angle and Area Distortions

Authors: Saad Nadeem, Zhengyu Su, Wei Zeng, Arie Kaufman, Xianfeng Gu

Abstract: This work presents a novel framework for spherical mesh parameterization. An efficient angle-preserving spherical parameterization algorithm is introduced, which is based on dynamic Yamabe flow and the conformal welding method with a solid theoretical foundation. An area-preserving spherical parameterization is also discussed, which is based on discrete optimal mass transport theory. Furthermore, a spherical parameterization algorithm, which is based on the polar decomposition method, balancing angle distortion and area distortion is presented. The algorithms are tested on 3D geometric data and the experiments demonstrate the efficiency and efficacy of the proposed methods.

Submitted 21 October, 2018; originally announced October 2018.

Comments: IEEE Transactions on Visualization and Computer Graphics, 23(6):1663-1676, 2017 (17 pages, 20 figures)

Journal ref: IEEE Trans. Vis. Comput. Graph., 23(6), pp.1663-1676, 2017

arXiv:1810.09012 [pdf, other]

doi 10.1109/VAST.2016.7883508

C2A: Crowd Consensus Analytics for Virtual Colonoscopy

Authors: Ji Hwan Park, Saad Nadeem, Seyedkoosha Mirhosseini, Arie Kaufman

Abstract: We present a medical crowdsourcing visual analytics platform called C$^2$A to visualize, classify and filter crowdsourced clinical data. More specifically, C$^2$A is used to build consensus on a clinical diagnosis by visualizing crowd responses and filtering out anomalous activity. Crowdsourcing medical applications have recently shown promise where the non-expert users (the crowd) were able to achieve accuracy similar to the medical experts. This has the potential to reduce interpretation/reading time and possibly improve accuracy by building a consensus on the findings beforehand and letting the medical experts make the final diagnosis. In this paper, we focus on a virtual colonoscopy (VC) application with the clinical technicians as our target users, and the radiologists acting as consultants and classifying segments as benign or malignant. In particular, C$^2$A is used to analyze and explore crowd responses on video segments, created from fly-throughs in the virtual colon. C$^2$A provides several interactive visualization components to build crowd consensus on video segments, to detect anomalies in the crowd data and in the VC video segments, and finally, to improve the non-expert user's work quality and performance by A/B testing for the optimal crowdsourcing platform and application-specific parameters. Case studies and domain expert feedback demonstrate the effectiveness of our framework in improving workers' output quality, the potential to reduce the radiologists' interpretation time, and hence, the potential to improve the traditional clinical workflow by marking the majority of the video segments as benign based on the crowd consensus.

Submitted 21 October, 2018; originally announced October 2018.

Comments: IEEE Conference on Visual Analytics Science and Technology (VAST), pp. 21-30, 2016 (10 pages, 11 figures)

Journal ref: IEEE Conference on Visual Analytics Science and Technology (VAST), pp. 21-30, 2016

arXiv:1810.08998 [pdf, other]

doi 10.1117/12.2216963

Visualization Framework for Colonoscopy Videos

Authors: Saad Nadeem, Arie Kaufman

Abstract: We present a visualization framework for annotating and comparing colonoscopy videos, where these annotations can then be used for semi-automatic report generation at the end of the procedure. Currently, there are approximately 14 million colonoscopies performed every year in the US. In this work, we create a visualization tool to deal with the deluge of colonoscopy videos in a more effective way. We present an interactive visualization framework for the annotation and tagging of colonoscopy videos in an easy and intuitive way. These annotations and tags can later be used for report generation for electronic medical records and for comparison at an individual as well as group level. We also present important use cases and medical expert feedback for our visualization framework.

Submitted 21 October, 2018; originally announced October 2018.

Comments: SPIE Medical Imaging, 2016 (7 pages, 5 figures)

Journal ref: SPIE Medical Imaging, Vol. 9786, p. 97861T, 2016

arXiv:1810.08850 [pdf, other]

doi 10.1109/TVCG.2016.2598791

Corresponding Supine and Prone Colon Visualization Using Eigenfunction Analysis and Fold Modeling

Authors: Saad Nadeem, Joseph Marino, Xianfeng Gu, Arie Kaufman

Abstract: We present a method for registration and visualization of corresponding supine and prone virtual colonoscopy scans based on eigenfunction analysis and fold modeling. In virtual colonoscopy, CT scans are acquired with the patient in two positions, and their registration is desirable so that physicians can corroborate findings between scans. Our algorithm performs this registration efficiently through the use of Fiedler vector representation (the second eigenfunction of the Laplace-Beltrami operator). This representation is employed to first perform global registration of the two colon positions. The registration is then locally refined using the haustral folds, which are automatically segmented using the 3D level sets of the Fiedler vector. The use of Fiedler vectors and the segmented folds presents a precise way of visualizing corresponding regions across datasets and visual modalities. We present multiple methods of visualizing the results, including 2D flattened rendering and the corresponding 3D endoluminal views. The precise fold modeling is used to automatically find a suitable cut for the 2D flattening, which provides a less distorted visualization. Our approach is robust, and we demonstrate its efficiency and efficacy by showing matched views on both the 2D flattened colons and in the 3D endoluminal view. We analytically evaluate the results by measuring the distance between features on the registered colons, and we also assess our fold segmentation against 20 manually labeled datasets. We have compared our results analytically to previous methods, and have found our method to achieve superior results. We also prove the hot spots conjecture for modeling cylindrical topology using Fiedler vector representation, which allows our approach to be used for general cylindrical geometry modeling and feature extraction.

Submitted 20 October, 2018; originally announced October 2018.

Comments: IEEE Transactions on Visualization and Computer Graphics, 23(1):751-760, 2017 (11 pages, 13 figures)

Journal ref: IEEE Transactions on Visualization and Computer Graphics, 23(1):751-760, 2017

arXiv:1810.05220 [pdf, other]

doi 10.1109/TVCG.2018.2856744

FeatureLego: Volume Exploration Using Exhaustive Clustering of Super-Voxels

Authors: Shreeraj Jadhav, Saad Nadeem, Arie Kaufman

Abstract: We present a volume exploration framework, FeatureLego, that uses a novel voxel clustering approach for efficient selection of semantic features. We partition the input volume into a set of compact super-voxels that represent the finest selection granularity. We then perform an exhaustive clustering of these super-voxels using a graph-based clustering method. Unlike the prevalent brute-force parameter sampling approaches, we propose an efficient algorithm to perform this exhaustive clustering. By computing an exhaustive set of clusters, we aim to capture as many boundaries as possible and ensure that the user has sufficient options for efficiently selecting semantically relevant features. Furthermore, we merge all the computed clusters into a single tree of meta-clusters that can be used for hierarchical exploration. We implement an intuitive user-interface to interactively explore volumes using our clustering approach. Finally, we show the effectiveness of our framework on multiple real-world datasets of different modalities.

Submitted 19 October, 2019; v1 submitted 11 October, 2018; originally announced October 2018.

Comments: IEEE Transactions on Visualization and Computer Graphics, 2018 (12 pages, 11 figures). Supplementary video demonstrating FeatureLego can be found here: https://www.youtube.com/watch?v=y_a3VnACXfE

Journal ref: IEEE Transactions on Visualization and Computer Graphics (Volume: 25, Issue: 9, Pages: 2725 - 2737, Sept. 1 2019)

arXiv:1809.06442 [pdf, other]

doi 10.1109/TVCG.2017.2772237

LMap: Shape-Preserving Local Mappings for Biomedical Visualization

Authors: Saad Nadeem, Xianfeng Gu, Arie Kaufman

Abstract: Visualization of medical organs and biological structures is a challenging task because of their complex geometry and the resultant occlusions. Global spherical and planar mapping techniques simplify the complex geometry and resolve the occlusions to aid in visualization. However, while resolving the occlusions these techniques do not preserve the geometric context, making them less suitable for mission-critical biomedical visualization tasks. In this paper, we present a shape-preserving local mapping technique for resolving occlusions locally while preserving the overall geometric context. More specifically, we present a novel visualization algorithm, LMap, for conformally parameterizing and deforming a selected local region-of-interest (ROI) on an arbitrary surface. The resultant shape-preserving local mappings help to visualize complex surfaces while preserving the overall geometric context. The algorithm is based on the robust and efficient extrinsic Ricci flow technique, and uses the dynamic Ricci flow algorithm to guarantee the existence of a local map for a selected ROI on an arbitrary surface. We show the effectiveness and efficacy of our method in three challenging use cases: (1) multimodal brain visualization, (2) optimal coverage of virtual colonoscopy centerline flythrough, and (3) molecular surface visualization.

Submitted 25 October, 2018; v1 submitted 17 September, 2018; originally announced September 2018.

Comments: IEEE Transactions on Visualization and Computer Graphics, 24(12): 3111-3122, 2018 (12 pages, 11 figures)

Journal ref: S. Nadeem, X. Gu, and A. Kaufman. LMap: Shape-Preserving Local Mappings for Biomedical Visualization. IEEE Transactions on Visualization and Computer Graphics, 24(12):3111-3122, 2018

arXiv:1809.06417 [pdf, other]

Radiative Transport Based Flame Volume Reconstruction from Videos

Authors: Liang Shen, Dengming Zhu, Saad Nadeem, Zhaoqi Wang, Arie Kaufman

Abstract: We introduce a novel approach for flame volume reconstruction from videos using inexpensive charge-coupled device (CCD) consumer cameras. The approach includes an economical data capture technique using inexpensive CCD cameras. Leveraging the smear feature of the CCD chip, we present a technique for synchronizing CCD cameras while capturing flame videos from different views. Our reconstruction is based on the radiative transport equation which enables complex phenomena such as emission, extinction, and scattering to be used in the rendering process. Both the color intensity and temperature reconstructions are implemented using the CUDA parallel computing framework, which provides real-time performance and allows visualization of reconstruction results after every iteration. We present the results of our approach using real captured data and physically-based simulated data. Finally, we also compare our approach against the other state-of-the-art flame volume reconstruction methods and demonstrate the efficacy and efficiency of our approach in four different applications: (1) rendering of reconstructed flames in virtual environments, (2) rendering of reconstructed flames in augmented reality, (3) flame stylization, and (4) reconstruction of other semitransparent phenomena.

Submitted 17 September, 2018; originally announced September 2018.

Comments: IEEE Transactions on Visualization and Computer Graphics, 24(7): 2209-2222, 2018

arXiv:1809.06408 [pdf, other]

Crowd-Assisted Polyp Annotation of Virtual Colonoscopy Videos

Authors: Ji Hwan Park, Saad Nadeem, Joseph Marino, Kevin Baker, Matthew Barish, Arie Kaufman

Abstract: Virtual colonoscopy (VC) allows a radiologist to navigate through a 3D colon model reconstructed from a computed tomography scan of the abdomen, looking for polyps, the precursors of colon cancer. Polyps are seen as protrusions on the colon wall and haustral folds, visible in the VC fly-through videos. A complete review of the colon surface requires full navigation from the rectum to the cecum in antegrade and retrograde directions, which is a tedious task that takes an average of 30 minutes. Crowdsourcing is a technique for non-expert users to perform certain tasks, such as image or video annotation. In this work, we use crowdsourcing for the examination of complete VC fly-through videos for polyp annotation by non-experts. The motivation for this is to potentially help the radiologist reach a diagnosis in a shorter period of time, and provide a stronger confirmation of the eventual diagnosis. The crowdsourcing interface includes an interactive tool for the crowd to annotate suspected polyps in the video with an enclosing box. Using our workflow, we achieve an overall polyps-per-patient sensitivity of 87.88% (95.65% for polyps $\geq$5mm and 70% for polyps $<$5mm). We also demonstrate the efficacy and effectiveness of a non-expert user in detecting and annotating polyps and discuss their possibility in aiding radiologists in VC examinations.

Submitted 17 September, 2018; originally announced September 2018.

Comments: 7 pages, SPIE Medical Imaging 2018

arXiv:1809.06402 [pdf, other]

Crowdsourcing Lung Nodules Detection and Annotation

Authors: Saeed Boorboor, Saad Nadeem, Ji Hwan Park, Kevin Baker, Arie Kaufman

Abstract: We present crowdsourcing as an additional modality to aid radiologists in the diagnosis of lung cancer from clinical chest computed tomography (CT) scans. More specifically, a complete workflow is introduced which can help maximize the sensitivity of lung nodule detection by utilizing the collective intelligence of the crowd. We combine the concept of overlapping thin-slab maximum intensity projections (TS-MIPs) and cine viewing to render short videos that can be outsourced as an annotation task to the crowd. These videos are generated by linearly interpolating overlapping TS-MIPs of CT slices through the depth of each quadrant of a patient's lung. The resultant videos are outsourced to an online community of non-expert users who, after a brief tutorial, annotate suspected nodules in these video segments. Using our crowdsourcing workflow, we achieved a lung nodule detection sensitivity of over 90% for 20 patient CT datasets (containing 178 lung nodules with sizes between 1-30mm), and only 47 false positives from a total of 1021 annotations on nodules of all sizes (96% sensitivity for nodules$>$4mm). These results show that crowdsourcing can be a robust and scalable modality to aid radiologists in screening for lung cancer, directly or in combination with computer-aided detection (CAD) algorithms. For CAD algorithms, the presented workflow can provide highly accurate training data to overcome the high false-positive rate (per scan) problem. We also provide, for the first time, analysis on nodule size and position which can help improve CAD algorithms.

Submitted 17 September, 2018; originally announced September 2018.

Comments: 7 pages, SPIE Medical Imaging 2018

arXiv:1801.00644 [pdf, other]

Matching with Text Data: An Experimental Evaluation of Methods for Matching Documents and of Measuring Match Quality

Authors: Reagan Mozer, Luke Miratrix, Aaron Russell Kaufman, L. Jason Anastasopoulos

Abstract: Matching for causal inference is a well-studied problem, but standard methods fail when the units to match are text documents: the high-dimensional and rich nature of the data renders exact matching infeasible, causes propensity scores to produce incomparable matches, and makes assessing match quality difficult. In this paper, we characterize a framework for matching text documents that decomposes existing methods into: (1) the choice of text representation, and (2) the choice of distance metric. We investigate how different choices within this framework affect both the quantity and quality of matches identified through a systematic multifactor evaluation experiment using human subjects. Altogether we evaluate over 100 unique text matching methods along with 5 comparison methods taken from the literature. Our experimental results identify methods that generate matches with higher subjective match quality than current state-of-the-art techniques. We enhance the precision of these results by developing a predictive model to estimate the match quality of pairs of text documents as a function of our various distance scores. This model, which we find successfully mimics human judgment, also allows for approximate and unsupervised evaluation of new procedures. We then employ the identified best method to illustrate the utility of text matching in two applications. First, we engage with a substantive debate in the study of media bias by using text matching to control for topic selection when comparing news articles from thirteen news sources. We then show how conditioning on text data leads to more precise causal inferences in an observational study examining the effects of a medical intervention.

Submitted 13 March, 2019; v1 submitted 2 January, 2018; originally announced January 2018.

arXiv:1708.06034 [pdf, other]

Eccentricity Effects on Blur and Depth Perception

Authors: Qi Sun, Fu-Chung Huang, Li-Yi Wei, David Luebke, Arie Kaufman, Joohwan Kim

Abstract: Foveation and focus cues are the two most discussed topics on vision in designing near-eye displays. Foveation reduces rendering load by omitting spatial details in the content that the peripheral vision cannot appreciate; providing richer focal cues can resolve the vergence-accommodation conflict, thereby lessening visual discomfort in using near-eye displays. We performed two psychophysical experiments to investigate the relationship between foveation and focus cues. The first study measured blur discrimination sensitivity as a function of visual eccentricity, where we found discrimination thresholds significantly lower than previously reported. The second study measured depth discrimination threshold where we found a clear dependency on visual eccentricity. We discuss the results from the two studies and suggest further investigation.

Submitted 6 November, 2019; v1 submitted 20 August, 2017; originally announced August 2017.

arXiv:1609.01329 [pdf, other]

doi 10.1117/12.2216996

Depth Reconstruction and Computer-Aided Polyp Detection in Optical Colonoscopy Video Frames

Authors: Saad Nadeem, Arie Kaufman

Abstract: We present a computer-aided detection algorithm for polyps in optical colonoscopy images. Polyps are the precursors to colon cancer. In the US alone, more than 14 million optical colonoscopies are performed every year, mostly to screen for polyps. Optical colonoscopy has been shown to have an approximately 25% polyp miss rate due to the convoluted folds and bends present in the colon. In this work, we present an automatic detection algorithm to detect these polyps in the optical colonoscopy images. We use a machine learning algorithm to infer a depth map for a given optical colonoscopy image and then use a detailed pre-built polyp profile to detect and delineate the boundaries of polyps in this given image. We have achieved the best recall of 84.0% and the best specificity value of 83.4%.

Submitted 10 September, 2016; v1 submitted 5 September, 2016; originally announced September 2016.

Comments: **The title has been modified to highlight the contributions more clearly. The original title is: "Computer-Aided Detection of Polyps in Optical Colonoscopy Images". Keywords: Machine learning, computer-aided detection, segmentation, endoscopy, colonoscopy, videos, polyp, detection, medical imaging, depth maps, 3D, reconstruction, computed tomography, virtual colonoscopy, colorectal cancer, SPIE Medical Imaging, 2016

arXiv:1608.00936 [pdf, other]

doi 10.1117/12.2217003

Multimodal Brain Visualization

Authors: Saad Nadeem, Arie Kaufman

Abstract: Current connectivity diagrams of human brain image data are either overly complex or overly simplistic. In this work we introduce simple yet accurate interactive visual representations of multiple brain image structures and the connectivity among them. We map cortical surfaces extracted from human brain magnetic resonance imaging (MRI) data onto 2D surfaces that preserve shape (angle), extent (area), and spatial (neighborhood) information for 2D (circular disk) and 3D (spherical) mapping, split these surfaces into separate patches, and cluster functional and diffusion tractography MRI connections between pairs of these patches. The resulting visualizations are easier to compute on and more visually intuitive to interact with than the original data, and facilitate simultaneous exploration of multiple data sets, modalities, and statistical maps.

Submitted 1 September, 2016; v1 submitted 2 August, 2016; originally announced August 2016.

Comments: SPIE Medical Imaging 2016, Proc. SPIE Medical Imaging: Biomedical Applications in Molecular, Structural, and Functional Imaging, 2016

Journal ref: SPIE Medical Imaging, pp. 97881Y-97881Y. 2016

arXiv:1608.00921 [pdf, other]

Registration of Volumetric Prostate Scans using Curvature Flow

Authors: Saad Nadeem, Rui Shi, Joseph Marino, Wei Zeng, Xianfeng Gu, Arie Kaufman

Abstract: Radiological imaging of the prostate is becoming more popular among researchers and clinicians in searching for diseases, primarily cancer. Scans might be acquired with different equipment or at different times for prognosis monitoring, with patient movement between scans, resulting in multiple datasets that need to be registered. For these cases, we introduce a method for volumetric registration using curvature flow. Multiple prostate datasets are mapped to canonical solid spheres, which are in turn aligned and registered through the use of identified landmarks on or within the gland. Theoretical proof and experimental results show that our method produces homeomorphisms with feature constraints. We provide thorough validation of our method by registering prostate scans of the same patient in different orientations, from different days and using different modes of MRI. Our method also provides the foundation for a general group-wise registration using a standard reference, defined on the complex plane, for any input. In the present context, this can be used for registering as many scans as needed for a single patient or different patients on the basis of age, weight or even malignant and non-malignant attributes to study the differences in the general population. Though we present this technique with a specific application to the prostate, it is generally applicable for volumetric registration problems.

Submitted 2 August, 2016; originally announced August 2016.

Comments: Technical Report. Manuscript prepared: July 2014. Keywords: Shape registration, geometry-based techniques, medical visualization, mathematical foundations for visualization

arXiv:1606.06702 [pdf, other]

doi 10.1117/12.2252281

Crowdsourcing for Identification of Polyp-Free Segments in Virtual Colonoscopy Videos

Authors: Ji Hwan Park, Seyedkoosha Mirhosseini, Saad Nadeem, Joseph Marino, Arie Kaufman, Kevin Baker, Matthew Barish

Abstract: Virtual colonoscopy (VC) allows a physician to virtually navigate within a reconstructed 3D colon model searching for colorectal polyps. Though VC is widely recognized as a highly sensitive and specific test for identifying polyps, one limitation is the reading time, which can take over 30 minutes per patient. Large amounts of the colon are often devoid of polyps, and a way of identifying these polyp-free segments could be of valuable use in reducing the required reading time for the interrogating radiologist. To this end, we have tested the ability of the collective crowd intelligence of non-expert workers to identify polyp candidates and polyp-free regions. We presented twenty short videos flying through a segment of a virtual colon to each worker, and the crowd was asked to determine whether or not a possible polyp was observed within that video segment. We evaluated our framework on Amazon Mechanical Turk and found that the crowd was able to achieve a sensitivity of 80.0% and specificity of 86.5% in identifying video segments which contained a clinically proven polyp. Since each polyp appeared in multiple consecutive segments, all polyps were in fact identified. Using the crowd results as a first pass, 80% of the video segments could in theory be skipped by the radiologist, equating to a significant time savings and enabling more VC examinations to be performed.

Submitted 24 July, 2017; v1 submitted 21 June, 2016; originally announced June 2016.

Journal ref: Proc. SPIE Medical Imaging 2017, 101380V

arXiv:1307.1739

Anatomical Feature-guided Volumeric Registration of Multimodal Prostate MRI

Authors: Xin Zhao, Arie Kaufman

Abstract: Radiological imaging of the prostate is becoming more popular among researchers and clinicians in searching for diseases, primarily cancer. Scans might be acquired at different times, with patient movement between scans, or with different equipment, resulting in multiple datasets that need to be registered. For this issue, we introduce a registration method using anatomical feature-guided mutual information. Prostate scans of the same patient taken in three different orientations are first aligned for the accurate detection of anatomical features in 3D. Then, our pipeline allows for registration of multiple modalities through the use of anatomical features, such as the interior urethra of the prostate and the gland utricle, in a bijective way. The novelty of this approach is the application of anatomical features as the pre-specified corresponding landmarks for prostate registration. We evaluate the registration results through both artificial and clinical datasets. Registration accuracy is evaluated by performing statistical analysis of local intensity differences or spatial differences of anatomical landmarks between various MR datasets. Evaluation results demonstrate that our method yields a statistically significant improvement in the quality of registration. Although this strategy is tested for MRI-guided brachytherapy, the preliminary results from these experiments suggest that it can also be applied to other settings such as transrectal ultrasound-guided or CT-guided therapy, where the integration of preoperative MRI may have a significant impact upon treatment planning and guidance.

Submitted 3 November, 2013; v1 submitted 5 July, 2013; originally announced July 2013.

Comments: This paper has been withdrawn by the author due to publication

arXiv:1206.1148 [pdf, other]

From individual to population: Challenges in Medical Visualization

Authors: Charl P. Botha, Bernhard Preim, Arie Kaufman, Shigeo Takahashi, Anders Ynnerman

Abstract: In this paper, we first give a high-level overview of medical visualization development over the past 30 years, focusing on key developments and the trends that they represent. During this discussion, we will refer to a number of key papers that we have also arranged on the medical visualization research timeline. Based on the overview and our observations of the field, we then identify and discuss the medical visualization research challenges that we foresee for the coming decade.

Submitted 7 August, 2012; v1 submitted 6 June, 2012; originally announced June 2012.

Comments: Improvements based on comments by reviewers: Typos and layout issues fixed. Added two more multi-modal volume rendering references to 2.1. Added more detail on Virtual Colonoscopy to 2.2

Showing 1–35 of 35 results for author: Kaufman, A