Showing 1–41 of 41 results for author: Bennett, P

Searching in archive cs.
  1. MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels

    Authors: Qi Chen, Xiubo Geng, Corby Rosset, Carolyn Buractaon, Jingwen Lu, Tao Shen, Kun Zhou, Chenyan Xiong, Yeyun Gong, Paul Bennett, Nick Craswell, Xing Xie, Fan Yang, Bryan Tower, Nikhil Rao, Anlei Dong, Wenqi Jiang, Zheng Liu, Mingqin Li, Chuanjie Liu, Zengzhong Li, Rangan Majumder, Jennifer Neville, Andy Oakley, Knut Magne Risvik, et al. (6 additional authors not shown)

    Abstract: Recent breakthroughs in large models have highlighted the critical significance of data scale, labels and modals. In this paper, we introduce MS MARCO Web Search, the first large-scale information-rich web dataset, featuring millions of real clicked query-document labels. This dataset closely mimics real-world web document and query distribution, provides rich information for various kinds of down… ▽ More

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: 10 pages, 6 figures, for associated dataset, see http://github.com/microsoft/MS-MARCO-Web-Search

  2. arXiv:2312.02206  [pdf, other]

    cs.AI cs.CL

    Axiomatic Preference Modeling for Longform Question Answering

    Authors: Corby Rosset, Guoqing Zheng, Victor Dibia, Ahmed Awadallah, Paul Bennett

    Abstract: The remarkable abilities of large language models (LLMs) like GPT-4 partially stem from post-training processes like Reinforcement Learning from Human Feedback (RLHF) involving human preferences encoded in a reward model. However, these reward models (RMs) often lack direct knowledge of why, or under what principles, the preference annotations were made. In this study, we identify principles that… ▽ More

    Submitted 2 December, 2023; originally announced December 2023.

    Comments: Accepted to EMNLP 2023

  3. arXiv:2311.02272  [pdf, other]

    cs.DC

    Enabling Cross-Language Data Integration and Scalable Analytics in Decentralized Finance

    Authors: Conor Flynn, Kristin P. Bennett, John S. Erickson, Aaron Green, Oshani Seneviratne

    Abstract: With the agile development process of most academic and corporate entities, designing a robust computational back-end system that can support their ever-changing data needs is a constantly evolving challenge. We propose the implementation of a data and language-agnostic system design that handles different data schemes and sources while subsequently providing researchers and developers a way to co… ▽ More

    Submitted 3 November, 2023; originally announced November 2023.

    Comments: 10 pages

    ACM Class: C.3

  4. arXiv:2305.00970  [pdf, other]

    cs.CV

    ArK: Augmented Reality with Knowledge Interactive Emergent Ability

    Authors: Qiuyuan Huang, Jae Sung Park, Abhinav Gupta, Paul Bennett, Ran Gong, Subhojit Som, Baolin Peng, Owais Khan Mohammed, Chris Pal, Yejin Choi, Jianfeng Gao

    Abstract: Despite the growing adoption of mixed reality and interactive AI agents, it remains challenging for these systems to generate high quality 2D/3D scenes in unseen environments. The common practice requires deploying an AI agent to collect large amounts of data for model training for every new task. This process is costly, or even impossible, for many domains. In this study, we develop an infinite a… ▽ More

    Submitted 1 May, 2023; originally announced May 2023.

    Report number: EFI-94-11

  5. arXiv:2304.05524  [pdf, other]

    cs.LG cs.CL

    Understanding Causality with Large Language Models: Feasibility and Opportunities

    Authors: Cheng Zhang, Stefan Bauer, Paul Bennett, Jiangfeng Gao, Wenbo Gong, Agrin Hilmkil, Joel Jennings, Chao Ma, Tom Minka, Nick Pawlowski, James Vaughan

    Abstract: We assess the ability of large language models (LLMs) to answer causal questions by analyzing their strengths and weaknesses against three types of causal question. We believe that current LLMs can answer causal questions with existing causal knowledge as combined domain experts. However, they are not yet able to provide satisfactory answers for discovering new knowledge or for high-stakes decisio… ▽ More

    Submitted 11 April, 2023; originally announced April 2023.

  6. arXiv:2302.12358  [pdf, ps, other]

    math.PR cs.DM math.CO

    Extending Wormald's Differential Equation Method to One-sided Bounds

    Authors: Patrick Bennett, Calum MacRury

    Abstract: In this note, we formulate a ``one-sided'' version of Wormald's differential equation method. In the standard ``two-sided'' method, one is given a family of random variables which evolve over time and which satisfy some conditions including a tight estimate of the expected change in each variable over one time step. These estimates for the expected one-step changes suggest that the variables ought… ▽ More

    Submitted 23 February, 2023; originally announced February 2023.

    ACM Class: G.3; G.2.1

  7. arXiv:2302.03754  [pdf, other]

    cs.CL

    Augmenting Zero-Shot Dense Retrievers with Plug-in Mixture-of-Memories

    Authors: Suyu Ge, Chenyan Xiong, Corby Rosset, Arnold Overwijk, Jiawei Han, Paul Bennett

    Abstract: In this paper we improve the zero-shot generalization ability of language models via Mixture-Of-Memory Augmentation (MoMA), a mechanism that retrieves augmentation documents from multiple information corpora ("external memories"), with the option to "plug in" new memory at inference time. We develop a joint learning mechanism that trains the augmentation component with latent labels derived from t… ▽ More

    Submitted 7 February, 2023; originally announced February 2023.

  8. arXiv:2208.03443  [pdf, other]

    cs.HC

    Imagining Future Digital Assistants at Work: A Study of Task Management Needs

    Authors: Yonchanok Khaokaew, Indigo Holcombe-James, Mohammad Saiedur Rahaman, Jonathan Liono, Johanne R. Trippas, Damiano Spina, Nicholas Belkin, Peter Bailey, Paul N. Bennett, Yongli Ren, Mark Sanderson, Falk Scholer, Ryen W. White, Flora D. Salim

    Abstract: Digital Assistants (DAs) can support workers in the workplace and beyond. However, target user needs are not fully understood, and the functions that workers would ideally want a DA to support require further study. A richer understanding of worker needs could help inform the design of future DAs. We investigate user needs of future workplace DAs using data from a user study of 40 workers over a f… ▽ More

    Submitted 6 August, 2022; originally announced August 2022.

    Comments: 59 pages

  9. arXiv:2204.06644  [pdf, other]

    cs.LG cs.AI cs.CL

    METRO: Efficient Denoising Pretraining of Large Scale Autoencoding Language Models with Model Generated Signals

    Authors: Payal Bajaj, Chenyan Xiong, Guolin Ke, Xiaodong Liu, Di He, Saurabh Tiwary, Tie-Yan Liu, Paul Bennett, Xia Song, Jianfeng Gao

    Abstract: We present an efficient method of pretraining large-scale autoencoding language models using training signals generated by an auxiliary model. Originated in ELECTRA, this training strategy has demonstrated sample-efficiency to pretrain models at the scale of hundreds of millions of parameters. In this work, we conduct a comprehensive empirical study, and propose a recipe, namely "Model generated d… ▽ More

    Submitted 16 April, 2022; v1 submitted 13 April, 2022; originally announced April 2022.

    Comments: Update details in scaled initialization and add acknowledgement

  10. arXiv:2204.04353  [pdf, other]

    cs.CL cs.AI cs.CY

    Should we tweet this? Generative response modeling for predicting reception of public health messaging on Twitter

    Authors: Abraham Sanders, Debjani Ray-Majumder, John S. Erickson, Kristin P. Bennett

    Abstract: The way people respond to messaging from public health organizations on social media can provide insight into public perceptions on critical health issues, especially during a global crisis such as COVID-19. It could be valuable for high-impact organizations such as the US Centers for Disease Control and Prevention (CDC) or the World Health Organization (WHO) to understand how these perceptions im… ▽ More

    Submitted 13 May, 2022; v1 submitted 8 April, 2022; originally announced April 2022.

    Comments: Accepted at ACM WebSci 2022

    ACM Class: I.2.7

  11. arXiv:2204.03243  [pdf, other]

    cs.CL cs.LG

    Pretraining Text Encoders with Adversarial Mixture of Training Signal Generators

    Authors: Yu Meng, Chenyan Xiong, Payal Bajaj, Saurabh Tiwary, Paul Bennett, Jiawei Han, Xia Song

    Abstract: We present a new framework AMOS that pretrains text encoders with an Adversarial learning curriculum via a Mixture Of Signals from multiple auxiliary generators. Following ELECTRA-style pretraining, the main encoder is trained as a discriminator to detect replaced tokens generated by auxiliary masked language models (MLMs). Different from ELECTRA which trains one MLM as the generator, we jointly t… ▽ More

    Submitted 7 April, 2022; originally announced April 2022.

    Comments: ICLR 2022. (Code and Models: https://github.com/microsoft/AMOS)

  12. arXiv:2203.04462  [pdf, other]

    cs.LG

    Downstream Fairness Caveats with Synthetic Healthcare Data

    Authors: Karan Bhanot, Ioana Baldini, Dennis Wei, Jiaming Zeng, Kristin P. Bennett

    Abstract: This paper evaluates synthetically generated healthcare data for biases and investigates the effect of fairness mitigation techniques on utility-fairness. Privacy laws limit access to health data such as Electronic Medical Records (EMRs) to preserve patient privacy. Albeit essential, these laws hinder research reproducibility. Synthetic data is a viable solution that can enable access to data simi… ▽ More

    Submitted 8 March, 2022; originally announced March 2022.

  13. arXiv:2201.05176  [pdf, other]

    cs.IR cs.CL

    Neural Approaches to Conversational Information Retrieval

    Authors: Jianfeng Gao, Chenyan Xiong, Paul Bennett, Nick Craswell

    Abstract: A conversational information retrieval (CIR) system is an information retrieval (IR) system with a conversational interface which allows users to interact with the system to seek information via multi-turn conversations of natural language, in spoken or written form. Recent progress in deep learning has brought tremendous improvements in natural language processing (NLP) and conversational AI, lea… ▽ More

    Submitted 13 January, 2022; originally announced January 2022.

    Comments: Book Draft

  14. arXiv:2111.09525  [pdf, other]

    cs.CL

    SummaC: Re-Visiting NLI-based Models for Inconsistency Detection in Summarization

    Authors: Philippe Laban, Tobias Schnabel, Paul N. Bennett, Marti A. Hearst

    Abstract: In the summarization domain, a key requirement for summaries is to be factually consistent with the input document. Previous work has found that natural language inference (NLI) models do not perform competitively when applied to inconsistency detection. In this work, we revisit the use of NLI for inconsistency detection, finding that past work suffered from a mismatch in input granularity between… ▽ More

    Submitted 18 November, 2021; originally announced November 2021.

    Comments: TACL pre-MIT Press publication version; 11 pages, 2 figures, 5 tables

  15. arXiv:2110.07581  [pdf, other]

    cs.IR cs.CL cs.LG

    Zero-Shot Dense Retrieval with Momentum Adversarial Domain Invariant Representations

    Authors: Ji Xin, Chenyan Xiong, Ashwin Srinivasan, Ankita Sharma, Damien Jose, Paul N. Bennett

    Abstract: Dense retrieval (DR) methods conduct text retrieval by first encoding texts in the embedding space and then matching them by nearest neighbor search. This requires strong locality properties from the representation space, i.e., the close allocations of each small group of relevant texts, which are hard to generalize to domains without sufficient training data. In this paper, we aim to improve the g… ▽ More

    Submitted 14 October, 2021; originally announced October 2021.

  16. arXiv:2107.03444  [pdf, other]

    cs.CL

    Keep it Simple: Unsupervised Simplification of Multi-Paragraph Text

    Authors: Philippe Laban, Tobias Schnabel, Paul Bennett, Marti A. Hearst

    Abstract: This work presents Keep it Simple (KiS), a new approach to unsupervised text simplification which learns to balance a reward across three properties: fluency, salience and simplicity. We train the model with a novel algorithm to optimize the reward (k-SCST), in which the model proposes several candidate simplifications, computes each candidate's reward, and encourages candidates that outperform th… ▽ More

    Submitted 7 July, 2021; originally announced July 2021.

    Comments: Accepted at ACL-IJCNLP 2021, 14 pages, 7 figures

    Journal ref: Association for Computational Linguistics (2021)

  17. arXiv:2106.13375  [pdf, other]

    cs.IR cs.CL cs.DL

    Domain-Specific Pretraining for Vertical Search: Case Study on Biomedical Literature

    Authors: Yu Wang, Jinchao Li, Tristan Naumann, Chenyan Xiong, Hao Cheng, Robert Tinn, Cliff Wong, Naoto Usuyama, Richard Rogahn, Zhihong Shen, Yang Qin, Eric Horvitz, Paul N. Bennett, Jianfeng Gao, Hoifung Poon

    Abstract: Information overload is a prevalent challenge in many high-value domains. A prominent case in point is the explosion of the biomedical literature on COVID-19, which swelled to hundreds of thousands of papers in a matter of months. In general, biomedical literature expands by two papers every minute, totalling over a million new papers every year. Search in the biomedical realm, and many other vert… ▽ More

    Submitted 16 September, 2021; v1 submitted 24 June, 2021; originally announced June 2021.

    Comments: ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) 2021 Applied Data Science Track

  18. arXiv:2102.09206  [pdf, other]

    cs.LG

    Less is More: Pre-train a Strong Text Encoder for Dense Retrieval Using a Weak Decoder

    Authors: Shuqi Lu, Di He, Chenyan Xiong, Guolin Ke, Waleed Malik, Zhicheng Dou, Paul Bennett, Tieyan Liu, Arnold Overwijk

    Abstract: Dense retrieval requires high-quality text sequence embeddings to support effective search in the representation space. Autoencoder-based language models are appealing in dense retrieval as they train the encoder to output high-quality embedding that can reconstruct the input texts. However, in this paper, we provide theoretical analyses and show empirically that an autoencoder language model with… ▽ More

    Submitted 16 September, 2021; v1 submitted 18 February, 2021; originally announced February 2021.

  19. arXiv:2102.08473  [pdf, other]

    cs.CL cs.LG

    COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining

    Authors: Yu Meng, Chenyan Xiong, Payal Bajaj, Saurabh Tiwary, Paul Bennett, Jiawei Han, Xia Song

    Abstract: We present a self-supervised learning framework, COCO-LM, that pretrains Language Models by COrrecting and COntrasting corrupted text sequences. Following ELECTRA-style pretraining, COCO-LM employs an auxiliary language model to corrupt text sequences, upon which it constructs two new tasks for pretraining the main model. The first token-level task, Corrective Language Modeling, is to detect and c… ▽ More

    Submitted 26 October, 2021; v1 submitted 16 February, 2021; originally announced February 2021.

    Comments: NeurIPS 2021. (Code and Models: https://github.com/microsoft/COCO-LM)

  20. arXiv:2012.14862  [pdf, other]

    cs.IR cs.CL

    Few-Shot Text Ranking with Meta Adapted Synthetic Weak Supervision

    Authors: Si Sun, Yingzhuo Qian, Zhenghao Liu, Chenyan Xiong, Kaitao Zhang, Jie Bao, Zhiyuan Liu, Paul Bennett

    Abstract: The effectiveness of Neural Information Retrieval (Neu-IR) often depends on a large scale of in-domain relevance training signals, which are not always available in real-world ranking scenarios. To democratize the benefits of Neu-IR, this paper presents MetaAdaptRank, a domain adaptive learning method that generalizes Neu-IR models from label-rich source domains to few-shot target domains. Drawing… ▽ More

    Submitted 2 June, 2021; v1 submitted 29 December, 2020; originally announced December 2020.

    Comments: 14 pages, accepted by ACL-IJCNLP 2021 (long paper)

  21. arXiv:2011.01580  [pdf, other]

    cs.IR cs.CL

    CMT in TREC-COVID Round 2: Mitigating the Generalization Gaps from Web to Special Domain Search

    Authors: Chenyan Xiong, Zhenghao Liu, Si Sun, Zhuyun Dai, Kaitao Zhang, Shi Yu, Zhiyuan Liu, Hoifung Poon, Jianfeng Gao, Paul Bennett

    Abstract: Neural rankers based on deep pretrained language models (LMs) have been shown to improve many information retrieval benchmarks. However, these methods are affected by the correlation between pretraining domain and target domain and rely on massive fine-tuning relevance labels. Directly applying pretraining methods to specific domains may result in suboptimal search quality because specific d… ▽ More

    Submitted 3 November, 2020; originally announced November 2020.

    Comments: 5 pages, 3 figures, 2 tables

  22. arXiv:2007.00808  [pdf, other]

    cs.IR cs.CL cs.LG

    Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval

    Authors: Lee Xiong, Chenyan Xiong, Ye Li, Kwok-Fung Tang, Jialin Liu, Paul Bennett, Junaid Ahmed, Arnold Overwijk

    Abstract: Conducting text retrieval in a dense learned representation space has many intriguing advantages over sparse retrieval. Yet the effectiveness of dense retrieval (DR) often requires combination with sparse retrieval. In this paper, we identify that the main bottleneck is in the training mechanisms, where the negative instances used in training are not representative of the irrelevant documents in t… ▽ More

    Submitted 20 October, 2020; v1 submitted 1 July, 2020; originally announced July 2020.

  23. arXiv:2007.00655  [pdf, ps, other]

    cs.CL cs.LG stat.ML

    Knowledge-Aware Language Model Pretraining

    Authors: Corby Rosset, Chenyan Xiong, Minh Phan, Xia Song, Paul Bennett, Saurabh Tiwary

    Abstract: How much knowledge do pretrained language models hold? Recent research observed that pretrained transformers are adept at modeling semantics but it is unclear to what degree they grasp human knowledge, or how to ensure they do so. In this paper we incorporate knowledge-awareness in language model pretraining without changing the transformer architecture, inserting explicit knowledge layers, or add… ▽ More

    Submitted 4 February, 2021; v1 submitted 29 June, 2020; originally announced July 2020.

  24. arXiv:2006.05009  [pdf, other]

    cs.IR

    Few-Shot Generative Conversational Query Rewriting

    Authors: Shi Yu, Jiahua Liu, Jingqin Yang, Chenyan Xiong, Paul Bennett, Jianfeng Gao, Zhiyuan Liu

    Abstract: Conversational query rewriting aims to reformulate a concise conversational query to a fully specified, context-independent query that can be effectively handled by existing information retrieval systems. This paper presents a few-shot generative approach to conversational query rewriting. We develop two methods, based on rules and self-supervised learning, to generate weak supervision data using… ▽ More

    Submitted 8 June, 2020; originally announced June 2020.

    Comments: Accepted by SIGIR 2020

  25. arXiv:2006.00166  [pdf, other]

    cs.IR

    Analyzing and Learning from User Interactions for Search Clarification

    Authors: Hamed Zamani, Bhaskar Mitra, Everest Chen, Gord Lueck, Fernando Diaz, Paul N. Bennett, Nick Craswell, Susan T. Dumais

    Abstract: Asking clarifying questions in response to search queries has been recognized as a useful technique for revealing the underlying intent of the query. Clarification has applications in retrieval systems with different interfaces, from the traditional web search interfaces to the limited bandwidth interfaces as in speech-only and small screen devices. Generation and evaluation of clarifying question… ▽ More

    Submitted 29 May, 2020; originally announced June 2020.

    Comments: To appear in the Proceedings of SIGIR 2020

  26. arXiv:1911.06411  [pdf, other]

    cs.LG cs.CY stat.ML

    Synthetic Event Time Series Health Data Generation

    Authors: Saloni Dash, Ritik Dutta, Isabelle Guyon, Adrien Pavao, Andrew Yale, Kristin P. Bennett

    Abstract: Synthetic medical data which preserves privacy while maintaining utility can be used as an alternative to real medical data, which has privacy costs and resource constraints associated with it. At present, most models focus on generating cross-sectional health data which is not necessarily representative of real data. In reality, medical data is longitudinal in nature, with a single patient having… ▽ More

    Submitted 27 November, 2019; v1 submitted 14 November, 2019; originally announced November 2019.

    Comments: Machine Learning for Health (ML4H) at NeurIPS 2019 - Extended Abstract

  27. Generic Intent Representation in Web Search

    Authors: Hongfei Zhang, Xia Song, Chenyan Xiong, Corby Rosset, Paul N. Bennett, Nick Craswell, Saurabh Tiwary

    Abstract: This paper presents GEneric iNtent Encoder (GEN Encoder) which learns a distributed representation space for user intent in search. Leveraging large scale user clicks from Bing search logs as weak supervision of user intent, GEN Encoder learns to map queries with shared clicks into similar embeddings end-to-end and then finetunes on multiple paraphrase tasks. Experimental results on an intrinsic e… ▽ More

    Submitted 24 July, 2019; originally announced July 2019.

    Journal ref: SIGIR 2019: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval

  28. arXiv:1907.04358  [pdf, other]

    cs.LO q-bio.PE stat.ML

    Making Study Populations Visible through Knowledge Graphs

    Authors: Shruthi Chari, Miao Qi, Nkcheniyere N. Agu, Oshani Seneviratne, James P. McCusker, Kristin P. Bennett, Amar K. Das, Deborah L. McGuinness

    Abstract: Treatment recommendations within Clinical Practice Guidelines (CPGs) are largely based on findings from clinical trials and case studies, referred to here as research studies, that are often based on highly selective clinical populations, referred to here as study cohorts. When medical practitioners apply CPG recommendations, they need to understand how well their patient population matches the ch… ▽ More

    Submitted 9 July, 2019; originally announced July 2019.

    Comments: 16 pages, 4 figures, 1 table, accepted to the ISWC 2019 Resources Track (https://iswc2019.semanticweb.org/call-for-resources-track-papers/)

  29. arXiv:1907.01901  [pdf, other]

    q-bio.QM cs.LG stat.ML

    High-Throughput Machine Learning from Electronic Health Records

    Authors: Ross S. Kleiman, Paul S. Bennett, Peggy L. Peissig, Richard L. Berg, Zhaobin Kuang, Scott J. Hebbring, Michael D. Caldwell, David Page

    Abstract: The widespread digitization of patient data via electronic health records (EHRs) has created an unprecedented opportunity to use machine learning algorithms to better predict disease risk at the patient level. Although predictive models have previously been constructed for a few important diseases, such as breast cancer and myocardial infarction, we currently know very little about how accurately… ▽ More

    Submitted 3 July, 2019; originally announced July 2019.

  30. arXiv:1906.05912  [pdf, ps, other]

    cs.LG stat.ML

    A Variational Autoencoder for Probabilistic Non-Negative Matrix Factorisation

    Authors: Steven Squires, Adam Prügel Bennett, Mahesan Niranjan

    Abstract: We introduce and demonstrate the variational autoencoder (VAE) for probabilistic non-negative matrix factorisation (PAE-NMF). We design a network which can perform non-negative matrix factorisation (NMF) and add in aspects of a VAE to make the coefficients of the latent space probabilistic. By restricting the weights in the final layer of the network to be non-negative and using the non-negative W… ▽ More

    Submitted 13 June, 2019; originally announced June 2019.

  31. arXiv:1902.01632  [pdf, ps, other]

    cs.LG stat.ML

    Minimum description length as an objective function for non-negative matrix factorization

    Authors: Steven Squires, Adam Prugel Bennett, Mahesan Niranjan

    Abstract: Non-negative matrix factorization (NMF) is a dimensionality reduction technique which tends to produce a sparse representation of data. Commonly, the error between the actual and recreated matrices is used as an objective function, but this method may not produce the type of representation we desire as it allows for the complexity of the model to grow, constrained only by the size of the subspace… ▽ More

    Submitted 5 February, 2019; originally announced February 2019.

  32. arXiv:1811.11190  [pdf, other]

    cs.LG cs.AI stat.ML

    Semantically-aware population health risk analyses

    Authors: Alexander New, Sabbir M. Rashid, John S. Erickson, Deborah L. McGuinness, Kristin P. Bennett

    Abstract: One primary task of population health analysis is the identification of risk factors that, for some subpopulation, have a significant association with some health condition. Examples include finding lifestyle factors associated with chronic diseases and finding genetic mutations associated with diseases in precision health. We develop a combined semantic and machine learning system that uses a hea… ▽ More

    Submitted 27 November, 2018; originally announced November 2018.

    Comments: Machine Learning for Health (ML4H) Workshop at NeurIPS 2018 arXiv:cs/0101200

  33. arXiv:1808.04880  [pdf, other]

    stat.ML cs.LG stat.AP

    A Precision Environment-Wide Association Study of Hypertension via Supervised Cadre Models

    Authors: Alexander New, Kristin P. Bennett

    Abstract: We consider the problem in precision health of grouping people into subpopulations based on their degree of vulnerability to a risk factor. These subpopulations cannot be discovered with traditional clustering techniques because their quality is evaluated with a supervised metric: the ease of modeling a response variable over observations within them. Instead, we apply the supervised cadre model (… ▽ More

    Submitted 9 December, 2018; v1 submitted 14 August, 2018; originally announced August 2018.

    Comments: 9 pages, 5 figures

  34. arXiv:1807.07991  [pdf, other]

    cs.AI

    Knowledge Integration for Disease Characterization: A Breast Cancer Example

    Authors: Oshani Seneviratne, Sabbir M. Rashid, Shruthi Chari, James P. McCusker, Kristin P. Bennett, James A. Hendler, Deborah L. McGuinness

    Abstract: With the rapid advancements in cancer research, the information that is useful for characterizing disease, staging tumors, and creating treatment and survivorship plans has been changing at a pace that creates challenges when physicians try to remain current. One example involves increasing usage of biomarkers when characterizing the pathologic prognostic stage of a breast tumor. We present our se… ▽ More

    Submitted 20 July, 2018; originally announced July 2018.

    Comments: International Semantic Web Conference (Resource Track)

  35. arXiv:1802.07578  [pdf, other]

    cs.HC cs.IR

    Improving Recommender Systems Beyond the Algorithm

    Authors: Tobias Schnabel, Paul N. Bennett, Thorsten Joachims

    Abstract: Recommender systems rely heavily on the predictive accuracy of the learning algorithm. Most work on improving accuracy has focused on the learning algorithm itself. We argue that this algorithmic focus is myopic. In particular, since learning algorithms generally improve with more and better data, we propose shaping the feedback generation process as an alternate and complementary route to improvi… ▽ More

    Submitted 21 February, 2018; originally announced February 2018.

  36. Cadre Modeling: Simultaneously Discovering Subpopulations and Predictive Models

    Authors: Alexander New, Curt Breneman, Kristin P. Bennett

    Abstract: We consider the problem in regression analysis of identifying subpopulations that exhibit different patterns of response, where each subpopulation requires a different underlying model. Unlike statistical cohorts, these subpopulations are not known a priori; thus, we refer to them as cadres. When the cadres and their associated models are interpretable, modeling leads to insights about the subpopu… ▽ More

    Submitted 23 October, 2018; v1 submitted 7 February, 2018; originally announced February 2018.

    Comments: 8 pages, 6 figures

    Journal ref: In 2018 International Joint Conference on Neural Networks (IJCNN), Rio de Janeiro, Brazil, 2018

  37. arXiv:1510.07545  [pdf, other]

    cs.HC cs.IR cs.LG

    Using Shortlists to Support Decision Making and Improve Recommender System Performance

    Authors: Tobias Schnabel, Paul N. Bennett, Susan T. Dumais, Thorsten Joachims

    Abstract: In this paper, we study shortlists as an interface component for recommender systems with the dual goal of supporting the user's decision process, as well as improving implicit feedback elicitation for increased recommendation quality. A shortlist is a temporary list of candidates that the user is currently considering, e.g., a list of a few movies the user is currently considering for viewing. Fr… ▽ More

    Submitted 8 February, 2016; v1 submitted 26 October, 2015; originally announced October 2015.

    Comments: 11 pages in WWW 2016

  38. arXiv:1503.01613  [pdf, other]

    cs.CC math.CO math.PR

    Space proof complexity for random 3-CNFs

    Authors: Patrick Bennett, Ilario Bonacina, Nicola Galesi, Tony Huynh, Mike Molloy, Paul Wollan

    Abstract: We investigate the space complexity of refuting $3$-CNFs in Resolution and algebraic systems. We prove that every Polynomial Calculus with Resolution refutation of a random $3$-CNF $φ$ in $n$ variables requires, with high probability, $Ω(n)$ distinct monomials to be kept simultaneously in memory. The same construction also proves that every Resolution refutation of $φ$ requires, with high probability… ▽ More

    Submitted 2 April, 2015; v1 submitted 5 March, 2015; originally announced March 2015.

    Comments: arXiv admin note: substantial text overlap with arXiv:1411.1619

  39. arXiv:1405.1486  [pdf, other]

    cs.IR cs.SI physics.soc-ph

    Events and Controversies: Influences of a Shocking News Event on Information Seeking

    Authors: Danai Koutra, Paul Bennett, Eric Horvitz

    Abstract: It has been suggested that online search and retrieval contributes to the intellectual isolation of users within their preexisting ideologies, where people's prior views are strengthened and alternative viewpoints are infrequently encountered. This so-called "filter bubble" phenomenon has been called out as especially detrimental when it comes to dialog among people on controversial, emotionally c… ▽ More

    Submitted 6 May, 2014; originally announced May 2014.

  40. arXiv:1209.6570  [pdf, ps, other]

    math.CO cs.DM

    A greedy algorithm for finding a large 2-matching on a random cubic graph

    Authors: Deepak Bal, Patrick Bennett, Tom Bohman, Alan Frieze

    Abstract: A 2-matching of a graph $G$ is a spanning subgraph with maximum degree two. The size of a 2-matching $U$ is the number of edges in $U$ and this is at least $n-\kappa(U)$ where $n$ is the number of vertices of $G$ and $\kappa$ denotes the number of components. In this paper, we analyze the performance of a greedy algorithm \textsc{2greedy} for finding a large 2-matching on a random 3-regular graph. We prov… ▽ More

    Submitted 28 September, 2012; originally announced September 2012.

    Comments: 23pp

  41. arXiv:1108.5397  [pdf, ps, other]

    stat.ML cs.LG q-bio.QM

    Prediction of peptide bonding affinity: kernel methods for nonlinear modeling

    Authors: Charles Bergeron, Theresa Hepburn, C. Matthew Sundling, Michael Krein, Bill Katt, Nagamani Sukumar, Curt M. Breneman, Kristin P. Bennett

    Abstract: This paper presents regression models obtained from a process of blind prediction of peptide binding affinity from provided descriptors for several distinct datasets as part of the 2006 Comparative Evaluation of Prediction Algorithms (COEPRA) contest. This paper finds that kernel partial least squares, a nonlinear partial least squares (PLS) algorithm, outperforms PLS, and that the incorporation o… ▽ More

    Submitted 26 August, 2011; originally announced August 2011.