Search | arXiv e-print repository

Non-detectable patterns hidden within sequences of bits

Authors: David Allen, Jose J La Luz, Guarionex Salivia, Jonathan Hardwick

Abstract: In this paper we construct families of bit sequences using combinatorial methods. Each sequence is derived by converting a collection of numbers encoding certain combinatorial numerics from objects exhibiting symmetry in various dimensions. Using the algorithms first described in [1] we show that the NIST testing suite described in publication 800-22 does not detect these symmetries hidden within these sequences.

Submitted 6 May, 2024; originally announced May 2024.

arXiv:2311.12987 [pdf]

Volatility and irregularity Capturing in stock price indices using time series Generative adversarial networks (TimeGAN)

Authors: Leonard Mushunje, David Allen, Shelton Peiris

Abstract: This paper captures irregularities in financial time series data, particularly stock prices, in the presence of COVID-19 shock. We conjectured that jumps and irregularities are embedded in stock data due to the pandemic shock, which brings forth irregular trends in the time series data. We put forward that efficient and robust forecasting methods are needed to predict stock closing prices in the presence of the pandemic shock. This piece of information is helpful to investors as far as confidence risk and return boost are concerned. Generative adversarial networks of a time series nature are used to provide new ways of modeling and learning the proper and suitable distribution for the financial time series data under complex setups. Ideally, these traditional models are liable to producing high forecasting errors, and they need to be more robust to capture dependency structures and other stylized facts like volatility in stock markets. The TimeGAN model is used, effectively dealing with this risk of poor forecasts. Using the DAX stock index from January 2010 to November 2022, we trained the LSTM, GRU, WGAN, and TimeGAN models as benchmarks and forecasting errors were noted, and our TimeGAN outperformed them all as indicated by a small forecasting error.

Submitted 21 November, 2023; originally announced November 2023.

Comments: 36 pages

arXiv:2305.07961 [pdf, other]

Leveraging Large Language Models in Conversational Recommender Systems

Authors: Luke Friedman, Sameer Ahuja, David Allen, Zhenning Tan, Hakim Sidahmed, Changbo Long, Jun Xie, Gabriel Schubiner, Ajay Patel, Harsh Lara, Brian Chu, Zexi Chen, Manoj Tiwari

Abstract: A Conversational Recommender System (CRS) offers increased transparency and control to users by enabling them to engage with the system through a real-time multi-turn dialogue. Recently, Large Language Models (LLMs) have exhibited an unprecedented ability to converse naturally and incorporate world knowledge and common-sense reasoning into language understanding, unlocking the potential of this paradigm. However, effectively leveraging LLMs within a CRS introduces new technical challenges, including properly understanding and controlling a complex conversation and retrieving from external sources of information. These issues are exacerbated by a large, evolving item corpus and a lack of conversational data for training. In this paper, we provide a roadmap for building an end-to-end large-scale CRS using LLMs. In particular, we propose new implementations for user preference understanding, flexible dialogue management and explainable recommendations as part of an integrated architecture powered by LLMs. For improved personalization, we describe how an LLM can consume interpretable natural language user profiles and use them to modulate session-level context. To overcome conversational data limitations in the absence of an existing production CRS, we propose techniques for building a controllable LLM-based user simulator to generate synthetic conversations. As a proof of concept we introduce RecLLM, a large-scale CRS for YouTube videos built on LaMDA, and demonstrate its fluency and diverse functionality through some illustrative example conversations.

Submitted 16 May, 2023; v1 submitted 13 May, 2023; originally announced May 2023.

arXiv:2301.02130 [pdf]

A deep learning approach to using wearable seismocardiography (SCG) for diagnosing aortic valve stenosis and predicting aortic hemodynamics obtained by 4D flow MRI

Authors: Mahmoud E. Khani, Ethan M. I. Johnson, Aparna Sodhi, Joshua Robinson, Cynthia K. Rigsby, Bradly D. Allen, Michael Markl

Abstract: In this paper, we explored the use of deep learning for the prediction of aortic flow metrics obtained using 4D flow MRI using wearable seismocardiography (SCG) devices. 4D flow MRI provides a comprehensive assessment of cardiovascular hemodynamics, but it is costly and time-consuming. We hypothesized that deep learning could be used to identify pathological changes in blood flow, such as elevated peak systolic velocity Vmax in patients with heart valve diseases, from SCG signals. We also investigated the ability of this deep learning technique to differentiate between patients diagnosed with aortic valve stenosis (AS), non-AS patients with a bicuspid aortic valve (BAV), non-AS patients with a mechanical aortic valve (MAV), and healthy subjects with a normal tricuspid aortic valve (TAV). In a study of 77 subjects who underwent same-day 4D flow MRI and SCG, we found that the Vmax values obtained using deep learning and SCGs were in good agreement with those obtained by 4D flow MRI. Additionally, subjects with TAV, BAV, MAV, and AS could be classified with ROC-AUC values of 92%, 95%, 81%, and 83%, respectively. This suggests that SCG obtained using low-cost wearable electronics may be used as a supplement to 4D flow MRI exams or as a screening tool for aortic valve disease.

Submitted 5 January, 2023; originally announced January 2023.

Comments: 16 pages, 4 figures

arXiv:2101.00441 [pdf, other]

doi 10.1016/j.orl.2011.10.008

A space-indexed formulation of packing boxes into a larger box

Authors: Sam D. Allen, Edmund K. Burke, Jakub Marecek

Abstract: Current integer programming solvers fail to decide whether 12 unit cubes can be packed into a 1x1x11 box within an hour using the natural relaxation of Chen/Padberg. We present an alternative relaxation of the problem of packing boxes into a larger box, which makes it possible to solve much larger instances.

Submitted 2 January, 2021; originally announced January 2021.

Comments: arXiv admin note: substantial text overlap with arXiv:1412.2526

Journal ref: Operations Research Letters, Volume 40, Issue 1, January 2012, Pages 20-24

arXiv:1908.10541 [pdf, other]

Search and Rescue under the Forest Canopy using Multiple UAVs

Authors: Yulun Tian, Katherine Liu, Kyel Ok, Loc Tran, Danette Allen, Nicholas Roy, Jonathan P. How

Abstract: We present a multi-robot system for GPS-denied search and rescue under the forest canopy. Forests are particularly challenging environments for collaborative exploration and mapping, in large part due to the existence of severe perceptual aliasing which hinders reliable loop closure detection for mutual localization and map fusion. Our proposed system features unmanned aerial vehicles (UAVs) that perform onboard sensing, estimation, and planning. When communication is available, each UAV transmits compressed tree-based submaps to a central ground station for collaborative simultaneous localization and mapping (CSLAM). To overcome high measurement noise and perceptual aliasing, we use the local configuration of a group of trees as a distinctive feature for robust loop closure detection. Furthermore, we propose a novel procedure based on cycle consistent multiway matching to recover from incorrect pairwise data associations. The returned global data association is guaranteed to be cycle consistent, and is shown to improve both precision and recall compared to the input pairwise associations. The proposed multi-UAV system is validated both in simulation and during real-world collaborative exploration missions at NASA Langley Research Center.

Submitted 7 June, 2020; v1 submitted 28 August, 2019; originally announced August 2019.

Comments: IJRR revision

arXiv:1404.7414 [pdf, ps, other]

doi 10.5334/jors.an

Summary of the First Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE1)

Authors: Daniel S. Katz, Sou-Cheng T. Choi, Hilmar Lapp, Ketan Maheshwari, Frank Löffler, Matthew Turk, Marcus D. Hanwell, Nancy Wilkins-Diehr, James Hetherington, James Howison, Shel Swenson, Gabrielle D. Allen, Anne C. Elster, Bruce Berriman, Colin Venters

Abstract: Challenges related to development, deployment, and maintenance of reusable software for science are becoming a growing concern. Many scientists' research increasingly depends on the quality and availability of software upon which their works are built. To highlight some of these issues and share experiences, the First Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE1) was held in November 2013 in conjunction with the SC13 Conference. The workshop featured keynote presentations and a large number (54) of solicited extended abstracts that were grouped into three themes and presented via panels. A set of collaborative notes of the presentations and discussion was taken during the workshop. Unique perspectives were captured about issues such as comprehensive documentation, development and deployment practices, software licenses and career paths for developers. Attribution systems that account for evidence of software contribution and impact were also discussed. These include mechanisms such as Digital Object Identifiers, publication of "software papers", and the use of online systems, for example source code repositories like GitHub. This paper summarizes the issues and shared experiences that were discussed, including cross-cutting issues and use cases. It joins a nascent literature seeking to understand what drives software work in science, and how it is impacted by the reward systems of science.
These incentives can determine the extent to which developers are motivated to build software for the long-term, for the use of others, and whether to work collaboratively or separately. It also explores community building, leadership, and dynamics in relation to successful scientific software.

Submitted 12 June, 2014; v1 submitted 29 April, 2014; originally announced April 2014.

Comments: Journal of Open Research Software, 2014

arXiv:1404.7152 [pdf, other]

doi 10.1109/BigData.2014.7004256

Geotagging One Hundred Million Twitter Accounts with Total Variation Minimization

Authors: Ryan Compton, David Jurgens, David Allen

Abstract: Geographically annotated social media is extremely valuable for modern information retrieval. However, when researchers can only access publicly-visible data, one quickly finds that social media users rarely publish location information. In this work, we provide a method which can geolocate the overwhelming majority of active Twitter users, independent of their location sharing preferences, using only publicly-visible Twitter data. Our method infers an unknown user's location by examining their friend's locations. We frame the geotagging problem as an optimization over a social network with a total variation-based objective and provide a scalable and distributed algorithm for its solution. Furthermore, we show how a robust estimate of the geographic dispersion of each user's ego network can be used as a per-user accuracy measure which is effective at removing outlying errors. Leave-many-out evaluation shows that our method is able to infer location for 101,846,236 Twitter users at a median error of 6.38 km, allowing us to geotag over 80% of public tweets.

Submitted 3 March, 2015; v1 submitted 28 April, 2014; originally announced April 2014.

Comments: 9 pages, 8 figures, accepted to IEEE BigData 2014, Compton, Ryan, David Jurgens, and David Allen. "Geotagging one hundred million twitter accounts with total variation minimization." Big Data (Big Data), 2014 IEEE International Conference on. IEEE, 2014

MSC Class: 68T99 ACM Class: G.1.6; H.2.8; H.3.4

arXiv:1212.2455 [pdf]

New Advances in Inference by Recursive Conditioning

Authors: David Allen, Adnan Darwiche

Abstract: Recursive Conditioning (RC) was introduced recently as the first any-space algorithm for inference in Bayesian networks which can trade time for space by varying the size of its cache at the increment needed to store a floating point number. Under full caching, RC has an asymptotic time and space complexity which is comparable to mainstream algorithms based on variable elimination and clustering (exponential in the network treewidth and linear in its size). We show two main results about RC in this paper. First, we show that its actual space requirements under full caching are much more modest than those needed by mainstream methods and study the implications of this finding. Second, we show that RC can effectively deal with determinism in Bayesian networks by employing standard logical techniques, such as unit resolution, allowing a significant reduction in its time requirements in certain cases. We illustrate our results using a number of benchmark networks, including the very challenging ones that arise in genetic linkage analysis.

Submitted 19 October, 2012; originally announced December 2012.

Comments: Appears in Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence (UAI2003)

Report number: UAI-P-2003-PG-2-10

arXiv:1207.1372 [pdf]

Exploiting Evidence in Probabilistic Inference

Authors: Mark Chavira, David Allen, Adnan Darwiche

Abstract: We define the notion of compiling a Bayesian network with evidence and provide a specific approach for evidence-based compilation, which makes use of logical processing. The approach is practical and advantageous in a number of application areas-including maximum likelihood estimation, sensitivity analysis, and MAP computations-and we provide specific empirical results in the domain of genetic linkage analysis. We also show that the approach is applicable for networks that do not contain determinism, and show that it empirically subsumes the performance of the quickscore algorithm when applied to noisy-or networks.

Submitted 4 July, 2012; originally announced July 2012.

Comments: Appears in Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence (UAI2005)

Report number: UAI-P-2005-PG-112-119

arXiv:1106.3508 [pdf]

Surrogate Parenthood: Protected and Informative Graphs

Authors: Barbara Blaustein, Adriane Chapman, Len Seligman, M. David Allen, Arnon Rosenthal

Abstract: Many applications, including provenance and some analyses of social networks, require path-based queries over graph-structured data. When these graphs contain sensitive information, paths may be broken, resulting in uninformative query results. This paper presents innovative techniques that give users more informative graph query results; the techniques leverage a common industry practice of providing what we call surrogates: alternate, less sensitive versions of nodes and edges releasable to a broader community. We describe techniques for interposing surrogate nodes and edges to protect sensitive graph components, while maximizing graph connectivity and giving users as much information as possible. In this work, we formalize the problem of creating a protected account G' of a graph G. We provide a utility measure to compare the informativeness of alternate protected accounts and an opacity measure for protected accounts, which indicates the likelihood that an attacker can recreate the topology of the original graph from the protected account. We provide an algorithm to create a maximally useful protected account of a sensitive graph, and show through evaluation with the PLUS prototype that using surrogates and protected accounts adds value for the user, with no significant impact on the time required to generate results for graph queries.

Submitted 17 June, 2011; originally announced June 2011.

Comments: VLDB2011

arXiv:0909.1771 [pdf]

The Role of Schema Matching in Large Enterprises

Authors: Ken Smith, Michael Morse, Peter Mork, Maya Li, Arnon Rosenthal, David Allen, Len Seligman, Chris Wolf

Abstract: To date, the principal use case for schema matching research has been as a precursor for code generation, i.e., constructing mappings between schema elements with the end goal of data transfer. In this paper, we argue that schema matching plays valuable roles independent of mapping construction, especially as schemata grow to industrial scales. Specifically, in large enterprises human decision makers and planners are often the immediate consumer of information derived from schema matchers, instead of schema mapping tools. We list a set of real application areas illustrating this role for schema matching, and then present our experiences tackling a customer problem in one of these areas. We describe the matcher used, where the tool was effective, where it fell short, and our lessons learned about how well current schema matching technology is suited for use in large enterprises. Finally, we suggest a new agenda for schema matching research based on these experiences.

Submitted 9 September, 2009; originally announced September 2009.

Comments: CIDR 2009

arXiv:0905.1594 [pdf, other]

A Recommender System to Support the Scholarly Communication Process

Authors: Marko A. Rodriguez, David W. Allen, Joshua Shinavier, Gary Ebersole

Abstract: The number of researchers, articles, journals, conferences, funding opportunities, and other such scholarly resources continues to grow every year and at an increasing rate. Many services have emerged to support scholars in navigating particular aspects of this resource-rich environment. Some commercial publishers provide recommender and alert services for the articles and journals in their digital libraries. Similarly, numerous noncommercial social bookmarking services have emerged for citation sharing. While these services do provide some support, they lack an understanding of the various problem-solving scenarios that researchers face daily. Example scenarios, to name a few, include when a scholar is in search of an article related to another article of interest, when a scholar is in search of a potential collaborator for a funding opportunity, when a scholar is in search of an optimal venue to which to submit their article, and when a scholar, in the role of an editor, is in search of referees to review an article. All of these example scenarios can be represented as a problem in information filtering by means of context-sensitive recommendation. This article presents an overview of a context-sensitive recommender system to support the scholarly communication process that is based on the standards and technology set forth by the Semantic Web initiative.

Submitted 11 May, 2009; originally announced May 2009.

Report number: KRS-2009-02 ACM Class: H.3.5; H.3.7; G.2.2

arXiv:0805.0866 [pdf]

Processing and Characterization of Precision Microparts from Nickel-based Materials

Authors: D. Allen, H. J. Almond, K. Bedner, M. Cabezza, B. Courtot, A. Duval, S. A. Impey, M. Saumer

Abstract: The objective of this research was to study the influence of electroplating parameters on electrodeposit characteristics for the production of nickel (Ni) and nickel-iron (Ni-Fe) microparts by photoelectroforming. The research focused on the most relevant parameter for industry, which is the current density, because it determines the process time and the consumed energy. The results of the Ni and Ni-Fe characterisations can be divided into two aspects closely linked with each other: the morphology and the hardness.

Submitted 7 May, 2008; originally announced May 2008.

Comments: Submitted on behalf of EDA Publishing Association (http://irevues.inist.fr/handle/2042/16838)

Journal ref: In Symposium on Design, Test, Integration and Packaging of MEMS/MOEMS - DTIP 2008, Nice, France (2008)

arXiv:0711.3302 [pdf]

The Effects of Additives on the Physical Properties of Electroformed Nickel and on the Stretch of Photoelectroformed Nickel Components

Authors: D. Allen, N. Duclos, I. Garbutt, M. Saumer, Ch. Dhum, M. Schmitt, J. E. Hoffmann

Abstract: The process of nickel electroforming is becoming increasingly important in the manufacture of MST products, as it has the potential to replicate complex geometries with extremely high fidelity. Electroforming of nickel uses multi-component electrolyte formulations in order to maximise desirable product properties. In addition to nickel sulphamate (the major electrolyte component), formulation additives can also comprise nickel chloride (to increase nickel anode dissolution), sulphamic acid (to control pH), boric acid (to act as a pH buffer), hardening/levelling agents (to increase deposit hardness and lustre) and wetting agents (to aid surface wetting and thus prevent gas bubbles and void formation). This paper investigates the effects of some of these variables on internal stress and stretch as a function of applied current density.

Submitted 21 November, 2007; originally announced November 2007.

Comments: Submitted on behalf of TIMA Editions (http://irevues.inist.fr/tima-editions)

Journal ref: In Symposium on Design, Test, Integration and Packaging of MEMS/MOEMS - DTIP 2006, Stresa, Lago Maggiore, Italy (2006)

arXiv:cs/0109085 [pdf]

Policy for access: Framing the question

Authors: David Allen

Abstract: Five years after the '96 Telecommunications Act, we still find precious little local facilities-based competition. In response there are calls in Congress and even from the FCC for new legislation to "free the Bells." However, the same ideology drove policy, not just five years ago, but also almost twenty years back with the first modern push for "freedom," namely divestiture. How might we frame the question of policy for local access to engender a more fruitful approach? The starting point for this analysis is the network--not bits and bytes, but the human network. With the human network as starting point, the unit of analysis is the community--specifically, the individual in a tension with community. There are two core ideas. The first takes a behavioral approach to the economics--and the relative share between beneficial chaos and order, in economic affairs, becomes explicit. If the first main idea provides a conceptual base for open source, the second core idea distinguishes open source from open design, ie at the information 'frontier' we push forward. The resulting policy frame for access is worked out in the detailed, concrete steps of an extended thought experiment. A small town setting (Concord, Massachusetts) grounds the discussion in the real world. The purpose overall is to stimulate new thinking which may break out of the conundrum where periodic rounds to legislate 'freedom' produce the opposite, recursively. The ultimate aim is better fit between our analytically-driven expectations and economic outcomes.

Submitted 20 October, 2001; v1 submitted 24 September, 2001; originally announced September 2001.

Comments: 13 pages; abstract expanded

Report number: TPRC-2001-008 ACM Class: K.4.m Miscellaneous

Showing 1–16 of 16 results for author: Allen, D