-
Non-detectable patterns hidden within sequences of bits
Authors:
David Allen,
Jose J La Luz,
Guarionex Salivia,
Jonathan Hardwick
Abstract:
In this paper we construct families of bit sequences using combinatorial methods. Each sequence is derived by converting a collection of numbers encoding certain combinatorial numerics from objects exhibiting symmetry in various dimensions. Using the algorithms first described in [1], we show that the NIST testing suite described in publication 800-22 does not detect these symmetries hidden within these sequences.
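The abstract does not name which SP 800-22 tests were run. As a minimal sketch of what one of them checks, the frequency (monobit) test below follows the published statistic, p = erfc(|S_n| / sqrt(2n)), where S_n sums the bits mapped to ±1. The function name `monobit_p_value` is ours. A perfectly periodic sequence such as 0101… passes it with p = 1.0, which shows how a test that only measures the balance of ones and zeros can miss obvious structure. This is not the authors' construction or test harness.

```python
import math


def monobit_p_value(bits: str) -> float:
    """Frequency (monobit) test from NIST SP 800-22 for a string of '0'/'1'."""
    if not bits or set(bits) - {"0", "1"}:
        raise ValueError("expected a non-empty string of '0' and '1'")
    n = len(bits)
    # Map each 1 to +1 and each 0 to -1, then sum: S_n.
    s_n = sum(1 if b == "1" else -1 for b in bits)
    # Normalised statistic s_obs = |S_n| / sqrt(n).
    s_obs = abs(s_n) / math.sqrt(n)
    # p-value = erfc(s_obs / sqrt(2)); p >= 0.01 is conventionally a "pass".
    return math.erfc(s_obs / math.sqrt(2))


print(monobit_p_value("01" * 50))         # 1.0: perfectly balanced, so it passes
print(monobit_p_value("1" * 100) < 0.01)  # True: all ones fails
```

NIST recommends sequences of at least 100 bits for this test; shorter inputs still produce a number but the normal approximation behind it is weak.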
Submitted 6 May, 2024;
originally announced May 2024.
-
Volatility and irregularity Capturing in stock price indices using time series Generative adversarial networks (TimeGAN)
Authors:
Leonard Mushunje,
David Allen,
Shelton Peiris
Abstract:
This paper captures irregularities in financial time series data, particularly stock prices, in the presence of the COVID-19 shock. We conjecture that the pandemic shock embedded jumps and irregularities in stock data, producing irregular trends in the time series. We argue that efficient and robust forecasting methods are needed to predict stock closing prices in the presence of such a shock; these forecasts help investors manage confidence risk and improve returns. Time-series generative adversarial networks offer new ways of modeling and learning a suitable distribution for financial time series data under complex setups. Traditional models, by contrast, are liable to produce high forecasting errors and are not robust enough to capture dependency structures and other stylized facts, such as volatility, in stock markets. We use the TimeGAN model to address this risk of poor forecasts. Using the DAX stock index from January 2010 to November 2022, we trained LSTM, GRU, and WGAN benchmark models alongside TimeGAN and compared their forecasting errors; TimeGAN outperformed all the benchmarks, achieving the smallest forecasting error.
Submitted 21 November, 2023;
originally announced November 2023.
-
Leveraging Large Language Models in Conversational Recommender Systems
Authors:
Luke Friedman,
Sameer Ahuja,
David Allen,
Zhenning Tan,
Hakim Sidahmed,
Changbo Long,
Jun Xie,
Gabriel Schubiner,
Ajay Patel,
Harsh Lara,
Brian Chu,
Zexi Chen,
Manoj Tiwari
Abstract:
A Conversational Recommender System (CRS) offers increased transparency and control to users by enabling them to engage with the system through a real-time multi-turn dialogue. Recently, Large Language Models (LLMs) have exhibited an unprecedented ability to converse naturally and incorporate world knowledge and common-sense reasoning into language understanding, unlocking the potential of this paradigm. However, effectively leveraging LLMs within a CRS introduces new technical challenges, including properly understanding and controlling a complex conversation and retrieving from external sources of information. These issues are exacerbated by a large, evolving item corpus and a lack of conversational data for training. In this paper, we provide a roadmap for building an end-to-end large-scale CRS using LLMs. In particular, we propose new implementations for user preference understanding, flexible dialogue management and explainable recommendations as part of an integrated architecture powered by LLMs. For improved personalization, we describe how an LLM can consume interpretable natural language user profiles and use them to modulate session-level context. To overcome conversational data limitations in the absence of an existing production CRS, we propose techniques for building a controllable LLM-based user simulator to generate synthetic conversations. As a proof of concept we introduce RecLLM, a large-scale CRS for YouTube videos built on LaMDA, and demonstrate its fluency and diverse functionality through some illustrative example conversations.
Submitted 16 May, 2023; v1 submitted 13 May, 2023;
originally announced May 2023.
-
A deep learning approach to using wearable seismocardiography (SCG) for diagnosing aortic valve stenosis and predicting aortic hemodynamics obtained by 4D flow MRI
Authors:
Mahmoud E. Khani,
Ethan M. I. Johnson,
Aparna Sodhi,
Joshua Robinson,
Cynthia K. Rigsby,
Bradly D. Allen,
Michael Markl
Abstract:
In this paper, we explored the use of deep learning to predict aortic flow metrics, as measured by 4D flow MRI, from signals recorded with wearable seismocardiography (SCG) devices. 4D flow MRI provides a comprehensive assessment of cardiovascular hemodynamics, but it is costly and time-consuming. We hypothesized that deep learning could identify pathological changes in blood flow from SCG signals, such as elevated peak systolic velocity (Vmax) in patients with heart valve diseases. We also investigated the ability of this deep learning technique to differentiate between patients diagnosed with aortic valve stenosis (AS), non-AS patients with a bicuspid aortic valve (BAV), non-AS patients with a mechanical aortic valve (MAV), and healthy subjects with a normal tricuspid aortic valve (TAV). In a study of 77 subjects who underwent same-day 4D flow MRI and SCG, we found that the Vmax values obtained using deep learning and SCG were in good agreement with those obtained by 4D flow MRI. Additionally, subjects with TAV, BAV, MAV, and AS could be classified with ROC-AUC values of 92%, 95%, 81%, and 83%, respectively. This suggests that SCG obtained using low-cost wearable electronics may be used as a supplement to 4D flow MRI exams or as a screening tool for aortic valve disease.
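The per-class ROC-AUC values quoted above summarise how well a score separates one class from the others. The abstract does not state how they were computed; assuming a one-vs-rest setup, ROC-AUC equals the probability that a randomly chosen positive subject scores higher than a randomly chosen negative one, with ties counting half. The sketch below computes it directly from that definition; `roc_auc` and the example scores are hypothetical, not data from the study.

```python
def roc_auc(pos_scores: list[float], neg_scores: list[float]) -> float:
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    correct = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                correct += 1.0   # positive ranked above negative
            elif p == q:
                correct += 0.5   # ties count as half
    return correct / (len(pos_scores) * len(neg_scores))


# Hypothetical one-vs-rest scores for a single class (e.g. "AS" vs. all others).
print(roc_auc([0.9, 0.8], [0.1, 0.85]))  # 0.75
```

The pairwise loop is O(n·m); for large cohorts a rank-based computation after sorting gives the same value faster.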
Submitted 5 January, 2023;
originally announced January 2023.
-
A space-indexed formulation of packing boxes into a larger box
Authors:
Sam D. Allen,
Edmund K. Burke,
Jakub Marecek
Abstract:
Current integer programming solvers fail to decide whether 12 unit cubes can be packed into a 1x1x11 box within an hour using the natural relaxation of Chen/Padberg. We present an alternative relaxation of the problem of packing boxes into a larger box, which makes it possible to solve much larger instances.
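The abstract does not spell out the alternative formulation. One way to read "space-indexed" (our assumption) is a binary variable x[i, p] for every box i and every grid position p at which its lower corner may sit, with two constraint families: each box is placed exactly once, and each unit cell of the container is covered at most once. Summing all cell constraints shows that total box volume cannot exceed container volume even for fractional x, so under this reading the linear relaxation alone rejects 12 unit cubes in a 1x1x11 box. The sketch assumes integer dimensions and no rotations; all names are hypothetical.

```python
from itertools import product
from math import prod

Dims = tuple[int, int, int]


def cells(origin: Dims, box: Dims) -> list[Dims]:
    """Unit cells covered by `box` placed with its lower corner at `origin`."""
    return [tuple(o + d for o, d in zip(origin, delta))
            for delta in product(*(range(s) for s in box))]


def space_indexed_model(container: Dims, boxes: list[Dims]):
    """Enumerate variables x[i, p] and, for each cell, the variables covering it."""
    variables = []   # (box index, origin) pairs, one binary variable each
    cover = {}       # cell -> indices of variables whose box covers that cell
    for i, box in enumerate(boxes):
        origins = product(*(range(c - s + 1) for c, s in zip(container, box)))
        for origin in origins:
            v = len(variables)
            variables.append((i, origin))
            for cell in cells(origin, box):
                cover.setdefault(cell, []).append(v)
    return variables, cover


def relaxation_infeasible(container: Dims, boxes: list[Dims]) -> bool:
    """True if the LP relaxation is infeasible by the aggregated volume argument."""
    # Summing "cell covered at most once" over all cells and substituting
    # "each box placed exactly once" gives sum(volumes) <= container volume.
    # (A box too large to fit anywhere is also infeasible; not checked here.)
    return sum(prod(b) for b in boxes) > prod(container)


variables, cover = space_indexed_model((1, 1, 11), [(1, 1, 1)] * 12)
print(len(variables), len(cover))                            # 132 11
print(relaxation_infeasible((1, 1, 11), [(1, 1, 1)] * 12))   # True
```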
Submitted 2 January, 2021;
originally announced January 2021.
-
Search and Rescue under the Forest Canopy using Multiple UAVs
Authors:
Yulun Tian,
Katherine Liu,
Kyel Ok,
Loc Tran,
Danette Allen,
Nicholas Roy,
Jonathan P. How
Abstract:
We present a multi-robot system for GPS-denied search and rescue under the forest canopy. Forests are particularly challenging environments for collaborative exploration and mapping, in large part due to the existence of severe perceptual aliasing which hinders reliable loop closure detection for mutual localization and map fusion. Our proposed system features unmanned aerial vehicles (UAVs) that perform onboard sensing, estimation, and planning. When communication is available, each UAV transmits compressed tree-based submaps to a central ground station for collaborative simultaneous localization and mapping (CSLAM). To overcome high measurement noise and perceptual aliasing, we use the local configuration of a group of trees as a distinctive feature for robust loop closure detection. Furthermore, we propose a novel procedure based on cycle consistent multiway matching to recover from incorrect pairwise data associations. The returned global data association is guaranteed to be cycle consistent, and is shown to improve both precision and recall compared to the input pairwise associations. The proposed multi-UAV system is validated both in simulation and during real-world collaborative exploration missions at NASA Langley Research Center.
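Cycle consistency, as used above, means that composing pairwise data associations around any loop of submaps returns each landmark to itself. The check below is a minimal sketch under a simplifying assumption: every pair of submaps is matched by a full permutation of the same n landmarks (real associations are partial). The paper's multiway matching procedure, which repairs inconsistent associations, is not shown; function names and the data layout are hypothetical.

```python
def compose(first: list[int], second: list[int]) -> list[int]:
    """Apply `first` then `second`: landmark i -> second[first[i]]."""
    return [second[j] for j in first]


def is_cycle_consistent(matches: dict[tuple[str, str], list[int]],
                        cycle: list[str]) -> bool:
    """Check that chaining matches around `cycle` (closed back to its start) is the identity.

    matches[(a, b)][i] is the index of the landmark in submap b that is
    associated with landmark i of submap a.
    """
    n = len(matches[(cycle[0], cycle[1])])
    mapping = list(range(n))
    for a, b in zip(cycle, cycle[1:] + cycle[:1]):
        mapping = compose(mapping, matches[(a, b)])
    return mapping == list(range(n))


matches = {("A", "B"): [1, 0, 2], ("B", "C"): [2, 0, 1], ("C", "A"): [0, 2, 1]}
print(is_cycle_consistent(matches, ["A", "B", "C"]))  # True
matches[("C", "A")] = [0, 1, 2]  # one wrong pairwise association
print(is_cycle_consistent(matches, ["A", "B", "C"]))  # False
```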
Submitted 7 June, 2020; v1 submitted 28 August, 2019;
originally announced August 2019.
-
Summary of the First Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE1)
Authors:
Daniel S. Katz,
Sou-Cheng T. Choi,
Hilmar Lapp,
Ketan Maheshwari,
Frank Löffler,
Matthew Turk,
Marcus D. Hanwell,
Nancy Wilkins-Diehr,
James Hetherington,
James Howison,
Shel Swenson,
Gabrielle D. Allen,
Anne C. Elster,
Bruce Berriman,
Colin Venters
Abstract:
Challenges related to development, deployment, and maintenance of reusable software for science are becoming a growing concern. Many scientists' research increasingly depends on the quality and availability of software upon which their works are built. To highlight some of these issues and share experiences, the First Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE1) was held in November 2013 in conjunction with the SC13 Conference. The workshop featured keynote presentations and a large number (54) of solicited extended abstracts that were grouped into three themes and presented via panels. A set of collaborative notes of the presentations and discussion was taken during the workshop.
Unique perspectives were captured about issues such as comprehensive documentation, development and deployment practices, software licenses and career paths for developers. Attribution systems that account for evidence of software contribution and impact were also discussed. These include mechanisms such as Digital Object Identifiers, publication of "software papers", and the use of online systems, for example source code repositories like GitHub.
This paper summarizes the issues and shared experiences that were discussed, including cross-cutting issues and use cases. It joins a nascent literature seeking to understand what drives software work in science, and how it is impacted by the reward systems of science. These incentives can determine the extent to which developers are motivated to build software for the long-term, for the use of others, and whether to work collaboratively or separately. It also explores community building, leadership, and dynamics in relation to successful scientific software.
Submitted 12 June, 2014; v1 submitted 29 April, 2014;
originally announced April 2014.
-
Geotagging One Hundred Million Twitter Accounts with Total Variation Minimization
Authors:
Ryan Compton,
David Jurgens,
David Allen
Abstract:
Geographically annotated social media is extremely valuable for modern information retrieval. However, when researchers can only access publicly-visible data, one quickly finds that social media users rarely publish location information. In this work, we provide a method which can geolocate the overwhelming majority of active Twitter users, independent of their location sharing preferences, using only publicly-visible Twitter data.
Our method infers an unknown user's location by examining their friends' locations. We frame the geotagging problem as an optimization over a social network with a total variation-based objective and provide a scalable and distributed algorithm for its solution. Furthermore, we show how a robust estimate of the geographic dispersion of each user's ego network can be used as a per-user accuracy measure which is effective at removing outlying errors.
Leave-many-out evaluation shows that our method is able to infer location for 101,846,236 Twitter users at a median error of 6.38 km, allowing us to geotag over 80% of public tweets.
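The distributed total variation solver is not reproduced here. A much simpler related building block, assumed for illustration: with friends' locations held fixed, the location that minimises the summed distance to them (the single-user piece of a total-variation objective) is their geometric median, which Weiszfeld's iteration approximates. The dispersion of friends around that estimate (median distance here) then serves as a crude per-user confidence score. The sketch treats (lat, lon) as planar coordinates, which is only reasonable over small areas; all names are hypothetical.

```python
import math
import statistics

Point = tuple[float, float]


def geometric_median(points: list[Point], max_iter: int = 1000,
                     tol: float = 1e-9) -> Point:
    """Weiszfeld iteration for the point minimising total distance to `points`."""
    if not points:
        raise ValueError("need at least one friend location")
    # Start from the centroid.
    x = sum(p[0] for p in points) / len(points)
    y = sum(p[1] for p in points) / len(points)
    for _ in range(max_iter):
        wx = wy = wsum = 0.0
        for px, py in points:
            d = math.hypot(px - x, py - y)
            if d < 1e-12:
                # Simplification: stop if the iterate lands exactly on a data point.
                return (px, py)
            wx += px / d
            wy += py / d
            wsum += 1.0 / d
        nx, ny = wx / wsum, wy / wsum   # inverse-distance-weighted average
        if math.hypot(nx - x, ny - y) < tol:
            return (nx, ny)
        x, y = nx, ny
    return (x, y)


def estimate_location(friends: list[Point]) -> tuple[Point, float]:
    """Return (estimated location, median distance of friends from it)."""
    est = geometric_median(friends)
    dispersion = statistics.median(math.hypot(fx - est[0], fy - est[1])
                                   for fx, fy in friends)
    return est, dispersion


est, spread = estimate_location([(0.0, 0.0), (0.0, 10.0), (10.0, 0.0), (10.0, 10.0)])
print(est, spread)
```

A user whose friends are widely scattered gets a large dispersion and could be discarded, which mirrors the outlier filtering the abstract describes.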
Submitted 3 March, 2015; v1 submitted 28 April, 2014;
originally announced April 2014.
-
New Advances in Inference by Recursive Conditioning
Authors:
David Allen,
Adnan Darwiche
Abstract:
Recursive Conditioning (RC) was introduced recently as the first any-space algorithm for inference in Bayesian networks which can trade time for space by varying the size of its cache at the increment needed to store a floating point number. Under full caching, RC has an asymptotic time and space complexity which is comparable to mainstream algorithms based on variable elimination and clustering (exponential in the network treewidth and linear in its size). We show two main results about RC in this paper. First, we show that its actual space requirements under full caching are much more modest than those needed by mainstream methods and study the implications of this finding. Second, we show that RC can effectively deal with determinism in Bayesian networks by employing standard logical techniques, such as unit resolution, allowing a significant reduction in its time requirements in certain cases. We illustrate our results using a number of benchmark networks, including the very challenging ones that arise in genetic linkage analysis.
Submitted 19 October, 2012;
originally announced December 2012.
-
Exploiting Evidence in Probabilistic Inference
Authors:
Mark Chavira,
David Allen,
Adnan Darwiche
Abstract:
We define the notion of compiling a Bayesian network with evidence and provide a specific approach for evidence-based compilation, which makes use of logical processing. The approach is practical and advantageous in a number of application areas (including maximum likelihood estimation, sensitivity analysis, and MAP computations), and we provide specific empirical results in the domain of genetic linkage analysis. We also show that the approach is applicable to networks that do not contain determinism, and that it empirically subsumes the performance of the quickscore algorithm when applied to noisy-or networks.
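For context on the quickscore baseline: in a two-layer noisy-or network (independent disease priors p_d, link probabilities q_df, and, as assumed here, no leak term), the probability that a set of findings is absent factorises over diseases, and positive findings F+ are handled by inclusion-exclusion over their subsets S: P(F+, F-) = Σ_S (-1)^|S| Π_d [(1 - p_d) + p_d Π_{f ∈ S ∪ F-} (1 - q_df)]. The sketch implements that formula and checks it against brute-force enumeration on a tiny hypothetical network. It is the baseline only, not the paper's evidence-based compilation; names and numbers are illustrative.

```python
from itertools import combinations, product
from math import prod


def quickscore(priors, links, positives, negatives):
    """P(all `positives` present, all `negatives` absent) in a noisy-or network.

    priors[d]   : P(disease d present)
    links[d][f] : P(disease d alone causes finding f); no leak term.
    """
    def p_absent(findings):
        # P(every finding in `findings` absent) factorises over diseases.
        return prod((1 - p) + p * prod(1 - links[d][f] for f in findings)
                    for d, p in enumerate(priors))

    total = 0.0
    # Inclusion-exclusion over subsets S of the positive findings.
    for k in range(len(positives) + 1):
        for subset in combinations(positives, k):
            total += (-1) ** k * p_absent(set(subset) | set(negatives))
    return total


def brute_force(priors, links, positives, negatives):
    """Same probability by enumerating every disease configuration."""
    total = 0.0
    for state in product([0, 1], repeat=len(priors)):
        weight = prod(p if s else 1 - p for p, s in zip(priors, state))
        for f in positives:
            weight *= 1 - prod(1 - links[d][f] for d, s in enumerate(state) if s)
        for f in negatives:
            weight *= prod(1 - links[d][f] for d, s in enumerate(state) if s)
        total += weight
    return total


priors = [0.1, 0.3, 0.05]
links = [[0.8, 0.2, 0.0], [0.5, 0.0, 0.9], [0.0, 0.7, 0.4]]
print(abs(quickscore(priors, links, [0, 2], [1])
          - brute_force(priors, links, [0, 2], [1])) < 1e-12)  # True
```

The cost is exponential in the number of positive findings (2^|F+| terms) but only linear in the number of diseases, which is why quickscore suits evidence with few positive findings.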
Submitted 4 July, 2012;
originally announced July 2012.
-
Surrogate Parenthood: Protected and Informative Graphs
Authors:
Barbara Blaustein,
Adriane Chapman,
Len Seligman,
M. David Allen,
Arnon Rosenthal
Abstract:
Many applications, including provenance and some analyses of social networks, require path-based queries over graph-structured data. When these graphs contain sensitive information, paths may be broken, resulting in uninformative query results. This paper presents innovative techniques that give users more informative graph query results; the techniques leverage a common industry practice of providing what we call surrogates: alternate, less sensitive versions of nodes and edges releasable to a broader community. We describe techniques for interposing surrogate nodes and edges to protect sensitive graph components, while maximizing graph connectivity and giving users as much information as possible. In this work, we formalize the problem of creating a protected account G' of a graph G. We provide a utility measure to compare the informativeness of alternate protected accounts and an opacity measure for protected accounts, which indicates the likelihood that an attacker can recreate the topology of the original graph from the protected account. We provide an algorithm to create a maximally useful protected account of a sensitive graph, and show through evaluation with the PLUS prototype that using surrogates and protected accounts adds value for the user, with no significant impact on the time required to generate results for graph queries.
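A minimal sketch of why surrogates keep query results informative, under our own simplifications: the graph is a directed provenance graph given as an edge list, a sensitive node is either simply removed or replaced by a surrogate node that inherits its edges, and informativeness is approximated by whether a source-to-target path survives. The paper's utility and opacity measures are more refined and are not reproduced; node and function names are hypothetical.

```python
from collections import deque
from typing import Optional

Edge = tuple[str, str]


def reachable(edges: list[Edge], start: str) -> set[str]:
    """Nodes reachable from `start` by following directed edges (BFS)."""
    adj: dict[str, list[str]] = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def protect(edges: list[Edge], sensitive: str,
            surrogate: Optional[str] = None) -> list[Edge]:
    """Hide `sensitive`: drop its edges, or reroute them through `surrogate`."""
    if surrogate is None:
        return [(a, b) for a, b in edges if sensitive not in (a, b)]

    def swap(node: str) -> str:
        return surrogate if node == sensitive else node

    return [(swap(a), swap(b)) for a, b in edges]


graph = [("sensor_data", "classified_model"), ("classified_model", "public_report")]
removed = protect(graph, "classified_model")
surrogated = protect(graph, "classified_model", "analysis_step")
print("public_report" in reachable(removed, "sensor_data"))     # False: path broken
print("public_report" in reachable(surrogated, "sensor_data"))  # True: path kept
```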
Submitted 17 June, 2011;
originally announced June 2011.
-
The Role of Schema Matching in Large Enterprises
Authors:
Ken Smith,
Michael Morse,
Peter Mork,
Maya Li,
Arnon Rosenthal,
David Allen,
Len Seligman,
Chris Wolf
Abstract:
To date, the principal use case for schema matching research has been as a precursor for code generation, i.e., constructing mappings between schema elements with the end goal of data transfer. In this paper, we argue that schema matching plays valuable roles independent of mapping construction, especially as schemata grow to industrial scales. Specifically, in large enterprises human decision makers and planners are often the immediate consumer of information derived from schema matchers, instead of schema mapping tools. We list a set of real application areas illustrating this role for schema matching, and then present our experiences tackling a customer problem in one of these areas. We describe the matcher used, where the tool was effective, where it fell short, and our lessons learned about how well current schema matching technology is suited for use in large enterprises. Finally, we suggest a new agenda for schema matching research based on these experiences.
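To make "information derived from schema matchers" concrete, here is a deliberately naive, hypothetical name-based matcher: it normalises element names and scores candidate pairs with Python's `difflib.SequenceMatcher`, returning the best candidate and its score for each source element, or `None` below a threshold. Production matchers, including the one the paper describes (not reproduced here), combine many more signals. The point is only that the output is a scored suggestion a human can review, not an executable mapping.

```python
import difflib
import re
from typing import Optional


def normalize(name: str) -> str:
    """Lowercase and drop separators so 'Cust_Name' and 'custname' compare equal."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def best_matches(source: list[str], target: list[str],
                 threshold: float = 0.6) -> dict[str, Optional[tuple[str, float]]]:
    """For each source element, the most similar target element and its score."""
    result: dict[str, Optional[tuple[str, float]]] = {}
    for s in source:
        score, best = max(
            (difflib.SequenceMatcher(None, normalize(s), normalize(cand)).ratio(), cand)
            for cand in target
        )
        # Below the threshold we report "no confident match" for human review.
        result[s] = (best, score) if score >= threshold else None
    return result


print(best_matches(["cust_name", "OrderID", "zzz"],
                   ["CustomerName", "order_id", "ship_date"]))
```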
Submitted 9 September, 2009;
originally announced September 2009.
-
A Recommender System to Support the Scholarly Communication Process
Authors:
Marko A. Rodriguez,
David W. Allen,
Joshua Shinavier,
Gary Ebersole
Abstract:
The number of researchers, articles, journals, conferences, funding opportunities, and other such scholarly resources continues to grow every year and at an increasing rate. Many services have emerged to support scholars in navigating particular aspects of this resource-rich environment. Some commercial publishers provide recommender and alert services for the articles and journals in their digital libraries. Similarly, numerous noncommercial social bookmarking services have emerged for citation sharing. While these services do provide some support, they lack an understanding of the various problem-solving scenarios that researchers face daily. Example scenarios, to name a few, include when a scholar is in search of an article related to another article of interest, when a scholar is in search of a potential collaborator for a funding opportunity, when a scholar is in search of an optimal venue to which to submit their article, and when a scholar, in the role of an editor, is in search of referees to review an article. All of these example scenarios can be represented as a problem in information filtering by means of context-sensitive recommendation. This article presents an overview of a context-sensitive recommender system to support the scholarly communication process that is based on the standards and technology set forth by the Semantic Web initiative.
Submitted 11 May, 2009;
originally announced May 2009.
-
Processing and Characterization of Precision Microparts from Nickel-based Materials
Authors:
D. Allen,
H. J. Almond,
K. Bedner,
M. Cabezza,
B. Courtot,
A. Duval,
S. A. Impey,
M. Saumer
Abstract:
The objective of this research was to study the influence of electroplating parameters on electrodeposit characteristics for the production of nickel (Ni) and nickel-iron (Ni-Fe) microparts by photoelectroforming. The research focused on the most relevant parameter for industry, the current density, because it determines the process time and the energy consumed. The results of the Ni and Ni-Fe characterisations fall into two closely linked aspects: morphology and hardness.
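The link between current density and process time follows from Faraday's law of electrolysis. As a back-of-the-envelope sketch for pure nickel (Ni²⁺, two electrons per atom), the thickness growth rate is v = ε·J·M / (n·F·ρ), where J is current density, M the molar mass, ρ the density, F the Faraday constant and ε the cathode current efficiency. The default efficiency of 1.0, the function names, and the neglect of Ni-Fe alloy composition are our assumptions, not parameters from the study.

```python
FARADAY = 96485.332   # C/mol
M_NI = 58.693         # g/mol, molar mass of nickel
RHO_NI = 8.908        # g/cm^3, density of nickel
N_ELECTRONS = 2       # Ni2+ + 2e- -> Ni


def ni_rate_um_per_h(current_density_a_per_dm2: float, efficiency: float = 1.0) -> float:
    """Nickel thickness growth rate (micrometres per hour) from Faraday's law."""
    j = current_density_a_per_dm2 / 100.0   # A/dm^2 -> A/cm^2
    rate_cm_per_s = efficiency * j * M_NI / (N_ELECTRONS * FARADAY * RHO_NI)
    return rate_cm_per_s * 1e4 * 3600.0     # cm/s -> um/h


def hours_to_thickness(thickness_um: float, current_density_a_per_dm2: float,
                       efficiency: float = 1.0) -> float:
    """Plating time needed to reach `thickness_um` at a constant current density."""
    return thickness_um / ni_rate_um_per_h(current_density_a_per_dm2, efficiency)


print(round(ni_rate_um_per_h(1.0), 1))         # 12.3 (um/h at 1 A/dm^2)
print(round(hours_to_thickness(100, 2.0), 1))  # 4.1 (hours for 100 um at 2 A/dm^2)
```

Doubling the current density halves the plating time in this idealised model, but it also changes deposit morphology and hardness, which is the trade-off the study characterises.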
Submitted 7 May, 2008;
originally announced May 2008.
-
The Effects of Additives on the Physical Properties of Electroformed Nickel and on the Stretch of Photoelectroformed Nickel Components
Authors:
D. Allen,
N. Duclos,
I. Garbutt,
M. Saumer,
Ch. Dhum,
M. Schmitt,
J. E. Hoffmann
Abstract:
The process of nickel electroforming is becoming increasingly important in the manufacture of MST products, as it has the potential to replicate complex geometries with extremely high fidelity. Electroforming of nickel uses multi-component electrolyte formulations in order to maximise desirable product properties. In addition to nickel sulphamate (the major electrolyte component), formulation additives can also comprise nickel chloride (to increase nickel anode dissolution), sulphamic acid (to control pH), boric acid (to act as a pH buffer), hardening/levelling agents (to increase deposit hardness and lustre) and wetting agents (to aid surface wetting and thus prevent gas bubbles and void formation). This paper investigates the effects of some of these variables on internal stress and stretch as a function of applied current density.
Submitted 21 November, 2007;
originally announced November 2007.
-
Policy for access: Framing the question
Authors:
David Allen
Abstract:
Five years after the '96 Telecommunications Act, we still find precious little local facilities-based competition. In response there are calls in Congress and even from the FCC for new legislation to "free the Bells." However, the same ideology drove policy, not just five years ago, but also almost twenty years back with the first modern push for "freedom," namely divestiture.
How might we frame the question of policy for local access to engender a more fruitful approach? The starting point for this analysis is the network--not bits and bytes, but the human network. With the human network as starting point, the unit of analysis is the community--specifically, the individual in a tension with community. There are two core ideas.
The first takes a behavioral approach to the economics--and the relative share between beneficial chaos and order, in economic affairs, becomes explicit.
If the first core idea provides a conceptual base for open source, the second distinguishes open source from open design, i.e., at the information 'frontier' we push forward.
The resulting policy frame for access is worked out in the detailed, concrete steps of an extended thought experiment. A small town setting (Concord, Massachusetts) grounds the discussion in the real world. The purpose overall is to stimulate new thinking which may break out of the conundrum where periodic rounds to legislate 'freedom' produce the opposite, recursively. The ultimate aim is better fit between our analytically-driven expectations and economic outcomes.
Submitted 20 October, 2001; v1 submitted 24 September, 2001;
originally announced September 2001.