-
ConvoCache: Smart Re-Use of Chatbot Responses
Authors:
Conor Atkins,
Ian Wood,
Mohamed Ali Kaafar,
Hassan Asghar,
Nardine Basta,
Michal Kepkowski
Abstract:
We present ConvoCache, a conversational caching system that solves the problem of slow and expensive generative AI models in spoken chatbots. ConvoCache finds a semantically similar prompt in the past and reuses the response. In this paper we evaluate ConvoCache on the DailyDialog dataset. We find that ConvoCache can apply a UniEval coherence threshold of 90% and respond to 89% of prompts using th…
▽ More
We present ConvoCache, a conversational caching system that solves the problem of slow and expensive generative AI models in spoken chatbots. ConvoCache finds a semantically similar prompt in the past and reuses the response. In this paper we evaluate ConvoCache on the DailyDialog dataset. We find that ConvoCache can apply a UniEval coherence threshold of 90% and respond to 89% of prompts using the cache with an average latency of 214ms, replacing LLM and voice synthesis that can take over 1s. To further reduce latency we test prefetching and find limited usefulness. Prefetching with 80% of a request leads to a 63% hit rate, and a drop in overall coherence. ConvoCache can be used with any chatbot to reduce costs by reducing usage of generative AI by up to 89%.
△ Less
Submitted 26 June, 2024;
originally announced June 2024.
-
SenTopX: Benchmark for User Sentiment on Various Topics
Authors:
Hina Qayyum,
Muhammad Ikram,
Benjamin Zhao,
Ian Wood,
Mohamad Ali Kaafar,
Nicolas Kourtellis
Abstract:
Toxic sentiment analysis on Twitter (X) often focuses on specific topics and events such as politics and elections. Datasets of toxic users in such research are typically gathered through lexicon-based techniques, providing only a cross-sectional view. his approach has a tight confine for studying toxic user behavior and effective platform moderation. To identify users consistently spreading toxic…
▽ More
Toxic sentiment analysis on Twitter (X) often focuses on specific topics and events such as politics and elections. Datasets of toxic users in such research are typically gathered through lexicon-based techniques, providing only a cross-sectional view. his approach has a tight confine for studying toxic user behavior and effective platform moderation. To identify users consistently spreading toxicity, a longitudinal analysis of their tweets is essential. However, such datasets currently do not exist.
This study addresses this gap by collecting a longitudinal dataset from 143K Twitter users, covering the period from 2007 to 2021, amounting to a total of 293 million tweets. Using topic modeling, we extract all topics discussed by each user and categorize users into eight groups based on the predominant topic in their timelines. We then analyze the sentiments of each group using 16 toxic scores. Our research demonstrates that examining users longitudinally reveals a distinct perspective on their comprehensive personality traits and their overall impact on the platform. Our comprehensive dataset is accessible to researchers for additional analysis.
△ Less
Submitted 4 June, 2024;
originally announced June 2024.
-
Exploring the Distinctive Tweeting Patterns of Toxic Twitter Users
Authors:
Hina Qayyum,
Muhammad Ikram,
Benjamin Zi Hao Zhao,
Ian D. Wood,
Nicolas Kourtellis,
Mohamed Ali Kaafar
Abstract:
In the pursuit of bolstering user safety, social media platforms deploy active moderation strategies, including content removal and user suspension. These measures target users engaged in discussions marked by hate speech or toxicity, often linked to specific keywords or hashtags. Nonetheless, the increasing prevalence of toxicity indicates that certain users adeptly circumvent these measures. Thi…
▽ More
In the pursuit of bolstering user safety, social media platforms deploy active moderation strategies, including content removal and user suspension. These measures target users engaged in discussions marked by hate speech or toxicity, often linked to specific keywords or hashtags. Nonetheless, the increasing prevalence of toxicity indicates that certain users adeptly circumvent these measures. This study examines consistently toxic users on Twitter (rebranded as X) Rather than relying on traditional methods based on specific topics or hashtags, we employ a novel approach based on patterns of toxic tweets, yielding deeper insights into their behavior. We analyzed 38 million tweets from the timelines of 12,148 Twitter users and identified the top 1,457 users who consistently exhibit toxic behavior, relying on metrics like the Gini index and Toxicity score. By comparing their posting patterns to those of non-consistently toxic users, we have uncovered distinctive temporal patterns, including contiguous activity spans, inter-tweet intervals (referred to as 'Burstiness'), and churn analysis. These findings provide strong evidence for the existence of a unique tweeting pattern associated with toxic behavior on Twitter. Crucially, our methodology transcends Twitter and can be adapted to various social media platforms, facilitating the identification of consistently toxic users based on their posting behavior. This research contributes to ongoing efforts to combat online toxicity and offers insights for refining moderation strategies in the digital realm. We are committed to open research and will provide our code and data to the research community.
△ Less
Submitted 25 January, 2024;
originally announced January 2024.
-
Challenges with Passwordless FIDO2 in an Enterprise Setting: A Usability Study
Authors:
Michal Kepkowski,
Maciej Machulak,
Ian Wood,
Dali Kaafar
Abstract:
Fast Identity Online 2 (FIDO2), a modern authentication protocol, is gaining popularity as a default strong authentication mechanism. It has been recognized as a leading candidate to overcome limitations (e.g., it is phishing resistant) of existing authentication solutions. However, the task of deprecating weak methods such as password-based authentication is not trivial and requires a comprehensi…
▽ More
Fast Identity Online 2 (FIDO2), a modern authentication protocol, is gaining popularity as a default strong authentication mechanism. It has been recognized as a leading candidate to overcome limitations (e.g., it is phishing resistant) of existing authentication solutions. However, the task of deprecating weak methods such as password-based authentication is not trivial and requires a comprehensive approach. While security, privacy, and end-user usability of FIDO2 have been addressed in both academic and industry literature, the difficulties associated with its integration with production environments, such as solution completeness or edge-case support, have received little attention. In particular, complex environments such as enterprise identity management pose unique challenges for any authentication system. In this paper, we identify challenging enterprise identity lifecycle use cases (e.g., remote workforce and legacy systems) by conducting a usability study, in which 118 professionals shared their perception of challenges to FIDO2 integration from their hands-on field experience. Our analysis of the user study results revealed serious gaps such as account recovery (selected by over 60% of our respondents), and identify priority development areas for the FIDO2 community.
△ Less
Submitted 13 September, 2023; v1 submitted 15 August, 2023;
originally announced August 2023.
-
An analysis of scam baiting calls: Identifying and extracting scam stages and scripts
Authors:
Ian Wood,
Michal Kepkowski,
Leron Zinatullin,
Travis Darnley,
Mohamed Ali Kaafar
Abstract:
Phone scams remain a difficult problem to tackle due to the combination of protocol limitations, legal enforcement challenges and advances in technology enabling attackers to hide their identities and reduce costs. Scammers use social engineering techniques to manipulate victims into revealing their personal details, purchasing online vouchers or transferring funds, causing significant financial l…
▽ More
Phone scams remain a difficult problem to tackle due to the combination of protocol limitations, legal enforcement challenges and advances in technology enabling attackers to hide their identities and reduce costs. Scammers use social engineering techniques to manipulate victims into revealing their personal details, purchasing online vouchers or transferring funds, causing significant financial losses. This paper aims to establish a methodology with which to semi-automatically analyze scam calls and infer information about scammers, their scams and their strategies at scale. Obtaining data for the study of scam calls is challenging, as true scam victims do not in general record their conversations. Instead, we draw from the community of ``scam baiters'' on YouTube: individuals who interact knowingly with phone scammers and publicly publish their conversations. These can not be considered as true scam calls, however they do provide a valuable opportunity to study scammer scripts and techniques, as the scammers are unaware that they are not speaking to a true scam victim for the bulk of the call. We applied topic and time series modeling alongside emotion recognition to scammer utterances and found clear evidence of scripted scam progressions that matched our expectations from close reading. We identified social engineering techniques associated with identified script stages including the apparent use of emotion as a social engineering tool. Our analyses provide new insights into strategies used by scammers and presents an effective methodology to infer such at scale. This work serves as a first step in building a better understanding of phone scam techniques, forming the ground work for more effective detection and prevention mechanisms that draw on a deeper understanding of the phone scam phenomenon.
△ Less
Submitted 4 July, 2023;
originally announced July 2023.
-
Those Aren't Your Memories, They're Somebody Else's: Seeding Misinformation in Chat Bot Memories
Authors:
Conor Atkins,
Benjamin Zi Hao Zhao,
Hassan Jameel Asghar,
Ian Wood,
Mohamed Ali Kaafar
Abstract:
One of the new developments in chit-chat bots is a long-term memory mechanism that remembers information from past conversations for increasing engagement and consistency of responses. The bot is designed to extract knowledge of personal nature from their conversation partner, e.g., stating preference for a particular color. In this paper, we show that this memory mechanism can result in unintende…
▽ More
One of the new developments in chit-chat bots is a long-term memory mechanism that remembers information from past conversations for increasing engagement and consistency of responses. The bot is designed to extract knowledge of personal nature from their conversation partner, e.g., stating preference for a particular color. In this paper, we show that this memory mechanism can result in unintended behavior. In particular, we found that one can combine a personal statement with an informative statement that would lead the bot to remember the informative statement alongside personal knowledge in its long term memory. This means that the bot can be tricked into remembering misinformation which it would regurgitate as statements of fact when recalling information relevant to the topic of conversation. We demonstrate this vulnerability on the BlenderBot 2 framework implemented on the ParlAI platform and provide examples on the more recent and significantly larger BlenderBot 3 model. We generate 150 examples of misinformation, of which 114 (76%) were remembered by BlenderBot 2 when combined with a personal statement. We further assessed the risk of this misinformation being recalled after intervening innocuous conversation and in response to multiple questions relevant to the injected memory. Our evaluation was performed on both the memory-only and the combination of memory and internet search modes of BlenderBot 2. From the combinations of these variables, we generated 12,890 conversations and analyzed recalled misinformation in the responses. We found that when the chat bot is questioned on the misinformation topic, it was 328% more likely to respond with the misinformation as fact when the misinformation was in the long-term memory.
△ Less
Submitted 6 April, 2023;
originally announced April 2023.
-
A longitudinal study of the top 1% toxic Twitter profiles
Authors:
Hina Qayyum,
Benjamin Zi Hao Zhao,
Ian D. Wood,
Muhammad Ikram,
Mohamed Ali Kaafar,
Nicolas Kourtellis
Abstract:
Toxicity is endemic to online social networks including Twitter. It follows a Pareto like distribution where most of the toxicity is generated by a very small number of profiles and as such, analyzing and characterizing these toxic profiles is critical. Prior research has largely focused on sporadic, event centric toxic content to characterize toxicity on the platform. Instead, we approach the pro…
▽ More
Toxicity is endemic to online social networks including Twitter. It follows a Pareto like distribution where most of the toxicity is generated by a very small number of profiles and as such, analyzing and characterizing these toxic profiles is critical. Prior research has largely focused on sporadic, event centric toxic content to characterize toxicity on the platform. Instead, we approach the problem of characterizing toxic content from a profile centric point of view. We study 143K Twitter profiles and focus on the behavior of the top 1 percent producers of toxic content on Twitter, based on toxicity scores of their tweets availed by Perspective API. With a total of 293M tweets, spanning 16 years of activity, the longitudinal data allow us to reconstruct the timelines of all profiles involved. We use these timelines to gauge the behavior of the most toxic Twitter profiles compared to the rest of the Twitter population. We study the pattern of tweet posting from highly toxic accounts, based on the frequency and how prolific they are, the nature of hashtags and URLs, profile metadata, and Botometer scores. We find that the highly toxic profiles post coherent and well articulated content, their tweets keep to a narrow theme with lower diversity in hashtags, URLs, and domains, they are thematically similar to each other, and have a high likelihood of bot like behavior, likely to have progenitors with intentions to influence, based on high fake followers score. Our work contributes insight into the top 1 percent of toxic profiles on Twitter and establishes the profile centric approach to investigate toxicity on Twitter to be beneficial.
△ Less
Submitted 25 March, 2023;
originally announced March 2023.
-
Unintended Memorization and Timing Attacks in Named Entity Recognition Models
Authors:
Rana Salal Ali,
Benjamin Zi Hao Zhao,
Hassan Jameel Asghar,
Tham Nguyen,
Ian David Wood,
Dali Kaafar
Abstract:
Named entity recognition models (NER), are widely used for identifying named entities (e.g., individuals, locations, and other information) in text documents. Machine learning based NER models are increasingly being applied in privacy-sensitive applications that need automatic and scalable identification of sensitive information to redact text for data sharing. In this paper, we study the setting…
▽ More
Named entity recognition models (NER), are widely used for identifying named entities (e.g., individuals, locations, and other information) in text documents. Machine learning based NER models are increasingly being applied in privacy-sensitive applications that need automatic and scalable identification of sensitive information to redact text for data sharing. In this paper, we study the setting when NER models are available as a black-box service for identifying sensitive information in user documents and show that these models are vulnerable to membership inference on their training datasets. With updated pre-trained NER models from spaCy, we demonstrate two distinct membership attacks on these models. Our first attack capitalizes on unintended memorization in the NER's underlying neural network, a phenomenon NNs are known to be vulnerable to. Our second attack leverages a timing side-channel to target NER models that maintain vocabularies constructed from the training data. We show that different functional paths of words within the training dataset in contrast to words not previously seen have measurable differences in execution time. Revealing membership status of training samples has clear privacy implications, e.g., in text redaction, sensitive words or phrases to be found and removed, are at risk of being detected in the training dataset. Our experimental evaluation includes the redaction of both password and health data, presenting both security risks and privacy/regulatory issues. This is exacerbated by results that show memorization with only a single phrase. We achieved 70% AUC in our first attack on a text redaction use-case. We also show overwhelming success in the timing attack with 99.23% AUC. Finally we discuss potential mitigation approaches to realize the safe use of NER models in light of the privacy and security implications of membership inference attacks.
△ Less
Submitted 3 November, 2022;
originally announced November 2022.
-
How Not to Handle Keys: Timing Attacks on FIDO Authenticator Privacy
Authors:
Michal Kepkowski,
Lucjan Hanzlik,
Ian Wood,
Mohamed Ali Kaafar
Abstract:
This paper presents a timing attack on the FIDO2 (Fast IDentity Online) authentication protocol that allows attackers to link user accounts stored in vulnerable authenticators, a serious privacy concern. FIDO2 is a new standard specified by the FIDO industry alliance for secure token online authentication. It complements the W3C WebAuthn specification by providing means to use a USB token or other…
▽ More
This paper presents a timing attack on the FIDO2 (Fast IDentity Online) authentication protocol that allows attackers to link user accounts stored in vulnerable authenticators, a serious privacy concern. FIDO2 is a new standard specified by the FIDO industry alliance for secure token online authentication. It complements the W3C WebAuthn specification by providing means to use a USB token or other authenticator as a second factor during the authentication process. From a cryptographic perspective, the protocol is a simple challenge-response where the elliptic curve digital signature algorithm is used to sign challenges. To protect the privacy of the user the token uses unique key pairs per service. To accommodate for small memory, tokens use various techniques that make use of a special parameter called a key handle sent by the service to the token. We identify and analyse a vulnerability in the way the processing of key handles is implemented that allows attackers to remotely link user accounts on multiple services. We show that for vulnerable authenticators there is a difference between the time it takes to process a key handle for a different service but correct authenticator, and for a different authenticator but correct service. This difference can be used to perform a timing attack allowing an adversary to link user's accounts across services. We present several real world examples of adversaries that are in a position to execute our attack and can benefit from linking accounts. We found that two of the eight hardware authenticators we tested were vulnerable despite FIDO level 1 certification. This vulnerability cannot be easily mitigated on authenticators because, for security reasons, they usually do not allow firmware updates. In addition, we show that due to the way existing browsers implement the WebAuthn standard, the attack can be executed remotely.
△ Less
Submitted 16 May, 2022;
originally announced May 2022.
-
A deep dive into the consistently toxic 1% of Twitter
Authors:
Hina Qayyum,
Benjamin Zi Hao Zhao,
Ian D. Wood,
Muhammad Ikram,
Mohamed Ali Kaafar,
Nicolas Kourtellis
Abstract:
Misbehavior in online social networks (OSN) is an ever-growing phenomenon. The research to date tends to focus on the deployment of machine learning to identify and classify types of misbehavior such as bullying, aggression, and racism to name a few. The main goal of identification is to curb natural and mechanical misconduct and make OSNs a safer place for social discourse. Going beyond past work…
▽ More
Misbehavior in online social networks (OSN) is an ever-growing phenomenon. The research to date tends to focus on the deployment of machine learning to identify and classify types of misbehavior such as bullying, aggression, and racism to name a few. The main goal of identification is to curb natural and mechanical misconduct and make OSNs a safer place for social discourse. Going beyond past works, we perform a longitudinal study of a large selection of Twitter profiles, which enables us to characterize profiles in terms of how consistently they post highly toxic content. Our data spans 14 years of tweets from 122K Twitter profiles and more than 293M tweets. From this data, we selected the most extreme profiles in terms of consistency of toxic content and examined their tweet texts, and the domains, hashtags, and URLs they shared. We found that these selected profiles keep to a narrow theme with lower diversity in hashtags, URLs, and domains, they are thematically similar to each other (in a coordinated manner, if not through intent), and have a high likelihood of bot-like behavior (likely to have progenitors with intentions to influence). Our work contributes a substantial and longitudinal online misbehavior dataset to the research community and establishes the consistency of a profile's toxic behavior as a useful factor when exploring misbehavior as potential accessories to influence operations on OSNs.
△ Less
Submitted 15 February, 2022;
originally announced February 2022.
-
Small Cohort of Epilepsy Patients Showed Increased Activity on Facebook before Sudden Unexpected Death
Authors:
Ian B. Wood,
Rion Brattig Correia,
Wendy R. Miller,
Luis M. Rocha
Abstract:
Sudden Unexpected Death in Epilepsy (SUDEP) remains a leading cause of death in people with epilepsy. Despite the constant risk for patients and bereavement to family members, to date the physiological mechanisms of SUDEP remain unknown. Here we explore the potential to identify putative predictive signals of SUDEP from online digital behavioral data using text and sentiment analysis. Specifically…
▽ More
Sudden Unexpected Death in Epilepsy (SUDEP) remains a leading cause of death in people with epilepsy. Despite the constant risk for patients and bereavement to family members, to date the physiological mechanisms of SUDEP remain unknown. Here we explore the potential to identify putative predictive signals of SUDEP from online digital behavioral data using text and sentiment analysis. Specifically, we analyze Facebook timelines of six epilepsy patients deceased due to SUDEP, donated by surviving family members. We find preliminary evidence for behavioral changes detectable by text and sentiment analysis tools. Namely, in the months preceding their SUDEP event patient social media timelines show: i) increase in verbosity; ii) increased use of functional words; and iii) sentiment shifts as measured by different sentiment analysis tools. Combined, these results suggest that social media engagement, as well as its sentiment, may serve as possible early-warning signals for SUDEP in people with epilepsy. While the small sample of patient timelines analyzed in this study prevents generalization, our preliminary investigation demonstrates the potential of social media data as complementary data in larger studies of SUDEP and epilepsy.
△ Less
Submitted 19 January, 2022;
originally announced January 2022.
-
ECOL-R: Encouraging Copying in Novel Object Captioning with Reinforcement Learning
Authors:
Yufei Wang,
Ian D. Wood,
Stephen Wan,
Mark Johnson
Abstract:
Novel Object Captioning is a zero-shot Image Captioning task requiring describing objects not seen in the training captions, but for which information is available from external object detectors. The key challenge is to select and describe all salient detected novel objects in the input images. In this paper, we focus on this challenge and propose the ECOL-R model (Encouraging Copying of Object La…
▽ More
Novel Object Captioning is a zero-shot Image Captioning task requiring describing objects not seen in the training captions, but for which information is available from external object detectors. The key challenge is to select and describe all salient detected novel objects in the input images. In this paper, we focus on this challenge and propose the ECOL-R model (Encouraging Copying of Object Labels with Reinforced Learning), a copy-augmented transformer model that is encouraged to accurately describe the novel object labels. This is achieved via a specialised reward function in the SCST reinforcement learning framework (Rennie et al., 2017) that encourages novel object mentions while maintaining the caption quality. We further restrict the SCST training to the images where detected objects are mentioned in reference captions to train the ECOL-R model. We additionally improve our copy mechanism via Abstract Labels, which transfer knowledge from known to novel object types, and a Morphological Selector, which determines the appropriate inflected forms of novel object labels. The resulting model sets new state-of-the-art on the nocaps (Agrawal et al., 2019) and held-out COCO (Hendricks et al., 2016) benchmarks.
△ Less
Submitted 24 January, 2021;
originally announced January 2021.
-
Mining social media data for biomedical signals and health-related behavior
Authors:
Rion Brattig Correia,
Ian B. Wood,
Johan Bollen,
Luis M. Rocha
Abstract:
Social media data has been increasingly used to study biomedical and health-related phenomena. From cohort level discussions of a condition to planetary level analyses of sentiment, social media has provided scientists with unprecedented amounts of data to study human behavior and response associated with a variety of health conditions and medical treatments. Here we review recent work in mining s…
▽ More
Social media data has been increasingly used to study biomedical and health-related phenomena. From cohort level discussions of a condition to planetary level analyses of sentiment, social media has provided scientists with unprecedented amounts of data to study human behavior and response associated with a variety of health conditions and medical treatments. Here we review recent work in mining social media for biomedical, epidemiological, and social phenomena information relevant to the multilevel complexity of human health. We pay particular attention to topics where social media data analysis has shown the most progress, including pharmacovigilance, sentiment analysis especially for mental health, and other areas. We also discuss a variety of innovative uses of social media data for health-related applications and important limitations in social media data access and use.
△ Less
Submitted 28 January, 2020;
originally announced January 2020.
-
Global labor flow network reveals the hierarchical organization and dynamics of geo-industrial clusters in the world economy
Authors:
Jaehyuk Park,
Ian Wood,
Elise Jing,
Azadeh Nematzadeh,
Souvik Ghosh,
Michael Conover,
Yong-Yeol Ahn
Abstract:
Groups of firms often achieve a competitive advantage through the formation of geo-industrial clusters. Although many exemplary clusters, such as Hollywood or Silicon Valley, have been frequently studied, systematic approaches to identify and analyze the hierarchical structure of the geo-industrial clusters at the global scale are rare. In this work, we use LinkedIn's employment histories of more…
▽ More
Groups of firms often achieve a competitive advantage through the formation of geo-industrial clusters. Although many exemplary clusters, such as Hollywood or Silicon Valley, have been frequently studied, systematic approaches to identify and analyze the hierarchical structure of the geo-industrial clusters at the global scale are rare. In this work, we use LinkedIn's employment histories of more than 500 million users over 25 years to construct a labor flow network of over 4 million firms across the world and apply a recursive network community detection algorithm to reveal the hierarchical structure of geo-industrial clusters. We show that the resulting geo-industrial clusters exhibit a stronger association between the influx of educated-workers and financial performance, compared to existing aggregation units. Furthermore, our additional analysis of the skill sets of educated-workers supplements the relationship between the labor flow of educated-workers and productivity growth. We argue that geo-industrial clusters defined by labor flow provide better insights into the growth and the decline of the economy than other common economic units.
△ Less
Submitted 19 March, 2019; v1 submitted 12 February, 2019;
originally announced February 2019.
-
Human Sexual Cycles are Driven by Culture and Match Collective Moods
Authors:
Ian B. Wood,
Pedro Leal Varela,
Johan Bollen,
Luis M. Rocha,
Joana Gonçalves-Sá
Abstract:
It is a long-standing question whether human sexual and reproductive cycles are affected predominantly by biology or culture. The literature is mixed with respect to whether biological or cultural factors best explain the reproduction cycle phenomenon, with biological explanations dominating the argument. The biological hypothesis proposes that human reproductive cycles are an adaptation to the se…
▽ More
It is a long-standing question whether human sexual and reproductive cycles are affected predominantly by biology or culture. The literature is mixed with respect to whether biological or cultural factors best explain the reproduction cycle phenomenon, with biological explanations dominating the argument. The biological hypothesis proposes that human reproductive cycles are an adaptation to the seasonal cycles caused by hemisphere positioning, while the cultural hypothesis proposes that conception dates vary mostly due to cultural factors, such as vacation schedule or religious holidays. However, for many countries, common records used to investigate these hypotheses are incomplete or unavailable, biasing existing analysis towards primarily Christian countries in the Northern Hemisphere. Here we show that interest in sex peaks sharply online during major cultural and religious celebrations, regardless of hemisphere location. This online interest, when shifted by nine months, corresponds to documented human birth cycles, even after adjusting for numerous factors such as language, season, and amount of free time due to holidays. We further show that mood, measured independently on Twitter, contains distinct collective emotions associated with those cultural celebrations, and these collective moods correlate with sex search volume outside of these holidays as well. Our results provide converging evidence that the cyclic sexual and reproductive behavior of human populations is mostly driven by culture and that this interest in sex is associated with specific emotions, characteristic of, but not limited to, major cultural and religious celebrations.
△ Less
Submitted 27 October, 2017; v1 submitted 12 July, 2017;
originally announced July 2017.
-
Element-centric clustering comparison unifies overlaps and hierarchy
Authors:
Alexander J. Gates,
Ian B. Wood,
William P. Hetrick,
Yong-Yeol Ahn
Abstract:
Clustering is one of the most universal approaches for understanding complex data. A pivotal aspect of clustering analysis is quantitatively comparing clusterings; clustering comparison is the basis for many tasks such as clustering evaluation, consensus clustering, and tracking the temporal evolution of clusters. In particular, the extrinsic evaluation of clustering methods requires comparing the…
▽ More
Clustering is one of the most universal approaches for understanding complex data. A pivotal aspect of clustering analysis is quantitatively comparing clusterings; clustering comparison is the basis for many tasks such as clustering evaluation, consensus clustering, and tracking the temporal evolution of clusters. In particular, the extrinsic evaluation of clustering methods requires comparing the uncovered clusterings to planted clusterings or known metadata. Yet, as we demonstrate, existing clustering comparison measures have critical biases which undermine their usefulness, and no measure accommodates both overlapping and hierarchical clusterings. Here we unify the comparison of disjoint, overlapping, and hierarchically structured clusterings by proposing a new element-centric framework: elements are compared based on the relationships induced by the cluster structure, as opposed to the traditional cluster-centric philosophy. We demonstrate that, in contrast to standard clustering similarity measures, our framework does not suffer from critical biases and naturally provides unique insights into how the clusterings differ. We illustrate the strengths of our framework by revealing new insights into the organization of clusters in two applications: the improved classification of schizophrenia based on the overlapping and hierarchical community structure of fMRI brain networks, and the disentanglement of various social homophily factors in Facebook social networks. The universality of clustering suggests far-reaching impact of our framework throughout all areas of science.
△ Less
Submitted 12 June, 2019; v1 submitted 19 June, 2017;
originally announced June 2017.
-
Designing a minimalist socially aware robotic agent for the home
Authors:
Matthew R. Francisco,
Ian Wood,
Selma Šabanović,
Luis M. Rocha
Abstract:
We present a minimalist social robot that relies on long timeseries of low resolution data such as mechanical vibration, temperature, lighting, sounds and collisions. Our goal is to develop an experimental system for growing socially situated robotic agents whose behavioral repertoire is subsumed by the social order of the space. To get there we are designing robots that use their simple sensors a…
▽ More
We present a minimalist social robot that relies on long timeseries of low resolution data such as mechanical vibration, temperature, lighting, sounds and collisions. Our goal is to develop an experimental system for growing socially situated robotic agents whose behavioral repertoire is subsumed by the social order of the space. To get there we are designing robots that use their simple sensors and motion feedback routines to recognize different classes of human activity and then associate to each class a range of appropriate behaviors. We use the Katie Family of robots, built on the iRobot Create platform, an Arduino Uno, and a Raspberry Pi. We describe its sensor abilities and exploratory tests that allow us to develop hypotheses about what objects (sensor data) correspond to something known and observable by a human subject. We use machine learning methods to classify three social scenarios from over a hundred experiments, demonstrating that it is possible to detect social situations with high accuracy, using the low-resolution sensors from our minimalist robot.
△ Less
Submitted 26 June, 2014;
originally announced June 2014.
-
(Non-)Equivalence of Universal Priors
Authors:
Ian Wood,
Peter Sunehag,
Marcus Hutter
Abstract:
Ray Solomonoff invented the notion of universal induction featuring an aptly termed "universal" prior probability function over all possible computable environments. The essential property of this prior was its ability to dominate all other such priors. Later, Levin introduced another construction --- a mixture of all possible priors or `universal mixture'. These priors are well known to be equiva…
▽ More
Ray Solomonoff invented the notion of universal induction featuring an aptly termed "universal" prior probability function over all possible computable environments. The essential property of this prior was its ability to dominate all other such priors. Later, Levin introduced another construction --- a mixture of all possible priors or `universal mixture'. These priors are well known to be equivalent up to multiplicative constants. Here, we seek to clarify further the relationships between these three characterisations of a universal prior (Solomonoff's, universal mixtures, and universally dominant priors). We see that the the constructions of Solomonoff and Levin define an identical class of priors, while the class of universally dominant priors is strictly larger. We provide some characterisation of the discrepancy.
△ Less
Submitted 16 November, 2011;
originally announced November 2011.