-
Leveraging Deep Learning and Online Source Sentiment for Financial Portfolio Management
Authors:
Paraskevi Nousi,
Loukia Avramelou,
Georgios Rodinos,
Maria Tzelepi,
Theodoros Manousis,
Konstantinos Tsampazis,
Kyriakos Stefanidis,
Dimitris Spanos,
Manos Kirtas,
Pavlos Tosidis,
Avraam Tsantekidis,
Nikolaos Passalis,
Anastasios Tefas
Abstract:
Financial portfolio management describes the task of distributing funds and conducting trading operations on a set of financial assets, such as stocks, index funds, foreign exchange or cryptocurrencies, aiming to maximize the profit while minimizing the loss incurred by said operations. Deep Learning (DL) methods have been consistently excelling at various tasks, and automated financial trading is one of the most complex of these. This paper aims to provide insight into various DL methods for financial trading, under both the supervised and reinforcement learning schemes. We also take into consideration sentiment information regarding the traded assets, and discuss and demonstrate its usefulness through corresponding research studies. Finally, we discuss commonly found problems in training such financial agents and equip the reader with the necessary knowledge to avoid these problems and apply the discussed methods in practice.
Submitted 24 October, 2023; v1 submitted 23 July, 2023;
originally announced September 2023.
-
Computational Team Assembly with Fairness Constraints
Authors:
Rodrigo Borges,
Otto Sahlgrens,
Sami Koivunen,
Kostas Stefanidis,
Thomas Olsson,
Arto Laitinen
Abstract:
Team assembly is a problem that demands trade-offs between multiple fairness criteria and computational optimization. We focus on four criteria: (i) fair distribution of workloads within the team, (ii) fair distribution of skills and expertise regarding project requirements, (iii) fair distribution of protected classes in the team, and (iv) fair distribution of the team cost among protected classes. For this problem, we propose a two-stage algorithmic solution. First, a multi-objective optimization procedure is executed and the Pareto candidates that satisfy the project requirements are selected. Second, N random groups are formed containing combinations of these candidates, and a second round of multi-objective optimization is executed, but this time for selecting the groups that optimize the team-assembly criteria.
Submitted 24 June, 2023; v1 submitted 12 June, 2023;
originally announced June 2023.
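The two-stage procedure described in the abstract of "Computational Team Assembly with Fairness Constraints" can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: objectives are treated as costs to minimize, the function names (`dominates`, `pareto_front`, `two_stage`) are hypothetical, and a group is scored by the per-objective sum over its members.

```python
import random
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every objective and strictly better on one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Sequence[float]]) -> List[int]:
    """Indices of the points that no other point dominates."""
    return [
        i for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


def two_stage(candidates: List[Sequence[float]], group_size: int,
              n_groups: int, seed: int = 0) -> List[Tuple[int, ...]]:
    """Stage 1: keep Pareto-optimal candidates. Stage 2: sample random groups
    from them and keep the Pareto-optimal groups."""
    rng = random.Random(seed)
    stage1 = pareto_front(candidates)
    groups = [tuple(sorted(rng.sample(stage1, group_size))) for _ in range(n_groups)]
    n_obj = len(candidates[0])
    # Hypothetical group score: sum of each objective over the group's members.
    scores = [tuple(sum(candidates[i][k] for i in g) for k in range(n_obj)) for g in groups]
    return [groups[i] for i in pareto_front(scores)]


# Illustrative data: (workload cost, skill gap) per candidate, both minimized.
candidates = [(1.0, 5.0), (2.0, 2.0), (3.0, 3.0), (5.0, 1.0)]
front = pareto_front(candidates)          # candidate 2 is dominated by candidate 1
teams = two_stage(candidates, group_size=2, n_groups=5)
```

The same `pareto_front` filter serves both stages; only the objects being compared change from individual candidates to groups.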
-
Auditing for Spatial Fairness
Authors:
Dimitris Sacharidis,
Giorgos Giannopoulos,
George Papastefanatos,
Kostas Stefanidis
Abstract:
This paper studies algorithmic fairness when the protected attribute is location. To handle protected attributes that are continuous, such as age or income, the standard approach is to discretize the domain into predefined groups, and compare algorithmic outcomes across groups. However, applying this idea to location raises concerns of gerrymandering and may introduce statistical bias. Prior work addresses these concerns but only for regularly spaced locations, while raising other issues, most notably its inability to discern regions that are likely to exhibit spatial unfairness. Similar to established notions of algorithmic fairness, we define spatial fairness as the statistical independence of outcomes from location. This translates into requiring that for each region of space, the distribution of outcomes is identical inside and outside the region. To allow for localized discrepancies in the distribution of outcomes, we compare how well two competing hypotheses explain the observed outcomes. The null hypothesis assumes spatial fairness, while the alternate allows different distributions inside and outside regions. Their goodness of fit is then assessed by a likelihood ratio test. If there is no significant difference in how well the two hypotheses explain the observed outcomes, we conclude that the algorithm is spatially fair.
Submitted 23 February, 2023;
originally announced February 2023.
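The likelihood ratio test mentioned in the abstract of "Auditing for Spatial Fairness" can be sketched for the simplest case. The sketch assumes binary outcomes and a single, fixed region, and it uses the asymptotic chi-square distribution with one degree of freedom for the p-value; these are simplifying assumptions for illustration, not the paper's procedure, and the function names are hypothetical.

```python
import math
from typing import Tuple


def bernoulli_loglik(k: int, n: int, p: float) -> float:
    """Log-likelihood of k positives in n Bernoulli(p) trials, with 0*log(0) taken as 0."""
    ll = 0.0
    if k > 0:
        ll += k * math.log(p)
    if n - k > 0:
        ll += (n - k) * math.log(1.0 - p)
    return ll


def region_lr_test(k_in: int, n_in: int, k_out: int, n_out: int) -> Tuple[float, float]:
    """Compare one shared positive rate (null) against separate rates
    inside and outside the region (alternative)."""
    p_all = (k_in + k_out) / (n_in + n_out)
    ll_null = bernoulli_loglik(k_in, n_in, p_all) + bernoulli_loglik(k_out, n_out, p_all)
    ll_alt = (bernoulli_loglik(k_in, n_in, k_in / n_in)
              + bernoulli_loglik(k_out, n_out, k_out / n_out))
    stat = max(0.0, 2.0 * (ll_alt - ll_null))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Same positive rate inside and outside: no evidence against spatial fairness.
fair_stat, fair_p = region_lr_test(k_in=50, n_in=100, k_out=100, n_out=200)
# Much higher positive rate inside the region: strong evidence against it.
unfair_stat, unfair_p = region_lr_test(k_in=90, n_in=100, k_out=100, n_out=200)
```

When many candidate regions are scanned, the single-region p-value above no longer applies directly, because the most extreme region is selected after looking at the data.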
-
A Civil Protection Early Warning System to Improve the Resilience of Adriatic-Ionian Territories to Natural and Man-made Risk
Authors:
Agorakis Bompotas,
Christos Anagnostopoulos,
Athanasios Kalogeras,
Georgios Kalogeras,
Georgios Mylonas,
Kyriakos Stefanidis,
Christos Alexakos,
Miranda Dandoulaki
Abstract:
We are currently witnessing an increased occurrence of extreme weather events, causing a great deal of disruption and distress across the globe. In this setting, the importance and utility of Early Warning Systems is becoming increasingly obvious. In this work, we present the design of an early warning system called TransCPEarlyWarning, aimed at seven countries in the Adriatic-Ionian area in Europe. The overall objective is to increase the level of cooperation among national civil protection institutions in these countries, addressing natural and man-made risks from the early warning stage and improving the intervention capabilities of civil protection mechanisms. The system utilizes an innovative approach with a lever effect, while also aiming to support the whole system of Civil Protection.
Submitted 28 July, 2022;
originally announced July 2022.
-
Fairness in Rankings and Recommendations: An Overview
Authors:
Evaggelia Pitoura,
Kostas Stefanidis,
Georgia Koutrika
Abstract:
We increasingly depend on a variety of data-driven algorithmic systems to assist us in many aspects of life. Search engines and recommender systems, amongst others, are used as sources of information and to help us in making all sorts of decisions, from selecting restaurants and books to choosing friends and careers. This has given rise to important concerns regarding the fairness of such systems. In this work, we aim at presenting a toolkit of definitions, models and methods used for ensuring fairness in rankings and recommendations. Our objectives are three-fold: (a) to provide a solid framework on a novel, quickly evolving, and impactful domain, (b) to present related methods and put them into perspective, and (c) to highlight open challenges and research paths for future work.
Submitted 31 August, 2021; v1 submitted 13 April, 2021;
originally announced April 2021.
-
Benchmarking Blocking Algorithms for Web Entities
Authors:
Vasilis Efthymiou,
Kostas Stefanidis,
Vassilis Christophides
Abstract:
An increasing number of entities are described by interlinked data rather than documents on the Web. Entity Resolution (ER) aims to identify descriptions of the same real-world entity within one or across knowledge bases in the Web of data. To reduce the required number of pairwise comparisons among descriptions, ER methods typically perform a pre-processing step, called blocking, which places similar entity descriptions into blocks so that only descriptions within the same block are compared. We experimentally evaluate several blocking methods proposed for the Web of data using real datasets, whose characteristics significantly impact their effectiveness and efficiency. The proposed experimental evaluation framework allows us to better understand the characteristics of the missed matching entity descriptions and contrast them with ground truth obtained from different kinds of relatedness links.
Submitted 19 May, 2020;
originally announced May 2020.
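The blocking step described in the abstract of "Benchmarking Blocking Algorithms for Web Entities" can be illustrated with token blocking, one simple schema-agnostic variant. This is a minimal sketch with made-up data and hypothetical function names; it is not claimed to be any of the specific methods evaluated in the paper.

```python
import re
from collections import defaultdict
from itertools import combinations
from typing import Dict, Set, Tuple


def token_blocking(descriptions: Dict[str, str]) -> Dict[str, Set[str]]:
    """Put every description into one block per token it contains;
    blocks with a single member produce no comparisons and are dropped."""
    blocks: Dict[str, Set[str]] = defaultdict(set)
    for entity_id, text in descriptions.items():
        for token in set(re.findall(r"\w+", text.lower())):
            blocks[token].add(entity_id)
    return {tok: ids for tok, ids in blocks.items() if len(ids) > 1}


def candidate_pairs(blocks: Dict[str, Set[str]]) -> Set[Tuple[str, str]]:
    """Distinct pairs of descriptions that share at least one block."""
    pairs: Set[Tuple[str, str]] = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Illustrative descriptions (invented for this example).
descriptions = {
    "e1": "Heraklion Crete",
    "e2": "Heraklion city",
    "e3": "Tampere Finland",
}
blocks = token_blocking(descriptions)
pairs = candidate_pairs(blocks)
```

Here only the pair sharing the token "heraklion" is compared, instead of all three possible pairs.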
-
Piveau: A Large-scale Open Data Management Platform based on Semantic Web Technologies
Authors:
Fabian Kirstein,
Kyriakos Stefanidis,
Benjamin Dittwald,
Simon Dutkowski,
Sebastian Urbanek,
Manfred Hauswirth
Abstract:
The publication and (re)utilization of Open Data still face multiple barriers on technical, organizational and legal levels. These include limitations in interfaces, search capabilities, provision of quality information and the lack of definite standards and implementation guidelines. Many Semantic Web specifications and technologies are specifically designed to address the publication of data on the web. In addition, many official publication bodies encourage and foster the development of Open Data standards based on Semantic Web principles. However, no existing solution for managing Open Data takes full advantage of these possibilities and benefits. In this paper, we present our solution "Piveau", a fully-fledged Open Data management solution, based on Semantic Web technologies. It harnesses a variety of standards, like RDF, DCAT, DQV, and SKOS, to overcome the barriers in Open Data publication. The solution puts a strong focus on assuring data quality and scalability. We give a detailed description of the underlying, highly scalable, service-oriented architecture, how we integrated the aforementioned standards, and how we used a triplestore as our primary database. We have evaluated our work in a comprehensive feature comparison to established solutions and through a practical application in a production environment, the European Data Portal. Our solution is available as Open Source.
Submitted 6 May, 2020;
originally announced May 2020.
-
End-to-End Entity Resolution for Big Data: A Survey
Authors:
Vassilis Christophides,
Vasilis Efthymiou,
Themis Palpanas,
George Papadakis,
Kostas Stefanidis
Abstract:
One of the most important tasks for improving data quality and the reliability of data analytics results is Entity Resolution (ER). ER aims to identify different descriptions that refer to the same real-world entity, and remains a challenging problem. While previous works have studied specific aspects of ER (and mostly in traditional settings), in this survey, we provide for the first time an end-to-end view of modern ER workflows, and of the novel aspects of entity indexing and matching methods in order to cope with more than one of the Big Data characteristics simultaneously. We present the basic concepts, processing steps and execution strategies that have been proposed by different communities, i.e., database, semantic Web and machine learning, in order to cope with the loose structuredness, extreme diversity, high speed and large scale of entity descriptions used by real-world applications. Finally, we provide a synthetic discussion of the existing approaches, and conclude with a detailed presentation of open research directions.
Submitted 19 August, 2020; v1 submitted 15 May, 2019;
originally announced May 2019.
-
MinoanER: Schema-Agnostic, Non-Iterative, Massively Parallel Resolution of Web Entities
Authors:
Vasilis Efthymiou,
George Papadakis,
Kostas Stefanidis,
Vassilis Christophides
Abstract:
Entity Resolution (ER) aims to identify different descriptions in various Knowledge Bases (KBs) that refer to the same entity. ER is challenged by the Variety, Volume and Veracity of entity descriptions published in the Web of Data. To address them, we propose the MinoanER framework that simultaneously fulfills full automation, support of highly heterogeneous entities, and massive parallelization of the ER process. MinoanER leverages a token-based similarity of entities to define a new metric that derives the similarity of neighboring entities from the most important relations, as they are indicated only by statistics. A composite blocking method is employed to capture different sources of matching evidence from the content, neighbors, or names of entities. The search space of candidate pairs for comparison is compactly abstracted by a novel disjunctive blocking graph and processed by a non-iterative, massively parallel matching algorithm that consists of four generic, schema-agnostic matching rules that are quite robust with respect to their internal configuration. We demonstrate that the effectiveness of MinoanER is comparable to existing ER tools over real KBs exhibiting low Variety, but it outperforms them significantly when matching KBs with high Variety.
Submitted 15 May, 2019;
originally announced May 2019.
-
Tracking the History and Evolution of Entities: Entity-centric Temporal Analysis of Large Social Media Archives
Authors:
Pavlos Fafalios,
Vasileios Iosifidis,
Kostas Stefanidis,
Eirini Ntoutsi
Abstract:
How did the popularity of the Greek Prime Minister evolve in 2015? How did the predominant sentiment about him vary during that period? Were there any controversial sub-periods? What other entities were related to him during these periods? To answer these questions, one needs to analyze archived documents and data about the query entities, such as old news articles or social media archives. In particular, user-generated content posted in social networks, like Twitter and Facebook, can be seen as a comprehensive documentation of our society, and thus meaningful analysis methods over such archived data are of immense value for sociologists, historians and other interested parties who want to study the history and evolution of entities and events. To this end, in this paper we propose an entity-centric approach to analyze social media archives and we define measures that allow studying how entities were reflected in social media in different time periods and under different aspects, like popularity, attitude, controversiality, and connectedness with other entities. A case study using a large Twitter archive of four years illustrates the insights that can be gained by such an entity-centric and multi-aspect analysis.
Submitted 24 October, 2018;
originally announced October 2018.
-
A Flexible Framework for Defining, Representing and Detecting Changes on the Data Web
Authors:
Yannis Roussakis,
Ioannis Chrysakis,
Kostas Stefanidis,
Giorgos Flouris,
Yannis Stavrakas
Abstract:
The dynamic nature of Web data gives rise to a multitude of problems related to the identification, computation and management of the evolving versions and the related changes. In this paper, we consider the problem of change recognition in RDF datasets, i.e., the problem of identifying, and when possible giving semantics to, the changes that led from one version of an RDF dataset to another. Despite our RDF focus, our approach is sufficiently general to encompass different data models that can be encoded in RDF, such as relational or multi-dimensional. In fact, we propose a flexible, extendible and data-model-independent methodology of defining changes that can capture the peculiarities and needs of different data models and applications, while being formally robust due to the satisfaction of the properties of completeness and unambiguity. Further, we propose an ontology of changes for storing the detected changes that allows automated processing and analysis of changes, cross-snapshot queries (spanning across different versions), as well as queries involving both changes and data. To detect changes and populate said ontology, we propose a customizable detection algorithm, which is applicable to different data models and applications requiring the detection of custom, user-defined changes. Finally, we provide a proof-of-concept application and evaluation of our framework for different data models.
Submitted 12 January, 2015;
originally announced January 2015.
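As a point of reference for the change recognition problem described in the abstract of "A Flexible Framework for Defining, Representing and Detecting Changes on the Data Web", the most basic form of a delta between two RDF versions is the pair of added and deleted triples. The sketch below shows only this low-level delta, with triples modeled as plain string tuples and invented example data; it does not implement the user-defined changes or the ontology of changes that the paper proposes.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def low_level_delta(old: Set[Triple], new: Set[Triple]) -> Tuple[Set[Triple], Set[Triple]]:
    """Return (added, deleted) triples between two versions of a dataset."""
    return new - old, old - new


# Two illustrative versions of a tiny dataset (invented for this example).
v1 = {
    ("ex:a", "rdf:type", "ex:Person"),
    ("ex:a", "ex:name", "Alice"),
}
v2 = {
    ("ex:a", "rdf:type", "ex:Person"),
    ("ex:a", "ex:name", "Alicia"),
}
added, deleted = low_level_delta(v1, v2)
```

A higher-level description would interpret this added/deleted pair as a single change, such as a rename of `ex:a`; assigning that kind of semantics is the part the abstract refers to.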
-
Finding the Right Set of Users: Generalized Constraints for Group Recommendations
Authors:
Kostas Stefanidis,
Evaggelia Pitoura
Abstract:
Recently, group recommendations have attracted considerable attention. Rather than recommending items to individual users, group recommenders recommend items to groups of users. In this position paper, we introduce the problem of forming an appropriate group of users to recommend an item when constraints apply to the members of the group. We present a formal model of the problem and an algorithm for its solution. Finally, we identify several directions for future work.
Submitted 26 February, 2013;
originally announced February 2013.