FRAGOLA: Fabulous RAnking of GastrOnomy LocAtions

2013, Lecture Notes in Computer Science

Ana Alvarado, Oriana Baldizán, Marlene Goncalves, and María-Esther Vidal
Universidad Simón Bolívar, Venezuela
{aalvarado,obaldizan,mgoncalves,mvidal}@ldc.usb.ve

Abstract. Nowadays, large open datasets are frequently accessed to select, for example, restaurants that best meet gastronomy criteria and are closest to the user's current geo-spatial location. We have developed a skyline-based ranking approach named FOPA, which efficiently ranks resources that fulfil this type of multi-objective query. As a proof of concept, we developed FRAGOLA (Fabulous RAnking of GastrOnomy LocAtions), a tool that implements FOPA and ranks gastronomy locations based on multi-objective criteria. We will demonstrate FRAGOLA, and attendees will observe scenarios where FOPA outperforms existing skyline-based approaches by up to two orders of magnitude.

1 Introduction

Under the umbrella of the Semantic Web and the Open Data initiatives, large datasets have been published and can be publicly accessed from any node of the Internet. Although this democratization of information provides the basis for managing large volumes of data, there are still applications where it is important to efficiently identify only the best tuples that satisfy a user requirement. In particular, large datasets of government and private recreational data are available, and these data can be accessed to identify the places that best meet a user's cuisine requirements and are closest to her current geo-spatial location. Building on related work, we devised a solution to this ranking problem and developed techniques able to identify the gastronomy locations that best meet these multi-objective queries, i.e., the gastronomy locations that are not dominated by any other gastronomy location in terms of the multi-objective criteria. The set of non-dominated points is known as the skyline, i.e., the set of points such that none of them is dominated by any other point, where a point dominates another if it is at least as good in every criterion and strictly better in at least one [2,3].
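To make the dominance relation concrete, the following minimal Python sketch assumes every criterion has already been oriented so that lower values are better (e.g., price and distance). The function names and sample data are illustrative only and are not taken from FRAGOLA. The skyline is computed naively by comparing every pair of points, i.e., the quadratic baseline that skyline algorithms try to avoid.

```python
from typing import Sequence

Point = Sequence[float]


def dominates(p: Point, q: Point) -> bool:
    """True if p dominates q: no worse in every dimension, strictly better in one.
    Assumes lower values are better in all dimensions."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def naive_skyline(points: list[Point]) -> list[Point]:
    """Keep the points that no other point dominates (pairwise, quadratic).
    A point never dominates itself, so no self-exclusion is needed."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical restaurants as (price, distance_km) pairs.
restaurants = [(20, 1.0), (15, 3.0), (25, 0.5), (30, 2.0), (15, 4.0)]
print(naive_skyline(restaurants))  # -> [(20, 1.0), (15, 3.0), (25, 0.5)]
```

Here (30, 2.0) is dominated by (20, 1.0) and (15, 4.0) by (15, 3.0); the three remaining restaurants are mutually incomparable.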
We developed an algorithm that combines ideas from the approaches described by Balke et al. [2] and Chen et al. [3] to compute the skyline points that best meet a multi-objective query; the algorithm implements different pruning criteria and avoids traversing the whole data space to compute the skyline. Thus, both the execution time and the number of probes required to output the answer are reduced. We illustrate the performance of our approach on a dataset of gastronomy locations downloaded from Zagat¹, where restaurants are characterized by six parameters, and on queries that rank the best restaurants in terms of these parameters as well as with respect to different geo-spatial locations. The demo is published at http://fragola.ldc.usb.ve/.

¹ http://www.zagat.com/paris

Y.T. Demey and H. Panetto (Eds.): OTM 2013 Workshops, LNCS 8186, pp. 408–413, 2013.
© Springer-Verlag Berlin Heidelberg 2013

2 The FRAGOLA System

An RDF document consists of triples that describe resources in terms of several properties; formally, it can be seen as a set of multi-dimensional points that describe each resource in terms of its properties. A multi-objective query consists of: i) a condition, i.e., a list of RDF properties, each with a MIN or MAX directive indicating whether the values of the corresponding property must be minimized or maximized, and ii) the user's current location. The answer to a query q corresponds to the points or resources in the multi-dimensional dataset D that are incomparable, i.e., the skyline, which is composed of all points p such that: i) there is no other point p′ in D whose values are better than or equal to those of p in all attributes and strictly better in at least one, i.e., no point dominates p, and ii) every other point in the skyline is better than p in at least one attribute.
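The query model above can be sketched in Python as follows. This is a minimal illustration under assumptions that the paper does not state: triples are plain (subject, property, value) tuples, coordinates are stored under the hypothetical properties lat and lon, MAX properties are negated so that lower is better in every dimension, and the user's location enters the ranking as an extra distance dimension to be minimized (great-circle distance via the haversine formula). The property names and data are invented for illustration.

```python
import math
from collections import defaultdict


def to_points(triples):
    """Group (subject, property, value) triples into one property->value dict per subject."""
    points = defaultdict(dict)
    for subject, prop, value in triples:
        points[subject][prop] = value
    return dict(points)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres (mean Earth radius 6371 km)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def oriented_vector(props, condition, location):
    """Lower-is-better vector: MIN values kept, MAX values negated, and the
    distance to the user's location appended (an assumption of this sketch)."""
    vec = [props[p] if d == "MIN" else -props[p] for p, d in condition]
    vec.append(haversine_km(props["lat"], props["lon"], *location))
    return tuple(vec)


def dominates(u, v):
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def answer(triples, condition, location):
    """Subjects in the skyline of the oriented vectors (naive pairwise check)."""
    vecs = {s: oriented_vector(props, condition, location)
            for s, props in to_points(triples).items()}
    return sorted(s for s, u in vecs.items()
                  if not any(dominates(w, u) for w in vecs.values()))


triples = [
    ("r1", "food", 25), ("r1", "price", 60), ("r1", "lat", 48.857), ("r1", "lon", 2.352),
    ("r2", "food", 27), ("r2", "price", 90), ("r2", "lat", 48.870), ("r2", "lon", 2.300),
    ("r3", "food", 20), ("r3", "price", 70), ("r3", "lat", 48.860), ("r3", "lon", 2.360),
]
print(answer(triples, [("food", "MAX"), ("price", "MIN")], (48.8566, 2.3522)))  # -> ['r1', 'r2']
```

In this toy query, r3 is dominated by r1 (lower food score, higher price, farther away), while r1 and r2 trade food score against price and distance, so both remain in the answer.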
Computing the skyline takes polynomial time in the size of the dataset, and the goal of state-of-the-art skyline algorithms is to compute the set of incomparable points without performing the quadratic number of pairwise comparisons required by a naive approach [4,5]. The FRAGOLA System seeks to illustrate how our Final Object Pruning Algorithm (FOPA) [1] achieves this goal by exploiting certain data properties and extending features of the algorithms RSJFH [3] and IDSA [2]. These three algorithms assume that the data is stored in a vertically partitioned table representation, i.e., for each dimension or RDF property a, there exists a relation a_R composed of two attributes, Subject and Value, whose tuples are ordered by Value. Further, indices are kept on top of these tables to provide direct and sequential access, and a data structure tracks the last values of the objects seen in each dimension. The algorithms proceed in iterations; in each iteration, the best entry or entries of each vertical table are considered. The goal of the three algorithms is to minimize the number of comparisons between data dimensions needed to compute the skyline. First, the RDFSkyJoinWithFullHeader (RSJFH) algorithm proposed by Chen et al. [3] uses a data structure named header point to record the worst values of the tuples explored in previous iterations; this information guides the pruning of tuples seen in future iterations. Second, the Improved Distributed Skyline Algorithm (IDSA) proposed by Balke et al. [2] guides the search towards the final object, i.e., an object that has been seen in all the vertical tables; once the final object is found, IDSA can ensure that a superset of the skyline has been found, and a post-processing step discards the tuples in the superset that are dominated.
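The vertically partitioned representation and the final-object idea can be illustrated with the following Python sketch. It is not FOPA, RSJFH, or IDSA: it implements only the basic final-object strategy that IDSA builds on, and it does not model the RSJFH header point. It assumes a small in-memory dataset in which each subject maps to a lower-is-better vector, and it replaces index-based direct access with plain dictionary lookups. All function names are hypothetical.

```python
from collections import defaultdict


def vertical_tables(points):
    """One (subject, value) table per dimension, sorted by value (lower is better)."""
    dims = len(next(iter(points.values())))
    return [sorted(((s, v[i]) for s, v in points.items()), key=lambda t: t[1])
            for i in range(dims)]


def dominates(u, v):
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def final_object_skyline(points):
    """Round-robin sorted access until some object has been seen in every table
    (the final object); objects never seen are then dominated by it."""
    tables = vertical_tables(points)
    k = len(tables)
    pos = [0] * k                 # next position to read in each table
    seen = defaultdict(set)       # subject -> dimensions in which it was seen
    final = None
    while final is None:
        for i in range(k):
            s, _ = tables[i][pos[i]]
            pos[i] += 1
            seen[s].add(i)
            if len(seen[s]) == k:
                final = s
                break
    # Drain ties: keep reading while the next value equals the final object's value,
    # so every still-unseen object is strictly worse than it in all dimensions.
    for i in range(k):
        while pos[i] < len(tables[i]) and tables[i][pos[i]][1] <= points[final][i]:
            seen[tables[i][pos[i]][0]].add(i)
            pos[i] += 1
    # Candidates form a superset of the skyline; direct access (here, the dict)
    # supplies their full vectors, and dominated candidates are discarded.
    cands = list(seen)
    return sorted(c for c in cands
                  if not any(dominates(points[d], points[c]) for d in cands))


data = {"a": (20, 1.0), "b": (15, 3.0), "c": (25, 0.5), "d": (30, 2.0), "e": (15, 4.0)}
print(final_object_skyline(data))  # -> ['a', 'b', 'c']
```

On this data, "a" becomes the final object after three sorted accesses, so "d" is never read at all. Draining ties matters: without it, an object equal to the final object in every dimension could stay unseen even though it belongs to the skyline.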
Although experimental studies reported in the literature [2,3] suggest that RSJFH and IDSA are efficient, both may suffer from the following drawbacks: i) RSJFH produces incomplete results for multi-objective criteria of three or more dimensions, and ii) IDSA performs poorly when the final object has the worst value in at least one dimension. To overcome these limitations,