Abstract
User data is becoming increasingly available in various domains, from the social Web to location check-ins and smartphone usage traces. Due to the sparsity and impurity of user data, we propose to analyze labeled groups of users instead of individuals, e.g., “countryside teachers who watch Woody Allen movies.” When chosen appropriately, labeled groups provide quick and useful insights into user data. Analyzing user groups is often non-trivial due to their large volume. In this paper, we introduce AugMan, a framework for the efficient summarization of user groups via abstraction. Our framework performs a dynamic, data-driven, and lossless abstraction that helps analysts obtain high-quality insights into user data without being overwhelmed. Our experiments show that AugMan offers representative and informative abstractions in a scalable fashion.
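The abstract does not say how labeled groups are represented. The sketch below is for illustration only. It assumes that a group is described by a conjunction of "attribute=value" labels and that its members are the users carrying all of those labels. Under that assumption, groups are partially ordered by label inclusion. That ordering is the kind of generalization lattice the paper's title refers to: dropping a label can only enlarge a group. The toy data and the names `USERS`, `members`, `groups`, and `is_generalization` are hypothetical. This is not AugMan's abstraction algorithm.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set

# Hypothetical toy data: each user carries a set of "attribute=value" labels.
USERS: Dict[str, Set[str]] = {
    "u1": {"region=countryside", "occupation=teacher", "watches=Woody Allen"},
    "u2": {"region=countryside", "occupation=teacher"},
    "u3": {"region=city", "occupation=teacher", "watches=Woody Allen"},
    "u4": {"region=countryside", "occupation=farmer", "watches=Woody Allen"},
}


def members(description: FrozenSet[str]) -> Set[str]:
    """Users whose labels include every label of the description (a conjunction)."""
    return {u for u, labels in USERS.items() if description <= labels}


def groups(min_size: int = 1, max_labels: int = 3) -> Dict[FrozenSet[str], Set[str]]:
    """Enumerate labeled groups: label combinations with at least min_size members."""
    all_labels = sorted(set().union(*USERS.values()))
    result: Dict[FrozenSet[str], Set[str]] = {}
    for k in range(1, max_labels + 1):
        for combo in combinations(all_labels, k):
            desc = frozenset(combo)
            m = members(desc)
            if len(m) >= min_size:
                result[desc] = m
    return result


def is_generalization(general: FrozenSet[str], specific: FrozenSet[str]) -> bool:
    """In the assumed lattice, a strict subset of labels is a more general group."""
    return general < specific


if __name__ == "__main__":
    specific = frozenset({"region=countryside", "occupation=teacher", "watches=Woody Allen"})
    general = frozenset({"occupation=teacher", "watches=Woody Allen"})
    print(sorted(members(specific)))  # ['u1']
    print(sorted(members(general)))   # ['u1', 'u3']
```

Enumerating every label combination grows combinatorially with the number of labels. This illustrates why the abstract calls the volume of user groups a problem and motivates summarizing them rather than listing them all.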
Notes
1. The dataset is publicly available at https://goo.gl/ZQ6doV.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Omidvar-Tehrani, B., Amer-Yahia, S. (2017). Online Lattice-Based Abstraction of User Groups. In: Benslimane, D., Damiani, E., Grosky, W., Hameurlain, A., Sheth, A., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2017. Lecture Notes in Computer Science, vol 10438. Springer, Cham. https://doi.org/10.1007/978-3-319-64468-4_7
DOI: https://doi.org/10.1007/978-3-319-64468-4_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-64467-7
Online ISBN: 978-3-319-64468-4
eBook Packages: Computer Science, Computer Science (R0)