research-article

Probabilistic data structures for big data analytics: A comprehensive review

Authors:

Amritpal Singh,

Albert Y. Zomaya

Volume 188, Issue C

https://doi.org/10.1016/j.knosys.2019.104987

Published: 05 January 2020

Abstract

The past decade has seen an exponential increase in data-generating sources, driven by the evolution of technologies such as cloud computing, IoT, and social networking. This enormous, unbounded growth of data has led to a paradigm shift in storage and retrieval patterns from traditional data structures to Probabilistic Data Structures (PDS). PDS are a group of data structures that are extremely useful for big data and streaming applications because they avoid high-latency analytical processes. These data structures use hash functions to compactly represent a set of items in stream-based computing while providing approximations with error bounds, so that well-formed approximations are built directly into data collections. Compared to traditional data structures, PDS use much less memory and take constant time to process complex queries. This paper provides a detailed discussion of the issues normally encountered in massive data sets, such as storage, retrieval, and querying. It also discusses the role of PDS in solving these issues, where these data structures are used as temporary accumulators in query processing. Several variants of existing PDS, along with their application areas, are explored to give a holistic view of the domains where these data structures can be applied for efficient storage and retrieval of massive data sets. Mathematical proofs of the various parameters considered in the PDS are also discussed. Finally, a relative comparison of the PDS with respect to various parameters is presented.
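As a rough illustration of how a hash-based structure can represent a set compactly while bounding its error, the following minimal sketch implements a Bloom filter, one well-known PDS. It is not taken from the paper: the class name `BloomFilter`, the parameters `m` (number of bits) and `k` (number of hash probes), and the choice of deriving the probes from a single SHA-256 digest by double hashing are all assumptions made for this example. The error estimate uses the standard approximation (1 - e^(-kn/m))^k for n inserted items.

```python
import hashlib
import math


class BloomFilter:
    """Illustrative Bloom filter: m bits probed at k hashed positions per item."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m                            # number of bits (assumed parameter name)
        self.k = k                            # number of probes per item (assumed name)
        self.bits = bytearray((m + 7) // 8)   # packed bit array
        self.count = 0                        # number of add() calls so far

    def _positions(self, item: str) -> list[int]:
        # Assumption for this sketch: derive k positions from one SHA-256
        # digest using double hashing, h1 + i * h2 (mod m).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)
        self.count += 1

    def __contains__(self, item: str) -> bool:
        # True may be a false positive; False is always correct.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )

    def expected_false_positive_rate(self) -> float:
        # Standard approximation: (1 - e^(-k * n / m)) ** k
        return (1.0 - math.exp(-self.k * self.count / self.m)) ** self.k


if __name__ == "__main__":
    bf = BloomFilter(m=10_000, k=7)
    for i in range(1_000):
        bf.add(f"item-{i}")
    print("item-42" in bf)                    # an inserted item is always reported present
    print(len(bf.bits), "bytes used")
    print(bf.expected_false_positive_rate())
```

Each insertion and each membership query touches exactly `k` bit positions regardless of how many items have been stored, which is the constant-time, low-memory behaviour the abstract attributes to PDS. The price is that a query answering "present" may occasionally be wrong, while "absent" is always correct.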

Cited By

Kuszmaul W., Walzer S., Mohar B., Shinkar I., O'Donnell R. (2024), Space Lower Bounds for Dynamic Filters and Value-Dynamic Retrieval, Proceedings of the 56th Annual ACM Symposium on Theory of Computing, 10.1145/3618260.3649649, (1153-1164). Online publication date: 10-Jun-2024
https://dl.acm.org/doi/10.1145/3618260.3649649
Kumar M., Singh A. (2024), Anomalous vehicle recognition in smart cities using persistent bloom filter, Transactions on Emerging Telecommunications Technologies, 10.1002/ett.4896, 35:1. Online publication date: 15-Jan-2024
https://dl.acm.org/doi/10.1002/ett.4896
Muhammad K., Obaidat M., Hussain T., Ser J., Kumar N., Tanveer M., Doctor F. (2021), Fuzzy Logic in Surveillance Big Video Data Analysis, ACM Computing Surveys, 10.1145/3444693, 54:3, (1-33). Online publication date: 21-May-2021
https://dl.acm.org/doi/10.1145/3444693

Index Terms

Probabilistic data structures for big data analytics: A comprehensive review
1. Information systems
  1. Data management systems
  2. Information storage systems
    1. Record storage systems
2. Theory of computation
  1. Design and analysis of algorithms
  2. Theory and algorithms for application domains
    1. Database theory
      1. Data structures and algorithms for data management

Index terms have been assigned to the content through auto-classification.

Published In

Knowledge-Based Systems, Volume 188, Issue C

Jan 2020

541 pages

ISSN: 0950-7051

Elsevier B.V.

Publisher

Elsevier Science Publishers B. V.

Netherlands
