DOI: 10.1145/3077286.3077289

A Comparison of Fuzzy Hashes: Evaluation, Guidelines, and Future Suggestions

Published: 13 April 2017

Abstract

Fuzzy hashes are gaining recognition in digital forensics as an alternative method of recognizing similar or nearly identical files. However, despite the wide variety of algorithms in existence, only SSDEEP and sdhash are widely used. This paper describes several viable fuzzy hashes that are not in wide use and compares them in detail, examining both security and non-security flaws. Finally, we develop guidelines for putting these hashes to practical use and suggest improvements to several of them.

Published In

ACMSE '17: Proceedings of the 2017 ACM Southeast Conference
April 2017
275 pages
ISBN:9781450350242
DOI:10.1145/3077286
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Digital Forensics
  2. Fuzzy Hashing
  3. Survey

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ACM SE '17: SouthEast Conference
April 13 - 15, 2017
Kennesaw, GA, USA

Acceptance Rates

ACMSE '17 Paper Acceptance Rate 21 of 34 submissions, 62%;
Overall Acceptance Rate 502 of 1,023 submissions, 49%
