DOI: 10.1145/1315245.1315286

Polyglot: automatic extraction of protocol message format using dynamic binary analysis

Published: 28 October 2007

Abstract

Protocol reverse engineering, the process of extracting the application-level protocol used by an implementation, without access to the protocol specification, is important for many network security applications. Recent work [17] has proposed protocol reverse engineering by using clustering on network traces. That kind of approach is limited by the lack of semantic information on network traces. In this paper we propose a new approach using program binaries. Our approach, shadowing, uses dynamic analysis and is based on a unique intuition - the way that an implementation of the protocol processes the received application data reveals a wealth of information about the protocol message format. We have implemented our approach in a system called Polyglot and evaluated it extensively using real-world implementations of five different protocols: DNS, HTTP, IRC, Samba and ICQ. We compare our results with the manually crafted message format, included in Wireshark, one of the state-of-the-art protocol analyzers. The differences we find are small and usually due to different implementations handling fields in different ways. Finding such differences between implementations is an added benefit, as they are important for problems such as fingerprint generation, fuzzing, and error detection.
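
The intuition in the abstract, that the way an implementation processes received data reveals the message format, can be illustrated with a toy sketch. This is not Polyglot's implementation: per the abstract, the real system applies dynamic analysis to program binaries, whereas here a small Python wrapper stands in for that instrumentation. The names `TaintedInput`, `toy_parser`, and `infer_separators`, and the rule "a constant compared against several consecutive input offsets is a separator candidate", are all assumptions made for illustration.

```python
from collections import defaultdict


class TaintedInput:
    """Wraps a received message and logs every comparison between an
    input byte and a constant (a stand-in for binary instrumentation)."""

    def __init__(self, data):
        self.data = data
        self.compare_log = []  # list of (offset, constant) pairs

    def byte_equals(self, offset, constant):
        self.compare_log.append((offset, constant))
        return self.data[offset] == constant


def toy_parser(msg):
    """Hypothetical implementation under observation: splits on spaces."""
    fields, start = [], 0
    for i in range(len(msg.data)):
        if msg.byte_equals(i, ord(" ")):
            fields.append(msg.data[start:i])
            start = i + 1
    fields.append(msg.data[start:])
    return fields


def infer_separators(log, min_run=3):
    """Return constants that were compared against at least `min_run`
    consecutive input offsets (assumed heuristic for separators)."""
    offsets = defaultdict(set)
    for offset, constant in log:
        offsets[constant].add(offset)
    candidates = set()
    for constant, seen in offsets.items():
        run = best = 0
        prev = None
        for off in sorted(seen):
            run = run + 1 if prev is not None and off == prev + 1 else 1
            best = max(best, run)
            prev = off
        if best >= min_run:
            candidates.add(constant)
    return candidates


msg = TaintedInput(b"GET /index.html HTTP/1.1")
fields = toy_parser(msg)
separators = infer_separators(msg.compare_log)
```

The separator candidate is recovered from the comparison log alone, without reading the parser's source, which is the kind of semantic information that a network trace by itself does not carry.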

References

[1] How Samba Was Written. http://samba.org/ftp/tridge/misc/french_cafe.txt.
[2] Icqlib: The ICQ Library. http://kicq.sourceforge.net/icqlib.shtml.
[3] Libyahoo2: A C Library for Yahoo! Messenger. http://libyahoo2.sourceforge.net.
[4] MSN Messenger Protocol. http://www.hypothetic.org/docs/msn/index.php.
[5] Qemu: Open Source Processor Emulator. http://fabrice.bellard.free.fr/qemu/.
[6] Tcpdump. http://www.tcpdump.org/.
[7] The UnOfficial AIM/OSCAR Protocol Specification. http://www.oilcan.org/oscar/.
[8] Wireshark, Network Protocol Analyzer. http://www.wireshark.org.
[9] M. A. Beddoe. Network Protocol Analysis Using Bioinformatics Algorithms. http://www.baselineresearch.net/PI/.
[10] N. Borisov, D. J. Brumley, H. J. Wang, and C. Guo. Generic Application-Level Protocol Analyzer and Its Language. Network and Distributed System Security Symposium, San Diego, CA, February 2007.
[11] D. Brumley, J. Caballero, Z. Liang, J. Newsome, and D. Song. Towards Automatic Discovery of Deviations in Binary Implementations with Applications to Error Detection and Fingerprint Generation. USENIX Security Symposium, Boston, MA, August 2007.
[12] J. Caballero, S. Venkataraman, P. Poosankam, M. G. Kang, D. Song, and A. Blum. FiG: Automatic Fingerprint Generation. Network and Distributed System Security Symposium, San Diego, CA, February 2007.
[13] J. Chow, B. Pfaff, T. Garfinkel, K. Christopher, and M. Rosenblum. Understanding Data Lifetime Via Whole System Simulation. USENIX Security Symposium, San Diego, CA, August 2004.
[14] M. Costa, J. Crowcroft, M. Castro, A. Rowstron, L. Zhou, L. Zhang, and P. Barham. Vigilante: End-to-End Containment of Internet Worms. Symposium on Operating Systems Principles, Brighton, United Kingdom, October 2005.
[15] J. R. Crandall, S. F. Wu, and F. T. Chong. Minos: Architectural Support for Protecting Control Data. ACM Transactions on Architecture and Code Optimization, December 2006.
[16] D. Crocker and P. Overell. Augmented BNF for Syntax Specifications: ABNF. RFC 4234 (Draft Standard), 4234, October 2005.
[17] W. Cui, J. Kannan, and H. J. Wang. Discoverer: Automatic Protocol Description Generation from Network Traces. USENIX Security Symposium, Boston, MA, August 2007.
[18] W. Cui, V. Paxson, N. C. Weaver, and R. H. Katz. Protocol-Independent Adaptive Replay of Application Dialog. Network and Distributed System Security Symposium, San Diego, CA, February 2006.
[19] H. Dreger, A. Feldmann, M. Mai, V. Paxson, and R. Sommer. Dynamic Application-Layer Protocol Analysis for Network Intrusion Detection. USENIX Security Symposium, Vancouver, Canada, July 2006.
[20] C. D. Grosso, G. Antoniol, M. D. Penta, P. Galinier, and E. Merlo. Improving Network Applications Security: A New Heuristic to Generate Stress Testing Data. Genetic and Evolutionary Computation Conference, June 2005.
[21] P. Haffner, S. Sen, O. Spatscheck, and D. Wang. ACAS: Automated Construction of Application Signatures. ACM SIGCOMM, Workshop on Mining network data, Philadelphia, PA, October 2005.
[22] J. Kannan, J. Jung, V. Paxson, and C. E. Koksal. Semi-Automated Discovery of Application Session Structure. Internet Measurement Conference, Rio de Janeiro, Brazil, October 2006.
[23] C. Leita, K. Mermoud, and M. Dacier. ScriptGen: An Automated Script Generation Tool for Honeyd. Annual Computer Security Applications Conference, Tucson, AZ, December 2005.
[24] J. Lim, T. Reps, and B. Liblit. Extracting Output Formats from Executables. Working Conference on Reverse Engineering, Benevento, Italy, October 2006.
[25] J. Ma, K. Levchenko, C. Kreibich, S. Savage, and G. M. Voelker. Unexpected Means of Protocol Inference. Internet Measurement Conference, Rio de Janeiro, Brazil, October 2006.
[26] P. McMinn, M. Harman, D. Binkley, and P. Tonella. The Species Per Path Approach to Search-Based Test Data Generation. International Symposium on Software Testing and Analysis, July 2006.
[27] P. V. Mockapetris. Domain Names - Implementation and Specification. RFC 1035 (Standard), IETF Request for Comments 1035, November 1987.
[28] J. Newsome and D. Song. Dynamic Taint Analysis for Automatic Detection, Analysis, and Signature Generation of Exploits on Commodity Software. Network and Distributed System Security Symposium, San Diego, CA, February 2005.
[29] J. Newsome, D. Brumley, and D. Song. Vulnerability-Specific Execution Filtering for Exploit Prevention on Commodity Software. Network and Distributed System Security Symposium, San Diego, CA, February 2006.
[30] J. Newsome, D. Brumley, J. Franklin, and D. Song. Replayer: Automatic Protocol Replay By Binary Analysis. ACM Conference on Computer and Communications Security, Alexandria, VA, October 2006.
[31] P. Oehlert. Violating Assumptions with Fuzzing. IEEE Security and Privacy, 3(2), March 2005.
[32] R. Pang, M. Allman, M. Bennett, J. Lee, V. Paxson, and B. Tierney. A First Look At Modern Enterprise Traffic. Internet Measurement Conference, Berkeley, CA, October 2005.
[33] R. Pang, V. Paxson, R. Sommer, and L. Peterson. Binpac: A Yacc for Writing Application Protocol Parsers. Internet Measurement Conference, Rio de Janeiro, Brazil, October 2006.
[34] G. Portokalidis, A. Slowinska, and H. Bos. Argos: An Emulator for Fingerprinting Zero-Day Attacks for Advertised Honeypots with Automatic Signature Generation. ACM SIGOPS Operating Systems Review, 40(4), October 2006.
[35] G. E. Suh, J. W. Lee, D. Zhang, and S. Devadas. Secure Program Execution Via Dynamic Information Flow Tracking. International Conference on Architectural Support for Programming Languages and Operating Systems, Boston, MA, October 2004.
[36] P. Vogt, F. Nentwich, N. Jovanovic, E. Kirda, C. Kruegel, and G. Vigna. Cross-Site Scripting Prevention with Dynamic Data Tainting and Static Analysis. Network and Distributed System Security Symposium, San Diego, CA, February 2007.
[37] H. Yin, D. Song, E. Manuel, C. Kruegel, and E. Kirda. Panorama: Capturing System-Wide Information Flow for Malware Detection and Analysis. ACM Conference on Computer and Communications Security, Alexandria, VA, October 2007.

      Published In

      CCS '07: Proceedings of the 14th ACM conference on Computer and communications security
      October 2007
      628 pages
      ISBN: 9781595937032
      DOI: 10.1145/1315245
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. binary analysis
      2. protocol reverse engineering

      Qualifiers

      • Article

      Conference

      CCS07: 14th ACM Conference on Computer and Communications Security 2007
      October 31 - November 2, 2007
      Alexandria, Virginia, USA

      Acceptance Rates

      CCS '07 Paper Acceptance Rate 55 of 302 submissions, 18%;
      Overall Acceptance Rate 1,261 of 6,999 submissions, 18%

      Cited By

      • (2024) A Novel Network Protocol Syntax Extracting Method for Grammar-Based Fuzzing. Applied Sciences 14:6 (2409). DOI: 10.3390/app14062409. Online publication date: 13-Mar-2024
      • (2024) Pyramis: Domain Specific Language for Developing Multi-tier Systems. Proceedings of the 8th Asia-Pacific Workshop on Networking (156-162). DOI: 10.1145/3663408.3663431. Online publication date: 3-Aug-2024
      • (2024) Reverse Engineering Industrial Protocols Driven By Control Fields. IEEE INFOCOM 2024 - IEEE Conference on Computer Communications (2408-2417). DOI: 10.1109/INFOCOM52122.2024.10621405. Online publication date: 20-May-2024
      • (2024) APT Attack and Detection Technology. 2024 IEEE 6th Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC) (795-801). DOI: 10.1109/IMCEC59810.2024.10575432. Online publication date: 24-May-2024
      • (2024) STI: A self-evolutive traffic identification system for unknown applications based on improved random forest. Computer Communications 219 (64-75). DOI: 10.1016/j.comcom.2024.02.010. Online publication date: Apr-2024
      • (2024) PRETT2: Discovering HTTP/2 DoS Vulnerabilities via Protocol Reverse Engineering. Computer Security – ESORICS 2024 (3-23). DOI: 10.1007/978-3-031-70890-9_1. Online publication date: 6-Sep-2024
      • (2023) Unsupervised Detection and Clustering of Malicious TLS Flows. Security and Communication Networks 2023 (1-17). DOI: 10.1155/2023/3676692. Online publication date: 12-Jan-2023
      • (2023) SePanner: Analyzing Semantics of Controller Variables in Industrial Control Systems based on Network Traffic. Proceedings of the 39th Annual Computer Security Applications Conference (310-323). DOI: 10.1145/3627106.3627179. Online publication date: 4-Dec-2023
      • (2023) Raft: Hardware-assisted Dynamic Information Flow Tracking for Runtime Protection on RISC-V. Proceedings of the 26th International Symposium on Research in Attacks, Intrusions and Defenses (595-608). DOI: 10.1145/3607199.3607246. Online publication date: 16-Oct-2023
      • (2023) NestFuzz: Enhancing Fuzzing with Comprehensive Understanding of Input Processing Logic. Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security (1272-1286). DOI: 10.1145/3576915.3623103. Online publication date: 15-Nov-2023
