Log Parsing 101
Introduction
Pre-Requisites
o NetWitness Stack
o Useful Development Tools
Log Parser Development for New Event Source
o Research
o Log Integration with NetWitness
Log Collection Methods:
Syslog
ODBC
File Reader
SNMP Traps
Windows
o Parser Development
o Deliverables
Log Parser Best Practices
o How to define a Header and Message ID?
Header
ID1 & ID2
Content
Message ID:
Best Practices:
o Message Element
Sample Message Definition
id2 (NW Key: <vid>)
id1 (NW Key: <msg.id>)
Event Categories
Content
Fine Parsing:
Generic Parsing:
o Functions
o Formatting Best Practices
Choosing Meta Keys
o Meta Keys from Table Map/ Master Variable Guide
o Meta Keys not present in Table-map
Unit Testing Log parser
o NW Log player
o REST API
o ESI Tool
o Syntax Checker
o NW Console
Introduction
At a very high level, NetWitness for Logs consumes log-based event information from
hundreds of Event Sources and transforms it into actionable information that end users can use
for monitoring, reporting, and compliance purposes. They can use this information
to keep track of everything happening in their Enterprise Network conveniently in one location
on a real-time basis. Typical Event Sources include:
Security devices like Firewalls, Antivirus, E-mail gateways and scanners, Data loss
prevention (DLP) systems, Intrusion detection & prevention systems (IDS & IPS)
Network components like Routers, Switches, Gateways
Operating Systems
Data Storage Systems
Web Servers, etc.
Pre-Requisites
NetWitness Stack
1. NetWitness Server
2. Log Collector + Log Decoder
3. Concentrator
4. NetWitness Live Account
Useful Development Tools
1. Notepad++
o This is a free-to-download tool. It is useful because it has many searching and editing
options/plug-ins.
2. WinSCP
o To transfer files between NW appliances and local systems
3. Remote Desktop
o To connect to Windows-based systems in the lab
4. PuTTY client
o To connect to NW appliances as well as UNIX-based systems in the lab
Log Parser Development for New Event Source
The Log Parser Development process for a New Event Source can be broadly divided into 4 main
stages: Research, Log Integration with NetWitness, Parser Development, and Deliverables.
Research
1. Function
o Understand the main functions of the product.
o What does the Event Source actually do?
o What among those functions would be relevant for the Security Monitoring and
Analysis purposes?
2. Product Version
o Find out the Latest version available in the market.
3. Vendor Documentation
o Find the vendor documentation, especially Administrator Guides and
Configuration Guides.
o Type of Logging:
Find out what activity the event source tracks, and how it records the activity it
handles or the network traffic that passes through it.
Log Integration with NetWitness
Log Collection Methods:
The Log collection method used to send events to NetWitness is chosen based on how the
event source stores its event logs.
Note:
For details on all configuration methods, refer to the Log Collector documentation: RSA
NW Log Collection Deployment Guide on RSA Link.
Syslog
o This is the standard format in which NetWitness recognizes events. Hence,
there is no need to configure a collection method for that device.
o Make sure the Log Decoder service is running and the Capture is ON.
ODBC
o A few Event Sources store their events in their own database. The ODBC collection
method is used to pull the events out and send them to NetWitness.
o Here are a few links describing the configuration steps:
1. ODBC Collection Basics
2. Collection Procedures
3. ODBC DSNs Configuration Parameters
4. ODBC Event Source Configuration Parameters
File Reader
o Most Event Sources either run stripped-down versions of standard OSs or are
installed on machines running standard OSs, in which case the logs are written to
a specific directory. These logs have to be delivered to NetWitness using the
in-house SFTP (Secure FTP) Agent, from where the File Reader collection
picks up the delivered logs and transforms them as per the provided
configuration.
1. File Collection Basics
2. File Collection: Configure Event Sources in SA
3. File Collection: Procedures
4. File Collection: Configuration Parameters
SNMP Traps
o Sometimes the Event Sources create their events in the form of SNMP traps.
1. SNMP Collection: The Basics
2. SNMP Collection: Procedures
3. SNMP Collection: Configure Event Sources on NW
4. SNMP v3 User Manager Configuration Parameters
Windows
Parser Development
Refer to the "Log Parser Best Practices" section on this page to understand some of the more
intricate details behind building a quality Log parser.
Deliverables
Log Parser Best Practices
How to define a Header and Message ID?
Header
The Header helps identify the Event Source by identifying and defining the
main theme in its Event log format.
The header identifies what the message is and where it starts.
Each header element requires the following attributes:
1. id1
2. id2
3. content
<HEADER
id1="0001"
id2="0001"
content="%PIX-<hlevel>-<messageid>: <!payload>" />
The BEST PRACTICE is to use a 4-digit number as the ID, beginning with “0001”.
Content
<messageid>:
<HEADER
id1="0001"
id2="0001"
content="%McAfeeNAC-<messageid>: <!payload>"/>
The <messageid> can also be a concatenation of several strings when needed.
A function call STRCAT() is used to join the strings in this case.
<HEADER
id1="0001"
id2="0001"
messageid="STRCAT(msgIdPart1, '_', msgIdPart2)"
content="<hmonth> <hday> <htime> <hfld1>
,<msgIdPart1>, <hfld2>, <hdate> <hdatetime>,
<msgIdPart2>, <!payload>" />
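To make the concatenation concrete, here is a small Python sketch of what the STRCAT() call above is described as doing: joining captured header fields and literal strings into a single message ID. The strcat helper and the sample field values are hypothetical and exist only for illustration; this is not Log Decoder code.

```python
def strcat(captured, *args):
    """Join captured header fields and quoted literals, in argument order."""
    parts = []
    for arg in args:
        if len(arg) >= 2 and arg[0] == "'" and arg[-1] == "'":
            parts.append(arg[1:-1])      # quoted argument: literal text
        else:
            parts.append(captured[arg])  # bare argument: captured field value
    return "".join(parts)


# Hypothetical values captured by <msgIdPart1> and <msgIdPart2>
captured = {"msgIdPart1": "AUTH", "msgIdPart2": "FAILED"}
message_id = strcat(captured, "msgIdPart1", "'_'", "msgIdPart2")
print(message_id)
```

The resulting string is what the message definitions would then need to reference as their message ID.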
<payload>:
Defines the section of the Log event which is passed on to the Message element for
further parsing.
It can be the section of the Log event after the <messageid>, or it can even be a section
before it, which is defined using a "Payload rewind" specifier as shown below.
<HEADER
id1="0001"
id2="0001"
content="%reconnex: <hdatetime>^^<id>^^<hostname>^^<haddress>^^<fld4>^^<fld5>^^<messageid>: <!payload>" />
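One way to read the header examples above is as templates: static text must match literally, each <name> captures a value, and <!payload> marks the remainder that is handed to the Message element. The Python sketch below illustrates that reading only. The header_to_regex helper is hypothetical, the sample log line is made up, values are assumed to contain no whitespace, and the "Payload rewind" form is not handled. It is not how Log Decoder is implemented.

```python
import re

# Matches <name> and <!name> variables inside a header content string
TOKEN = re.compile(r"<(!?)(\w+)>")


def header_to_regex(content):
    """Turn a header content template into a compiled regular expression."""
    parts = []
    pos = 0
    for match in TOKEN.finditer(content):
        # Static text between variables must match literally
        parts.append(re.escape(content[pos:match.start()]))
        is_payload, name = match.group(1), match.group(2)
        if is_payload:
            parts.append("(?P<%s>.*)" % name)     # payload: rest of the event
        else:
            parts.append("(?P<%s>\\S+?)" % name)  # variable: no whitespace (assumption)
        pos = match.end()
    parts.append(re.escape(content[pos:]))
    return re.compile("^" + "".join(parts) + "$")


pattern = header_to_regex("%PIX-<hlevel>-<messageid>: <!payload>")
sample = "%PIX-6-302013: Built outbound TCP connection"  # made-up sample event
result = pattern.match(sample)
print(result.groupdict())
```

A line that lacks the static "%PIX-" token does not match at all, which is why static tokens make a header more selective.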
Message ID:
Look for a unique 'Identifier field' in the Log events of a particular event source that has
a finite set of values across all its occurrences in the log events.
The logic behind choosing the message ID is to group events together and analyze which
value makes each event unique. Preferably, the unique value will be located in the same
place for each event.
The message ID that is referenced in the Header is registered as the 'id2' field, or 'vid', in the
message definition.
Some commonly useful fields are an 'event type' or 'action' field, which are unique to a
set of events and/or have a list of values defined by the vendor.
Fields which usually do not have a finite set of values are not recommended for use as
message IDs.
Examples -
o Usernames,
o Hostnames,
o Event-ids (which are more of Serial numbers and not unique IDs associated with
particular event type(s)),
o Process IDs (Mostly in Unix environments - these are again more of Serial
numbers and not unique IDs associated with particular processes/daemons)
o Initial words of sentences
If such fields are used as Message IDs, the chances of getting unknowns increase, because
every new variant of such a field results in a new 'identifier' which is not supported in
the parser.
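A quick way to apply the "finite set of values" test is to count how many distinct values each candidate field takes across a sample of events. The Python sketch below does this for events that have already been split into fields. The field names and sample values are hypothetical, and the function is only an aid for choosing a message ID, not part of any NetWitness tool.

```python
from collections import defaultdict


def distinct_value_counts(events):
    """Count how many distinct values each field takes across the events."""
    seen = defaultdict(set)
    for event in events:
        for field, value in event.items():
            seen[field].add(value)
    return {field: len(values) for field, values in seen.items()}


# Hypothetical, already-tokenized sample events
events = [
    {"event_type": "LOGIN", "user": "alice", "pid": "5948"},
    {"event_type": "LOGOUT", "user": "bob", "pid": "5949"},
    {"event_type": "LOGIN", "user": "carol", "pid": "5950"},
    {"event_type": "LOGOUT", "user": "dave", "pid": "5951"},
]
counts = distinct_value_counts(events)
for field, count in sorted(counts.items(), key=lambda item: item[1]):
    print(field, count)
```

Fields whose count stays small as the sample grows (here event_type) are candidates for the message ID; fields whose count grows with the number of events (user, pid) are not.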
Best Practices:
Try to define the Header in such a way that it covers the maximum possible variations in the
Log Format.
Use as many Static tokens (static words that commonly occur within the log formats) as possible;
this makes the Header strong and device discovery more efficient.
The more generic headers should always be placed after the specific ones.
Parse the meta values in the header itself if there are certain fields that are of significant
value and are not going to be parsed in the <payload>.
Message Element
Each message element in the XML file has the following attributes:
id1
id2
eventcategory
functions
content
Sample ref: Cisco Ironport WSA parser
<MESSAGE
id1="CONNECT:01"
id2="CONNECT"
eventcategory="1204000000"
functions="<@domain:*URL($DOMAIN,url)><@fqdn:*URL($FQDN,url)><@web_domain:*URL($FQDN,url)><@web_root:*URL($ROOT,url)><@web_ref_domain:*URL($DOMAIN,web_referer)><@web_ref_page:*URL($PAGE,web_referer)><@web_ref_query:*URL($QUERY,web_referer)><@web_ref_root:*URL($ROOT,web_referer)><@webpage:*URL($PAGE,url)><@event_time:*EVNTTIME($MSG,'%X',fld1)><@msg:*PARMVAL($MSG)>"
content="<fld1>.<fld2> <duration_string> <saddr>
<action>/<resultcode> <sbytes> <web_method> <url>
<username> <fld4>/<fld5> <content_type> <policyname>
<<<info>> <fld7> s-ip= <daddr> s-port= <dport>
webcat-code= { - | <filter> } cs-version= <fld11> cs-auth-group=
<group_object> c-port= <sport> cs-bytes= <rbytes> wbrs-score= { ns |
<reputation_num> } wbrs-threat-reason= <result> wbrs-threat-type=
<category> cs-user-agent= { - | <user_agent> } cs-referer= { - |
<web_referer> } cs-cookie= { - | <web_cookie> }" />
Event Categories
NOTE: Do not decide on the event category based on the overall function of the event
source.
EXAMPLE:
If it's a router and the event is about successful user authentication, use the event category
for User Authentication and not one related to Network Activity or Communication.
Content
Fine Parsing:
Using Static tokens helps make the message definition strong and unique.
Example:
The words marked in brown repeat and hence can be kept static, while the values in
green are variables carrying useful information which needs to be captured into Meta keys.
Generic Parsing:
If some logs do not present much information, they can be parsed generically,
meaning multiple fields can be parsed into a few Meta keys, creating buckets.
Example:
The text highlighted in green can be parsed to a single Meta key (<fld>) because it
does not give any security-related information.
%CISCOIPORTWSA-4: Mon Jul 6 12:30:54 2009 Info: Begin Logfile
Aug 20 23:17:18 backup_cubs[5948]: [ID 702911 local3.info] Status of clone-split is 0
But in this case, these need to be ordered below the more specific message definitions.
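The ordering advice can be illustrated with a short Python sketch. It assumes that definitions are tried in file order and that the first one that matches wins, which is what the advice implies but is stated here as an assumption. The regular expressions are stand-ins for message definitions, and the names and payloads are hypothetical.

```python
import re


def first_match(definitions, payload):
    """Return the name of the first definition whose pattern matches."""
    for name, pattern in definitions:
        if re.match(pattern, payload):
            return name
    return None


# Stand-ins for a specific and a generic message definition (hypothetical)
specific = ("LOGFILE_BEGIN", r"^Info: Begin Logfile$")
generic = ("GENERIC_INFO", r"^Info: .*$")

payload = "Info: Begin Logfile"
print(first_match([specific, generic], payload))  # specific definition listed first
print(first_match([generic, specific], payload))  # generic definition listed first
```

With the generic definition listed first, it shadows the specific one and the more detailed parsing is never reached.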
Functions
Functions are used for further processing of the Meta values that are extracted from the payload.
When using any form of function in a content string, it is Best Practice to place the
functions in a separate 'functions' attribute.
Functions are evaluated from left to right.
<MESSAGE
id1="CONNECT:01"
id2="CONNECT"
functions="<@msg:*PARMVAL($MSG)><@event_time:*EVNTTIME($MSG,'%D/%B/%W:%N:%U:%O',fld1)>"
content=" ...................." />
INFO:
For more information on the various functors available in Log Decoder, refer to: Log
Parsers Functors
Formatting Best Practices
All Device XML files should be formatted by placing starting lines ("<") with no
leading spacing, and XML contents (such as the content line) two tabs in.
Messages should ideally be ordered alphabetically (by id2).
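The alphabetical ordering can be checked mechanically. The Python sketch below takes the id2 values in the order they appear in a file and reports adjacent pairs that break the order. The sample id2 values are hypothetical, and the case-insensitive comparison is an assumption, since the guideline does not say how case should be treated.

```python
def out_of_order(id2_values):
    """Return adjacent (previous, current) pairs that break alphabetical order."""
    pairs = zip(id2_values, id2_values[1:])
    return [(prev, cur) for prev, cur in pairs if prev.lower() > cur.lower()]


# Hypothetical id2 values, listed in file order
ids = ["CONNECT", "DENIED", "ALLOW"]
print(out_of_order(ids))
```

An empty result means the messages are already in order.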
Choosing Meta Keys
Meta Keys from Table Map/ Master Variable Guide
It is extremely important to use these Meta Keys consistently, as this data is further used
for Threat Analytics such as Application Rules, Reports, ESA rules, etc.
All the keys must be chosen from the Table Map XML.
Log Decoder recognizes only the keys that are specified in the Table Map XML file.
All of these meta keys are described in the Master Variables Guide.
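Before running a parser through the official tools, a rough pre-check can flag variables in a content string that are not in the list of known keys. The Python sketch below is such a pre-check under stated assumptions: the known_keys set is a small hypothetical sample rather than the real Table Map, fld# placeholders are skipped, and the unknown_keys helper is not part of any NetWitness tool.

```python
import re

VARIABLE = re.compile(r"<(\w+)>")       # <name> variables in a content string
PLACEHOLDER = re.compile(r"fld\d+")     # fld# placeholders are allowed


def unknown_keys(content, known_keys):
    """Return the variables in content that are neither known keys nor fld#."""
    found = set(VARIABLE.findall(content))
    return sorted(
        key for key in found
        if key not in known_keys and not PLACEHOLDER.fullmatch(key)
    )


# Hypothetical subset of keys; the real list comes from the Table Map XML
known_keys = {"saddr", "action", "resultcode", "url", "username"}
content = "<fld1> <saddr> <action>/<resultcode> <url> <usrname>"
print(unknown_keys(content, known_keys))
```

Here the misspelled usrname is reported, which is the kind of mistake that is easier to catch before loading the parser.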
Meta Keys not present in Table-map
While creating the Device XML, there may be fields which contain information but do
not map to any variables in the Table Map. In this situation, a variable named "fld#" can
be used as a placeholder, where # is a number.
o The data in <fld> meta is not stored anywhere on Log Decoder or shown on
Investigation UI unless it is indexed.
o Whenever a situation arises where an fld variable should be used, make sure you
are consistent! If the same data appears in multiple messages, use the same fld#
for that data
Behind the scenes, every meta key should be taken from the Master Variable Guide, or else the
Syntax Checker tool (described later) will throw errors.
If a different piece of data appears, use a different fld#. Make sure to
document the fld usage at the top of the XML file, so that it is understood on our end
what purpose the variable served (for when an XML needs an update). Note that even if
the fld is used to capture junk info (a varying number of spaces, for example), consistency
should be maintained and the variable use should be documented.
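To help keep fld usage consistent and documented, the content strings can be scanned for fld# variables and the result used as a starting point for the usage note at the top of the XML file. The Python sketch below assumes the content strings have already been extracted into a dictionary keyed by message id; the message ids and content strings shown are hypothetical.

```python
import re
from collections import defaultdict

FLD = re.compile(r"<(fld\d+)>")  # matches <fld1>, <fld7>, ... but not <hfld1>


def fld_usage(messages):
    """Map each fld# variable to the message ids whose content uses it."""
    usage = defaultdict(list)
    for message_id, content in messages.items():
        for name in sorted(set(FLD.findall(content))):
            usage[name].append(message_id)
    return dict(usage)


# Hypothetical message ids and content strings
messages = {
    "CONNECT:01": "<fld1>.<fld2> <saddr> <action>",
    "DENIED:01": "<fld1>.<fld2> <saddr> <fld7>",
}
usage = fld_usage(messages)
for name, ids in sorted(usage.items()):
    print("%s used in: %s" % (name, ", ".join(ids)))
```

The listing shows where each fld# appears; whether the same fld# really holds the same kind of data in every message still has to be confirmed by reading the definitions.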
Unit Testing Log parser
NW Log player
This tool is used to inject/replay log events directly into the Log Decoder
REST API
Log Decoder logs are a good source for tracking the parsing that is going on:
info,audit,warning,failure,LogParse=debug|info|audit|warning|failure,Parse=debug|info|audit|warning|failure
ESI Tool
https://community.rsa.com/community/products/netwitness/blog/2017/04/24/rsa-netwitness-esi-10-beta-3
Syntax Checker
This tool is used to check the Parser XML for syntax issues that might prevent it from working
in the Log Decoder (LD) environment.
NW Console