
Lesson 4 Collecting and Querying Security

The document discusses configuring log review and security information and event management (SIEM) tools to collect and analyze security monitoring data. It describes deploying a SIEM program, major commercial and open-source SIEM products, and using SIEMs to automate the security intelligence cycle and generate actionable insights from log and event data.

Uploaded by

Dickson Pamin
Copyright
© All Rights Reserved

The Official CompTIA CySA+ Student Guide (Exam CS0-002) | 137

Lesson 4
Collecting and Querying Security
Monitoring Data

Lesson Introduction
Security monitoring depends to a great extent on the use of data captured in network
traces, log files, and host-based scanners. Collecting this data into a single repository
for analysis—a security information and event management (SIEM) system—will be a
core part of your role as a cybersecurity analyst.

Lesson Objectives
In this lesson you will:
• Configure log review and SIEM tools.

• Analyze and query logs and SIEM data.

Lesson 4: Collecting and Querying Security Monitoring Data

LICENSED FOR USE ONLY BY: AFENDEY JINIR · 14899581 · JAN 18 2022

Topic 4A
Configure Log Review and SIEM Tools

EXAM OBJECTIVES COVERED


3.1 Given a scenario, analyze data as part of security monitoring activities. 

Log review is a critical part of security assurance. Referring to the logs only after a
major incident misses the opportunity to identify threats and vulnerabilities early
and to respond proactively. There are many types of logs and log formats, however,
so you must be able to configure systems that can aggregate and correlate data from
these different log sources and produce actionable intelligence.

Security Information and Event Management (SIEM) Deployment
Not all security incidents will be revealed by a single event. Taken in combination,
events that seem completely valid and proper on their own may reveal a security
problem. For example, your virtual private network (VPN) logs show that Jane Doe,
one of your sales representatives who regularly travels to Asia, has logged in to your
network from a location in Beijing. Moments later, your radio-frequency identification
(RFID) physical security logging system shows that Jane has swiped her ID card at the
front door of your corporate office in Downers Grove, IL. While neither of these events
would individually show up as an anomaly, when correlated they provide good evidence
that you have a security problem.
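
The correlation described above can be sketched in a few lines of Python. This is only an illustration, not how any particular SIEM implements correlation rules: the `Event` record, the `find_location_conflicts` function, the 30-minute window, and the sample timestamps are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Event:
    user: str
    source: str       # hypothetical source label, e.g. "vpn" or "badge"
    location: str
    timestamp: datetime


def find_location_conflicts(events, window=timedelta(minutes=30)):
    """Return pairs of events for the same user, from different sources and
    different locations, that occur within `window` of each other."""
    conflicts = []
    ordered = sorted(events, key=lambda e: e.timestamp)
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if second.timestamp - first.timestamp > window:
                break  # the list is sorted, so later events are further apart
            if (first.user == second.user
                    and first.source != second.source
                    and first.location != second.location):
                conflicts.append((first, second))
    return conflicts


# Illustrative data only: neither event is suspicious on its own.
events = [
    Event("jdoe", "vpn", "Beijing", datetime(2020, 1, 6, 9, 0)),
    Event("jdoe", "badge", "Downers Grove, IL", datetime(2020, 1, 6, 9, 2)),
    Event("asmith", "badge", "Downers Grove, IL", datetime(2020, 1, 6, 9, 5)),
]

for first, second in find_location_conflicts(events):
    print(f"ALERT {first.user}: {first.source} at {first.location} and "
          f"{second.source} at {second.location} within "
          f"{second.timestamp - first.timestamp}")
```

Only the pair of events for jdoe is reported, because the two sources place the same user in different locations within the window.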
Security information and event management (SIEM) solutions provide real-time
or near-real-time analysis of logs and alerts generated by network hardware and
applications. SIEM technology is used to provide expanded insights into intrusion
detection and prevention through the aggregation and correlation of security
intelligence. SIEM solutions can be implemented as software, hardware appliances, or
outsourced managed services.
The effective deployment of a SIEM program involves the following considerations:
• Log all relevant events, but avoid cluttering the SIEM with irrelevant data.

• Establish and clearly document the scope of events.

• Develop use cases to define exactly what you do and do not consider a threat.

• Have a plan about what should be done in the event that you are alerted to a threat.

• Establish a robust ticketing process to track all flagged events.

• Schedule regular threat hunting so you don't miss any important events that have
escaped alerts.

• Provide auditors and forensics analysts with a trail of evidence to support their duties.

The following represents some of the major commercial and open-source products
available in the SIEM marketplace.


Splunk
Splunk (splunk.com) is one of the market-leading big data information gathering and
analysis tools. Splunk can import machine-generated data via a connector or visibility
add-on. Connectors exist for most NOS and application platforms. The data is indexed
as it is retrieved and written to a data store. The historical or real-time data captured
by Splunk can then be analyzed using searches, written in Splunk's Search Processing
Language (SPL). The results of searches can be presented using visualization tools in
custom dashboards and reports, or configured as triggers for alerts and notifications.
Splunk can be installed as local enterprise software or used as a cloud solution. There
is also a Splunk Light product for smaller networks and a dedicated Enterprise Security
module. The security module includes pre-configured dashboards, security intelligence
searches, and incident response workflows.

ELK/Elastic Stack
The ELK Stack (elastic.co), now the Elastic Stack with the addition of Beats, is a collection
of tools providing SIEM functionality:
• Elasticsearch—The query and analytics tool.

• Logstash—Log collection and normalization.

• Kibana—A visualization tool.

• Beats—Endpoint log collection agents.

The ELK Stack can be implemented locally or it can be invoked as a cloud service.

ArcSight
ArcSight (microfocus.com/en-us/products/siem-security-information-event-management/overview)
is a vendor of SIEM log management and analytics software,
now owned by HP, via the affiliated company Micro Focus. As well as cybersecurity
intelligence and response, one of the crucial functions of enterprise SIEMs like ArcSight
is the ability to provide compliance reporting for legislation and regulations such as
HIPAA, SOX, and PCI DSS.

QRadar
QRadar (ibm.com/security/security-intelligence/qradar) is IBM's SIEM log management,
analytics, and compliance reporting platform.

AlienVault and OSSIM (Open-Source Security Information Management)
Open-Source Security Information Management (OSSIM) is a SIEM product developed
by AlienVault (alienvault.com/products/ossim), who market commercial versions of
it. AlienVault is now owned by AT&T and is being rebranded as AT&T Cybersecurity. As
well as standard SIEM functions such as asset discovery and log management, OSSIM
can integrate other open-source tools, such as the Snort IDS and OpenVAS vulnerability
scanner, and provide an integrated web administrative tool to manage the whole
security environment.

Graylog
Graylog (graylog.org) is an open-source SIEM with an enterprise version focused on
compliance and supporting IT operations and DevOps.


Security Data Collection and Use Cases


In many cases, intelligence loses value over time. So, the intelligence that you capture
and analyze in real time or near real time would be the most valuable. In some cases,
such timely intelligence might enable you to limit or completely avoid the damage
resulting from an attack. But gathering and analyzing security intelligence takes a lot
of effort. Many tedious tasks are involved in the process: identifying relevant data,
collecting it, transforming it into a useful form, aggregating different sources and
correlating them, analyzing the correlated data to find patterns that are significant for
security, and finally, identifying actions you should take in response to those significant
security patterns.

SIEM's presence in the security intelligence cycle.

SIEMs can be configured to automate much of this security intelligence cycle,
predominantly in the collection and processing phases, and generate actionable
insights more quickly than manual or piecemeal log collection methods. SIEMs can
even automate some of the tasks involved in analysis, production, and dissemination.
Some of your analysis work can be reduced through careful planning and direction
on the front-end of the life cycle. For example, in the process of evaluating what
information you will collect to meet your security and compliance requirements,
you are conducting a front-end analysis. This process will save you (and the SIEM)
significant work later. While a SIEM could collect all the logs across your systems,
this is not a good idea. It is best to configure the SIEM to focus on the events related
to security and compliance that you need to know about, which you have already
identified through your risk management analysis. Too much information can bog
down the work performed by the SIEM, create unnecessary network traffic, and create
more work for you when it's time to analyze information produced by the SIEM.
Early SIEMs were hard to configure, limited in their capabilities, and required significant
expertise to get the most value out of them. Some users found that they simply added
to the noise, providing another information source and set of alerts to respond to
without providing useful insights or efficiencies. All alerting systems suffer from the
problems of false positives and false negatives. False negatives mean that security
administrators are exposed to threats without being aware of them, while false
positives overwhelm analysis and response resources. To mitigate risks from false
indicators, a successful SIEM deployment must include development of use cases. A
use case is a specific condition that should be reported, such as suspicious log-ons to a
high-value asset by privileged accounts or a process executing from an administrative
share. A template developed to support a use case specifies the data sources that will
contain indicators of the event, the query strings used to correlate indicators, and the
actions that a detected event should trigger. Use cases are identified and constructed
through threat modeling, but in general terms, you should try to capture at least the
five Ws:
• When the event started (and ended, if relevant).

• Who was involved in the event.

• What happened, with specific detail to distinguish the nature of the event from
other events.

• Where it happened—on which host, file system, network port, and so forth.

• Where the event originated (for example, a session initiated from an outside IP
address over a VPN connection).
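
As a rough sketch of what a use case template might hold, the following Python record groups the data sources, query string, and triggered actions described above, and checks an alert against the five Ws. The `UseCase` class, its field names, and the sample alert are hypothetical; real SIEM products define use cases in their own rule or query formats.

```python
from dataclasses import dataclass, field


@dataclass
class UseCase:
    name: str
    data_sources: list   # logs expected to contain indicators of the event
    query: str           # correlation query, in whatever language the SIEM uses
    actions: list        # what a detected event should trigger
    # The five Ws that a resulting alert should capture (assumed field names).
    required_fields: list = field(default_factory=lambda: [
        "when", "who", "what", "where", "origin"])

    def missing_fields(self, alert: dict) -> list:
        """List the required fields that an alert record fails to supply."""
        return [name for name in self.required_fields if not alert.get(name)]


privileged_logon = UseCase(
    name="Suspicious privileged log-on to a high-value asset",
    data_sources=["Windows Security log", "VPN log"],
    query="<SIEM-specific query string goes here>",
    actions=["open ticket", "notify on-call analyst"],
)

# A sample alert that omits where the session originated.
alert = {"when": "2020-01-01T00:00:01Z", "who": "administrator",
         "what": "interactive log-on", "where": "DC1"}
print(privileged_logon.missing_fields(alert))  # ['origin']
```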

To learn more, watch the video “Configuring SIEM Agents” on the CompTIA Learning Center.

Security Data Normalization


Security data comes from a wide variety of sources. In its raw form, some of that data
may not be particularly useful for analysis. To produce actionable intelligence, patterns
or anomalies must be identified within the data, which point toward a problem or
vulnerability. Whether data is being scanned by humans or by software, the data may
need to be reformatted or restructured to facilitate the scanning and analysis process.
SIEMs typically collect data from network appliances, servers, and clients in one or
more of the following ways:
• Agent-based—With this approach, you must install an agent service on each host.
As events occur on the host, logging data is filtered, aggregated, and normalized
at the host, then sent to the SIEM server for analysis and storage. Agents could be
configured to forward event and application logs, such as the Elastic Stack's Beats
agents (elastic.co/products/beats), or intrusion detection data, such as OSSEC
(ossec.net).

• Listener/collector—Rather than installing an agent, hosts can be configured to
push updates to the SIEM server using a protocol such as syslog or Simple Network
Management Protocol (SNMP). A process runs on the management server to parse
and normalize each log/monitoring source.

• Sensor—As well as log data, the SIEM might collect packet captures and traffic flow
data from sniffers. Often, the SIEM software can be configured in sensor mode and
deployed to different points on the network. The sensor instances then forward
network traffic information back to the main management instance.


In this network, the SIEM aggregates network traffic data from SPAN (port mirroring) and TAP
sensors placed at strategic locations. Data from client workstations is collected from IDS/EDR
agents running on hosts. Security data is collected from application and access logs using either
agents or the Syslog protocol. (Images © 123rf.com)

Parsing and Normalization


Aggregating data from multiple sources is a complex process if the sources use
different formats. There are many different formats for logs, such as proprietary
binary formats, tab-separated or comma-separated values (CSV), database log storage
systems, syslog, and Simple Network Management Protocol (SNMP). Some tools are
oriented toward using eXtensible Markup Language (XML) or JavaScript Object Notation
(JSON) formatted output. Some formats may be directly readable through a text editor,
while others are not. There may be simple encoding differences, such as whether
Linux-style or Windows-style end-of-line characters are used, or whether text is ASCII,
ANSI, or Unicode. SIEM solutions need a way of standardizing the information from
these diverse sources.
SIEM software features connectors or plug-ins to collect and interpret (or parse) the
logs from distinct types of systems and to account for differences between vendor
implementations. Usually parsing will be carried out using regular expressions tailored
to each log file format to identify attributes and content that can be mapped to
standard fields in the SIEM's reporting and analysis tools.
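
To illustrate regular-expression parsing, here is a minimal sketch that maps one web server access log line (Common Log Format) onto a handful of normalized fields. The output field names (`src_ip`, `action`, `target`, and so on) are assumptions for the example, not the schema of any specific SIEM, and the sample line is made up.

```python
import re
from datetime import datetime, timezone

# One named group per attribute we want to map to a standard field.
ACCESS_LOG = re.compile(
    r'(?P<src_ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def normalize(line):
    """Parse one access log line into a normalized record, or None."""
    match = ACCESS_LOG.match(line)
    if match is None:
        return None  # a real connector would log or queue unparsed lines
    local_time = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z")
    return {
        "timestamp": local_time.astimezone(timezone.utc).isoformat(),
        "src_ip": match["src_ip"],
        "user": None if match["user"] == "-" else match["user"],
        "action": match["method"],
        "target": match["url"],
        "status": int(match["status"]),
    }


SAMPLE_LINE = ('10.1.0.101 - - [31/Dec/2019:19:00:01 -0500] '
               '"GET /index.html HTTP/1.1" 200 1043')
record = normalize(SAMPLE_LINE)
print(record)
```

Each log format needs its own expression, but every parser emits the same field names, which is what lets the analysis tools treat records from different sources uniformly.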

Date/Time Synchronization
Another processing challenge is the timestamps used in each log. Hosts might use
incorrect internal clock settings, or settings that are correct for a different time zone, or
record the timestamp in a non-standard way (tools.ietf.org/html/rfc3339). These issues
can make it difficult to correlate events and reconstruct time sequences. Try to ensure
that all logging sources are synchronized to the same time source, using Network Time
Protocol (NTP), for instance. The system also needs to deal with varying time zones
and daylight saving time changes consistently. If the SIEM cannot correct for these
variations, one option is to ensure that all logging sources record timestamps in the
UTC time zone. For example, an ISO 8601/RFC 3339 date/timestamp uses the following
format:
2020-01-01T00:00:01Z
This is one second past midnight on New Year's Day 2020 at the Greenwich Meridian.
The Z indicates that there is no time zone offset, so the date/time value represents
Coordinated Universal Time (UTC). At the same time New Year's Day is being celebrated
in Greenwich, the local time in New York (UTC-5) would be recorded as:
2019-12-31T19:00:01-05:00
Coordinated Universal Time (UTC) is a time standard, not a time zone, but it always
corresponds to the current time in the Greenwich Mean Time (GMT) time zone. A date stamp
in GMT should be recorded as 2020-01-01T00:00:01+00:00. RFC 3339
allows the use of -00:00 to indicate that the time zone is unknown.
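
The following sketch shows one way to convert offset-bearing timestamps to UTC before correlating them, using only the Python standard library. The `to_utc` helper is a hypothetical name, and the timestamps are written in the hh:mm:ss form that RFC 3339 defines.

```python
from datetime import datetime, timezone


def to_utc(stamp: str) -> datetime:
    """Parse an RFC 3339-style timestamp and return it as a UTC datetime."""
    # Older Python versions do not accept a trailing "Z", so rewrite it
    # as an explicit zero offset first.
    if stamp.endswith("Z"):
        stamp = stamp[:-1] + "+00:00"
    parsed = datetime.fromisoformat(stamp)
    if parsed.tzinfo is None:
        raise ValueError("timestamp has no UTC offset: " + stamp)
    return parsed.astimezone(timezone.utc)


greenwich = to_utc("2020-01-01T00:00:01Z")
new_york = to_utc("2019-12-31T19:00:01-05:00")

# Both strings describe the same instant once the offset is applied.
print(greenwich == new_york)  # True
print(new_york.isoformat())   # 2020-01-01T00:00:01+00:00
```

Rejecting timestamps that carry no offset is a deliberate choice in this sketch: guessing the zone of a naive timestamp is exactly what makes timelines hard to reconstruct.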

Secure Logging
Logging requires sufficient IT resources because it can be both disk- and network-
intensive. Large organizations can generate gigabytes or even terabytes of log data
every hour. Analyzing such large volumes of data requires substantial CPU and system
memory resources. It is also important to configure a secure channel so that an
attacker cannot tamper with the logs being sent to the SIEM. The data store itself must
have the CIA triad properties of confidentiality, integrity, and availability. 

Event Log
One source of security information is the event log from each network server or client.
Systems such as Microsoft Windows, Apple macOS, and Linux keep a variety of logs to
record events as users and software interact with the system. The format of the logs
varies depending on the system. Information contained within the logs also varies by
system, and in many cases, the type of information that is captured can be configured.
When events are generated, they are placed into log categories. These categories
describe the general nature of the events or what areas of the OS they affect. The five
main categories of Windows event logs are:
• Application—Events generated by applications and services, such as when a service
cannot start.

• Security—Audit events, such as a failed log-on or access to a file being denied.

• System—Events generated by the operating system and its services, such as storage
volume health checks.

• Setup—Events generated during the installation of Windows.

• Forwarded Events—Events that are sent to the local host from other computers.

Several of these event categories further classify events by their severity:


• Information—Successful events.

• Warning—Events that are not necessarily a problem but may be in the future.

• Error—Events that are significant problems and may result in reduced functionality.

• Audit Success/Failure—Events that indicate a user or service either fulfilled or did
not fulfill the system's audit policies. These are unique to the Security log.


Beyond general category and severity, each log entry includes fields for the subject of
the entry, details of the error (if there is one), the event's ID, the source of the event,
and a description of what a warning or error might mean.
Prior to Windows Vista and Windows 7, one limitation of Windows logs was that they
only logged local events; that is, each computer handled logging its own events. This
meant that third-party tools were needed to gain an overall view of messaging for
the entire network. The development of event subscriptions in the latest versions of
Windows and Windows Server allows logging to be configured to forward all events to
a single host, enabling a holistic view of network events. The updated log format (.evtx)
uses XML formatting, making export to third-party applications more straightforward.
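
As an illustration of why XML-formatted events are easier to hand to other tools, the sketch below reads a few fields from an event that has already been exported as XML text. The sample record is hand-written and heavily abbreviated, and the `summarize` function and its output keys are assumptions for the example; event ID 4625 is the Security log's failed log-on event.

```python
import xml.etree.ElementTree as ET

# Namespace used by Windows event XML.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Hand-written, abbreviated sample; real records carry many more fields.
SAMPLE = """<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-Security-Auditing"/>
    <EventID>4625</EventID>
    <TimeCreated SystemTime="2020-01-01T00:00:01.000000000Z"/>
    <Channel>Security</Channel>
    <Computer>DC1</Computer>
  </System>
  <EventData>
    <Data Name="TargetUserName">jdoe</Data>
    <Data Name="IpAddress">10.1.0.101</Data>
  </EventData>
</Event>"""


def summarize(xml_text):
    """Flatten the System header and EventData values into one dictionary."""
    root = ET.fromstring(xml_text)
    system = root.find("e:System", NS)
    record = {
        "event_id": int(system.findtext("e:EventID", namespaces=NS)),
        "channel": system.findtext("e:Channel", namespaces=NS),
        "computer": system.findtext("e:Computer", namespaces=NS),
        "time": system.find("e:TimeCreated", NS).get("SystemTime"),
    }
    for data in root.iterfind("e:EventData/e:Data", NS):
        record[data.get("Name")] = data.text
    return record


summary = summarize(SAMPLE)
print(summary)
```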

Using the Elastic Stack running in Security Onion to view a summary of logs collected from winlogbeat
agents running on Windows servers. (Screenshot Security Onion securityonion.net)

syslog
For non-Windows hosts, events are usually managed by syslog
(tools.ietf.org/html/rfc3164). This was designed to follow a client-server model and so
allows for centralized
collection of events from multiple sources. It also provides an open format for event
logging messages, and as such has become a de facto standard for logging of events
from distributed systems. For example, syslog messages can be generated by Cisco
routers and switches, as well as servers and workstations, and collected in a central
database for viewing and analysis. Syslog is a TCP/IP protocol and can run on most
operating systems. It usually uses UDP port 514.
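
A minimal sketch of the client-server model, using Python's standard `logging.handlers.SysLogHandler` to send one message over UDP. To keep the example self-contained, a loopback socket on an operating-system-assigned port stands in for the central collector rather than a real server on port 514; the logger name and the `demo-app` tag are made up.

```python
import logging
import logging.handlers
import socket

# Stand-in collector: a UDP socket on loopback with an OS-assigned port.
# A real deployment would point the handler at the syslog server instead.
collector = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
collector.bind(("127.0.0.1", 0))
collector.settimeout(2)
host, port = collector.getsockname()

# Client side: forward log records to the collector as syslog datagrams.
handler = logging.handlers.SysLogHandler(
    address=(host, port),
    facility=logging.handlers.SysLogHandler.LOG_AUTH,
)
handler.setFormatter(logging.Formatter("demo-app: %(message)s"))
logger = logging.getLogger("siem-demo")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

logger.warning("failed log-on for user jdoe")

# Server side: read the datagram exactly as it arrived on the wire.
datagram, sender = collector.recvfrom(4096)
print(datagram)

handler.close()
collector.close()
```

Because UDP is connectionless, the client receives no confirmation that the datagram arrived; the sketch only works reliably because sender and receiver share the loopback interface.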


Configuring the pfSense UTM to send log events to a remote syslog server at 10.1.0.248 over the default
UDP port 514. (Screenshot Netgate pfSense pfsense.org.)

A syslog message comprises a PRI code, a header containing a timestamp and host
name, and a message part. The PRI code is calculated from the facility and a severity
level:
• Facility identifies the affected system by using a numeric value from 0 to 23. On
most systems, the values can be interpreted by a short keyword such as "kern"
(operating system kernel), "mail" (mail system), or "auth" (authentication or security).
The facility is multiplied by 8.

• Severity values are a number from 0 (most critical) to 7 (not critical). The severity
value is added to the facility value to derive the PRI.

The PRI code is used by the logging daemon to determine where to write the event or
print an alert. For example, a PRI code of <19> represents the mail facility (19/8=2.xxx
[ignore the remainder]) plus an error-level severity (19-[2*8]=3), so the event would
be written to the mail log and possibly also printed to the administrator's terminal. An
event can be written to multiple logs.
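
The PRI arithmetic can be checked with a short Python sketch. The function names are hypothetical, and the facility table is deliberately partial: it lists only the first few keywords and the local0 to local7 range.

```python
# Severity keywords, indexed by value 0 (most critical) to 7 (least).
SEVERITY_NAMES = ["emerg", "alert", "crit", "err",
                  "warning", "notice", "info", "debug"]

# Partial facility table; unlisted facilities are reported by number.
FACILITY_NAMES = {0: "kern", 1: "user", 2: "mail", 3: "daemon", 4: "auth"}
FACILITY_NAMES.update({16 + n: f"local{n}" for n in range(8)})


def encode_pri(facility: int, severity: int) -> int:
    """PRI = facility * 8 + severity."""
    return facility * 8 + severity


def decode_pri(pri: int):
    """Split a PRI value back into (facility, severity) keywords."""
    if not 0 <= pri <= 191:  # 23 * 8 + 7 is the largest valid PRI
        raise ValueError(f"PRI out of range: {pri}")
    facility, severity = divmod(pri, 8)  # quotient and remainder
    return FACILITY_NAMES.get(facility, str(facility)), SEVERITY_NAMES[severity]


print(encode_pri(2, 3))  # 19
print(decode_pri(19))    # ('mail', 'err')
```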

In a basic syslog implementation, the PRI code is not usually written to the log. On modern
implementations, it is possible to configure the template used by the logging daemon to add
the string representations to the header. Similarly, more information may be added to the
header than just the timestamp and host name.

The message part contains a tag showing the source process plus content. The format
of the content is application dependent. It might use space- or comma-delimited fields
or name/value pairs, such as JSON data.
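
The message structure can be sketched as a parser that separates the PRI, header, tag, and content, and then extracts any name=value pairs from the content. The first sample is the example message given in RFC 3164; the second, including the `fwlog` tag and its fields, is invented for illustration. The regular expression is a simplification and will not match every message a real daemon accepts.

```python
import re

SYSLOG_3164 = re.compile(
    r"<(?P<pri>\d{1,3})>"                                       # PRI code
    r"(?P<timestamp>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "  # header: time
    r"(?P<host>\S+) "                                           # header: host
    r"(?P<tag>[^:\[\s]+)(?:\[(?P<pid>\d+)\])?: ?"               # source process
    r"(?P<content>.*)"                                          # free-form text
)


def parse_syslog(line):
    """Return the parts of a BSD-style syslog message, or None if no match."""
    match = SYSLOG_3164.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["pri"] = int(record["pri"])
    # Content is application dependent; pick out name=value pairs if present.
    record["fields"] = dict(
        token.split("=", 1)
        for token in record["content"].split() if "=" in token
    )
    return record


RFC_EXAMPLE = ("<34>Oct 11 22:14:15 mymachine su: "
               "'su root' failed for lonvick on /dev/pts/8")
KEY_VALUE_EXAMPLE = ("<134>Jan  1 00:00:01 utm1 fwlog[4321]: "
                     "action=block src=10.1.0.101 dst=10.1.0.1 dport=22")

first = parse_syslog(RFC_EXAMPLE)
second = parse_syslog(KEY_VALUE_EXAMPLE)
print(first["tag"], first["content"])
print(second["tag"], second["fields"])
```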
The original syslog protocol has some drawbacks. Using UDP delivery protocols does
not ensure delivery, so messages could be lost in a congested network. Also, it does
not supply basic security controls to ensure confidentiality, integrity, and availability of
log data. Messages are not encrypted in transit or in storage, and any host can send
data to the syslog server, so an attacker could cause a DoS by flooding the server with
misleading data. A man-in-the-middle attack could destroy the integrity of message
data. In response to these shortcomings, newer syslog implementations introduce
security features, many of which are captured in the standard proposal
tools.ietf.org/html/rfc3195, which includes:
• The ability to use TCP (port 1468) for acknowledged delivery, instead of
unacknowledged delivery over UDP (port 514).

• The ability to use Transport Layer Security (TLS) to encrypt message content in
transit.

• Protecting the integrity of message content through authentication and a message
digest algorithm such as Message Digest 5 (MD5) or Secure Hash Algorithm-1
(SHA-1).
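
The idea behind authenticated message digests can be sketched with Python's standard `hmac` module. This is a generic illustration, not the mechanism specified in RFC 3195: it uses HMAC with SHA-256 rather than MD5 or SHA-1, and the shared key, function names, and message are assumptions for the example.

```python
import hashlib
import hmac

# Assumed pre-shared secret known to the sending host and the collector.
SHARED_KEY = b"example-key-change-me"


def sign(message: bytes, key: bytes = SHARED_KEY) -> str:
    """Compute a keyed digest that the sender attaches to the message."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(message: bytes, digest: str, key: bytes = SHARED_KEY) -> bool:
    """Recompute the digest on receipt and compare in constant time."""
    return hmac.compare_digest(sign(message, key), digest)


original = b"<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick"
digest = sign(original)

# An attacker in the middle alters the user name but cannot fix the digest.
tampered = original.replace(b"lonvick", b"nobody")

print(verify(original, digest))  # True
print(verify(tampered, digest))  # False
```

A plain hash would not be enough here, because anyone who can alter the message can also recompute an unkeyed digest; the shared key is what ties the digest to an authenticated sender.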

Syslog implementations may also provide additional features beyond those specified
in RFC 3195, such as message filtering, automated log analysis capabilities, event
response scripting (so you can send alerts through email or text messages, for
example), and alternate message formats.

Note that syslog can refer to the protocol used to transfer log data, the server (daemon)
used to implement logging, or to the format of log entries. Most systems implement an
updated version of the daemon (syslog-ng or rsyslog).

Beyond OS event logs, various log formats have been developed for the specific purpose of
exchanging event data between security tools, such as from an IDS or firewall to a SIEM. You
can find an overview of these formats at secef.net/tutorials.


Review Activity:
Log and SIEM Tools
Answer the following questions to test your understanding of the content covered in
this topic.

1. What options are there for ingesting data from a unified threat
management (UTM) appliance deployed on the network edge to a SIEM?

2. Which two factors do you need to account for when correlating an event
timeline using a SIEM?

3. True or false? Syslog uses a standard format for all message content.

4. Which default port do you need to allow on any internal firewalls to allow a
host to send messages by syslog to a SIEM management server?


Lab Activity:
Configuring SIEM Agents and Collectors

EXAM OBJECTIVE COVERED


3.1 Given a scenario, analyze data as part of security monitoring activities.

Scenario
A security information and event management (SIEM) system assists security
monitoring and incident response by aggregating and correlating log and network
traffic data within a single management and reporting interface. In this lab, you will use
different methods of configuring data sources for shipping logs to the SIEM. We will use
the Security Onion (securityonion.net) appliance, which implements the Elastic Stack
(elastic.co) for SIEM functionality.

Lab Setup
If you are completing this lab using the CompTIA Labs hosted environment, access the
lab using the link provided. Note that you should follow the instructions presented in
the CompTIA Labs interface, NOT the steps below. If you are completing this lab using
a classroom computer, use the VMs installed to Hyper-V on your HOST computer, and
follow the steps below to complete the lab.
Start the VMs used in this lab in the following order, adjusting the memory allocation
first if necessary, and waiting at the ellipsis for the previous VMs to finish booting
before starting the next group. You do not need to connect to a VM until prompted to
do so in the activity steps.
1. UTM1 (512—1024 MB)
2. DC1 (1024—2048 MB)
3. SIEM1 (4096—6144 MB)
...
4. MS1 (1024—2048 MB)
...
5. PC1 (1024—2048 MB)
6. PC2 (512—1024 MB)


If you can allocate more than the minimum amounts of RAM, prioritize SIEM1.

Configure a Sensor Interface
The SIEM1 VM running Security Onion has two interfaces. eth0 is configured with the
IP address 10.1.0.246 and is used as a management interface. eth1 has no IP address
and is used only to sniff traffic from the local network. All the VMs are connected to the
vLOCAL switch implemented in Hyper-V. To enable eth1 to sniff traffic, it is configured
as a port mirroring destination interface. Run a script to configure the source interfaces
and test that the sensor can sniff traffic.
1. On the HOST, open a PowerShell prompt as administrator and run the following
script:

C:\COMPTIA-LABS\LABFILES\EnablePortMirroring.ps1
This script configures the Windows and UTM1 VMs as source interfaces for port
mirroring. Any traffic they process will be copied to the port that SIEM1's eth1
sensor interface is connected to.

2. Close the PowerShell window.

Lab topology—The Hyper-V settings allow SIEM1 to sniff traffic passing over the vLOCAL switch. The
sniffing/sensor interface is separate from the management interface and has no IP address. It operates
as a passive sensor. (Images © 123rf.com)

3. Open a connection window for the SIEM1 VM and log on as siem with the
password Pa$$w0rd.


4. Right-click the desktop and select Open Terminal. Run the following command to
test port mirroring, entering Pa$$w0rd when prompted to confirm the use of
sudo:
sudo tcpdump -ni eth1 ip
The -n switch suppresses name resolution and the ip filter omits IPv6 traffic.
Make sure you can see unicast traffic between other hosts (10.1.0.1 to 10.1.0.2, for
instance).

5. Press CTRL+C to halt the traffic capture.

If you don't see unicast traffic, use the Settings dialog for each VM to verify that the
adapter is set as Source under Network Adapter > Advanced Features > Mirroring
mode.

This traffic is being monitored by the Bro (now called Zeek) passive network
sniffer (zeek.org). Bro's rules reduce this traffic stream to "interesting" events.
These events are written to the SIEM logging engine, powered by the Elastic Stack
(Logstash, Elasticsearch, and Kibana).

6. Run sudo so-status

The output should show that each service is OK. If there is a warning message
that Logstash is still initializing, you might not see immediate results as you
complete the activities.

7. From the desktop, right-click the Kibana icon and select Open. Log on with the
username siem and password Pa$$w0rd.

Kibana is the visualization tool in the Elastic Stack. It is used to configure
dashboards for different categories, showing data in graph and table formats.

8. Under Bro Hunting, select Connections. Scroll down the page to verify that hosts
from the 10.1.0.0/24 network are present.

Bro/Zeek performs passive analysis on traffic received by the sensor and collates statistics and
generates alerts for packets or conversations that match a rule pattern. The Kibana dashboard
presents the data generated by Zeek as visualizations in one or more dashboards. (Screenshot Kibana
in the Elastic Stack elastic.co)


9. Open a connection window for the PC1 VM and log on as 515support\administrator
with the password Pa$$w0rd.
10. Use the desktop shortcut to run Zenmap and perform the default intense scan
against 10.1.0.1.

11. Switch back to the SIEM1 VM. In Kibana, under Alert Data, view the Bro Notices
and NIDS categories for scanning activity alerts. If there are no results, click the
Update button. You may need to be patient or check again after completing other
tasks in this lab.

The NIDS alert is generated by the Snort IDS engine and ruleset.

Viewing a summary of alerts produced by the NIDS sensor and ruleset (Snort) in Kibana. The
classification of the event as "Web Application Attack" is drawn from the classtype attribute in the Snort
rule. (Screenshot Kibana in the Elastic Stack elastic.co)

Install a Beats Agent


As well as capturing network traffic from these hosts, we also want to collect log
information from them. To do this, we can install agent software. We will install the
Beats software for Windows log collection on the Server instances.
1. Open a connection window for the DC1 VM and log on as administrator
with the password Pa$$w0rd.

2. Open the folder C:\LABFILES\winlogbeat in Explorer. Right-click winlogbeat.yml
and select Edit with Notepad++.

As you edit the file, be aware that YAML files are white space sensitive. Settings are
grouped by indentation. You must not use tab to indent, however. This file uses
two spaces per indentation level, which is the widely accepted custom.


3. Scroll to locate the "output.logstash:" section. In line 111 (#hosts:), remove the
leading # and change "localhost" to 10.1.0.246. The line should now read as follows:

hosts: "10.1.0.246:5044"
In the Elastic Stack, Elasticsearch provides the storage and query functionality.
Logstash is an engine for collecting different types of data from various sources
via a pipeline. The pipeline takes inputs—such as syslog or a Beats agent—and
filters the data to normalize it.

4. Save and close the file.

5. Copy the winlogbeat folder to C:\Program Files (x86)\

6. Open PowerShell as administrator and run the following commands to test the
configuration file:

cd 'C:\Program Files (x86)\winlogbeat'
.\winlogbeat test config -c winlogbeat.yml -e
The last part of the output should read "Config OK." If there is an error, check the
edit you made to the configuration file carefully.

7. Run the following two commands to install the agent as a service, and start the
service:

.\install-service-winlogbeat
start-service winlogbeat
8. Switch to the SIEM1 VM. At the terminal, run the following command:

sudo so-allow-view
The output shows that the firewall has already been configured to allow traffic
over the Beats port 5044. Note that Logstash and other components run in
Docker containers.

9. In the Kibana app, check the Bro Notices and NIDS dashboards if you have not
previously seen any alerts. Also check the Beats dashboard under Host Hunting. It
may take time for events from the DC1 VM to start appearing, however. Use the
Update or Refresh button to check for new alerts after you have finished other
tasks in the lab.

Configure Application Logging
The default Beats configuration for a Windows Server just captures the application,
system, and security logs. This will produce a lot of data, much of which will not really
be relevant to incident detection or threat hunting. You will often want to configure
application logs to send data to the SIEM. As an example, on MS1, configure IIS to send
access logs to Event Viewer and install the Beats agent to forward them to the SIEM.
1. Open a connection window for the MS1 VM and log on as 515support\
Administrator with the password Pa$$w0rd.
2. In Server Manager, select Tools > Internet Information Services (IIS) Manager.

3. In IIS Manager, select the MS1 server and double-click the Logging applet in the
middle pane.

Note the options for log format, but leave set to W3C.


4. Under Log Event Destination, select Both log file and ETW event. Click Apply.

5. Use Explorer to copy \\DC1\LABFILES\winlogbeat to C:\Program Files (x86)

6. Open PowerShell as administrator and run the following command to check the
name of the event log capturing IIS access events (ignore any line break):

get-winevent -listlog * | where-object { $_.logname
-like "*IIS*" } | format-list -property logname
7. The query should match three logs. Copy the text Microsoft-IIS-Logging/Logs.
To copy the value from the prompt, select it and press ENTER.
8. Open the C:\Program Files (x86)\winlogbeat\winlogbeat.yml file in Notepad++.
Under "winlogbeat.event_logs:" add the following text to the end of the list,
making sure the line has the same indentation as the one above:

- name: Microsoft-IIS-Logging/Logs
9. Save and close the file, selecting Yes when prompted to switch to Administrator
mode. Switch back to the PowerShell prompt. Run the following commands to
test the configuration file:

cd 'C:\Program Files (x86)\winlogbeat'
.\winlogbeat test config -c winlogbeat.yml -e
The last part of the output should read "Config OK". If there is an error, check the
edits you made to the configuration file carefully.

10. Run the following two commands to install the agent as a service, and start the
service:

.\install-service-winlogbeat
start-service winlogbeat
11. Use the PC1 and PC2 VMs to generate some network activity, such as copying
files from the \\DC1\labfiles share, browsing the
http://updates.corp.515support.com website, and using Zenmap to scan 10.1.0.2.

Install a HIDS Agent


With the Windows clients, we will take a different approach and install the OSSEC
HIDS agent. This should produce only security-relevant information. Note that Security
Onion works with a forked version of OSSEC called Wazuh (wazuh.com).
1. If necessary, open a connection window for the PC1 VM and log on as
515support\administrator with the password Pa$$w0rd.
2. Run C:\LABFILES\wazuh-agent-3.9.5-1.msi to start the installer.

3. Check the I accept box and click Install. Accept the UAC prompt. When setup
completes, check the Run Agent configuration interface box and click Finish.
Confirm the UAC prompt.

4. Open a command prompt as administrator and run the following command:

"C:\Program Files (x86)\ossec-agent\agent-auth.exe" -m 10.1.0.246


This associates the agent with the manager running on SIEM1. It is possible to
authenticate this connection, but we have skipped that for this lab.

5. Switch back to the Wazuh Agent Manager dialog. Click the Refresh button. A key
should be loaded into the Authentication key box.

6. In the Manager IP box, type 10.1.0.246. Click the Save button.

7. Select Manage > Start > OK.

8. Optionally, repeat this process to install the agent on PC2 as well.

Do not be concerned if the agent dialog still shows the status as "Stopped."

Perform a Query
To extract and aggregate records from the SIEM's database, you need to be able to
construct string search patterns as the basis for more complex queries.
1. Switch to the SIEM1 VM. In the Kibana app, check the dashboards for new alert
sources, including the OSSEC dashboard under Host Hunting. Use the Update or
Refresh button to check for new alerts.

2. Click the Management tab, select Index Patterns, and then click the Create
index pattern button.

3. In the Index pattern box, type logstash-ossec-* and then click Next step.

4. From the Time Filter field name list box, select I don't want to use the Time
Filter. Click the Create index pattern button.

5. Click the Discover tab. From the list box currently set to *:logstash-*, select
logstash-ossec-*.

6. In the Search box, type the following filter string and then click the Update button.

agent.name: PC* AND alert_level: >=5


7. From the results, click the small black arrow to expand the record to view all event
data. If there are no results, revisit this step later in the lab.
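The logic of the filter string in step 6 can be sketched in Python as a wildcard match on one
field combined with a numeric comparison on another. This is only an illustration of how the
two conditions combine; the sample records are hypothetical, and Kibana evaluates the query
against the index rather than in this way.

```python
from fnmatch import fnmatchcase

# Hypothetical records; real OSSEC/Wazuh events carry many more fields.
events = [
    {"agent.name": "PC1", "alert_level": 7},
    {"agent.name": "PC2", "alert_level": 3},
    {"agent.name": "DC1", "alert_level": 9},
]


def matches(event):
    # Both conditions must hold, mirroring the AND in the filter string.
    return fnmatchcase(event["agent.name"], "PC*") and event["alert_level"] >= 5


hits = [event for event in events if matches(event)]
print(hits)
```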


Querying source log files in Kibana. (Screenshot Kibana in the Elastic Stack elastic.co)

You can also build fields using the Available fields list. This will show you the top values for a
particular category.

Configure a syslog Source
Some hosts are not compatible with agents, or you may have a configuration or
security reason for not installing an agent. In this scenario, you can use syslog to
transfer event data from the host to the SIEM. To illustrate this, configure remote syslog
on the UTM1 VM, which is running the pfSense security appliance (pfsense.org).
1. Switch to the PC1 VM and open http://10.1.0.254 in the browser.

2. Log on to the web admin app using the username admin and password
Pa$$w0rd. Maximize the window.
3. Select Status > System Logs. Click the Settings tab.

4. Scroll down to the Remote Logging Options section. Check the Enable Remote
Logging box.

5. In the first Remote Log Server(s) box, type 10.1.0.246:514.

6. From Remote Syslog Contents, check only System Events and Firewall Events.
Click Save.

7. Switch to the SIEM1 VM. In the Kibana app, click the Management tab, select
Index Patterns, and then click the Create index pattern button.

8. In the Index pattern box, type logstash-syslog-* and then click Next step.

9. From the Time Filter field name list box, select I don't want to use the Time
Filter. Click the Create index pattern button.

10. Click the Discover tab. From the list box currently set to logstash-ossec-*, select
logstash-syslog-*.


11. In the Search box, type the following filter string and then click the Update button.

syslog-sourceip: 10.1.0.254
12. You have explored some options for ingesting log and network traffic sources into
a SIEM. What will be the next step in configuring this SIEM deployment?

Close the Lab


Discard changes made to the VMs in this lab.
• Switch to the Hyper-V Manager console on the HOST.

• For each VM that is running, right-click and select Revert to set the configuration
back to the saved checkpoint.


Topic 4B
Analyze and Query Logs and SIEM Data

EXAM OBJECTIVES COVERED


3.1 Given a scenario, analyze data as part of security monitoring activities. 

Once you have a system for collecting and normalizing security information, the next
phase in the intelligence cycle is analysis and production. As a CySA+ professional, you
must be able to use query and scripting tools to facilitate analysis of large and complex
datasets. 

SIEM Dashboards
A SIEM will help with most of the regular duties involved in staffing a SOC or CSIRT,
such as:
• Perform triage on alerts, escalating true positives to incident response and
dismissing false positives.

• Review security data sources to check that log collection and information feeds are
functioning as expected.

• Review CTI to identify priorities or potential impacts from events occurring at other
companies and all over the Internet.

You may interpret security incidents differently depending on your judgement of an overall
threat level. You should be alert to internal projects that increase risk—product development
that may entice competitors to try to spy on you or new and recent hires, for instance.
Externally measured threats will also change your overall threat level. For example, a
zero-day vulnerability such as the OpenSSL Heartbleed exploit raises the threat level for all
organizations.

• Perform vulnerability scanning and management.

• Identify opportunities for threat hunting, based on CTI and overall alert and incident
status.

These tasks can be aided by using a SIEM dashboard. A dashboard is configured by
adding widgets, each of which shows key metrics in easily digestible visualizations.
The visualization style should support the use of the metric. Some common
visualizations are:
• Pie chart—Shows the relative balance of classifications, without the overall level.

• Line graph—Shows level over time.

• Bar graph—Compares levels between different classifications.

• Stacked bar graph—Compares levels between different classifications across an
added factor, such as time periods.

• Gauge—Shows a level that has defined limits.


• Table—Shows top/bottom instances, or adds statistical detail to a metric.

Different visualizations in an Elastic Stack dashboard running on Security Onion. The line graph shows
changes in the volume of alerts over time. The pie graph shows the balance of severe-to-informational
alerts. The table shows the events reported most often. (Screenshot Security Onion securityonion.net)

Selecting the right metrics for the dashboard is a critical task. As space is limited, only
information that is directly actionable should be included. Each widget selected should
be designed to support an analyst workflow. Common security key performance
indicators (KPIs) include:
• The number of vulnerabilities, by service type, that have been discovered and
remediated.
• The number of failed log-ons or unauthorized access attempts.
• The number of systems currently out of compliance with security requirements.
• The number of security incidents reported within the last month.
• The average response time for a security incident.
• The average time required to resolve a help-desk call.
• The current number of outstanding or unresolved technical issues in a project or
system.
• The number of employees who have completed security training.
• Percentage of test coverage on applications being developed in-house.
You may also configure multiple dashboards for different audiences. For example, the
metrics discussed above are relevant to the security team. A separate dashboard could
be configured for reporting to management.

To learn more, watch the video “Using SIEM Dashboards” on the CompTIA Learning Center.


Analysis and Detection Methods


A SIEM will apply various rules to its data inputs and output the resulting matches
as alerts for an analyst to investigate. Such systems tend to produce high numbers
of false positives, so it is important to understand the analytic process by which the
system generated the alert. This will help you as you triage alerts, looking to dismiss
false positives and respond to true positives.
The simplest forms of correlation for a machine to perform are signature detection
and rules-based policies, also referred to as conditional analysis. This means that the
software is programmed with signatures of what certain attacks look like, and when the
data inputs from the various log sources match a signature or rule ("IF x AND (y OR z)")
the software generates an alert. The problem with this approach lies in establishing the
right ruleset. Most rules-based SIEM correlation systems generate enormous numbers of
false positives. This sort of system is also blind to zero-day or previously unknown TTPs.
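A conditional rule of the form "IF x AND (y OR z)" can be sketched in a few lines of Python.
The field names and the choice of conditions here are hypothetical; a real SIEM expresses
rules in its own rule language and evaluates them against normalized events.

```python
# Hypothetical events with three boolean observables.
events = [
    {"id": 1, "logon_failed": True, "privileged": True, "external": False},
    {"id": 2, "logon_failed": True, "privileged": False, "external": False},
    {"id": 3, "logon_failed": False, "privileged": True, "external": True},
    {"id": 4, "logon_failed": True, "privileged": False, "external": True},
]


def rule_matches(event):
    """IF x AND (y OR z): failed log-on AND (privileged account OR external source)."""
    x = event["logon_failed"]
    y = event["privileged"]
    z = event["external"]
    return x and (y or z)


# Every event that satisfies the rule becomes an alert for an analyst to triage.
alerts = [event["id"] for event in events if rule_matches(event)]
print(alerts)
```

Note that the rule fires only on exactly the conditions it encodes. An attack that does not
involve a failed log-on (event 3) passes unnoticed, which illustrates why this approach is
blind to techniques that the ruleset does not anticipate.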

Heuristic Analysis and Machine Learning


A basic "IF x AND (y OR z)" type of ruleset can be improved by heuristic analysis and
machine learning. Determining whether a number of observed data points constitute
an indicator and whether related indicators make up an incident depends on a good
understanding of the relationships between the observables and the context in which
they occur. Heuristic analysis means the software can use techniques to determine
whether a set of data points are similar enough to "IF x AND (y OR z)" that an alert
should be generated anyway. 
Human analysts are typically good at interpreting context but work painfully slowly, in
computer terms, and cannot hope to cope with the sheer volume of data and traffic
generated by a typical network. Analysis of past incidents can be used as feedback
to improve rulesets manually, but this is slow work. Modern detection and response
systems make substantial use of machine learning. This means that the system can
receive and process feedback without (much) human intervention using systems such
as honeypots and honeynets to expose the software to real world threats and tune it to
recognize and defeat them.

Behavioral Analysis
Behavior-based detection (or statistical- or profile-based detection) means that the
engine is trained to recognize baseline traffic or expected events associated with a user
account or network device. Anything that deviates from this baseline (outside a defined
level of tolerance) generates an alert. The engine does not keep a record of everything
that has happened and then try to match new traffic to a precise record of what has
gone before. It uses heuristics to generate a statistical model of what the baseline looks
like. It may develop several profiles to model behavior at various times of the day. This
means that the system generates false positives and false negatives until it has had time
to improve its statistical model of what is normal.
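As a rough illustration of a statistical baseline with a tolerance level, the following Python
sketch summarizes past activity per hour of the day and flags observations that fall outside
the expected range. The data, the function names, and the tolerance of three standard
deviations are assumptions made for the example, not a description of any particular product.

```python
from statistics import mean, pstdev

# Hypothetical training data: log-on counts observed for one account,
# keyed by hour of the day, collected over five days.
history = {
    9: [20, 22, 19, 21, 18],
    3: [0, 1, 0, 0, 1],
}


def build_profiles(history):
    """Summarize each hour as (mean, standard deviation) instead of storing every event."""
    return {hour: (mean(counts), pstdev(counts)) for hour, counts in history.items()}


def deviates(profiles, hour, observed, tolerance=3.0):
    """Return True when the observation falls outside mean +/- tolerance * deviation."""
    average, deviation = profiles[hour]
    # A floor of 1.0 stops a nearly flat baseline from alerting on tiny changes.
    return abs(observed - average) > tolerance * max(deviation, 1.0)


profiles = build_profiles(history)
print(deviates(profiles, 9, 23))  # busy but plausible for 09:00
print(deviates(profiles, 3, 15))  # far above the usual 03:00 activity
```

Keeping a separate profile for each hour reflects the point that the engine may model
behavior at various times of the day rather than using one baseline for everything.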

Anomaly Analysis
Anomaly analysis is the process of defining an expected outcome or pattern to
events, and then identifying any events that do not follow these patterns. This is useful
in tools and environments that enable you to set rules. If network traffic or host-based
events do not conform to the rules, then the system will see this as an anomalous
event. For example, the engine may check packet headers or the exchange of packets
in a session against RFC standards and generate an alert if they deviate from strict
RFC compliance. Anomaly analysis is useful because you don't need to rely on known
malicious signatures to identify something unwanted in your organization; relying on
signatures alone can lead to false negatives.


Behavioral analysis differs from anomaly analysis in that the latter prescribes the baseline
for expected patterns, and the former records expected patterns in relation to the entity
being monitored.

Trend Analysis
Trend analysis is the process of detecting patterns within a dataset over time and
using those patterns to make predictions about future events. Applied to security
intelligence, trend analysis can help you to judge that specific events over time are
related and possibly indicate that an attack is imminent. It can also help you avoid
unforeseen negative effects that result from an attack if you can't stop the attack
altogether. Aside from predicting future events, trend analysis also enables you to
review past events through a new lens. For example, when an incident happens, you'll
usually attribute it to one cause. However, after time has passed and you gather more
intelligence, you may gain a new perspective and realize that the nature of the cause is
different than you had originally thought.
A trend is difficult to spot by examining each event in a log file. Instead, you need
software to visualize the incidence of types of event and show how the number or
frequency of those events changes over time. Trend analysis can apply to frequency,
volume, or statistical deviation:
• Frequency-based trend analysis establishes a baseline for a metric, such as number
of NXDOMAIN DNS log events per hour of the day. If the frequency exceeds (or in
some cases undershoots) the threshold for the baseline, then an alert is raised.

• Volume-based trend analysis can be performed with simpler indicators. For
example, one simple metric for determining threat level is log volume. If logs
are growing much faster than they were previously, there is a good chance that
something needs investigating. Volume-based analysis also applies to network
traffic. You might also measure endpoint disk usage. Client workstations don’t
usually need to store data locally, so if a host's disk capacity has suddenly
diminished, it could be a sign that it is being used to stage data for exfiltration.

• Statistical deviation analysis can show when a data point should be treated as
suspicious. Statistical analysis uses the concept of mean (the sum of all values
divided by the number of samples) and standard deviation. Standard deviation is a
measure of how close values in the set are to the mean. If most values are close to
the mean, standard deviation is low. Statistical techniques such as regression and
clustering can be used to determine whether a certain data point is not aligned with
the relationships that most data points share. For example, a cluster graph might
show activity by standard users and privileged users, invoking analysis of behavioral
metrics of what processes each type runs, which systems they access, and so on.
A data point that appears outside the two clusters for standard and administrative
users might indicate some suspicious activity by that account.
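The mean and standard deviation calculations described above can be sketched with Python's
statistics module. The hourly counts and the threshold of three standard deviations are
hypothetical values chosen for illustration; a suitable threshold depends on the metric.

```python
from statistics import mean, stdev

# Hypothetical counts of failed DNS lookups per hour, used as the baseline.
hourly_counts = [12, 15, 11, 14, 13, 12, 16, 14, 13, 15, 12, 13]

baseline_mean = mean(hourly_counts)  # sum of all values divided by the number of samples
baseline_sd = stdev(hourly_counts)   # sample standard deviation


def z_score(value):
    """Number of standard deviations between a value and the baseline mean."""
    return (value - baseline_mean) / baseline_sd


def is_suspicious(value, threshold=3.0):
    # Flags counts that overshoot or undershoot the baseline by more than the threshold.
    return abs(z_score(value)) > threshold


for count in (14, 40, 0):
    print(count, is_suspicious(count))
```

Because the check uses the absolute value of the deviation, it raises an alert for a metric
that undershoots the baseline as well as one that exceeds it.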

Trend analysis depends on choice of metrics to baseline and measure. You should aim
to evaluate the effectiveness of each metric that you track, given the limited resource
that is hours of analyst time. Some areas for trend analysis include:
• Number of alerts and incidents and detection/response times—These types of
metrics show how well security operations are performing. You could potentially
also measure hours lost or impact in cost terms, though these things are hard to
measure and quantify.

• Network and host metrics—You can measure any number of network metrics
(volume of internal and external traffic, numbers of log-ons/log-on failures,
number of active ports, number of authorized or unauthorized devices, instances
of unauthorized software, creation of administrative accounts, and so on), but they
might only be interesting from a security perspective if they can reveal deviations
from the network baseline. Most networks change considerably over time for
genuine business reasons.

• Training/threat awareness education—How well-informed are staff about cyber
threats? You could measure number of programs delivered or use graded
assessments to evaluate knowledge levels.

• Compliance—What percentage of compliance targets are being met? Is the
percentage going up or down? If going down, is this because the compliance targets
are increasing or getting tougher to meet, or because policies are not being followed
correctly?

• Externally measured threat levels—What is the security landscape across the
Internet in general? Are there any major new threats for you to account for?

Trend analysis can provide some defense against sparse attack techniques. The
problem with many monitoring systems is the profusion of false alarms. Each alert
requires so many work hours of human analyst time to investigate. Where the system
is generating high numbers of alerts, a sizable proportion will go uninvestigated. A
sparse attack succeeds either because the sensitivity of the security software has been
turned down to try to reduce false positives or because the actions are buried within
the noise generated by the number of alerts. An attacker can also launch "blinding" or
diversionary attacks to disguise his or her actual target or intention.
In another sense, trend analysis can also refer to narrative-based threat awareness
and intelligence. For example, historically botnets used Internet Relay Chat (IRC) as a
command-and-control mechanism. Security researchers analyzed these techniques
and specified heuristic rulesets that were good at spotting IRC-based C&C mechanisms.
Consequently, the attackers stopped using IRC and started using SSL tunnels to bury
their communications amid the general HTTPS chatter of a regular network. It is vital to
keep up to date with the latest threat intelligence so that your security controls can be
configured and deployed appropriately.

Rule and Query Writing


As you attempt to transform raw data into actionable intelligence, at some point
between data collection and data analysis, you'll need to prepare your raw data to get
it into a form that is useful and efficient for analysis. To some extent, this may be done
for you by your automation tools. You may also have to adjust SIEM rules or manually
prepare some data using capabilities provided by your logging and tracing tools. A
variety of skills can help you in the process of preparing data. Programming, shell
scripting, or batch-file writing skills enable you to develop automation tools. The ability
to write regular expressions can help you search for patterns. 

SIEM Correlation Rules


Correlation means interpreting the relationship between individual data points to
diagnose incidents of significance to the security team. A SIEM correlation rule is a
statement that matches certain conditions. These rules use logical expressions, such as
AND and OR, and operators, such as == (matches), < (less than), > (greater than), and
in (contains). For example, a single user log-on failure is not a condition that should
raise an alert. Multiple user log-on failures for the same account, taking place within
the space of one hour, is more likely to require investigation and is a candidate for
detection by a correlation rule.
Error.LogonFailure > 3 AND LogonFailure.User AND
Duration < 1 hour


One of the problems of this type of rule is that it must store persistent state data,
which takes up memory. The SIEM will only be able to store individual items of state
data for a limited period. If there are many correlation rules that use stateful data,
there will be a significant load on the host's processing resources.
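The log-on failure rule, and the state it has to hold, can be sketched in Python as follows.
This is a simplified illustration under stated assumptions: events are assumed to arrive in
time order, and the names record_failure, WINDOW, and THRESHOLD are hypothetical rather than
taken from any SIEM product.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

WINDOW = timedelta(hours=1)
THRESHOLD = 3  # alert when failures for one account exceed this number

# State held per account: the timestamps of recent failures only.
recent_failures = defaultdict(deque)


def record_failure(user, when):
    """Record a log-on failure and return True if the rule should raise an alert."""
    failures = recent_failures[user]
    failures.append(when)
    # Discard state that has aged out of the window to bound memory use.
    while failures and when - failures[0] >= WINDOW:
        failures.popleft()
    return len(failures) > THRESHOLD


start = datetime(2024, 1, 1, 9, 0)
results = [record_failure("alice", start + timedelta(minutes=10 * i)) for i in range(4)]
print(results)
```

Each tracked account adds its own queue of timestamps, so the memory needed grows with the
number of accounts and the number of stateful rules, which is the cost described above.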
Correlation rules depend on normalized data. For example, an IP address only has
value as data in context. It could be a source or destination IP address, it could
be statically or dynamically assigned, or it could be affected by a network address
translation (NAT) service. All these factors can affect whether indicators in one log,
such as a firewall log, can be correlated with those in another, such as a web server's
application log. Similarly, local time values can be affected by differences
in time zones or poor clock synchronization.
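One common normalization step is converting every timestamp to UTC before comparing events
from different sources. The following Python sketch assumes that each log entry carries an
ISO 8601 timestamp with a UTC offset; the sample values are hypothetical.

```python
from datetime import datetime, timezone


def to_utc(timestamp):
    """Parse an ISO 8601 timestamp that carries a UTC offset and convert it to UTC."""
    return datetime.fromisoformat(timestamp).astimezone(timezone.utc)


# Hypothetical entries: the same moment logged by hosts in different time zones.
firewall_time = to_utc("2024-03-01T09:15:00-05:00")
web_server_time = to_utc("2024-03-01T15:15:00+01:00")

print(firewall_time.isoformat())
print(firewall_time == web_server_time)
```

Timestamps that lack an offset cannot be normalized this way without knowing the source's
time zone, and no conversion can compensate for a clock that is simply wrong.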

SIEM Queries
Where a correlation rule matches data as it is first ingested in the SIEM, a query
extracts records from among all the data stored for review or to show as a visualization.
The basic format of a query is:
Select (Some Fields) Where (Some Set of Conditions)
Sorted By (Some Fields)
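The three parts of this format (which fields to return, which records to keep, and how to
order them) can be illustrated with a small Python sketch. The query function and the sample
events are hypothetical; each SIEM has its own query language with its own syntax.

```python
# Hypothetical stored events.
events = [
    {"time": "09:02", "host": "PC1", "severity": 7, "message": "Multiple log-on failures"},
    {"time": "09:05", "host": "DC1", "severity": 3, "message": "Service started"},
    {"time": "09:01", "host": "PC2", "severity": 9, "message": "Malware signature matched"},
]


def query(records, select, where, sorted_by):
    """Select (some fields) Where (some condition) Sorted By (some field)."""
    matching = [record for record in records if where(record)]
    ordered = sorted(matching, key=lambda record: record[sorted_by])
    return [{field: record[field] for field in select} for record in ordered]


result = query(
    events,
    select=["time", "host"],
    where=lambda record: record["severity"] >= 5,
    sorted_by="time",
)
print(result)
```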
Microsoft's blog introducing Azure log query language
(azure.microsoft.com/en-us/blog/azure-log-analytics-meet-our-new-query-language-2)
provides a useful overview of query syntax. Resources such as the Splunk documentation
for Search Processing Language
(docs.splunk.com/Documentation/Splunk/8.0.0/Search/GetstartedwithSearch) will also help
you to understand the features and capabilities of SIEM search, query, and visualization tools.

To learn more, watch the video “Reviewing Query Log” on the CompTIA Learning Center.

String Search and Piping Commands


Filtering a log to discover data points of interest or writing a SIEM correlation rule
usually involves some sort of string search, typically invoking regular expression (regex)
syntax. A regular expression is a search pattern to match within a given string. The
search pattern is built from the regex syntax. This syntax defines metacharacters that
function as search operators, quantifiers, logic statements, and anchors/boundaries.
The following list illustrates some commonly used elements of regex syntax:
• [ … ] matches a single instance of a character within the brackets. This can
include literals, ranges such as [a-z], and token matches, such as [\s] (white
space) or [\d] (one digit).

• + matches one or more occurrences (quantifier). A quantifier is placed after the
term to match; for example, \s+ matches one or more white space characters.

• * matches zero or more times (quantifier).

• ? matches once or not at all (quantifier).

• {} matches a number of times (quantifier). For example, {2} matches two times,
{2,} matches two or more times, and {2,5} matches two to five times.

• ( … ) defines a matching group, with a regex sequence placed within the
parentheses. Each group can subsequently be referred to by \1 for the first group,
\2 for the second, and so on.


• | the OR operator (logic).

• ^ matches the start of a line only (anchor/boundary).

• $ matches the end of a line only (anchor/boundary).

A complete description of regex syntax is beyond the scope of this course, but you can use
an online reference such as regexr.com or rexegg.com to learn it.
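Several of the elements listed above can be tried out with Python's re module, which supports
this syntax. The log line is hypothetical and the patterns are deliberately simple; for
instance, the IP address pattern does not check that each number is 255 or less.

```python
import re

# Hypothetical log line used only for illustration.
line = "Mar 04 09:15:02 ms1 sshd[412]: Failed password for admin from 10.1.0.101"

# [ ... ] with a range plus the {m,n} quantifier: four dotted groups of one to three digits.
ip = re.search(r"[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}", line)

# ^ anchors the match to the start of the line; \s+ is one or more white space characters.
starts_with_date = re.search(r"^[A-Z][a-z]{2}\s+\d{1,2}", line)

# ( ... ) defines a group and | means OR: capture whichever alternative appears.
outcome = re.search(r"(Failed|Accepted) password", line)

# $ anchors the match to the end of the line; + requires at least one digit.
ends_with_digits = re.search(r"\d+$", line)

print(ip.group())
print(starts_with_date.group())
print(outcome.group(1))
print(ends_with_digits.group())
```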

The grep Command


In Unix-like operating systems, the grep command invokes simple string matching
or regex syntax to search text files for specific strings. This enables you to search the
entire contents of a text file for a specific pattern within each line and display the
matching lines on the screen or dump them to another file. A simple example of grep
usage is as follows:
grep -F 192.168.1.254 access.log
This searches the text file access.log for all lines containing the literal string
192.168.1.254 and prints only those lines to the terminal. The -F
switch instructs grep to treat the pattern as a literal. The following example performs
a similar search in every file within the current directory. Without -F, grep treats the
quoted pattern as a regular expression, so each unescaped period matches any character:
grep "192.168.1.254" *
The following example searches for any IP address in the 192.168.1.0/24 subnet
using regex syntax for the pattern (note that each period must be escaped) within any
file in any directory from the current one. The -r option enables recursion and the -E
option enables extended regex syntax. The pattern is quoted so that the shell passes the
backslashes to grep unchanged, while the period in the target part indicates the current
directory:
grep -rE '192\.168\.1\.[0-9]{1,3}' .
Some of the other options to modify the behavior of grep include:

Option  Description
-i      By default, literal search strings in grep are case-sensitive. This option
        ignores case sensitivity.
-v      Reverses the command's default behavior, returning only lines that do not
        match the given string.
-w      Treats literal search strings as discrete words. By default, the string add
        will also return address. With this option, the string add will only return
        instances of the word add by itself.
-c      Returns the total count of matching lines rather than the lines themselves.
-l      Returns the names of the files with matching lines rather than the lines
        themselves. Primarily used in multi-file grep searches.
-L      Like the behavior of the -v option, in that it returns the names of files
        without matching lines.

In Windows, you can use the find command for basic string matching. The findstr
command supports regex syntax.
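Where neither grep nor findstr is available, similar filtering can be approximated in a
script. The following Python sketch loosely mirrors the behavior of the -i, -v, and -c
options; the grep function defined here and the sample log lines are hypothetical and
reproduce only a small part of what the real command does.

```python
import re


def grep(pattern, lines, ignore_case=False, invert=False, count=False):
    """Return matching lines (or their count), loosely mirroring grep -i, -v, and -c."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    # A line is selected when "matched" differs from "invert" (the -v behavior).
    selected = [line for line in lines if bool(regex.search(line)) != invert]
    return len(selected) if count else selected


# Hypothetical access log lines.
log = [
    "192.168.1.254 GET /index.html",
    "192.168.1.7 GET /login",
    "10.1.0.5 POST /login",
    "192.168.10.254 get /admin",
]

# Any address in 192.168.1.0/24; each period is escaped so it matches literally.
subnet = r"192\.168\.1\.[0-9]{1,3}"

print(grep(subnet, log))
print(grep("GET", log, ignore_case=True, count=True))
print(grep("login", log, invert=True))
```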

The cut and sort Commands


The cut command enables you to specify which text on a line you want to extract
so that your results are easier for you to read. Many cut operations use
the -c option, which enables you to specify which characters to select. Here's a basic
example:
cut -c5 syslog.txt
This will return only the fifth character in each line of the syslog.txt file. You can also
specify multiple characters or a range by using -c#,# and -c#-#,
respectively. Using -c5- selects from the fifth character to the end of the line. The other
major use of cut is with the -f and -d flags. Take the following example:
cut -d " " -f1-4 syslog.txt
The -d flag identifies a delimiter, a character that acts as a separator in the source
string. In this case, the delimiter is a space. The -f flag is like the -c flag, but instead
of selecting characters, it selects the fields separated by the delimiter you specified,
so -f1-4 will return the first four columns.
The sort command can be used to change the output order, using -t to identify
the delimiter and -k for the key field. -r sorts in reverse order. -n specifies using
numerical sort order rather than alphabetical. 
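The following sketch shows cut and sort working on a small comma-separated file. The data (host, port, bytes) is made up for illustration, and -k3,3 is used rather than a bare -k3 so that the sort key stops at the end of the third field.

```shell
#!/bin/bash
# Invented comma-separated sample: host,port,bytes
sample=$(mktemp)
printf 'web1,443,1200\ndb1,5432,90\nweb2,80,15000\n' > "$sample"

# -c: select by character position (characters 1-3 of each line)
first_chars=$(cut -c1-3 "$sample")

# -d and -f: select fields 1 and 3, using a comma as the delimiter
host_bytes=$(cut -d, -f1,3 "$sample")

# Sort on field 3 only (-k3,3), numerically (-n), largest first (-r)
by_bytes=$(sort -t, -k3,3 -n -r "$sample")

# Without -n the same key is compared as text, so 15000 sorts before 90
by_text=$(sort -t, -k3,3 "$sample")

echo "$first_chars"
echo "$host_bytes"
echo "$by_bytes"
echo "$by_text"

rm -f "$sample"
```

Comparing the last two results shows why -n matters whenever the key field holds byte counts, ports, or other numbers.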

Piping
The output of a command can be used as the input for another command, a process
called piping. Using the pipe character (|) causes the following command to take the
output of the previous command as its input. For example, to return only lines in
/var/log/syslog that deal with the NetworkManager process, while also cutting each line so that only
the date, time, source, and process display, you would enter:
grep NetworkManager /var/log/syslog | cut -d " " -f1-5 | sort -t " " -k3
In this example, the grep command feeds into the cut command, and then into the
sort command, which orders the results by the third field (the time), producing a more focused output.
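To see the pipeline at work without touching a real log, the sketch below applies the same grep, cut, and sort chain to three invented syslog-style lines. The host name, process IDs, and messages are assumptions made for the example.

```shell
#!/bin/bash
# Hypothetical syslog-style entries, deliberately out of time order
log=$(mktemp)
cat > "$log" <<'EOF'
Mar 16 10:02:11 siem1 NetworkManager[812]: <info> device eth0 state change
Mar 16 10:02:12 siem1 sshd[1301]: Accepted password for siem
Mar 16 09:58:40 siem1 NetworkManager[812]: <info> dhcp4 lease renewed
EOF

# grep keeps the NetworkManager lines, cut keeps the first five
# space-separated fields, and sort orders the result by the time field
focused=$(grep NetworkManager "$log" | cut -d " " -f1-5 | sort -t " " -k3)

echo "$focused"
rm -f "$log"
```

One caveat with this approach: traditional syslog timestamps pad single-digit days with an extra space (for example, "Mar  6"), which shifts the field numbers that cut sees when the delimiter is a single space.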
The head and tail Commands
The head and tail commands output the first and last 10 lines respectively of a
file you provide. You can also adjust this default value to output more or fewer lines.
The tail tool is useful for reviewing the most recent entries in a log file.
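A minimal sketch of head and tail, using a generated 25-line file in place of a real log. The file contents are invented; the -n option is the standard way to change the default of 10 lines.

```shell
#!/bin/bash
# Build a 25-line stand-in for a log file: "entry 1" through "entry 25"
log=$(mktemp)
seq 1 25 | sed 's/^/entry /' > "$log"

# Default behavior: head stops after line 10, tail starts at line 16
last_of_head=$(head "$log" | tail -n 1)
first_of_tail=$(tail "$log" | head -n 1)

# -n changes how many lines are returned
first_two=$(head -n 2 "$log")
last_three=$(tail -n 3 "$log")

echo "$last_of_head"
echo "$first_of_tail"
echo "$first_two"
echo "$last_three"

rm -f "$log"
```

When watching a live log, tail -f keeps the file open and prints new entries as they are appended.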

To learn more, watch the video “Analyzing, Filtering, and Searching Event Log” on the
CompTIA Learning Center.

Scripting Tools
While issuing a search command sequence manually is useful for one-off analysis, in
many circumstances you might want to run searches on multiple files and according
to a schedule. To do this, you need to use the commands within the context of a script.
We will look at shell scripting languages for Linux (Bash) and Windows (PowerShell), but
be aware that languages such as Python and Ruby are also widely used for automation.

Bash
Bash is a scripting language and command shell for Unix-like systems. It is the default
shell on most Linux distributions and was the default on macOS until zsh replaced it in
macOS Catalina. Tools like grep, cut, and sort are separate utilities rather than part of
Bash itself, but they are available on virtually every system where Bash runs. Beyond
individual command entry, Bash can run complex scripts. Like standard
programming languages, Bash supports elements such as variables, loops, conditional
statements, functions, and more. The following is an example of a simple Bash script
that uses the grep and cut commands:
#!/bin/bash
echo "Pulling NetMan entries..."
grep NetworkManager /var/log/syslog | cut -d " " -f1-5 > netman-log.txt
echo "NetMan log file created"
The first line of the script indicates which interpreter the system should use to run it, as
there are many different scripting languages. The echo lines simply print messages to
the console. The grep line pipes into cut to trim the syslog as before, and redirects (>)
the results to a file called netman-log.txt.
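The script above uses only sequential commands. As a hedged sketch of the other elements mentioned (variables, a loop, a conditional, and a function), the following version searches several rotated log files in turn. The directory, file contents, and the count_matches function name are all hypothetical.

```shell
#!/bin/bash
# Sketch only: a temporary directory stands in for /var/log
log_dir=$(mktemp -d)
printf 'Mar 16 10:00:01 host NetworkManager[1]: up\nMar 16 10:00:02 host cron[2]: job\n' > "$log_dir/syslog"
printf 'Mar 15 09:00:00 host cron[2]: job\n' > "$log_dir/syslog.1"

search_term="NetworkManager"          # variable
report_file="$log_dir/report.txt"

# Function: print how many lines of file $2 contain the string $1
count_matches() {
    # grep exits non-zero when nothing matches, so "|| true" keeps the script going
    grep -c "$1" "$2" || true
}

# Loop over the current log and any rotated copies
for file in "$log_dir"/syslog*; do
    hits=$(count_matches "$search_term" "$file")
    # Conditional: report a count or the absence of matches
    if [ "$hits" -gt 0 ]; then
        echo "$(basename "$file"): $hits matching lines" >> "$report_file"
    else
        echo "$(basename "$file"): no matches" >> "$report_file"
    fi
done

report=$(cat "$report_file")
echo "$report"
rm -rf "$log_dir"
```

A script like this could be run on a schedule, for example from cron, which is the usual way to turn a one-off search into routine monitoring.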

For a more in-depth look at Bash scripting, visit tldp.org/LDP/abs/html.

Newer versions of Windows 10 include a Linux subsystem that supports the Bash shell.

awk
The awk tool is a scripting engine geared toward modifying and extracting data
from files or data streams, which can be useful in preparing data for analysis. Programs
and scripts run in awk are written in the AWK programming language. The awk
keyword is followed by the pattern, the action to be performed, and the file name. The
action to be performed is given within curly braces. The pattern and the action to be
performed should be specified within single quotes. If the pattern is not specified, the
action is performed on all input data; however, if the action is not specified, the entire
line is printed.
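The pattern and action rules described above can be sketched as follows. The sample records (user, status code, bytes) are invented for the example.

```shell
#!/bin/bash
# Hypothetical space-separated records: user, status code, bytes
sample=$(mktemp)
printf 'alice 200 512\nbob 404 0\ncarol 200 2048\nbob 500 0\n' > "$sample"

# Pattern and action: where field 2 equals 200, print fields 1 and 3
ok_sizes=$(awk '$2 == 200 { print $1, $3 }' "$sample")

# Action only: with no pattern, the action runs on every line
all_users=$(awk '{ print $1 }' "$sample")

# Pattern only: with no action, each matching line is printed whole
bob_lines=$(awk '/^bob/' "$sample")

# An END block runs after the last line; here it prints a column total
total_bytes=$(awk '{ sum += $3 } END { print sum }' "$sample")

echo "$ok_sizes"
echo "$all_users"
echo "$bob_lines"
echo "$total_bytes"

rm -f "$sample"
```

Because awk splits each line into numbered fields automatically, it is often a more forgiving choice than cut when the spacing between columns is irregular.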

Windows Management Instrumentation Command-Line (WMIC)


The Windows Management Instrumentation Command-line (WMIC) is used to review
log files on a remote Windows machine. The main alias that you can use in WMIC to
review logs is NTEVENT. NTEVENT will, given a certain input, return log entries that
match your parameters. For example:
wmic NTEVENT WHERE "LogFile='Security' AND EventType=5" GET SourceName,TimeGenerated,Message
This will select all security event log entries whose events are type 5 (audit failure). It
will then output the source, the time the event was generated, and a brief message
about the event. This can be useful for finding specific events based on their details,
without being at the target computer and combing through Event Viewer.

Windows PowerShell
Windows administrators often use PowerShell to manage both local and remote hosts.
PowerShell offers much greater functionality than the traditional Windows command
prompt. PowerShell functions mainly through the use of cmdlets, which are specialized
.NET commands. These cmdlets typically take the syntax of Verb-Noun, such as Set-Date
to change a system's date and time. Like commands in other shells, a cmdlet accepts
arguments (parameters) supplied by the user. PowerShell is also able to execute
scripts written in its language. Like Bash, the PowerShell scripting language supports a
wide variety of control structures.
The following is an example of a PowerShell script:
Write-Host "Retrieving logon failures..."
Get-EventLog -Newest 5 -LogName Security -InstanceId 4625 | select timewritten, message | Out-File C:\log-fail.txt
Write-Host "Log created!"
The Write-Host cmdlets function similarly to echo by printing the given text to the
PowerShell window. The Get-EventLog cmdlet line searches the security event log for
the latest five entries that match an instance ID of 4625, the log-on failure code. The
time the event was logged and a brief descriptive message are then output to the
log-fail.txt file.

Review Activity:
Query Log and SIEM Data Analysis
Answer the following questions to test your understanding of the content covered in
this topic.

1. What type of visualization is most suitable for identifying traffic spikes?

2. You need to analyze the destination IP address and port number from some
firewall data. The data in the iptables file is in the following format:

DATE,FACILITY,CHAIN,IN,SRC,DST,LEN,TOS,PREC,TTL,ID,PROTO,SPT,DPT
Jan 11 05:33:59,lx1 kernel: iptables,INPUT,eth0,10.1.0.102,10.1.0.1,52,0x00,0x00,128,2242,TCP,2564,21
Write the command to select only the necessary data, and sort it by
destination port number.

3. Working with the same data file, write the command to show only the lines
where the destination IP address is 10.1.0.10 and the destination port is 21.

Lab Activity:
Analyzing, Filtering, and Searching
Event Log and syslog Output

EXAM OBJECTIVES COVERED


3.1 Given a scenario, analyze data as part of security monitoring activities.

Scenario
When you have set up appropriate data sources for a security information and event
management (SIEM) system, the next challenge is to extract actionable intelligence
from it. A SIEM will be used to respond to incidents in real time and also to perform
threat hunting for incidents that might not have been detected. Both these use
cases will depend on effective queries, filters, and visualizations so that analysts are
presented with useful information and not overloaded by false positive alerts. To
demonstrate some uses of these tools, we will continue to use the Security Onion
(securityonion.net) appliance.

Lab Setup
If you are completing this lab using the CompTIA Labs hosted environment, access the
lab using the link provided. Note that you should follow the instructions presented in
the CompTIA Labs interface, NOT the steps below. If you are completing this lab using
a classroom computer, use the VMs installed to Hyper-V on your HOST computer, and
follow the steps below to complete the lab.
Start the SIEM1 VM only to use in this lab, adjusting the memory allocation first, if
necessary.

Analyze a Dashboard
Where the Sguil tool is used to manage and categorize alerts, escalating or dismissing
them as appropriate, Squert (squertproject.org) provides an overview of current status.
Analysis at this operational or "big picture" level is just as important as at the tactical
level.
1. Open a connection window for the SIEM1 VM and log on as siem with the
password Pa$$w0rd.

2. From the desktop, right-click the Squert icon and select Open. In the browser,
if prompted, sign in to the app with the username siem and the password
Pa$$w0rd.
3. Click the INTERVAL link to open the date picker. If necessary, select 2020 and
then Mar and Mon16.

You are now looking at the alerts raised by the IDS (Snort) as a result of sample
packet captures that were replayed through the sensor. Note that there are
some other alerts from the OSSEC agent installed on Security Onion. If you were
to configure other sensors and agents, that data would also be collected and
summarized here.

The SC and DC columns show the number of source and destination IPs involved
for each event signature, while the activity column shows the number of events
of that signature per hour. This kind of dashboard is designed to provide overall
threat status reporting. The default view shows only queued events that have not
yet been analyzed and categorized by an incident handler.

4. At the top of the page, select the Summary tab. This tab shows you information
about which signatures, IP addresses, and ports are most active. It also displays
location information for each public IP and summarizes connections by source
and destination IPs and countries.

One function of a SIEM dashboard is pivoting: correlating attributes in one
event with other information stored in the database.

5. Select the Events tab again. Click the Queue box (with the value 24) for the ET
TROJAN Possible Windows executable sent event (2009897).

Additional detail about each instance of each event is shown.

Using the Squert event dashboard. (Screenshot Squert squertproject.org.)

6. Click the first Queue box for the expanded event. The indicators comprising the
event are shown. 

7. Click the first Event ID (3.47). The packet capture underlying the detected event is
shown in a new tab.

Note the GET request for a cryptically named php file with some parameters
whose functionality is opaque. Because this is an old threat, we could look up the functionality
of this malware in a threat database. If it were an unknown threat, we could
isolate the host and use a sandbox to try to determine the code's functions.

8. Close the capME! tab.

9. On the Squert tab, click the source IP address starting 195. A number of
information sources are shown. Select Kibana. In Kibana, click the date picker and
select Last 5 years.

The current view has applied the IP address as a search term in the bar at the top.
You can see how many alerts the address is associated with and how many times
it appears as a source and destination address.

10. Close the browser.

Use grep for Manual Log Analysis


You will not always have access to a SIEM. Some data sources may never have
been configured to ship data to one, and some organizations do not have the budget or
resources to provision and manage one. Consequently, you should also be comfortable
working with Linux commands to extract useful information from log files directly.
1. On the SIEM1 VM, open a terminal. Run the following command to show the
contents of the logging directory:

cd /var/log && ls
This is the primary log folder for Linux. Some of the log files are processed by
syslog, while others are written directly by the application. Most application-
written logs are stored in subdirectories. For example, the web server Apache
writes to /var/log/apache2 or /var/log/httpd.

2. At the terminal, enter man grep and note the options available with the grep
command. The grep command is an extremely useful tool for searching any file,
not just logs.

3. Scroll through the grep manual until you return to the prompt. Alternatively, enter
q to return to the prompt.
4. Run the following command, entering Pa$$w0rd when prompted:

sudo grep root syslog
This shows all instances of the string root in the syslog file. You can search for any
text string in any file this way. As with most things in Linux, these searches are
case-sensitive by default.

5. Run the following command:

sudo grep root syslog*
This command searches for the word "root" in all files that start with syslog,
including syslog.1. The log rotation system usually backs up the last log as .1,
while older logs are gzipped.

While grep does not search gzipped files, zgrep does!

6. Run the following command:

sudo grep -i error syslog
The -i flag makes the search case-insensitive.

7. How would you use grep to look for a negative match for a pattern rather than a
positive match?
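The search steps above can also be rehearsed away from the VM. The sketch below builds a small set of fake rotated logs in a temporary directory, so the file contents are assumptions and no sudo is needed; the grep, grep -i, and zgrep usage mirrors the steps in this exercise.

```shell
#!/bin/bash
# Fake log rotation set: current log, previous log, and a gzipped older log
logdir=$(mktemp -d)
printf 'session opened for user root\nERROR: disk full\n' > "$logdir/syslog"
printf 'Error reading config\nroot login refused\n' > "$logdir/syslog.1"
printf 'old root entry\n' > "$logdir/syslog.2"
gzip "$logdir/syslog.2"    # leaves syslog.2.gz in place of syslog.2

# Searching several files prefixes each result with the file name
rotated=$(cd "$logdir" && grep root syslog syslog.1)

# The search is case-sensitive by default, so lowercase "error" finds nothing
sensitive_count=$(grep -c error "$logdir/syslog" || true)

# -i matches ERROR, Error, and error alike
insensitive=$(cd "$logdir" && grep -i error syslog syslog.1)

# zgrep searches the compressed file without unpacking it first
archived=$(zgrep root "$logdir/syslog.2.gz")

echo "$rotated"
echo "$sensitive_count"
echo "$insensitive"
echo "$archived"

rm -rf "$logdir"
```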

Use Tools to Format Queries


The output from searching multiple log files can be overwhelming. You might be able to
design a tighter search pattern to return fewer matches. On the other hand, if there are
simply a lot of results to scan, you can use formatting tools to make the output easier
to analyze.
1. At the terminal, run the following command:

sudo cut -c1-32 syslog
This command displays the first 32 characters of each log item in the file.
Adjusting for the length of the host name, this sort of command can show the
date and time, user, process name, and process ID of each log item.

2. Run the following command:

sudo cut -c31- syslog
This command displays from character 31 to the end of the line.

One complication with this sort of approach is that logs can be in different formats.

3. Run the following commands and compare the output:

sudo tail auth.log


sudo cat auth.log
sudo cat auth.log | less
If you cat an entire file, you can scroll back using SHIFT+PAGE UP. However, the
terminal may not store enough lines to see the whole thing. Piping the command
to less allows you to page up and down normally. You can press q to return to the
prompt.

4. Run the following commands and compare the output to the auth log:

cd ~/Downloads
head conn-sample.log
This log file is generated by the Bro IDS. Bro usually logs in JSON format, but this
has been changed to tab-delimited in this sample log file. head displays the first
10 lines from the file. In the case of this Bro log, field definitions are included.
When you have a log file using standard delimiters, use cut to extract fields
from the source file. The -f flag enables you to select fields by number. The -d flag
enables you to specify what separates (delimits) each field. The tab is the default,
however, so you do not need to use -d with this file.

5. Use the output to work out the column numbers for source IP (id_orig_h),
destination port (id_resp_p), and orig_bytes (payload data sent by the originator of
the connection). Then, run the command.

6. Run clear to remove the previous output. Pipe the command to sort so that it
is shown in descending order of byte count.

7. Run clear to remove the previous output. Sort by port number in ascending
order and then by byte count in descending order.

8. Run clear to remove the previous output. See if you can construct a regular
expression to filter the output to IPv4 addresses only. You will need to use grep
-E or egrep.
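The techniques in these steps can be sketched on made-up data before you try them on the lab file. The records below use an invented four-column layout (timestamp, source IP, destination port, bytes), so the field numbers will not match conn-sample.log, and the regular expression only checks the dotted-quad shape rather than validating each octet.

```shell
#!/bin/bash
# Hypothetical tab-separated records: timestamp, source IP, destination port, bytes
sample=$(mktemp)
printf '1584350000\t10.1.0.102\t443\t900\n'   > "$sample"
printf '1584350001\tfe80::1\t53\t60\n'       >> "$sample"
printf '1584350002\t10.1.0.101\t53\t75\n'    >> "$sample"
printf '1584350003\t10.1.0.102\t53\t4100\n'  >> "$sample"

tab=$(printf '\t')

# Tab is the default delimiter for cut, so -d is not needed; keep fields 2-4
fields=$(cut -f2-4 "$sample")

# Sort by port ascending (key 2, numeric), then by bytes descending (key 3, numeric, reversed)
sorted=$(printf '%s\n' "$fields" | sort -t "$tab" -k2,2n -k3,3nr)

# Keep rows that begin with four dot-separated groups of one to three digits
ipv4_only=$(printf '%s\n' "$sorted" | grep -E '^([0-9]{1,3}\.){3}[0-9]{1,3}[[:space:]]')

echo "$sorted"
echo "$ipv4_only"

rm -f "$sample"
```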

Close the Lab


Discard changes made to the VMs in this lab.
• Switch to the Hyper-V Manager console on the HOST.

• For each VM that is running, right-click and select Revert to set the configuration
back to the saved checkpoint.

Lesson 4
Summary
You should be able to explain the factors to consider when deploying a SIEM plus the
options for collecting network and log data from diverse sources. You should also be
able to use dashboards, string search, and queries to extract relevant information from
data sources.

Guidelines for Collecting and Querying Security Monitoring Data
Follow these guidelines when you develop or update use of SIEM or other log/security
data collection techniques in your organization:
• Identify data sources for collection plus use cases for filtering and querying the data
to provide alerting about threats that are relevant to your organization.

• Configure the SIEM or manual collection methods to normalize security data to
standard fields and date/time formats, considering sources such as Windows Event
Log and syslog.

• Ensure that logs are stored within secure architecture with appropriate access
permissions and tamper protection.

• Configure one or more dashboards to provide actionable status and alert
information for analysts, plus status information for department managers and
executives.

• Identify the analysis methods used to query and filter data to produce alerts,
including conditional, heuristic, behavioral, and anomaly-based. Evaluate methods
against the numbers of false positives and false negatives.

• Make command-line tools such as grep, cut, and sort available for manual analysis.
Consider the use of scripts to automate detection functions not supported by a
SIEM.
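As one possible illustration of the last guideline, the sketch below flags source addresses with repeated failed logons. The log lines, the threshold of 3, and the message format are assumptions modeled on typical sshd output; a real script would need to be adapted to the log format actually in use.

```shell
#!/bin/bash
# Hypothetical auth log excerpt using documentation-range IP addresses
auth_log=$(mktemp)
cat > "$auth_log" <<'EOF'
Mar 16 10:00:01 host sshd[100]: Failed password for root from 203.0.113.7 port 4000 ssh2
Mar 16 10:00:03 host sshd[100]: Failed password for root from 203.0.113.7 port 4001 ssh2
Mar 16 10:00:05 host sshd[101]: Failed password for invalid user admin from 203.0.113.7 port 4002 ssh2
Mar 16 10:01:00 host sshd[102]: Failed password for siem from 198.51.100.9 port 5000 ssh2
Mar 16 10:02:00 host sshd[103]: Accepted password for siem from 198.51.100.9 port 5001 ssh2
EOF

threshold=3    # assumed alerting threshold for this example

# Count failures per source address (the token after "from") and
# print only the addresses that reach the threshold
flagged=$(awk -v limit="$threshold" '
    /Failed password/ {
        for (i = 1; i < NF; i++)
            if ($i == "from") hits[$(i + 1)]++
    }
    END {
        for (ip in hits)
            if (hits[ip] >= limit) print ip, hits[ip]
    }' "$auth_log")

echo "$flagged"
rm -f "$auth_log"
```

Tuning the threshold is the same false positive versus false negative trade-off described in the guidelines: set it too low and routine typos raise alerts, too high and a slow guessing attack goes unnoticed.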

Additional Practice Questions are available on the CompTIA Learning Center.
