Mastering Splunk Sample Chapter
In various other leadership roles, such as project and team leader, lead developer, and
applications development director, James has managed and directed multiple resources,
using a variety of technologies and platforms.
He has authored the book IBM Cognos TM1 Developer's Certification Guide, Packt
Publishing, and a number of whitepapers on best practices, such as Establishing a Center
of Excellence. Also, he continues to post blogs on a number of relevant topics based on
personal experiences and industry best practices.
He currently holds the following technical certifications:
Mastering Splunk
This book is designed to go beyond the introductory topics of Splunk, introducing more
advanced concepts (with examples) from an enterprise architectural perspective. This
book is practical yet introduces a thought leadership mindset, which all Splunk masters
should possess.
This book walks you through all of the critical features of Splunk and makes it easy to
understand the syntax and working examples for each feature. It also introduces
key concepts for approaching Splunk's knowledge development from an
enterprise perspective.
Splunk (the product) runs both from a standard command line and from an
interface that is totally web-based (which means that no thick client application
needs to be installed to access and use the tool) and performs large-scale, high-speed
indexing on both historical and real-time data.
Splunk does not require a restore of any of the original data but stores a compressed
copy of the original data (along with its indexing information), allowing you
to delete or otherwise move (or remove) the original data. Splunk then uses
this searchable repository to efficiently create graphs, reports, alerts,
dashboards, and detailed visualizations.
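The idea of keeping a compressed copy of the raw data together with indexing
information can be illustrated with a minimal Python sketch. This is not Splunk's
actual storage format; the function names build_store and search, the
whitespace tokenizer, and the sample events are all assumptions made for
illustration only.

```python
import zlib
from collections import defaultdict


def build_store(lines):
    """Return (compressed_raw, index) for a list of raw event lines.

    The index maps each lowercased whitespace-separated term to the set of
    event numbers containing it (a toy inverted index, not Splunk's format).
    """
    raw = "\n".join(lines).encode("utf-8")
    index = defaultdict(set)
    for event_no, line in enumerate(lines):
        for term in line.lower().split():
            index[term].add(event_no)
    return zlib.compress(raw), dict(index)


def search(store, term):
    """Look the term up in the index, then decompress to fetch the events."""
    compressed_raw, index = store
    hits = sorted(index.get(term.lower(), ()))
    lines = zlib.decompress(compressed_raw).decode("utf-8").split("\n")
    return [lines[i] for i in hits]


# Hypothetical sample events
events = [
    "2014-01-01 10:00:01 ERROR disk full on host01",
    "2014-01-01 10:00:05 INFO backup complete on host02",
    "2014-01-01 10:01:12 ERROR timeout on host02",
]
store = build_store(events)
print(search(store, "error"))
```

Once the store is built, the original list of lines is no longer needed to
answer a search, which mirrors the point made above about being able to move or
remove the original data.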
Splunk's main product is Splunk Enterprise, or simply Splunk, which was
developed using C/C++ and Python for maximum performance and which
utilizes its own Search Processing Language (SPL) for maximum functionality
and efficiency.
The Splunk documentation describes SPL as follows:
"SPL is the search processing language designed by Splunk for use with
Splunk software. SPL encompasses all the search commands and their functions,
arguments, and clauses. Its syntax was originally based upon the UNIX pipeline
and SQL. The scope of SPL includes data searching, filtering, modification,
manipulation, insertion, and deletion."
Keeping it simple
You can literally install Splunk on a developer laptop or enterprise server (and
almost everything in between) in minutes using standard installers. It doesn't
require any external packages and drops cleanly into its own directory (usually
into c:\Program Files\Splunk). Once it is installed, you can check out the
readme-splunk.txt file (found in that folder) to verify the version number
of the build you just installed and where to find the latest online documentation.
Note that at the time of writing this book, simply going to the website
http://docs.splunk.com will provide you with more than enough documentation to get
you started with any of the Splunk products, and all of the information is available to
be read online or to be downloaded in the PDF format in order to print or read offline.
In addition, it is a good idea to bookmark Splunk's Splexicon for further reference.
Splexicon is a cool online portal of technical terms that are specific to Splunk, and all
the definitions include links to related information from the Splunk documentation.
Chapter 1
After installation, Splunk is ready to be used. There are no additional integration steps
required for Splunk to handle data from particular products. To date, Splunk simply
works on almost any kind of data or data source that you might have access to, but
should you actually require some assistance, there is a Splunk professional services
team that can answer your questions or even deliver specific integration services.
This team has reportedly helped customers integrate with technologies such
as Tivoli, Netcool, HP OpenView, BMC PATROL, and Nagios.
Single machine deployments of Splunk (where a single instance or the Splunk
server handles everything, including data input, indexing, searching, reporting,
and so on) are generally used for testing and evaluations. Even when Splunk
is to serve a single group or department, it is far more common to distribute
functionalities across multiple Splunk servers.
For example, you might have one or more Splunk instances to read input data,
one or more for indexing, and others for searching and reporting. Many other
factors determine the uses and number of Splunk instances implemented,
such as the following:
Applicable purpose
Type of data
Security
Files and/or directories: This is the data that exists as physical files or
locations where files will exist (directories or folders).
Other sources: This data source type covers pretty much everything else,
such as mainframe logs, FIFO queues, and scripted inputs to get data from
APIs and other remote data interfaces.
Installing and configuring more than one instance of Splunk, where each is
configured for only the data intended for an appropriate audience
The use of Splunk apps, with each app configured appropriately for a
specific use, objective, or perhaps a Splunk security role
Splunk uses a NoSQL-style query approach that is reportedly based on Unix
command pipelining concepts and does not involve or impose any predefined
schema. Splunk's search processing language (SPL) encompasses Splunk's search
commands (and their functions, arguments, and clauses).
Search commands tell Splunk what to do with the information retrieved from
its indexed data. Examples of Splunk search commands include stats,
abstract, accum, crawl, delta, and diff. (Note that there are many more search
commands available in Splunk, and the Splunk documentation provides working
examples of each!)
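To make the pipeline idea concrete, here is a small Python sketch in which each
stage consumes the output of the previous one, roughly analogous to an SPL
search such as ERROR | stats count by host. The functions search and
stats_count_by and the sample events are hypothetical stand-ins written for
this illustration; they are not Splunk APIs.

```python
from collections import Counter


def search(events, keyword):
    """First pipeline stage: keep only events whose raw text has the keyword."""
    return (e for e in events if keyword in e["_raw"])


def stats_count_by(events, field):
    """Second pipeline stage: count the incoming events per value of a field."""
    return Counter(e[field] for e in events)


# Hypothetical sample events
events = [
    {"host": "web01", "_raw": "ERROR connection refused"},
    {"host": "web02", "_raw": "INFO request served"},
    {"host": "web01", "_raw": "ERROR timeout"},
    {"host": "db01", "_raw": "ERROR deadlock detected"},
]

# Chain the stages the way a pipe chains commands
counts = stats_count_by(search(events, "ERROR"), "host")
print(counts.most_common())
```

Because each stage only needs the stream produced by the stage before it, stages
can be added, removed, or reordered without changing the others.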
"You can point Splunk at anything because it doesn't impose a schema when you
capture the data; it creates schemas on the fly as you run queries," explained Sanjay
Mehta, Splunk's senior director of product marketing.
InformationWeek 1/11/2012.
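A schema created "on the fly" can be pictured as fields being extracted from
raw text at search time rather than being declared before the data is loaded.
The following Python sketch assumes events that contain key=value pairs; the
regular expression, the function name extract_fields, and the sample events
are illustrative assumptions and do not reproduce Splunk's own extraction rules.

```python
import re

# Matches key=value, where value is either a quoted string or a run of non-spaces
KV_PATTERN = re.compile(r'(\w+)=("[^"]*"|\S+)')


def extract_fields(raw_event):
    """Derive a field dictionary from one raw event at search time."""
    return {key: value.strip('"') for key, value in KV_PATTERN.findall(raw_event)}


# Hypothetical raw events with different "shapes"
raw_events = [
    'ts=2014-01-01T10:00:01 user=alice action=login status=failed',
    'ts=2014-01-01T10:00:07 user=bob action=upload file="q1 report.pdf" size=2048',
]

for raw in raw_events:
    print(extract_fields(raw))
```

Note that the two events yield different sets of fields, and nothing had to be
declared in advance for either of them.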
Subsearches (searches that actually take the results of one search and then
use them as input or to affect other searches)
SQL-like joins
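The two ideas listed above, subsearches and SQL-like joins, can be sketched in
plain Python. In this hedged illustration, the result of an inner search (users
with failed logins) is used to filter an outer search, and a simple inner join
combines two event sets on a shared field. All names and sample data are
hypothetical and only model the concepts; they are not SPL.

```python
def subsearch_failed_users(auth_events):
    """Inner search: the set of users with at least one failed login."""
    return {e["user"] for e in auth_events if e["status"] == "failed"}


def outer_search(web_events, users):
    """Outer search: keep only web events for users found by the inner search."""
    return [e for e in web_events if e["user"] in users]


def join_on_user(left, right):
    """SQL-like inner join of two event lists on the 'user' field."""
    by_user = {}
    for row in right:
        by_user.setdefault(row["user"], []).append(row)
    return [{**l, **r} for l in left for r in by_user.get(l["user"], [])]


# Hypothetical sample data
auth_events = [
    {"user": "alice", "status": "failed"},
    {"user": "bob", "status": "ok"},
    {"user": "carol", "status": "failed"},
]
web_events = [
    {"user": "alice", "uri": "/admin"},
    {"user": "bob", "uri": "/home"},
    {"user": "carol", "uri": "/login"},
    {"user": "alice", "uri": "/export"},
]

suspects = subsearch_failed_users(auth_events)
print(outer_search(web_events, suspects))
print(join_on_user(auth_events, web_events))
```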
Flexible searching and correlating are not Splunk's only magic. Using Splunk, users
can also rapidly construct reports and dashboards, and using visualizations (charts,
histograms, trend lines, and so on), they can understand and leverage their data
without first incurring the cost of formally structuring or modeling the data.
Investigational Searching
Monitoring and Alerting
Decision Support Analysis
Investigational searching
The practice of investigational searching usually refers to the processes of scrutinizing
an environment, infrastructure, or large accumulation of data to look for an occurrence
of specific events, errors, or incidents. In addition, this process might include locating
information that indicates the potential for an event, error, or incident.
As mentioned, Splunk indexes and makes it possible to search and navigate through
data and data sources from any application, server, or network device in real time.
This includes logs, configurations, messages, traps and alerts, scripts, and almost any
kind of metric, in almost any location.
"If a machine can generate it - Splunk can index it"
www.Splunk.com
Splunk's powerful searching functionality can be accessed through its Search &
Reporting app. (This is also the interface that you use to create and edit reports.)
A Splunk app (or application) can be a simple search collecting events, a group of
alerts categorized for efficiency (or for many other reasons), or an entire program
developed using Splunk's REST API.
The Search & Reporting app provides you with a search bar, time range picker, and a
summary of the data previously read into and indexed by Splunk. In addition, there
is a dashboard of information that includes quick action icons, a mode selector, event
statuses, and several tabs to show various event results.
Splunk search provides you with the ability to:
Find errors that cross multiple tiers of an infrastructure (and even access
Cloud-based environments)
Users are also allowed to accelerate their searches by shifting search modes:
They can use the fast mode to quickly locate just the search pattern
They can use the verbose mode to locate the search pattern and also return
related pertinent information to help with problem resolution
A more advanced feature of Splunk is its ability to create and run automated
searches through the command-line interface (CLI) and, more advanced still,
through Splunk's REST API.
Splunk searches initiated using these advanced features do not go through
Splunk Web; therefore, they are much more efficient, because in these search
types, Splunk does not calculate or generate the event timeline, which saves
processing time.
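As a rough illustration of running a search without Splunk Web, the Python
sketch below builds (but does not send) an HTTP request for Splunk's REST
search export endpoint using only the standard library. The host, port 8089,
credentials, and search string are placeholder assumptions, and the helper name
build_export_request is hypothetical; check the REST API reference for your
Splunk version before relying on any parameter shown here.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


def build_export_request(base_url, username, password, spl, earliest="-15m"):
    """Build a POST request for the search/jobs/export REST endpoint.

    The request is only constructed here; nothing is sent over the network.
    """
    # REST searches are expected to start with a generating command
    if not spl.lstrip().startswith(("search", "|")):
        spl = "search " + spl
    body = urlencode(
        {"search": spl, "earliest_time": earliest, "output_mode": "json"}
    ).encode("ascii")
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return Request(
        f"{base_url}/services/search/jobs/export",
        data=body,
        headers={"Authorization": f"Basic {token}"},
        method="POST",
    )


# Placeholder values; replace them with those of a real deployment
req = build_export_request(
    "https://localhost:8089", "admin", "changeme", "error | head 5"
)
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would submit the search; that
step is left out so that the sketch does not depend on a running Splunk
instance.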
Monitoring
Monitoring numerous applications and environments is a typical requirement of
any organization's data or support center. The ability to monitor any infrastructure
in real time is essential to identify issues, problems, and attacks before they can
impact customers, services, and ultimately profitability.
With Splunk's monitoring abilities, specific patterns, trends, thresholds, and
so on can be established as events for Splunk to watch for, so that specific
individuals don't have to.
Splunk can also trigger notifications (discussed later in this chapter) in real time so
that appropriate actions can be taken to follow up on an event, or even to avoid it
along with the downtime and expense it could cause.
Splunk also has the power to execute actions based on certain events or conditions.
These actions can include activities such as:
Sending an e-mail
For all events, this information is tracked by Splunk in the form of its
internal (Splunk) tickets, which can easily be reported on at a future date.
Typical Splunk monitoring marks might include the following:
MS Windows event logs and Windows printer information: Splunk has the
ability to locate problems within MS Windows systems and printers located
anywhere within the infrastructure.
Files and directories: With Splunk, you can literally monitor all your data
sources within your infrastructure, including viewing new data when it arrives.
WMI-based data: You can pull event logs from all the Windows servers
and desktops in your environment without having to install anything on
those machines.
Alerting
In addition to searching and monitoring your big data, Splunk can be configured to
alert anyone within an organization when an event occurs or when a search
result meets specific conditions. You can have both your real-time and historical
searches run automatically on a regular schedule for a variety of alerting scenarios.
You can base your Splunk alerts on a wide range of threshold and trend-based
situations, for example:
Utilizations
All alerts in Splunk are based on timing, meaning that you can configure an alert as:
Real-time alerts: These are alerts that are triggered every time a search
returns a specific result, such as when the available disk space falls to a
certain level. This kind of alert will give an administrator time to react to
the situation before the available space runs out.
Rolling time-frame alerts: These alerts can be configured to alert you when
a specific condition occurs within a moving time frame. For example, an alert
can fire if the number of failed login attempts exceeds 3 in the last 10 minutes
(the last 10 minutes being measured relative to the time at which the search runs).
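The rolling time-frame condition described above can be sketched as a sliding
window over event timestamps. The class below is a minimal illustration in
Python, not how Splunk evaluates alerts internally; the class name, the
threshold of 3, the 600-second window, and the sample timestamps (in seconds)
are assumptions chosen to match the example in the text.

```python
from collections import deque


class RollingWindowAlert:
    """Fire when more than `threshold` events fall inside the moving window."""

    def __init__(self, threshold=3, window_seconds=600):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._times = deque()

    def observe(self, timestamp):
        """Record one failed login; return True if the alert condition is met."""
        self._times.append(timestamp)
        # Drop events that are no longer inside the window ending at `timestamp`
        cutoff = timestamp - self.window_seconds
        while self._times and self._times[0] <= cutoff:
            self._times.popleft()
        return len(self._times) > self.threshold


alert = RollingWindowAlert()
failed_login_times = [0, 100, 200, 300, 1000]  # hypothetical timestamps
print([alert.observe(t) for t in failed_login_times])
# prints [False, False, False, True, False]
```

The fourth failure inside ten minutes triggers the alert, while the failure at
1000 seconds does not, because the earlier failures have aged out of the window
by then.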
Splunk also allows you to create scheduled reports that trigger alerts to perform
an action each time the report runs and completes. The alert can be in the form of a
message or provide someone with the actual results of the report. (These alert reports
might also be set up to alert individuals regardless of whether they are actually set
up to receive the actual reports!)
Reporting
Alerts create records when they are triggered (by the designated event occurrence or
when the search result meets the specific circumstances). Alert trigger records can be
reviewed easily in Splunk, using the Splunk alert manager (if they have been enabled
to take advantage of this feature).
The Splunk alert manager can be used to filter trigger records (alert results)
by application, the alert severity, and the alert type. You can also search for
specific keywords within the alert output. Alert/trigger records can be set up to
automatically expire, or you can use the alert manager to manually delete individual
alert records as desired.
Reports can also be created when you create a search (or a pivot) that you would like
to run in the future (or share with another Splunk user).
Configuration changes
User activities
User requests
Operational events
Incidents
Deployments
Streaming events
Additionally, as the world digitizes, the volume, velocity, and variety of
additional types of data becoming available for analysis increase.
The ability to actually gain (and maintain) visibility into this operationally vital
information is referred to as gaining operational intelligence.
Operational intelligence
Operational intelligence (OI) is a category of real-time, dynamic, business analytics
that can deliver key insights and actually drive (manual or automated) actions
(specific operational instructions) from the information consumed.
A great majority of IT operations struggle today to access and view operational data,
especially in a timely and cost-efficient manner.
Today, the industry has established an organization's ability to evaluate and
visualize volumes of operational information in real time as the key metric
(or KPI) to evaluate an organization's operational ability to monitor, support,
and sustain itself.
At all levels of business and information technology, professionals have begun to
realize how IT service quality can impact their revenue and profitability; therefore,
they are looking for OI solutions that can run realistic queries against this information
to view their operational data and understand what is occurring or is about to occur, in
real time.
Having the ability to access and understand this information, operations can:
Measure performance
A technology-agnostic approach
Splunk can index and harness all the operational data of an organization and
deliver true service-level reporting, providing a centralized view across all of
the interconnected application components and the infrastructures, all without
spending millions of dollars in instrumenting the infrastructure with multiple
technologies and/or tools (and having to support and maintain them).
No matter how increasingly complex, modular, or distributed and dynamic systems
have become, the Splunk technology continues to make it possible to understand
these system topologies and to visualize how these systems change in response to
changes in the environment or the isolated (related) actions of users or events.
Splunk can be used to link events or transactions (even across multiple technology
tiers), put together the entire picture, track performance, visualize usage trends,
support better planning for capacity, spot SLA infractions, and even track how the
support team is doing, based on how they are being measured.
Splunk enables new levels of visibility with actionable insights to an organization's
operational information, which helps in making better decisions.
Increased productivity
Higher efficiency
Better communication
Cost reduction
Time savings
Supportive education
Trend/pattern identification
Many more
Can you use Splunk as a real-time decision support system? Of course, you can!
Splunk becomes your DSS by providing the following abilities for users:
Splunk possesses the ability to collect real-time data with details of this data
(collected in an organization's master or other data) and so much more
Another significant feature of Splunk is that it does not require data to be specifically
extracted, transformed, and then (re)loaded (ETL'ed) into an accessible model for
Splunk to get started. Splunk just needs to be pointed to the data for it to index the
data and be ready to go.
These capabilities (along with the ability to easily create dashboards and applications
based on specific objectives) empower the Splunk user (and the business) with key
insights, all in real time.
Data architects can expand the scope of the data being used in their other
analytical tools
Splunk can also be the engine behind applications by exploiting the Splunk ODBC
connector to connect to and access any data already read into and indexed by
Splunk, harnessing the power and capabilities of the data, perhaps through an
interface more familiar to a business analyst and not requiring specific programming
to access the data.
ODBC
An analyst can leverage expertise in technologies such as MS Excel or Tableau and,
by using the Splunk ODBC driver to connect to Splunk data, perform actions that
might otherwise require a Splunk administrator. The analyst can then create specific
queries on the Splunk-indexed data, using the interface (for example, the query
wizard in Excel), and then the Splunk ODBC driver will transform these requests
into effectual Splunk searches (behind the scenes).
Emerging technologies
Emerging technologies include the technical innovations that represent progressive
developments within a field such as agriculture, biomedicine, electronics, energy,
manufacturing, and materials science, to name a few. All these areas typically
deal with a large amount of research and/or test data.
Disaster recovery
Disaster recovery (DR) refers to the processes, policies, and procedures that are
related to preparing for the recovery or continuation of the technology infrastructure
that is vital to an organization after a natural or human-induced disaster. All types of
information are continually examined to help put control measures in place, which
can reduce or eliminate various threats for organizations. Different types of data
measures can be included in disaster recovery, control measures, and strategies.
Virus protection
The business of virus protection involves the ability to detect known threats and
identify new and unknown threats through the analysis of massive volumes of
activity data. In addition, it is important to strive to keep up with the ever-evolving
security threats by identifying new attacks or threat profiles before conventional
methods can.
Project management
Project management is another area that is always ripe for improvement by
accessing project specifics across all the projects in all genres. Information generated
by popular project management software systems (such as MS Project or JIRA)
can be accessed to predict project bottlenecks or failure points, risk areas,
success factors, and profitability or to assist in resource planning as well as in sales
and marketing programs.
The entire product development life cycle can be made more efficient, from
monitoring code check-ins and build servers to pinpointing production issues in real
time and gaining a valuable awareness of application usage and user preferences.
Firewall applications
Software solutions that are firewall applications will be required to pore through the
volumes of firewall-generated data to report on the top blocks and accesses (sources,
services, and ports) and active firewall rules and to generally show traffic patterns
and trends over time.
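A "top blocks" report of the kind described above boils down to counting
blocked events per source, service, or port and keeping the largest counts.
The Python sketch below shows one way this might look; the record layout
(action, src, port), the function name top_blocked, and the sample records
are hypothetical and do not correspond to any particular firewall's log format.

```python
from collections import Counter


def top_blocked(records, key, n=3):
    """Return the n most frequent values of `key` among blocked records."""
    blocked = (r[key] for r in records if r["action"] == "block")
    return Counter(blocked).most_common(n)


# Hypothetical, already-parsed firewall records
records = [
    {"action": "block", "src": "10.0.0.5", "port": 23},
    {"action": "allow", "src": "10.0.0.7", "port": 443},
    {"action": "block", "src": "10.0.0.5", "port": 22},
    {"action": "block", "src": "10.0.0.9", "port": 23},
    {"action": "block", "src": "10.0.0.8", "port": 23},
]

print(top_blocked(records, "src", n=1))
print(top_blocked(records, "port", n=1))
```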
Hadoop technologies
What is Hadoop anyway? The Hadoop technology is designed to be installed and
run on a (sometimes) large number of machines (that is, in a cluster) that do not
have to be high-end and do not share memory or storage.
The object is the distributed processing of large data sets across many
Hadoop server machines. This means that virtually unlimited amounts of big data can
be loaded into Hadoop because it breaks up the data into segments or pieces and
spreads it across the different Hadoop servers in the cluster.
There is no central entry point to the data; Hadoop keeps track of where the data
resides. Because multiple copies of the data are stored, the data held on a server
that goes offline can be automatically replicated from a known good copy.
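The behavior described above (splitting data into pieces, spreading copies
across servers, and re-replicating when a server goes offline) can be modeled
with a short Python sketch. This is a toy model under stated assumptions: the
round-robin placement, the replica count of 2, and the server names are
illustrative choices and are not Hadoop's actual block placement policy.

```python
def split_blocks(data, block_size):
    """Break the data into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks, servers, replicas=2):
    """Round-robin placement: block b is stored on `replicas` consecutive servers."""
    return {
        b: [servers[(b + r) % len(servers)] for r in range(replicas)]
        for b in range(num_blocks)
    }


def handle_failure(placement, failed, servers):
    """Re-replicate every block that lost a copy onto a surviving server."""
    alive = [s for s in servers if s != failed]
    for holders in placement.values():
        if failed in holders:
            holders.remove(failed)
            # Pick the first surviving server that does not already hold a copy
            candidates = [s for s in alive if s not in holders]
            holders.append(candidates[0])
    return placement


servers = ["node1", "node2", "node3"]  # hypothetical cluster
blocks = split_blocks("abcdefghij", block_size=4)
placement = place_blocks(len(blocks), servers)
print(placement)
print(handle_failure(placement, "node2", servers))
```

After node2 fails, every block is again held by two distinct surviving servers,
which is the property the text attributes to keeping multiple copies.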
So, where does Splunk fit in with Hadoop? Splunk supports the searching
of data stored in the Hadoop Distributed File System (HDFS) with Hunk
(a Splunk app). Organizations can use this to enable Splunk to work with
existing big data investments.
Media measurement
This is an exciting area. Media measurement can refer to the ability to measure
program popularity or mouse clicks, views, and plays by device and over a period
of time. An example of this is the ever-improving recommendations that are made
based on individual interests, derived from automated big data analysis and
relationship identification.
Social media
Today's social media technologies are vast and include ever-changing content. This
media is beginning to be actively monitored for specific information or search criteria.
This supports the ability to extract insights, measure performance, identify
opportunities and infractions, and assess competitor activities or the ability to be
alerted to impending crises or conditions. The results of this effort serve market
researchers, PR staff, marketing teams, social engagement and community staff,
agencies, and sales teams.
Splunk can be the tool to facilitate the monitoring and organizing of this data into
valuable intelligence.
Splunk in action
Today, it is reported that over 6,400 customers across the world rely on the
Splunk technology in some way to support their operational intelligence initiatives.
They have learned that big data can provide them with a real-time, 360-degree view
of their business environments.
Summary
In this chapter, we provided you with an explanation of what Splunk is, where
it was started, and what its initial focus was. We also discussed the evolution of
the technology, giving the conventional use cases as well as some more advanced,
forward-thinking, or out-of-the-box type opportunities to leverage the technology
in the future.
In the next chapter, we will explore advanced searching topics and provide
practical examples.