Exploring Splunk
SEARCH PROCESSING LANGUAGE (SPL) PRIMER AND COOKBOOK
By David Carasso, Splunk's Chief Mind
CITO Research
New York, NY
Disclaimer
This book is intended as a text and reference book for reading purposes only. The actual use of Splunk's software products must be in accordance with their corresponding software license agreements and not with anything written in this book. The documentation provided for Splunk's software products, and not this book, is the definitive source for information on how to use these products. Although great care has been taken to ensure the accuracy and timeliness of the information in this book, Splunk does not give any warranty or guarantee of the accuracy or timeliness of the information and Splunk does not assume any liability in connection with any use or result from the use of the information in this book. The reader should check at docs.splunk.com for definitive descriptions of Splunk's features and functionality.
Table of Contents
Preface
    About This Book
    What's In This Book?
    Conventions
    Acknowledgments

PART I: EXPLORING SPLUNK

1 The Story of Splunk
    Splunk to the Rescue in the Datacenter
    Splunk to the Rescue in the Marketing Department
    Approaching Splunk
    Splunk: The Company and the Concept
    How Splunk Mastered Machine Data in the Datacenter
    Operational Intelligence
    Operational Intelligence at Work

2 Getting Data In
    Machine Data Basics
    Types of Data Splunk Can Read
    Splunk Data Sources
    Downloading, Installing, and Starting Splunk
    Bringing Data in for Indexing
    Understanding How Splunk Indexes Data

3 Searching with Splunk
    The Search Dashboard
    SPL: Search Processing Language
        Pipes
        Implied AND
        top user
        fields percent
    The search Command
        Tips for Using the search Command
        Subsearches

4 SPL: Search Processing Language
    Sorting Results
        sort
    Filtering Results
        where
        dedup
        head
    Grouping Results
        transaction
    Reporting Results
        top
        stats
        chart
        timechart
    Filtering, Modifying, and Adding Fields
        fields
        replace
        eval
        rex
        lookup

5 Enriching Your Data
    Using Splunk to Understand Data
    Identifying Fields: Looking at the Pieces of the Puzzle
    Exploring the Data to Understand its Scope
    Preparing for Reporting and Aggregation
    Visualizing Data
        Creating Visualizations
        Creating Dashboards
    Creating Alerts
        Creating Alerts through a Wizard
        Tuning Alerts Using Manager
        Customizing Actions for Alerting
        The Alerts Manager

PART II: RECIPES

6 Recipes for Monitoring and Alerting
    Monitoring Recipes
        Monitoring Concurrent Users
        Monitoring Inactive Hosts
        Reporting on Categorized Data
        Comparing Today's Top Values to Last Month's
        Finding Metrics That Fell by 10% in an Hour
        Charting Week Over Week Results
        Identify Spikes in Your Data
        Compacting Time-Based Charting
        Reporting on Fields Inside XML or JSON
        Extracting Fields from an Event
    Alerting Recipes
        Alerting by Email when a Server Hits a Predefined Load
        Alerting When Web Server Performance Slows
        Shutting Down Unneeded EC2 Instances
        Converting Monitoring to Alerting

7 Grouping Events
    Introduction
    Recipes
        Unifying Field Names
        Finding Incomplete Transactions
        Calculating Times within Transactions
        Finding the Latest Events
        Finding Repeated Events
        Time Between Transactions
        Finding Specific Transactions
        Finding Events Near Other Events
        Finding Events After Events
        Grouping Groups

8 Lookup Tables
    Introduction
        lookup
        inputlookup
        outputlookup
        Further Reading
    Recipes
        Setting Default Lookup Values
        Using Reverse Lookups
        Using a Two-Tiered Lookup
        Using Multistep Lookups
        Creating a Lookup Table from Search Results
        Appending Results to Lookup Tables
        Using Massive Lookup Tables
        Comparing Results to Lookup Values
        Controlling Lookup Matches
        Matching IPs
        Matching with Wildcards

Appendix A: Machine Data Basics
    Application Logs
    Web Access Logs
    Web Proxy Logs
    Call Detail Records
    Clickstream Data
    Message Queuing
    Packet Data
    Configuration Files
    Database Audit Logs and Tables
    File System Audit Logs
    Management and Logging APIs
    OS Metrics, Status, and Diagnostic Commands
    Other Machine Data Sources

Appendix B: Case Sensitivity
Appendix C: Top Commands
Appendix D: Top Resources

Appendix E: Splunk Quick Reference Guide
    CONCEPTS
        Overview
        Events
        Sources and Sourcetypes
        Hosts
        Indexes
        Fields
        Tags
        Apps
        Permissions/Users/Roles
        Transactions
        Forwarder/Indexer
        SPL
        Subsearches
        Relative Time Modifiers
    COMMON SEARCH COMMANDS
        Optimizing Searches
    SEARCH EXAMPLES
    EVAL FUNCTIONS
    COMMON STATS FUNCTIONS
    REGULAR EXPRESSIONS
    COMMON SPLUNK STRPTIME FUNCTIONS
Preface
Splunk Enterprise Software (Splunk) is probably the single most powerful tool for searching and exploring data that you will ever encounter. We wrote this book to provide an introduction to Splunk and all it can do. This book also serves as a jumping-off point for how to get creative with Splunk. Splunk is often used by system administrators, network administrators, and security gurus, but its use is not restricted to these audiences. There is a great deal of business value hidden away in corporate data that Splunk can liberate. This book is designed to reach beyond the typical techie reader of O'Reilly books to marketing quants as well as everyone interested in the topics of Big Data and Operational Intelligence.
Conventions
As you read through this book, you'll notice we use various fonts to call out certain elements: UI elements appear in bold. Commands and field names are in constant width. If you are told to select the Y option from the X menu, that's written concisely as select X » Y.
Acknowledgments
This book would not have been possible without the help of numerous people at Splunk who gave of their time and talent. For carefully reviewing drafts of the manuscript and making invaluable improvements, we'd like to thank Ledion Bitincka, Gene Hartsell, Gerald Kanapathy, Vishal Patel, Alex Raitz, Stephen Sorkin, Sophy Ting, and Steve Zhang, PhD; for generously giving interview time: Maverick Garner; for additional help: Jessica Law, Tera Mendonca, Rachel Perkins, and Michael Wilde.
Approaching Splunk
As you use Splunk to answer questions, you'll find that you can break the task into three phases. First, identify the data that can answer your question. Second, transform the data into the results that can answer your question. Third, display the answer in a report, interactive chart, or graph to make it intelligible to a wide range of audiences.

Begin with the questions you want to answer: Why did that system fail? Why is it so slow lately? Where are people having trouble with our website? As you master Splunk, it becomes more obvious what types of data and searches help answer those questions. This book will accelerate your progress to mastery.

The question then becomes: Can the data provide the answers? Often, when we begin an analysis, we don't know what the data can tell us. But Splunk is also a powerful tool for exploring data and getting to know it. You can discover the most common values or the most unusual. You can summarize the data with statistics or group events into transactions, such as all the events that make up an online hotel reservation across systems of record. You can create workflows that begin with the whole data set, then filter out irrelevant events, analyzing what's left. Then, perhaps, add some information from an external source until, after a number of simple steps, you have only the data needed to answer your questions. Figure 1-1 shows, in general, the basic Splunk analysis processes.
[Figure 1-1. The basic Splunk analysis process: in Phase I, raw machine data (such as web access log events) is retrieved; in Phase II, it is transformed into fields such as IP address; and in Phase III, the results are presented.]
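As a preview of what such a workflow looks like in Splunk's search language (a minimal, hypothetical sketch: the sourcetype and field names are assumptions for illustration, and the commands themselves are covered in Chapters 3 and 4):

sourcetype=access* action=purchase | stats count by product_id

The first part retrieves and filters the raw web access events, and the stats command transforms them into a small table that is ready to display as a report or chart.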
Erik and Rob raised funding and the first version of Splunk debuted at LinuxWorld 2005. The product was a huge hit and immediately went viral, spurred on by its availability as a free download. Once downloaded, Splunk began solving a broad range of unimagined customer problems and spread from department to department and from company to company. When users asked management to purchase it, they could already point to a record of solving problems and saving time with Splunk. Originally conceived to help IT and datacenter managers troubleshoot technical problems, Splunk has grown to become an extremely useful platform for all kinds of business users because it enables them to search, collect, and organize data in a far more comprehensive, far less labor-intensive way than traditional databases. The result is new business insights and operational intelligence that organizations have never had before.
many different machines to gain access to all the data using far less powerful tools. Using the indexes, Splunk can quickly search the logs from all servers and home in on when the problem occurred. With its speed, scale, and usability, Splunk makes determining when a problem occurred that much faster. Splunk can then drill down into the time period when the problem first occurred to determine its root cause. Alerts can then be created to head the issue off in the future.

By indexing and aggregating log files from many sources to make them centrally searchable, Splunk has become popular among system administrators and other people who run technical operations for businesses around the world. Security analysts use Splunk to sniff out security vulnerabilities and attacks. System analysts use Splunk to discover inefficiencies and bottlenecks in complex applications. Network analysts use Splunk to find the cause of network outages and bandwidth bottlenecks.

This discussion brings up several key points about Splunk:

• Creating a central repository is vital: One of the major victories of Splunk is the way that diverse types of data from many different sources are centralized for searching.

• Splunk converts data into answers: Splunk helps you find the insights that are buried in the data.

• Splunk helps you understand the structure and meaning of data: The more you understand your data, the more you'll see in it. Splunk also helps you capture what you learn to make future investigations easier and to share what you've learned with others.

• Visualization closes the loop: All that indexing and searching pays off when you see a chart or a report that makes an answer crystal clear. Being able to visualize data in different ways accelerates understanding and helps you share that understanding with others.
Operational Intelligence
Because almost everything we do is assisted in some way by technology, the information collected about each of us has grown dramatically. Many of the events recorded by servers actually represent behavior of customers or partners. Splunk customers figured out early on that web server access logs could be used not only to diagnose systems but also to better understand the behavior of the people browsing a website.
Splunk has been at the forefront of raising awareness about operational intelligence, a new category of methods and technology for using machine data to gain visibility into the business and discover insights for IT and the entire enterprise. Operational intelligence is not an outgrowth of business intelligence (BI), but a new approach based on sources of information not typically within the purview of BI solutions. Operational data is not only incredibly valuable for improving IT operations, but also for yielding insights into other parts of the business.

Operational intelligence enables organizations to:

• Use machine data to gain a deeper understanding of their customers: For example, if you just track transactions on a website, you see what people bought. But by looking closely at the web server logs you can see all the pages they looked at before they purchased, and, perhaps even more important for the bottom line, you can see the pages that the people who didn't buy looked at. (Remember our new product search example from the intro?)

• Reveal important patterns and analytics derived from correlating events from many sources: When you can track indicators of consumer behavior from websites, call detail records, social media, and in-store retail transactions, a far more complete picture of the customer emerges. As more and more customer interactions show up in machine data, more can be learned.

• Reduce the time between an important event and its detection: Machine data can be monitored and correlated in real time.

• Leverage live feeds and historical data to make sense of what is happening now, to find trends and anomalies, and to make more informed decisions based on that information: For example, the traffic created by a web promotion can be measured in real time and compared with previous promotions.

• Deploy a solution quickly and deliver the flexibility needed by organizations today and in the future; that is, the ability to provide ad hoc reports, answer questions, and add new data sources: Splunk data can be presented in traditional dashboards that allow users to explore the events and keep asking new questions.
Ultimately, operational intelligence enables organizations to ask the right questions, leading to answers that deliver business insights, using combinations of real-time and historical data, displayed in easily digestible dashboards and graphical tools. There's a reason for the trend toward calling machine data big data. It's big, it's messy, and in there, buried somewhere, is the key to the future of your business. Now let's move on to Chapter 2, where you'll learn how to get data into Splunk and start finding the gold hidden in your data.
2 Getting Data In
Chapter 1 provided an introduction to Splunk and described how it can help you. Now let's take the next step in your journey: getting your data into Splunk. This chapter covers installing Splunk, importing your data, and a bit about how the data is organized to facilitate searching.
Every time the clock ticks, it logs the action and the time that the action occurred. If you were really going to keep track of the clock, in addition to the fact that it ticked, the log might also include other useful information: the battery level, when an alarm was set, turned on or off, or sounded; in short, anything that could give you insight into how the clock was working. Each line of the machine data shown above can be considered a separate event, although it's common for other machine data to have events that span multiple or even hundreds of lines.

Splunk divides raw machine data into discrete pieces of information known as events. When you do a simple search, Splunk retrieves the events that match your search terms. Each event consists of discrete pieces of data known as fields. In clock data, the fields might include second, minute, hour, day, month, and year. If you think of groups of events organized in a spreadsheet or database, the events are the rows and the fields are the columns, as shown in Figure 2-1.
Figure 2-1. Clock events organized as rows, with fields as columns:

Second  Minute  Hour  Day  Month  Year
58      1       14    23   11     2011
59      1       14    23   11     2011
60      1       14    23   11     2011
1       2       14    23   11     2011
2       2       14    23   11     2011
3       2       14    23   11     2011
In practice, another way to think of events is as a set of fields of keyword/value pairs. If represented as keyword/value pairs, the clock events look like Figure 2-2.
Figure 2-2. The same clock events as keyword/value pairs:

Second=58, Minute=01, Hour=14, Day=23, Year=2011
Second=59, Minute=01, Hour=14, Day=23, Year=2011
Second=60, Minute=01, Hour=14, Day=23, Year=2011
Second=01, Minute=02, Hour=14, Day=23, Year=2011
Second=02, Minute=02, Hour=14, Day=23, Year=2011
Here's a real-world example of one of the most common and useful types of machine data. A web server has a log file that records every URL requested from the server. Some of the fields in web server data are:
client IP, timestamp, http method, status, bytes, referrer, user agent
A visit to one webpage can invoke dozens of requests to retrieve text, images, and other resources. Each request is typically logged as a separate event in a log file. The result is a file that looks something like Figure 2-3 (without the fancy highlighting to help you see the fields).
Figure 2-3. Events in a web server log (fields: IP Address, Timestamp, HTTP Command, Status, Bytes, Referrer, Browser Type):

12.1.1.015 - - [01/Aug/2011:12:29:58 -0700] "GET /pages/hltabs_c.html HTTP/1.1" 200 1211 "http://webdev:2000/pages/" "Mozilla/5.0 AppleWebKit/102.1 (KHTML) Safari/102"
12.1.1.015 - - [01/Aug/2011:12:29:58 -0700] "GET /pages/joy.html HTTP/1.1" 200 0012 "http://webdev:2000/pages/" "Mozilla/5.0 AppleWebKit/102.1 (KHTML) Safari/102"
12.1.1.015 - - [01/Aug/2011:12:29:58 -0700] "GET /pages/dochomepage.html HTTP/1.1" 200 1000 "http://webdev:2000/pages/" "Mozilla/5.0 AppleWebKit/102.1 (KHTML) Safari/102"
Downloading Splunk
You can download fully functional Splunk for free, for learning or to support small to moderate use of Splunk. On the splunk.com home page, you'll see a download button. Click it to begin downloading and installing Splunk on computers running Windows, Mac, Linux, and Unix.
Installing Splunk
Installing Splunk is easy, so we'll assume you'll do that part on your own. If you have any questions, refer to the Splunk Tutorial (http://splunk.com/goto/book#tutorial), which covers everything in detail.
Starting Splunk
To start Splunk on Windows, launch the application from the Start menu. Look for the Welcome screen, shown in Figure 2-4, and keep reading. To start Splunk on Mac OS X or Unix, open a terminal window. Go to the directory where you installed Splunk, go to the bin subdirectory and, at the command prompt, type:
./splunk start
The very last line of the information you see when Splunk starts is:
The Splunk web interface is at http://your-machinename:8000
Follow that link to the login screen. If you don't have a username and password, the default credentials are admin and changeme. After you log in, the Welcome screen appears.
The Welcome screen shows what you can do with your pristine instance of Splunk: add data or launch the search app.
You're finished adding your data. Let's talk about what Splunk is doing behind the scenes.
[Figure: How Splunk indexes and searches data. An event consists of raw text plus fields such as source, sourcetype, host, and _time. The indexing pipeline reads the machine data, divides it into events, and identifies some default fields; machine data is copied to the index, where it is available during the search process; and at search time, the search head distributes the search across many indexes and consolidates the results.]
Splunk divides a stream of machine data into individual events. Remember, an event in machine data can be as simple as one line in a log file or as complicated as a stack trace containing several hundred lines. Every event in Splunk has at least the four default fields shown in Table 2-1.
Table 2-1. Default Fields

source
    Answers the question: Where did the data come from?
    Examples: files (/var/log/), scripts (myscript.bat), network feeds (UDP:514)

sourcetype
    Answers the question: What kind of data is it?
    Examples: access_combined, syslog

host
    Answers the question: Which host or machine did the data come from?
    Examples: webserver01, cisco_router

_time
    Answers the question: When did the event happen?
    Example: Sat Mar 31 02:16:57 2012
These default fields are indexed along with the raw data. The timestamp (_time) field is special because Splunk uses it to order events, enabling Splunk to efficiently retrieve events within a time range. Chapter 3 brings us to the place where most of the action happens: Splunk's search interface.
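Putting two of these default fields to work, here is a quick sketch of a search (using the example values from Table 2-1, with a hypothetical 15-minute time range; searching itself is covered in Chapter 3):

host=webserver01 sourcetype=access_combined earliest=-15m

This retrieves the web access events indexed from that host during the last 15 minutes, using nothing but default fields.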
Notice a few things about this dashboard:

• The search bar at the top is empty, ready for you to type in a search.

• The time range picker to the right of the search bar permits time range adjustment. You can see events from the last 15 minutes, for example, or any desired time interval. For real-time streaming data, you can select an interval to view, ranging from 30 seconds to an hour.

• The All indexed data panel displays a running total of the indexed data.
The next three panels show the most recent or common values that have been indexed in each category:

• The Sources panel shows which files (or other sources) your data came from.

• The Source types panel shows the types of sources in your data.

• The Hosts panel shows which hosts your data came from.

Now, let's look at the Search navigation menus near the top of the page:
• Summary is where we are.

• Search leads to the main search interface, the Search dashboard.

• Status lists dashboards on the status of your Splunk instance.

• Dashboards & Views lists your dashboards and views.

• Searches & Reports lists your saved searches and reports.

The next section introduces you to the Search dashboard.
[Figure: The Search dashboard, with the Timeline, Fields menu, and Timestamp areas labeled.]
Let's examine the contents of this dashboard:

• Timeline: A graphic representation of the number of events matching your search over time.

• Fields sidebar: Relevant fields along with event counts. This menu also allows you to add a field to the results.

• Field discovery switch: Turns automatic field discovery on or off. When Splunk executes a search and field discovery is on, Splunk attempts to identify fields automatically for the current search.
• Results area: Shows the events from your search. Events are ordered by Timestamp, which appears to the left of each event. Beneath the raw text of each event are any fields selected from the Fields sidebar for which the event has a value.

When you start typing in the search bar, context-sensitive information appears below, with matching searches on the left and help on the right:
Figure 3-4. Helpful info appears when you enter text in the search bar
Under the time range picker, you see a row of icons for working with search jobs and results: Pause, Cancel, Finalize, Print results, Job inspector, Save search, and the Create menu.
The search job controls are only active when a search is running. If you haven't run a search, or if your search has finished, they are inactive and greyed out. But if you're running a search that takes a long time to complete, you can use these icons to control the search progress:

• Sending a search to the background lets it keep running to completion on the server while you run other searches or even close the window and log out. When you click Send to background, the search bar clears and you can continue with other tasks. When the job is done, a notification appears on your screen if you're still logged in; otherwise, Splunk emails you (if you've specified an email address). If you want to check on the job in the meantime, or at a later time, click the Jobs link at the top of the page.

• Pausing a search temporarily stops it and lets you explore the results to that point. While the search is paused, the icon changes to a play button. Clicking that button resumes the search from the point where you paused it.

• Finalizing a search stops it before it completes, but retains the results to that point so you can view and explore them in the search view.

• In contrast, canceling a search stops it from running, discards the results, and clears them from the screen.

• The Job inspector icon takes you to the Job inspector page, which shows details about your search, such as the execution costs of your search, debug messages, and search job properties.

• Use the Save menu to save the search, save the results, or save and share the results. If you save the search, you can find it on the Searches & Reports menu. If you save the results, you can review them by clicking on Jobs in the upper right corner of the screen.

• Use the Create menu to create dashboards, alerts, reports, event types, and scheduled searches. We'll explain those in detail in Chapter 5.

Moving down to the upper left corner of the Results area, you see a row of icons with options for displaying events: List, Table, and Chart.
By default, Splunk shows events as a list, from most recent events to least, but you can click on the Table icon to view your results as a table, or you can click the Chart icon to view them as a chart. The Export button exports your search results in various formats: CSV, raw events, XML, or JSON.
Events? Results? What's the Difference?
Technically speaking, retrieved events from your indexes are called events. If those events are transformed or summarized so that there is no longer a one-to-one mapping with events on disk, they are properly called results. For example, a web-access event retrieved from a search is an event, but the top URL visited today is a result. That said, we are not going to be that picky, and will use the two terms interchangeably.
[Figure: A search retrieves events from disk (syslog and other sourcetypes, with raw text containing ERROR and WARNING) and narrows them down to the final results.]
This sequence of commands is called a search, and the pipe character (|) separates the individual commands that make up the search.
Pipes
The first keyword after the pipe is the name of the search command. In this case the commands are top and fields. What command is retrieving the events from the index? Well, there is an implied command called search at the beginning of any search that doesn't start with a pipe character. So, really, there are three search commands in the above search: search, top, and fields. The results from each command are passed as input to the next command. If you have ever used a Linux shell such as bash, this concept is probably familiar.
Implied AND
sourcetype=syslog ERROR tells the search command to retrieve only events that have a sourcetype equal to syslog AND contain the term ERROR.
top user
The next command, top, returns the most common values of the specified fields. By default, top returns the top 10 most common values for the specified field, in descending order (thank you, David Letterman). In this case, the specified field is user, so top returns the users that appear most often in syslog events that contain the term ERROR. The output of top is a table of 3 columns (user, count, and percent), with 10 rows of values. It's also important to understand that the output of the top command becomes the input to the next command after the pipe. In this sense, top has transformed the search results to a smaller set of values, which are further refined by the next command.
fields percent
The second command, fields, with an argument of percent, tells Splunk to remove the percent column from the output of the top command.

Exploratory Data Analysis: Spelunking with Splunk
What if you don't know anything about the data? Get creative and explore. You can do a search for * to retrieve all events and then learn about them: look at some events, extract some interesting looking fields, get a top of that field, see how the events are broken up, perhaps derive some new fields based on other fields, cluster your results, see how one field varies with another field, and so on. (For more tips about learning what's in a source that you have little knowledge about, refer to http://splunk.com/goto/book#mining_tips.)
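Putting these pieces together, a complete search of the kind described in this section would read as follows (note that removing a column takes a minus sign before the field name):

sourcetype=syslog ERROR | top user | fields - percent

This retrieves syslog events containing the term ERROR, keeps the ten most common user values along with their counts, and then drops the percent column from the final table.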
Before we dive into the search commands in Chapter 4, let's cover the search command itself: a very special command that is critical for using Splunk.
When it's not the first command in a search, the search command can filter the results of the previous command. To do this, use the search command like any other command: with a pipe character followed by an explicit command name. For example, the command error | top url | search count>=2 searches for events on disk that have the word error, finds the top URLs, and filters out any URLs that only occur once. In other words, of the 10 error events that top returns, show me only the ones where there are two or more instances of that URL. Table 3-1 shows a few examples of implicit calls to the search command and their results.
Table 3-1. Implicit search commands
(warn OR error) NOT fail*
    Retrieves all events containing either warn or error, but not those that have fail, fails, failed, failure, etc.

host=main_web_server delay>2
    Retrieves all events that have a host field with a value of main_web_server and a delay field with a value greater than 2.

Another example in the table retrieves all events containing the phrase database error, fatal, and disk (the AND is implied).
Case-sensitivity
Keyword arguments to the search command are not case-sensitive, but field names are. (See Appendix B for more details about case-sensitivity.)
Boolean logic
Arguments (keywords and fields) to the search command are implicitly ANDed together. You can specify that either one of two or more arguments should be true using the OR keyword, in uppercase. OR has higher precedence than AND, so you can think of arguments using OR as having parentheses around them. To filter out events that contain a particular word, use the NOT keyword. Finally, you can use parentheses explicitly to make things more clear if you want to. For example, a search for x y OR z NOT w is the same as x AND (y OR z) AND NOT w.
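For example, a hypothetical search over web access data (the sourcetype and status values are assumptions for illustration):

sourcetype=access* (status=404 OR status=500) NOT debug

retrieves events whose status field is either 404 or 500 and that do not contain the term debug.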
Subsearches
The search command, like all commands, can be used as a subsearch: a search whose results are used as an argument to another search command. Subsearches are enclosed in square brackets. For example, to find all syslog events from the user that had the last login error, use the following command:
sourcetype=syslog [search login error | return user]
Here, a search for events having the terms login and error is performed, returning the first user value found, say bob, followed by a search for sourcetype=syslog user=bob. If you're ready to continue your adventure in learning Splunk, Chapter 4 introduces you to more commands you will find immediately helpful.
Sorting Results (sort)
    Ordering results and (optionally) limiting the number of results.

Filtering Results (search, where, dedup, head, tail)
    Taking a set of events or results and filtering them into a smaller set of results.

Grouping Results (transaction)
    Grouping events so you can see patterns.

Reporting Results (top/rare, stats, chart, timechart)
    Taking search results and generating a summary for reporting.

Filtering, Modifying, and Adding Fields (fields, replace, eval, rex, lookup)
    Filtering out (removing) some fields to focus on the ones you need, or modifying or adding fields to enrich your results or events.
Sorting Results
Sorting results is the province of the (you guessed it!) sort command.
sort
The sort command sorts search results by the specified fields. Table 4-2 shows some examples.
Shorthand for Part of a Search
If we show just part of a series of commands (as we do in Table 4-2), you'll see:
... |
This means that some search preceded this command, but we are focusing on what comes afterward.
Table 4-2. sort Command Examples
| sort 0 field1
    Sort results in ascending order by field1, returning all results (0 means return them all; don't stop at 10,000, which is the default).

| sort field1,-field2
    Sort results by field1 in ascending order, and then by field2 in descending order, returning up to 10,000 results (the default).

The table's other examples sort results in descending order by field1, and then in ascending order by field2, returning the first 100 sorted results; and sort results by filename three ways: letting Splunk decide how to sort the field values, sorting the values numerically, or sorting the values lexicographically.
Hint: Ascending order is the default for search results. To reverse the order of results, use a minus sign in front of a field used to order the results.

Figure 4-1 illustrates the second example. We'll sort by ascending prices and descending ratings. The first result is the cheapest item with the highest user rating.
[Figure 4-1. sort price,-rating orders results by ascending price and then descending rating; the first result is the cheapest item with the highest user rating.]
Filtering Results
These commands take search results from a previous command and reduce them to a smaller set of results. In other words, youre narrowing down your view of the data to show only the results you are looking for.
where
The where filtering command evaluates an expression for filtering results. If the evaluation is successful and the result is TRUE, the result is retained; otherwise, the result is discarded. For example:
source=job_listings | where salary > industry_average
This example retrieves job listings and discards those whose salary is not greater than the industry average. It also discards events that are missing either the salary field or the industry_average field. This example compares two fields, salary and industry_average, something we can only do with the where command. When comparing field values to literal values, simply use the search command:
source=job_listings salary>80000
| where distance/time > 100
    Keep results whose distance field value divided by the time field value is greater than 100.

A second where example keeps results that match an IP address or are in a specified subnet.
[Figure: where distance/time > 100 evaluates the expression for each result and keeps only those for which it is TRUE.]
dedup
Removing redundant data is the point of the dedup filtering command. This command removes subsequent results that match specified criteria. That is, this command keeps only the first count results for each combination of values of the specified fields. If count is not specified, it defaults to 1 and returns the first result found (which is usually the most recent).
dedup host
    Keep the first result for each unique host.

dedup 3 source
    Keep the first three results for each unique source.

dedup source sortby -delay
    Keep the first result for each unique source after first sorting the results by the delay field in descending order. Effectively this keeps the result with the largest delay value for each unique source.

dedup 3 source,host
    Keep the first three results for each unique combination of source and host values.

A final example keeps the first result for each unique source while also keeping results with no source field at all (see the keepnull option under Key Points below).
[Figure: dedup 3 source removes all but the first three results for each matching source field value from the final results.]
Key Points
• To keep all results but remove duplicate values, use the keepevents option.

• The results returned are the first results found with the combination of specified field values, generally the most recent ones. Use the sortby clause to change the sort order if needed.
• Results in which the specified fields do not all exist are retained by default. Use the keepnull=<true/false> option to override the default behavior, if desired.
head
The head filtering command returns the first count results. Using head permits a search to stop retrieving events from disk when it finds the desired number of results.

Heads or Tails?
The opposite of the head command is the tail command, which returns the last results, rather than the first. The results are returned in reverse order, starting at the end of the results. Keep in mind that first is relative to the input order of events, which is usually in descending time order, meaning that, for example, head 10 returns the latest 10 events.
Table 4-5. head Command Examples
| head 5
    Return the first 5 results.

| head (action=startup)
    Return the first events until we reach an event that does NOT have an action field with the value startup.
[Figure: The head command returns only the first results passed to it by the search operator.]
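A corresponding sketch for tail (the count is arbitrary):

... | tail 5

returns the last five results, in reverse order.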
Grouping Results
The transaction command groups related events.
transaction
The transaction command groups events that meet various constraints into transactions: collections of events, possibly from multiple sources. Events are grouped together if all transaction definition constraints are met. Transactions are composed of the raw text (the _raw field) of each member event, the timestamp (the _time field) of the earliest member event, the union of all other fields of each member event, and some additional fields that describe the transaction, such as duration and eventcount.
Table 4-6. transaction Command Examples
| transaction clientip maxpause=5s
    Group events that share the same client IP address and have no gaps or pauses longer than five seconds. With this command, the search results may have multiple values for the host field. For example, requests from a single IP address could come from multiple hosts if multiple people are accessing the server from the same location.

The table's second example groups events that share the same unique combination of client IP address and host, where the first and last events are no more than 30 seconds apart and no events in the transaction occurred more than five seconds apart. In contrast with the first example, each result event has a distinct combination of the IP address (clientip) and host value within the limits of the time constraints. Therefore, you should not see different values of host or clientip addresses among the events in a single transaction.
The table's third example retrieves web access events that have an action=purchase value. These events are then grouped by the transaction command if they share the same clientip, where each session lasts no longer than 10 minutes and includes no more than three events.

The fourth example groups events together that have the same session ID (JSESSIONID) and come from the same IP address (clientip), and where the first event contains the string signon and the last event contains the string purchase. The search defines the first event in the transaction as events that include the string signon, using the startswith=signon argument. The endswith=purchase argument does the same for the last event in the transaction. This example then pipes the transactions into the where command, which uses the duration field to filter out transactions that took less than a second to complete.
The second example in Table 4-6, transaction clientip maxspan=30s maxpause=5s, is illustrated in Figure 4-5.
Key Points
All the transaction command arguments are optional, but some constraints must be specified to define how events are grouped into transactions. Splunk does not necessarily interpret the transaction defined by multiple fields as a conjunction (field1 AND field2 AND field3) or a disjunction (field1 OR field2 OR field3) of those fields. If there is a transitive relationship between the fields in the <fields list>, the transaction command uses it. For example, if you searched for transaction host cookie, you might see the following events grouped into a single transaction:
event=1 host=a
event=2 host=a cookie=b
event=3 cookie=b
The first two events are joined because they have host=a in common, and then the third is joined with them because it has cookie=b in common with the second event.

The transaction command produces two fields:

• duration: difference between the timestamps for the first and last events in the transaction.

• eventcount: number of events in the transaction.

Although the stats command (covered later in this section) and the transaction command both enable you to aggregate events, there is an important distinction:

• stats calculates statistical values on events grouped by the value of fields (and then the events are discarded).

• transaction groups events, supports more options on how they are grouped, and retains the raw event text and other field values from the original events.
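Because every transaction carries duration and eventcount, you can pipe transactions into stats to summarize them. Here is a minimal sketch over hypothetical web access data (the sourcetype and field names are assumptions for illustration):

sourcetype=access* | transaction clientip maxpause=5s | stats avg(duration), max(eventcount)

This groups each client's events into transactions and then reports how long the transactions lasted and how large they were.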
Reporting Results
Reporting commands covered in this section include top, stats, chart, and timechart.
top
Given a list of fields, the top command returns the most frequently occurring tuple of those field values, along with their count and percentage. If you specify an optional by-clause of additional fields, the most frequent values for each distinct group of values of the by-clause fields are returned.

The Opposite of top Is rare
The opposite of the top command is the rare command. Sometimes you want to know what the least common value for a field is (instead of the most common). The rare command does exactly that.
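For example, a minimal sketch (the field name is only for illustration):

... | rare user

returns the least common user values, the mirror image of ... | top user.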
Table 4-7. top Command Examples
| top 20 url
    Return the 20 most common URLs.

| top 2 user by host
    Return the top 2 user values for each host.

| top user, host
    Return the top 10 (default) user and host combinations.
The second example in Table 4-7, top 2 user by host, is illustrated in Figure 4-6.
[Figure 4-6. top 2 user by host: intermediate results count the user values for each host, and the final results keep the two most common users per host.]
stats
The stats command calculates aggregate statistics over a dataset, similar to SQL aggregation. The resultant tabulation can contain one row, which represents the aggregation over the entire incoming result set, or a row for each distinct value of a specified by-clause.

There's more than one command for statistical calculations. The stats, chart, and timechart commands perform the same statistical calculations on your data, but return slightly different result sets to enable you to more easily use the results as needed:

• The stats command returns a table of results where each row represents a single unique combination of the values of the group-by fields.

• The chart command returns the same table of results, with rows as any arbitrary field.

• The timechart command returns the same tabulated results, but the row is set to the internal field, _time, which enables you to chart your results over a time range.

Table 4-8 shows a few examples of using the stats command.

What as Means
Note the use of the keyword as in some of the commands in Table 4-8. as is used to rename a field. For example, sum(price) as Revenue means add up all the price fields and name the column showing the results Revenue.
Table 4-8. stats Command Examples
| stats dc(host)
    Return the distinct count (i.e., unique) of host values.

| stats avg(kbps) by host
    Return the average transfer rate for each host.

| stats count(eval(method="GET")) as GET, count(eval(method="POST")) as POST by host
    Return the number of different types of requests for each Web server (host). The resultant table contains a row for each host and columns for the GET and POST request method counts.

Another example in the table returns the total number of hits from the top 100 values of referer_domain.

| stats count, max(Magnitude), min(Magnitude), range(Magnitude), avg(Magnitude) by Region
    Using USGS Earthquakes data, return the number of quakes and additional statistics for each Region.

| stats values(product_type) as Type, values(product_name) as Name, sum(price) as Revenue by product_id | rename product_id as "Product ID" | eval Revenue="$ ".tostring(Revenue,"commas")
    Return a table with Type, Name, and Revenue columns for each product_id sold at a shop. Also, format the Revenue as $123,456.
The third example in Table 4-8, retrieving the number of GET and POST requests per host, is illustrated in Figure 4-7.
Table 4-9 lists statistical functions that you can use with the stats command. (These functions can also be used with the chart and timechart commands, which are discussed later.)
Table 4-9. stats Statistical Functions

Mathematical Calculations

avg(X): Returns the average of the values of field X; see also mean(X).
count(X): Returns the number of occurrences of the field X; to indicate a field value to match, format the X argument as an expression: eval(field="value").
dc(X): Returns the count of distinct values of field X.
max(X): Returns the maximum value of field X. If the values are non-numeric, the max is determined per lexicographic ordering.
median(X): Returns the middle-most value of field X.
min(X): Returns the minimum value of field X. If the values are non-numeric, the min is determined per lexicographic ordering.
mode(X): Returns the most frequent value of field X.
perc<percent-num>(X): Returns the <percent-num>-th value of field X; for example, perc5(total) returns the 5th percentile value of the total field.
range(X): Returns the difference between the max and min values of field X, provided the values are numeric.
stdev(X): Returns the sample standard deviation of field X. You can use wildcards when you specify the field name; for example, "*delay", which matches both "delay" and "xdelay".
sum(X): Returns the sum of the values of field X.
var(X): Returns the sample variance of field X.

Value Selections

first(X): Returns the first value of field X; opposite of last(X).
last(X): Returns the last value of field X; opposite of first(X). Generally, a field's last value is the most chronologically oldest value.
list(X): Returns the list of all values of field X as a multivalue entry. The order of the values matches the order of input events.
values(X): Returns a list (as a multivalue entry) of all distinct values of field X, ordered lexicographically.

timechart only

Rate functions that return the rate of field X per day, per hour, per minute, and per year.
Note: All functions except those in the timechart only category are applicable to the chart, stats, and timechart commands.
chart
The chart command creates tabular data output suitable for charting. You specify the x-axis variable using over or by. Table 4-10 shows a few simple examples of using the chart command; for more realistic scenarios, see Chapter 6.
Table 4-10. chart Command Examples

| chart max(delay) over host
    Return max(delay) for each value of host.

| chart max(delay) by size bins=10
    Chart the maximum delay by size, where size is broken down into a maximum of 10 equal-size buckets.

| chart eval(avg(size)/max(delay)) as ratio by host user
    Chart the ratio of the average (mean) size to the maximum delay for each distinct host and user pair.

... | chart dc(clientip) over date_hour by category_id usenull=f
    Chart the number of unique clientip values per hour by category. usenull=f excludes fields that don't have a value.

| chart count over Magnitude by Region useother=f
    Chart the number of earthquakes by Magnitude and Region. Use the useother=f argument to not output an "other" value for rarer Regions.

| chart count(eval(method="GET")) as GET, count(eval(method="POST")) as POST by host
    Chart the number of GET and POST page requests that occurred for each Web server (host).
Figures 4-8 (tabulated results) and 4-9 (bar chart on a logarithmic scale) illustrate the results of running the last example in Table 4-10.
timechart
The timechart command creates a chart for a statistical aggregation applied to a field against time as the x-axis. Table 4-11 shows a few simple examples of using the timechart command. Chapter 6 offers more examples of using this command in context.
Table 4-11. timechart Command Example
| timechart span=1m avg(CPU) by host
    Chart the average value of CPU usage each minute for each host.

| timechart span=1d count by product-type
    Chart the number of purchases made daily for each type of product. The span=1d argument buckets the count of purchases over the week into daily chunks.

Other examples in the table:

• Chart the average cpu_seconds by host and remove outlying values that may distort the timechart's y-axis.

• Chart hourly revenue for the products that were purchased yesterday. The per_hour() function sums the values of the price field for each item (product_name) and scales that sum appropriately depending on the timespan of each bucket.

• Chart the number of page requests over time. The count() function and eval expressions are used to count the different page request methods, GET and POST.

• For an ecommerce website, chart per_hour the number of product views and purchases, answering the question: how many views did not lead to purchases?
The fourth example in Table 4-11, charting hourly revenues by product name, is illustrated in figures 4-10 and 4-11.
more readable for a particular audience by using the replace command. Or you might need to add new fields with the help of commands such as eval, rex, and lookup:

• The eval command calculates the value of a new field based on other fields, whether numerically, by concatenation, or through Boolean logic.

• The rex command can be used to create new fields by using regular expressions to extract patterned data in other fields.

• The lookup command adds fields based on looking at the value in an event, referencing a lookup table, and adding the fields in matching rows in the lookup table to your event.

These commands can be used to create new fields or they can be used to overwrite the values of existing fields. It's up to you.
fields
The fields command removes fields from search results. Typical commands are shown in Table 4-12.
Table 4-12. fields Command Examples
| fields - field1, field2
    Remove field1 and field2 from the search results.

| fields field1 field2
    Keep only field1 and field2.

| fields field1 error*
    Keep only field1 and all fields whose names begin with error.

| fields field1 field2 | fields - _*
    Keep only field1 and field2, and remove all internal fields (which begin with an underscore). (Note: Removing internal fields can cause Splunk Web to render results incorrectly and create other search problems.)
The first example in Table 4-12, fields - field1, field2, is illustrated in Figure 4-12.
[Figure 4-12. Removing field1 and field2 leaves only the remaining fields in the final results.]
Key Points
Internal fields, i.e., fields whose names start with an underscore, are unaffected by the fields command, unless explicitly specified.
replace
The replace command performs a search-and-replace of specified field values with replacement values. The values in a search and replace are case-sensitive.
Table 4-13. replace Command Examples
replace *localhost with localhost in host
    Change any host value that ends with localhost to localhost.

replace 0 with Critical, 1 with Error in msg_level
    Change msg_level values of 0 to Critical, and change msg_level values of 1 to Error.

replace aug with August in start_month end_month
    Change any start_month or end_month value of aug to August.

replace 127.0.0.1 with localhost
    Change all field values of 127.0.0.1 to localhost.
The second example in Table 4-13, replace 0 with Critical, 1 with Error in msg_level, is illustrated in Figure 4-13.
[Figure 4-13. msg_level values of 0 and 1 are replaced with Critical and Error in the final results.]
eval
The eval command calculates an expression and puts the resulting value into a new field. The eval and where commands use the same expression syntax; Appendix E lists all the available functions.
Table 4-14. eval Command Examples
| eval velocity=distance/time
    Set velocity to distance divided by time.

| eval status = if(error == 200, "OK", "Error")
    Set status to OK if error is 200; otherwise set status to Error.

| eval sum_of_areas = pi() * pow(radius_a, 2) + pi() * pow(radius_b, 2)
    Set sum_of_areas to be the sum of the areas of two circles.
Figure 4-14 illustrates the first example in Table 4-14, eval velocity=distance/time.
[Figure 4-14. eval velocity=distance/time adds a velocity field computed from each result's distance and time values.]
The eval command creates a new velocity field in the results. (If a velocity field already exists, the eval command updates its value.) The eval command creates or overrides only one field at a time.
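When you need several derived fields, you can simply chain eval commands; here is a hypothetical sketch (the field names are invented for illustration):

... | eval velocity=distance/time | eval fast=if(velocity > 100, "yes", "no")

Each eval adds (or overwrites) one field, and later commands can use fields created by earlier ones.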
rex
The rex command extracts fields whose value matches a specified Perl Compatible Regular Expression (PCRE). (rex is shorthand for regular expression.)

What Are Regular Expressions?
Think of regular expressions as wildcards on steroids. You've probably looked for files with expressions like *.doc or *.xls. Regular expressions let you take that to a whole new level of power and flexibility. If you're familiar with regular expressions, you're probably not reading this box. To learn more, see http://www.regular-expressions.info, easily the best site on the topic.
Table 4-15. rex Command Examples
| rex "From: (?<from>.*) To: (?<to>.*)"
    Extract from and to fields using regular expressions. If a raw event contains From: Susan To: Bob, then from=Susan and to=Bob.

Another example extracts user, app, and SavedSearchName from a field called savedsearch_id. If savedsearch_id = bob;search;my_saved_search, then user=bob, app=search, and SavedSearchName=my_saved_search.
rex can also use sed syntax to match a regex to a series of numbers and replace them with an anonymized string.
Figure 4-15 illustrates the first example in Table 4-15, extracting from and to fields.
[Figure 4-15. rex "From: (?<from>.*) To: (?<to>.*)" extracts from and to fields from the raw text of each event; for example, from=Susan and to=Bob.]
lookup
The lookup command manually invokes field lookups from a lookup table, enabling you to add field values from an external source. For example, if you have 5-digit zip codes, you might do a lookup on the street name to apply a ZIP+4 9-digit zip code.
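A sketch of that zip code idea, assuming a lookup table has been configured with the hypothetical name zip_plus4 and hypothetical fields street_name and zip9:

... | lookup zip_plus4 street_name OUTPUT zip9

The matching zip9 value from the table would be added to each event that has a matching street_name.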
Table 4-16. lookup Command Examples
| lookup usertogroup user as local_user OUTPUT group as user_group
    For a lookup table with fields user and group, specified in stanza name usertogroup in transforms.conf, look up the value of each event's local_user field. For entries that match, the value of the lookup table's group field is written to the event's user_group field.
Given a field lookup named dnslookup, referencing a Python script that performs a reverse DNS lookup and accepts either a host name or IP address as arguments, another example matches the host name values (the host field in your events) to the host name values in the table, and then adds the corresponding IP address values to your events (in the ip field).

For a local lookup table that is present only in the search head, a third example looks up the value of each event's user field. For entries that match, the value of the lookup table's zip field is written to the event's user_zip field.
Figure 4-16 illustrates the first example in Table 4-16, lookup usertogroup user as local_user OUTPUT group as user_group.
[Figure 4-16. Each event's local_user value is matched against the user field of the usertogroup lookup table, and the corresponding group value is added to the event as user_group.]
This chapter has provided a crash course in the commands in the SPL. The next chapter describes how you can enrich your data with tags and event types and tell Splunk to watch for certain patterns and alert you about them.
Figure 5-1. Choosing Extract Fields from the Event Options menu starts the Interactive Field Extractor
The IFX appears in another tab or window in your browser. By entering the kinds of values you seek (such as a client IP address in web logs), Splunk generates a regular expression that extracts similar values (this is especially helpful for the regular expression-challenged among us). You can test the extraction (to make sure it finds the field you're looking for) and save it with the name of the field. To learn more about the Interactive Field Extractor, see http://splunk.com/goto/book#ifx.
Sometimes the command you use depends on the kind of data from which you're extracting fields. To extract fields from multiline tabular events (such as command-line output), use multikv, and to extract from XML and JSON data, use spath or xmlkv. To learn about commands that extract fields, see http://splunk.com/goto/book#search_fields.
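For example, a minimal sketch on output of the Unix top command (assuming a source type named top, as used in the alerting recipes later in this book):
sourcetype=top | multikv
Each row of the tabular output becomes its own event, with fields named after the column headers.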
Figure 5-2. View a field summary by clicking on a field name in the Fields sidebar.
You can also narrow the events list to see only events that have a value for that field.
How many people bought flowers yesterday? Use stats dc (distinct count) to ensure that each IP address is counted only once.
sourcetype=access* action=purchase category_id=flowers | stats dc(clientip)
What is the 95th percentile of time the servers took to respond to web requests?
sourcetype=access* | stats perc95(spent)
Figure 5-3. Sparklines show patterns in the data in the Events table
Here are a few more commands that demonstrate ways to use sparklines: What is the number of events for each status and category combination, over time?
sourcetype=access* | stats sparkline count by status, category_id
What is the average response time for each product category, over time?
sourcetype=access* | stats sparkline(avg(spent)) by category_id
Using a different data set (earthquake magnitude data), see how earthquake magnitude varies by region and over 6-hour chunks of time, with the more popular regions first.
source=eqs7day-M2.5.csv | stats sparkline(avg(Magnitude),6h) as magnitude_trend, count, avg(Magnitude) by Region | sort count
Tagging
Tags are an easy way to label any field value. If the host name bdgpu-login-01 isn't intuitive, give it a tag, like authentication_server, to make it more understandable. If you see an outlier value in the UI and want to be able to revisit it later and get more context, you might label it follow_up. To tag a field value in the events list, click the down arrow beside the field value you want to tag (see Figure 5-4).
You can manage all your tags by going to Manager and then Tags. Let's suppose you've labeled your various host values with tags such as webserver, database_server, and so on. You can then report on those custom tags to see your data the way you want instead of how it happens to be named. Again, you decide how you want to look at your data. For example, to compare how the various host types perform over time, run a search such as:
... | timechart avg(delay) by tag::host
Reporting and the Joy of Negative Searching
From the moment you start looking at data, you should be thinking about reporting. What would you like to know about the data? What are you looking for? What noise would you like to remove from the data so that you can easily find what you're looking for? This last point bears further explanation as an example of something Splunk does very well that few if any other data analysis software can: negative searching.
It's often said that you can't prove a negative. You can't look everywhere and say, "what I seek is not there." With Splunk you can do negative searching, and in fact you should. The reason it's hard to see what's happening with log files, and many other types of data, is that so much of it is the same, sort of business-as-usual machine data. With Splunk you can categorize that uninteresting data and tell Splunk to show you only what's unusual or different. Show me what I haven't seen before. Some security experts use Splunk in just this way to identify anomalous events that could indicate an intrusion, for example. If they've seen it before, they give it a tag and exclude it from their search. After you do this for a while, if anything odd happens, you'll see it right away.
Event Types
When you search in Splunk, you start by retrieving events. You implicitly look for a particular kind of event by searching for it. You could say that you were looking for events of a certain type. That's how event types are used: they let you categorize events. Event types facilitate event categorization using the full power of the search command, meaning you can use Boolean expressions, wildcards, field values, phrases, and so on. In this way, event types are even more powerful than tags, which are limited to field values. But, like tags, how your data is categorized is entirely up to you. You might create event types to categorize events such as where a customer purchased, when a system crashed, or what type of error condition occurred. It's all about what you need to know about your events.
Here are some ground rules for a search that defines an event type:
No pipes. You can't have a pipe in a search used to create an event type (i.e., it cannot have any search commands other than the implied search command).
No subsearches. At the end of Chapter 3, we briefly covered the wheel-within-a-wheel that is subsearches; for now, remember that you can't use them to create event types.
Here's a simple example. In our ongoing quest to improve our website, we're going to create four event types based on the status field:
status=2* is defined as success.
status=3* is defined as redirect.
status=4* is defined as client_error.
status=5* is defined as server_error.
To create the event type success as we've defined it, you would perform a search like this:
sourcetype=access* status=2*
Next, choose Create Event type. The Save As Event Type dialog appears where you name the event type, optionally assign tags, and click Save. To see the event types matching your search results, click eventtype in the Fields sidebar. This multivalued field shows all the event types for the events in the events list. We create the other three event types in just the same way, and then run a stats count to see the distribution:
sourcetype=access* | stats count by eventtype
There are relatively few events with an event type of server_error but, nonetheless, they merit a closer look to see if we can figure out what they have in common. Clicking server_error lets us drill down into events of just that event type, where we see 15 events that all look something like the one shown in Figure 5-6.
The server_error events have one rather disturbing thing in common: people are trying to buy something when the server unavailable status occurs. In other words, this is costing us money! It's time to go talk to the person who administers that server and find out what's wrong.
Nesting Event Types
You can build more specific event types on top of more general event types. We could define a new event type web_error with other event types as building blocks:
eventtype=client_error OR eventtype=server_error
Of course, you should use this sparingly because you don't want to risk losing track and inadvertently creating circular definitions.
Earlier we mentioned negative searching. If you tag all the event types you don't especially want to see with a tag of normal, you can then search for events that are NOT normal. This brings abnormalities to the surface.
NOT tag::eventtype=normal
Visualizing Data
So far we've shown you a few ways to get at data visualizations:
Clicking a fieldname in the Fields sidebar to see some quick graphics about a field.
Using the top and stats search commands.
Using sparklines to see inline visualizations in the events table results.
This section shows you how to create charts and dashboards for visualizing your data.
Creating Visualizations
When you look at a table of data, you may see something interesting. Putting that same data into charts and graphs can reveal new levels of information and bring out details that are hard to see otherwise. To create charts of your data, after you run a search, select Create Report. Alternatively, in Splunk 4.3, click the Results Chart icon in the Results area to display a chart of your results. Splunk offers various chart types: column, line, area, bar, pie, and scatterplots. What product categories are affected most by 404 errors? This search calculates the number of events for each category_id and generates the pie chart shown in Figure 5-7.
sourcetype=access* status=404 | stats count by category_id
Given that flowers and gifts are among the highest-margin products, we'd better add some redirects for the bad URLs (and try to get the sites that are linking to our pages to update their links). When you mouse over any graphic in Splunk, you get more information about the data behind that portion of the graphic. See Figure 5-8.
Creating Dashboards
The end result of using Splunk for monitoring is usually a dashboard with several visualizations. A dashboard is made up of report panels, which can be a chart, a gauge, or a table or list of search results (often the data itself is interesting to view). When designing dashboards, ask yourself: of all of these charts, which ones would I want to see first? Which ones would end users want to see first? Which ones would line-of-business managers want to see first? Maybe each audience needs its own dashboard. Then you can ask: what questions arise from looking at this dashboard? Splunk automatically handles many kinds of drill downs into chart specifics with a simple click on the chart. (Advanced users can specify drilldown behavior explicitly, but that is beyond the scope of this book.) One key point to remember is that simple visualizations are generally the most popular with all levels of users. You can, and should, make more advanced and detailed dashboards, but make sure to do a good job covering the simple, high-level views. Figure 5-9 shows an example of a dashboard.
The best way to build a dashboard is not from the top down but from the bottom up, one panel at a time. Start by using Splunk's charting capabilities to show the vital signs in various ways. When you have several individual charts showing different parts of the system's health, place them onto a dashboard.
Creating a Dashboard
In Splunk 4.3, to create a dashboard and add a report, chart, or search results to it:
1. Run a search that generates a report for a dashboard.
2. Select Create Dashboard panel.
3. Give your search a name, and click Next.
4. Decide whether you want this report to go on a new dashboard or on an existing dashboard. If you're creating a new dashboard, give it a name. Click Next.
5. Specify a title for your dashboard and a visualization (table, bar, pie, gauge, etc.), and when you want the report for the panel to run (whenever the dashboard is displayed or on a fixed schedule).
6. Click Next followed by the View dashboard link or OK.
Viewing a Dashboard
At any time you can view a dashboard by selecting it from the Dashboards & Views menu at the top of the page.
Editing a Dashboard
While viewing your dashboard, you can edit it by clicking On in the Edit mode selector and then clicking the Edit menu of any panel you want to edit. From there, you can edit the search that generates a report or how it's visualized, or delete the panel.
Creating Alerts
What is an alert? You can think of an alert as an if-then statement that gets evaluated on a schedule:
If this happens, then do that in response.
The if in this case is a search. The then is the action you want to be taken in response to the if clause being fulfilled. More formally, an alert is a search that runs periodically with a condition evaluated on the search results. When the condition matches, some actions are executed.
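For instance, a hedged sketch of such an if-then pair, reusing the web access data from earlier in this chapter: the if is a search such as
sourcetype=access* status=5*
and the then might be to send an email if that search returns any results over the last 15 minutes (the 15-minute window is an arbitrary choice for illustration). The alerting recipes in Part II walk through several concrete examples.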
Splunk takes whatever search is in the search bar when you create an alert and uses it as a saved search, which becomes the basis for your alert (the if in your if-then). With the search you want in the search bar, select Create Alert. This starts a wizard that makes it easy to create an alert.
Scheduling an Alert
On the Schedule screen of the Create Alerts dialog, you name the alert and specify how you want Splunk to execute it. You can choose whether Splunk monitors for a condition by running a search in real time, by running a scheduled search periodically, or by monitoring in real time over a rolling window. Here are the use cases for these three options:
Monitor in real time if you want to be alerted whenever the condition happens.
Monitor on a scheduled basis for less urgent conditions that you nonetheless want to know about.
Monitor using a real-time rolling window if you want to know whether a certain number of things happen within a certain time period (it's a hybrid of the first two options in that sense). For example, trigger the alert as soon as you see more than 20 404s in a 5-minute window.
If you specify that you want to monitor on a schedule or in a rolling window, you must also specify the time interval and the number of results that should match the search to trigger the alert. Alternatively, you could enter a custom condition, which is a search that is executed if the alert condition is met. Custom conditions are described later in this chapter.
The next step is to set limits and specify what to do if the alert is triggered.
Specifying Actions
What should happen if the alert condition occurs? On the Action screen of the Create Alert dialog, you specify what action or actions you want to take (sending email, running a script, showing triggered alerts in Alerts Manager). In Figure 5-11, the user chose all of the above actions, letting us see all the options available here.
Send email. Email has the following options:
Email addresses. Enter at least one.
Subject line. You can leave this as the default, which is Splunk Alert: AlertName. The alert name is substituted for $name$. (This means you could change that subject to: Oh no! $name$ happened.)
Include the results that triggered the alert. Click the checkbox to include them as an attached CSV file, or select inline to put them right into the email itself.
Run a script. You specify the script name, which must be placed in Splunk's home directory, within /bin/scripts or within an app's /bin/scripts directory.
Show triggered alerts in Alert manager, which you reach by clicking Alerts in the upper right corner of the UI.
After you choose an action (or two or three), you can fill in a few more options:
Set the severity. The severity is metadata for your reference so that you can classify alerts. The levels are info, low, medium, high, and critical. Severity shows up in Alert manager.
Execute actions on all results or each result. This determines whether Splunk takes the action (such as sending an email) for the group of results that matches the search or for each individual result. All results is the default.
Throttling. Alerts are effective only if they tell you what you need to know when you need to know it. Too many alerts and you'll ignore them. Too few and you won't know what's happening. This option specifies how long Splunk should wait to perform the action associated with the alert again, after it has been triggered. If you specify a rolling window, the wizard defaults the throttling interval to match that window. More throttling options are described later in this chapter.
After you click Next, the final step is to specify whether the alert is private or shared for read-only access to users of the current app. Click Finish to finalize the alert.
in an isolated vital sign doesn't trigger an alert, but 10 vital signs getting within 10% of their upper limits do. It's easy to create alerts quickly using the wizard, but still more options for tuning alerts are available using Manager. Remember that saved searches underlie alerts. As a result, you edit them like you would a saved search. To edit your alert, choose Manager and then Searches and Reports. Select a saved search from the list to display its parameters.
Throttling Alerts
Splunk lets you tune alerts so that they tell you something meaningful. A message that tells you something important is helpful. One hundred messages, on the other hand, whether justified or not, are not helpful. It's noise. Splunk lets you throttle alerts so that even if they are triggered, they go off only once in a particular time interval. In other words, if the first alert is like the first kernel of popcorn that pops, you don't want alerts for all those other kernels, which are really related to that first alert. (If popcorn had a second alert, it should go off just after all functional kernels pop and before any of them burn.) This is what throttling does. You can tell Splunk to alert you but not to keep alerting you. In the middle of the Manager's screen for editing alerts is an option called Alert mode (see Figure 5-12).
You can be alerted once per search, that is, for all results, or you can be alerted once per result. Per-result alerts can be further throttled by fields. For example, you may want to be alerted whenever the condition is fulfilled, but only once per host. Let's say that disk space is running low on a server and you want to be alerted when there's less than 30% free space available. If you specify host in Per result throttling fields, you would only be notified once for each host during the specified time period. If you were dealing with user login failures, you might enter username as the per-result throttling field.
A brief clarification of terminology is needed here. We'll refer to the saved if-then scheduled search as an alert, and an individual firing of that alert as an alert instance. The Alert manager shows the list of most recent firings of alerts (i.e., alert instances). It shows when the alert instance fired and provides a link to view the search results from that firing and to delete the firing. It also shows the alert's name, app, type (scheduled, real-time, or rolling window), severity, and mode (digest or per-result). You can also edit the alert's definition.
PART II RECIPES
Monitoring Recipes
Monitoring can help you see what is happening in your data. How many concurrent users are there? How are key metrics changing over time? In addition to recipes that monitor various conditions, this section provides recipes that describe how to use search commands to extract fields from semi-structured and structured data.
Solution
First, perform a search to retrieve relevant events. Next, use the concurrency command to find the number of users that overlap. Finally, use the timechart reporting command to display a chart of the number of concurrent users over time. Lets say you have the following events, which specify date, time, request duration, and username:
5/10/10 1:00:01 ReqTime=3 User=jsmith 5/10/10 1:00:01 ReqTime=2 User=rtyler 5/10/10 1:00:01 ReqTime=50 User=hjones 5/10/10 1:00:11 ReqTime=2 User=rwilliams 5/10/10 1:00:12 ReqTime=3 User=apond
You can see that, at 1:00:01, there are three concurrent requests (jsmith, rtyler, hjones); at 1:00:11, there are two (hjones, rwilliams); and at 1:00:12, there are three (hjones, rwilliams, apond). Use this search to show the maximum concurrent users for any particular time:
<your search here> sourcetype=login_data | concurrency duration=ReqTime | timechart max(concurrency)
Solution
Use the metadata command, which reports high-level information about hosts, sources, and source types in the Splunk indexes. This is what is used to create the Summary Dashboard. Note that the pipe character is at the beginning of this search, because we're not retrieving events from a Splunk index; rather, we're calling a data-generating command (metadata).
Use the following search to take the information on hosts, sort it so the least recently referenced hosts are first, and display the time in a readable time format:
| metadata type=hosts | sort recentTime | convert ctime(recentTime) as Latest_Time
You'll quickly see which hosts haven't logged data lately. To learn more about the metadata command, see http://splunk.com/goto/book#metadata
Solution
To search for specific parts of your data, classify your events using tags and event types. Tags are simpler, but event types are more powerful (tags and event types are discussed in Chapter 5). You might wonder how this categorization of data comes under monitoring. That's because when you categorize data using tags and event types, you not only categorize the data you have today; you teach Splunk to categorize data like that every time it shows up. You are teaching Splunk to be on the lookout for data that has certain characteristics. Think of tags and event types like putting out an all points bulletin (APB) for your data.
Using Tags
You can classify simple field=value pairs using tags. For example, classify events that have host=db09 as a database host by tagging that field value. This creates a tag::host field having a value of database, on events with host=db09. You can then use this custom classification to generate reports. Here are a couple of examples that use tags. Show the top ten host types (good for bar or pie charts):
... | top 10 tag::host
Because event types are not specific to a dimension, such as hosts, user type, or error codes, they are all in a common namespace, jumbled together. A search for top eventtypes might return database_host and web_error, which is probably not what you want because you'd be comparing apples to oranges. Fortunately, you can filter which event types you report on, using the eval command, if you use a common naming convention for your event types. As an example, using event types, compare how the various host types perform (displayed as a timechart), using only event types that end in _host:
... | eval host_types = mvfilter(match(eventtype, "_host$")) | timechart avg(delay) by host_types
Solution
For this solution, we'll use the example of music data to show the top 10 most played artists today and their average position for the month. Assume the events have an artist field and a sales field that tells how many units were sold at a particular time. We'll use the sum of sales as our metric, sum(sales), but we could use any other metric. The full search looks daunting at first, but you can break it down into simple steps:
1. Get the monthly rankings by artist.
2. Get the daily rankings by artist and append them to the results.
3. Use stats to join the monthly and daily rankings by artist.
4. Use sort and eval to format the results.
The earliest=-30d@d tells Splunk to retrieve events starting at 30 days ago (in other words, get events from the last month). stats calculates the sums of sales for each artist as the month_sales field. You now have a row for each artist, with two columns: month_sales and artist. sort 10 month_sales keeps only those rows with the ten largest month_sales values, in sorted order from largest to smallest. The streamstats command adds one or more statistics to each event, based on the current value of the aggregate at the time the event is seen (not on the results as a whole, like the stats command does). Effectively, streamstats count as MonthRank assigns the first result MonthRank=1, the second result MonthRank=2, and so on.
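To make that explanation concrete, here is just the monthly-ranking piece described above, taken from the full search that follows:
sourcetype=music_sales earliest=-30d@d | stats sum(sales) as month_sales by artist | sort 10 - month_sales | streamstats count as MonthRank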
Summary
Putting it all together, the search is as follows:
sourcetype=music_sales earliest=-30d@d | stats sum(sales) as month_sales by artist | sort 10 - month_sales | streamstats count as MonthRank | append [ search sourcetype=music_sales earliest=-1d@d | stats sum(sales) as day_sales by artist | sort 10 - day_sales | streamstats count as DayRank ] | stats first(MonthRank) as MonthRank first(DayRank) as DayRank by artist | eval diff=MonthRank-DayRank | sort DayRank | table DayRank, artist, diff, MonthRank
Variations
Here, we used the sum of sales as our metric, sum(sales), but we could use any metric, such as min(sales), or change the time ranges to compare last week to this week. To learn more about the streamstats command, see http://splunk.com/goto/book#streamstats
Solution
To see a drop over the past hour, we'll need to look at results for at least the past two hours. We'll look at two hours of events, calculate a separate metric for each hour, and then determine how much the metric has changed between those two hours. The metric we're looking at is the count of the number of events between two hours ago and the last hour. This search compares the count by host of the previous hour with the current hour and filters those where the count dropped by more than 10%:
earliest=-2h@h latest=@h | stats count by date_hour,host | stats first(count) as previous, last(count) as current by host | where current/previous < 0.9
The first condition (earliest=-2h@h latest=@h) retrieves two hours' worth of data, snapping to hour boundaries (e.g., 2-4pm, not 2:01-4:01pm). We then get a count of the number of those events per hour and host. Because there are only two hours (two hours ago and one hour ago), stats first(count) returns the count from two hours ago and last(count) returns the count from one hour ago. The where clause returns only those events where the current hour's count is less than 90% of the previous hour's count (which shows that the count dropped by more than 10%). As an exercise for you, think about what will go wrong with this search when the time span crosses midnight. Do you see how to correct it by adding first(_time) to the first stats command and sorting by that new value?
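One way that correction might look, as a sketch rather than a definitive answer to the exercise (hour_start is an illustrative field name):
earliest=-2h@h latest=@h | stats count, first(_time) as hour_start by date_hour, host | sort hour_start | stats first(count) as previous, last(count) as current by host | where current/previous < 0.9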
Variations
Instead of the number of events, use a different metric, such as the average delay or minimum bytes per second, and consider different time ranges, such as day over day.
Solution
First, run a search over all the events and mark whether they belong to this week or last week. Next, adjust the time value of last week's events to look like this week's events (so they graph over each other on the same time range). Finally, create a chart.
Let's get results from the last two weeks, snapped to the beginning of the week:
earliest=-2w@w latest=@w
Adjust last week's events to look like they occurred this week:
eval _time = if (marker=="last week", _time + 7*24*60*60, _time)
Chart the desired metric, using the week marker we set up, such as a timechart of the average bytes downloaded for each week:
timechart avg(bytes) by marker
This produces a timechart with two labeled series: last week and this week. Putting it all together:
earliest=-2w@w latest=@w | eval marker = if (_time < relative_time(now(), "-1w@w"), "last week", "this week") | eval _time = if (marker=="last week", _time + 7*24*60*60, _time) | timechart avg(bytes) by marker
If you use this pattern often, you'll want to save it as a macro to reuse it.
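As a sketch of what that might look like in macros.conf (the macro name compare_week_over_week is an assumption):
[compare_week_over_week]
definition = earliest=-2w@w latest=@w | eval marker = if (_time < relative_time(now(), "-1w@w"), "last week", "this week") | eval _time = if (marker=="last week", _time + 7*24*60*60, _time)
You could then write `compare_week_over_week` | timechart avg(bytes) by marker, invoking the macro with backquotes.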
Variations
Explore different time periods, such as day over day, with different chart types. Try different charts other than avg(bytes). Alternatively, remove the snapping to week boundaries by setting earliest=-2w, not using a latest value (it defaults to now), and changing the relative_time() argument to -1w.
Solution
Use a moving trendline to help you see the spikes. Run a search followed by the trendline command using a field you want to create a trendline for. For example, on web access data, we could chart an average of the bytes field:
sourcetype=access* | timechart avg(bytes) as avg_bytes
To add another line/bar series to the chart for the simple moving average (sma) of the last 5 values of bytes, use this command:
trendline sma5(avg_bytes) as moving_avg_bytes
If you want to clearly identify spikes, you might add an additional series for spikes, when the current value is more than twice the moving average:
eval spike=if(avg_bytes > 2 * moving_avg_bytes, 10000, 0)
The 10000 here is arbitrary, and you should choose a value relevant to your data that makes the spike noticeable. Changing the formatting of the Y-axis to Log scale also helps. Putting this together, our search is:
sourcetype=access* | timechart avg(bytes) as avg_bytes | trendline sma5(avg_bytes) as moving_avg_bytes | eval spike=if(avg_bytes > 2 * moving_avg_bytes, 10000, 0)
Variations
We used a simple moving average for the last 5 results (sma5). Consider a different number of values (for example, sma20), and other moving average types, such as exponential moving average (ema) and weighted moving average (wma). Alternatively, you can bypass the charting altogether and replace the above eval with a where clause to filter your results.
... | where avg_bytes > 2 * moving_avg_bytes
And by looking at the table view or as an alert, you'll only see the times when avg_bytes spiked. To learn more about the trendline search command, see http://splunk.com/goto/book#trendline
Solution
To produce these sparklines in your tables, simply enclose your stats or chart functions in the sparkline() function. Here, we'll use the example of web access logs. We want to create a small graph showing how long it took for each of our web pages to respond (assuming the field spent is the amount of time spent serving that web page). We have many pages, so we'll sort them to find the pages accessed the most (i.e., having the largest count values). The 5m tells Splunk to show details down to a 5-minute granularity in the sparklines.
sourcetype=access* | stats sparkline(avg(spent),5m), count by file | sort - count
Run this search over the last hour. The result is a series of mini graphs showing how long it took each page to load on average, over time.
Variations
Try using different functions other than avg. Try using values different than 5m for granularity. If you remove the 5m granularity altogether, Splunk automatically picks the right value for the search timespan.
Solution
Use the spath command, introduced in Splunk 4.3, to extract values from XML- and JSON-formatted data. In this example, we'll assume a source type of book data in XML or JSON. We'll run a search that returns XML or JSON as the event's text, and use the spath command to extract the author name:
sourcetype=books | spath output=author path=catalog.book.author
When called with no path argument, spath extracts all fields from the first 5000 characters, which is configurable, creating fields for each path element. Paths have the form foo.bar.baz. Each level can have an optional array index, indicated by curly braces (e.g., foo{1}.bar). All array elements can be represented by empty curly brackets (e.g., foo{}). The final level for XML queries can also include an attribute name, also enclosed by curly brackets (e.g., foo.bar{@title}) and prefaced with a @. After you have the extracted field, you can report on it:
... | top author
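For instance, assuming an XML event shaped like <catalog><book><author>...</author></book><book>...</book></catalog> (a made-up structure for illustration), a path using the array-index form would pull just the first book's author:
... | spath output=first_author path=catalog.book{1}.author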
Variations
An older search command called xmlkv extracts simple XML key-value pairs. For example, calling ... | xmlkv on events that have a value of <foo>bar</foo> creates a field foo with a value bar. Another older command that extracts fields from XML is xpath.
Solution
Using commands to extract fields is convenient for quickly extracting fields that are needed temporarily or that apply to specific searches and are not as general as a source or source type.
Regular Expressions
The rex command facilitates field extraction using regular expressions. For example, on email data, the following search extracts the from and to fields using the rex command:
sourcetype=sendmail_syslog | rex "From: (?<from>.*) To: (?<to>.*)"
Delimiters
If you're working with multiple fields that have delimiters around them, use the extract command to extract them. Suppose your events look like this:
|height:72|age:43|name:matt smith|
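A sketch of how the extract command might be applied to events in this format (the delimiter arguments are inferred from the sample event above):
... | extract pairdelim="|" kvdelim=":"
This would yield fields height, age, and name for each event.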
Variations
Try using multikv, spath, or xmlkv.
Alerting Recipes
Recall from Chapter 5 that an alert is made up of two parts:
A condition: an interesting thing you want to know about.
An action: what to do when that interesting thing happens.
In addition, you can use throttling to prevent over-firing of repeated alerts of the same type. For example: I want to get an email whenever one of my servers has a load above a certain percentage. I want to get an email of all servers whose load is above a certain percentage, but don't spam my inbox, so throttle the alerts to no more than one every 24 hours.
Solution
The following search retrieves events with load averages above 80% and calculates the maximum value for each host. The top source type comes with the Splunk Unix app (available at splunkbase.com), and is fed data from the Unix top command every 5 seconds:
sourcetype=top load_avg>80 | stats max(load_avg) by host
Set up the alert in the following way, using the instructions from Chapter 5:
Alert condition: alert if the search returns at least one result.
Alert actions: email, and set subject to: Server load above 80%.
Suppress: 1 hour.
Variations
Change the alert conditions and suppression times.
Solution
The following search retrieves weblog events, calculates the 95th percentile response time for each unique web address (uri_path), and finally filters out any values where the 95th percentile is less than 200 milliseconds:
sourcetype=weblog | stats perc95(response_time) AS resp_time_95 by uri_path | where resp_time_95>200
Set up the alert in the following way:
Alert condition: alert if the search returns at least X results (the number of slow web requests you think merit an alert being fired).
Alert actions: email, with subject set to: Web servers running slow. If you're running in the cloud (for example, on Amazon EC2), maybe start new web server instances.
Suppress: 1 hour.
Solution
The following search retrieves weblog events and returns a table of hosts that have fewer than 10000 requests (over the timeframe that the search runs):
sourcetype=weblog | stats count by host | where count<10000
Set up the alert in the following way:
Alert condition: alert if the search returns at least X results (the number of hosts you think merit an alert being fired).
Alert actions: trigger a script that removes servers from the load balancer and shuts them down.
Suppress: 10 minutes.
7 Grouping Events
These recipes offer quick solutions to some of the most common, real-world problems we see that can be solved by grouping events.
Introduction
There are several ways to group events. The most common approach uses either the transaction or stats command. But when should you use transaction and when should you use stats? The rule of thumb: if you can use stats, use stats. It's faster than transaction, especially in a distributed environment. With that speed, however, comes some limitations. You can only group events with stats if they have at least one common field value and if you require no other constraints. Typically, the raw event text is discarded. Like stats, the transaction command can group events based on common field values, but it can also use more complex constraints such as total time span of the transaction, delays between events within the transaction, and required beginning and ending events. Unlike stats, transaction retains the raw event text and field values from the original events, but it does not compute any statistics over the grouped events, other than the duration (the delta of the _time field between the oldest and newest events in the transaction) and the eventcount (the total number of events in the transaction). The transaction command is most useful in two specific cases:
When unique field values (also known as identifiers) are not sufficient to discriminate between discrete transactions. This is the case when an identifier might be reused, for example in web sessions identified by cookie/client IP. In this case, timespans or pauses should be used to segment the data into transactions. In other cases, when an identifier is reused, for example in DHCP logs, a particular message may identify the beginning or end of a transaction.
When it is desirable to see the raw text of the events rather than an analysis on the constituent fields of the events.
Again, when neither of these cases is applicable, it is a better practice to use stats, as search performance for stats is generally better than transaction. Often there is a unique identifier, and stats can be used. For example, to compute statistics on the duration of trades identified by the unique identifier trade_id, the following searches yield the same answer:
| transaction trade_id | chart count by duration
| stats range(_time) as duration by trade_id | chart count by duration
The second search is more efficient. However, if trade_id values are reused but the last event of each trade is indicated by the text END, the only viable solution is:
| transaction trade_id endswith=END | chart count by duration
If, instead of an end condition, trade_id values are not reused within 10 minutes, the most viable solution is:
| transaction trade_id maxpause=10m | chart count by duration
Finally, a brief word about performance. No matter what search commands you use, it's imperative for performance that you make the base search as specific as possible. Consider this search:
sourcetype=x | transaction field=ip maxpause=15s | search ip=1.2.3.4
Here we are retrieving all events of sourcetype=x, building up transactions, and then throwing away any that don't have ip=1.2.3.4. Because all the events in a given transaction have the same ip value, this search should instead be:
sourcetype=x ip=1.2.3.4 | transaction field=ip maxpause=15s
This search retrieves only the events it needs to and is much more efficient. More about this is in Finding Specific Transactions later in this chapter.
Solution
Typically, you can join transactions with common fields like:
| transaction username
But when the username identifier is called different names (login, name, user, owner, and so on) in different data sources, you need to normalize the field names. If sourcetype A only contains field_A and sourcetype B only contains field_B, create a new field called field_Z which is either field_A or field_B, depending on which is present in an event. You can then build the transaction based on the value of field_Z.
sourcetype=A OR sourcetype=B | eval field_Z = coalesce(field_A, field_B) | transaction field_Z
Variations
Above, we invoked coalesce to use whichever field was present on an event, but sometimes you will need some logic to decide which field to use in unifying events. eval's if or case functions may come in handy.
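For example, a sketch using case to prefer field_A when both fields might be present (the preference order is an assumption):
sourcetype=A OR sourcetype=B | eval field_Z = case(isnotnull(field_A), field_A, isnotnull(field_B), field_B) | transaction field_Z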
Solution
Suppose you are searching for user sessions starting with a login and ending with a logout:
| transaction userid startswith=login endswith=logout
You would like to build a report that shows incomplete transactions: users who have logged in but not logged out. How can you achieve this? The transaction command creates an internal boolean field named closed_txn to indicate whether a given transaction is complete or not. Normally, incomplete transactions are not returned, but you can ask for these evicted partial transactions by specifying the parameter keepevicted=true. Evicted transactions are sets of events that do not match all the transaction parameters. For example, the time requirements are not met in an evicted transaction. Transactions that fulfill all the requirements are marked as complete by having the field closed_txn set to 1 (rather than 0 for incomplete transactions). So the pattern for finding incomplete transactions would generally be:
| transaction <conditions> keepevicted=true | search closed_txn=0
In our case, however, there's a wrinkle. An endswith condition not matching will not set closed_txn=0 because events are processed from newest to oldest. Technically, the endswith condition starts the transaction, in terms of processing. To get around this, we need to filter transactions based on the closed_txn field, as well as make sure that our transactions don't have both a login and a logout:
| transaction userid keepevicted=true startswith=login endswith=logout | search closed_txn=0 NOT (login logout)
Variations
A variation on this solution is to use stats, if your transactions don't have startswith/endswith conditions or time constraints, and you don't care about preserving the actual transaction. In this example, you just want the userid of users who haven't logged out. First, we can search specifically for login and logout events:
action=login OR action=logout
Next, for each userid, we use stats to keep track of the action seen per userid. Because events are in time descending order, the first action is the most recent.
| stats first(action) as last_action by userid
Finally, we keep only events where the most recent user action was a login:
| search last_action=login
At this point we have the list of all userid values where the last action was a login.
Solution
The basic approach is to use the eval command to mark the points in time needed to measure the different durations, and then calculate the durations between these points using eval after the transaction command. Note: In this chapter, sample events in a transaction are numbered so that we can refer to them as event1, event2, and so on. For example, suppose you have a transaction made up of four events, unified by a common id field and you want to measure the duration of phase1 and phase2:
[1] Tue Jul 6 09:16:00 id=1234 start of event. [2] Tue Jul 6 09:16:10 id=1234 phase1: do some work. [3] Tue Jul 6 09:16:40 id=1234 phase2: do some more. [4] Tue Jul 6 09:17:00 id=1234 end of event.
By default, the timestamp of this transaction-based event will be from the first event (event1), and the duration will be the difference in time between event4 and event1. To get the duration of phase1, we'll need to mark timestamps for event2 and event3. eval's searchmatch function works well for this example, but you have the full range of eval functions available to you for more complex situations.
| eval p1start = if(searchmatch("phase1"), _time, null()) | eval p2start = if(searchmatch("phase2"), _time, null())
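Between those two eval commands and the duration calculations below, the events still need to be grouped on the common id field; a minimal sketch of that intermediate step:
... | transaction id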
Finally we calculate the duration for each transaction, using the values calculated above.
| eval p1_duration = p2start - p1start | eval p2_duration = (_time + duration) - p2start
In this example, we calculated the time of the last event by taking _time (the time of the first event) and adding duration to it. Once we knew the last event's time, we calculated p2_duration as the difference between the last event and the start of phase2.
Variations
By default, the transaction command makes multivalued fields out of the field values seen in more than one of a transaction's composite events, but those values are just kept as an unordered, deduplicated bag of values. For example, if a transaction is made up of four events, and those events each have a name field as follows: name=matt, name=amy, name=rory, name=amy, then the transaction made up of those four events will have a multivalued field name with values of amy, matt, and rory. Note that we've lost the order in which the events occurred and we've missed an amy! To keep the entire list of values, in order, use the mvlist option. Here, we're building a transaction and keeping the list of times for its events:
| eval times=_time | transaction id mvlist=times
From here we can add on eval commands to calculate differences. We can calculate the time between the first and second event in the transaction as follows:
| eval diff_1_2 = mvindex(times,1) - mvindex(times,0)
Solution
At first, you might be tempted to use the transaction or stats command. For example, this search returns, for each unique userid, the first value seen for each field:
| stats first(*) by userid
Note that this search returns the first value of each field seen for events that have the same userid. It provides a union of all events that have that user ID, which is not what we want. What we want is the first event with a unique userid. The proper way to do that is with the dedup command:
| dedup userid
Variations
If you want to get the oldest (not the newest) event with a unique userid, use the sortby clause of the dedup command:
| dedup userid sortby + _time
Solution
Suppose you have events as follows:
2012-07-22 11:45:23 code=239 2012-07-22 11:45:25 code=773 2012-07-22 11:45:26 code=-1 2012-07-22 11:45:27 code=-1 2012-07-22 11:45:28 code=-1 2012-07-22 11:45:29 code=292 2012-07-22 11:45:30 code=292 2012-07-22 11:45:32 code=-1 2012-07-22 11:45:33 code=444 2012-07-22 11:45:35 code=-1 2012-07-22 11:45:36 code=-1
Your goal is to get 7 events, one for each of the code values in a row: 239, 773, -1, 292, -1, 444, -1. You might be tempted to use the transaction command as follows:
| transaction code
Using transaction here is a case of applying the wrong tool for the job. As long as we don't really care about the number of repeated runs of duplicates, the more straightforward approach is to use dedup, which removes duplicates. By default, dedup will remove all duplicate events (where an event is a duplicate if it has the same values for the specified fields). But that's not what we want; we want to remove only duplicates that appear in a cluster. To do this, dedup has a consecutive=true option that tells it to remove only duplicates that are consecutive.
| dedup code consecutive=true
Solution
Suppose we have a basic transaction search that groups all events by a given user (clientip-cookie pair), but splits the transactions when the user is inactive for 10 minutes:
| transaction clientip, cookie maxpause=10m
Ultimately, our goal is to calculate, for each clientip-cookie pair, the difference in time between the end time of a transaction and the start time of a more recent (i.e. previous in order of events returned) transaction. That time difference is the gap between transactions. For example, suppose we had two pseudo transactions, returned from most recent to oldest:
T1: start=10:30 end=10:40 clientip=a cookie=x T2: start=10:10 end=10:20 clientip=a cookie=x
The gap in time between these two transactions is the difference between the start time of T1 (10:30) and the end time of T2 (10:20), or 10 minutes. The rest of this recipe explains how to calculate these values. First, we need to calculate the end time of each transaction, keeping in mind that the timestamp of a transaction is the time that the first event occurred and the duration is the number of seconds that elapsed between the first and last event in the transaction:
| eval end_time = _time + duration
Next, we need to add the start time from the previous (i.e., more recent) transaction to each transaction. That will allow us to calculate the difference between the start time of the previous transaction and our calculated end_time. To do this we can use streamstats to calculate the last value of the start time (_time) seen in a sliding window of just one transaction (global=false and window=1) and to ignore the current event in that sliding window (current=false). In effect, we're instructing streamstats to look only at the previous event's value. Finally, note that we're specifying that this window is only applicable to the given user (clientip-cookie pair):
| streamstats first(_time) as prev_starttime global=false window=1 current=false by clientip, cookie
At this point, the relevant fields might look something like this:
T1: _time=10:00:06, duration=4, end_time=10:00:10 T2: _time=10:00:01, duration=2, end_time=10:00:03 prev_starttime=10:00:06 T3: _time=10:00:00, duration=0, end_time=10:00:01 prev_starttime=10:00:01
Now, we can finally calculate the difference in time between the previous transaction's start time (prev_starttime) and the calculated end_time. That difference is the gap between transactions, the amount of time (in seconds) that passed between two consecutive transactions from the same user (clientip-cookie pair).
| eval gap_time = prev_starttime - end_time
At this point you can report on gap_time values. For example, what is the biggest and average gap length per user?
| stats max(gap_time) as max, avg(gap_time) as avg by clientip, cookie
Variations
Given a simpler set of requirements, we can calculate the gaps between events in a much simpler way. If the only constraints for transactions are startswith and endswith, meaning there are no time (e.g., maxpause=10m) or field (e.g., clientip, cookie) constraints, then we can calculate the gaps in transactions by simply swapping the startswith and endswith values. For example, given these events:
10:00:01 login 10:00:02 logout 10:00:08 login 10:00:10 logout 10:00:15 login 10:00:16 logout
Rather than:
| transaction startswith=login endswith=logout
We can make the gaps between the standard transactions (login then logout) be the transactions instead (logout then login):
| transaction endswith=login startswith=logout
From here the transactions are the gaps between logout and login events, so we can subsequently calculate gap statistics using duration:
| stats max(duration) as max, avg(duration) as avg
Another variation on the theme of finding time between events is if you are interested in the time between a given event (event A) and the most proximate newer event (event B). By using streamstats, you can determine the range of times between the last two events, which is the difference between the current event and the previous event:
| streamstats range(_time) as duration window=2
Solution
A general search for all transactions might look like this:
sourcetype=email_logs | transaction userid
Suppose, however, that we want to identify just those transactions where there is an event that has the field/value pairs to=root and from=msmith. You could use this search:
sourcetype=email_logs | transaction userid | search to=root from=msmith
The problem here is that you are retrieving all events from this sourcetype (potentially billions), building up all the transactions, and then throwing 99% of the data right into the bit bucket. Not only is it slow, but it is also painfully inefficient. You might be tempted to reduce the data coming in as follows:
sourcetype=email_logs (to=root OR from=msmith) | transaction userid | search to=root from=msmith
Although you are not inefficiently retrieving all the events from the given sourcetype, there are two additional problems. The first problem is fatal: you are getting only a fraction of the events needed to solve your problem. Specifically, you are only retrieving events that have a to or a from field. Using this syntax, you are missing all the other events that could make up the transaction. For example, suppose this is what the full transaction should look like:
[1] 10/15/2012 10:11:12 userid=123 to=root [2] 10/15/2012 10:11:13 userid=123 from=msmith [3] 10/15/2012 10:11:14 userid=123 subject=serious error [4] 10/15/2012 10:11:15 userid=123 server=mailserver [5] 10/15/2012 10:11:16 userid=123 priority=high
The above search will not get event3, which has subject, or event4, which has server, and it will not be possible for Splunk to return the complete transaction. The second problem with the search is that to=root might be very common and you could actually be retrieving too many events and building too many transactions. So what is the solution? There are two methods: using subsearches and using the searchtxn command.
Using Subsearches
Your goal is to get all the userid values for events that have to=root or from=msmith. Pick the rarer condition to get the candidate userid values as quickly as possible. Let's assume that from=msmith is rarer:
sourcetype=email_logs from=msmith | dedup userid | fields userid
Now that you have the relevant userid values, you can search for just those events that contain these values and more efficiently build the transaction:
| transaction userid
Finally, filter the transactions to make sure that they have to=root and from=msmith (it's possible that a userid value is used for other to and from values):
| search to=root AND from=msmith
Putting this all together, with the first search as a subsearch passing the userid to the outer search:
[ search sourcetype=email_logs from=msmith | dedup userid | fields userid ] | transaction userid | search to=root from=msmith
Use searchtxn
The searchtxn (search transaction) command does the subsearch legwork for you. It searches for just the events needed to build a transaction. Specifically, searchtxn performs a transitive closure of the fields needed for the transaction, running the searches needed to find events for the transaction, then running the transaction search, and finally filtering the results to the specified constraints. If you were unifying your events by more than one field, the subsearch solution becomes tricky. searchtxn also determines which seed condition is rarer to get the fastest results. Thus, your search for email transactions with to=root and from=msmith simply becomes:
| searchtxn email_txn to=root from=msmith
But what is email_txn in the above search? It refers to a transaction-type definition that has to be created in a Splunk config file: transactiontypes.conf. In this case, transactiontypes.conf might look like:
[email_txn]
fields=userid
search = sourcetype=email_logs
The result of that search gives searchtxn the list of the userids to operate upon. It then runs another search for:
sourcetype=email_logs (userid=123 OR userid=369 OR userid=576 ...) | transaction name=email_txn | search to=root from=msmith
This search returns the needle-in-the-haystack transactions from the results returned by the searchtxn search. Note: If the transaction command's field list had more than one field, searchtxn would automatically run multiple searches to get a transitive closure of all values needed.
Variations
Explore using multiple fields with the searchtxn command. If you're interested in getting the relevant events and don't want searchtxn to actually build the transactions, use eventsonly=true.
Solution
One solution is to use subsearches and look for the last instance of this scenario. Do a subsearch for root logins and return starttimeu and endtimeu, which then scopes the parent search to those time boundaries
when searching for either a failed_login or a password_changed from the same src_ip:
[ search sourcetype=login_data action=login user=root | eval starttimeu=_time - 60 | eval endtimeu=_time + 60 | return starttimeu, endtimeu, src_ip ] action=failed_login OR action=password_changed
The downside to this approach is that it only finds the last instance of a login and possibly has false positives, as it doesn't distinguish between failed_logins afterward or password_changed before. Instead, the problem can be solved by filtering the events down to just those we care about:
sourcetype=login_data ( action=login OR action=failed_login OR action=password_changed )
The transaction should consist of events from the same src_ip that start with a failed_login and end with a password_changed. Furthermore, the transaction should span no more than 2 minutes from start to finish:
| transaction src_ip maxspan=2m startswith=(action=failed_login) endswith=(action=password_changed)
Finally, you need to filter for only those transactions that have user=root. Since a failed_login event often won't have user=root (the user hasn't logged in), it is necessary to filter after the transaction:
| search user=root
Conversely, if it was certain that user=root was in all the relevant events, it should be added to the search clause, skipping the final filtering (search user=root).
Solution
Given the following ideal transaction that starts with a login action:
[1] 10:11:12 src_ip=10.0.0.5 user=root action=login [2] 10:11:13 src_ip=10.0.0.5 user=root action=cd / [3] 10:11:14 src_ip=10.0.0.5 user=root action=rm -rf * [4] 10:11:15 src_ip=10.0.0.5 user=root server=echo lol
The obvious search choice is to use transaction that startswith the login action:
... | transaction src_ip, user startswith=(action=login) maxevents=4
The problem is that you will get transactions that don't have action=login. Why? The startswith option does not tell transaction to return only transactions that actually begin with the string you're supplying. Rather, it tells transaction that when it encounters a line that matches the startswith directive, it is the beginning of a new transaction. However, transactions will also be made for different values of src_ip, regardless of the startswith condition. To avoid this, add a filtering search command after the transaction search above:
| search action=login
The transactions returned will start with action=login and include the next three events for the src_ip and user. Note: If there are fewer than three events between two logins, the transaction will be smaller than four events. The transaction command adds an eventcount field to each transaction, which you can then use to further filter transactions.
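Putting both commands together, the full search is:
... | transaction src_ip, user startswith=(action=login) maxevents=4 | search action=login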
Grouping Groups
Problem
You need to build transactions with multiple fields that change value within the transaction.
Solution
Suppose you want to build a transaction from these four events, unified by the host and cookie fields:
[1] host=a
[2] host=a cookie=b
[3] host=b
[4] host=b cookie=b
Because the value of host changes during this transaction, a simple transaction command unfortunately will make two distinct transactions:
| transaction host, cookie
When it sees event1 and event2, it builds a transaction with host=a, but when it gets to event3, which has a different value for host (host=b), it puts event3 and event4 into a separate transaction of events that have host=b. The result is that these four events are turned into two transactions, rather than one transaction based on the common value of cookie:
Transaction1:
[1] host=a
[2] host=a cookie=b
Transaction2:
[3] host=b
[4] host=b cookie=b
You might be tempted to remove the host field from the transaction command and unify the transactions based on the cookie value. The problem is that this would create a transaction with event2 and event4, ignoring event1 and event3 because they do not have a cookie value. The solution to this problem is to build a transaction on top of a transaction:
| transaction host, cookie | transaction cookie
This second transaction command will take the above two transactions and unify them with a common cookie field. Note that if you care about the calculated fields duration and eventcount, they are now incorrect. The duration after the second transaction command will be the difference between the transactions it unifies rather than the events that comprise it. Similarly, the eventcount will be the number of transactions it unified, rather than the correct number of events. To get the correct eventcount after the first transaction command, create a field called mycount to store all the eventcount values, and then after the second transaction command sum all the mycount values to calculate the real_eventcount. Similarly, after the first transaction command, record the start and end times of each transaction and then after the second transaction command get the minimum start time and the maximum end time to calculate the real_duration:
| transaction host, cookie | eval mycount=eventcount | eval mystart=_time | eval myend=duration + _time | transaction cookie mvlist=mycount | eval first = min(mystart) | eval last=max(myend) | eval real_duration=last-first | eval real_eventcount = sum(mycount)
8 Lookup Tables
These lookup table recipes briefly show advanced solutions to common, real-world problems. Splunk's lookup feature lets you reference fields in an external CSV file that match fields in your event data. Using this match, you can enrich your event data with additional fields. Note that we do not cover external scripted lookups or time-based lookups.
Introduction
These recipes extensively use three lookup search commands: lookup, inputlookup, and outputlookup.
lookup
For each event, this command finds matching rows in an external CSV table and returns the other column values, enriching the events. For example, given an event with a host field value and a lookup table that has host and machine_type columns, specifying | lookup mylookup host adds the machine_type value corresponding to that host value to each event. By default, matching is case-sensitive and does not support wildcards, but you can configure these options. Using the lookup command matches values in external tables explicitly. Automatic lookups, which are set up using Splunk Manager, match values implicitly. To learn more about configuring automatic lookups, see http://splunk.com/goto/book#autolookup.
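As a quick illustration (the file name and values here are hypothetical), suppose mylookup.csv contains:
host,machine_type
web01,linux
mail02,windows
Then sourcetype=syslog | lookup mylookup host would add machine_type=linux to every event whose host is web01, and machine_type=windows to events from mail02.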
inputlookup
This command returns the whole lookup table as search results. For example, | inputlookup mylookup returns a search result for each row in the table mylookup, which has two field values: host and machine_type.
outputlookup
You might wonder how to create a lookup table. This command outputs the current search results to a lookup table on disk. For example, | outputlookup mytable.csv saves all the results into mytable.csv.
Further Reading
http://splunk.com/goto/book#lookuptutorial http://splunk.com/goto/book#externallookups
Solution
There are several solutions. Using an explicit lookup, you can simply use the eval coalesce function:
| lookup mylookup ip | eval domain=coalesce(domain, "unknown")
Using automatic lookups, there's a setting for that. Go to Manager >> Lookups >> Lookup Definition >> mylookup, select the Advanced options checkbox, and make the following changes: set Minimum matches to 1, set Default matches to unknown, and save the changes.
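If you prefer configuration files, the equivalent settings (a sketch, assuming the lookup's transforms.conf stanza is named mylookup) are:
[mylookup]
min_matches = 1
default_match = unknown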
Solution
Splunk permits you to use reverse lookup searches, meaning you can search for the output value of an automatic lookup and have Splunk translate that into a search for the corresponding input fields of the lookup.
For example, suppose you have a lookup table mapping machine_name to owner:
machine_name, owner
webserver1, erik
dbserver7, stephen
dbserver8, amrit
If your events have a machine_name field and you want to search for a particular owner, erik, you might use an expensive search like this:
| lookup mylookup machine_name | search owner=erik
This search is expensive because you're retrieving all of your events and filtering out any that don't have erik as the owner. Alternatively, you might consider an efficient but complicated subsearch:
[ inputlookup mylookup | search owner=erik | fields machine_name]
This search retrieves all the rows of the lookup table, filters out any rows that don't have erik as the owner, and returns a big OR expression of machine names for Splunk to ultimately run a search on. But none of this is necessary. If you've set up an automatic lookup table, you can simply ask Splunk to search for owner=erik. That's it. Effectively, Splunk does the subsearch solution behind the scenes, generating the search of OR clauses for you. Note: Splunk also does automatic reverse searching for defined field extractions, tags, and eventtypes: you can search for the value that would be extracted, tagged, or typed, and Splunk retrieves the correct events.
Variations
Using automatic lookups and the built-in reverse lookups, you can recreate Splunk's tagging system. For example, make a mapping from host to a field called host_tag. Now you can search for events based on their host_tag and not just the host value. Many people find it easier to maintain lookup tables than Splunk tags.
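For example, assuming you have set up such an automatic lookup and it maps several web hosts to a hypothetical host_tag value of webfarm, finding errors on those hosts is as simple as:
host_tag=webfarm error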
Solution
After we've retrieved events, we do our initial lookup against local_dns.csv, a local lookup file:
... | lookup local_dns ip OUTPUT hostname
If the lookup doesn't match, the hostname field is null for that event. We now perform the second, expensive lookup only on events that have no hostname. By using OUTPUTNEW instead of OUTPUT, the lookup runs only on events that have a null value for hostname.
... | lookup dnslookup ip OUTPUTNEW hostname
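Putting both lookups into a single search:
... | lookup local_dns ip OUTPUT hostname | lookup dnslookup ip OUTPUTNEW hostname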
Solution
You can do this manually by running sequential lookup commands. For example, if a first lookup table takes values of field A and outputs values of field B, and a second lookup table takes values of field B and outputs values of field C:
| lookup my_first_lookup A | lookup my_second_lookup B
More interestingly, this can be done with automatic lookups, where the chaining happens automatically. It is imperative, however, that the lookups run in the correct order, which you control using the alphanumeric precedence of the property names.
Go to Manager >> Lookups >> Automatic lookups, and create two automatic lookups, making sure that the one that should run later has a name that sorts after the previous lookup's name. For example:
0_first_lookup = my_first_lookup A OUTPUT B 1_second_lookup = my_second_lookup B OUTPUT C
Note: When you chain lookups as shown in this recipe, reverse lookups (as in the Using Reverse Lookups recipe) do not work, because Splunk is currently not able to reverse multiple steps of automatic field lookups (e.g., automatically converting a search for the chained output field value C=baz into a search for the input field value A=foo).
Solution
If you were to simply do:
<some search> | outputlookup mylookupfile.csv
you might encounter two problems. First, events have many fields, including internal fields like _raw and _time, which you don't want in your lookup table. Second, of the fields you do care about, there are most likely duplicate values among the events retrieved. To handle the first problem, we won't use the fields command, because it's inconvenient for removing internal fields. Instead, we'll use the table command to limit the fields to exactly those we specify. To solve the second problem, use the dedup command. Putting it all together:
| table field1, field2 | dedup field1 | outputlookup mylookupfile.csv
Solution
The basic procedure is to get the set of results you want to append to the lookup table, use inputlookup to append the current contents of the lookup, and use outputlookup to write the lookup. The command looks like this:
your_search_to_retrieve_values_needed | fields the_interesting_fields | inputlookup mylookup append=true | dedup the_interesting_fields | outputlookup mylookup
First, we tell Splunk to retrieve the new data and retain only the fields needed for the lookup table. Next, we use inputlookup with the append=true option to append the existing rows of mylookup to the results. We then remove duplicates with dedup. Finally, we use outputlookup to write all these results back to mylookup.
Variations
Suppose you want your lookup table to contain only the most recent 30 days of values. You can set up the lookup table to be updated daily from a scheduled search. When you set up the scheduled search that outputs the lookup table, add a condition before the outputlookup command that filters out data older than 30 days:
... | where _time >= now() - (60*60*24*30)
where 60*60*24*30 is the number of seconds in 30 days. Building on the previous example, our search becomes:
your_search_to_retrieve_values_needed | fields just_the_interesting_fields | inputlookup mylookup append=true | where _time >= now() - (60*60*24*30) | outputlookup mylookup
Obviously, you'll also need to keep _time as one of the fields in your lookup table.
Solution
When you have very large lookup tables and notice that performance is affected, there are several solutions. First, consider whether you can make smaller, more specific lookup tables. For example, if some of your searches need only a subset of the rows and columns, consider making a concise version of the lookup for those searches. The following search reduces the size of the mylookup table by limiting the rows to those that meet some condition, removing duplicates, keeping only the needed input and output fields, and writing the results to the mylookup2 table:
| inputlookup mylookup | search somecondition | dedup someinputfield | table someinputfield, someoutputfield | outputlookup mylookup2
If you can't reduce the size of the lookup table, there are other solutions. If your Splunk installation has several indexers, they automatically replicate your lookup table. But if the lookup file is very large (e.g., 100MB), this replication can take a very long time. One solution, if your bundles are being frequently updated, is to disable bundle replication and instead use NFS to make the bundles available to all nodes. See: http://splunk.com/goto/book#mount
Another solution, if your lookup table doesn't change too often and you cannot rely on shared and mounted drives, is to use local lookups. To prevent the lookup from being replicated and distributed, add the lookup table to the replicationBlacklist in distsearch.conf (see http://splunk.com/goto/book#distributed), and copy the lookup table CSV file to each of your indexers in:
$SPLUNK_HOME/etc/system/lookups
When you run the search, add the local=true option to the lookup search command, as in the sketch below. Note: Lookup definitions configured to run implicitly via props.conf are by their very nature not local and must be distributed to indexers. Finally, consider moving away from large CSV files to external lookups (usually leveraging a script that queries a database).
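For reference, a lookup invocation with local=true looks like this (the field names are illustrative, borrowed from the earlier DNS example):
... | lookup local=true mylookup ip OUTPUT hostname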
Note: When a .csv lookup table reaches a certain size (10 MB by default), Splunk indexes it for faster access. By indexing the .csv file, Splunk can search rather than scan the table. To change the size threshold at which a file is indexed, edit the max_memtable_bytes value in the [lookup] stanza of limits.conf.
Solution
If events with particular field values are a small subset of your events, you can use subsearches to find the relevant events efficiently. Use inputlookup in a subsearch to generate a large OR expression of all the values seen in your lookup table. The list returned from a subsearch can contain up to 10,000 items (modifiable in limits.conf).
yoursearch [ inputlookup mylookup | fields ip ]
You can test what the subsearch returns by running the search that is inside the subsearch and appending the format command:
| inputlookup mylookup | fields ip | format
See: http://splunk.com/goto/book#subsearch
Variation I
Similarly, to retrieve events with values NOT in your lookup table, use a pattern like:
yoursearch NOT [ inputlookup mylookup | fields ip ]
Variation II
Alternatively, if you want values in your lookup table that are not matched in your data, use:
| inputlookup mylookup | fields ip | search NOT [ search yoursearch | dedup ip | fields ip ]
which takes all values in the lookup and filters out those that match your data.
Variation III
For massive lists, here is a tricky and efficient search pattern to find all the values in your events that are also in the lookup table: retrieve your events and then append the entire lookup table to the events. By setting a field (e.g., marker), we can keep track of whether a result (think row) is an event or a lookup table row. We can then use stats to get the list of IP addresses that are in both lists (list_count > 1):
yoursearch | eval marker="data" | append [ inputlookup mylookup | eval marker="lookup" ] | stats dc(marker) as list_count by ip | where list_count > 1
Note: Although the append command appears to be executing a subsearch, it is not. There is no limit on the number of results appended, unlike a subsearch, which has a default limit of 10,000 results. If you need to use this technique over a very large timespan, it is more efficient to use another lookup table to maintain long-term state. In short, schedule a search over a shorter time window, such as one day, that calculates the last time an IP was seen. Then use a combination of inputlookup, dedup, and outputlookup to incrementally update that lookup table over the long haul. This gives you a very quick resource to consult for the most recent state. See the Appending Results to Lookup Tables recipe for specifics.
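A minimal sketch of that incremental update, assuming a hypothetical lookup file ip_last_seen.csv and a daily scheduled search, might look like:
yoursearch earliest=-1d@d latest=@d | stats latest(_time) as last_seen by ip | inputlookup append=true ip_last_seen.csv | sort 0 -last_seen | dedup ip | outputlookup ip_last_seen.csv
Each run folds yesterday's sightings into the table and keeps only the most recent last_seen value per ip.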
Solution
By default, Splunk returns up to 100 matches for lookups not involving a time element. You can configure it to return only one. Using the UI, go to Manager >> Lookups >> Lookup definitions and edit or create your lookup definition. Select the Advanced options checkbox and enter 1 for Maximum matches. Alternatively, you can edit the applicable transforms.conf and add max_matches=1 to your lookup's stanza. See: http://splunk.com/goto/book#field_lookup
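For example, if the lookup's stanza in transforms.conf is named mylookup, the added setting looks like this:
[mylookup]
max_matches = 1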
Variations
If your lookup table has duplicates that you want to remove, you can clean them with a search similar to:
| inputlookup mylookup | dedup host | outputlookup mylookup
This eliminates all but the first distinct occurrence of each host in the file.
Matching IPs
Problem
You have a lookup table with ranges of IP addresses that you want to match.
Solution
Suppose your events have IP addresses in them and you have a table of IP ranges and ISPs:
network_range, isp
220.165.96.0/19, isp_name1
220.64.192.0/19, isp_name2
...
You can specify a match_type for a lookup. Unfortunately, this functionality isn't available in the UI, but you can set it in the transforms.conf config file.
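A minimal sketch of that stanza, assuming the lookup is named mylookup and its input field is network_range as above:
[mylookup]
match_type = CIDR(network_range)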
See: http://splunk.com/goto/book#transform
Variations
The available match_type values are WILDCARD, CIDR, and EXACT. EXACT is the default and does not need to be specified. Also in transforms.conf, you can specify whether lookup matching should be case sensitive (the default) or not. To have matching be case insensitive, use:
case_sensitive_match = False
Solution
Suppose you have a lookup table with URLs you'd like to match on:
url, allowed
*.google.com/*, True
www.blacklist.org*, False
*/img/*jpg, False
By including wildcard (*) characters in your lookup table values, you can direct Splunk to match on wildcards. As in the Matching IPs recipe, you can specify a match_type for a lookup in the transforms.conf config file:
[mylookup] match_type = WILDCARD(url)
Note: By default the maximum number of matches for lookup tables is 100, so if you have multiple rows that match, the output fields will have multiple values. For example, a url of www.google.com/img/pix.jpg would match both the first and third rows in the table above, and the allowed field would become a multivalued field with the values True and False. Usually this is not what you want. By setting the Maximum matches setting to 1, only the first matching value is used, and you can use the order of the table to determine precedence. You can find this setting at Manager >> Lookups >> Lookup Definition >> mylookup, after selecting the Advanced options checkbox.
Variations
This chapter's first recipe dealt with default values when a lookup fails to match. Yet another way to accomplish this is with wildcard matching: make the last item in your lookup table have a match value of *, and set the minimum and maximum matches for your lookup table to 1.
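For instance, extending the table above with a hypothetical catch-all row (and with minimum and maximum matches set to 1), the last row becomes the default for any URL that matches nothing else:
url, allowed
*.google.com/*, True
www.blacklist.org*, False
*/img/*jpg, False
*, True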
Appendix: Machine Data Basics
Multidimensional databases are designed for analyzing large groups of records. The term OLAP (On-Line Analytical Processing) has become almost synonymous with multidimensional database. OLAP tools enable users to analyze different dimensions of multidimensional data. Multidimensional databases are great for data mining and monthly reporting, but not for real-time events.
Machine data is at a much lower level of detail than transactional data. Transactional data might store all of the product, shipping, and payment data associated with an online purchase. The machine data associated with this purchase would include thousands of records, or events, that track every user's click, every page and image loaded, every ad requested, and so on. Machine data is not just about the finished result, or the destination, but about the entire journey!
Because it's so detailed, machine data can be used for a wide variety of purposes. In the world of IT, machine data can, for example, help find problems and also show whether systems are operating within typical ranges of performance. In the world of business, machine data can track consumer behavior and help segment consumers for targeted marketing messages.
To help you get a better idea of the nature of machine data, this appendix briefly describes some of the different types you may encounter.
Application Logs
Most homegrown and packaged applications write local log files, often via logging services built into middleware such as WebLogic, WebSphere, JBoss, .NET, PHP, and others. Log files are critical for day-to-day debugging of production applications by developers and application support. They're also often the best way to report on business and user activity and to detect fraud, because they have all the details of transactions. When developers put timing information into their log events, log files can also be used to monitor and report on application performance.
Web Access Logs
Web access logs are fairly standard and well structured. The main challenge in dealing with them is their sheer volume, as busy websites typically handle billions of hits a day.
Clickstream Data
Use of a web page on a website is captured in clickstream data. This provides insight into what a user is doing and is useful for usability analysis, marketing, and general research. Formats for this data are nonstandard, and actions can be logged in multiple places, such as the web server, routers, proxy servers, and ad servers. Monitoring tools often look at a partial view of the data from a specific source. Web analytics and data warehouse products sample the data, thereby missing a complete view of behavior and offering no real-time analysis.
Message Queuing
Message queuing technologies such as TIBCO, JMS, and AquaLogic are used to pass data and tasks between service and application components on a publish/subscribe basis. Subscribing to these message queues is a good way to debug problems in complex applications: you can see exactly what the next component down the chain received from the prior component. Separately, message queues are increasingly being used as the backbone of logging architectures for applications.
Packet Data
Data generated by networks is processed using tools such as tcpdump and tcpflow, which generate pcap files and other useful packet-level and session-level information. This information is necessary for handling performance degradation, timeouts, bottlenecks, or suspicious activity that indicates that the network may be compromised or the object of a remote attack.
Configuration Files
There's no substitute for the actual, active system configuration when you want to understand how the infrastructure has been set up. Past configs are needed for debugging past failures that could recur. When configs change, it's important to know what changed and when, whether the change was authorized, and whether a successful attacker compromised the system with backdoors, time bombs, or other latent threats.
File System Audit Logs
Operating systems, third-party tools, and storage technologies provide different options for auditing read access to sensitive data at the file system level. This audit data is a vital data source for monitoring and investigating access to sensitive information.
Case Sensitivity
Command names are case-insensitive (e.g., TOP, top, sTaTs).
Command keywords are case-insensitive (e.g., AS used by stats and rename; BY used by stats, chart, and top; WITH used by replace).
Search terms are case-insensitive (error, ERROR, Error).
Statistical functions are case-insensitive (avg, AVG, Avg used by stats, chart, and so on).
Boolean operators are case-sensitive: uppercase AND, OR, NOT are boolean operators, while lowercase and, or, not are treated as literal keywords.
Field names are case-sensitive (host vs. HOST).
Field values are case-insensitive (host=localhost, host=LOCALhost).
Regular expressions are case-sensitive (\d\d\d vs. \D\D\D).
The replace command is case-sensitive (error vs. ERROR).
Events
An event is one line of data. Here is an event in a web activity log:
173.26.34.223 - - [01/Jul/2009:12:05:27 -0700] "GET /trade/app?action=logout HTTP/1.1" 200 2953
More specifically, an event is a set of values associated with a timestamp. While many events are short and only take up a line or two, others can be long, such as a whole text document, a config file, or a whole Java stack trace. Splunk uses line-breaking rules to determine how it breaks these events up for display in the search results.
Hosts
A host is the name of the physical or virtual device from which an event originates. Hosts provide an easy way to find all data originating from a particular device.
Indexes
When you add data to Splunk, Splunk processes it, breaking the data into individual events, timestamping them, and storing them in an index so that the data can be searched and analyzed later. By default, data you feed to Splunk is stored in the main index, but you can create and specify other indexes for Splunk to use for different data inputs.
Fields
Fields are searchable name/value pairings in event data. As Splunk processes events at index time and search time, it automatically extracts fields. At index time, Splunk extracts a small set of default fields for each event, including host, source, and sourcetype. At search time, Splunk extracts what can be a wide range of fields from the event data, including user-defined patterns and obvious field name/value pairs such as userid=jdoe.
Tags
Tags are aliases to field values. For example, if two host names refer to the same computer, you could give both host values the same tag (for example, hal9000). When you search for tag=hal9000, Splunk returns events involving both host name values.
Event Types
Event types are dynamic tags attached to an event, if it matches the search definition of the event type. For example, if you define an event type called problem with a search definition of error OR warn OR fatal OR fail, whenever a search result contains error, warn, fatal, or fail, the event has an eventtype field/value with eventtype=problem. If you were searching for login, the logins with problems would be annotated with eventtype=problem. Event types are cross-referenced searches that categorize events at search time.
Apps
Apps are collections of Splunk configurations, objects, and code. Apps allow you to build different environments that sit on top of Splunk. You can have one app for troubleshooting email servers, one app for web analysis, and so on.
Permissions/Users/Roles
Saved Splunk objects, such as savedsearches, eventtypes, reports, and tags, enrich your data, making it easier to search and understand. These objects have permissions and can be kept private or shared with other users by roles (such as admin, power, or user). A role is a set of capabilities that you define, such as whether a particular role is allowed to add data or edit a report. Splunk with a free license does not support user authentication.
Transactions
A transaction is a set of events grouped into one for easier analysis. For example, because a customer shopping online generates multiple web access events with the same SessionID, it may be convenient to group those events into one transaction. With one transaction event, its easier to generate statistics such as how long shoppers shopped, how many items they bought, which shoppers bought items and then returned them, and so on.
Forwarder/Indexer
A forwarder is a version of Splunk that allows you to send data to a central Splunk indexer or group of indexers. An indexer provides indexing capability for local and remote data.
SPL
A search is a series of commands and arguments, chained together with the pipe character (|), which takes the output of one command and feeds it into the next command.
search-args | cmd1 cmd-args | cmd2 cmd-args | ...
Search commands are used to take indexed data and filter unwanted information, extract more information, calculate values, transform them, and statistically analyze results. The search results retrieved from the index can be thought of as a dynamically created table. Each search command redefines the shape of that table. Each indexed event is a row, with columns for each field value. Columns include basic information about the data and data dynamically extracted at search time. At the head of each search is an implied search-the-index-for-events command, which can be used to search for keywords (e.g., error), boolean expressions (e.g., (error OR failure) NOT success), phrases (e.g., "database error"), wildcards (e.g., fail* matches fail, fails, and failure), field values (e.g., code=404), inequality (e.g., code!=404 or code>200), or a field having any value or no value (e.g., code=* or NOT code=*). For example, the search:
sourcetype=access_combined error | top 10 uri
retrieves indexed access_combined events from disk that contain the term error (ANDs are implied between search terms), and then for those events, reports the top 10 most common URI values.
Subsearches
A subsearch is an argument to a command that runs its own search, returning those results to the parent command as the argument value. Subsearches are enclosed in square brackets. For example, this command finds all syslog events from the user with the last login error:
sourcetype=syslog [search login error | return user]
Note that the subsearch returns one user value because by default the return command returns one value, although there are options to return more (e.g., | return 5 user).
Relative Time Modifiers
Besides specifying absolute time ranges, you can scope a search with the earliest and latest relative time modifiers. For example, error earliest=-1d@d latest=-1h@h retrieves events containing error from yesterday (snapped to midnight) to the last hour (snapped to the hour). Time Units: Specified as second (s), minute (m), hour (h), day (d), week (w), month (mon), quarter (q), or year (y). The preceding value defaults to 1 (i.e., m is the same as 1m).
Snapping: Indicates the nearest or latest time to which your time amount rounds down. Snapping rounds down to the most recent time that is not after the specified time. For example, if it's 11:59:00 and you snap to hours (@h), you snap to 11:00, not 12:00. You can snap to a day of the week, too; use @w0 for Sunday, @w1 for Monday, and so on.
COMMON SEARCH COMMANDS
chart/timechart: Returns results in a tabular output for (time series) charting.
dedup: Removes subsequent results that match.
eval: Calculates an expression. (See EVAL FUNCTIONS below.)
fields: Removes fields from search results.
head/tail: Returns the first/last N results.
lookup: Adds field values from an external source.
rename: Renames a specified field; wildcards can be used to specify multiple fields.
replace: Replaces values of specified fields with a specified new value.
rex: Specifies a regular expression to use to extract fields.
search: Filters results to those that match the search expression.
sort: Sorts search results by the specified fields.
stats: Provides statistics, grouped optionally by fields.
top/rare: Displays the most/least common values of a field.
transaction: Groups search results into transactions.
Optimizing Searches
The key to fast searching is to limit the data that must be read from disk to an absolute minimum and then to filter that data as early as possible in the search, so that processing is done on the smallest amount of data. Partition data into separate indexes if you'll rarely perform searches across multiple types of data. For example, put web data in one index and firewall data in another.
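For example, assuming a dedicated index named web exists, restricting a search to it reads only that index's data:
index=web sourcetype=access_combined error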
More tips:
Search as specifically as you can (fatal_error, not *error*).
Limit the time range (e.g., -1h, not -1w).
Filter out unneeded fields as soon as possible.
Filter out results as soon as possible before calculations.
For report-generating searches, use the Advanced Charting view rather than the Timeline view, which calculates timelines.
Turn off the Field Discovery switch when not needed.
Use summary indexes to precalculate commonly used values.
Make sure your disk I/O is the fastest you have available.
SEARCH EXAMPLES
Filter Results
Filter results to only include those with fail in their raw text and status=0:
... | search fail status=0
Remove duplicates of results with the same host value:
... | dedup host
Keep only search results whose _raw field contains IP addresses in the nonroutable class A (10.0.0.0/8):
... | regex _raw="(?<!\d)10\.\d{1,3}\.\d{1,3}\.\d{1,3}(?!\d)"
Group Results
Cluster results together, sort by their cluster_count values, and then return the 20 largest clusters (in data size):
... | cluster t=0.9 showcount=true | sort limit=20 -cluster_count
Group results that have the same host and cookie, occur within 30 seconds of each other, and do not have a pause greater than 5 seconds between each event into a transaction:
... | transaction host cookie maxspan=30s maxpause=5s
Group results with the same IP address (clientip) and where the first result contains signon and the last result contains purchase:
... | transaction clientip startswith="signon" endswith="purchase"
Order Results
Return the first 20 results:
... | head 20
Reverse the order of a result set:
... | reverse
Sort results by ip value (in ascending order) and then by url value (in descending order):
... | sort ip, -url
Return the last 20 results (in reverse order):
... | tail 20
Reporting
Return events with uncommon values:
... | anomalousvalue action=filter pthresh=0.02
Return the maximum delay by size, where size is broken down into a maximum of 10 equal-sized buckets:
... | chart max(delay) by size bins=10
Return max(delay) for each value of foo split by the value of bar:
... | chart max(delay) over foo by bar
Return max(delay) for each value of foo:
... | chart max(delay) over foo
Remove all outlying numerical values:
... | outlier
Remove duplicates of results with the same host value and return the total count of the remaining results:
... | stats dc(host)
Return the average for each hour of any unique field that ends with the string lay (such as delay, xdelay, and relay):
... | stats avg(*lay) by date_hour
Calculate the average value of CPU each minute for each host:
... | timechart span=1m avg(CPU) by host
Create a timechart of the count of events from web sources by host:
... | timechart count by host
Return the 20 most common values of the url field:
... | top limit=20 url
Return the least common values of the url field:
... | rare url
Add Fields
Set velocity to distance / time:
... | eval velocity=distance/time
Extract from and to fields using regular expressions. If a raw event contains From: Susan To: David, then from=Susan and to=David:
... | rex field=_raw "From: (?<from>.*) To: (?<to>.*)"
Save the running total of count in a field called total_count:
... | accum count as total_count
For each event where count exists, compute the difference between count and its previous value and store the result in countdiff:
... | delta count as countdiff
Filter Fields
Keep the host and ip fields, and display them in the order: host, ip:
... | fields + host, ip
Remove the host and ip fields:
... | fields - host, ip
Multivalued Fields
Combine the multiple values of the recipients field into one value:
... | nomv recipients
Separate the values of the recipients field into multiple field values, displaying the top recipients:
... | makemv delim="," recipients | top recipients
Create new results for each value of the multivalued field recipients:
... | mvexpand recipients
Combine each result that is identical except for its RecordNumber, setting RecordNumber to a multivalued field with all the varying values:
... | fields EventCode, Category, RecordNumber | mvcombine delim="," RecordNumber
Find the number of recipient values:
... | eval to_count = mvcount(recipients)
Find the first email address in the recipient field:
... | eval recipient_first = mvindex(recipient,0)
Find all recipient values that end in .net or .org:
... | eval netorg_recipients = mvfilter(match(recipient, "\.net$") OR match(recipient, "\.org$"))
Find the combination of the values of foo, "bar", and the values of baz:
... | eval newval = mvappend(foo, "bar", baz)
Find the index of the first recipient value that matches "\.org$":
... | eval orgindex = mvfind(recipient, "\.org$")
Lookup Tables
Look up the value of each event's user field in the lookup table usertogroup, setting the event's group field:
... | lookup usertogroup user output group
Write the search results to the lookup file users.csv:
... | outputlookup users.csv
Read in the lookup file users.csv as search results:
... | inputlookup users.csv
EVAL FUNCTIONS
The eval command calculates an expression and puts the resulting value into a field (e.g., ...| eval force = mass * acceleration). The following table lists the functions eval understands, in addition to basic arithmetic operators (+ - * / %), string concatenation (e.g., ...| eval name = last . "," . last), and Boolean operations (AND OR NOT XOR < > <= >= != = == LIKE).
Eval Functions Table
abs(X): Returns the absolute value of X. Example: abs(number)
case(X,"Y",...): Takes pairs of arguments X and Y, where X arguments are Boolean expressions that, when evaluated to TRUE, return the corresponding Y argument. Example: case(error == 404, "Not found", error == 500, "Internal Server Error", error == 200, "OK")
ceil(X): Ceiling of a number X.
cidrmatch("X",Y): Identifies IP addresses that belong to a subnet.
coalesce(X,...): Returns the first value that is not null.
exact(X): Evaluates an expression X using double precision floating point arithmetic.
exp(X): Returns e^X.
floor(X): Returns the floor of a number X.
if(X,Y,Z): If X evaluates to TRUE, the result is the second argument Y. If X evaluates to FALSE, the result evaluates to the third argument Z.
isbool(X): Returns TRUE if X is Boolean.
isint(X): Returns TRUE if X is an integer.
isnotnull(X): Returns TRUE if X is not NULL.
isnull(X): Returns TRUE if X is NULL.
isnum(X): Returns TRUE if X is a number.
isstr(X): Returns TRUE if X is a string.
len(X): Returns the character length of a string X. Example: len(field)
like(X,"Y"): Returns TRUE if and only if X is like the SQLite pattern in Y. Example: like(field, "foo%")
ln(X): Returns the natural log of X. Example: ln(bytes)
log(X,Y): Returns the log of the first argument X using the second argument Y as the base. Y defaults to 10. Example: log(number,2)
lower(X): Returns the lowercase of X. Example: lower(username)
ltrim(X,Y): Returns X with the characters in Y trimmed from the left side. Y defaults to spaces and tabs. Example: ltrim(" ZZZabcZZ ", " Z")
match(X,Y): Returns TRUE if X matches the regex pattern Y. Example: match(field, "^\d{1,3}\.\d$")
max(X,...): Returns the greater of the two values. Example: max(delay, mydelay)
md5(X): Returns the MD5 hash of string value X. Example: md5(field)
min(X,...): Returns the min. Example: min(delay, mydelay)
mvcount(X): Returns the number of values of X. Example: mvcount(multifield)
mvfilter(X): Filters a multivalued field based on the Boolean expression X. Example: mvfilter(match(email, "net$"))
mvindex(X,Y,Z): Returns a subset of the multivalued field X from start position (zero-based) Y to Z (optional). Example: mvindex(multifield, 2)
mvjoin(X,Y): Given a multivalued field X and string delimiter Y, joins the individual values of X using Y. Example: mvjoin(foo, ";")
now(): Returns the current time, represented in Unix time. Example: now()
null(): Takes no arguments and returns NULL.
nullif(X,Y): Given two arguments, fields X and Y, returns X if the arguments are different; returns NULL otherwise.
pi(): Returns the constant pi. Example: pi()
pow(X,Y): Returns X^Y. Example: pow(2,10)
random(): Returns a pseudo-random number.
relative_time(X,Y): Given epochtime time X and relative time specifier Y, returns the epochtime value of Y applied to X. Example: relative_time(now(), "-1d@d")
replace(X,Y,Z): Returns a string formed by substituting string Z for every occurrence of regex string Y in string X. Example: replace(date, "^(\d{1,2})/(\d{1,2})/", "\2/\1/") returns the date with the month and day numbers switched, so an input of 1/12/2009 returns 12/1/2009.
round(X,Y): Returns X rounded to the number of decimal places specified by Y. The default is to round to an integer. Example: round(3.5)
rtrim(X,Y): Returns X with the characters in Y trimmed from the right side. If Y is not specified, spaces and tabs are trimmed. Example: rtrim(" ZZZZabcZZ ", " Z")
searchmatch(X): Returns true if the event matches the search string X. Example: searchmatch("foo AND bar")
split(X,"Y"): Returns X as a multivalued field, split by delimiter Y. Example: split(foo, ";")
sqrt(X): Returns the square root of X.
strftime(X,Y): Returns epochtime value X rendered using the format specified by Y.
strptime(X,Y): Given a time represented by a string X, returns the value parsed from format Y. Example: strptime(timeStr, "%H:%M")
substr(X,Y,Z): Returns a substring of X from start position (1-based) Y for Z (optional) characters.
time(): Returns the wall-clock time with microsecond resolution.
tonumber(X,Y): Converts input string X to a number, where Y (optional, defaults to 10) defines the base of the number to convert to. Example: tonumber("0A4",16)
tostring(X,Y): Returns a field value of X as a string. If X is a number, it reformats it as a string; if a Boolean value, either "True" or "False". If X is a number, the second argument Y is optional and can either be "hex" (convert X to hexadecimal), "commas" (format X with commas and 2 decimal places), or "duration" (convert seconds X to readable time format HH:MM:SS). Example: ... | eval foo=615 | eval foo2 = tostring(foo, "duration") returns foo=615 and foo2=00:10:15.
trim(X,Y): Returns X with the characters in Y trimmed from both sides. If Y is not specified, spaces and tabs are trimmed. Example: trim(" ZZZZabcZZ ", " Z")
typeof(X): Returns a string representation of its type. Example: typeof(12)+typeof("string")+typeof(1==2)+typeof(badfield) returns "NumberStringBoolInvalid".
upper(X): Returns the uppercase of X.
urldecode(X): Returns the URL X decoded.
validate(X,Y,...): Given pairs of arguments, Boolean expressions X and strings Y, returns the string Y corresponding to the first expression X that evaluates to False, and defaults to NULL if all are True. Example: validate(isint(port), "ERROR: Port is not an integer", port >= 1 AND port <= 65535, "ERROR: Port is out of range")
COMMON STATS FUNCTIONS
avg(X): Returns the average of the values of field X.
count(X): Returns the number of occurrences of the field X. To indicate a specific field value to match, format X as eval(field="value").
dc(X): Returns the count of distinct values of the field X.
first(X): Returns the first seen value of the field X. In general, the first seen value of the field is the chronologically most recent instance of the field.
last(X): Returns the last seen value of the field X.
list(X): Returns the list of all values of the field X as a multivalue entry. The order of the values reflects the order of input events.
max(X): Returns the maximum value of the field X. If the values of X are non-numeric, the max is found from lexicographic ordering.
median(X): Returns the middle-most value of the field X.
min(X): Returns the minimum value of the field X. If the values of X are non-numeric, the min is found from lexicographic ordering.
mode(X): Returns the most frequent value of the field X.
perc<X>(Y): Returns the X-th percentile value of the field Y. For example, perc5(total) returns the 5th percentile value of the field total.
range(X): Returns the difference between the max and min values of the field X.
stdev(X): Returns the sample standard deviation of the field X.
stdevp(X): Returns the population standard deviation of the field X.
sum(X): Returns the sum of the values of the field X.
sumsq(X): Returns the sum of the squares of the values of the field X.
values(X): Returns the list of all distinct values of the field X as a multivalue entry. The order of the values is lexicographical.
var(X): Returns the sample variance of the field X.
REGULAR EXPRESSIONS
Regular expressions are useful in many areas, including the search commands regex and rex, the eval functions match() and replace(), and field extraction.
\s: white space. Example: \d\s\d (digit space digit)
\S: not white space. Example: \d\S\d (digit nonwhitespace digit)
\d: digit. Example: \d\d\d-\d\d-\d\d\d\d (SSN)
\D: not digit. Example: \D\D\D (three non-digits)
\w: word character (letter, number, or _). Example: \w\w\w (three word chars)
\W: not a word character. Example: \W\W\W (three non-word chars)
[...]: any included character. Example: [a-z0-9#] (any char that is a thru z, 0 thru 9, or #)
[^...]: any excluded character. Example: [^xyz] (any char but x, y, or z)
*: zero or more. Example: \w* (zero or more word chars)
+: one or more. Example: \d+ (integer)
?: zero or one. Example: \d\d\d-?\d\d-?\d\d\d\d (SSN with dashes being optional)
|: or. Example: \w|\d (word or digit character)
(?P<var>...): named extraction. Example: (?P<ssn>\d\d\d-\d\d-\d\d\d\d) (pull out an SSN and assign it to the 'ssn' field)
(?: ... ): logical grouping. Example: (?:\w|\d)|(?:\d|\w) (word-char then digit OR digit then word-char)
^: start of line. Example: ^\d+ (line begins with at least one digit)
$: end of line. Example: \d+$ (line ends with at least one digit)
{...}: number of repetitions. Example: \d{3,5} (between 3-5 digits)
\: escape. Example: \[ (escape the [ char)
(?= ...): lookahead. Example: (?=\D)error (error must be preceded by a non-digit)
(?! ...): negative lookahead. Example: (?!\d)error (error cannot be preceded by a digit)
TIME FORMATS
Time
%H: 24 hour (leading zeros) (00 to 23)
%I: 12 hour (leading zeros) (01 to 12)
%M: Minute (00 to 59)
%S: Second (00 to 61)
%N: subseconds with width (%3N = millisecs, %6N = microsecs, %9N = nanosecs)
%p: AM or PM
%Z: Time zone (GMT)
Days
%d: Day of month (leading zeros) (01 to 31)
%j: Day of year (001 to 366)
%w: Weekday (0 to 6)
%a: Abbreviated weekday (Sun)
%A: Weekday (Sunday)
Months
%b: Abbreviated month name (Jan)
%B: Month name (January)
%m: Month number (01 to 12)
Years
%y: Year without century (00 to 99)
%Y: Year (2008)
Examples
1998-12-31 = %Y-%m-%d
98-12-31 = %y-%m-%d
Jan 24, 2003 = %b %d, %Y
January 24, 2003 = %B %d, %Y
q|25 Feb '03 = 2003-02-25| = q|%d %b '%y = %Y-%m-%d|