Splunk Validated Architectures
Table of Contents
Introduction
Document Structure
Reasons to Use Splunk Validated Architectures
Pillars of Splunk Validated Architectures
What to Expect from Splunk Validated Architectures
Roles and Responsibilities
Overview of the Splunk Validated Architectures Selection Process
Step 1a: Define Your Requirements for Indexing and Search
Step 2a: Choose a Topology for Indexing and Search
Step 1b: Define Your Requirements for Data Collection
Step 2b: Select Your Data Collection Components
Step 3: Apply Design Principles and Best Practices
Summary & Next Steps
Next Steps
Appendix
Appendix "A": SVA Pillars Explained
Appendix "B": Topology Components
Introduction
Splunk Validated Architectures (SVAs) are proven reference architectures for stable, efficient, and
repeatable Splunk deployments. Many of Splunk's existing customers have experienced rapid
adoption and expansion, leading to certain challenges as they attempt to scale. At the same time,
new Splunk customers are increasingly looking for guidelines and certified architectures to ensure
that their initial deployment is built on a solid foundation. SVAs have been developed to help our
customers with these growing needs.
Whether you are a new or existing Splunk customer, SVAs will help you build an environment that is
easier to maintain and simpler to troubleshoot. SVAs are designed to provide you with the best
possible results while minimizing your total cost of ownership. Additionally, your entire Splunk
foundation will be based on a repeatable architecture which will allow you to scale your deployment
as your needs evolve over time.
SVAs offer topology options that consider a wide array of organizational requirements, so you can
easily understand and find a topology that is right for your requirements. The Splunk Validated
Architectures selection process will help you match your specific requirements to the topology that
best meets your organization's needs. If you are new to Splunk, we recommend implementing a
Validated Architecture for your initial deployment. If you are an existing customer, we recommend
that you explore the option of aligning with a Validated Architecture topology. Unless you have
unique requirements that make it necessary to build a custom architecture, it is very likely that a
Validated Architecture will fulfill your requirements while remaining cost effective.
This whitepaper provides an overview of SVAs and the resources you need to go through the SVA
selection process, including the requirements questionnaire, deployment topology diagrams, design
principles, and general guidelines.
If you need assistance implementing a Splunk Validated Architecture, contact Splunk Professional
Services (https://www.splunk.com/en_us/support-and-services/splunk-services.html).
Document Structure
SVAs are broken into three major content areas:
1. Indexing and Search Topologies
2. Data Collection Architecture components
3. Design Principles and Best Practices
Indexing and search covers the architecture tiers that provide the core indexing and search
capabilities of a Splunk deployment. The data collection component section guides you in choosing
the right data collection mechanism for your requirements.
Design Principles and Best Practices apply to your architecture as a whole and will help you make
the correct choices when working out the details of your deployment.
Pillars of Splunk Validated Architectures

Availability: The system is continuously operational and able to recover from planned and unplanned outages or disruptions.
Performance: The system can maintain an optimal level of service under varying usage patterns.
Scalability: The system is designed to scale on all tiers, allowing you to handle increased workloads effectively.
Security: The system is designed to protect data, configurations, and assets while continuing to deliver value.
Manageability: The system is centrally operable and manageable across all tiers.

These pillars are in direct support of the Platform Management & Support Service in the Splunk
Center of Excellence model.
What to Expect from Splunk Validated Architectures

SVAs offer clustered and non-clustered deployment options. They do not prescribe implementation
choices (OS, bare metal vs. virtual vs. cloud, etc.) or deployment sizing.
Roles and Responsibilities

Enterprise Architects: Responsible for architecting Splunk deployments to meet enterprise needs.
Consultants: Responsible for providing services for Splunk architecture, design, and implementation.
Splunk Engineers: Responsible for managing the Splunk lifecycle.
Managed Service Providers: Entities that deploy and run Splunk as a service for customers.
Overview of the Splunk Validated Architectures Selection Process

Step 2: Choose a Topology for a) Indexing and Search, b) each data collection mechanism
Choose a topology that meets identified requirements.
• You'll choose a topology that best meets your requirements.
• Keep things simple and in accordance with the SVA, so you can appreciate the easier path to
  scalability.
For diagrams and descriptions of topology options, refer to Step 2 below.

Step 3: Apply Design Principles and Best Practices
Prioritize your design principles and review tier-specific implementation best practices.
• Each design principle reinforces one or more of the pillars of Splunk Validated Architectures.
• You'll prioritize design principles in accordance with the needs of your organization.
• Tier-specific recommendations will guide your topology implementation.
For a breakdown of design principles, refer to Step 3 below.
Topology Categories
The following is a key to SVA topology categories. These categories are used in the questionnaire
below. You will also find references to these categories in the next steps of the SVA selection
process.
8. Are you intending to deploy the Splunk App for Enterprise Security (ES)?
   Considerations: Please ensure you read and understand the specific limitations that the Splunk
   App for Enterprise Security is subject to, as documented with each topology.
   Implications: ES requires a dedicated Search Head environment (either standalone or clustered).
   Category: D/C/M +10

10. Do you have highly restrictive security policies that prevent co-location of specific log data
    sources on shared servers/indexers?
    Considerations: Highly sensitive log data may not be allowed to be co-located with lower-risk
    datasets on the same physical system or within the same network zone, based on corporate policies.
    Implications: Multiple, independent indexing environments are needed, potentially with a shared,
    hybrid search tier. This is beyond the scope of SVAs and requires custom architecture development.
    Category: Custom
Example #2
Now, let's say you answered "yes" only to question #1. You will end up with a topology category of
"S1", indicating a single-server Splunk deployment as your ideal topology.
Step 2a: Choose a Topology for Indexing and Search

Single Server Deployment (S1)

This deployment topology provides you with a very cost-effective solution if your environment meets
all of the following criteria: a) you do not have any requirements to provide high availability or
automatic disaster recovery for your Splunk deployment, b) your daily data ingest is under
300GB/day, and c) you have a small number of users with non-critical search use cases.

This topology is typically used for smaller, non-business-critical use cases (often departmental in
nature). Appropriate use cases include data onboarding test environments, small DevOps use cases,
application test and integration environments, and similar scenarios.

The primary benefits of this topology include easy manageability, good search performance for
smaller data volumes, and a fixed TCO.

Limitations:
• No High Availability for search/indexing
• Scalability limited by hardware capacity (straightforward migration path to a distributed deployment)
Distributed Non-Clustered Deployment

You need to move to a distributed topology in either of the following situations: a) your daily data
volume to be sent to Splunk exceeds the capacity of a single-server deployment, or b) you want or
need to provide highly available data ingest. Deploying multiple, independent indexers will allow
you to scale your indexing capacity linearly and implicitly increase the availability for data ingest.

The TCO will increase in a predictable and linear fashion as you add indexer nodes. The
recommended introduction of the Monitoring Console (MC) component allows you to monitor the
health of your Splunk environment.

Limitations:
• No High Availability for the search tier
• Limited High Availability for the indexing tier; node failure may cause incomplete search results
  for historic searches
Distributed Clustered Deployment (Single Site)

This topology introduces indexer clustering in conjunction with an appropriately configured data
replication policy. This provides high availability of data in case of indexer peer node failure.
However, you should be aware that this applies only to the indexing tier and does not protect
against search head failure.

Note for ES customers: If your category code is C11 (i.e., you intend to deploy the Splunk App for
Enterprise Security), a single dedicated search head is required to deploy the app (this is not
pictured in the topology diagram).

This topology requires an additional Splunk component named the Cluster Master (CM). The CM is
responsible for coordination and enforcement of the configured data replication policy. The CM also
serves as the authoritative source for available cluster peers (indexers). Search head configuration
is simplified by configuring the CM instead of individual search peers. (A minimal configuration
sketch follows the limitations below.)

You have the option of configuring the forwarding tier to discover available indexers via the CM.
This simplifies the management of the forwarding tier.

Be aware that data is replicated within the cluster in a non-deterministic way. You will not have
control over where the requested copies of each event are stored. Additionally, while scalability is
linear, there are limitations with respect to total cluster size (~50PB of searchable data under
ideal conditions).

We recommend deployment of the Monitoring Console (MC) to monitor the health of your Splunk
environment.

Limitations:
• No High Availability for the search tier
• Total number of unique buckets in the indexer cluster limited to 5MM (v6.6+), 15MM total buckets
• No automatic DR capability in case of a data center outage
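As an illustration of the CM/peer relationship described above, a minimal server.conf sketch might
look like the following. The replication and search factors, host names, ports, and secret are
example placeholders only, not sizing or security guidance.

    # server.conf on the Cluster Master (illustrative values)
    [clustering]
    mode = master
    replication_factor = 2
    search_factor = 2
    pass4SymmKey = <cluster_secret>

    # server.conf on each indexer (cluster peer)
    [replication_port://9887]

    [clustering]
    mode = slave
    master_uri = https://cm.example.com:8089
    pass4SymmKey = <cluster_secret>

With this in place, search heads and (optionally) forwarders reference only the CM, which then
provides the authoritative list of peers.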
Distributed Clustered Deployment with Search Head Cluster (Single Site)

This topology adds horizontal scalability and removes the single point of failure from the search
tier. A minimum of three search heads is required to implement a SHC.

To manage the SHC configuration, an additional Splunk component called the Search Head Cluster
Deployer is required for each SHC. This component is necessary in order to deploy changes to
configuration files in the cluster. The Search Head Cluster Deployer has no HA requirements (no
runtime role). (A configuration sketch follows the limitations below.)

The SHC provides the mechanism to increase available search capacity beyond what a single search
head can provide. Additionally, the SHC allows for scheduled search workload distribution across
the cluster. The SHC also provides optimal user failover in case of a search head failure.

Limitations:
• No DR capability in case of a data center outage
• ES requires a dedicated SH/SHC
• Managing an ES deployment on a SHC is supported, but challenging (involve Splunk Professional Services)
• A SHC cannot have more than 100 nodes
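As a sketch of how SHC members reference each other and the Search Head Cluster Deployer, a
server.conf fragment might look like the following. Host names, ports, the replication factor, and
the secret are placeholders, not recommended values.

    # server.conf on each SHC member (illustrative)
    [replication_port://9777]

    [shclustering]
    disabled = 0
    mgmt_uri = https://sh1.example.com:8089
    replication_factor = 3
    conf_deploy_fetch_url = https://shc-deployer.example.com:8089
    pass4SymmKey = <shc_secret>

Configuration bundles are then pushed from the Deployer, while the captain role is elected among
the members at runtime.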
Multi-Site Deployment with a Search Head Cluster per Site

This topology adds horizontal scalability and removes the single point of failure from the search
tier in each site. A minimum of three search heads is required to implement a SHC (per site).

To manage the SHC configuration, an additional Splunk component called the Search Head Cluster
Deployer is required for each SHC. This component is necessary in order to deploy changes to
configuration files in the cluster. The Search Head Cluster Deployer has no HA requirements (no
runtime role).

The SHC provides the following benefits: a) increased available search capacity beyond what a
single search head can provide, b) scheduled search workload distribution across the cluster, and
c) optimal user failover in case of a search head failure.

A network load balancer that supports sticky sessions is required in front of the SHC members in
each site to ensure proper load balancing of users across the cluster.

Note for ES customers: If your category code is M13 (i.e., you intend to deploy the Splunk App for
Enterprise Security), a single dedicated search head cluster contained within a site is required to
deploy the app (this is not explicitly pictured in the topology diagram). To be able to recover an
ES SH environment from a site failure, 3rd-party technology can be used to perform a failover of
the search head instances, or a

Limitations:
• No search artifact replication across sites; the SHCs are standalone
• Cross-site latency for index replication must be within documented limits
• A SHC cannot have more than 100 nodes
Multi-Site Deployment with a Stretched Search Head Cluster

This is the most complex validated architecture, designed for deployments that have strict
requirements around high availability and disaster recovery. We strongly recommend involving
Splunk Professional Services for proper deployment. When properly deployed, this topology provides
continuous operation of your Splunk infrastructure for data collection, indexing, and search.

This topology involves implementation of a "stretched" search head cluster that spans one or more
sites. This provides optimal failover for users in case of a search node or data center failure.
Search artifacts and other runtime knowledge objects are replicated in the SHC. Careful
configuration is required to ensure that replication happens across sites, as the SHC itself is not
site-aware (i.e., artifact replication is non-deterministic).

Limitations:
• Network latency across sites must be within documented limits
• Failover of the SHC may require manual steps if only a minority of cluster members survive
Step 1b: Define Your Requirements for Data Collection

Data is ingested properly (timestamps, line breaking, truncation)
If data is not ingested properly because event timestamps and line breaking are not correctly
configured, searching this data will become very difficult. This is because event boundaries have to
be enforced at search time. Incorrect or missing timestamp extraction configurations can cause
unwanted implicit timestamp assignment. This will confuse your users and make getting value out of
your data much more difficult than it needs to be. (See the configuration sketch following this table.)

Data is optimally distributed across available indexers
The importance of ideal event distribution across indexers cannot be overstated. The indexing tier
works most efficiently when all available indexers are equally utilized. This is true for both data
ingest and search performance. A single indexer that handles significantly more data ingest than its
peers can negatively impact search response times. For indexers with limited local disk storage,
uneven event distribution may also cause data to be prematurely aged out before meeting the
configured data retention policy.

All data reaches the indexing tier reliably and without loss
Any log data that is collected for the purpose of reliable analytics needs to be complete and valid,
such that searches performed on the data provide valid and accurate results.

All data reaches the indexing tier with minimum latency
Delays in data ingest increase the time between a potentially critical event occurring and the
ability to search for and react to it. Minimal ingest latency is often crucial for monitoring use
cases that trigger alerts to staff or incur automated action.

Data is secured while in transit
If the data is either sensitive or has to be protected while being sent over non-trusted networks,
encryption of data may be required to prevent unauthorized third-party interception. Generally, we
recommend that all connections between Splunk components be SSL-enabled.

Network resource use is minimized
The network resource impact of log data collection must be minimized so as not to impact other
business-critical network traffic. For leased-line networks, minimizing network utilization also
contributes to a lower TCO of your deployment.

Authenticate/authorize data sources
To prevent rogue data sources from affecting your indexing environment, consider implementing
connection authentication/authorization. This may be covered by using network controls, or by
employing application-level mechanisms (e.g., SSL/TLS).
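To illustrate the first requirement above (explicit timestamp and line-breaking configuration), a
props.conf sketch for a hypothetical source type might look like the following. The source type
name, timestamp format, and limits are illustrative assumptions, not prescribed values.

    # props.conf: explicit onboarding settings for a hypothetical source type
    [acme:app]
    SHOULD_LINEMERGE = false
    LINE_BREAKER = ([\r\n]+)
    TIME_PREFIX = ^\[
    TIME_FORMAT = %Y-%m-%d %H:%M:%S.%3N %z
    MAX_TIMESTAMP_LOOKAHEAD = 30
    TRUNCATE = 10000

Settings like these avoid implicit timestamp assignment and search-time event boundary guessing.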
Because of its vital role in your deployment, the guidance in this document focuses on
architectures that support ideal event distribution. When a Splunk environment does not provide the
expected search performance, the cause is almost always unmet minimum storage performance
requirements, uneven event distribution that limits search parallelization, or both.
Now that you understand the most critical architectural considerations, let's find out what specific
data collection requirements you need to fulfill.
1. Do you need to monitor local files or execute data collection scripts on endpoints?
   Considerations: This is a core requirement for almost all Splunk deployment scenarios.
   Implications: You will need to install the universal forwarder on your endpoints and manage its
   configuration centrally.
   Component(s): UF

2. Do you need to collect log data sent via syslog from devices that you cannot install software on
   (appliances, network switches, etc.)?
   Considerations: Syslog is a ubiquitous transport protocol often used by purpose-built devices
   that do not allow installation of custom software.
   Implications: You will need a syslog server infrastructure that serves as the collection point.
   Component(s): SYSLOG, HEC

3. Do you need to support collection of log data from applications that log to an API versus
   writing to local disks?
   Considerations: Writing to log files on endpoints requires providing disk space and management
   of these log files (rotation, deletion, etc.). Some customers want to get away from this model
   and log directly to Splunk using available logging libraries.
   Implications: You will need to use the Splunk HTTP Event Collector (HEC) or another technology
   that serves as a log sink.
   Component(s): HEC

4. Do you need to collect data from a streaming event data provider?
   Considerations: Many enterprises have adopted an event hub model where a centralized streaming
   data platform (like AWS Kinesis or Kafka) serves as the message transport between log data
   producers and consumers.
   Implications: You will need an integration between the streaming data provider and Splunk.
   Component(s): KAFKA, KINESIS, HEC

6. Do you need to collect log data using programmatic means, e.g., by APIs?
   Considerations: Splunk provides various modular inputs that allow execution of scripts against
   APIs for a wide variety of data sources.
   Implications: Your data collection tier will require one or more data collection nodes (DCN)
   implemented with a heavy forwarder.
   Component(s): DCN
7. Do you need to route (a subset of) data to other systems besides, and in addition to, Splunk?
   Considerations: Some use cases require data that is indexed in Splunk to also be forwarded to
   another system. Often, the forwarded data consists of only a subset of the source data, or the
   data has to be modified before being forwarded.
   Implications: Depending on the use case specifics, you may need an intermediary forwarding tier
   built with a Heavy Forwarder to support event-based routing and filtering. Alternatively, you can
   forward data post-indexing by using the cefout command contained in the Splunk App for CEF.
   Component(s): HF

10. Do you need to capture metrics using statsd or collectd?
    Considerations: Statsd and collectd are ubiquitous technologies in use to gather metrics from
    host systems and applications.
    Implications: Splunk supports specific index types and collection methods to feed those indices
    using either UF, HF, or HEC.
    Component(s): METRICS
Step 2b: Select Your Data Collection Components

The diagram above shows the Deployment Server (DS) in the management tier, which is used to
manage the configurations on data collection components. Also, the License Master (LM) is shown
here since data collection nodes require access to the LM to enable Splunk Enterprise features.
The cluster master (CM), if available, can be used by forwarders for indexer discovery, removing
the need to manage available indexers in the forwarder output configuration.
In the above diagram, AutoLB represents the Splunk built-in auto-load balancing mechanism. This
mechanism is used to ensure proper event distribution for data sent using the Splunk proprietary
S2S protocol (default port 9997). Note: Using an external network load-balancer for S2S traffic is
currently not supported and not recommended.
To load-balance traffic from data sources that communicate with an industry-standard protocol (like
HTTP or syslog), a network load balancer is used to ensure even load and event distribution across
indexers in the indexing tier.
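As a sketch of the forwarder-side configuration described above, a minimal outputs.conf using
indexer discovery via the CM might look like the following. The group name, host names, port, and
secret are placeholders, not values prescribed by this document.

    # outputs.conf on a forwarder (illustrative)
    [tcpout]
    defaultGroup = primary_indexers

    [tcpout:primary_indexers]
    # Ask the cluster master for the current list of peer indexers
    indexerDiscovery = cluster1

    [indexer_discovery:cluster1]
    master_uri = https://cm.example.com:8089
    pass4SymmKey = <discovery_secret>

With indexer discovery enabled, the forwarder continues to use Splunk AutoLB across whatever peers
the CM reports, so indexers can be added without touching forwarder configurations.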
(UF) Universal Forwarder
The universal forwarder (UF) provides:
• Throttling capabilities.
• Built-in load balancing across available indexers.
• Optional network encryption using SSL/TLS.
• Data compression (use only without SSL/TLS).
• Multiple input methods (files, Windows Event Logs, network inputs, scripted inputs).
• Limited event filtering capabilities (Windows Event Logs only).
• Parallel ingestion pipeline support to increase throughput/reduce latency.
With few exceptions for well-structured data (JSON, CSV, TSV), the UF does not parse log sources
into events, so it cannot perform any action that requires understanding of the format of the logs.
It also ships with a stripped-down version of Python, which makes it incompatible with any modular
input apps that require a full Splunk stack to function.
It is normal for a large number of UFs (100s to 10,000s) to be deployed on endpoints and servers in
a Splunk environment and centrally managed, either with a Splunk deployment server or a third-party
configuration management tool (e.g., Puppet or Chef).
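A minimal inputs.conf sketch for a UF monitoring a local file and a Windows Event Log channel might
look like the following. The paths, source types, and index names are assumptions, not
recommendations.

    # inputs.conf on a universal forwarder (illustrative)
    [monitor:///var/log/messages]
    sourcetype = syslog
    index = os_linux

    # Windows endpoints only
    [WinEventLog://Security]
    index = os_windows
    disabled = 0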
(HF) Heavy Forwarder
The heavy forwarder (HF) is a full Splunk Enterprise deployment configured to act as a forwarder
with indexing disabled. An HF generally performs no other Splunk roles. The key difference between
a UF and an HF is that the HF contains the full parsing pipeline, performing the identical
functions an indexer performs without actually writing and indexing events on disk. This enables
the HF to understand and act on individual events, for example to mask data or to perform filtering
and routing based on event data. Since it is a full Splunk Enterprise install, it can host modular
inputs that require a full Python stack to function properly for data collection, or serve as an
endpoint for the Splunk HTTP Event Collector (HEC). The HF has the following characteristics:
• Parses data into events.
• Filters and routes based on individual event data.
• Has a larger resource footprint than the UF.
• Has a larger network bandwidth footprint than the UF (~5x).
• Provides a GUI for management.
In general, HFs are not installed on endpoints for the purpose of data collection. Instead, they
are used on standalone systems to implement data collection nodes (DCN) or intermediary forwarding
tiers. Use an HF only when requirements to collect data from other systems cannot be met with a UF.
Examples of such requirements include:
• Reading data from an RDBMS for the purpose of ingesting it into Splunk (database inputs).
• Collecting data from systems that are reachable via an API (cloud services, VMware monitoring,
  proprietary systems, etc.).
• Providing a dedicated tier to host the HTTP Event Collector service.
• Implementing an intermediary forwarding tier that requires a parsing forwarder for
  routing/filtering/masking.
(HEC) HTTP Event Collector
The following diagram illustrates the two deployment options for HEC:
The management tier contains the license master (required by the HF) as well as the deployment
server to manage the HTTP inputs on the listening components. Note: If the indexing tier is
clustered and receives HEC traffic directly, HEC configuration is managed via the cluster master
instead of the deployment server.
Which deployment topology you choose depends largely on your specific needs. A dedicated HEC
listener tier introduces another architectural component into your deployment. On the positive
side, it can be scaled independently and provides a level of isolation from the indexing tier from
a management perspective. Also, since the dedicated HEC tier requires HFs, it will parse all inbound
traffic, taking that workload off of the indexers.
On the other hand, hosting the HEC listener directly on the indexers will likely ensure better event
distribution across the indexing tier, because HTTP is a well-understood protocol for all network
load balancers and the appropriate load balancing policy can help ensure that the least busy
indexers get served first.
In the spirit of deploying the simplest possible architecture that meets your requirements, we
recommend you consider hosting your HEC listener on the indexers, assuming you have sufficient
system capacity to do so. This decision can easily be reverted later if the need arises, simply by
deploying an appropriately sized and configured HF tier and changing the LB configuration to use
the HF's IP addresses instead of the indexers. That change should be transparent to client
applications.
Note: If you do require indexer acknowledgment for data sent via HEC, a dedicated HEC listener
tier is recommended to minimize duplicate messages caused by rolling indexer restarts.
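Whether the listeners run on the indexers or on a dedicated HF tier, a minimal inputs.conf sketch
for a HEC token might look like the following. The token value, index, and source type are
placeholders.

    # inputs.conf on the HEC listeners (illustrative)
    [http]
    disabled = 0
    enableSSL = 1
    port = 8088

    [http://app_events]
    token = 11111111-2222-3333-4444-555555555555
    index = app_events
    sourcetype = app:json
    # Enable only if clients implement the acknowledgment polling workflow
    useACK = false

Client applications then point at the load balancer's virtual address rather than individual
listeners, which is what makes the indexer-hosted vs. dedicated-tier decision transparent to them.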
Note: This HEC deployment architecture provides the transport for some of the other data
collection components discussed later, specifically syslog and metrics data collection.
(IF) Intermediary Forwarders
In a scenario with a single intermediary forwarder, all endpoints connect to this single forwarder
(potentially thousands), and the intermediary forwarder in turn only connects to one indexer at any
given time. This is not an optimal scenario because the following consequences are likely to occur:
• A large data stream from many endpoints is funneled through a single pipe that exhausts your
  system and network resources.
• There are limited failover targets for the endpoints in case of IF failure (your outage risk is
  inversely proportional to the number of IFs).
• Only a small number of indexers is served at any given point in time, so searches over short time
  periods will not benefit from parallelization as much as they otherwise could.
Intermediary forwarders also add an additional architecture tier to your deployment, which can
complicate management and troubleshooting and adds latency to your data ingest path. Try to avoid
using intermediary forwarding tiers unless this is the only option to meet your requirements.
You may consider using an intermediary tier if:
• Sensitive data needs to be obfuscated or removed before being sent across the network to
  indexers, for example when you must use a public network.
• Strict security policies do not allow for direct connections between endpoints and indexers,
  such as in multi-zone networks or with cloud-based indexers.
• Bandwidth constraints between endpoints and indexers require a significant subset of events to
  be filtered.
• Event-based routing to dynamic targets is required.
Consider sizing and configuration needs for any intermediary forwarding tier to ensure availability of
this tier, provide sufficient processing capacity to handle all traffic, and support good event
distribution across indexers. The IF tier has the following requirements:
• Sufficient number of data processing pipelines overall.
• Redundant IF infrastructure.
• Properly tuned Splunk load-balancing configuration. For example, autoLBVolume,
EVENT_BREAKER, EVENT_BREAKER_ENABLE, possibly forceTimeBasedAutoLB as
needed.
The general guideline is to have twice as many IF processing pipelines as indexers in the indexing
tier.
Note: A processing pipeline does not equate to a physical IF server. Provided sufficient system
resources (for example, CPU cores, memory, and NIC bandwidth) are available, a single IF can be
configured with multiple processing pipelines.
If you need an IF tier (see questionnaire), default to using UFs for the tier, since they provide
higher throughput with a lower resource footprint for both the system and the network. Use HFs only
if the UF capabilities do not meet your requirements.
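A configuration sketch for the load-balancing settings mentioned above might look like the
following. The thresholds, host names, and source type are illustrative assumptions, not
recommended values.

    # outputs.conf on an intermediary forwarder (illustrative tuning)
    [tcpout:primary_indexers]
    server = idx1.example.com:9997, idx2.example.com:9997
    autoLBFrequency = 30
    # Also switch targets once ~1 MB has been sent to the current indexer
    autoLBVolume = 1048576
    # Last resort when event breaking cannot be configured for a source
    forceTimebasedAutoLB = false

    # props.conf on the sending forwarders, enabling event breaking so the
    # forwarder can switch indexers cleanly at event boundaries
    [acme:app]
    EVENT_BREAKER_ENABLE = true
    EVENT_BREAKER = ([\r\n]+)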
Syslog Data Collection
This architecture supports proper data onboarding in the same way a universal forwarder does on
any other endpoint. You can configure the SCD to identify multiple different log types and write
out log events to appropriate files and directories where a Splunk forwarder can pick them up. This
also adds a level of persistence to the syslog stream by writing events to disk, which can limit
exposure to data loss for messages sent using UDP, an unreliable transport.
The diagram shows syslog sources sending data using TCP or UDP on port 514 to a load-balanced pool
of syslog servers. Multiple servers ensure HA for the collection tier and can prevent data loss
during maintenance operations. Each syslog server is configured to apply rules to the syslog stream
that result in syslog events being written to dedicated files/directories for each source type
(firewall events, OS syslog, network switches, IPS, etc.). The UF that is deployed to each server
monitors those files and forwards the data to the indexing tier for processing into the appropriate
index. Splunk AutoLB is used to distribute the data evenly across the available indexers.
The deployment server shown in the management tier can be used to centrally manage the UF
configuration.
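On each syslog server, the UF inputs might be configured per source-type directory, for example as
sketched below. The paths, source types, and index names are placeholders chosen for illustration.

    # inputs.conf on the UF installed on each syslog server (illustrative)
    [monitor:///var/log/remote/firewall]
    sourcetype = acme:firewall
    index = netfw

    [monitor:///var/log/remote/switches]
    sourcetype = acme:switch
    index = netops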
A benefit of this topology is that it eliminates the need to deploy and configure a UF/HF. The HTTP
load balancer distributes traffic to the HEC listeners on the indexers (or on a dedicated HEC
listener tier) to ensure that the data is spread evenly across the HEC endpoints. Configure this
load balancer with the "Least Connections" policy.
The diagram shows Kafka Publishers sending messages to the Kafka bus. The tasks hosted in the
Kafka Connect cluster consume those messages via the Splunk Connect for Kafka and send the
data to the HEC listening service using a network load balancer. Again, the HEC listening service
can be either hosted directly on the indexers, or on a dedicated HEC listener tier. Please refer to
the HEC section for details. Management tier components are only required if a dedicated HF tier is
deployed to host HEC listeners.
The diagram shows AWS log sources being sent using a Kinesis stream to the Firehose, which —
with proper configuration — will send the data to the HEC listening service via an AWS ELB. Again,
the HEC listening service can be either hosted directly on the indexers, or on a dedicated HEC
listener tier. Please refer to the HEC section for details.
Management tier components shown are only required if a dedicated HF tier is deployed to host
HEC listeners.
Statsd currently supports UDP and TCP transport, which you can use as a direct input on a Splunk
forwarder or indexer. However, it is not a best practice to send TCP/UDP traffic directly to
forwarders in production, as the architecture is not resilient and is prone to event loss (see
syslog collection) caused by required Splunk forwarder restarts.
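If a direct input is used despite the caveat above, a sketch of the configuration might look like
the following. The port and index name are placeholders, and the target index is assumed to have
been created as a metrics-type index.

    # inputs.conf on a forwarder or indexer receiving statsd over UDP (illustrative)
    [udp://8125]
    sourcetype = statsd
    # app_metrics must exist as a metrics index
    index = app_metrics
    no_appending_timestamp = true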
Step 3: Apply Design Principles and Best Practices

Deployment Tiers
SVA design principles cover all of the following deployment tiers:
Search: Search heads
Indexing: Indexers
Collection: Forwarders, modular inputs, network inputs, HEC (HTTP Event Collector), etc.
Management / Utility: CM (Cluster Master), DS (Deployment Server), LM (License Master), MC
(Monitoring Console), SHC-D (Search Head Cluster Deployer)
Search Tier Recommendations
(Each recommendation reinforces one or more of the SVA pillars: Availability, Performance,
Scalability, Security, Manageability; check which practices apply to you.)
1. Keep the search tier close (in network terms) to the indexing tier. Any network delay between
   the search and indexing tiers has a direct impact on search performance.
Indexing Tier Recommendations
1. Enable parallel ingestion pipelines on capable servers to increase throughput. Parallelization
   features exploit available system resources that would otherwise sit idle. Note that I/O
   performance must be adequate before enabling ingest parallelization features.
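Enabling this is a one-line server.conf change on a capable indexer or heavy forwarder, for
example as sketched below; the pipeline count is illustrative and assumes adequate CPU and I/O
headroom.

    # server.conf (illustrative)
    [general]
    parallelIngestionPipelines = 2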
Data Collection Tier Recommendations
1. Use the UF to forward data whenever possible; limit use of the Heavy Forwarder to the use cases
   that require it. The UF provides built-in autoLB, is restart-capable, is centrally configurable,
   and has a small resource footprint.
Management Tier Recommendations
1. Consider consolidating LM, CM, SHC-D, and MC on a single instance for small environments. These
   server roles have very small resource demands and are good candidates for co-location. In larger
   indexer clusters, the CM may require a dedicated server to efficiently manage the cluster.

2. Consider a separate instance for the DS for medium to large deployments. Once a significant
   number of forwarders are managed via the Deployment Server, the resource needs increase to the
   point where a dedicated server is required to maintain the service.

3. Consider multiple DSs behind a load balancer for super large deployments. Note: This may require
   help from Splunk Professional Services to be set up and configured properly.

4. Determine whether the DS phoneHomeIntervalInSecs setting can be backed off from the 60-second
   default. A longer phone-home interval has a positive effect on DS scalability.
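The phone-home interval is set on the deployment clients, for example in deploymentclient.conf as
sketched below; the interval and the DS host name are illustrative only.

    # deploymentclient.conf on managed forwarders (illustrative)
    [deployment-client]
    # Default is 60 seconds; a longer interval reduces load on the DS
    phoneHomeIntervalInSecs = 600

    [target-broker:deploymentServer]
    targetUri = ds.example.com:8089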
5. Use a dedicated/secured DS to avoid client exploitation via app deployment. Anyone with access
   to the Deployment Server can modify the Splunk configuration managed by that DS, including
   potentially deploying malicious applications to forwarder endpoints. Securing this role
   appropriately is prudent.

6. Use the Monitoring Console (MC) to monitor the health of your deployment and alert on health
   issues. The Monitoring Console provides a pre-built, Splunk-specific set of monitoring solutions
   and contains extensible platform alerts that can notify you about degrading health of your
   environment.
Next Steps
So, what comes after choosing a Validated Architecture? The next steps on your journey to a
working environment include:
Customizations
• Consider any necessary customizations your chosen topology may need to meet specific
requirements.
Deployment Model
• Decide on deployment model (bare metal, virtual, cloud).
System
• Select your technology (servers, storage, operating systems) according to Splunk system
requirements.
Sizing
• Gather all the relevant data you will need to size your deployment (data ingest, expected search
  volume, data retention needs, replication, etc.). Splunk Storage Sizing (https://splunk-
  sizing.appspot.com/) is one of the available tools.
Staffing
• Evaluate your staffing needs to implement and manage your deployment. This is an essential
part of building out a Splunk Center of Excellence.
We are here to assist you throughout the Validated Architectures process and with next steps.
Please feel free to engage your Splunk Account Team with any questions you might have. Your
Account Team will have access to the full suite of technical and architecture resources within
Splunk and will be happy to provide you with further information.
Happy Splunking!
Appendix
This section contains additional reference information used in the SVAs.
Appendix "A": SVA Pillars Explained

Availability
The ability to be continuously operational and able to recover from planned and unplanned outages
or disruptions.
1. Eliminate single points of failure / add redundancy
2. Detect planned and unplanned failures/outages
3. Tolerate planned/unplanned outages, ideally automatically
4. Plan for rolling upgrades

Scalability
The ability to ensure that the system is designed to scale on all tiers and handle increased
workloads effectively.
1. Scale vertically and horizontally
2. Separate functional components that need to be scaled individually
3. Minimize dependencies between components
4. Design for known future growth as early as possible
5. Introduce hierarchy in the overall system design

Security
The ability to ensure that the system is designed to protect data as well as configurations/assets
while continuing to deliver value.
1. Design for a secure system from the start
2. Employ state-of-the-art protocols for all communications
Appendix "B": Topology Components

Search tier / Search Head (SH): The search head provides the UI for Splunk users and coordinates
scheduled search activity. Search heads are dedicated Splunk instances in distributed deployments.
Search heads can be virtualized for easy failure recovery, provided they are deployed with
appropriate CPU and memory resources.

Data collection tier / Forwarders and other data collection components: General icon for any
component involved in data collection. This includes universal and heavy forwarders, network data
inputs, and other forms of data collection (HEC, Kafka, etc.).