Splunk 102 Flashcards

1
Q

What is a bucket in Splunk?

A

A file system directory containing a portion of index

A Splunk Enterprise index typically consists of many buckets, organized by age.

2
Q

What are the types of Splunk buckets?

A

Hot, warm, cold, frozen, thawed

3
Q

What is a bucket’s “aging” process?

A

As buckets age, they “roll” from one state to the next. When data is first indexed, it gets written to a hot bucket. Hot buckets are buckets that are actively being written to. An index can have several hot buckets open at a time. Hot buckets are also searchable.

When certain conditions are met (for example, the hot bucket reaches a certain size or the indexer gets restarted), the hot bucket becomes a warm bucket (“rolls to warm”), and a new hot bucket is created in its place. The warm bucket is renamed but it remains in the same location as when it was a hot bucket. Warm buckets are searchable, but they are not actively written to. There can be a large number of warm buckets.

Once further conditions are met (for example, the index reaches some maximum number of warm buckets), the indexer begins to roll the warm buckets to cold, based on their age. It always selects the oldest warm bucket to roll to cold. Buckets continue to roll to cold as they age in this manner. Cold buckets reside in a different location from hot and warm buckets. You can configure the location so that cold buckets reside on cheaper storage.

Finally, after certain other time-based or size-based conditions are met, cold buckets roll to the frozen state, at which point they are deleted from the index, after being optionally archived.

If the frozen data has been archived, it can later be thawed. Data in thawed buckets is available for searches.

Settings in indexes.conf determine when a bucket moves from one state to the next.
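The indexes.conf settings mentioned above can be sketched as a single index stanza. This is a hedged illustration: the index name, paths, and values are invented examples, not recommendations.

```
# Hypothetical stanza in indexes.conf showing the bucket life-cycle controls.
[web_logs]
homePath   = $SPLUNK_DB/web_logs/db            # hot and warm buckets live here
coldPath   = $SPLUNK_DB/web_logs/colddb        # cold buckets (can point at cheaper storage)
thawedPath = $SPLUNK_DB/web_logs/thaweddb      # thawed (restored-from-archive) buckets
maxDataSize = auto                             # hot bucket rolls to warm at this size
maxWarmDBCount = 300                           # beyond this count, oldest warm rolls to cold
frozenTimePeriodInSecs = 7776000               # ~90 days, then cold rolls to frozen
```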

4
Q

What is single-instance deployment? How can we scale our deployment?

A

In single-instance deployments, one instance of Splunk Enterprise handles all aspects of processing data, from input through indexing to search. A single-instance deployment can be useful for testing and evaluation purposes and might serve the needs of department-sized environments.

To support larger environments, however, where data originates on many machines and where many users need to search the data, you can scale your deployment by distributing Splunk Enterprise instances across multiple machines. When you do this, you configure the instances so that each instance performs a specialized task. For example, one or more instances might index the data, while another instance manages searches across the data.

5
Q

What is an instance?

A

A single running installation of Splunk Enterprise.

6
Q

What is a Splunk component?

A

One of several types of Splunk Enterprise instances.

7
Q

What categories of Splunk components are there? Give some examples of components in each category.

A

These are the available processing component types:
Indexer
Forwarder
Search head

Management components include:
license master
monitoring console
deployment server
indexer cluster master node
search head cluster deployer
8
Q

What is a License Master?

A

It is a component that is responsible for keeping track of data ingestion quota.

9
Q

What is a License Quota?

A

It is the maximum daily volume of data that can be ingested into Splunk under a given purchased license.

10
Q

What are some sizes of environment?

A

<3 TB: small environment
10-30 TB: large environment
>50 TB: massive environment

11
Q

Tell me your environment!

A
Paweł:
26 TB ingestion, 28 TB quota
around 20k forwarders
around 145 indexers
around 250 clients
14 search heads
The company plans to add 20% more devices to the network.

How to create a fake environment:

a) quota = ingestion + 2 TB
b) 1 TB = around 1,000 forwarders (go under a little bit)
c) 1 TB = 6 indexers (go under 6-8%)
d) 200 users per 17 TB

Mariusz:
? ? ? ?

12
Q

Which Splunk components typically share an instance?

A

Deployment Server and License Master

13
Q

How does data enter the indexer?

A

Through a port and an IP address.

14
Q

Which port do we have to open to enable indexers to receive data?

A

9997 (sometimes 9998)

Some people call it “the Indexer port”
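As a sketch of how this port is wired up, the indexer listens in inputs.conf and the forwarder targets it in outputs.conf. The hostname below is a made-up example.

```
# On the indexer (inputs.conf) - open the receiving port:
[splunktcp://9997]
disabled = false

# On the forwarder (outputs.conf) - send to that port:
[tcpout:primary_indexers]
server = idx01.example.com:9997
```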

15
Q

What two types of files do indexes store? (before parsing)

A

Raw data (full log files) and indexed files (tsidx)

16
Q

What is tsidx?

A

Time-series index files (the “indexed files”): the metadata and keywords extracted from the raw data that point back into it, enabling fast searches.

17
Q

What is metadata?

A

Metadata is “data that provides information about other data”. In Splunk the metadata attached to the events includes:
Host (typically the hostname, IP address, or fully qualified domain name (FQDN) of the network host from which the event originated. Think “server”)
Source - the event source is the name of the file, stream, or other input from which the event originates. Think “path”
Sourcetype - the format of the data input that identifies the structure of the data, usually named by the admin or user

18
Q

How does Splunk process data/logs?

A

The process consists of two stages:

Parsing stage:

  1. From a continuous log (a stream of data), the data is split into events, and these events (individual occurrences of recorded activity) are stored in the indexes.
  2. A set of metadata is then attached to each event; metadata includes host, source, and sourcetype, and they are used as an identifier, along with the timestamp, of that particular event

Indexing stage:

  1. Places events into storage segments called “buckets” that can then be searched upon. Determines the level of segmentation, which affects indexing and searching speed, search capability, and efficiency of data compression
  2. Writes the raw data and index files to disk, where post-indexing compression occurs
19
Q

What is an event?

A

A single piece of data in Splunk software, similar to a record in a log file or other data input. When data is indexed, it is divided into individual events. Each event is given a timestamp, host, source, and source type. Often, a single event corresponds to a single line in your inputs, but some inputs (for example, XML logs) have multiline events, and some inputs have multiple events on a single line. When you run a successful search, you get back events.

20
Q

What happens in the parsing stage of the data processing process?

A

Parsing stage:

  1. From a continuous log (a stream of data), the data is split into events, and these events (individual occurrences of recorded activity) are stored in the indexes.
  2. A set of metadata is then attached to each event; metadata includes host, source, and sourcetype, and they are used as an identifier, along with the timestamp, of that particular event
21
Q

What happens in the indexing stage of the data processing process?

A

Indexing stage:

  1. Places events into storage segments called “buckets” that can then be searched upon. Determines the level of segmentation, which affects indexing and searching speed, search capability, and efficiency of data compression
  2. Writes the raw data and index files to disk, where post-indexing compression occurs
22
Q

What is a host, source, and sourcetype?

A

Host - typically the hostname, IP address, or fully qualified domain name (FQDN) of the network host from which the event originated. Think “server”

Source - the event source is the name of the file, stream, or other input from which the event originates. Think “path”

Sourcetype - the format of the data input that identifies the structure of the data, usually named by the admin or user

23
Q

What is included in metadata that is attached to each event in Parsing Stage?

A

Host - typically the hostname, IP address, or fully qualified domain name (FQDN) of the network host from which the event originated. Think “server”

Source - the event source is the name of the file, stream, or other input from which the event originates. Think “path”

Sourcetype - the format of the data input that identifies the structure of the data, usually named by the admin or user

24
Q

What is a preconfigured Index?

A

Those are the indexes that come OOTB with Splunk.

25
Q

What does OOTB mean?

A

Out Of The Box means that a software feature comes with the “base” of the software and doesn’t need to be installed separately to be accessed and used.

26
Q

Tell us about the 5 preconfigured Splunk indexes.

A

main: This is the default index. All processed data will be stored here unless otherwise specified.

_internal: Stores all Splunk components’ internal logs and processing metrics. It is often used for troubleshooting. Search for logs that say ERROR or WARN.

_audit: Stores events related to the activities conducted in the component - including file system changes and user auditing such as search history and user-activity error logs.

_summary: Summary indexing allows you to run fast searches over a large data set by scheduling Splunk to summarize data, then “import” that data into the summary index from another, larger index over time.

_fishbucket: This index tracks how far into a file indexing has occurred, to prevent duplicate data from being stored. This is especially useful in the event of a server shutdown or connection error.

27
Q

How to activate preconfigured indexes?

A

By configuring indexes.conf properly.

28
Q

What does the main index do?

A

This is the default index. All processed data will be stored here unless otherwise specified

29
Q

What does the _internal index do?

A

Stores all Splunk components’ internal logs and processing metrics. It is often used for troubleshooting. Search for logs that say ERROR or WARN.

It houses information from splunkd.log, a very important log file that tells you about the health of the Splunk component that you are on.
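A typical troubleshooting search over this index might look like the following sketch (log_level and component are standard fields in splunkd logs; adjust the time range as needed):

```
index=_internal sourcetype=splunkd (log_level=ERROR OR log_level=WARN)
| stats count by host, component
| sort - count
```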

30
Q

What does _audit index do?

A

Stores events related to the activities conducted in the component - including file system changes and user auditing such as search history and user-activity error logs.

31
Q

What does _summary index do?

A

Summary indexing allows you to run fast searches over a large data set by scheduling Splunk to summarize data, then “import” that data into the summary index from another, larger index over time.

32
Q

What does _fishbucket index do?

A

This index tracks how far into a file indexing has occurred to prevent duplicate data from being stored. This is especially useful in the event of a server shutdown or connection error.

33
Q

What will happen when you don’t specify an index (or specify a wrong index) for incoming data?

A

It will go to the main index

34
Q

What is splunkd.log?

A

It is one of the most important Splunk internal logs. It is stored in the _internal index.

35
Q

How would you troubleshoot?

A

Check splunkd.log

36
Q

How can you access splunkd.log?

A

Through the back end (the file lives at $SPLUNK_HOME/var/log/splunk/splunkd.log) or through the search head (via the _internal index).

37
Q

Where do you find buckets in the Linux File System?

A

$SPLUNK_HOME/var/lib/splunk/defaultdb

38
Q

What are the average usage and average lifespan of each bucket type?

A

hot: <7 days, ~20x/day
warm: 7 days - 1 month, ~5x/week
cold: 1-3 months, ~5x/month
frozen: 3+ months, ~2x/year

39
Q

What is the location of hot bucket?

A

$SPLUNK_HOME/var/lib/splunk/defaultdb/db/*

40
Q

What is the location of warm bucket?

A

$SPLUNK_HOME/var/lib/splunk/defaultdb/db/*

41
Q

What is the location of cold bucket?

A

$SPLUNK_HOME/var/lib/splunk/defaultdb/colddb/*

42
Q

What is the location of frozen bucket?

A

There is none by default: deletion is the default. You must configure archiving (e.g. coldToFrozenDir) if you want to keep the data instead.

43
Q

What is the location of the thawed bucket?

A

$SPLUNK_HOME/var/lib/splunk/defaultdb/thaweddb/*

44
Q

What does “db” stand for?

A

database

45
Q

How to configure Splunk indexer policies for bucket storage using indexes.conf?

A

By configuring different retention policies in the indexes.conf file

46
Q

What are retention policies?

A

These are attributes that control when data rolls from one bucket state to the next - warm to cold, cold to frozen, etc.

47
Q

What does maxHotBuckets do?

A

maxHotBuckets = the maximum number of hot buckets that can exist for an index

48
Q

What does maxDataSize do?

A

maxDataSize = the maximum size of a hot bucket before it rolls to warm. When setting the maximum size, you should use auto_high_volume for high-volume indexes (such as a network index); otherwise, use auto.

49
Q

What does maxTotalDataSizeMB do?

A

maxTotalDataSizeMB = determines when buckets roll from cold to frozen by size

50
Q

What does frozenTimePeriodInSecs do?

A

frozenTimePeriodInSecs = determines when buckets roll from cold to frozen by time.

51
Q

What is the calculation to specify when cold buckets roll to frozen?

A

86400 seconds x (hot days+cold days)
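For a worked example, assume a hypothetical retention of 5 hot days and 60 cold days:

```
# 86400 seconds x (5 hot days + 60 cold days) = 86400 x 65 = 5,616,000 seconds
frozenTimePeriodInSecs = 5616000
```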

52
Q

What does coldToFrozenDir do?

A

coldToFrozenDir = the file path where the frozen data will be stored. This directory will usually have a volume or disk attached to it.

53
Q

What does coldPath.maxDataSizeMB do?

A

coldPath.maxDataSizeMB = tells Splunk how much data stays in the cold bucket path, by size in MB

54
Q

What does maxWarmDBCount do?

A

maxWarmDBCount = tells Splunk how many warm buckets we want to have (300 is the default)
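Pulling the attributes from the last few cards into one place, a hedged example stanza might look like this (the index name and every value are invented for illustration, not sizing advice):

```
[network_fw]
maxHotBuckets = 3                      # up to 3 hot buckets open at once
maxDataSize = auto_high_volume         # hot rolls to warm; high-volume index
maxWarmDBCount = 300                   # warm count before the oldest rolls to cold
maxTotalDataSizeMB = 500000            # size cap; oldest cold rolls to frozen
frozenTimePeriodInSecs = 5616000       # age cap; cold rolls to frozen (~65 days)
coldToFrozenDir = /archive/network_fw  # archive frozen data instead of deleting it
```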

55
Q

What can help you determine your Splunk environment’s storage needs?

A

https://splunk-sizing.appspot.com/

56
Q

What is a distributed search?

A

A deployment topology that portions search management and search fulfillment/indexing activities across multiple Splunk Enterprise instances. In distributed search, a Splunk Enterprise instance, referred to as the search head, distributes search requests to other instances, called search peers, which perform the actual searching, as well as the data indexing. The search head merges the results back to the user.

Distributed search provides horizontal scaling, so that a single Splunk Enterprise deployment can search and index arbitrarily large amounts of data. Distributed search is also useful for correlating data across data silos.

57
Q

Explain the process of data getting into the indexers in a round-robin fashion

A

Round robin is a mechanism for distributing data from forwarders to the available indexers. When a forwarder starts the forwarding process, it searches for available indexers; it sends data to one available indexer for (by default) 30 seconds, then to another available one for the next 30 seconds, and so on.
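A sketch of this in the forwarder’s outputs.conf (server names are invented; 30 seconds is the default switching interval):

```
[tcpout:indexer_group]
server = idx01.example.com:9997, idx02.example.com:9997, idx03.example.com:9997
autoLB = true           # load-balance across the listed indexers (default)
autoLBFrequency = 30    # switch to another available indexer every 30 seconds
```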

58
Q

What is $SPLUNK_DB?

A

It is the path to the indexes, which can be stored outside of the Splunk installation directory.

59
Q

What are logs? 2 examples

A

Logs - a log is a file that records events that occur in an OS or other software. For example, on Linux we have the /var/log/secure file, which stores logs for the system’s authentication process (successful and failed attempts, etc.), or the /var/log/cron file, which stores cron-job-related events.

60
Q

What are configurations? 2 examples

A

Configurations - in simple terms, to configure something means to set the way that thing behaves. In Splunk environments, the configurations of different components are stored in .conf files, e.g. inputs.conf (data source, index name, type of data) and indexes.conf (index names, how and for how long to store the data).

61
Q

What are metrics? 2 examples

A

Metrics - IT metrics are quantifiable measurements used to assess the quality of a given product/system/software and to demonstrate its value.
Eg.:
- Uptime: the amount of time that systems are available and functional.
- Application crash rate: how many times an application fails divided by how many times it was used.

62
Q

What are alerts? 2 examples

A

Alerts - tasks that continually look for and report on specific events or conditions. When the conditions of the alert are met, an alert notification is triggered.
eg:
- A piece of software exceeds its response-time threshold, generating an alert
- There have been too many unsuccessful login attempts to a given server/system/software, which could generate an alert

63
Q

What is an event?

A

Those are individual occurrences of recorded activity/split data.

A single piece of data in Splunk software, similar to a record in a log file or other data input.

64
Q

How is Splunk license measured by?

A

The Splunk licensing structure is based on the indexing volume that Splunk processes per day. The “upper limit” of the daily indexing volume is called the “license quota”, and exceeding it can result in serious penalties. Companies have to increase their license quota on an as-needed basis as they ingest more data.

65
Q

Explain the process of how would you bring data in Splunk

A

By gathering the data from the source with universal forwarders and sending it (also with the UF) to indexers, which parse and retain that data; this in turn allows us to access that data - or, in more practical terms, gain the insights coming from it - through a search head. Of course there is much, much more to the whole process (configuring components, installing Splunk, calculating which type of license we need, etc.), but those are the basics of the process.

66
Q

Explain in your own words the two stages of processing data in splunk and what each stage does to the data

A

In the first stage, the raw stream of data is split into smaller pieces called events, which are stored in indexes. Then metadata is attached to each one of them (metadata = data about data; in this case it includes host, i.e. which server the logs are coming from; source, i.e. the path to the data file; and sourcetype, which states the type of data). In the second stage, the events with metadata are placed into buckets (segments of data). That data can be searched upon (excluding data stored in “frozen” buckets). Then the INDEXED data and RAW data are written (stored) to disk and compressed.

67
Q

What are some attributes of a monitoring stanza?

A

  • absolute path to the source of data
  • whether we want monitoring to be enabled at all
  • type of data (sourcetype)
  • index name to which the data is going to be attributed
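A sketch of a monitor stanza in inputs.conf covering those attributes (the path, sourcetype, and index name are hypothetical examples):

```
[monitor:///var/log/secure]
disabled = false            # monitoring enabled
sourcetype = linux_secure   # type of data
index = os_linux            # index the data is attributed to
```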
68
Q

What is a Heavy Forwarder?

A

Heavy forwarder both collects data, and parses it before forwarding it. It is slower than universal forwarder, so depending on the needs of the environment we might want to use it, but most likely we would stick to universal forwarders.

69
Q

How does data flow through these 4 components of Splunk: Deployment Server, Universal Forwarder, Indexer, Search Head?

A

Through the Deployment Server we can control the way other Splunk components behave. For example, it could send config files to a Universal Forwarder (technically the UF would pull those files from the DS) and, through them, tell it which source to gather data from and which servers (indexers) to send the data to. So the UF gathers data from the source and sends it to the Indexer, whose job is to parse, refine, and filter that data into indexes, which in turn makes it possible to access that data, and gain insights from it, through the Search Head.

70
Q

What storage space would you allocate for the cold data retention per indexer given the following metrics: 4 TBs of daily ingestion with hot data being retained for 5 days, cold data for 60 days, and frozen data for 5 years?

A

8.4TB

71
Q

What is a virtual machine? 2 examples

A

Virtual Machine - it is a type of software that allows us to run operating systems on top of other operating systems. For example, we can emulate Linux on our Windows system with software called Oracle VM VirtualBox, or emulate Windows on our Linux system with VMware Player.

72
Q

What is a network device? 2 examples

A

Network devices - the hardware that makes up the network infrastructure, e.g. hubs, switches, routers, modems, etc.

73
Q

What is a database? 2 examples

A

Databases - organized collections of structured data, e.g. SQL Server, MySQL, Oracle, Redis

74
Q

What is the filepath location of Splunk’s buckets?

A

$SPLUNK_HOME/var/lib/splunk/defaultdb/ (excluding frozen buckets, as the data gets deleted or archived in the directory we specify)

75
Q

What is the difference between the following attributes of indexes.conf: maxTotalDataSizeMB and frozenTimePeriodInSecs?

A

Those attributes determine when buckets roll from cold to frozen. maxTotalDataSizeMB specifies the data size limit at which that happens, and frozenTimePeriodInSecs specifies after how many seconds that happens.

76
Q

How would we configure hot bucket retention by time in indexes.conf?

A

by setting the maxHotSpanSecs attribute (in seconds!)

77
Q

Our environment is currently ingesting 2 TBs with a license quota of 4 TBs of data. We are looking to increase ingestion by 1 TB. What would be the most important next step?

A

Not quite sure yet.

78
Q

What component does not have a GUI

A

The Universal Forwarder - it does not include Splunk Web (the GUI).

79
Q

If I wanted to ensure that data isn’t being duplicated when my server goes down can you tell me where in Splunk I should look?

A

_fishbucket index

80
Q

If I had to troubleshoot servers not reporting in Splunk, where could I search?

A

_internal index

81
Q

When is metadata applied to data?

A

At the parsing stage in indexers after it has split the data into events.

82
Q

Explain the bucket life-cycle in detail

A

The bucket’s life has 5 phases in its cycle.

  1. Hot: this is the directory where all the data enters the index and is written to disk
  2. Warm: data comes here when the hot bucket is full or Splunk is restarted. This bucket shares the same directory as the hot ones; together with the hot bucket it is the fastest but also the most expensive storage. It stores frequently searched data.
  3. Cold: the data gets here after the time or space limit has been reached. It holds rarely searched data that has aged. It is slower and cheaper storage and is considered the archive tier.
  4. Frozen: archived data; we can compare it to the Windows “recycle bin” - to access the data we would have to recover it through the thawing process. This data is of low priority, but some companies store it for many years, depending on their policies. (This data is not searchable!)
  5. Thawed: buckets restored from an archive. If you archive frozen buckets, you can later return them to the index by thawing them.
83
Q

How would you configure when buckets roll from cold to frozen?

A

With the frozenTimePeriodInSecs attribute in indexes.conf. If we want to look at it from the cold-to-frozen perspective, we use this calculation: 86400 seconds x (hot days + cold days).
That is, the seconds in a day times how many days the data will be stored in hot and cold buckets before it moves to frozen.

84
Q

What are the differences between hot, warm, cold, and frozen buckets?

A

Hot and warm buckets store the most recent and most accessed data. It is fastest to search in hot/warm buckets, but they are also the most expensive storage. The cold bucket stores less often searched data that has moved on from the warm bucket. It is slower, but also cheaper, storage. The frozen bucket stores old data that probably won’t be accessed but is kept for various reasons (e.g. a company’s internal policies). That data is not searchable (you have to thaw it first) but has the cheapest storage.

85
Q

What is a file path to the warm bucket?

A

$SPLUNK_HOME/var/lib/splunk/defaultdb/db/*

86
Q

How to determine when buckets roll from cold to frozen by size?

A

The maxTotalDataSizeMB parameter controls the combined size of all these buckets together. When the size limit has been reached, the oldest cold bucket is rolled to frozen (and is no longer counted), regardless of whether this means deletion or archival.

87
Q

What attribute sets the file path where the frozen data will be stored? (This directory will usually have a volume or disk attached to it.)

A

coldToFrozenDir

88
Q

How do we tell Splunk how many warm buckets we want to have?

A

maxWarmDBCount

89
Q

How do we tell Splunk how much data stays in the hot bucket, by size in MB?

A

homePath.maxDataSizeMB

90
Q

How do we tell Splunk how much data stays in the cold bucket, by size in MB?

A

coldPath.maxDataSizeMB