Indexer Clustering (Architecture and theory) Flashcards

1
Q

How to change the Linux hostname?

A

hostnamectl set-hostname <new-hostname>

2
Q

What is the command to install the Splunk package (RPM)?

A

rpm -ivh <splunk-package>.rpm

3
Q

What is the boot-start command?

A

./splunk enable boot-start -systemd-managed 0 -user splunk --accept-license --auto-ports

4
Q

How to make a component a license slave of the license master?

A

./splunk edit licenser-localslave -master_uri https://<license-master-ip>:8089

or in $SPLUNK_HOME/etc/system/local/server.conf:

[license]
master_uri = https://<license-master-ip>:8089

5
Q

How to create an indexer cluster?

A
  • Configure the CM.
  • Configure the indexers and peer them with the CM.
  • Configure and connect the SH to the cluster master (see the combined sketch below).
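Putting the steps together, a minimal end-to-end sketch using the commands from the cards that follow; <cm-ip>, <secret>, and <label> are placeholders, and the ports are examples only:

On the CM:
./splunk edit cluster-config -mode master -replication_factor 2 -search_factor 2 -secret <secret> -cluster_label <label>
./splunk restart

On each indexer:
./splunk edit cluster-config -mode slave -master_uri https://<cm-ip>:8089 -replication_port 9100 -secret <secret>
./splunk restart

On the search head:
./splunk edit cluster-config -mode searchhead -master_uri https://<cm-ip>:8089 -secret <secret>
./splunk restart
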
6
Q

How to configure an instance to become the CM with a single command?

A

./splunk edit cluster-config -mode master -replication_factor 2 -search_factor 2 -secret <secret> -cluster_label <cluster_label>

(restart Splunk afterward for the change to take effect)

7
Q

How to connect indexers to a CM?

A

./splunk edit cluster-config -mode slave -master_uri https://<cm-ip>:8089 -replication_port 9100 -secret <secret>

(peers also need a replication_port; 9100 is just an example)

8
Q

How to validate that an indexer is connected to the CM?

A

tail -f $SPLUNK_HOME/var/log/splunk/splunkd.log

or check the CM's GUI (Settings > Indexer clustering)

9
Q

How to validate that all peers are checking in to the CM?

A

./splunk list cluster-peers (run on the CM)

10
Q

How to connect SH to CM?

A

On the search head: ./splunk edit cluster-config -mode searchhead -master_uri https://<cm-ip>:8089 -replication_port 8080 -secret <secret>

11
Q

How to configure a component to become a deployment client?

A

./splunk set deploy-poll <deployment-server-ip>:8089

or edit

$SPLUNK_HOME/etc/system/local/deploymentclient.conf
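For the conf-file route, a minimal deploymentclient.conf sketch (the deployment-server address is a placeholder):

[target-broker:deploymentServer]
targetUri = <deployment-server-ip>:8089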

12
Q

How to configure a given component to send internal logs to the indexers?

A

with outputs.conf:

[tcpout]
disabled = false
defaultGroup = studentXX-indexers
forwardedindex.filter.disable = true
indexAndForward = false

[tcpout:studentXX-indexers]
disabled = false
server = studentXX-idx01-ip:9997, studentXX-idx02-ip:9997, studentXX-idx03-ip:9997

13
Q

Where can we change a Splunk component's name?

A

server.conf:

[general]
serverName = student01-CM01

14
Q

How to validate and push bundles to peers?

A

Validate CM bundle:

./splunk validate cluster-bundle --check-restart

Deploy apps from Master Node to Peer-nodes by applying the Cluster Bundle:

./splunk apply cluster-bundle

Check the bundle status:

./splunk show cluster-bundle-status

./splunk list cluster-config

15
Q

What is Event Processing?

A

Event processing covers everything that happens to your data between the time you define an input and the time the data appears in the Splunk index.

During indexing, Splunk Enterprise performs event processing. It processes incoming data to enable fast search and analysis, storing the results in the index as events. While indexing, Splunk Enterprise enhances the data in various ways, including by:

  • Separating the datastream into individual, searchable events.
  • Creating or identifying timestamps.
  • Extracting fields such as host, source, and sourcetype.
  • Performing user-defined actions on the incoming data, such as identifying custom fields, masking sensitive data, writing new or modified keys, applying breaking rules for multi-line events, filtering unwanted events, and routing events to specified indexes or servers (see the masking sketch below).
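As one concrete example of a user-defined action, a SEDCMD in props.conf can mask sensitive data at index time; the sourcetype name and pattern here are hypothetical:

[my_sourcetype]
SEDCMD-mask_ssn = s/\d{3}-\d{2}-\d{4}/XXX-XX-XXXX/g
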
16
Q

What types of indexes do we have?

A

Splunk Enterprise supports two types of indexes:

  • Events indexes. Events indexes impose minimal structure and can accommodate any type of data, including metrics data. Events indexes are the default index type.
  • Metrics indexes. Metrics indexes use a highly structured format to handle the higher volume and lower latency demands associated with metrics data. Putting metrics data into metrics indexes results in faster performance and less use of index storage, compared to putting the same data into events indexes. For information on the metrics format, see the Metrics manual.

There are minimal differences in how indexers process and manage the two index types. Despite its name, event processing occurs in the same sequence for both events and metrics indexes. Metrics data is really just a highly structured kind of event data.
17
Q

What is an events index?

A

Events indexes impose minimal structure and can accommodate any type of data, including metrics data. Events indexes are the default index type.

18
Q

What is a metrics index?

A

Metrics indexes use a highly structured format to handle the higher volume and lower latency demands associated with metrics data. Putting metrics data into metrics indexes results in faster performance and less use of index storage, compared to putting the same data into events indexes. For information on the metrics format, see the Metrics manual.

19
Q

What is the data pipeline?

A

The route that data takes through Splunk Enterprise, from its origin in sources such as log files and network feeds, to its transformation into searchable events that encapsulate valuable knowledge. The data pipeline includes these segments: input, parsing, indexing, and search.

20
Q

What are the data pipeline's segments?

A

Input, parsing, indexing, and search.

21
Q

What is an indexed field?

A

A field that is incorporated into the index at index time. Indexed fields include the default fields, such as host, source, and sourcetype, as well as custom index-time field extractions. In rare cases, there is some value to adding fields to the index. However, this can negatively affect indexing performance and search times across your entire deployment. There is no way to modify or remove field extractions afterwards. You can add non-indexed fields, which are extracted at search time.
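A sketch of what an index-time custom field extraction looks like, for illustration; the sourcetype, field name, and regex are hypothetical:

transforms.conf:
[extract_username]
REGEX = user=(\w+)
FORMAT = username::$1
WRITE_META = true

props.conf:
[my_sourcetype]
TRANSFORMS-username = extract_username

fields.conf:
[username]
INDEXED = true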

22
Q

What is index time?

A

The time span from when Splunk Enterprise receives new data to when the data is written to a Splunk Enterprise index. During that time, the data is parsed into segments and events; default fields and timestamps are extracted; and transforms are applied.

23
Q

What is search time?

A

Refers to the period of time beginning when a search is launched and ending when it finishes. During search time, certain types of event processing take place, such as search time field extraction, field aliasing, source type renaming, event type matching, and so on.

24
Q

When is the best time to perform most knowledge-building activities?

A

As a general rule, it is better to perform most knowledge-building activities, such as field extraction, at search time. Index-time custom field extraction can degrade performance at both index time and search time. When you add to the number of fields extracted during indexing, the indexing process slows. Later, searches on the index are also slower, because the index has been enlarged by the additional fields, and a search on a larger index takes longer. You can avoid such performance issues by instead relying on search-time field extraction.
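For comparison, the search-time equivalent is a one-line props.conf extraction; again, the sourcetype and pattern are hypothetical:

[my_sourcetype]
EXTRACT-username = user=(?<username>\w+)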

25
Q

What is distributed search?

A

In distributed search, one or more search heads distribute search requests across multiple indexers. The indexers still perform the actual searching of their own indexes, but the search heads manage the overall search process across all the indexers and present the consolidated search results to the user.

In other words, a Splunk Enterprise instance called a search head sends search requests to a group of indexers, or search peers, which perform the actual searches on their indexes. The search head then merges the results and presents them to the user.
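In a non-clustered distributed search setup, a search peer can be added from the search head roughly like this (addresses and credentials are placeholders; flags can vary by version):

./splunk add search-server https://<peer-ip>:8089 -auth admin:<password> -remoteUsername admin -remotePassword <peer-password>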

26
Q

Where are indexes located?

A

$SPLUNK_HOME/var/lib/splunk

27
Q

Why should we have multiple indexes?

A

There are several key reasons for having multiple indexes:

  • To control user access.
  • To accommodate varying retention policies.
  • To speed searches in certain situations.

The main reason you'd set up multiple indexes is to control user access to the data that's in them. When you assign users to roles, you can limit user searches to specific indexes based on the role they're in.

In addition, if you have different retention policies for different sets of data, you might want to send the data to different indexes and then set a different archive or retention policy for each index (see the sketch below).

Another reason to set up multiple indexes has to do with the way search works. If you have both a high-volume/high-noise data source and a low-volume data source feeding into the same index, and you search mostly for events from the low-volume data source, the search speed will be slower than necessary, because the indexer also has to search through all the data from the high-volume source. To mitigate this, you can create dedicated indexes for each data source and send data from each source to its dedicated index. Then, you can specify which index to search on. You'll probably notice an increase in search speed.
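An indexes.conf sketch of the retention use case, with two indexes and different retention periods; the index names and values are illustrative only:

[firewall]
homePath = $SPLUNK_DB/firewall/db
coldPath = $SPLUNK_DB/firewall/colddb
thawedPath = $SPLUNK_DB/firewall/thaweddb
# retain roughly 30 days
frozenTimePeriodInSecs = 2592000

[audit]
homePath = $SPLUNK_DB/audit/db
coldPath = $SPLUNK_DB/audit/colddb
thawedPath = $SPLUNK_DB/audit/thaweddb
# retain roughly 1 year
frozenTimePeriodInSecs = 31536000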

28
Q

How to send all events from a data input to a specific index?

A

The following example inputs.conf stanza sends all data from /var/log to an index named fflanda:

[monitor:///var/log]
disabled = false
index = fflanda

29
Q

What is index parallelization?

A

Index parallelization is a feature that allows an indexer to maintain multiple pipeline sets. A pipeline set handles the processing of data from ingestion of raw data, through event processing, to writing the events to disk. A pipeline set is one instance of the processing pipeline described in How indexing works. It is called a "pipeline set" because it comprises the individual pipelines, such as the parsing pipeline and the indexing pipeline, that together constitute the overall processing pipeline.

By default, an indexer runs just a single pipeline set. However, if the underlying machine is under-utilized, in terms of both available cores and I/O, you can configure the indexer to run two pipeline sets. By running two pipeline sets, you potentially double the indexer's indexing throughput capacity. Note: The actual amount of increased throughput on your indexer depends on the nature of your data inputs and other factors. In addition, if the indexer is having difficulty handling bursts of data, index parallelization can help it to accommodate the bursts, assuming again that the machine has the available capacity.

To summarize, these are some typical use cases for index parallelization, dependent on available machine resources:

  • Scale indexer throughput.
  • Handle bursts of data.

For a better understanding of the use cases and to determine whether your deployment can benefit from multiple pipeline sets, see Parallelization settings in the Capacity Planning Manual.

Note: You cannot use index parallelization with multiple pipeline sets for metrics data that is received from a UDP data input. If your system uses multiple pipeline sets, use a TCP or HTTP Event Collector data input for metrics data. For more about metrics, see the Metrics manual.
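Pipeline sets are configured in server.conf; a minimal sketch:

[general]
parallelIngestionPipelines = 2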

30
Q

How can we optimize a forwarder? (one example)

A

You can configure forwarders to run multiple pipeline sets. Multiple pipeline sets increase forwarder throughput and allow the forwarder to process multiple inputs simultaneously. This can be of particular value, for example, when a forwarder needs to process a large file that would occupy the pipeline for a long period of time. With just a single pipeline, no other files can be processed until the forwarder finishes the large file. With two pipeline sets, the second pipeline can ingest and forward smaller files quickly, while the first pipeline continues to process the large file. Assuming that the forwarder has sufficient resources and depending on the nature of the incoming data, a forwarder with two pipelines can potentially forward twice as much data as a forwarder with one pipeline.
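The forwarder uses the same server.conf setting as the indexer; a sketch, assuming the forwarder host has spare cores:

[general]
parallelIngestionPipelines = 2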

31
Q

What is a license warning and what is the process after getting one?

A

License warnings occur when you exceed the maximum daily indexing volume allowed for your license. Here are the conditions:

  • Your daily indexing volume is measured from midnight to midnight using the clock on the license master.
  • If you exceed your licensed daily volume on any one calendar day, you generate a license warning.
  • If you generate a license warning, you have until midnight on the license master to resolve the warning before it counts against the total number of warnings allowed by your license.

For guidance on what to do when a warning appears, see Correct license warnings.

32
Q

What does homePath in indexes.conf do?

A

The path that contains the hot and warm buckets. (Required.) This location must be writable.

33
Q

What does coldPath in indexes.conf do?

A

The path that contains the cold buckets. (Required.) This location must be writable.

34
Q

What does thawedPath in indexes.conf do?

A

The path that contains any thawed buckets. (Required.) This location must be writable.

35
Q

What does repFactor in indexes.conf do?

A

Determines whether the index gets replicated to other cluster peers. (Required for indexes on cluster peer nodes.)

36
Q

What does maxHotBuckets in indexes.conf do?

A

The maximum number of concurrent hot buckets. This value should be at least 2, to deal with any archival data. The main default index, for example, has this value set to 10.

37
Q

What does maxDataSize in indexes.conf do?

A

Determines rolling behavior, hot to warm. The maximum size for a hot bucket. When a hot bucket reaches this size, it rolls to warm. This attribute also determines the approximate size for all buckets.

38
Q

What does maxWarmDBCount in indexes.conf do?

A

Determines rolling behavior, warm to cold. The maximum number of warm buckets. When the maximum is reached, warm buckets begin rolling to cold.

39
Q

What does maxTotalDataSizeMB in indexes.conf do?

A

Determines rolling behavior, cold to frozen. The maximum size of an index. When this limit is reached, cold buckets begin rolling to frozen.

40
Q

What does frozenTimePeriodInSecs in indexes.conf do?

A

Determines rolling behavior, cold to frozen. Maximum age for a bucket, after which it rolls to frozen.

41
Q

What does coldToFrozenDir in indexes.conf do?

A

Location for archived data. Determines behavior when a bucket rolls from cold to frozen. If set, the indexer will archive frozen buckets into this directory just before deleting them from the index.

42
Q

What does coldToFrozenScript in indexes.conf do?

A

Script to run just before a cold bucket rolls to frozen. If you set both this attribute and coldToFrozenDir, the indexer will use coldToFrozenDir and ignore this attribute.

43
Q

What do homePath.maxDataSizeMB and coldPath.maxDataSizeMB in indexes.conf do?

A

Maximum size for homePath (hot/warm bucket storage) or coldPath (cold bucket storage). If either attribute is missing or set to 0, its path is not individually constrained in size.

44
Q

What does maxVolumeDataSizeMB in indexes.conf do?

A

Maximum size for a volume. If the attribute is missing, the individual volume is not constrained in size.
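Pulling the attributes from the preceding cards together, a single illustrative indexes.conf stanza; the index name and values are examples, not recommendations:

[myindex]
homePath = $SPLUNK_DB/myindex/db
coldPath = $SPLUNK_DB/myindex/colddb
thawedPath = $SPLUNK_DB/myindex/thaweddb
# replicate this index across cluster peers
repFactor = auto
maxHotBuckets = 3
# hot buckets roll to warm at this size ("auto" is about 750 MB)
maxDataSize = auto
# warm buckets roll to cold beyond this count
maxWarmDBCount = 300
# cold buckets roll to frozen when the index exceeds this size...
maxTotalDataSizeMB = 500000
# ...or when a bucket exceeds this age (about 6 years, the default)
frozenTimePeriodInSecs = 188697600
# archive frozen buckets here instead of deleting them
coldToFrozenDir = /archive/myindex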

45
Q

This describes an attribute in indexes.conf. Which attribute is it?

The path that contains the hot and warm buckets. (Required.) This location must be writable.

A

homePath

46
Q

This describes an attribute in indexes.conf. Which attribute is it?

The path that contains the cold buckets. (Required.) This location must be writable.

A

coldPath

47
Q

This describes an attribute in indexes.conf. Which attribute is it?

The path that contains any thawed buckets. (Required.) This location must be writable.

A

thawedPath

48
Q

This describes an attribute in indexes.conf. Which attribute is it?

Determines whether the index gets replicated to other cluster peers. (Required for indexes on cluster peer nodes.)

A

repFactor

49
Q

This describes an attribute in indexes.conf. Which attribute is it?

The maximum number of concurrent hot buckets. This value should be at least 2, to deal with any archival data. The main default index, for example, has this value set to 10.

A

maxHotBuckets

50
Q

This describes an attribute in indexes.conf. Which attribute is it?

Determines rolling behavior, hot to warm. The maximum size for a hot bucket. When a hot bucket reaches this size, it rolls to warm. This attribute also determines the approximate size for all buckets.

A

maxDataSize

51
Q

This describes an attribute in indexes.conf. Which attribute is it?

Determines rolling behavior, warm to cold. The maximum number of warm buckets. When the maximum is reached, warm buckets begin rolling to cold.

A

maxWarmDBCount

52
Q

This describes an attribute in indexes.conf. Which attribute is it?

Determines rolling behavior, cold to frozen. The maximum size of an index. When this limit is reached, cold buckets begin rolling to frozen.

A

maxTotalDataSizeMB

53
Q

This describes an attribute in indexes.conf. Which attribute is it?

Determines rolling behavior, cold to frozen. Maximum age for a bucket, after which it rolls to frozen.

A

frozenTimePeriodInSecs

54
Q

This describes an attribute in indexes.conf. Which attribute is it?

Location for archived data. Determines behavior when a bucket rolls from cold to frozen. If set, the indexer will archive frozen buckets into this directory just before deleting them from the index.

A

coldToFrozenDir

55
Q

This describes an attribute in indexes.conf. Which attribute is it?

Script to run just before a cold bucket rolls to frozen. If you set both this attribute and coldToFrozenDir, the indexer will use coldToFrozenDir and ignore this attribute.

A

coldToFrozenScript

56
Q

This describes an attribute in indexes.conf. Which attribute is it?

Maximum size for homePath (hot/warm bucket storage) or coldPath (cold bucket storage). If either attribute is missing or set to 0, its path is not individually constrained in size.

A

homePath.maxDataSizeMB and coldPath.maxDataSizeMB

57
Q

This describes an attribute in indexes.conf. Which attribute is it?

Maximum size for a volume. If the attribute is missing, the individual volume is not constrained in size.

A

maxVolumeDataSizeMB

58
Q

How does Splunk name its buckets?

A

Bucket names depend on:

a) The state of the bucket: hot, or warm/cold/thawed (warm, cold, and thawed buckets share the same naming scheme).
b) The type of bucket directory: non-clustered, clustered originating, or clustered replicated.
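From the bucket-naming conventions in the Managing Indexers manual, the patterns look roughly like this (times are epoch timestamps; the guid is the originating peer's GUID):

hot_v1_<localid>                                 (hot bucket)
db_<newest_time>_<oldest_time>_<localid>         (non-clustered warm/cold)
db_<newest_time>_<oldest_time>_<localid>_<guid>  (clustered, originating copy)
rb_<newest_time>_<oldest_time>_<localid>_<guid>  (clustered, replicated copy)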

59
Q

What is a volume in Splunk?

A

It represents a directory on the file system where indexed data resides.

Volumes can store data from multiple indexes. You would typically use separate volumes for hot/warm and cold buckets; for instance, you can set up one volume to contain the hot/warm buckets for all your indexes, and another volume to contain the cold buckets.

60
Q

How to configure a volume in Splunk?

A

in indexes.conf:

[volume:<volume_name>]
path = <path_on_file_system>
maxVolumeDataSizeMB = <max size> (optional)

61
Q

How to use a volume in Splunk?

A

Once you have configured volumes, you can use them to define an index's homePath and coldPath. For example:

In indexes.conf:

[idx1]
homePath = volume:hot1/idx1
coldPath = volume:cold1/idx1

[idx2]
homePath = volume:hot1/idx2
coldPath = volume:cold1/idx2