Search Flashcards

1
Q

What is the system recommendation for the reference search head?

A

16 cores, 12 GB RAM, RAID1

2
Q

What are the system recommendations for a high-end search head?

A

There is no high-end reference available. As a general rule, one search consumes up to one core. If you have a large number of users who search, the number of CPU cores should be higher.
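
The rule of thumb above can be turned into a rough calculation. The sketch below is only an illustration: the function name, the headroom value for the operating system and splunkd, and the example numbers are assumptions, not Splunk sizing guidance.

```python
def estimate_search_head_cores(concurrent_searches: int, headroom_cores: int = 2) -> int:
    """Rule-of-thumb estimate: each running search can occupy up to one core.

    headroom_cores is a hypothetical allowance for the OS and splunkd itself.
    """
    if concurrent_searches < 0:
        raise ValueError("concurrent_searches must be non-negative")
    return concurrent_searches + headroom_cores


# 14 searches running at the same time, plus 2 cores of assumed headroom.
print(estimate_search_head_cores(14))  # -> 16
```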

3
Q

When a user runs a search, where do the search artifacts live and for how long?

A

The search artifacts live under /opt/splunk/var/run/splunk/dispatch

The TTL for ad hoc and manual searches, including remote searches, is 10 minutes

Artifacts of scheduled searches live for twice the schedule period
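
A minimal sketch of the two lifetime rules on this card, written in Python. The function and constant names are made up for illustration, and the sketch only encodes the defaults stated here, not any overrides a deployment may configure.

```python
from typing import Optional

ADHOC_TTL_SECONDS = 10 * 60  # 10 minutes for ad hoc and manual searches


def artifact_ttl_seconds(schedule_period_seconds: Optional[int] = None) -> int:
    """Return how long a search artifact is kept, per the rules on this card.

    Pass no argument for an ad hoc search, or the schedule period (in seconds)
    for a scheduled search.
    """
    if schedule_period_seconds is None:
        return ADHOC_TTL_SECONDS
    return 2 * schedule_period_seconds


print(artifact_ttl_seconds())      # ad hoc search -> 600
print(artifact_ttl_seconds(3600))  # hourly scheduled search -> 7200
```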

4
Q

What is SRS in terms of search artifacts?

A

Search artifacts are stored in the dispatch folder. The dispatch folder contains several directories, each of which relates to the SID of a search.

Each SID directory has a results.srs.gz file, which contains the Splunk search results (SRS) in a binary serialization format. By default it is not human readable.

To convert the SRS file into a readable CSV format, use the splunkd toCsv [output path] tool
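
To see which searches currently have a result file, the dispatch folder can be walked with a few lines of Python. This is a sketch under the layout described above (one directory per SID, each holding a results.srs.gz); the function name is hypothetical, and the script only lists files, it does not decode the SRS format.

```python
from pathlib import Path
from typing import Dict


def find_srs_files(dispatch_dir: str) -> Dict[str, Path]:
    """Map each SID directory name to its results.srs.gz file, if it has one."""
    found: Dict[str, Path] = {}
    for sid_dir in sorted(Path(dispatch_dir).iterdir()):
        srs_file = sid_dir / "results.srs.gz"
        # Skip stray files and SID directories without a result file.
        if sid_dir.is_dir() and srs_file.is_file():
            found[sid_dir.name] = srs_file
    return found
```

On a search head it could be called as find_srs_files("/opt/splunk/var/run/splunk/dispatch"), assuming the default installation path from the earlier card and read access to that folder.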

5
Q

What does the job inspector do?

A

The job inspector is a report that is generated for each search.

It contains valuable information about the cost of the search.

It also shows how Splunk breaks the search down into its central and remote parts.

6
Q

List some ways to minimize search costs.

A
  • Define index and sourcetype
  • Set the time range
  • Use a search mode that fits your needs
  • Try to avoid NOT; use AND instead
  • Try to avoid the transaction command; use stats first() instead
  • Try to avoid join; use stats first() instead
  • Use streaming commands before non-streaming commands
  • Be specific instead of using wildcards
  • Limit the output of your search
  • Try to use the TERM() directive, e.g. for IP addresses
  • Use the fields command to work only with the fields you need
  • Use filtering commands before calculating commands
  • Use Data Model Acceleration or Report Acceleration
  • Use the job inspector to find the slowest part of your search and tune it
7
Q

How does a search work internally in Splunk?

A

Example of how a search works:

1) A user searches for index=main name=peter within the last 24 hours
2) The search is streamed to the indexer tier (after checking that enough disk space is available and whether detention mode is active)
3) The indexer checks whether the queried index exists
4) Splunk hashes the search terms (name=peter) and compares them to the hashes in the Bloom filters, which reside in the buckets of the related index
5) A Bloom filter tells Splunk when the search term does NOT exist in a bucket, so that bucket can be skipped
6) If the hashes match, Splunk checks the TSIDX files of the matching buckets to find out exactly where the raw data is located
7) The TSIDX files provide a seek address; Splunk uses it to find the data in the journal.gz files and decompresses it
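
Steps 4 to 6 can be illustrated with a toy Bloom filter in Python. This is a sketch of the general technique only: the class, the bit-array size, the number of hashes, and the use of SHA-256 are assumptions for illustration and do not describe Splunk's actual implementation or file format.

```python
import hashlib
from typing import Dict, Iterator, List


class BloomFilter:
    """Toy Bloom filter: can say 'definitely absent' or 'possibly present'."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = [False] * size_bits

    def _positions(self, term: str) -> Iterator[int]:
        # Derive several bit positions per term by salting the hash input.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{term}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, term: str) -> None:
        for pos in self._positions(term):
            self.bits[pos] = True

    def might_contain(self, term: str) -> bool:
        # False means the term was never added; True may be a false positive.
        return all(self.bits[pos] for pos in self._positions(term))


def candidate_buckets(filters: Dict[str, BloomFilter], term: str) -> List[str]:
    """Return the buckets whose filter cannot rule the term out."""
    return [name for name, bf in filters.items() if bf.might_contain(term)]
```

Only the buckets returned by candidate_buckets would need their TSIDX files opened; every other bucket is skipped without reading it. A returned bucket may still turn out not to contain the term, because Bloom filters allow false positives but never false negatives.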

8
Q

What is the size of an uncompressed slice in the journal.gz?

A

A slice consists of ~128 KB of uncompressed data

9
Q

Why should you avoid wildcards in your search?

A

Because wildcards are not compatible with Bloom filters, so searches take longer

10
Q

Why should you avoid the NOT operator in your search?

A

Bloom filters are designed to quickly locate data. Searching for terms that do not exist takes longer (use the AND or OR operator instead).

11
Q

What is the order of extractions processed on the raw data (props.conf)?

A
  1. Inline field extraction (EXTRACT-)
  2. Field extraction using a field transform (REPORT-)
  3. Automatic key-value field extraction (KV_MODE)
  4. Field aliases (FIELDALIAS-)
  5. Calculated fields (EVAL-)
  6. Lookups (LOOKUP-)
  7. Event types (eventtypes.conf)
  8. Tags (tags.conf)
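
The props.conf part of this order (items 1 to 6) can be expressed as a small Python helper that sorts setting names into the sequence in which they are applied. This is an illustrative sketch: the helper names and the example settings such as EXTRACT-user are hypothetical.

```python
from typing import List

# Prefixes in the order listed on this card (props.conf settings only).
SEARCH_TIME_ORDER = ["EXTRACT-", "REPORT-", "KV_MODE", "FIELDALIAS-", "EVAL-", "LOOKUP-"]


def processing_rank(setting: str) -> int:
    """Return the position of a props.conf setting in the search-time sequence."""
    for rank, prefix in enumerate(SEARCH_TIME_ORDER):
        if setting.startswith(prefix):
            return rank
    raise ValueError(f"not a search-time setting covered by this card: {setting}")


def sort_by_processing_order(settings: List[str]) -> List[str]:
    """Sort setting names into the order in which they are processed."""
    return sorted(settings, key=processing_rank)


print(sort_by_processing_order(["LOOKUP-geo", "EVAL-size_kb", "EXTRACT-user", "KV_MODE"]))
# -> ['EXTRACT-user', 'KV_MODE', 'EVAL-size_kb', 'LOOKUP-geo']
```

One practical consequence of the order: a calculated field (EVAL-) can use a field created by an inline extraction (EXTRACT-), but not one that is only added later by a lookup (LOOKUP-).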
12
Q

What is the difference between search type and search mode?

A

Search mode determines how Splunk processes the data at search time. There are 3 modes available (fast, smart, verbose).

Search type describes the way the SPL is used. On a high level, there are two types of searches:

  • raw searches (typically searching for e.g. HTTP status codes)
  • transforming searches (e.g. performing statistical calculations)
13
Q

Before a search is sent out to an indexer, which two parameters are checked?

A
  • Available disk space
  • Detention (active|inactive)

14
Q

At a low level, which 4 types of search commands exist?

A
  • Streaming commands
  • Non-streaming commands
  • Transforming commands
  • Generating commands
15
Q

If a search begins with a | (pipe), what kind of search is that?

A

A generating command

16
Q

What is an example of a streaming command?

A

Streaming commands get streamed to the indexers. Each indexer then takes over this part of the search and streams back the results.

A typical example is the eval command or the rex command.

17
Q

What is the difference between a non-streaming command and a centralized command?

A

They are the same. Non-streaming commands do not get streamed to an indexer, hence they are performed centrally.

18
Q

List 4 examples of non-streaming commands

A
  • top
  • dedup
  • stats
  • sort
  • many transforming commands
19
Q

List 4 examples of transforming commands

A
  • timechart
  • chart
  • stats
  • top
  • rare
20
Q

What are the characteristics of a non-streaming command?

A

A non-streaming command requires the events from all of the indexers before the command can operate on the entire set of events
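
The split between streaming and non-streaming work can be modelled in a few lines of Python. This is a conceptual sketch, not how Splunk is implemented: the event dictionaries, the size_kb field, and the function names are invented for illustration. The per-event calculation stands in for a streaming command such as eval, and the final sort stands in for a non-streaming command such as sort.

```python
from typing import Dict, List

Event = Dict[str, object]


def eval_size_kb(event: Event) -> Event:
    """Streaming step: works on one event at a time, so each indexer can run it."""
    return {**event, "size_kb": event["bytes"] / 1024}


def run_search(indexers: List[List[Event]]) -> List[Event]:
    """Apply the streaming step per indexer, then sort centrally."""
    merged: List[Event] = []
    for events in indexers:
        # This loop body is the part that could run remotely on each indexer.
        merged.extend(eval_size_kb(e) for e in events)
    # Sorting needs the complete result set, so it happens after the merge.
    return sorted(merged, key=lambda e: e["size_kb"], reverse=True)


indexer_1 = [{"host": "a", "bytes": 2048}]
indexer_2 = [{"host": "b", "bytes": 4096}, {"host": "c", "bytes": 1024}]
print([e["host"] for e in run_search([indexer_1, indexer_2])])  # -> ['b', 'a', 'c']
```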

21
Q

What is the reason to use a subsearch?

A

A typical scenario for a subsearch is when the target is a moving one (e.g. the most active host today). Once the target has been determined, the inner search hands its results over to the outer search (main search).

22
Q

If the outer-search of a subsearch has a time defined (earliest=-30m), does it apply to the subsearch too?

A

No, the defined time in the outer search does not apply to the subsearch. Only the time which is defined in the global time range picker applies to the subsearch.

If the time range in both searches needs to be different, set the time directly in the subsearch too.

23
Q

What are the limitations of a subsearch?

A

A subsearch can only process up to 10k events (this can be changed in limits.conf). It also runs for at most 60s before it stops. The user experience can be sluggish and the results slow to arrive.

A subsearch is only recommended for small sets of data.
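
The effect of the two limits can be modelled with a short Python sketch. It assumes the default values stated on this card (10k results, 60s) and simply stops collecting once either limit is reached; the function name and the clock parameter are hypothetical, and how Splunk itself finalizes a subsearch is not covered by this card.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def run_subsearch(
    rows: Iterable[T],
    max_results: int = 10_000,
    max_runtime_s: float = 60.0,
    clock: Callable[[], float] = time.monotonic,
) -> List[T]:
    """Collect rows until the result limit or the runtime limit is hit."""
    start = clock()
    results: List[T] = []
    for row in rows:
        if len(results) >= max_results or clock() - start >= max_runtime_s:
            break  # everything after this point is silently dropped
        results.append(row)
    return results


# 25,000 candidate rows, but only the first 10,000 reach the outer search.
print(len(run_subsearch(range(25_000))))  # -> 10000
```

The point of the model is that the outer search receives a truncated result set without necessarily noticing, which is one reason to keep subsearches small.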

24
Q

A subsearch has a runtime limit of 60s by default. Where can it be changed?

A

limits.conf

25
Q

What is an alternative to a subsearch?

A

The stats command (does not work in all cases)

26
Q

What is the difference between a remoteSearch and reportSearch in regards to the Job Inspector?

A

A remoteSearch is performed on the indexers (e.g. streaming commands), while a reportSearch runs locally on the search head (e.g. non-streaming commands)

27
Q

What is the meaning of the field ‘dispatch.check_disk_usage’ in the Job Inspector?

A

The time spent checking the disk usage of this job

28
Q

What is the meaning of the field ‘eai:acl’ in the Job Inspector?

A

Describes the app and user-level permissions. For example, is the app shared globally, and what users can run or view the search?

29
Q

If you want to measure how a search performed, which field do you check in the Job Inspector?

A

scanCount/second

The rate should hover between 10k and 20k events per second for performance to be considered good
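
The rate is simply the number of scanned events divided by the search runtime. A minimal Python sketch, assuming you have read the scan count and the run duration (in seconds) from the Job Inspector; the function names and the example figures are made up for illustration.

```python
def scan_rate(scan_count: int, run_duration_s: float) -> float:
    """Events scanned per second."""
    if run_duration_s <= 0:
        raise ValueError("run_duration_s must be positive")
    return scan_count / run_duration_s


def in_expected_range(rate: float, low: float = 10_000, high: float = 20_000) -> bool:
    """True if the rate falls inside the 10k to 20k band quoted on this card."""
    return low <= rate <= high


rate = scan_rate(150_000, 10.0)
print(rate, in_expected_range(rate))  # -> 15000.0 True
```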

30
Q

What are the characteristics of a search in ‘fast’ mode?

A

Splunk only returns information on default fields and fields that are required to fulfill your search. If you are searching on specific fields, those fields are extracted.

Under the Fast mode you will see only event lists and event timelines for searches that do not include transforming commands

31
Q

What is the meaning of the field ‘command.search.kv’ in the Job Inspector?

A

Tells how long it took to apply field extractions to the events.