Search Flashcards

1
Q

Describe the anatomy of a Search:

A
  1. Request is received
  2. Disk space on indexer is checked
  3. Create dispatch directory in $SPLUNK_HOME/var/run/dispatch
  4. Initialize config subsystem (props.conf, transforms.conf) using bundle identified by SH $SPLUNK_HOME/var/run//searchpeers/-
  5. Find buckets that match the time of the search
  6. Consult the bloom filters
  7. Find events matching any keywords within the lexicon (.tsidx files)
  8. Use the results returned to find the event offsets within the raw data from the value array
  9. Uncompress the appropriate slice in the rawdata/journal.gz to get the _raw for the event(s)
  10. Process the raw data with the automatic extractions in this order:
    - sourcetype RENAME, EXTRACT-xxx, REPORT-xxx, KV_MODE, FIELDALIAS-xxx, EVAL-yyy, LOOKUP-xxx
  11. Send the results to the Search Head
2
Q

If an indexer is in manual detention, can it still be searched?

A

Yes.

3
Q

What does the dispatch directory contain?

A

Contains search status, results, log, and extracted fields in CSV format.

Kept for 10 minutes by default.

4
Q

What configs are initialized from the Search Head?

A

The bundle is sent from the SH and includes knowledge objects (KOs): saved searches, lookups, eventtypes.

Process of distributing KOs means that peers by default receive nearly the entire contents of the SH’s Apps.

5
Q

During search, does the indexer check to see if it has enough disk space to run the search?

A

Yes.

The diskUsage and detention settings in server.conf on the indexer are checked.
If the indexer is in manual detention, it can still be searched.

6
Q

Are hot buckets included in every search?

A

No. Hot buckets are not touched if the time range does not require them.
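
The time-range check can be pictured as a simple overlap test. The sketch below is illustrative only and is not Splunk's implementation; the bucket names and the `earliest`/`latest` dictionary keys are hypothetical, and times are plain epoch seconds.

```python
def buckets_in_range(buckets, search_earliest, search_latest):
    """Return names of buckets whose time span overlaps the search window."""
    selected = []
    for bucket in buckets:
        # Two closed intervals overlap when each one starts before the other ends.
        if bucket["earliest"] <= search_latest and bucket["latest"] >= search_earliest:
            selected.append(bucket["name"])
    return selected


# Hypothetical buckets: one hot (recent), one warm, one cold.
example_buckets = [
    {"name": "hot_v1_3", "earliest": 3000, "latest": 4000},
    {"name": "db_warm_2", "earliest": 2000, "latest": 2999},
    {"name": "db_cold_1", "earliest": 1000, "latest": 1999},
]
```

A search over an old window, for example `buckets_in_range(example_buckets, 1500, 2500)`, selects only the warm and cold buckets and leaves the hot bucket alone.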

7
Q

What are bloom filters?

A
  • Hash-based probabilistic data structure that can rule out buckets that definitely do not contain a search term, so the indexer only needs to search buckets that are not ruled out by the Bloom filter.
  • The execution cost of retrieving events from disk grows with the size and number of tsidx files.
  • Bloom filters decrease the number of tsidx files that the indexer needs to search, decreasing the time it takes to search each bucket.
  • If a (warm or cold) filter-less bucket is older than the configured maxBloomBackfillBucketAge in indexes.conf, Splunk will not create a bloom filter for that bucket.
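
To make the bucket-elimination idea concrete, here is a minimal toy Bloom filter. It is a sketch under stated assumptions, not Splunk's on-disk format: the filter size, the number of hashes, the use of SHA-256, and the bucket names are all chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: may give false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for readability

    def _positions(self, term):
        # Derive num_hashes bit positions by salting the term with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{term}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, term):
        for pos in self._positions(term):
            self.bits[pos] = 1

    def might_contain(self, term):
        # If any position is unset, the term was definitely never added.
        return all(self.bits[pos] for pos in self._positions(term))


def buckets_to_search(bucket_terms, keyword):
    """Return names of buckets whose filter cannot rule out the keyword."""
    candidates = []
    for name, terms in bucket_terms.items():
        bloom = BloomFilter()
        for term in terms:
            bloom.add(term)
        if bloom.might_contain(keyword):
            candidates.append(name)
    return candidates
```

A bucket that contains the keyword is always kept. A bucket that does not contain it is usually skipped, although a false positive can occasionally force an unnecessary tsidx lookup.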
8
Q

What is the Splunk lexicon?

A

The list of terms stored in a bucket's tsidx files. It is used to find events that match the keywords in the search.

Each keyword is tagged with the location where it occurs in the raw data file.
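
The lexicon behaves much like an inverted index. The sketch below illustrates the idea only; it assumes whitespace tokenization and list positions as "offsets", whereas real tsidx files use Splunk's own segmentation rules and binary layout.

```python
from collections import defaultdict


def build_lexicon(events):
    """Map each lowercase term to the sorted offsets of events containing it."""
    lexicon = defaultdict(set)
    for offset, raw in enumerate(events):
        for term in raw.lower().split():
            lexicon[term].add(offset)
    return {term: sorted(offsets) for term, offsets in lexicon.items()}


def find_events(lexicon, keywords):
    """Return offsets of events that contain every keyword (implicit AND)."""
    postings = [set(lexicon.get(keyword.lower(), ())) for keyword in keywords]
    if not postings:
        return []
    return sorted(set.intersection(*postings))
```

With the offsets in hand, only the matching slices of the raw data need to be uncompressed.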

9
Q

What is the order of extractions processed on the raw data (props.conf)?

A
  1. Inline field extraction (EXTRACT-)
  2. Field extraction using a field transform (REPORT-)
  3. Automatic key-value field extraction (KV_MODE)
  4. Field aliases (FIELDALIAS-)
  5. Calculated fields (EVAL-)
  6. Lookups (LOOKUP-)
  7. Event types (eventtypes.conf)
  8. Tags (tags.conf)
10
Q

Describe Job Inspection

A
  • Allows for the post-mortem inspection of search metrics:
  • Time spent on commands
  • Time spent searching
  • Time spent fetching
  • Workload undertaken by search peers
  • Also available via REST /services/search/jobs
  • The Execution costs section contains information about the search processing components that were used to process your search.
  • With this information you can troubleshoot the efficiency of your search by narrowing down which processing components are impacting the search performance.
  • The fields shown in the Search job properties section provide information about the search job like the total amount of disk space used (in bytes), and the number of possible events that were dropped (for real-time searches).
11
Q

What are the different types of search commands?

A
  • streaming
  • non-streaming
  • transforming
  • generating
12
Q

Describe streaming commands:

A
  • Operate on each event individually.
  • Distributable streaming commands run on the indexers.
    • Examples: eval, fields, rename, regex
    • Improve processing time, but every preceding command must also be able to run on the indexers; otherwise the search runs on the SH.
    • Order of events does not matter.
  • Centralized (stateful) streaming commands run on the search head.
    • Examples: head, streamstats
    • Order of events matters.
    • Only work on the search head.
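
The difference between the two kinds of streaming can be sketched with Python generators. This is an analogy, not how Splunk executes SPL; the function names and the `status` field are hypothetical. The first function resembles an `eval`, the second a `streamstats count`.

```python
def eval_status_class(events):
    """Distributable streaming: each event is transformed on its own."""
    for event in events:
        yield {**event, "status_class": event["status"] // 100}


def running_count(events):
    """Stateful streaming: output depends on the events seen so far."""
    count = 0
    for event in events:
        count += 1
        yield {**event, "count": count}
```

Because `eval_status_class` needs nothing beyond the current event, it could run independently on each indexer. `running_count` carries state across events, so it needs the whole ordered stream in one place.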
13
Q

Describe non-streaming commands:

A
  • Force the entire set of events to the search head.
  • Examples: sort, dedup, top

14
Q

Describe transforming commands:

A
  • Non-streaming commands that operate on the entire dataset.
  • Generate a reporting data structure.
    Examples: chart, timechart, stats.
  • Can be either streaming or reporting:
    – Streaming reporting (stats, chart) generates output in batches.
    – Reporting (cluster, geostats) takes all events at once.
15
Q

Describe generating commands:

A
  • Invoked at the beginning of a search with a leading |
  • Do not expect or require input.
    Examples: dbinspect, datamodel, inputcsv
  • Most generating commands are centralized.
  • Results are usually returned in a list or table.
16
Q

What are subsearches?

A
  • Best for small result sets
  • The join and set commands require a subsearch
  • Used to produce terms for the outer search
  • Always run first
17
Q

What is a caveat of using calculated fields that are named the same as a lookup field?

A

Fields from lookups are unavailable when calculated fields reference them in an eval expression.

18
Q

When does a subsearch run?

A

Subsearches always run before the main search.

19
Q

When should you not run a subsearch?

A

For subsearches that return many results, it is generally more efficient to use stats and/or eval.

  • Generally, subsearches take longer than other types of searches
  • Can be confirmed using the job inspector
  • GUI provides no feedback while subsearch runs; can result in sluggish user experience
20
Q

How can you tell it is a subsearch?

A

It is enclosed in square brackets [ ].

21
Q

What is the best search advice?

A

Filter early
Specify an index
Utilize indexed extractions where available
Use the TERM directive if applicable
Place streaming/remote commands before non-streaming commands
Avoid using table, except at the very end
- table will cause data to be pushed to the search head
Remove unnecessary data using | fields

22
Q

What two things are a search broken into in the Search Job Properties of the Job Inspector?

A

remoteSearch (the part done on the indexers) and reportSearch (the part of the search string that runs on the Search Head)