Data Collection Flashcards

1
Q

Describe different ways data can be ingested by an indexer

A
  • Universal Forwarder, Heavy Forwarder
  • Monitor files
  • Scripted inputs
  • Network inputs (TCP, UDP)
  • HTTP Event Collector (HEC)
  • Windows inputs (Windows Event Log, admon, perfmon, regmon)
  • FIFO queues (First In, First Out)
2
Q

What firewall rules need to be defined for Splunk's application server (default port 8065)?

A

None. It is a loopback-only port used for internal communication.

3
Q

List the types of forwarders

A
  • Universal Forwarder (smallest footprint)
  • Heavy Forwarder (largest footprint; a full Splunk Enterprise instance)
  • Light Forwarder [deprecated since v6] (smaller footprint than a Heavy Forwarder)
4
Q

What is the chunk size of data sent from a Universal Forwarder to an indexer?

A

A Universal Forwarder sends unparsed data in 64 KB blocks.
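To see why the block size matters, here is a minimal Python sketch (not Splunk code) that cuts an unparsed byte stream into fixed blocks, assuming "64 KB" means 64 * 1024 bytes. The helper name chunk_fixed is hypothetical. Because the cut ignores event boundaries, an event can end up split across two blocks.

```python
from __future__ import annotations

# Hypothetical illustration, not Splunk source code: fixed-size chunking of an
# unparsed stream, ignoring where events begin or end.
CHUNK_SIZE = 64 * 1024  # assumption: "64 KB" interpreted as 64 KiB


def chunk_fixed(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split data into blocks of at most `size` bytes, ignoring event boundaries."""
    return [data[i:i + size] for i in range(0, len(data), size)]


if __name__ == "__main__":
    # 1000 newline-terminated events of exactly 100 bytes each.
    events = [(f"event {n:04d} " + "x" * 88 + "\n").encode() for n in range(1000)]
    stream = b"".join(events)
    blocks = chunk_fixed(stream)
    print(len(blocks))                # -> 2
    print(blocks[0].endswith(b"\n"))  # -> False: block 0 ends mid-event
```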

5
Q

How do you make sure that data from a Universal Forwarder does not arrive truncated or broken on the indexer tier?

A

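Always configure EVENT_BREAKER_ENABLE and EVENT_BREAKER in props.conf on the Universal Forwarder. The regex can usually be copied from LINE_BREAKER. This ensures that the Universal Forwarder sends chunks that end on event boundaries to the indexer tier, so events are not split when the forwarder switches to another indexer.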
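The effect of an event breaker can be sketched in Python. This is a simplified model, not Splunk's implementation: the hypothetical function chunk_on_event_boundary fills a block up to a size limit and then moves the cut back to the end of the last match of the breaker regex, treating the first capture group as the boundary (the same convention LINE_BREAKER uses). The regex ([\r\n]+) and the 64 KiB limit are assumed example values.

```python
from __future__ import annotations

import re

CHUNK_SIZE = 64 * 1024  # assumption: 64 KiB target block size
EVENT_BREAKER = re.compile(rb"([\r\n]+)")  # example breaker; group 1 marks the boundary


def chunk_on_event_boundary(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split data into blocks of at most `size` bytes that end on an event boundary.

    Each cut is placed at the end of the last breaker match inside the window.
    If the window contains no boundary (one oversized event), the block is cut
    at `size` as a fallback.
    """
    blocks = []
    start = 0
    while start < len(data):
        end = min(start + size, len(data))
        if end == len(data):
            blocks.append(data[start:])  # last block: take the remainder
            break
        cut = None
        for m in EVENT_BREAKER.finditer(data, start, end):
            cut = m.end(1)  # remember the last boundary seen in the window
        if cut is None:
            cut = end  # no boundary found: fall back to a hard cut
        blocks.append(data[start:cut])
        start = cut
    return blocks
```

With 100-byte events, the first block ends after 655 complete events (65,500 bytes) instead of being cut mid-event at 65,536 bytes.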

6
Q

List the most common pre-trained sourcetypes

A
  • Application servers (log4j, WebSphere)
  • Mail servers (sendmail, postfix)
  • Operating systems (Linux, Windows, OSX)
  • Network devices (Cisco)
  • Databases (DB2, MySQL)
  • Web servers (access_combined, apache)
7
Q

What is the fishbucket?

A

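The fishbucket is where Splunk keeps track of what it has already read: for each monitored file it stores a CRC of the beginning of the file and a seek pointer recording how far the file has been read.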
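A simplified Python model of the fishbucket bookkeeping, under stated assumptions: each record is keyed by a CRC of the first 256 bytes of a file and stores a seek pointer plus a CRC of the 256 bytes just before that pointer. zlib.crc32 stands in for Splunk's internal hash, and the class and method names are hypothetical. The sketch only shows the decision logic: an unknown head CRC means a new file, a matching seek CRC means resume, and anything else means the content changed and is read again.

```python
from __future__ import annotations

import zlib

HEAD_LEN = 256  # bytes used for the initial CRC (Splunk's default)


def crc(data: bytes) -> int:
    # Stand-in hash: zlib.crc32, not necessarily what Splunk uses internally.
    return zlib.crc32(data)


class FishbucketModel:
    """Hypothetical, simplified model of fishbucket records."""

    def __init__(self) -> None:
        # head CRC -> (seek pointer, CRC of the HEAD_LEN bytes before the pointer)
        self.records: dict[int, tuple[int, int]] = {}

    def decide(self, content: bytes) -> tuple[str, int]:
        """Return (action, offset to start reading from) for the file content."""
        head = crc(content[:HEAD_LEN])
        if head not in self.records:
            return "new file", 0
        seek, seek_crc = self.records[head]
        window = content[max(0, seek - HEAD_LEN):seek]
        if len(content) >= seek and crc(window) == seek_crc:
            return "resume", seek
        return "re-read", 0

    def remember(self, content: bytes) -> None:
        """Record that the whole content has been read."""
        seek = len(content)
        window = content[max(0, seek - HEAD_LEN):seek]
        self.records[crc(content[:HEAD_LEN])] = (seek, crc(window))
```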

8
Q

Is the _fishbucket index a real index?

A

It was a real index up to version 3.x. It still lives in the $SPLUNK_DB directory, but it has its own structure, based on a B-tree database.

9
Q

How can you reset a file so that it is indexed again?

A
Stop splunk
-------------------
./splunk cmd btprobe -d $SPLUNK_HOME/var/lib/splunk/fishbucket/splunk_private_db --file  --reset
-------------------
Start splunk
10
Q

Describe steps to troubleshoot data inputs.

A
  • Check splunkd.log ($SPLUNK_HOME/var/log/splunk/splunkd.log)
  • Use btool (./splunk btool inputs list --debug)
  • Enable debug logging
  • Check permissions on the source file
  • Check the input format (e.g. binary files, non-UTF-8 encoding)
  • Check whether the file exists
  • Check the network connection
  • Check the CRC of the file
  • Check the tailing processor with ./splunk _internal call /services/admin/inputstatus/TailingProcessor:FileStatus
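Several of the file-level checks above (file exists, permissions, input format) can be scripted before looking at Splunk itself. The following Python sketch is a hypothetical helper, not a Splunk tool. It should be run as the same user Splunk runs as, and it relies on two assumed heuristics: a NUL byte in the first 4 KB suggests binary content, and the sample must decode as UTF-8 (an incremental decoder is used so a multi-byte character cut off at the sample boundary is not reported as an error).

```python
from __future__ import annotations

import codecs
import os

SAMPLE_SIZE = 4096  # assumption: inspect only the first 4 KB of the file


def preflight(path: str) -> list[str]:
    """Return a list of problems found for a file that should be monitored."""
    if not os.path.exists(path):
        return ["file does not exist"]
    if not os.path.isfile(path):
        return ["path is not a regular file"]
    if not os.access(path, os.R_OK):
        return ["file is not readable by this user"]
    problems = []
    with open(path, "rb") as fh:
        sample = fh.read(SAMPLE_SIZE)
    if b"\x00" in sample:
        problems.append("file looks binary (contains NUL bytes)")
    try:
        # final=False tolerates a character truncated at the end of the sample.
        codecs.getincrementaldecoder("utf-8")().decode(sample, final=False)
    except UnicodeDecodeError:
        problems.append("file is not valid UTF-8")
    return problems
```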
11
Q

How do you clean the event data from an index?

A
Stop splunk
-------------------
./splunk clean eventdata -index  
(To clean all indexes, omit the -index argument.)
-------------------
Start splunk
12
Q

What happens if you remove the fishbucket index on a Universal Forwarder?

A

It kicks off a process that reads and forwards all monitored files again, so all event data is re-indexed (creating duplicates). Be very careful with this step; it is only recommended in a well-planned scenario.

13
Q

What is a oneshot?

A

A oneshot copies a file directly into Splunk: the file is uploaded once, but Splunk Enterprise does not continue to monitor it.

You cannot use the oneshot command against a remote Splunk Enterprise instance. You also cannot use the command with recursive folders or wildcards as a source. Specify the exact path of the file you want to index.

This is a common method for Professional Services (PS) consultants to test and validate props/transforms configurations.
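As a hedged sketch, the hypothetical helper oneshot_command below builds the CLI call splunk add oneshot <file> -index <index> -sourcetype <sourcetype> as a Python argument list and refuses wildcards and directories, mirroring the restrictions above. The $SPLUNK_HOME/bin/splunk location and the example values are assumptions; the command is only built here, not executed.

```python
from __future__ import annotations

import os


def oneshot_command(splunk_home: str, path: str, index: str, sourcetype: str) -> list[str]:
    """Build the argument list for a oneshot upload of one local file."""
    # oneshot accepts neither wildcards nor directories, only one exact file.
    if any(ch in path for ch in "*?["):
        raise ValueError("oneshot does not accept wildcards; give an exact file path")
    if not os.path.isfile(path):
        raise ValueError("oneshot needs an existing file, not a directory or missing path")
    splunk = os.path.join(splunk_home, "bin", "splunk")  # assumed binary location
    return [splunk, "add", "oneshot", os.path.abspath(path),
            "-index", index, "-sourcetype", sourcetype]
```

On the Splunk host itself, the resulting list could be run with subprocess.run(cmd, check=True), for example against a test index while iterating on props.conf.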

14
Q

What are the great eight?

A
EVENT_BREAKER_ENABLE = true
EVENT_BREAKER = ([\r\n]+)
MAX_TIMESTAMP_LOOKAHEAD = 30
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
TRUNCATE = 100000
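To make the line-breaking and timestamp settings concrete, here is a simplified Python model (not Splunk's parser) of how such values act on raw text. LINE_BREAKER splits the stream where its first capture group matches and discards the captured text; SHOULD_LINEMERGE = false means no merging step follows; TRUNCATE caps the event length; TIME_PREFIX, MAX_TIMESTAMP_LOOKAHEAD and TIME_FORMAT locate and parse the timestamp. The sketch assumes a timestamp format of %Y-%m-%d %H:%M:%S and uses Python's strptime, whose directives only partly overlap with Splunk's.

```python
from __future__ import annotations

import re
from datetime import datetime

# Example values mirroring the great eight (assumed; adjust per sourcetype).
LINE_BREAKER = re.compile(r"([\r\n]+)")
TRUNCATE = 100000  # characters here; Splunk counts bytes
TIME_PREFIX = re.compile(r"^")
MAX_TIMESTAMP_LOOKAHEAD = 30
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def break_events(raw: str) -> list[str]:
    """Split raw text at LINE_BREAKER; the first capture group is discarded."""
    events, start = [], 0
    for m in LINE_BREAKER.finditer(raw):
        events.append(raw[start:m.start(1)])  # text before the group ends an event
        start = m.end(1)                      # text after the group starts the next
    events.append(raw[start:])
    # SHOULD_LINEMERGE = false: no merging; drop empty leftovers and apply TRUNCATE.
    return [e[:TRUNCATE] for e in events if e]


def extract_time(event: str) -> datetime | None:
    """Parse the timestamp that follows TIME_PREFIX within the lookahead window."""
    prefix = TIME_PREFIX.search(event)
    if prefix is None:
        return None
    window = event[prefix.end():prefix.end() + MAX_TIMESTAMP_LOOKAHEAD]
    # Use the longest prefix of the window that matches TIME_FORMAT exactly.
    for end in range(len(window), 0, -1):
        try:
            return datetime.strptime(window[:end], TIME_FORMAT)
        except ValueError:
            continue
    return None
```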
15
Q

Why is it recommended to use the great eight?

A

Because they improve data processing significantly. Splunk gets everything it needs to know to parse the data, so it does not have to detect these settings by itself, which reduces the load. It is best practice to always use the great eight.

16
Q

What does a monitor input configuration look like if you want to index two files from /var/log/, following best practices?

A

inputs.conf:

[monitor:///var/log]
whitelist = (openvpnas\.log.*|secure.*)$
index = syslog

props.conf:

[source::...openvpnas.log*]
sourcetype = openvpn:log

[source::...secure*]
sourcetype = linux:secure

Using a single monitor stanza with a whitelist, and assigning the sourcetypes in props.conf, makes sure that no two inputs overlap in reading the same files.
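Before deploying, the whitelist regex and the source:: patterns can be checked against sample paths. This Python sketch assumes the whitelist is applied as a regex search against the full file path and approximates the source:: wildcards by translating ... to "any characters, including /" and * to "any characters except /". The translation helper and the sample paths are hypothetical, and the whitelist is written as (openvpnas\.log.*|secure.*)$, assuming the asterisks were intended.

```python
from __future__ import annotations

import re

WHITELIST = re.compile(r"(openvpnas\.log.*|secure.*)$")
# source:: stanzas from props.conf mapped to sourcetypes. Checking them in order
# is a simplification; Splunk's own stanza precedence is not modelled here.
SOURCE_STANZAS = {
    "...openvpnas.log*": "openvpn:log",
    "...secure*": "linux:secure",
}


def source_pattern_to_regex(pattern: str) -> re.Pattern[str]:
    """Approximate a source:: pattern: '...' spans directories, '*' does not."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("...", i):
            out.append(".*")
            i += 3
        elif pattern[i] == "*":
            out.append("[^/]*")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")


def classify(path: str) -> str | None:
    """Return the sourcetype a path would get, or None if it is not monitored."""
    if not WHITELIST.search(path):
        return None
    for pattern, sourcetype in SOURCE_STANZAS.items():
        if source_pattern_to_regex(pattern).match(path):
            return sourcetype
    return None
```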

17
Q

How do you configure a heavy forwarder properly?

A

By enabling the built-in SplunkForwarder app ($SPLUNK_HOME/etc/apps/SplunkForwarder) on a full Splunk Enterprise instance

18
Q

How do you configure a light forwarder properly?

A

By enabling the built-in SplunkLightForwarder app ($SPLUNK_HOME/etc/apps/SplunkLightForwarder) on a full Splunk Enterprise instance

19
Q

When should a batch input not be used?

A

A batch input should not be used for files that are still being written to: it reads each file once and then deletes it (move_policy = sinkhole).

20
Q

Is WMI the way to go to index Windows data?

A

No, using WMI is not recommended. There are security concerns related to WMI, as well as efficiency reasons. WMI has also reportedly not received any improvements since Windows Server 2008.

21
Q

Can the HEC inputs.conf live in its own app context?

A

No, that is not possible. The HEC inputs.conf always needs to live in the splunk_httpinput app context ($SPLUNK_HOME/etc/apps/splunk_httpinput/local/inputs.conf) to work properly.

22
Q

How does Splunk know where to put the incoming data from a HEC input?

A

It is defined by the HEC token: when the token is created, it is associated with a default sourcetype and index.

23
Q

What else (besides defining the destination index and sourcetype) can the HEC token do?

A

It is also the authentication method: every request to HEC must present a valid token.
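A minimal Python sketch of an HEC request, assuming the default HEC port 8088 and the /services/collector/event endpoint; the host name and token below are made-up placeholders. The token travels in the Authorization: Splunk <token> header, which is how HEC authenticates the sender. The JSON body carries the event, and sourcetype or index fields are only needed to override the token's defaults (within the indexes the token allows). The request is built but not sent.

```python
from __future__ import annotations

import json
import urllib.request

HEC_URL = "https://splunk.example.com:8088/services/collector/event"  # assumed host
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"  # placeholder token


def build_hec_request(event: dict, sourcetype: str | None = None,
                      index: str | None = None) -> urllib.request.Request:
    """Build (but do not send) an HEC event request."""
    payload: dict = {"event": event}
    # Optional overrides; if omitted, the token's default sourcetype/index apply.
    if sourcetype is not None:
        payload["sourcetype"] = sourcetype
    if index is not None:
        payload["index"] = index
    return urllib.request.Request(
        HEC_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Splunk {HEC_TOKEN}",  # token = authentication
                 "Content-Type": "application/json"},
        method="POST",
    )
```

Against a real instance, urllib.request.urlopen(req) would send it; HEC answers a successful request with {"text":"Success","code":0}.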

24
Q

What is the definition of cooked data?

A

Cooked data covers both parsed and unparsed data streams sent by forwarders (but not raw data).

25
Q

What is crcSalt?

A

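If you set crcSalt = <SOURCE>, the full path of the source file is added to the CRC. This ensures that each monitored file has a unique CRC, even if several files start with identical content.

Not recommended for rotating log files: a rotated file gets a new path, and therefore a new CRC, so it is indexed again.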
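A small Python illustration of why crcSalt helps, assuming the salt is simply combined with the first 256 bytes before hashing; zlib.crc32 stands in for Splunk's internal CRC, and the paths are invented. Two files that start with the same header collide without a salt and become distinct once the path is included. The same sketch shows the rotation caveat: a rotated copy with unchanged content still gets a new CRC because its path changed.

```python
from __future__ import annotations

import zlib

INIT_CRC_LENGTH = 256  # bytes hashed at the start of each file (default)


def file_crc(head: bytes, salt: str | None = None) -> int:
    """CRC of the first INIT_CRC_LENGTH bytes, optionally salted with the source path."""
    data = head[:INIT_CRC_LENGTH]
    if salt is not None:
        data = salt.encode("utf-8") + data  # assumption: salt is simply prefixed
    return zlib.crc32(data)
```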

26
Q

When Splunk checks a file for changes, how many bytes at the beginning of the file does it look at?

A

The first 256 bytes by default. The length can be changed with the initCrcLength setting in inputs.conf.
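Because only the start of the file is hashed, appending new events does not change the file's identity, while rewriting the beginning does. A larger initCrcLength is one way to tell apart files that share a long common header. The Python sketch below uses zlib.crc32 as a stand-in for Splunk's CRC; the function name and log lines are invented for illustration.

```python
from __future__ import annotations

import zlib


def head_crc(data: bytes, init_crc_length: int = 256) -> int:
    """Hash only the first `init_crc_length` bytes (zlib.crc32 as a stand-in)."""
    return zlib.crc32(data[:init_crc_length])
```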