Homework Week 4 - Indexer Layer Flashcards

1
Q
How does data flow through these four components of Splunk: Deployment Server, Universal Forwarder, Indexer, and Search Head?
A

Universal Forwarders collect data from source machines, such as servers and routers, and forward it to Splunk. The Deployment Server centrally manages and configures Universal Forwarders across one's environment, sending configurations, apps, and updates to the Universal Forwarders. Indexers store and index the data received from Universal Forwarders. The Search Head retrieves data from the Indexers based on user queries, and then presents the results to the user.

2
Q
What storage space would you allocate for cold data retention per indexer given the following metrics: 4 TB of daily ingestion, with hot data retained for 5 days, cold data for 60 days, and frozen data for 5 years?
A

6.9 TB per indexer; 117.2 TB across all indexers

3
Q
Give two examples of each, with an explanation: a) Virtual machines, b) Network devices, c) Databases, d) Logs, e) Configurations, f) Metrics, g) Alerts
A

a) Virtual Machines:

Development Server VM: Used when a software development team needs a dedicated environment for testing new applications. A virtual machine that emulates the target production environment is created, allowing the team to isolate their development work and test new code without affecting the production system.

Virtual Desktop Infrastructure (VDI): Virtual machines are often used to give corporate employees virtual desktops. These VMs are hosted on servers in the data center and allow employees to access their desktop environment remotely. This centralizes management and security, and reduces the need for individual desktop hardware.

b) Network Devices:

Router: Routers are critical network devices that connect different networks together. For instance, a home router connects a local area network (LAN) to the internet, allowing multiple devices in the home to access online resources.

Switch: A network switch is used within a local network to connect multiple devices (e.g., computers, printers) and efficiently manage data traffic within the LAN. They are commonly used in corporate and data center networks.

c) Databases:

Customer Database: A company maintains a customer database containing information such as names, contact details, and purchase history. This database is used by the marketing team for targeted advertising and the sales team for customer relationship management.

Inventory Database: A retail store uses a database to track inventory levels of products in real-time. When a product is sold, the database is updated to reflect the change in stock, ensuring accurate inventory management.

d) Logs:

Server Logs: Server logs record activities on a web server. For instance, access logs contain information about each web request, including the IP address of the client, the requested URL, and the response code. These logs are critical for diagnosing issues, monitoring traffic, and ensuring security.

Security Logs: In a cybersecurity context, logs from firewall devices, intrusion detection systems (IDS), and antivirus software provide a record of potential security threats and events. Analyzing these logs helps security teams detect and respond to cyberattacks.

e) Configurations:

Router Configuration: Network devices like routers are configured with settings such as IP addresses, access control lists (ACLs), and routing protocols. These configurations dictate how network traffic is managed and routed within an organization.

Application Configuration: Software applications often have configuration files that control their behavior. For example, an email client may have configuration settings for incoming and outgoing mail servers, email signatures, and notification preferences.

f) Metrics:

Website Performance Metrics: Web servers generate metrics such as response time, request rate, and error rate. Monitoring these metrics helps ensure that a website is performing well and provides a good user experience.

Server Resource Utilization Metrics: Servers often generate metrics related to CPU usage, memory utilization, and disk space. Monitoring these metrics helps IT teams proactively manage server resources and detect performance bottlenecks.

g) Alerts:

Network Intrusion Alerts: Intrusion detection systems (IDS) generate alerts when they detect potentially malicious activity on a network, such as unauthorized access attempts or suspicious traffic patterns. These alerts trigger security incident response procedures.

Application Error Alerts: Software applications can be configured to generate alerts when critical errors occur. These alerts notify administrators or developers when issues need immediate attention, helping to minimize downtime and disruptions.

4
Q
What is the relationship between Deployment Servers and Universal Forwarders?
A

Splunk administrators define configurations for data collection and forwarding on the Deployment Server. These configurations are specific to each Universal Forwarder. The Deployment Server pushes these configurations to the respective Universal Forwarders, which ensures that all forwarders are set up correctly and consistently. Universal Forwarders periodically check in with the Deployment Server to see whether there are any updates or changes to their configurations. With the configurations in place, Universal Forwarders collect data from the source machines and forward it to the designated Indexers.
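On the forwarder side, this check-in relationship is configured in deploymentclient.conf. A minimal sketch, assuming a hypothetical Deployment Server hostname (8089 is the default management port):

```ini
# deploymentclient.conf on a Universal Forwarder -- hostname is hypothetical
[deployment-client]

[target-broker:deploymentServer]
targetUri = deploy.example.com:8089
```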

5
Q
Splunk license usage is measured by…
A

D. New data being indexed

6
Q
Describe a heavy forwarder and what it does.
A

A heavy forwarder can both collect data inputs and forward them to indexers, and it parses the data as an indexer would. Heavy forwarders are worth considering when dealing with data that requires index-time extractions, or when they are a system requirement for certain apps such as DB Connect.

7
Q
Explain the process of how you would bring data into Splunk.
A

First, the logs are sent from the forwarders to the indexers. The indexers then take the logs, format them, and organize them into indexes. Next, configurations determine how long the data stays in the index and when it gets moved to slower storage or deleted. Finally, the search heads search the indexes to create visualizations and reports.

8
Q
Explain the licensing structure of Splunk. How does Splunk charge for the use of this software?
A

The Splunk licensing model is based on the volume of data indexed per day; Splunk charges for the amount of data being ingested per day.

9
Q
Explain in your own words the two stages of indexing and what each stage does to the data.
A

The two stages of indexing, parsing and indexing, prepare data for effective analysis in Splunk.

Parsing examines raw data and extracts meaningful information from it, while indexing creates an organized, searchable catalog of that parsed data.

Together, these two stages facilitate the efficient searching, analysis, and visualization of data, which is an essential element of Splunk.

10
Q
What is the default port of an indexer and what does it do?
A

The default receiving port of an indexer is port 9997. It enables the indexers to receive data from forwarders.
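A minimal sketch of both sides of that connection, assuming a hypothetical indexer hostname: the indexer listens on 9997 via inputs.conf, and the forwarder targets it via outputs.conf.

```ini
# inputs.conf on the indexer -- enable receiving on the default port
[splunktcp://9997]
disabled = 0

# outputs.conf on the forwarder -- hostname is hypothetical
[tcpout:primary_indexers]
server = idx01.example.com:9997
```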

11
Q
What are some attributes of a monitor stanza?
A

Some attributes of a monitor stanza are disabled, sourcetype, and index; the [monitor://<path>] stanza header itself specifies the file or directory to watch.
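A minimal inputs.conf sketch using those attributes (the path, sourcetype, and index name here are hypothetical):

```ini
# inputs.conf -- illustrative values only
[monitor:///var/log/messages]
# Enable the input (0 = not disabled)
disabled = 0
# Metadata applied to the collected events
sourcetype = syslog
index = os_logs
```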

12
Q
Tell me about your environment.
A

In my environment we have a current quota of about 50TB, and we are currently ingesting about 49TB per day with 600 users. We have about 290 indexers, with close to 32,000 forwarders and about 12 search heads.

13
Q
What is the file path to the location of Splunk’s buckets?
A

$SPLUNK_HOME/var/lib/splunk

14
Q
What is the difference between the following attributes of indexes.conf: maxTotalDataSizeMB and frozenTimePeriodInSecs?
A

maxTotalDataSizeMB is measured in megabytes and caps the total size of an index; when the index exceeds that size, the oldest buckets roll from cold to frozen.

frozenTimePeriodInSecs is measured in seconds and controls when data rolls from cold to frozen by time (age).
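A minimal indexes.conf sketch showing both attributes side by side (the index name and values are hypothetical); whichever limit is reached first triggers the roll to frozen:

```ini
# indexes.conf -- illustrative values only
[web_logs]
# Size-based: freeze the oldest buckets once the index grows past ~500 GB
maxTotalDataSizeMB = 512000
# Time-based: freeze buckets once their events are older than 90 days (90 * 86400)
frozenTimePeriodInSecs = 7776000
```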

15
Q
How would we configure hot bucket retention by time in indexes.conf?
A

Use the maxHotSpanSecs setting
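As a sketch, assuming a hypothetical index named web_logs, the setting would look like this in indexes.conf:

```ini
# indexes.conf -- illustrative values only
[web_logs]
# Roll a hot bucket to warm once it spans 24 hours of data (86,400 seconds)
maxHotSpanSecs = 86400
```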

16
Q
Our environment is currently ingesting 2 TB with a license quota of 4 TB of data. We are looking to increase ingestion by 1 TB. What would be the most important next step?
A

D. Add more universal forwarders to collect more data

17
Q
What is the meaning of the acronym OOTB?
A

Out Of The Box: used for the preconfigured indexes that come with the Splunk Enterprise package.

18
Q
What is the summary index used for? (Do not use the explanation from the class PowerPoint)
A

To produce faster results for reports and dashboards. The results of a scheduled (saved) search can be stored in the summary index, so subsequent searches run against the smaller summarized data set rather than the full raw data.
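As a sketch of how this is wired up, a scheduled search in savedsearches.conf can write its results to the summary index (the search name, schedule, and SPL here are hypothetical):

```ini
# savedsearches.conf -- illustrative values only
[Hourly error counts]
enableSched = 1
cron_schedule = 0 * * * *
search = index=web_logs status>=500 | stats count by host
# Store each run's results in the summary index
action.summary_index = 1
action.summary_index._name = summary
```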

19
Q
Which component does not have a GUI?
A

C. Universal forwarder & License Master

20
Q
The Heavy Forwarder takes the load off the indexers by parsing data, and has the ability to index data locally. (True or False)
A

True

21
Q
If I wanted to ensure that data isn’t being duplicated when my server goes down, can you tell me where in Splunk I should look?
A

_fishbucket index

22
Q
If I had to troubleshoot servers not reporting to Splunk, where could I search?
A

You would check your _internal index. This stores splunkd.log, a very important log in Splunk that reports on the health of a Splunk component.

23
Q
When is metadata applied to data?
A

When data is being sent to an indexer from a forwarder. For example, the host metadata field may be set to the IP address of the machine the data came from.

24
Q
What is metadata? Explain each component.
A

Data that describes other data, providing a structured reference that helps to sort and identify attributes of the information it describes.

The three components are host (the IP address or hostname of the machine the data came from), source (where the data is coming from, i.e. the file path it originates from), and sourcetype (the format of the data).

25
Q
Explain the bucket lifecycle in detail.
A

1 - The bucket lifecycle starts at the hot bucket, the directory where all data enters the index and is written to disk; the most recent data lives here.

2 - The next tier down is the warm bucket; data rolls here when Splunk is restarted or the hot bucket is full. Warm buckets share the same path as hot buckets and store recent, frequently searched data on fast disk.

3 - Next is the cold bucket, where rarely searched data that has aged is tucked away on slower, cheaper storage. While read-only and still searchable, this is considered the archive tier.

4 - Lastly is the frozen bucket, in which data is either archived to dead media such as tape or deleted. Frozen data is not searchable; files must be recovered through a thawing process before the data becomes searchable again.
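By default, Splunk deletes frozen buckets. To archive them instead, a coldToFrozenDir can be set in indexes.conf; a sketch with a hypothetical index name and archive path:

```ini
# indexes.conf -- illustrative values only
[web_logs]
# Copy buckets to this directory instead of deleting them when they freeze
coldToFrozenDir = /mnt/archive/web_logs/frozendb
```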

26
Q
How would you configure when buckets roll from cold to frozen?
A

Set frozenTimePeriodInSecs in indexes.conf: add up the number of hot/warm and cold retention days, and multiply by 86,400 seconds per day.
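As a sketch using the retention numbers from card 2 (5 days hot/warm plus 60 days cold, with a hypothetical index name):

```ini
# indexes.conf -- illustrative values only
[web_logs]
# 65 days * 86,400 seconds/day = 5,616,000 seconds
frozenTimePeriodInSecs = 5616000
```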

27
Q
Give a description of each of the preconfigured or default Splunk indexes.
A

a. main - This is the default index. All processed data is stored here unless otherwise specified.

b. _internal - Stores all Splunk components’ internal logs and processing metrics. It is often used for troubleshooting; search it for logs that say ERROR or WARN.

c. _audit - Stores events related to activities conducted in the component, including file system changes and user auditing such as search history and user-activity error logs.

d. _summary - Summary indexing allows you to run fast searches over a large data set by scheduling Splunk to summarize data and then import that data into the summary index over time.

e. _fishbucket - This index tracks how far into a file indexing has occurred, to prevent duplicate data from being stored. This is especially useful in the event of a server shutdown or connection errors.

28
Q
What are the differences between hot, warm, cold, and frozen buckets?
A

Hot buckets are the initial destination for incoming data. They serve as a temporary storage location optimized for high-speed writes, and are the only buckets open for writing. Data in hot buckets is readily accessible and available for real-time searching and analysis. When hot buckets reach their size limit or maximum time span, they roll to warm buckets.

Warm buckets provide intermediate storage for data that is still relevant but less frequently accessed. Warm buckets have a longer retention period compared to hot buckets. Data in warm buckets is accessible for searching and analysis but with slightly longer search times compared to hot data. After a defined period in warm buckets, data may roll to cold buckets.

Cold buckets are designed for long-term storage of historical data. Cold buckets have an extended retention period, often weeks or months, depending on your data retention policies. Data in cold buckets is accessible for searching and analysis, but search times may be longer compared to warm or hot data.

Frozen buckets are used for very long-term storage of historical data, often data that must be retained for compliance or legal reasons. Frozen buckets have the longest retention period, often years, based on your data retention policies. Data in frozen buckets is rarely accessed, and accessing it requires a time-consuming thawing process to decompress and reindex the data.

29
Q
What is the file path to the warm bucket?
A

$SPLUNK_HOME/var/lib/splunk/defaultdb/db/ (defaultdb is the directory for the default index, main)

30
Q
You are told to configure a new index called “ponies”. Show us what this stanza might look like, and list at least 5 attributes with an explanation of what each does.
A

[ponies]
homePath = $SPLUNK_DB/poniesdb/db
coldPath = $SPLUNK_DB/poniesdb/colddb
thawedPath = $SPLUNK_DB/poniesdb/thaweddb
maxDataSize = auto_high_volume
maxHotIdleSecs = 86400

[ponies]: This is the section header that defines the configuration for the “ponies” index.

homePath: Specifies the location where the hot and warm buckets for this index are stored on disk. In this example, it’s set to $SPLUNK_DB/poniesdb/db, where $SPLUNK_DB is a placeholder for the Splunk database directory.

coldPath: Specifies the location where cold buckets (long-term storage for historical data) will be stored for this index. In this example, it’s set to $SPLUNK_DB/poniesdb/colddb.

thawedPath: Specifies the location where thawed buckets (data retrieved from frozen storage) will be stored. In this example, it’s set to $SPLUNK_DB/poniesdb/thaweddb.

maxDataSize: Determines the maximum size of the index. The value “auto_high_volume” means that the index size will automatically adjust for high-volume data. You can set this attribute to control the maximum size of the index, which can help manage storage resources.

maxHotIdleSecs: Specifies the maximum time in seconds that a hot bucket can remain idle before rolling to warm. In this example, it’s set to 86,400 seconds (equivalent to 24 hours). This attribute helps manage the lifecycle of hot buckets and is especially useful when you have fluctuating data volumes.

31
Q
Explain the process of data getting into the indexers in a round-robin fashion.
A

Distributing data across multiple indexers in a round-robin fashion is a standard method for load balancing and producing high availability.

This process involves forwarding data from data sources to multiple indexers in a circular or sequential manner to evenly distribute the workload.

By forwarding data to indexers in a round-robin fashion, load balancing is achieved, which ensures that no single indexer becomes a bottleneck. Also, this provides fault tolerance because if one indexer goes offline, data continues to be indexed on the others, ensuring data availability.
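On the forwarder, this round-robin behavior is configured with a load-balanced output group in outputs.conf. A sketch, assuming hypothetical indexer hostnames and group name:

```ini
# outputs.conf on a forwarder -- illustrative values only
[tcpout]
defaultGroup = indexer_pool

[tcpout:indexer_pool]
# The forwarder automatically load-balances across this list of indexers
server = idx01.example.com:9997, idx02.example.com:9997, idx03.example.com:9997
# Switch to the next indexer roughly every 30 seconds
autoLBFrequency = 30
```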