Cribl Admin CCOE Flashcards

1
Q

Which of the following is a valid JavaScript method?

A

.startsWith
.endsWith
.match
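For context, a quick sketch of these methods in a filter-style JavaScript expression (the field names here are hypothetical, not from the course):

sourcetype.startsWith('cisco')      // true if sourcetype begins with 'cisco'
host.endsWith('.example.com')       // true if host ends with '.example.com'
_raw.match(/error\s\d+/)            // returns a match array, or null if no match

Note that JavaScript method names are case-sensitive, so .startswith (all lowercase) would throw an error.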

2
Q

Which of the following logical operators is used as an “and” operator?

A

&&

3
Q

Value Expressions can be used in the following locations

A

Capture Screen and Routes Filtering Screen
Routes Filtering Screen and Pipeline Filtering
Pipeline Filtering and Capture Screen
None of the above is the correct answer (those screens use Filter Expressions; Value Expressions assign values in Functions)

4
Q

Value Expressions are used to evaluate true or false.

A

False

5
Q

Which of the following logical operators is used as a “not” operator?

A

!

6
Q

Git

What command shows you the files that have changed, been added, or are tracked?

A

git status

7
Q

What order must you use to add a new file to a remote repository?

A

add, commit, push
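A minimal example of that sequence (assuming the remote is already configured as origin and the branch is named main; the file name is hypothetical):

git add inputs.yml
git commit -m "Add inputs.yml"
git push origin main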

8
Q

Which command allows you to see a history of commits?

A

git log

9
Q

Which command allows you to add a file to the repository?

A

git add

10
Q

Worker Process

A

A process within a Single Instance, or within Worker Nodes, that handles data inputs, processing, and outputs. Worker Processes operate in parallel. Each Worker Process maintains and manages its own outputs.

11
Q

Worker Node

A

An instance running as a managed worker, whose configuration is fully managed by the Leader Node

12
Q

Worker Group

A

A collection of Worker Nodes that share the same configuration

13
Q

Leader Node

A

an instance running in Leader mode, used to centrally author configurations, and monitor a distributed deployment

14
Q

Mapping Ruleset

A

an ordered list of Filters, used to map Workers to Worker Groups

15
Q

Which of the following is not a Worker responsibility?

A

Back up to Git (local only)

16
Q

Which of the following is not an advantage of a Distributed deployment over a single instance?

A

Advanced data processing capabilities
(Actual advantages include higher reliability and unlimited scalability)

17
Q

Load Balancing among the Worker Processes is done the following way:

A

The first connection will go to a random Worker Process, and the remaining connections will go in increasing order to the subsequent Worker Processes.

18
Q

All Cribl Stream deployments are based on a shared-nothing architecture pattern, where instances/Nodes and their Worker Processes operate separately

A

True!

19
Q

The Single Stream instance is valid for dev, QA or testing environments

A

True

20
Q

In Distributed Mode, the Worker Node…

A

is Stateless
Can continue running even without communication to the Leader with limitations
Can be accessed from inside the Leader
The main path between Sources and Destinations

21
Q

Which of the following is true regarding Worker and Leader communication?

A

Worker initiates the communication between Leader and Workers

22
Q

Worker processes within a Node are distributed using a round robin process based on connections

A

True

23
Q

Which of the following are valid Stream deployment options?

A

Single Instance (software loaded on single host)
Distributed Deployment (Leader and Workers)
Stream deployed in Cribl’s cloud (SaaS)
Stream deployed in customers own cloud instance

24
Q

Worker Group to Worker Group communication is best done by using…

A

Stream TCP
and
Stream HTTP

25
Cribl.Cloud advantages
Simplified administration
Simplified distributed architecture
Git preconfigured
Automatic restarts and upgrades
Simplified access management and security
Transparent licensing
26
Cribl.Cloud does not provide TLS encryption on any Sources
False
27
Cribl.Cloud allows for Stream to Stream communication from Cloud Worker Groups to on-prem Worker Groups
True
28
Cribl.Cloud allows for restricted access to certain IP addresses
True
29
When using Stream in Cribl.Cloud, how do you get data into the cloud?
Using common data sources that are pre-configured (TCP, Splunk, Elastic, etc.)
Using ports 20000-20010 that are available to receive data
30
Cribl.Cloud has preconfigured ports you can use to bring in data
True
31
Which of the following is not valid for a Cribl.Cloud deployment?
Single Stream instance
Distributed Stream instance with Leader on-prem & Workers in the Cribl.Cloud
32
Which of the following are benefits when using Cribl.Cloud?
Simplified administration
Git preconfigured
Automatic upgrades
33
Cribl.Cloud cannot integrate with an on-prem Cribl Worker Group
False
34
Cribl.Cloud allowed ports include
20000-20010
35
Cribl.Cloud does not provide any predefined sources
False
36
What affects performance/sizing?
Event Breaker Rulesets
Number of Routes
Number of Pipelines
Number of Clones
Health of Destinations
Persistent Queueing
37
Estimating Deployment Requirements
Allocate 1 physical core for each 400GB/day of IN & OUT throughput. Example: 100GB in -> 100GB out to 3 destinations = 400GB total; 400GB / 400GB = 1 physical core.
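The same rule written out as a small JavaScript calculation (the 100GB / 3-destination numbers are just the example from the card):

const gbIn = 100;
const gbOut = 100 * 3;                      // 100GB out to 3 destinations
const totalGbPerDay = gbIn + gbOut;         // 400GB/day of IN & OUT throughput
const physicalCores = totalGbPerDay / 400;  // 1 physical core per 400GB/day => 1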
38
Which of the following will impact your choice for amount of RAM?
Persistent Queueing requirements
39
Cribl Worker Process default memory is
2GB RAM
40
How many Worker Nodes, each with 16 vCPUs, are needed to ingest 10TB and send out 20TB?
11 Worker Nodes
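One way to arrive at 11, under a couple of assumptions that are not spelled out on the card (1 physical core ≈ 2 vCPUs, so each vCPU handles ~200GB/day, and ~2 vCPUs per node are reserved for the OS/overhead):

const totalGbPerDay = 10000 + 20000;                   // 10TB in + 20TB out
const vCpusNeeded = totalGbPerDay / 200;               // ~150 vCPUs at 200GB/day per vCPU (assumption)
const usablePerNode = 16 - 2;                          // assumption: reserve 2 of 16 vCPUs per node
const nodes = Math.ceil(vCpusNeeded / usablePerNode);  // ceil(150 / 14) = 11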
41
Cribl recommends you use the following specifications?
16vCPU per Worker Node
42
How can a Stream deployment be scaled to support high data and processing loads?
Scale up with higher system performance (CPU, RAM, Disk) on a single platform
Scale out with additional platforms
Add more Worker Groups
43
With a very large # of sources (UFs), it is possible to exhaust the available TCP ports on a single platform
True
44
Leaders require higher system requirements than workers
False
45
Persistent Queueing (Source & Destination) might impact performance
True
46
Cribl scales best using...
Many medium size Worker nodes
47
Remote Repository Recovery - Overview
1. System Down
2. Install Git on Backup Node
3. Recover configuration from remote repository
4. Restart Leader Node
5. Back Operational :)
48
Setting up and Connecting to Git Hub
1. Set up GitHub
2. Create an empty repository
3. Generate keys to connect Stream to GitHub (public key > GitHub / private key > Stream)
4. Configure the Stream UI to connect to the remote Git repository
5. Once connected, each time a change is committed locally, sync it with the remote repository
49
When using this command to generate SSH public and private keys: ssh-keygen -t ed25519 -C "your_email@example.com", which file contains the public key?
id_ed25519.pub
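For context, the full flow with ssh-keygen's default output path (a property of ssh-keygen itself, not something stated on the card):

ssh-keygen -t ed25519 -C "your_email@example.com"
# by default writes ~/.ssh/id_ed25519      (private key - stays with Stream)
# and writes       ~/.ssh/id_ed25519.pub   (public key - pasted into GitHub)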
50
A remote repository on GitHub is a mandatory requirement when installing Cribl Stream
False
51
A remote Git instance is
Optional for all Stream Deployments
52
What are the methods to backup Cribl Leader Node?
Rsync
Tar / untar
Copy configuration files to S3, rehydrate configuration files from S3
53
Git and GitHub provide backup and rollback of Cribl Stream configurations
True
54
Cribl Stream fault tolerance requires the use of a remote Git repository
True
55
What is a true statement about GitHub accounts?
Requires manual configuration outside of Cribl Stream configuration
56
Stream disaster recovery requires a dedicated standby backup Leader
False
57
Which Git commands are part of the recovery steps?
git init
git fetch origin
58
What is the purpose of using Git?
To provide a backup of configuration files
To provide a history of changes within Stream
59
./cribl help -a
Displays a list of all the available commands
60
Common Cribl Stream commands
./cribl start
./cribl stop
./cribl restart
./cribl status (shows Stream status)
./cribl diag (manages diagnostic bundles)
61
Cribl Stream CLI
The CLI gives you the ability to run commands without needing access to the GUI
Helps in creating automated scripts if needed
Gives you the ability to run diagnostics and send them to Cribl Support
62
What command is used to configure Cribl Stream to start at boot time?
boot-start
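As a later card in this deck notes, the typical invocation under systemd looks like:

./cribl boot-start enable -m systemd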
63
What format are the diag files in?
.tar.gz
64
What does the 'cribl diag create' command do?
Creates a gzip file with configuration information and system state
65
What command is used to configure Cribl Stream as a leader?
./cribl mode-master
66
Once you run 'cribl boot-start enable -m systemd', you will need to use what command to start/stop Stream?
systemctl start cribl
67
The configuration files created with the diag command are in .js format?
False
68
You cannot export packs using the command line
False
69
What types of files are in the diagnostic file?
Files in the local directory
Log files
State of the system
Details about the system running Stream
70
You can use the 'mode' command to configure a Cribl Stream instance into a Cribl Edge Node?
True
71
72
You cannot install Packs using the CLI
False
73
# Troubleshooting Source Issues What is the status of the source?
Sources will have a red status on Leader until they are deployed to a worker group. Status can still be red if there are binding issues
74
# Troubleshooting Source Issues If you do a live capture on the Source, are there any events?
Make sure JavaScript filter set for the live capture is correct. If no data is returned, the problem is likely with the network or further upstream
75
# Troubleshooting Source Issues Is the Source operational/reachable?
Ping the server. Use the nc or telnet command to test the connection to the source.
76
# Troubleshooting Source Issues Is the Destination triggering backpressure?
Check by going to the Destination in Monitoring>Destinations and clicking on Status. If the Source is connected via a Route to a Destination that is triggering backpressure, set to Block to stop sending data.
77
# Troubleshooting Source Issues Check Source config
Typos? Proper authentication?
78
# Stream Sources Summary
Stream can accept data pushed to it, or pull data via API calls
Open protocols, as well as select proprietary products, are supported
Pulling data falls into two categories:
* Scheduled pulls for recurring data (think tailing a file)
* Collector jobs intended for ad hoc runs, as in a Replay scenario
Push Sources push to us, such as Splunk or TCP
Internal sources are internal to us, such as Datagens or internal logs/metrics
Low-code interface eases management
Capture sample data at any stage to validate and test
79
# Stream Syslog Sources Stream Syslog Sources Summary
Stream can process a syslog stream directly
Moving to Cribl Stream from existing syslog-ng or rsyslog servers fully replaces those solutions with one that is fully supported and easily managed
Optimize syslog events
Syslog data is best collected closest to the source
Use a load balancer to distribute load across multiple Worker Nodes
Reduce management complexity while ensuring reliable and secure delivery of syslog data to chosen systems
80
Configuring Elastic Beats
Beats are open-source data shippers that act as agents. Most popular with Cribl customers:
Filebeat - filebeat.yml
Winlogbeat - winlogbeat.yml
81
Change control is built into the system via Git
True
82
Users are independent Cribl Stream objects that you can configure even without RBAC enabled
True
83
URL of the Elastic server that will proxy non-bulk requests
Proxy URL
84
While Splunk Search collector is a powerful way to discover new data in realtime, you should update the Request Timeout Parameter to stop the search after a certain period of time to avoid...
Having the collector stuck in a forever running state
85
Senders with load balancers built in include:
Elastic Beats
Splunk Forwarder
86
When considering Filebeat, to ensure data is received at Stream, change the filebeat .yml to
'setup.ilm.enabled: false'
87
If Stream receives an event from Elastic Beats, we can deliver the event to
Any destination
88
Roles are a set of permissions
False
89
Cribl Stream ships with a Syslog Source in_syslog, which is preconfigured to listen for
Both UDP and TCP traffic on Port 9514
90
All syslog senders have built-in load balancing
False
91
Review of Collectors
Stream Collectors are a special group of inputs that are designed to ingest data intermittently rather than continuously. Collectors can be scheduled or run ad hoc. (The supported data types are listed on the next card.)
92
Cribl Stream Collectors supports the following data types:
Azure Blob
Google Cloud Storage
REST
S3
Splunk Search
Health Check
Database
File System
Script
93
# Collectors in Single Deployments When a Worker node receives the job:
-Prepares the infrastructure to execute a collection job -Discovers the data to be fetched -Fetches the data that match the run filter -Passes the results either through the Routes or into a specific Pipeline
94
# Collectors in Distributed Deployments In a distributed deployment, collectors are configured per Worker Group (within the Leader)
-The Worker Node executes the tasks in their entirety
-The Leader Node oversees the task distribution and tries to maintain a fair balance across jobs
-Cribl Stream uses "Least-In-Flight Scheduling"
-Because the Leader manages Collectors' state, if the Leader instance fails, the Collection jobs will fail as well.
95
Worker Processes
A Worker Node can have multiple worker processes running to collect data. Since the data is spread across multiple worker processes, an alternative like Redis is required to perform stateful suppression and stateful aggregation
96
Discovery Phase
Discovers what data is available based on the collection settings
97
Collection Phase
Collects the data based on the settings of the discovery phase
98
Workers will continue to process in flight jobs if the Leader goes down.
True
99
If skippable is set to yes, jobs can be delayed up to their next run time if the system is hitting concurrency limits.
True
100
Worker Nodes have
Multiple processes that process data independently
101
Worker Nodes keep track of state when processing data?
False
102
What happens after the Worker Node asks the Leader what to run?
The Leader Node sends work to Workers based on previous distributions of work.
103
Workers will stop processing collector jobs that are currently running if the Leader goes down
False
104
Filesystem collectors and Script collectors can only run in an on-prem Stream environment
True
105
What are the ways you can run a collection job?
Scheduled or AdHoc
106
The following collectors are available in Cribl Cloud
S3 Collector and REST Collector
107
You can run a scheduled collection job in preview mode
False
108
Streaming Destinations
Accept events in real time
109
Non-streaming Destinations
accept events in groups or batches
110
Configuring Destinations
For each Destination type, you can create multiple definitions, depending on your requirements. Backpressure behavior options include Block, Drop, and Queue.
111
# Value of Destinations Support for many destinations
Not all data is of equal value. High volume low value data can be sent to less expensive destinations
112
# Value of Destinations Send data from the same source to multiple destinations
1. Simplify data analytics tools migration 2. Store everything you may need in the future, analyze only what you need now
113
# Value of Destinations No extra agents required
Data collected once can be sent to multiple destinations without extra operations cost to run new agents
114
# Value of Destinations Integrations with common destinations
1. Quick time to value 2. Operations cost reduction
115
# Value of Destinations Live data capture shows what's sent to destinations
Reduce troubleshooting effort
116
# Value of Destinations Persistent Queue
1. Minimize data loss 2. Eliminate/minimize the need to introduce separate buffering/queueing tools
117
Multiple Splunk Streaming Destinations
Splunk Single Instance - Stream data to a single Splunk instance
Splunk Load Balanced - Load balance the data it streams to multiple Splunk receivers (indexers)
Splunk HEC - Can stream data to a Splunk HEC (HTTP Event Collector) receiver through an event endpoint
118
# Splunk Destinations Tips Enabling Multi-Metrics
Multi-metrics is data sent in JSON format which allows for each JSON object to contain measurements for multiple metrics. Takes up less space and improves search performance
119
# Splunk Destinations Tips Adjusting timeouts and Max connections
Adjust timeout settings for slow connections. Increase request concurrency based on HEC receivers
120
# Splunk Destinations Tips _raw Fields and Index-Time Fields in Splunk
-Everything that is in _raw is viewable as event content
-Outside of _raw is metadata, which can be searched with tstats or by using :: instead of =
-Fields outside of _raw are viewed when the event is expanded
-If events do not have a _raw field, they'll be serialized to JSON prior to sending to Splunk
121
# Splunk Destinations Summary
-Cribl Stream can send data to Splunk using a variety of different options -Data can be sent securely over TLS -Enabling multi-metrics can save space and perform better
122
Elastic Destinations
Bulk API - Performs multiple indexing or delete operations in a single API call
123
# Elastic Destinations Data Structure Best Practice
Put all fields outside of _raw; use JSON
124
Elastic Data Stream
1. Create a policy > an index template
2. Each data stream's index template must include a name or wildcard pattern, the data stream's timestamp field, and the mappings and settings applied to each
3. Source for data stream
4. Destination for data stream
5. Support for ILM
125
# Elastic Destinations Key Use Cases
-Route data from multiple existing data sources or agents -Migrate data from older versions -Optimize data streams and send data in the right form to Elastic
126
Splunk > Elasticsearch
Step 1: Configure Splunk Forwarder Step 2: Configure Splunk Source in Stream Step 3: Configure Elasticsearch Destination Step 4: Configure Pipeline (regex extract function, lookup function, GeoIP function) Step 5: Results
127
Destination: Amazon S3
Stream does NOT have to run on AWS to deliver data to S3
128
# Destination S3 Partitioning Expression
Defines how files are partitioned and organized - Default is date-based
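As an illustration, a date-based Partitioning Expression is typically a JavaScript template literal along these lines (verify the exact default in your Stream version; appending sourcetype is an example addition, not part of the default):

`${C.Time.strftime(_time ? _time : Date.now() / 1000, '%Y/%m/%d')}/${sourcetype}`   // yields prefixes like 2024/05/17/access_combined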
129
# Destination S3 File Name Prefix Expression
The output filename prefix - defaults to CriblOut. Use only with low-cardinality partitions, and understand the impact on open files & AWS API calls
130
# Destination S3 Cardinality
= Max unique values = the number of staging sub-directories or S3 bucket prefixes
131
Cardinality too high?
When writing to S3 - too many open files and directories on Worker Nodes
When reading from S3 - less chance of hitting S3 read API limits
132
# Destination S3 Cardinality too Low?
When writing to S3 - bigger files written to fewer directories in S3
When reading from S3 - less filtering ability during replays, more data downloaded so larger data access charges, larger chance of hitting S3 read API limits
133
Cardinality General Guidance
Plan for cardinality of no more than 2000 per partition expression
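To make the 2000 guideline concrete (field names and counts here are hypothetical, not from the course):

`${C.Time.strftime(_time, '%Y/%m/%d')}/${sourcetype}`           // ~30 days x ~20 sourcetypes = ~600 partitions, under the limit
`${C.Time.strftime(_time, '%Y/%m/%d')}/${sourcetype}/${host}`   // adding a few thousand hosts multiplies far past 2000, driving up open files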
134
Stream to Stream
Sending data from Stream Worker to Stream Worker, not Worker to Leader
135
Internal Cribl Sources
Receive data from Worker Groups or Edge Nodes
A common pattern: a customer-managed (on-prem) Worker sends data to a Worker in Cribl.Cloud
Internal Cribl Sources treat internal fields differently than other Sources
136
Internal Cribl Destinations
Enables Edge nodes, and/or Cribl Stream instances, to send data to one or multiple Cribl Stream instances Internal fields loopback to Sources
137
Stream Best Practices
-For maximum compression, it is best to change the data to JSON format -Internal Cribl Destinations must be on a Worker Node that is connected to the same leader as the internal Cribl Source -For minimum data transfer, process data on source workers instead of destination workers -For heavy processing, process data on destination workers
138
When setting up an S3 destination the file name prefix expression:
Can negatively impact both read and write API count
Can dramatically increase the number of open files
Generally avoid unless you've done your due diligence and have low-cardinality partition expressions
All of the above
139
It is not recommended to enable Round-Robin DNS to balance distribution of events between Elasticsearch cluster nodes
False
140
What are two benefits of a worker group to worker group architecture?
Compressing data and reducing bandwidth
Reducing Cloud provider egress costs
141
For heavy processing, a recommendation best practice is to process data on
Destination workers
142
When tuning settings for an S3 destination, a good way to avoid any "too many open files" errors is to decrease the number of max open files.
False
143
Which of the following allows you to configure rules that route data to multiple configured Destinations?
Output Router (Parquet is an output data format, not a routing mechanism)
144
Which is an ideal scenario for worker group to worker group architecture?
Capturing data from overseas sources that is destined for local destinations
Reducing the number of TCP connections to a destination
Capturing data from a cloud provider and shipping it to an on-prem destination to avoid egress costs
All of the above
145
With Exabeam, it is important to figure out what syslog format/content needs to be in place
true
146
What are the two main considerations for S3 Destinations?
Cardinality of partition and file name expressions Max open files on system
147
Stream S3 destination setting raw means
Less processing, smaller events, no metadata
148
Routes
-Allow you to use filters to send data through different pipelines. -Filtering capabilities via JavaScript expression and more control -Data Cloning allows events to go to subsequent route(s) -Data Cloning can be disabled with a switch toggle
149
# Routes Dynamic Output Destinations
-Enable expression > Toggle Yes
-Enter a JavaScript expression that Stream will evaluate as the name of the Destination
150
# Routes Final Toggle
Allows you to stop processing the data depending on the outcome. If an event matches the filter, and toggle is set to Yes, those events will not continue down to the next Route. Events that do not match that filter will continue down the Route
151
# Routes Final Flag and Cloning
-Follow "Most Specific First" when using cloning -Follow "Most General First" when not using cloning -At the end of the route, you will see the "endRoute" bumper reminder
152
# Routes Unreachable Routes
Route unreachable warning indicator: "This route might be unreachable (blocked by a prior route), and might not receive data." Occurs when all three conditions match:
-Previous Route is enabled
-Previous Route is final
-Previous Route's filter expression evaluates to true
153
# Routes Best Practices
Filter early and filter fast! You want to quickly filter out any data you do not want to process
154
# Routes Best Practices continued
-Certain JavaScript string operators run faster than others
-Each of these functions operates similarly, but slightly differently:
-indexOf, includes, and startsWith use strings as their function parameter
-match, search, and test use regular expressions
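A rough illustration of the two families (the field name is hypothetical; both lines test whether a value begins with "pan"):

sourcetype.indexOf('pan') === 0        // string-parameter family: indexOf, includes, startsWith
/^pan/.test(sourcetype)                // regex-parameter family: match, search, test

String-parameter methods are generally the cheaper choice when a literal substring check is all you need.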
155
# Routes Best Practices: Most Specific/Most General
Most General: If cloning is not needed at all (all Final toggles stay at default), then it makes sense to start with the broadest expression at the top, so as to consume as many events as early as possible
Most Specific: If cloning is needed on a narrow set of events, then it might make sense to do that upfront, and follow it with a Route that consumes those clones immediately after
Object Storage (S3 buckets): Since most data going to object storage is being cloned, it is best to put routes going to object storage at the top
Filter on common fields: filter on fields like inputId and metadata fields, rather than _raw.includes
156
You created a QuickConnect against a source and now you want to create a route against a subset of that source's events - to a different destination. What are the steps you need to take?
Navigate to the Source. Go to 'Connected Destinations'. Click on 'Routes' to revert to using them instead of QuickConnect. Create 2 routes: one to replace the old QuickConnect that was deleted, and a new route with a filter to map to the events of interest.
157
Both QuickConnect and Routes can be used against the same source.
False
158
What's the general rule for having a performant system?
Filter early and filter fast!
159
Which is true?
-Routes have drag and drop capabilities to connect a source to a destination; QuickConnect doesn't (FALSE)
-QuickConnect has advanced capabilities for assigning pre-processing pipelines to a source and post-processing pipelines to a destination (FALSE)
-QuickConnect does not allow mapping a Pack between sources and destinations (FALSE)
-Routes map to a filter; QuickConnect maps a source to a destination (TRUE!!!!)
160
Which is the most performant JavaScript function?
indexOf
161
Which is a good use case for QuickConnect?
-Stream Syslog Source receiving events from hundreds of device types and applications (NOOOOOOOO) -Stream Splunk Source receiving events from Windows and Linux hosts with Splunk Universal Forwarders (NOOOOOO) -REST API Collector polling Google APIs with JWT authentication (NOOOOOO) -Palo Alto devices sending to a dedicated Stream Syslog Source mapping to a different port than other syslog events (YESSSSS)
162
163
Filter Expressions
Filter Expressions are used to decide what events to act upon in a Route or Function. Uses JavaScript language
164
Value Expressions
typically used in Functions to assign a value. Uses JavaScript language
165
There are 3 types of expressions
-Assigning a Value
-Evaluating to a Value
-Evaluating to true/false
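A quick contrast of the two main kinds, using hypothetical fields:

Math.round(bytes / 1024)                          // value expression: assigns/returns a value (e.g. in an Eval Function)
source.endsWith('.log') && level === 'error'      // filter expression: evaluates to true/false (e.g. on a Route)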
166
Filter Expressions Usage
Filter Expressions can be used in multiple places:
-Capture
-Routing
-Functions within Pipelines
-Monitoring Page
167
# Special Use Expressions Rename Function - Renaming Expression
name.toLowerCase(): any uppercase characters in the field name get changed to lowercase
name.replace("geoip_src_country", "country"): this is useful when JSON objects have been flattened (as in this case)
168
Filter Expression Methods
Expression methods help you determine true or false. Commonly used methods:
.startsWith: returns true if a string starts with the specified string
.endsWith: returns true if a string ends with the specified string
.includes: returns true if a string contains the specified string
.match: returns an array containing the results if the string matches a regular expression
.indexOf: returns the position of the first occurrence of the substring
169
Cribl Expressions Methods
Cribl Expressions are native methods that can be invoked from any filter expression. All methods start with C. Examples: C.Crypto or C.Decode
170
What operators are available to be used in Filter Expressions?
&& || ()
171
The Filter Expression Editor allows you to
Test your expression against sample data
Test your expression against data you have collected
Test your expression against data to see if it returns true or false
Ensure your expression is written correctly
172
Filter Expressions are only used in Routes
False
173
Select all the Fitler Expression operators you can use
">" "<" "==" "!=="
174
Filter Expressions can be used in the following places
Functions within Pipelines Routes Monitoring Page Capture Page
175
You can combine two Filter Expressions
True
176
What is the difference between using "==" or "==="
"==" checks that the value is equal but "===" checks that the value and type are equal
177
You can use .startsWith and .beginWith in filter expressions
False
178
Pipelines
Pipelines are a set of functions that perform transformations, reduction, enrichment, etc.
179
Benefits of pipelines
-Can improve SIEMs or analytics platforms by ingesting better data -Reduce costs by reducing the amount of data going into a SIEM -Simplifies getting data in (GDI)
180
Pipelines are similar to
Elastic LogStash
Splunk props/transforms
Vector Programming
181
Types of Pipelines
Pre-Processing - Normalize events from a Source
Processing - Primary pipeline for processing events
Post-Processing - Normalize events to a Destination
182
# Type of Pipelines Pre-Processing
This type is applied at the Source
Used when you want to normalize and correct all the data coming in
Examples:
-Syslog Pack pre-processing all syslog events coming from different vendors; specific product packs/pipelines can then be mapped to a route
-Microservices pack pre-shapes all k8s, docker, container processed logs
-Specific application pipelines/packs can then be mapped to routes
183
# Types of Pipelines Processing Pipelines
The most common use of pipelines; you can associate a pipeline to Routes using filters
184
# Types of Pipelines Post-Processing
Maps to Destinations Universally post-shape data before it is routed Examples: -Convert all fields to JSON key value pairs prior to sending to Elastic -Convert all logs to metrics prior to sending to Prometheus -Ensure all Splunk destined events have the required index-time fields (index, source, sourcetype, host)
185
# Pipelines Best Practices!
-Name your pipeline and the route that attaches to it similarly
-Create different pipelines for different data sets. Creating one big pipeline can use substantially more resources, become unmanageable, and look confusing and complicated
-Filter early and filter fast!
-Do not reuse pipelines. Do not use the same pipeline for both pre-processing and post-processing; it can make it hard to identify a problem and where it stems from
-Capture sample events to test. Allows you to visualize the operation of the functions within a pipeline
-Test! Use a data set to test and validate your pipeline
-Use statistics. Use Basic Statistics to see how well your pipelines are working
-Pipeline Profiling - determine the performance of a pipeline BEFORE it is in production
186
You should create different pipelines for different data sets
True
187
Pipelines contain Functions, Routes and Destinations
False
188
Stream Functions Overview
-Functions act on received events and transform the received data to a desired output
-Stream ships with several functions that allow you to perform transformations, logs-to-metrics, reduction, enrichment, etc.
-Some expressions use JavaScript
-For some functions, knowing regex will be required
189
5 Key Functions
Eval, Sampling, Parser, Aggregations, Lookup
190
# Types of Functions Eval
Evaluate fields - adds or removes fields from events
Keep and Remove Fields - Keep fields take precedence over Remove fields
191
# Types of functions Parser
It extracts fields out of events, or can be used to manipulate or serialize events
192
# Types of Functions Parser
Types:
CSV - splits a field containing comma-separated values into fields
Delimited Values - similar to CSV, but using any delimiter
Key=Value Pairs - walks through the field looking for key=value pairs and creates fields from them
JSON Object - parses a full JSON object into fields
Extended Log Format - parses a field containing an Apache Extended Log Format event into fields
Common Log Format - parses a field containing an Apache Common Log Format event into fields
193
# Types of Functions Lookup
Enriches your events from other data sources. Performs lookups against fixed databases such as CSV, CSV.GZ
There are three match modes: Exact, CIDR, Regex
Three match types for CIDR and Regex: First Match, Most Specific, All
GeoIP: performs lookups against fixed databases like MMDB (MaxMind)
DNS Lookup: performs DNS queries and returns the results
Redis: supports the entire Redis command set
194
# Types of Functions Lookups - Things to look out for
-Exact match will be case sensitive -Results will be added as fields in the event -Order your lookup from most specific to least -Create efficient regex -For DNS enrichment, use local caching DNS
195
# Types of Functions Aggregations
allows you to apply statistical aggregation functions to the data to generate metrics for that data
196
# Aggregation Functions Avg()
Returns the average value of the specified parameter. For example, if the parameter is a field that contains the number of bytes in, say, a firewall transaction, avg() will return the average number seen in the time window
197
# Aggregation functions median()
Will similarly return the median (the "middle" number of the sorted values of the parameter within the time window)
198
# Aggregation functions min() and max()
each returns the minimum or maximum value, respectively, of the parameter within the time window
199
# Aggregation functions perc()
returns the specified percentile of the values of the specified parameter
200
# Aggregation functions per_second()
Returns the rate at which the different values of the parameter occur in the time window
201
Aggregate function tips
Stream is a shared-nothing architecture; since data is spread across Worker Processes, each process aggregates independently (see the Worker Processes card about using Redis for stateful aggregation).
202
# Types of Functions Sampling
Samples events at a specified rate (e.g. keeping 1 of every N matching events) as they pass through Stream, reducing volume
203
# Types of functions Mask
Mask/Replace/Redact patterns and events. helpful for masking personal information
204
# Types of functions Regex Extract
Extract using regex named groups
205
# Types of Functions General purpose
Eval, parser, drop, aggregations, rename
206
# Types of functions Enrichment
Lookup, DNS lookup, GeoIP, Redis
207
# Types of functions Statistical
Dynamic sampling, publish metrics, rollup Metrics
208
# Types of functions Advanced
Chain, Clone, Code, Event Breaker, JSON Unroll, Tee, Trim Timestamp, Unroll, XML Unroll
209
# Types of functions Formatters
CEF serializer, flatten, serialize
210
Functions Best Practices
-Use typeahead to get a list of functions you can use in JavaScript
-You can use tooltips to get help on most fields in the UI by clicking the question mark
-Add comments and descriptions to your functions in order to explain what is happening
-Function groups allow you to group a set of functions together
-Use the three dots to access additional functions in a pipeline
211
The Parser command can extract fields from the following data types
CSV
Delimited values
JSON Object
(not SQL - it cannot parse this)
212
Which function allows you to create metrics out of any data set?
Aggregations
213
The Mask function allows you to replace data in events
True
214
Which functions allows you to de-duplicate events as they pass through Stream?
suppress
215
You can add or remove fields using JavaScript expressions with the Parser function
True
216
Which function allows you to easily extract fields out of events?
Parser
217
Which function can add or remove fields from events?
Eval
218
Which functions allows you to enrich your events from other data sources?
Lookup
219
The sampling function allows you to get samples of data for testing purposes?
False
220
Lookups cannot be used to enrich data
False
221
Stream Packs Overview
Packs let you Pack up and share Stream configurations and workflows across Worker Groups, or across organizations
222
What is in a Pack?
Packs contain everything between a Source and a Destination
223
What is not in a Pack?
Sources, Source Event Breakers, Collectors, Destinations, Knowledge Objects
224
Packs
Make them useful for the community. Include sample files and lookups to ensure the community can test your Pack. Make them reusable. Make sure you include details on how to configure any relevant Sources and Destinations
225
Good Pack Standards
-Start names with cc- for community Packs. Use all lowercase letters. Use dashes to separate words
226
Pack - Best Practices
-There is no concept of a Local directory inside the Data directory -Changes to Pack will create a local copy of that change -Local always wins over default -Making changes to routes will create a local version of route.yml
227
Packs - Best Practices: Deleting Defaults
-Never delete anything in the default folder -If you delete items in default, they will reappear when you reload configs or restart the leader -Workaround: Untar the pack in the CLI, carefully delete things and update the appropriate references in the files, tar up the contents of the Pack from within the pack folder
228
Packs - Best Practices: Updating Knowledge Objects
-Never modify Knowledge objects that ship with the pack -If you modify any knowledge object that ships with the Pack, it will be overwritten. This includes lookups, etc. -Workaround - create a new knowledge object, any new knowledge object will not be overwritten
229
Packs - Best Pratices: Cannot see updates
-Pack was updated but you cannot see any new updates or new features
-Since local has a higher preference, you will not see any of the new updates that are in default
-Workaround: delete and install the new Pack; import the updated Pack; import the Pack with a new ID each time you install a Pack update; merge local changes from the older Pack into the newer Pack
230
Packs - Best Practices: Deleting Pack Routes
-Do not delete routes in a pack -you deleted all the routes in a pack and reinstalled the pack but the routes do not return -Workaround: delete the pack, restart the leader, reinstall the pack again
231
Packs - Best Practice: Tips and Tricks
-review the README to understand Pack updates -Import the Pack with a separate/unique ID to see the new updates -Exporting a Pack with the merge option selected will overwrite defaults and will merge any local changes -The Cribl Knowledge Pack is a great way to learn more advanced functions in Stream
232
In a distributed deployment, Packs are distributed to the worker group level
True
233
Packs can be imported using which of the following ways
-import a file -import from a URL -import from Git -import from https://packs.cribl.io
234
All Packs that are created will automatically be shared with the community
False
235
Packs can....(select all that apply)
-Enable plug and play deployments for specific use cases -Improve time to value by reducing hurdles and providing Cribl Stream users with out of the box pipelines -Target users in medium/large deployments sharing configurations and content across multiple worker groups
236
Without packs, an administrator must do all pipeline configuration manually
True
237
Users are allowed to create packs and can share them with the community, if applicable
True
238
You can find existing Cribl Packs by searching https://packs.cribl.io
True
239
What are Packs?
Pre-built configuration blocks designed to simplify the deployment and use of Cribl Stream
240
Which is the best answer for how packs are created?
-Cribl creates packs and makes them available for Cribl Stream users -Partners and Users can create packs and make them available for Cribl Stream users -Downloaded packs can be edited for specific needs and then shared -ALL OF THE ABOVE IS CORRECT
241
When exporting a Pack, what are the three export mode options?
Merge safe, Merge, and Default only
242
Stream Replay Overview
-Route data to cheap storage, Replay it back later -Search and Replay only the data you need -Send the Replayed data to any destination
243
Object Store vs Alternatives
Recommendation: Use Object Store
Cost: Object Store is 70-95% cheaper than alternatives
Metadata and searchability: Searching Object Store is a top choice for high volumes of data. Searching file storage is more appropriate for lower volumes of data
Volume: For high volumes of data, object or block storage are best
Retrievability: Data is relatively retrievable from all three types of storage, though file and object storage are typically easier to access
Handling of metadata: typically best served by object storage
244
# Replay Worker Group
Recommendation: Use a dedicated Worker Group
No impact on production Worker Nodes: use a dedicated Worker Group to process large amounts of historical data and avoid impact on other workloads
Egress: place the Worker Group in the same Cloud provider as the Object Store (S3) and Destination
Dynamic Scaling: if possible, use Dynamic Scaling, for example in Kubernetes
245
# Replay Partitions
Recommendation: The Partitioning Expression on the Destination should be the same as the Partitioning Expression on the Collector
246
# Replay Enable User Friendly Replays
Recommendation: Enable user friendly replays
247
# Replay Search
Recommendation: Use Partitioning Expression in Search. Do not use content from within the events
248
# Replay Destination
Recommendation: Use a field to mark the data you want to Replay. Send Replayed data to any destination
249
Replay Summary
Replay means jumping into critical logs, metrics and traces as far back in time as you want, and saying "let's see that again."
Keep more data for longer retention periods and pay a lot less
Replay data to any analytics tools for unexpected investigations
Improve the quality and speed of your analytics environment by saving older data somewhere else
Using Object Store (S3) is the most effective storage
250
Cribl recommends using a dedicated worker group to process your replay data
True
251
An AWS S3 Key Prefix is the same thing as a Cribl S3 Key Prefix
False
252
Which is a use case for routing data to an Object Storage? Select all that apply
-Reducing Analytics tool or SIEM spend
-Making data available for other solutions
-Replaying historical data for a threat hunting exercise
-Replaying debug logs for a troubleshooting event
Correct answer: all of the above!
253
Replay data should be sent to a dedicated index if the destination is Elastic or Splunk
True
254
Cribl recommends using production worker groups to process your replay data
False
255
To make it easier to identify events that have been replayed...
Use a unique Index name
256
When considering replay, which of the following are best served by using an object storage. Select all that apply
Retrievability
Handling of metadata
Cost
(NOT Permissions)
257
Which of the following can be used when Cribl Replays data from an Object Storage?
Partitioning Expression filtering
File Name Expression filtering
258
For Replay to work, you must put all data in JSON format
False
259
# Cribl Edge Why at the Edge?
-The edge is where we see the most data being generated -Use data directly from the edge without having to move it
260
Installing Cribl Edge Nodes
-Able to install on Docker, Kubernetes, Linux, and Windows Servers -To install, go to Manage > Edge Nodes > Add/Update Edge Node -Provides customizable scripts for each operating system
261
# Cribl Edge Kubernetes Sources
-Kubernetes Logs (collects container logs and system logs from containers on a Kubernetes Node)
-Kubernetes Events (collects cluster-level events from a Kubernetes Cluster)
-Kubernetes Metrics (collects events periodically based on the status and configuration of the Kubernetes cluster)
262
# Cribl Edge Linux Sources
-System Metrics (collects metrics data including messages from CPU, Memory, Network, and Disk) -Journal Files (centralized location for all messages logged by different components in a systemd-enabled system)
263
# Cribl Edge Windows Sources
-Windows Event Logs (collects standard event logs, including Application, Security, and System logs) -Windows Metrics (collects metrics data from Windows hosts)
264
# Cribl Edge Cribl HTTP and Cribl TCP destinations
-Enable Edge Nodes to send data to peer Nodes connected to the same Leader -Cribl HTTP (best suited for: Distributed deployments with multiple workers. Use of load balancers. Valuable in hybrid cloud deployments.) -Cribl TCP (best suited for: medium size deployments. All on prem. Valuable in certain circumstances)
265
# Cribl Edge Cribl HTTP and Cribl TCP continued
-HTTP/TCP Destination must be on Edge Node connected to the same Leader as HTTP/TCP Source -Must specify same Leader Address on Edge Nodes that host Destination and Source -To configure Leader Address via UI > log into Edge Node's UI -Destinations Cribl endpoint must point to peer Address and Port of Source -When configuring hybrid workers, Edge Nodes that host Destination / Source must specify exact same Leader Address
266
# Cribl Edge Setting Up Edge to Stream
1) Configure a Cribl Source on Stream to receive data from the Edge Node
2) Configure a Destination on Edge to send data to Stream
3) Configure a Route to send your data to Stream
267
# Cribl Edge Summary
-Deploy to a variety of machines using provided scripts (ability to deploy to a wide variety of systems including Linux servers, Windows servers, Docker containers and Kubernetes)
-Capture sources from a wide variety of systems (built-in sources allow for quick and easy configuration to gather the data you need)
-Combine with Cribl Stream (when using Edge with Stream, you unlock the power of Stream by using Workers to process the data)
268
What is AppScope?
-Open source, runtime-agnostic instrumentation utility for any Linux command or application
-Offers APM-like, black-box instrumentation of an unmodified Linux executable and application
-Interposes itself between applications and shared libraries and system calls
-Observe applications from the inside, viewing resource consumption, filesystem traffic and network traffic including clear-text payloads
269
# AppScope Data Routing
-AppScope gives you multiple ways to route collected data. The basic operations are: -in a single operation, you can route both events and metrics to Cribl Edge, default configuration -You can also route both events and metrics to Cribl Stream, local instance or in the Cribl.Cloud -Support routing events and metrics to a file, a local Unix socket or any network destination, in addition to Cribl Edge and/or Stream
270
# AppScope Installing AppScope
-Go to Cribl.io, download from the top menu, download your preference. -Installing: Load and execute via CLI, done and ready to start working
271
# AppScope Configuring AppScope
scope.yml is the sole library configuration file for AppScope. Environment variables override configuration settings
272
# AppScope Using AppScope: Scoping your first command
Start scoping - the most basic command: scope /bin/echo
Other commands:
scope metrics
scope events
scope events 0 (gives info on that event)
scope events -j | jq (events in JSON format)
273
# AppScope Tracking Scope History
scope hist (defaults to the last 20)
To scope a specific session, use the ID. Example: scope hist --id 2
274
# AppScope Scoping Applications
'scope perl'
'scope events'
'scope events --id 1 - fs.open' (file system events)
-a says to output all events
-j outputs events as JSON
-jq filters down to just the file names
sort and uniq help us find only the unique filenames opened
275
# AppScope Log Data
bat log.py
scope python3 log.py
276
# AppScope Network Metrics
scope sh -c 'echo "some bytes" | nc -w1 localhost 10001'
scope metrics -m net.tx -m net.duration --cols
277
# AppScope Network Events
scope events -t net
278
# AppScope Network Flows
scope flows
scope flows ir1JM1 (flowID)
279
# AppScope HTTP Events
scope curl -so /dev/null http://localhost/
scope events
280
# AppScope AppScope Graphics
scope metrics --id 1 -g proc.cpu_perc
scope metrics --id 1 -g -m proc.fd
281
# AppScope AppScope Summary
Detailed Telemetry: automatically collects application performance data. Automatically collects log data written by the application
Easy Management: use the CLI when you want to explore in real time, in an ad hoc way. Use the AppScope library (libscope) for longer-running, planned procedures
Platform Agnostic: offers ubiquitous, unified instrumentation of any unmodified Linux executable. Supports single-user or distributed deployments
282
Cribl Edge allows you to run executables and collect the output of the command
True
283
Cribl Edge cannot auto-discover log files on the system
False
284
By using Cribl Stream Leader, you can tell Cribl Edge what files you want to monitor using the GUI
True
285
Cribl Edge does not allow you to see machine metrics such as CPU, Memory, or IO
False
286
AppScope provides a CLI based dashboard to see the status of AppScope
True
287
Cribl Edge allows you to replace your data ingestion agent with a vendor agnostic agent
True
288
AppScope interposes itself between applications and shared libraries and system calls
True
289
AppScope is an open-source, runtime-agnostic instrumentation utility for any Linux command or application
True
290
Which AppScope command allows you to see a history of scoped commands?
scope hist
291
You cannot send AppScope data to Cribl Stream
False
292
# TLS Keys How to Start Using Public Key (RSA) Cryptography
Step 1: A private key (a large prime #) is (always) created first, using a tool like openssl
Step 2: Using the private key, a public key (another large prime #) is created and embedded in a Certificate Signing Request. This requires specifying a minimum set of info: subject's name (CN=), org name, OU, city, state, country, and possibly a subject alternative name (SAN)
Step 3: The CSR is signed, either by its own private key or a CA's key
Step 4: You now have a certificate with a private key
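A minimal openssl sketch of those steps (hostname and filenames are hypothetical; the self-signed one-liner on a later card is shown separately):

openssl genrsa -out myhost.key 2048                                   # Step 1: private key
openssl req -new -key myhost.key -out myhost.csr \
  -subj "/CN=myhost.example.com/O=Example Org"                        # Step 2: CSR carrying the public key + subject info
openssl x509 -req -in myhost.csr -signkey myhost.key \
  -days 365 -out myhost.crt                                           # Steps 3-4: sign (here with its own key) to get a certificate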
293
Things to keep in mind when working with Certs and Keys
-A cert cannot exist without being signed
-The public key (in a signed certificate) can encrypt/verify data
-The private key can decrypt/sign data
-Caveat: the entity possessing the private key may not be the rightful owner
294
Certificate Authorities
-CAs are used to sign Cert Signing Requests
-Public vs Private - depends on needs such as vetting levels, cost, cert visibility
-The first/top-level CA is the root > assertion of trust
-The second CA is a subordinate/intermediate - optional, but best practice
295
Self-Signed Certificates
-Self-signed certificates are not simply ones you sign yourself
-A self-signed cert is simply one signed by the same entity whose identity it certifies
-Every root CA cert is self-signed
-Every self-signed cert is also a root, but not necessarily a CA
-Still provides confidentiality, but authenticity and data integrity are suspect
-CA-signed (public or private) certificates mitigate these issues
-One step further is having the CA root cert deemed a trusted root by applications
296
Levels of trust
Increasing trust as you go down the list:
-Unsigned certs (no such thing)
-Self-signed certs
-Private CA-signed certs
-Public CA-signed certs
-CA-signed certs whereby the CA is deemed trusted
297
Certificate Chains
-Chains exist when a non-self-signed certificate is involved
-Many public CAs use chains to protect their root certs
-Frequently used within organizations handling their own signing
-Validating chains starts at the bottom and moves up the chain to the root: the issuer of each cert matches the subject of the next cert (except for the root); each cert is signed by the private key corresponding to the next cert up the chain (except the root); the last cert (top of the chain) is the trust anchor
298
What is a Cipher Suite?
-Client and server applications are configured with a set of ciphers -Consist of multiple categories of algorithms -Many combinations exist as discrete suites -SSL/TLS versions have cipher suites associated with them -When a TLS version is released, new ciphers may be provided -Old ciphers can be deemed insecure > deprecated
299
Components of a cipher suite
1. Protocol: TLS in this example
2. Key Exchange: during the handshake, the keys will be exchanged via ephemeral Elliptic Curve Diffie-Hellman (ECDHE)
3. Authentication: ECDSA is the authentication algorithm
4. Bulk Encryption: AES_128_GCM (symmetric), specifically AES in Galois/Counter Mode using a 128-bit key size
5. Hash: SHA-256
300
Working with Certificates Summary
-Asymmetric encryption will be important any time you are looking to encrypt data from sources/destinations to most modern applications, including Stream -PKI involves a public key used to encrypt data and a private key used to decrypt the public key encrypted data -Certificates can be self-signed or signed by a Certificate Authority, self signed can be used for internal to internal encryption
301
What type of encryption utilizes a public/private key pair?
Asymmetric
302
A self signed certificate has a higher level of trust than a public CA signed certificate
False
303
(Select all that apply) TLS utilizes
Symmetric encryption and Asymmetric encryption
304
What does CA stand for in PKI?
Certificate Authority
305
KMS (Key Management Service) Overview
-Cribl Stream encrypts secrets stored on disk -The keys used for encryption (cribl.secret) are managed by KMS -The keys are unique to each Worker Group + Leader -Encryption key can be managed by Cribl Stream or by an external KMS -Secrets encrypted by the Key: Sensitive information stored in configs and data encryption keys stored as configs
306
Benefits of Using External KMS
-Centralized key management for your organization -Change and access audit -High availability key management options -Minimizing key exposure
307
KMS Options
-Stream Internal is the default KMS. Changing your KMS is not available with the Stream free license
-To get to KMS settings: Settings > Security > Secrets
-A system/Leader key, plus additional keys for each Worker Group
-If HashiCorp Vault or AWS KMS is used, the Leader and Worker Nodes must have network access to the external KMS
308
Setting HashiCorp Vault as the KMS
-Keys are set up separately at the Leader and each Worker Group levels to contain secrets access to the Worker Groups and the Leader -After KMS configuration is performed in Cribl Stream, the specified Secret Path will be created in the Vault
309
KMS Best Practices
-Backup your cribl.secret files before switching to external KMS -Switching from external to internal KMS while the external KMS is not accessible may render your Cribl Stream environment unusable -If an external KMS is used, Leader AND Worker Nodes must have access to the external KMS to operate -Test your KMS configuration in a non-production Cribl Stream environment
310
Once you configure a Worker Group with a KMS system, it will sync with the other Worker Groups
False
311
Where is KMS configured in a distributed Stream environment?
Separately, in the Leader Node and Worker Groups settings
312
Secrets are not encrypted when stored to disk
False
313
(Select all that apply) What external KMS system, does Stream integrate with?
HashiCorp Vault AWS KMS
314
Worker Groups and Leader have a unique set of keys that are used for encryption
True
315
External KMS is required for Stream to function
False
316
Workers configured with external KMS will function if the KMS cannot be reached from the Workers
False
317
Once you configure the KMS system on the Leader, the Leader will push out the configuration to the Workers
False
318
You can use a KMS provider with the free version of Cribl Stream
False
319
If using an external KMS, the Leader and Workers must have access to the external KMS to operate
True
320
# Stream Cert Validations Configuring settings as a TLS server
-Authenticate Client (mutual auth) - if true, the server requests a client cert. Off by default
-Validate Client - clients whose certs aren't authorized (i.e., not signed by the configured/built-in CAs) have their connection denied. Off by default
-Mutual auth enables optional CN validation via regex
321
# Stream Cert Validations Validation checks (by NodeJS) when client/server validation is enabled
-Leaf cert expiration and validation of the CA chain, then
-CN / SAN checks per RFCs
-Only one is checked, regardless of whether it matches; SAN is checked first, if values exist
-IPs are only accepted if they are in both the SAN and Subject attributes
322
Stream Cert Validations
-Stream as a client can validate the remote server certification using Validate server certs toggle -Some destinations (like AWS) allow rejecting unauthorized (example is self-signed certs) -If GUI does not provide a Reject Unauthorized toggle, then a global one can be used (Requires a restart and must be included in systemd unit file)
323
Creating your own certs
generating a self-signed certificate with openssl
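The card names the technique but not the command; one common openssl one-liner for a self-signed cert (the filenames and CN here are hypothetical):

openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -keyout myhost.key -out myhost.crt -subj "/CN=myhost.example.com"

The resulting myhost.crt and myhost.key can then be added to Stream's certificate settings.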
324
Configuring Stream Cert & Chain
-For self-signed, simply add the cert to the Certificate field -Preferably, use the CA Certificate field for importing one or more CA certs. Pros: avoids using NODE_EXTRA_CA_CERTS. Cons: not obvious trusted CA certs are associated with this host cert -Sub/root CA certs can be added to the Certificate field
325
Best Practices: Certs in Worker Groups
-Worker nodes should appear identical to external systems -Worker nodes should internally reflect their individuality for better security -API and cluster settings on a node can use the same cert reflecting the worker's name -Subject (CN is hostname) and SAN should be defined -Use the SAN to include all possible names -Manage certs via UI == each worker gets the full cert set
326
Best Practices: Certs in Worker Groups
-Separate (from API/cluster) certs can and should be used (managed via the GUI) for src/dst configs to reflect the Worker Group's FQDN -Two options: a single cert for all Workers, or a different cert on each Worker -The former is more scalable due to wildcards, but validation fails if connecting with IPs -Depending on details (for example, key size), some systems may not accept the configured cert -For both options, a trusted root CA (vs. an internal CA) is preferred and possibly required
327
Single Worker to Leader Traffic (alt. option)
-TLS can also be configured for Worker to Leader comms using the instance yaml file, environment variables, or via the CLI -The yaml config is done via the $CRIBL_HOME/local/_system/instance.yml file, under the distributed section
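An illustrative excerpt of that distributed section, assuming a typical Worker pointing at a Leader (field names and values are a sketch; confirm the exact schema for your version at docs.cribl.io):
# $CRIBL_HOME/local/_system/instance.yml (Worker side)
distributed:
  mode: worker
  master:
    host: leader.example.com
    port: 4200
    tls:
      disabled: false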
328
# Certs Troubleshooting
Logs: -$CRIBL_HOME/log/cribl.log - Certs/TLS errors will be logged here. If Workers are not showing up on the Leader, check the Worker logs for cert errors. -$CRIBL_HOME/local/_system/instance.yml - Contains TLS settings; helpful if the Workers are not connecting. Tools: -openssl s_client -connect host:9000 - Shows details of the certificate being presented on the port; useful to verify the certificate details
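For example, to pull the presented certificate's subject, issuer, and expiry dates in one pass (host and port are illustrative):
openssl s_client -connect leader.example.com:9000 -servername leader.example.com </dev/null 2>/dev/null \
  | openssl x509 -noout -subject -issuer -dates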
329
Certs/TLS Summary
-TLS can be a complicated feature to enable; proper planning and a basic understanding of TLS client/server architecture help -There are multiple places TLS can be used: Worker to Leader, Source to Worker, Worker to Destination, Leader GUI -Have a means to track certificate issuance and expiration -Use the Stream logs to assist in troubleshooting TLS problems
330
The Leader TLS can be disabled/enabled via the CLI
True
331
TLS does not work in containerized environments
False
332
(Select all that apply) With GUI or API access, which components are the server in the client/server model?
-Worker -Leader -NOT Client or Browser
333
A Leader to Worker TLS connection supports Mutual authentication
True
334
Node.JS uses the system certificate store to validate certificates
False
335
# Cribl Stream Projects Configuration Steps
-Configure Cribl Member -Create a Cribl Member user with the correct access to Stream and other products -Provide the new Cribl Member access to their Worker Group -Configure Stream Project -Create a Subscription -Create a Data Project using the Subscription above -Add available destinations to the project -Assign Users -Give a Cribl member permissions to the Stream Project
336
# Cribl Stream Projects Cribl Members
-Provides control over who has access and visibility within Cribl Projects -Complements current authentication methods but will eventually replace them -Settings > Global Settings > Access Management > Members
337
# Cribl Stream Projects Configuring - Worker Group Access
Worker Group > Group Settings > Access Management > Members
338
# Cribl Projects Roles
Admin: Full access. Editor: Can modify resources within the group. Read Only: Read-only access to resources within the group. User: No access unless shared.
339
# Cribl Stream Projects Configuring - Subscriptions
Worker Group > Projects > Subscription
339
# Cribl Stream Projects Configuring - Data Projects
Worker Group > Projects > Data Projects
339
# Cribl Stream Projects Summary
-A Cribl Admin can provide teams/users with specific data without modifying data for other users -Cribl Members provide granular access to Cribl products including Stream, Edge, and Search -Stream Projects enable users to have control over their data by providing granular access to data flowing through Cribl Stream
340
Using Projects.....
the team can share complex Cribl Stream data through the subscription
341
What is a Metric?
-Metrics are a numeric representation of data measured over intervals of time -Metrics can be an incredibly useful and important part of your observability strategy -Many logging systems extract and calculate metrics -Cribl Stream can extract metrics that are not always available
342
Logs to Metrics
-Logs can take up a lot of space and come from multiple systems -Metrics tend to be leaner and faster -Solution: Calculate metrics to send to analytics system, and archive the rest
343
Cribl Stream and Metrics
-Cribl Stream pipelines contain functions to aggregate or transform logs to metrics -Extract data from a log line, convert that data to metrics -Three different functions -Aggregate -Publish metrics -Rollup metrics
344
Cribl can only pass on data as metrics if the data is ingested into Cribl as a metric
False
345
Logs tend to be way leaner in terms of storage requirements compared to metrics
False
346
Metrics are a numeric representation of data measured over intervals of time
True
347
Cribl Stream provides three different pipeline functions (aggregate, publish metrics, and rollup metrics) to use to convert your logs to metrics
True
348
Cribl is not able to enrich metrics before they are sent to their destinations
False
349
Metrics offer better analysis experience and faster performance compared to logs
True
350
What exactly is a Trace?
-Traces represent the end-to-end request flow through a distributed system -The data structure of traces looks almost like an event log -Traces are made up of spans; spans are events that are part of a trace
351
Traces and App Monitoring
-In App Monitoring, traces represent what applications spend time on -Used by app developers to measure and identify least performant calls in code -Trace generation and analysis is often done by APM tools
352
Trace Spans
Each span begins with: traceid, name, id
353
Cribl Stream and Traces
Cribl Stream can receive and route trace data without having to stitch traces, and can remove irrelevant data and create metric data. Cribl Stream can process raw OpenTelemetry data without app-level changes, and can also store raw data indefinitely (for example, in AWS S3)
354
Cribl Stream needs to stitch traces in order to receive and route data
False
355
When Cribl Stream transforms raw Otel data, it is done at the app-level
False
356
Traces are made up of spans
True
357
Each span begins with an index
false
358
TraceID is shared across all spans in the trace
True
359
Traces are used by app developers to measure and identify least performant calls in code.
True
360
Leader Node Logs
-API/main process logs in the $CRIBL_HOME/log/ directory -Config Helper process logs in the $CRIBL_HOME/log/group/GROUPNAME directory
361
Worker Node Logs
-API process logs in the $CRIBL_HOME/log/ directory -Worker Process logs in the $CRIBL_HOME/log/worker/WP#/ directory
362
Who is watching the watcher?
Pro: Easy to use Cribl Stream to send its own logs. Con: If something isn't working, logs might not get sent.
363
Leader Node Logs
-Leader itself doesn't process data, so it can't forward its own logs -You can use any file collection option, such as Elastic Filebeat, Splunk Universal Forwarder, Cribl Edge, etc. -Logs can be collected from the leader via /system/logs API endpoint
365
# Logging Summary
-Logs can be viewed on disk, in the Leader UI, or via forwarding -You have control over logging level and redaction -Forwarding can be convenient but has trade-offs
366
Leader Node logs are located in
$CRIBL_HOME/log
367
Notifications.log contains alerts
False
368
There is a cribl.log for the Leader Node and for the Worker Node
True
369
(Select all that apply) What log files are in Cribl Stream?
-cribl.log -access.log -audit.log -notifications.log
370
Worker Node logs are created in
$CRIBL_HOME/log
371
Access.log contains API calls
True
372
Cribl.log will contain information on bundle deployments
True
373
Worker Process logs are located in
$CRIBL_HOME/log/worker/[wp#]/
374
Worker Nodes will log when they attempt to connect to the Leader Node
True
375
There are logs for the Leader Node and the Worker Node
True
376
# Upgrading Upgrade Sequence
-Single-Instance (upgrade the instance) -Distributed Deployment: Upgrade the leader, then the Workers, Commit and Deploy
377
Preparing for an Upgrade
-Default files will be overwritten (check for modifications and custom functions) -Download package and checksum files if not using CDN
378
Manual Upgrade
Step 1: Stop Stream Step 2: Back up $CRIBL_HOME (optional) Step 3: Uncompress new version over the old one Step 4: Start Stream Step 5: Validate your Stream environment
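A rough shell sketch of those steps, assuming a tarball install under /opt/cribl and a placeholder package name:
/opt/cribl/bin/cribl stop                          # Step 1: stop Stream
tar czf /tmp/cribl-backup.tgz -C /opt cribl        # Step 2 (optional): back up $CRIBL_HOME
tar xzf cribl-<new-version>-linux-x64.tgz -C /opt  # Step 3: uncompress the new version over the old one
/opt/cribl/bin/cribl start                         # Step 4: start Stream
# Step 5: validate via the UI, cribl.log, and a test capture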
379
Distributed Deployment Upgrade
Step 1: Commit and Deploy (git push to the remote repo is optional) Step 2: Upgrade the Leader (stop Stream, back up $CRIBL_HOME, uncompress the new version over the old one, start Stream) Step 3: Upgrade the Worker Nodes (wait for all the Workers to report to the Leader, stop, uncompress the new version over the old one, start Stream) Step 4: Commit the new software version changes (ensure that all Workers have reported with the new version; Commit & Deploy after verifying all Workers are upgraded)
380
Upgrading Leader Node through the UI
Stream Settings > System > Upgrade
381
Cribl Cloud Upgrade: Cribl-Managed Cloud or Hybrid Deployment
-The Cloud Leader and Workers will be automatically upgraded -The "Disable automatic upgrades" setting only applies to customer-managed Workers
382
Upgrading Summary
-An upgrade is an install of a new version over the old -You have the option of manual, UI, or automatic upgrade -UI upgrade of Workers can be done separately for each Worker Group -You can control how each Worker Group is upgraded -Cribl-managed Cloud Leaders and Workers upgrade automatically
383
Worker Nodes will stop processing data while the leader is being upgraded
False
384
During an upgrade, changes to default files will be
Overwritten
385
If you are using a Cribl managed leader in a hybrid environment, all workers will be upgraded automatically
False
386
Worker nodes will report to the leader if they are running a different version
True
387
A possible upgrade sequence is:
Stop > Uncompress > Start
388
For manual upgrade, you can decide to upgrade only a portion of your worker nodes at a given time
True
389
If "disable automatic upgrades" is set to Yes, your cloud leader will not be upgraded
False
390
(Select all that apply) Your options for Package source are
-Cribl CDN -Local path on the server -HTTP URL
391
When performing an upgrade, on-prem workers must be upgraded first
False
392
(Select all that apply) UI upgrade allows for worker nodes to be
-Upgraded after the leader -Automatically upgraded -Upgraded by worker group
393
# Git Without Local Git
-Single instance deployment can run without Git -No change tracking or rollbacks -Mandatory on the leader node for distributed deployments
394
Local Git
-Track configuration changes -Compare configuration versions -Selective commits -Restore previous configuration version
395
# Git Things to keep in mind
-Make your repository private -Use .gitignore to exclude what gets pushed to Git
396
Git Summary
Git -Single-instance: optional -Distributed: mandatory -Diff/Commit/Undo/Rollback Setting up and using a Git remote repository -Make your repository private -Exclude large files
397
Example Workflow with GitOps
1: Make changes in the Development system UI 2: Commit and push changes to remote repository (dev branch) 3: When ready to push changes into Production, create Pull request to move changes from the dev branch to the production branch 4: Merge Pull Request 5: Send notification to Stream to "sync" changes
398
Setting up the Git Repo
-Follow instructions located at docs.cribl.io -Set up remote git repo as normal on dev -Push initial config from dev -Create dev and prod branches
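A rough sketch of that setup from the dev Leader (remote URL and branch names are illustrative):
cd $CRIBL_HOME
git remote add origin git@git.example.com:acme/cribl-config.git
git push -u origin master             # push the initial config from dev
git checkout -b dev && git push -u origin dev
git checkout -b prod && git push -u origin prod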
399
Git Remote Respository Authentication
-Use secure protocols such as HTTPS or SSH -HTTPS uses username/password authentication -SSH uses public/private keys -Ensure your user accounts are scoped for least-privilege access only
400
Keys and Known hosts
-When using SSH, the private key is stored as $CRIBL_HOME/local/cribl/auth/ssh/git.key -SSH uses a known_hosts file located at /home/cribl/.ssh/known_hosts -Import server public keys using the following command (as the cribl user), where <git server hostname> is your Git host: ssh-keyscan -H <git server hostname> >> ~/.ssh/known_hosts
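For example, as the cribl user, with an illustrative Git server hostname:
ssh-keygen -t ed25519 -f ~/.ssh/cribl_git -N "" -C "cribl-leader"   # generate a key pair; add the public key on the Git server
ssh-keyscan -H git.example.com >> ~/.ssh/known_hosts                # import the server's host keys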
401
Git SSL Certificate Validation
-Git will validate SSL certificates when using HTTPS transport -You should leave this validation enabled -Self-signed or internal PKI will result in validation failure -Import non-public CA signed certs for SSL validation
402
Scheduled Commit and Push
-Stream allows for automatic commits and pushes to the remote repository on a scheduled basis -At a minimum, you should set up an automatic push -You can find this configuration under Leader > Git Settings > Scheduled actions
403
Excluding Files from the Git Repo
-Git can be problematic with large files -Disable tracking of large lookups by adding files to the .gitignore file in $CRIBL_HOME -Excluding SSL certificates managed by Stream may cause issues on workers -Only add exclusions below the CUSTOM SECTION header
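A sketch of a $CRIBL_HOME/.gitignore addition (the lookup path is hypothetical; keep any entries below the existing CUSTOM SECTION header):
# --- CUSTOM SECTION (existing header; add exclusions below it) ---
groups/*/data/lookups/huge_lookup.csv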
404
Backing Up Everything
-Stream's remote Git push is not a replacement for a comprehensive server backup strategy -Items outside of $CRIBL_HOME are not tracked inside the Git repository -Sync files to an S3 bucket for example
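For example, with the AWS CLI (bucket name and paths are illustrative):
aws s3 sync /opt/cribl s3://acme-cribl-backups/leader01/opt-cribl/
aws s3 sync /etc/systemd/system s3://acme-cribl-backups/leader01/systemd/ --exclude "*" --include "cribl*"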
405
Git Summary
-Use secure protocols for transport -Protect authentication keys and use least-privilege access -Add certificates for SSL validation (if required) -Set up a scheduled push to the remote repository -Exclude large lookup files -Git is not a comprehensive backup strategy for the Leader node
406
Stream Administrators should enable automatic GIT push on a scheduled basis
True
407
Secure protocols should be used when setting up the remote repository
True
408
Server validation is an important security measure and should be enabled
True
409
Stream Administrators should store large lookups inside their GIT repository
False
410
When using GIT SSH authentication, where does the known_hosts file reside?
$CRIBL_HOME/.ssh/known_hosts
411
Git backs up all server files
False
412
Git SSH keys or tokens should be able to access other repositories besides Stream
False
413
Top Support Challenges
1. Binding to a privileged port 2. Too many open files 3. Out of memory 4. Cloning workers 5. Resetting lost passwords 6. Pipeline profiling
414
# Support Challenges Binding to a Privileged Port
-Stream should be running as a non-root user -If Cribl Stream is required to listen on ports 1-1024, it will need privileged access. You can enable this on systemd by adding this configuration key to your override.conf file: AmbientCapabilities=CAP_NET_BIND_SERVICE
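A sketch of the systemd override, assuming the unit is named cribl:
sudo systemctl edit cribl            # creates/opens override.conf for the unit
# add under [Service]:
#   AmbientCapabilities=CAP_NET_BIND_SERVICE
sudo systemctl daemon-reload && sudo systemctl restart cribl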
416
# Support Challenges Too many open files
EMFILE: too many open files -When creating partitions, avoid high-cardinality fields in your expression Raise the number of files -For the following destinations, configure Max File options to avoid errors: Filesystem/NFS, Azure Blob, Google Cloud, Amazon S3 Increase the ulimit for Max Open Files (NOFILE) -Edit the systemd unit file to contain a line similar to: LimitNOFILE=20248
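A sketch of raising the limit via the same override.conf (unit name assumed to be cribl; the value matches the slide):
sudo systemctl edit cribl
# add under [Service]:
#   LimitNOFILE=20248
sudo systemctl daemon-reload && sudo systemctl restart cribl
cat /proc/$(pgrep -o -f cribl)/limits | grep -i "open files"   # verify the running process picked it up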
417
# Support Challenges Out of Memory
Out of Memory (OOM) errors are shown in the cribl_stderr.log file. Common causes: Lookups and Aggregations
418
# Support Challenges Cloning Workers
Worker GUID -When you first install and run the software, Cribl Stream generates a GUID which it stores in a .dat file located in $CRIBL_HOME/local/cribl/auth -When deploying Cribl Stream as part of a host image or VM, be sure to remove this .dat file so that you do not end up with duplicate GUIDs. Cribl Stream will regenerate the file on the next run
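For example, before sealing the host image or VM template:
rm -f $CRIBL_HOME/local/cribl/auth/*.dat    # Cribl Stream regenerates the GUID on the next run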
420
# Support Challenges Resetting Lost Password
The cribl.secret file is located at $CRIBL_HOME/local/cribl/auth/cribl.secret
422
# Support Challenges Pipeline Profiling
Pipeline profiling helps with troubleshooting pipeline-related issues (see the summary below)
423
# Support Challenges Summary
Privileged Port Binding -Grant privileges to bind to lower ports Too many open Files -High-cardinality path naming Out of Memory -Aggregations overloading memory Cloning Workers -Removal of the .dat file containing the GUID Lost Passwords -Plaintext password replacement in users.json Pipeline Profiling -Helps with troubleshooting pipeline-related issues
424
(Select all that apply) What methods can be used to bind to privileged ports?
-Run as root (one of these two is wrong) -iptables (one of these two is wrong) -systemctl settings (this is correct)
425
The default memory allocation for each worker is set to what value?
2GB
426
Stream User Passwords are stored in what file on disk?
$CRIBL_HOME/local/auth/users.json
427
(Select all that apply) What things can cause a spike in the number of open files on a Stream Worker?
-High Cardinality Naming -High number of incoming connections -Large amount of persistent queuing
428
(Select all that apply) What features consume memory in Cribl Stream?
Lookups and Aggregations
429
Where does the Cribl Stream GUID live on a worker?
$CRIBL_HOME/local/cribl/auth/*.dat
430
(Select all that apply) What settings control the max number of open file processes on a Linux system?
-/proc/sys/fs/file-max -systemd/system/cribl.service -/etc/sysctl.conf -/etc/security/limits.conf
431
You can clone workers but make sure to remove the .dat file located in $CRIBL_HOME/local/cribl/auth
True
432
(Select all that apply) What ports are considered privileged ports?
anything lower than 1024
434
Which of the following are responsibilities of a Worker?
-Run collector jobs -Receive data from Sources -Send data to Destinations NOT: Back up to Git (local only)
435
Deploying a high-performance single Stream instance is just as effective as using multiple Worker Groups.
False
436
What are Cribl.Cloud allowed ports?
20000-20010
437
A best practice when designing and planning is 200GB per vCPU @ 3GHz
True
438
Which will impact system CPU requirements? (Select 3)
-Persistent Queuing -Volume of incoming data? Number of destinations? (I think this answer is wrong; the correct answer might be: type of data processing required)
439
Which two choices are valid for Cribl Stream in Cribl.Cloud?
-Distributed Stream instance with Leader in Cribl.Cloud and Workers on prem -Distributed Stream instance with Leader and workers in Cribl.Cloud
440
How many Worker Nodes, each with 32vCPU, is needed to ingest 25TB and send out 15TB?
7 Worker Nodes
441
Cribl Single Instance deployments supports which two of the following?
-Integration with Okta for Authentication -GitHub Integration
442
Which two protocols can be used for Worker Group to Worker Group communication? (Select 2)
Stream TCP and Stream HTTP
443
Which of the following are advantages of a distributed deployment over a single instance? (Select 2)
Higher reliability unlimited scalability
444
Filter Expressions can be used in Functions to determine if that Function should be executed
True
445
What is data being sent from Worker Group to Worker Group called?
Stream to Stream
446
What are two use cases for routing data to Object Storage?
-Reducing Analytics tool or SIEM spend -Replaying historical data for threat hunting exercise
447
The Leader Node is required to send scheduled jobs to the Worker Nodes
True
448
If you are using Elastic ingest Pipelines, specify an extra parameter whose name is Pipeline and whose value is:
the name of your pipeline
449
What does Stream's Elasticsearch destination support? (Select 2)
Splunk Logstash WRONG
450
When configuring Splunk HEC, what setting should be turned on if the user wants acks returned to the endpoint that is sending data?
Splunk HEC TLS (WRONG ANSWER)
451
What is the most popular Elastic Beats used by Cribl Customers?
Filebeats and Winlogbeats
452
What port does Splunk typically set its HEC collector on?
8088
453
The user can install Packs using the CLI
True
454
When backpressure behavior is set to drop events, backpressure causes outgoing events to get blocked
False
455
When sending data to Elasticsearch Destination, Cribl recommends that _raw should be empty.
True
456
What are sources called that data is collected from intermittently, either ad hoc or on a preset schedule?
Collectors
457
What does capturing data within the Pipeline editor ensure?
Data is captured prior to sending to a destination
458
Why is JSON a preferred option for Nested Field Serialization in a Splunk Destination?
Easier to report in Splunk (WRONG ANSWER)
459
Which two are ideal use cases for an Output Router? (Select 2)
-Sending a full-fidelity copy of an event to S3 and a transformed copy of the event to Splunk -Sending a filter of events to a Splunk instance, and a filter of other events to an Elastic Instance ONE OF THESE IS WRONG
460
How can you monitor the health of your Cribl Instances? (select 2)
-Set up a notification when destinations are unhealthy -Poll the REST API to see if any pipelines are dropping events (WRONG ANSWER)
461
Which statement describes the discovery process for the S3 and file collectors?
Leader sends a request to the first available Worker node, Worker node sends a list of files back to the leader
462
If no data is reaching the destination, which two things should a user do first within Cribl or on the Cribl systems? (Select 2)
-Netcat or wget from a worker to destination -run a capture and select 'before destination' within Cribl
463
Any changes made to a Knowledge Object will be preserved when updating the Pack.
False
464
When writing data out to S3, which statement is true?
All files will remain open until timeout or max file size is reached
465
When tuning settings for an S3 destination to avoid any 'too many open files' errors, decrease the number of max open files
False
466
Which type of Encryption utilizes a public/private key pair?
Asymmetric
467
A self signed certificate has a higher level of trust than a public CA signed certificate
False
468
Audit.log contains changes to files
True
469
What log files are in Cribl Stream? (Select all that apply)
cribl.log notifications.log