Behind the Scenes Flashcards Preview

Udemy: Elasticsearch Masterclass > Behind the Scenes > Flashcards

Flashcards in Behind the Scenes Deck (19):
1

Everything in Elasticsearch is stored in:

shards

2

Every shard is replicated by

pairing each primary shard with a replica stored on a different node, so a single node failure never loses both copies.

3

What happens when a document is added to a cluster?

The document ID is hashed, and the request can be sent to any node. Based on the hash, that node determines which node holds the primary shard and forwards the document there; the primary shard then replicates the write to its replicas on the other nodes.
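A sketch of this routing (my_index is a hypothetical index name; the formula is how Elasticsearch picks the shard, with _routing defaulting to the document ID):

# shard_num = hash(_routing) % number_of_primary_shards

PUT my_index/_doc/1
{
  "title": "Hello shards"
}

Any node can accept this request; the receiving node computes the shard number from the hash and forwards the write to the primary.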

4

What happens when querying for a document ID?

The ID is hashed, and the request can be sent to any node; that node routes the lookup to a node holding a copy of the relevant shard, which returns the document.
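For example, a lookup by ID can be sent to any node (again using the hypothetical my_index):

GET my_index/_doc/1

The receiving node hashes the ID the same way as on indexing, then fetches the document from any copy of that shard, primary or replica.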

5

What is round robining?

Distributing client requests across the nodes in turn instead of hammering one single node

6

What is a shard known as?

It's basically a Lucene index

7

What is Elasticsearch in terms of Lucene?

Elasticsearch is a distributed collection of Lucene indices: each shard is a Lucene index, and Elasticsearch distributes and replicates those shards across nodes.

8

Tell me about shards

Each shard is a container of inverted indices stored in segments

9

Why does adding data take so long in Elasticsearch?

Building the inverted index is expensive, so indexing (writes) is slower than searching (reads)

10

What is analysis?

The process of taking text, converting it into tokens, and putting those tokens into an inverted index. The result then gets added to the buffer.

11

What is a buffer?

A temporary in-memory store of indexed data that is eventually written out as an immutable index segment.

12

Things that happen in the text analysis stage

Tokenization (breaking sentences into words, keeping track of each token's original position)
Filtering
The resulting tokens then get indexed.

13

Text analysis filters

Remove stop words
Lowercasing
Stemming
Synonyms (skinny vs thin)

14

Settings for an index you can change

number_of_shards (fixed at index creation)
number_of_replicas (can be changed at any time)
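A minimal sketch of these settings (my_index is a hypothetical name):

PUT my_index
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}

PUT my_index/_settings
{
  "number_of_replicas": 2
}

The first request fixes the shard count at creation; the second shows that the replica count can still be changed on a live index.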

15

To get an idea of how your text will be tokenized

POST _analyze
{
  "analyzer": "standard",
  "text": "Your text here"
}

16

The 3 building blocks of an analyzer

character filters
tokenizers
token filters
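These three building blocks can be combined into a custom analyzer; a minimal sketch using built-in components (my_index and my_analyzer are hypothetical names):

PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": ["html_strip"],
          "tokenizer": "standard",
          "filter": ["lowercase", "stop"]
        }
      }
    }
  }
}

Here html_strip is the character filter, standard is the tokenizer, and lowercase/stop are the token filters.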

17

Character filters

Receives the original text and can add, remove, or convert characters (e.g. stripping HTML markup)

18

Tokenizer

Breaks the string up into individual tokens (usually words)

19

Token filters

Filters applied to the tokens, e.g. lowercase, stop, synonym