Flashcards in Behind the Scenes Deck (19):
Everything is in:
Every shard is replicated by
Pairing each primary shard with a replica copy that is stored on a different node
What happens when a document is added to a cluster?
The request can be sent to any node. That node hashes the document ID to determine which primary shard owns the document, then forwards the request to the node holding that shard. The primary shard indexes the document and then forwards it to its replica shards on other nodes.
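The routing step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: `zlib.crc32` is a stand-in hash (not the one Elasticsearch uses), and the shard count, node names, and the `shard_for` and `index_document` helpers are hypothetical.

```python
import zlib

NUM_PRIMARY_SHARDS = 3  # assumed value for illustration

# Hypothetical cluster layout: which node holds each primary shard,
# and which nodes hold that shard's replicas.
PRIMARY_NODE = {0: "node-a", 1: "node-b", 2: "node-c"}
REPLICA_NODES = {0: ["node-b"], 1: ["node-c"], 2: ["node-a"]}


def shard_for(doc_id: str, num_shards: int = NUM_PRIMARY_SHARDS) -> int:
    """Map a document ID to a primary shard number.

    crc32 is a stand-in hash chosen for this sketch only.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def index_document(doc_id: str) -> dict:
    """Return where the document is written first and where it is copied."""
    shard = shard_for(doc_id)
    return {
        "shard": shard,
        "primary": PRIMARY_NODE[shard],
        "replicas": REPLICA_NODES[shard],
    }


plan = index_document("user-42")
print(plan)
```

Because the hash is deterministic, any node can compute the same shard number for the same ID, which is why the client does not need to know where a document lives.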
What happens when querying for a document ID?
The request can be sent to any node. The ID is hashed to locate the shard, and the node returns the document from a copy of that shard.
What is round robin?
Splitting client calls to different nodes instead of hammering one single node
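A minimal sketch of the idea, assuming three hypothetical node names and a hypothetical `pick_node` helper on the client side:

```python
from itertools import cycle

nodes = ["node-a", "node-b", "node-c"]  # hypothetical node names
next_node = cycle(nodes)                # endless iterator over the nodes, in order


def pick_node() -> str:
    """Return the node that should receive the next client call."""
    return next(next_node)


assigned = [pick_node() for _ in range(6)]
print(assigned)
# ['node-a', 'node-b', 'node-c', 'node-a', 'node-b', 'node-c']
```

Each node receives every third call, so no single node takes all the traffic.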
What is a shard known as?
It's basically a Lucene index
What is Elasticsearch in the context of Lucene?
Elasticsearch is a distributed collection of shards, each of which is a Lucene index.
Tell me about shards
Each shard is a container of inverted indices stored in segments
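To make "inverted index" concrete, here is a small sketch that maps each term to the IDs of the documents containing it. The sample documents and the `build_inverted_index` helper are made up for illustration, and the sketch ignores segments entirely.

```python
from collections import defaultdict

# Made-up sample documents: ID -> text.
docs = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "quick dog",
}


def build_inverted_index(documents: dict) -> dict:
    """Map each term to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


inverted = build_inverted_index(docs)
print(inverted["quick"])  # [1, 3]
print(inverted["dog"])    # [2, 3]
```

A search for a term becomes a dictionary lookup rather than a scan over every document.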
Why does adding data take so long in Elasticsearch?
Creating the inverted index takes a long time
What is analysis?
The process of converting text into tokens and adding them to an inverted index. The result is then added to the buffer.
What is a buffer?
Temporary storage for newly indexed data, which is eventually written out as an immutable index segment.
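The buffer-to-segment relationship can be sketched as a toy class. Everything here is hypothetical: the `ToyShard` name, the `write_segment` method, and especially the size-based trigger, which is an assumption made to keep the example short.

```python
class ToyShard:
    """Toy model: a mutable buffer that is periodically frozen into segments."""

    def __init__(self, buffer_limit: int = 2):
        self.buffer = []                  # mutable, temporary storage
        self.segments = []                # each segment is an immutable tuple
        self.buffer_limit = buffer_limit  # assumed trigger for this sketch

    def add(self, doc: str) -> None:
        self.buffer.append(doc)
        if len(self.buffer) >= self.buffer_limit:
            self.write_segment()

    def write_segment(self) -> None:
        """Freeze the buffered documents into a new segment and clear the buffer."""
        if self.buffer:
            self.segments.append(tuple(self.buffer))
            self.buffer = []


shard = ToyShard()
for doc in ["doc1", "doc2", "doc3"]:
    shard.add(doc)

print(shard.segments)  # [('doc1', 'doc2')]
print(shard.buffer)    # ['doc3']
```

Segments are stored as tuples to mirror the point that a segment is never modified once written; new data always lands in a new segment.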
Things that happen in text analysis stage
Tokenization (breaking the sentences into words while keeping track of each word's original position)
Then these get indexed.
Text analysis filters
Remove stop words
Synonyms (skinny vs thin)
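The steps above (tokenization with positions, stop-word removal, synonyms) can be sketched together. The stop-word list, the synonym map, and the `tokenize` and `analyze` helpers are illustrative assumptions, not Elasticsearch's built-in behavior.

```python
import re

STOP_WORDS = {"the", "a", "is"}   # illustrative stop-word list
SYNONYMS = {"skinny": "thin"}     # illustrative map: variant -> canonical term


def tokenize(text: str) -> list:
    """Split text into lowercase words, recording each word's position."""
    return [
        (position, match.group().lower())
        for position, match in enumerate(re.finditer(r"\w+", text))
    ]


def analyze(text: str) -> list:
    """Tokenize, drop stop words, then replace synonyms with one canonical term."""
    tokens = tokenize(text)
    tokens = [(pos, term) for pos, term in tokens if term not in STOP_WORDS]
    return [(pos, SYNONYMS.get(term, term)) for pos, term in tokens]


print(analyze("The cat is skinny"))
# [(1, 'cat'), (3, 'thin')]
```

The positions survive the filtering, so the gaps left by removed stop words are still visible in the output.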
Settings for an index you can change
To get an idea of how your document will tokenize
"text": "Your text here"
The 3 building blocks of an analyzer
Character filter: receives the original text and removes or converts characters
Tokenizer: breaks the string up into tokens
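The stages can be chained as plain functions. This is a sketch under assumptions: only two descriptions appear in the cards above, so the third stage here (a token filter that lowercases each token) is supplied from how Elasticsearch analyzers are commonly described, and all function names are hypothetical.

```python
import re


def char_filter(text: str) -> str:
    """Stage 1: work on the raw text, here by removing HTML-like tags."""
    return re.sub(r"<[^>]+>", "", text)


def tokenizer(text: str) -> list:
    """Stage 2: break the string into tokens, here on whitespace."""
    return text.split()


def token_filter(tokens: list) -> list:
    """Stage 3 (assumed): modify individual tokens, here by lowercasing."""
    return [token.lower() for token in tokens]


def analyze(text: str) -> list:
    """Run the three stages in order."""
    return token_filter(tokenizer(char_filter(text)))


print(analyze("<b>Quick</b> Brown FOX"))
# ['quick', 'brown', 'fox']
```

The order matters: the character filter sees the whole string, while the token filter only ever sees one token at a time.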