Elasticsearch Flashcards

1
Q

Elasticsearch is an open-source _______ built on top of _________

A

search engine; Apache Lucene

2
Q

What is Apache Lucene?

A

Apache Lucene is a full-text search engine library.

3
Q

What does REST stand for?

A

representational state transfer

4
Q

What is a RESTful service?

A

A RESTful service is one that implements the REST architectural style.

5
Q

What is REST?

A

REST, or REpresentational State Transfer, is an architectural style for providing standards between computer systems on the web, making it easier for systems to communicate with each other.

6
Q

There are 4 basic HTTP verbs we use in requests to interact with resources in a REST system:

A
  1. GET — retrieve a specific resource (by id) or a collection of resources
  2. POST — create a new resource
  3. PUT — update a specific resource (by id)
  4. DELETE — remove a specific resource by id
  5. Elasticsearch additionally uses HEAD: check whether a resource exists without returning its body
7
Q

What is Elasticsearch?

A
  1. Enables full-text search
  2. A distributed real-time document store where every field is indexed and searchable
  3. A distributed search engine with real-time analytics
  4. Capable of scaling to hundreds of servers and petabytes of structured and unstructured data
8
Q

What is a node?

A

A node is a running instance of Elasticsearch.

9
Q

What is a cluster?

A

A cluster is a group of nodes with the same cluster.name that work together to share data and to provide failover and scale, although a single node can form a cluster all by itself.

10
Q

All other languages can communicate with Elasticsearch over port _______ using a
_______, accessible with your favorite web client.

A

9200; RESTful API

11
Q

A request to Elasticsearch consists of the same parts as any HTTP request:

A

curl -X<VERB> '<PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>' -d '<BODY>'

12
Q

What is PROTOCOL?

A

Either http or https

13
Q

What is HOST?

A

The hostname of any node in your Elasticsearch cluster, or localhost for a node on your local machine.

14
Q

What is PORT?

A

The port running the Elasticsearch HTTP service, which defaults to 9200.

15
Q

What is QUERY_STRING?

A

Any optional query-string parameters (for example, ?pretty pretty-prints the JSON response to make it easier to read).

16
Q

What is BODY?

A

A JSON-encoded request body (if the request needs one).
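
The request parts described in the cards above (verb, protocol, host, port, path, query string, body) can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper names `build_url` and `build_request` are hypothetical, and the request object is only constructed, never sent, so no running cluster is assumed.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_url(protocol, host, port, path, query=None):
    """Assemble PROTOCOL://HOST:PORT/PATH?QUERY_STRING from its parts."""
    url = f"{protocol}://{host}:{port}/{path.lstrip('/')}"
    if query:
        url += "?" + urlencode(query)
    return url


def build_request(verb, url, body=None):
    """Create (but do not send) an HTTP request with an optional JSON body."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data else {}
    return Request(url, data=data, headers=headers, method=verb)


# A count request against a local node on the default port.
url = build_url("http", "localhost", 9200, "_count", {"pretty": "true"})
request = build_request("GET", url, {"query": {"match_all": {}}})
print(request.get_method(), request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would actually send it, which requires a reachable node.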

17
Q

For instance, to count the number of documents in the cluster, we could use

A
curl -XGET 'http://localhost:9200/_count?pretty' -d '
{
  "query": {
    "match_all": {}
  }
}'
18
Q

The same count request in shorthand format:

A
GET /_count
{
 "query": {
 "match_all": {}
 }
}
19
Q

Elasticsearch is _______-oriented, meaning that it stores entire ___________

A

document; objects or documents.

20
Q

How does Elasticsearch make a document searchable?

A

It indexes the contents of each document to make it searchable.

21
Q

In Elasticsearch, you index, search, sort, and filter ________; not __________
This is a fundamentally different way of thinking about
data and is one of the reasons Elasticsearch can perform complex _________

A

documents; rows of columnar data; full-text search.

22
Q

Relational DB ⇒ Databases ⇒ Tables ⇒ Rows ⇒ Columns

A

Elasticsearch ⇒ Indices ⇒ Types ⇒ Documents ⇒ Fields

23
Q

What is an Index?

A

An index is like a database: a place to store related documents.

24
Q

What is indexing (verb)?

A

To index a document is to store it in an index (noun) so that it can be retrieved or queried.

25
Insert a document
```
PUT /meijer/households/1
{
  "primary_tender": "12312321",
  "primary_customer": "23123213"
}
```
26
How to retrieve a document?
GET /meijer/households/1
27
Simplest search?
GET /meijer/households/_search //retrieves everything
28
How many results does GET /meijer/households/_search return?
By default it returns the top 10 results.
29
Lightweight search to get the customer = 1234
GET /meijer/households/_search?q=primary_customer:1234
30
ElasticSearch DSL search to get the customer = 1234
```
GET /meijer/households/_search
{
  "query": {
    "match": {
      "primary_customer": "1234"
    }
  }
}
```
31
Elasticsearch DSL search for customer = 1234, filtered to age greater than 30
```
GET /meijer/households/_search
{
  "query": {
    "filtered": {
      "filter": {
        "range": {
          "age": { "gt": 30 }
        }
      },
      "query": {
        "match": {
          "customer": "1234"
        }
      }
    }
  }
}
```
32
What is relevance score?
How well the document matches the query
33
By default, Elasticsearch sorts matching results by their _________
relevance score
34
Elasticsearch vs RDBMS
The relevance score is the major difference. In an RDBMS, a record either matches or it does not.
35
"Complete phrase" search
```
GET /meijer/households/_search
{
  "query": {
    "match_phrase": {
      "hobby": "rock climbing"
    }
  }
}
```
36
Highlight matched search phrases in the results, as Google does
```
GET /meijer/households/_search
{
  "query": {
    "match_phrase": {
      "hobby": "rock climbing"
    }
  },
  "highlight": {
    "fields": {
      "hobby": {}
    }
  }
}
```
Output (example response; here the matched field is about):
```
{
  ...
  "hits": {
    "total": 1,
    "max_score": 0.23013961,
    "hits": [
      {
        ...
        "_score": 0.23013961,
        "_source": {
          "first_name": "John",
          "last_name": "Smith",
          "age": 25,
          "about": "I love to go rock climbing",
          "interests": [ "sports", "music" ]
        },
        "highlight": {
          "about": [
            "I love to go <em>rock</em> <em>climbing</em>"
          ]
        }
      }
    ]
  }
}
```
37
Elasticsearch has functionality called _______, which allow you to generate sophisticated analytics over your data.
aggregations
38
One node in the cluster is elected to be the ______ node, which is in charge of managing cluster-wide changes
master
39
Master node is responsible for
1. Creating and deleting indices
2. Adding and removing nodes from the cluster
40
Master node doesn't need to be involved in __________
Document-level searches or changes.
41
Users can talk to ______ node
Any node, including master node.
42
________ node knows where the document lives
Every node
43
__________ node is responsible for gathering the response from node or nodes holding the data and returning the final response to the client.
The node the user is talking to, which can be any node, including the master node.
44
Get cluster health
GET /_cluster/health
45
Status colors
Green, Yellow, Red
46
Green status
All primary and replica shards are active.
47
Yellow status
All primary shards are active, but not all replica shards are active.
48
Red Status
Not all primary shards are active.
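
The three status colors can be summarized as a small decision function. This is only a sketch of the rule stated in the cards above; the function name `cluster_status` and its shard-count parameters are hypothetical and do not correspond to fields of the cluster health response.

```python
def cluster_status(primaries_active, primaries_total, replicas_active, replicas_total):
    """Map shard counts to a health color following the flashcard rules."""
    if primaries_active < primaries_total:
        return "red"      # not all primary shards are active
    if replicas_active < replicas_total:
        return "yellow"   # all primaries active, some replicas missing
    return "green"        # all primary and replica shards are active


print(cluster_status(3, 3, 3, 3))
print(cluster_status(3, 3, 0, 3))
print(cluster_status(2, 3, 0, 3))
```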
49
An index is a ___________ namespace that points to one or more ___________
Logical; Physical shards
50
What is a shard?
A shard is a low-level worker unit that holds just a slice of all the data stored in an index.
51
A shard is a single instance of ________
Lucene
52
Our documents are stored and indexed in _______
Shards.
53
Applications don't talk to _______ but talk to _______
Shards; Index
54
How is an ElasticSearch cluster balanced?
As the data grows or shrinks, the shards are moved across the nodes to maintain the balance.
55
A shard is either a __________ shard or _________ shard
Primary or replica
56
Each document in your index belongs to a ___________
Primary shard.
57
__________ determines the maximum amount of data your index can hold (subject to hardware constraints).
Number of primary shards.
58
What is a replica shard?
A replica shard is a copy of the primary shard.
59
When is the number of primary shards fixed?
The number of primary shards is fixed when the index is created. The number of replica shards can be changed at any time.
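
One common explanation for why the primary shard count is fixed is that a document's shard is chosen by hashing a routing value (by default the document ID) modulo the number of primary shards, so changing the count would change where existing documents are expected to live. The sketch below illustrates that idea only: `pick_shard` is a hypothetical helper, and `zlib.crc32` is a stand-in hash, not the hash function Elasticsearch actually uses.

```python
import zlib


def pick_shard(routing, number_of_primary_shards):
    """Pick a shard as hash(routing) % number_of_primary_shards."""
    digest = zlib.crc32(routing.encode("utf-8"))  # stand-in hash function
    return digest % number_of_primary_shards


ids = [str(n) for n in range(1000, 1020)]
with_three = {doc_id: pick_shard(doc_id, 3) for doc_id in ids}
with_five = {doc_id: pick_shard(doc_id, 5) for doc_id in ids}

# Documents whose computed shard differs once the shard count changes.
moved = [doc_id for doc_id in ids if with_three[doc_id] != with_five[doc_id]]
print(f"{len(moved)} of {len(ids)} documents would map to a different shard")
```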
60
By default indices are assigned __________ shards
5 (in versions before 7.0; newer versions default to 1)
61
Query to assign three primary shards
```
PUT /shard_setting
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}
```
62
Change the number of replicas for an index
PUT /shard_setting/_settings { "number_of_replicas": 2 }
63
Which data in elasticsearch is indexed by default?
All data in a document is indexed by default
64
What metadata does a document consist of
1. Index 2. Type 3. ID
65
Index naming constraints
1. Lowercase 2. cannot begin with underscore 3. Cannot contain commas
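
The three index naming constraints listed above can be checked with a few string tests. This is a minimal sketch covering only those three rules; the function name `index_name_problems` is hypothetical, and real clusters may enforce further restrictions not listed on this card.

```python
def index_name_problems(name):
    """Return the flashcard naming rules that an index name violates."""
    problems = []
    if name != name.lower():
        problems.append("must be lowercase")
    if name.startswith("_"):
        problems.append("cannot begin with an underscore")
    if "," in name:
        problems.append("cannot contain commas")
    return problems


for candidate in ("meijer", "Rayleys", "_hidden,index"):
    print(candidate, "->", index_name_problems(candidate) or "ok")
```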
66
Type naming constraints
1. Lower case or uppercase 2. cannot begin with underscore 3. Cannot contain commas
67
___________ uniquely identifies a document.
ID when combined with index and type
68
Documents are indexed—stored and made searchable—by using the ________.
index API
69
Every time a change is made to a document (including deleting it), the ________ is incremented.
_version number
70
Pretty-print a retrieved document
GET /meijer/households/123?pretty
71
How to get the response code 404 or 200 OK
Pass -i to curl to include the HTTP response headers:
```
curl -i -XGET 'http://localhost:9200/meijer/households/1234?pretty'

# Response:
HTTP/1.1 404 Not Found
Content-Type: application/json; charset=UTF-8
Content-Length: 83

{
  "_index" : "website",
  "_type" : "blog",
  "_id" : "124",
  "found" : false
}
```
72
Retrieving Part of a Document
```
GET /meijer/households/1234?_source=title,text

# Response:
{
  "_index" : "website",
  "_type" : "blog",
  "_id" : "123",
  "_version" : 1,
  "exists" : true,
  "_source" : {
    "title": "My first blog entry",
    "text": "Just trying this out..."
  }
}
```
73
Get only data without any metadata
GET /meijer/households/1234/_source
74
Checking Whether a Document Exists
curl -i -XHEAD https://localhost:9200/meijer/households/1234
Output:
HTTP/1.1 200 OK
Content-Type: text/plain; charset=UTF-8
Content-Length: 0
75
Documents in Elasticsearch are ______ (mutable/immutable)
immutable; if we need to update an existing document, we reindex or replace it
76
Create only if it doesn't exist
PUT /meijer/households/123?op_type=create or PUT /meijer/households/123/_create
77
Deleting a Document
DELETE /meijer/households/123
78
ElasticSearch uses _________ concurrency control
Optimistic Concurrency control
79
What is optimistic concurrency control
Used by Elasticsearch, this approach assumes that conflicts are unlikely to happen and doesn’t block operations from being attempted. However, if the underlying data has been modified between reading and writing, the update will fail. It is then up to the application to decide how it should resolve the conflict. For instance, it could reattempt the update using the fresh data, or it could report the situation to the user.
80
... in the cluster. Elasticsearch is also asynchronous and concurrent, meaning that these
replication requests are sent in parallel and may arrive at their destination out of sequence.
81
We can take advantage of the ______ to ensure that conflicting changes made by our application do not result in data loss.
_version number
82
We want this update to succeed only if the current _version of this document in our index is version 1.
PUT /meijer/households/1234?version=1
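
The version check described in the cards above can be illustrated with a toy in-memory store. This is a sketch of the optimistic concurrency idea only, not of Elasticsearch internals: `VersionedStore` and `VersionConflict` are hypothetical names, and the store lives entirely in a Python dictionary.

```python
class VersionConflict(Exception):
    """Raised when the expected version does not match the stored one."""


class VersionedStore:
    """Toy in-memory stand-in for an index; not an Elasticsearch client."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def get(self, doc_id):
        """Return (version, source) for a stored document."""
        return self._docs[doc_id]

    def put(self, doc_id, source, expected_version=None):
        """Store a new version; fail if expected_version is stale."""
        current_version = self._docs.get(doc_id, (0, None))[0]
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"expected version {expected_version}, found {current_version}"
            )
        new_version = current_version + 1
        self._docs[doc_id] = (new_version, dict(source))
        return new_version


store = VersionedStore()
store.put("1234", {"dm_flag": "false"})                      # creates version 1
store.put("1234", {"dm_flag": "true"}, expected_version=1)   # succeeds, now version 2
try:
    store.put("1234", {"dm_flag": "false"}, expected_version=1)  # stale version
except VersionConflict as err:
    print("conflict:", err)
```

On a conflict, the application decides what to do next, for example re-reading the document and retrying with the fresh version.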
83
Partial Updates to Documents
```
POST /meijer/households/1234/_update
{
  "doc": {
    "dm_flag": "true",
    "mp_dlag": ["false"]
  }
}
```
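
Because documents are immutable, a partial update can be pictured as retrieve, merge, reindex. The sketch below shows only the merge step on plain dictionaries and assumes that nested objects are merged recursively while scalars and arrays are replaced; `merge_partial` is a hypothetical helper, not part of any client library.

```python
def merge_partial(existing, partial):
    """Return a new document with the partial doc merged into the existing one."""
    merged = dict(existing)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_partial(merged[key], value)  # merge nested objects
        else:
            merged[key] = value  # overwrite scalars and arrays, add new fields
    return merged


existing = {"primary_customer": "1234", "dm_flag": "false"}
partial = {"dm_flag": "true", "mp_dlag": ["false"]}
print(merge_partial(existing, partial))
```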
84
Retrieving Multiple Documents
Multi get API, mget
85
mget example
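```
GET /_mget
{
  "docs": [
    {
      "_index": "meijer",
      "_type": "households",
      "_id": "1234"
    },
    {
      "_index": "rayleys",
      "_type": "offers",
      "_id": "1"
    }
  ]
}
```
Each entry in docs takes a single _id. To fetch several IDs from the same index and type, put the index and type in the URL and pass an ids array, for example GET /rayleys/offers/_mget with { "ids": ["1", "2", "3", "4"] }.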
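
A request body for mget can be generated from a list of document locations. The following is a minimal sketch that only builds the JSON body; `mget_body` is a hypothetical helper, and the index and type names are taken from the example card above (with the index name lowercased).

```python
import json


def mget_body(locations):
    """Build an mget body from (index, type, id) tuples."""
    return {
        "docs": [
            {"_index": index, "_type": doc_type, "_id": str(doc_id)}
            for index, doc_type, doc_id in locations
        ]
    }


locations = [("meijer", "households", "1234")]
locations += [("rayleys", "offers", n) for n in (1, 2, 3, 4)]
print(json.dumps(mget_body(locations), indent=2))
```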