spark1 Flashcards
(42 cards)
Spark is best suited for _____ data.
* Real-time
* Virtual
* Structured
* All of the above
All of the above
Which of the following is a feature of Apache Spark?
* Speed
* Supports multiple languages
* Advanced Analytics
* All of the above
All of the above
What does Spark Engine do?
* Scheduling
* Distributing data across cluster
* Monitoring data across cluster
* All of the above
All of the above
An RDD can NOT be created from data stored on which of the following?
* LocalFS
* Oracle
* S3
* HDFS
Oracle
For resource management, Spark can use?
* YARN
* Mesos
* Standalone cluster manager
* All of the above
All of the above
Fault Tolerance in RDD is achieved using?
* Immutable nature of RDD
* DAG (Directed Acyclic Graph) or data lineage
* Both A and B
* Neither A nor B
Both A and B
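The two ideas in this card (immutability and lineage) can be sketched without Spark at all. The snippet below is a toy model in plain Python, not Spark's implementation: `ToyDataset` and its methods are hypothetical names invented for illustration. Each dataset is frozen and only records how to derive its data from its parent, so a lost result can be rebuilt by replaying the recorded steps.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)  # frozen: a dataset description cannot change once built
class ToyDataset:
    """Hypothetical stand-in for an RDD: it records HOW to compute its data."""
    source: Optional[tuple] = None              # raw input, set only on the root
    parent: Optional["ToyDataset"] = None       # lineage pointer to the parent
    fn: Optional[Callable[[int], int]] = None   # per-element step applied to the parent

    def map(self, fn: Callable[[int], int]) -> "ToyDataset":
        # Returns a new dataset; the parent is left untouched.
        return ToyDataset(parent=self, fn=fn)

    def compute(self) -> List[int]:
        # Walk the lineage back to the root and replay each step.
        if self.parent is None:
            return list(self.source)
        return [self.fn(x) for x in self.parent.compute()]


root = ToyDataset(source=(1, 2, 3))
doubled = root.map(lambda x: x * 2)
shifted = doubled.map(lambda x: x + 1)

first_result = shifted.compute()
# Pretend the computed data was lost: nothing was stored on the dataset itself,
# so the same values are rebuilt purely from the recorded lineage.
recovered = shifted.compute()
```

Because no step mutates its input, replaying the lineage always yields the same values, which is why the two properties work together.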
What is transformation in Spark RDD?
* Takes an RDD as input and produces one or more RDDs as output
* Returns the final results of RDD computations
* The way to send results from executors to the driver
* None of the above
Takes an RDD as input and produces one or more RDDs as output
Which of the following is a feature of Spark RDD?
* In-memory computation
* Lazy evaluations
* Fault Tolerance
* All of the mentioned
All of the mentioned
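Lazy evaluation is the least intuitive of these features, so here is a minimal sketch of it in plain Python. This is a toy model, not Spark's API: `LazyPipeline`, `traced_square`, and `calls` are hypothetical names. Transformations (`map`, `filter`) only queue work; nothing runs until an action (`collect`) is called.

```python
from typing import Iterable, List

calls: List[int] = []  # records when the mapping function actually runs


def traced_square(x: int) -> int:
    calls.append(x)
    return x * x


class LazyPipeline:
    """Hypothetical toy: transformations queue steps, actions execute them."""

    def __init__(self, data: Iterable[int], steps=()):
        self._data = list(data)
        self._steps = tuple(steps)

    def map(self, fn):
        # Transformation: returns a new pipeline and runs nothing.
        return LazyPipeline(self._data, self._steps + (("map", fn),))

    def filter(self, pred):
        # Transformation: returns a new pipeline and runs nothing.
        return LazyPipeline(self._data, self._steps + (("filter", pred),))

    def collect(self):
        # Action: applies every queued step and returns concrete results.
        items = iter(self._data)
        for kind, fn in self._steps:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)


pipeline = LazyPipeline([1, 2, 3, 4]).map(traced_square).filter(lambda x: x > 4)
calls_before_action = list(calls)  # still empty: no element has been squared yet
result = pipeline.collect()        # the action triggers the queued work
```

In this sketch `calls` stays empty until `collect()` runs, which mirrors how Spark defers work until an action is invoked.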
Four main components built on top of Spark Core
- Spark ML
- Spark SQL
- Spark Streaming
- Spark GraphX
Describe Spark ML
Spark ML provides simple APIs for running machine learning functions (classification, clustering, regression) and for creating execution pipelines
Describe Spark SQL
Spark module for working with structured data
Describe Spark Streaming
Large-scale, near-real-time stream processing framework
Describe Spark GraphX
Spark API for graph-parallel computation, including:
- a growing collection of graph algorithms
- builders to simplify graph analytics tasks
Features of Hive
Good abstraction
Declarative language
Less error-prone
Easier to learn and analyze
Compiles to Java MapReduce code
Key components of the Hive architecture
Metastore
Thrift server
Driver
HiveQL
Hive CLI
Different modes of execution in Apache Pig
4 Pig vs SQL
3 key components of HBase
HBase RegionServer
HBase Master
ZooKeeper
6 HBase vs DBMS
Role of ZooKeeper in the HBase architecture
Managing the state and configuration of the HBase cluster; providing distributed coordination, leader election, synchronization, and locking services.
4 How ZooKeeper achieves consistency, and how it achieves performance
__ is a distributed graph processing framework on top of Spark.
* MLlib
* Spark streaming
* GraphX
* None of the above
GraphX
Spark can be up to 100x faster than MapReduce due to?
* In-memory computing
* Development in Scala
* Stream processing
* Spark SQL
In-memory computing
Creating an RDD
Load from an external dataset
Create an RDD from another RDD
Parallelize a collection held in the driver program
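The three creation routes above might look like this in PySpark. This is a sketch under assumptions: `build_rdds` is a hypothetical helper name, `sc` is assumed to be an existing `pyspark.SparkContext`, and `path` is a placeholder for a real file location. The calls used (`parallelize`, `textFile`, `map`) are standard SparkContext and RDD methods.

```python
def build_rdds(sc, path):
    """Sketch of three ways to create an RDD.

    Assumptions: `sc` is a pyspark SparkContext and `path` points to a
    readable text file; both are supplied by the caller.
    """
    # 1. Parallelize a collection that lives in the driver program.
    from_collection = sc.parallelize([1, 2, 3, 4])
    # 2. Load from an external dataset (local file system, HDFS, S3, ...).
    from_storage = sc.textFile(path)
    # 3. Create a new RDD from another RDD through a transformation.
    from_other = from_collection.map(lambda x: x * 10)
    return from_collection, from_storage, from_other


# Example call, assuming pyspark is installed and "data.txt" exists:
#   from pyspark import SparkContext
#   sc = SparkContext("local[*]", "rdd-demo")
#   numbers, lines, scaled = build_rdds(sc, "data.txt")
```

None of the three calls computes anything by itself; an action such as `collect()` or `count()` on a returned RDD would trigger the work.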