Introduction Flashcards
(26 cards)
What launched the Big Data era?
The combination of growing data and on-demand cloud computing
What steps are required for processing unstructured data?
Data Acquisition, Storage, Retrieval, Cleaning, Processing
What are 4 technologies that help handle unstructured data?
Hadoop, Storm, Spark, NoSQL
What is Hadoop?
An open-source framework designed to handle large amounts of unstructured data
What are Storm and Spark?
Frameworks for real-time processing of large amounts of data
What is Neo4j?
A graph database
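To make the graph data model concrete, here is a minimal sketch in plain Python. It does not use Neo4j or its API; the `edges` data, the `adjacency` index, and the `neighbors` helper are hypothetical names chosen for illustration only.

```python
from collections import defaultdict

# Hypothetical toy graph stored as (source, relationship, target) triples.
edges = [
    ("alice", "FOLLOWS", "bob"),
    ("bob", "FOLLOWS", "carol"),
    ("alice", "LIKES", "post1"),
]

# Adjacency index: node -> list of (relationship, target) pairs.
adjacency = defaultdict(list)
for source, relationship, target in edges:
    adjacency[source].append((relationship, target))

def neighbors(node, relationship):
    """Return the targets reachable from `node` via `relationship`."""
    return [t for r, t in adjacency[node] if r == relationship]

print(neighbors("alice", "FOLLOWS"))
```

A graph database answers this kind of "who is connected to whom" question directly, instead of joining tables.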
What is Cassandra?
A key-value database (more precisely, a wide-column store)
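The key-value idea can be sketched with a plain Python dictionary. This is only an illustration of the data model, not Cassandra's API; `store`, `put`, `get`, and the sample record are hypothetical.

```python
# Hypothetical toy key-value store backed by an in-memory dict.
store = {}

def put(key, value):
    """Store `value` under `key`, overwriting any previous value."""
    store[key] = value

def get(key, default=None):
    """Look up `key`; return `default` if it is absent."""
    return store.get(key, default)

put("user:42", {"name": "Ada", "city": "London"})
print(get("user:42"))
print(get("user:99"))
```

Every lookup goes through the key; there are no joins or ad hoc queries over the values, which is what lets such stores scale out easily.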
Where does the value of Big Data come from?
Value comes from integrating different types of data sources and analysing them at scale
What are 3 advantages of integrating data sources, which leads to increased data collaboration?
It reduces complexity, increases data availability, and unifies data systems
What are the 5 Vs - characteristics of Big Data?
The 5 Vs are Volume, Variety, Velocity, Veracity, Valence
What could be a 6th V, added to the 5 characteristics of Big Data that build on the 3 Vs coined by Doug Laney of Gartner?
It could be Value.
What are the original 3 Vs - characteristics of Big Data?
The first 3 Vs are Volume, Variety, Velocity
What does Valence refer to?
This refers to how big data items can bond with each other, forming connections between otherwise disparate datasets. It also refers to the connectedness of big data in the form of graphs.
What does Volume refer to?
This refers to the vast amounts of data that are generated every second/minute/hour/day in our digitized world. It is the dimension of Big Data related to its size and its exponential growth.
What does Variety refer to?
This refers to the ever-increasing number of forms that data can come in, e.g., text, images, voice, geospatial. Variety also refers to the additional complexity of storing, combining, and processing different kinds of data.
What does Velocity refer to?
This refers to the speed at which data is being generated and the pace at which data moves from one point to the next.
What does Veracity refer to?
This refers to the quality of the data, which can vary greatly. It is sometimes referred to as validity, or as volatility, which refers to the lifetime of the data.
What does Value refer to?
Processing big data must bring value from insights gained to support decision-making.
What are the challenges related to Volume?
The challenges of Volume include the cost, scalability, and performance of data storage, access, and processing
What should be considered to assess a situation?
Risks, Benefits, Contingencies, Regulations, Resources, Requirements
How can you define goals?
Define objectives and success criteria
What are the 5 Ps?
Purpose, People, Process, Platforms, Programmability
What are the 5 steps in the data science process?
Acquire, prepare, analyse, report, act
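The five steps can be sketched as a chain of small Python functions. This is a toy illustration under stated assumptions: the readings, the `threshold` parameter, and all function bodies are hypothetical, and a real project would replace each step with far more work.

```python
from statistics import mean

def acquire():
    # Hypothetical raw readings; None marks a missing value.
    return [12.0, None, 15.5, 14.0, None, 13.5]

def prepare(raw):
    # Clean the data by dropping missing values.
    return [x for x in raw if x is not None]

def analyse(clean):
    # Compute simple summary statistics.
    return {"count": len(clean), "mean": mean(clean)}

def report(summary):
    # Turn the analysis into a human-readable message.
    return f"{summary['count']} readings, mean {summary['mean']:.2f}"

def act(summary, threshold=14.0):
    # Decide on an action based on the insight gained.
    return "investigate" if summary["mean"] > threshold else "no action"

summary = analyse(prepare(acquire()))
print(report(summary))
print(act(summary))
```

Each function consumes the output of the previous one, which mirrors how the steps feed into each other in practice.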
Give 5 graph types for visualizing data
Heat map, histogram, box plot, line graph, scatter plot
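As a rough illustration of what two of these charts summarize, the sketch below computes the bin counts behind a histogram and the quartiles behind a box plot, using only the Python standard library. The sample `values` and the `bin_width` are hypothetical; a plotting library would be needed to draw the actual charts.

```python
from collections import Counter
from statistics import quantiles

values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]  # hypothetical integer sample

# Histogram: count how many values fall into each bin of width 2.
# Each key is the lower edge of a bin (0, 2, 4, ...).
bin_width = 2
histogram = Counter((v // bin_width) * bin_width for v in values)

# Box plot: the box spans the first to third quartile, with a line
# at the median; the whiskers are drawn from the data's spread.
q1, median, q3 = quantiles(values, n=4)

# Print a text version of the histogram, one row per bin.
for start in sorted(histogram):
    print(f"{start}-{start + bin_width - 1}: {'#' * histogram[start]}")
print("quartiles:", q1, median, q3)
```

A histogram shows the shape of the distribution, while a box plot condenses it to a few summary values and makes outliers such as the 9 easier to spot.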