DATA Flashcards
(25 cards)
DATA ARCHITECTURE
Standards that govern which data is collected, and how it is stored, arranged, integrated, and put to use in data
systems and in organizations.
data scientist ( 🧐 to find insight and deals with…)
Performs exploratory analysis to discover insights from data. Deals with an enormous mass of structured and unstructured data, using skills in math, statistics, programming, machine learning, etc.
data engineers 🏛
Develops, constructs, tests & maintains the complete architecture of large-scale processing systems.
data analyst
Takes data and uses it to help companies make better business decisions:
- Analyzes the data and translates it into plain English.
- The results are used by upper management to make business decisions.
DATA LAKE:
Is a storage repository that holds a vast amount of raw data in its native format until it is needed.
DATA PROCESSING: (us des)
Is the conversion of data into a usable and desired form.
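As a minimal sketch of what "conversion into a usable form" can look like, the snippet below cleans a few raw text rows into typed records. The sample rows, the `process` function, and the rule of skipping non-numeric ages are all illustrative assumptions, not part of the card.

```python
# Hypothetical raw rows, as they might arrive from a form or a log file.
raw_rows = ["  Alice , 34 ", "BOB,  n/a", "carol,29"]


def process(rows):
    """Convert raw 'name,age' strings into clean records, skipping rows with a non-numeric age."""
    cleaned = []
    for row in rows:
        name, age = (part.strip() for part in row.split(","))
        if age.isdigit():
            cleaned.append({"name": name.title(), "age": int(age)})
    return cleaned


records = process(raw_rows)
print(records)
```

The raw strings are hard to query or compute on; the resulting list of dictionaries has consistent names and integer ages.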
APACHE Hadoop:
Is an open-source framework that is used to efficiently store and process large datasets.
APACHE Spark:
Is a data processing framework that can quickly perform processing tasks on very large data sets
and can also distribute data processing tasks across multiple computers.
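The idea of splitting one job into tasks that run independently and are then combined can be sketched in plain Python. This is not Spark's API: the `partition` and `sum_of_squares` helpers are hypothetical, and a local thread pool only stands in for the multiple computers a real cluster would use.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, n_parts):
    """Split data into n_parts chunks by taking every n-th element."""
    return [data[i::n_parts] for i in range(n_parts)]


def sum_of_squares(chunk):
    """The task applied to each chunk independently."""
    return sum(x * x for x in chunk)


numbers = list(range(1, 101))
chunks = partition(numbers, 4)

# Each worker handles one chunk; here the workers are local threads (an assumption for illustration).
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(sum_of_squares, chunks))

# Combine the partial results into the final answer.
total = sum(partials)
print(total)
```

Because each chunk is processed without reference to the others, the same pattern scales from threads on one machine to workers on many machines.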
APACHE Hive:
Is open-source data warehouse software for reading, writing, and managing large data set files
that are stored directly in either the Apache Hadoop Distributed File System (HDFS) or other data storage systems
such as Apache HBase.
SQL:
Is a domain-specific language used in programming and designed for managing data held in a relational
database management system, or for stream processing in a relational data stream management system.
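A short sketch of SQL managing data in a relational database, using Python's built-in `sqlite3` module with an in-memory database. The `sales` table and its rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Define a table, insert rows, and query them, all in SQL.
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 50.0)],
)
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
conn.close()

print(totals)
```

The `CREATE TABLE`, `INSERT`, and `SELECT ... GROUP BY` statements are the SQL; Python is only the host language that sends them to the database.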
NoSQL
NoSQL databases (aka "not only SQL") store data differently than relational tables. They provide flexible schemas and scale easily with large amounts of data and high user loads.
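To illustrate what a "flexible schema" means, here is a toy document collection in plain Python. It does not use any real NoSQL database; the `collection` data and the `find` helper are hypothetical stand-ins for a document store and its query function.

```python
import json

# Two documents in the same collection with different fields: no fixed schema is enforced.
collection = [
    {"_id": 1, "name": "Alice", "email": "alice@example.com"},
    {"_id": 2, "name": "Bob", "tags": ["admin", "beta"], "address": {"city": "Lima"}},
]


def find(docs, **criteria):
    """Return the documents whose fields match every key=value pair given."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


matches = find(collection, name="Bob")
print(json.dumps(matches, indent=2))
```

In a relational table, adding `tags` or a nested `address` would require changing the table definition; here each document simply carries the fields it needs.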
DATA WAREHOUSE:
Data warehousing (DW) is the process of collecting and managing data from varied sources to
provide meaningful business insights.
SOCIAL DATA
IS THE INFORMATION ABOUT YOU, SUCH AS YOUR
MOVEMENTS, BEHAVIOR, AND INTERESTS, AS WELL
AS INFORMATION ABOUT YOUR RELATIONSHIPS
WITH OTHER PEOPLE, PLACES, PRODUCTS, EVEN
IDEOLOGIES.
BIG DATA is a phrase that refers to a
massive volume of both structured and
unstructured data that is so large it is difficult to process using
traditional database and software techniques.
BIG DATA has the potential to help
companies improve operations and make
faster, more intelligent decisions.
DATA SCIENCE IS A (🔨 💹 🥅🏘 in data)
blend of various tools, algorithms, and machine
learning principles with the goal of discovering hidden patterns
in raw data.
DATA ANALYTICS IS THE SCIENCE OF
examining
raw data with the purpose of drawing
conclusions about that information.
BIG DATA-> 4V
Volume
Velocity
Variety
Veracity
BIG DATA PROFESSIONAL (🃏w/ , from, at high )
Deals with huge amounts of heterogeneous data, gathered from various sources and arriving at high velocity.
TYPES OF DATA
STRUCTURED, UNSTRUCTURED, QUALITATIVE, AND QUANTITATIVE
QUALITATIVE DATA
Qualitative data is descriptive and conceptual. Qualitative data can be categorized based on
traits and characteristics.
✓ Is non-statistical and is typically unstructured or semi-structured in nature.
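As a small sketch of categorizing qualitative data by trait, the snippet below groups descriptive labels and counts how often each appears. The survey responses are invented for illustration.

```python
from collections import Counter

# Hypothetical descriptive labels assigned to open-ended survey answers.
responses = ["friendly staff", "slow service", "friendly staff", "clean rooms"]

# Group identical labels and count how often each category occurs.
counts = Counter(responses)
top_label, top_count = counts.most_common(1)[0]
print(top_label, top_count)
```

The labels themselves are not numbers; only the act of grouping them produces a count.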
QUANTITATIVE DATA
can be counted, measured, and expressed using numbers.
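Because quantitative data is numeric, it can be summarized directly with arithmetic. A minimal sketch using Python's standard `statistics` module, with made-up height measurements:

```python
import statistics

# Hypothetical measurements in centimeters.
heights_cm = [160, 172, 168, 181, 175]

count = len(heights_cm)                        # counted
mean_height = statistics.mean(heights_cm)      # measured values averaged
median_height = statistics.median(heights_cm)  # middle value when sorted
print(count, mean_height, median_height)
```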
STRUCTURED DATA
is highly organized and formatted so that it is easily searchable in
relational databases.
UNSTRUCTURED DATA
has no pre-defined format or organization, making it much more
difficult to collect, process, and analyze.
UNSTRUCTURED
• Is most often categorized as qualitative data, and it cannot easily be processed and
analyzed using conventional tools and methods (video, audio, mobile activity, etc.).