Introduction to Big Data Management Flashcards

(16 cards)

1
Q

What is the primary challenge of Big Data that Veracity addresses?

A

Ensuring data authenticity and reliability

Veracity refers to the accuracy and trustworthiness of data.
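
In practice, veracity is often enforced with validation rules that reject incomplete, implausible, or malformed records before analysis. The sketch below assumes records are Python dicts with hypothetical fields `customer_id`, `age`, and `email`. The rules are illustrative, not a standard.

```python
import re

# Hypothetical rule: an email needs one "@" and a dot in the domain part.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def is_trustworthy(record: dict) -> bool:
    """Return True if a record passes completeness, plausibility, and format checks."""
    # Completeness: the ID must be present and non-empty.
    if not record.get("customer_id"):
        return False
    # Plausibility: age must be an integer in a realistic range.
    age = record.get("age")
    if not isinstance(age, int) or not 0 <= age <= 120:
        return False
    # Format: the email must look like an address.
    email = record.get("email") or ""
    return EMAIL_PATTERN.match(email) is not None


def split_by_veracity(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Separate records into (clean, rejected) lists."""
    clean = [r for r in records if is_trustworthy(r)]
    rejected = [r for r in records if not is_trustworthy(r)]
    return clean, rejected


records = [
    {"customer_id": "C1", "age": 34, "email": "ana@example.com"},
    {"customer_id": "C2", "age": -5, "email": "bo@example.com"},  # implausible age
    {"customer_id": "", "age": 40, "email": "cy@example.com"},    # missing ID
    {"customer_id": "C4", "age": 29, "email": "not-an-email"},    # malformed email
]
clean, rejected = split_by_veracity(records)
print(len(clean), len(rejected))  # 1 3
```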

2
Q

Which of the following best describes ‘Variety’ in Big Data?

A

The different formats and types of data

Variety encompasses structured, semi-structured, and unstructured data.
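
A minimal sketch of the three kinds of variety, assuming a hypothetical orders dataset: structured CSV and semi-structured JSON can be parsed into one common shape, while unstructured text first needs some extraction step (here, only a naive word split).

```python
import csv
import io
import json

# Structured: fixed columns, e.g. a CSV export of a relational table.
structured = "order_id,amount\n1001,25.50\n1002,12.00\n"
# Semi-structured: self-describing keys and optional fields (JSON).
semi_structured = '{"order_id": 1003, "amount": 40.0, "tags": ["gift"]}'
# Unstructured: free text with no schema, e.g. a customer review.
unstructured = "Great service, the delivery arrived early!"

# Parse the CSV rows and convert the string fields to numbers.
rows = list(csv.DictReader(io.StringIO(structured)))
orders = [{"order_id": int(r["order_id"]), "amount": float(r["amount"])} for r in rows]

# Parse the JSON document and keep only the fields shared with the CSV schema.
doc = json.loads(semi_structured)
orders.append({"order_id": doc["order_id"], "amount": doc["amount"]})

# Unstructured text needs extraction before analysis (a naive tokenizer here).
words = unstructured.lower().replace(",", "").replace("!", "").split()

print(orders)
print(len(words))  # 6
```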

3
Q

Which of the following is an example of Big Data Analytics?

A

Predicting customer preferences using machine learning

Using machine learning on historical customer behavior to predict future preferences is a typical application of advanced analytics to large datasets.
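
One simple machine learning technique for predicting preferences is k-nearest-neighbour collaborative filtering: find past customers who behaved similarly and average what they did. The sketch below is a toy version on a hypothetical purchase table; the names, items, and `k=2` are assumptions, and real recommender systems use far more data and dedicated libraries.

```python
import math

# Hypothetical purchase history: 1 = bought the item, 0 = did not.
history = {
    "alice": {"laptop": 1, "mouse": 1, "desk": 0, "monitor": 1},
    "bob":   {"laptop": 1, "mouse": 1, "desk": 0, "monitor": 1},
    "cara":  {"laptop": 0, "mouse": 0, "desk": 1, "monitor": 0},
}


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two preference vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict_interest(customer: dict, item: str, k: int = 2) -> float:
    """k-nearest-neighbour estimate of how likely `customer` is to want `item`."""
    known = [i for i in customer if i != item]
    target = [customer[i] for i in known]
    # Score every past customer by similarity on the items we know about.
    scored = [(cosine(target, [prefs[i] for i in known]), prefs[item])
              for prefs in history.values()]
    neighbours = sorted(scored, reverse=True)[:k]
    weight = sum(sim for sim, _ in neighbours)
    # Similarity-weighted average of the neighbours' behaviour for `item`.
    return sum(sim * bought for sim, bought in neighbours) / weight if weight else 0.0


print(predict_interest({"laptop": 1, "mouse": 1, "desk": 0}, "monitor"))  # close to 1.0
print(predict_interest({"laptop": 0, "mouse": 0, "desk": 1}, "monitor"))  # 0.0
```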

4
Q

What framework is widely used for distributed storage and processing of Big Data?

A

Hadoop

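Hadoop is an open-source Apache framework that stores data across clusters of commodity machines (HDFS) and processes it in parallel (MapReduce), so storage and computation both scale by adding nodes.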
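
Hadoop's classic processing model is MapReduce: mappers emit key/value pairs, the framework groups them by key (shuffle), and reducers combine each group. The sketch below simulates that model for a word count in a single Python process; it does not use Hadoop itself, and the input "blocks" are hypothetical stand-ins for file splits on HDFS.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Mapper: emit (word, 1) for every word in one input line.
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group every value emitted for the same key.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reducer: combine the grouped values for one key.
    return key, sum(values)


# Each string stands in for one block of a file split across the cluster.
blocks = ["big data needs big tools", "hadoop stores big data"]
mapped = (pair for block in blocks for pair in map_phase(block))
counts = dict(reduce_phase(k, v) for k, v in shuffle_phase(mapped).items())
print(counts["big"], counts["data"])  # 3 2
```

On a real cluster the map and reduce functions run on many machines at once, which is where the scalability comes from; the logic per key stays this simple.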

5
Q

How does ‘Smart Data’ differ from ‘Big Data’?

A

Big Data is raw and massive, while Smart Data is processed, meaningful, and actionable

Smart Data refers to data that has been analyzed and is ready for decision-making.
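
To make the contrast concrete, here is a minimal sketch assuming a hypothetical stream of raw (product, action) events: the raw events are Big Data, and the per-product conversion rates plus a flagged list are the Smart Data a manager can act on. The 0.25 threshold is an arbitrary assumption.

```python
from collections import Counter

# Raw "Big Data": individual events with hypothetical fields (product, action).
events = [
    ("shoes", "view"), ("shoes", "view"), ("shoes", "purchase"),
    ("hat", "view"), ("hat", "view"), ("hat", "view"), ("hat", "view"),
    ("bag", "view"), ("bag", "purchase"),
]

views = Counter(product for product, action in events if action == "view")
purchases = Counter(product for product, action in events if action == "purchase")

# "Smart Data": a compact, interpreted summary that supports a decision.
conversion = {product: purchases[product] / views[product] for product in views}
# Hypothetical business rule: flag products converting below 25%.
needs_attention = sorted(p for p, rate in conversion.items() if rate < 0.25)

print(conversion)       # {'shoes': 0.5, 'hat': 0.0, 'bag': 1.0}
print(needs_attention)  # ['hat']
```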

6
Q

Define Big Data and explain how it differs from traditional data.

A

Big Data is large-scale, high-speed, and diverse data that requires advanced tools for storage, processing, and analytics. It differs from traditional data in its scale, complexity, and need for real-time analysis.

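Traditional data typically consists of smaller, mostly structured datasets that fit in a single database and can be processed with conventional tools such as spreadsheets and relational databases.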

7
Q

List the 5Vs of Big Data.

A
  • Volume
  • Velocity
  • Variety
  • Veracity
  • Value

Each ‘V’ represents a crucial aspect of Big Data that needs to be managed.

8
Q

Identify three sources of Big Data and explain their significance.

A
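  • Social media: posts, images, and interactions that reveal customer sentiment and emerging trends
  • IoT devices: continuous sensor streams used for real-time monitoring and predictive maintenance
  • E-commerce: transactions and clickstreams that inform recommendations, pricing, and inventory

Each source produces large, fast-moving, and varied data that can be analyzed for insights and decision-making.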

9
Q

What is Big Data Management, and why is it important?

A

The process of formatting, storing, integrating, and retrieving Big Data efficiently. It ensures data is clean, structured, and ready for analysis.

Effective management is critical for making reliable business decisions.
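
The management steps named above (formatting, storing, integrating, retrieving) can be sketched with Python's built-in `sqlite3` module as a small stand-in for a real Big Data store. The two sources, their inconsistent ID formats, and the table names are hypothetical.

```python
import sqlite3

# Two hypothetical sources with inconsistent formatting and a duplicate.
crm = [(" C1 ", "Ana Lopez"), ("c2", "Bo Chen"), ("C1", "Ana Lopez")]
sales = [("C1", 120.0), ("C2", 80.0), ("C2", 20.0)]


def normalize_id(raw: str) -> str:
    # Formatting: trim whitespace and upper-case IDs so the sources agree.
    return raw.strip().upper()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id TEXT PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE sales (customer_id TEXT, amount REAL)")
# Storing: INSERT OR IGNORE drops duplicate customers that collide on the key.
conn.executemany("INSERT OR IGNORE INTO customers VALUES (?, ?)",
                 [(normalize_id(i), n) for i, n in crm])
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [(normalize_id(i), a) for i, a in sales])
# An index speeds up retrieval of sales by customer.
conn.execute("CREATE INDEX idx_sales_customer ON sales(customer_id)")

# Integrating and retrieving: join both sources into one analysis-ready view.
rows = conn.execute(
    "SELECT c.id, c.name, SUM(s.amount) FROM customers c "
    "JOIN sales s ON s.customer_id = c.id "
    "GROUP BY c.id, c.name ORDER BY c.id"
).fetchall()
print(rows)  # [('C1', 'Ana Lopez', 120.0), ('C2', 'Bo Chen', 100.0)]
```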

10
Q

Compare and contrast statistical analysis with data science.

A
  • Statistical Analysis: Typically works with smaller samples and focuses on inference about a population
  • Data Science: Uses large datasets, employs machine learning to find patterns and predictions

Data science encompasses a broader range of techniques and tools compared to traditional statistics.
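
As an example of the inference side, a classic statistical task is estimating a population mean from a small sample with a confidence interval. The sketch assumes a hypothetical sample of 10 daily order values and hard-codes the two-sided 95% t critical value for 9 degrees of freedom (about 2.262), since the standard library has no t distribution.

```python
import math
import statistics

# Hypothetical sample: average order value on 10 randomly chosen days.
sample = [52, 48, 55, 50, 47, 53, 51, 49, 54, 50]

n = len(sample)
mean = statistics.mean(sample)
sd = statistics.stdev(sample)   # sample standard deviation (divides by n - 1)
t_crit = 2.262                  # t critical value: 95% two-sided, df = n - 1 = 9
margin = t_crit * sd / math.sqrt(n)
lower, upper = mean - margin, mean + margin

print(f"mean={mean:.1f}, 95% CI=({lower:.2f}, {upper:.2f})")
```

The output is a statement about the whole population with quantified uncertainty, which is the goal of inference; a data science workflow on the same data would more often aim at predicting individual outcomes.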

11
Q

Discuss the role of Big Data Analytics in business.

A

Big Data Analytics turns large volumes of business data into insights that guide decisions. Examples include customer retention and personalization, fraud detection, and marketing optimization.

Companies like Netflix and Amazon use analytics to enhance user experiences and drive sales.
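
Fraud detection often starts from flagging transactions that deviate sharply from a customer's normal behavior. The sketch below uses a simple z-score rule on a hypothetical transaction history; the 3.0 threshold is an assumption, and this is not how any particular company implements fraud detection.

```python
import statistics

# Hypothetical past transaction amounts for one customer.
history = [42.0, 38.5, 45.0, 40.0, 39.5, 41.0, 43.5, 44.0]


def is_suspicious(amount: float, past: list[float], z_threshold: float = 3.0) -> bool:
    """Flag an amount more than `z_threshold` standard deviations from the past mean."""
    mean = statistics.mean(past)
    sd = statistics.stdev(past)
    return abs(amount - mean) / sd > z_threshold


print(is_suspicious(44.0, history), is_suspicious(400.0, history))  # False True
```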

12
Q

Explain the Data Science Process and its four key stages.

A
  • Data Engineering
  • Data Analysis
  • Predictive Modeling
  • Deployment

These stages outline the workflow from data collection to actionable insights.
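
The four stages can be sketched as four small functions chained together. Everything here is a hypothetical stand-in: the `(ad_spend, sales)` data, a least-squares line as the predictive model, and a returned function as "deployment" in place of a real serving system.

```python
import statistics

# Hypothetical raw data: (ad_spend, sales) as strings, one row incomplete.
raw = [("10", "25"), ("20", "45"), ("", "30"), ("30", "65"), ("40", "85")]


def engineer(rows):
    """Stage 1, data engineering: drop incomplete rows and convert types."""
    return [(float(x), float(y)) for x, y in rows if x and y]


def analyze(data):
    """Stage 2, data analysis: summarize the cleaned data before modeling."""
    xs, ys = zip(*data)
    return {"n": len(data), "mean_x": statistics.mean(xs), "mean_y": statistics.mean(ys)}


def fit(data):
    """Stage 3, predictive modeling: least-squares line y = a + b * x."""
    xs, ys = zip(*data)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    b = sum((x - mx) * (y - my) for x, y in data) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def deploy(model):
    """Stage 4, deployment: expose the fitted model as a prediction function."""
    a, b = model
    return lambda x: a + b * x


data = engineer(raw)
summary = analyze(data)
predict = deploy(fit(data))
print(summary, predict(50))  # {'n': 4, 'mean_x': 25.0, 'mean_y': 55.0} 105.0
```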

13
Q

Compare Machine Learning and Traditional Statistics.

A
  • Machine Learning: Focuses on pattern recognition, handles large data, computationally expensive
  • Statistics: Focuses on inference, smaller datasets, mathematically principled

Machine learning is often preferred for complex and high-dimensional data.

14
Q

Fill in the blank: The Big Data Pipeline consists of _______.

A

Data Collection → Storage (Data Lakes, Hadoop) → Processing (Spark, SQL) → Analytics (ML, AI) → Actionable Insights

This pipeline illustrates the flow of data from generation to actionable insights.
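
A minimal sketch of the pipeline as chained Python stages, assuming a hypothetical IoT temperature stream. A plain list stands in for the data lake, and a parsing generator stands in for Spark or SQL processing; the 30.0 alert threshold is an assumption.

```python
from collections import defaultdict


def collect():
    """Collection: yield raw sensor readings (hypothetical IoT stream)."""
    yield from ["s1,21.5", "s2,35.0", "s1,22.0", "s2,bad", "s2,36.5"]


def store(stream, lake):
    """Storage: append raw records unchanged, as a data lake would."""
    for record in stream:
        lake.append(record)
    return lake


def process(lake):
    """Processing: parse records and skip malformed ones."""
    for record in lake:
        sensor, value = record.split(",")
        try:
            reading = float(value)
        except ValueError:
            continue
        yield sensor, reading


def analyze(rows):
    """Analytics: average reading per sensor."""
    totals = defaultdict(lambda: [0.0, 0])
    for sensor, value in rows:
        totals[sensor][0] += value
        totals[sensor][1] += 1
    return {s: total / count for s, (total, count) in totals.items()}


def insights(averages, threshold=30.0):
    """Actionable insights: sensors whose average exceeds the threshold."""
    return [s for s, avg in sorted(averages.items()) if avg > threshold]


lake = store(collect(), [])
averages = analyze(process(lake))
print(averages, insights(averages))  # {'s1': 21.75, 's2': 35.75} ['s2']
```

Keeping raw data untouched in storage and cleaning it only during processing mirrors the data lake approach: the original records remain available if the processing rules change later.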

15
Q

What are the key differences between Machine Learning and Statistics?

A
  • Machine Learning: Focus on predictions, handles large data
  • Statistics: Focus on inference, smaller datasets

Machine learning techniques are increasingly applied in various fields, including finance and healthcare.

16
Q

What study tips are suggested for achieving a 90%+ score?

A
  • Revise the 5Vs, Big Data Pipeline, and ML vs. Statistics
  • Practice short-answer questions and diagrams for quick recall
  • Apply real-world examples such as Netflix, Amazon, and healthcare AI
  • Use flashcards for memorization
  • Get hands-on with Hadoop, Spark, Python, SQL
  • Understand AI and ML applications in industry

These strategies enhance retention and understanding of complex concepts.