Introduction to Big Data Management Flashcards
(16 cards)
What is the primary challenge of Big Data that Veracity addresses?
Ensuring data authenticity and reliability
Veracity refers to the accuracy and trustworthiness of data.
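A minimal sketch of what a veracity check can look like in practice, assuming hypothetical sensor records with the fields sensor_id and temperature_c and an illustrative plausible range of -50 to 60 degrees Celsius. None of these names or thresholds come from the cards; real pipelines define their own rules.

```python
def is_trustworthy(record):
    """Return True if a reading passes basic completeness and range checks."""
    temp = record.get("temperature_c")  # hypothetical field name
    return (
        record.get("sensor_id") is not None
        and isinstance(temp, (int, float))
        and -50.0 <= temp <= 60.0  # assumed plausible range
    )


def split_by_veracity(records):
    """Separate records into trusted and rejected lists."""
    trusted = [r for r in records if is_trustworthy(r)]
    rejected = [r for r in records if not is_trustworthy(r)]
    return trusted, rejected


if __name__ == "__main__":
    readings = [
        {"sensor_id": "s1", "temperature_c": 21.5},
        {"sensor_id": None, "temperature_c": 20.0},   # missing source
        {"sensor_id": "s2", "temperature_c": 999.0},  # implausible value
        {"sensor_id": "s3"},                          # missing measurement
    ]
    trusted, rejected = split_by_veracity(readings)
    print(len(trusted), "trusted,", len(rejected), "rejected")
```

Only the first reading passes here; the other three are set aside rather than silently analyzed.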
Which of the following best describes ‘Variety’ in Big Data?
The different formats and types of data
Variety encompasses structured, semi-structured, and unstructured data.
Which of the following is an example of Big Data Analytics?
Predicting customer preferences using machine learning
This illustrates the application of advanced analytics techniques.
What framework is widely used for distributed storage and processing of Big Data?
Hadoop
Hadoop is an open-source framework that stores data across a cluster of machines (HDFS) and processes it in parallel (MapReduce), so capacity scales by adding nodes.
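To make the processing model concrete, here is a single-machine sketch of the map, shuffle, and reduce steps that Hadoop's MapReduce model distributes across a cluster. It is plain Python and does not use any Hadoop API; the function names and sample documents are assumptions chosen for illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for each word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over all documents."""
    pairs = []
    for doc in documents:
        pairs.extend(map_phase(doc))
    return reduce_phase(shuffle(pairs))


if __name__ == "__main__":
    docs = ["big data big insights", "data drives decisions"]
    print(word_count(docs))
```

On a real cluster each document (or file block) would be mapped on a different node, which is why the map step is written to look at one document at a time.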
How does ‘Smart Data’ differ from ‘Big Data’?
Big Data is raw and massive, while Smart Data is processed, meaningful, and actionable
Smart Data refers to data that has been analyzed and is ready for decision-making.
Define Big Data and explain how it differs from traditional data.
Big Data is large-scale, high-speed, and diverse data that requires advanced tools for storage, processing, and analytics. It differs from traditional data in its scale, complexity, and need for real-time analysis.
Traditional data often involves smaller datasets that can be processed with basic tools.
List the 5Vs of Big Data.
- Volume
- Velocity
- Variety
- Veracity
- Value
Each ‘V’ represents a crucial aspect of Big Data that needs to be managed.
Identify three sources of Big Data and explain their significance.
- Social media
- IoT devices
- E-commerce
These sources generate vast amounts of data that can be analyzed for insights and decision-making.
What is Big Data Management, and why is it important?
The process of formatting, storing, integrating, and retrieving Big Data efficiently. It ensures data is clean, structured, and ready for analysis.
Effective management is critical for making reliable business decisions.
Compare and contrast statistical analysis with data science.
- Statistical Analysis: Works from small samples, focuses on inference
- Data Science: Uses large datasets, employs machine learning to find patterns and make predictions
Data science encompasses a broader range of techniques and tools compared to traditional statistics.
Discuss the role of Big Data Analytics in business.
Big Data Analytics turns large datasets into business decisions. Examples include customer retention and personalization, fraud detection, and marketing optimization.
Companies like Netflix and Amazon utilize analytics to enhance user experiences and drive sales.
Explain the Data Science Process and its four key stages.
- Data Engineering
- Data Analysis
- Predictive Modeling
- Deployment
These stages outline the workflow from data collection to actionable insights.
Compare Machine Learning and Traditional Statistics.
- Machine Learning: Focuses on pattern recognition, handles large datasets, computationally expensive
- Statistics: Focuses on inference, smaller datasets, mathematically principled
Machine learning is often preferred for complex and high-dimensional data.
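The contrast between inference and prediction can be sketched with two small functions. The first estimates a population mean and its standard error from a sample (an inference task); the second predicts a label for a new point from the closest known example (a simple pattern-recognition task). The 1-nearest-neighbor rule and the sample values are illustrative choices, not something the cards prescribe.

```python
from math import sqrt
from statistics import mean, stdev


def mean_with_standard_error(sample):
    """Inference: estimate a population mean and its standard error."""
    return mean(sample), stdev(sample) / sqrt(len(sample))


def nearest_neighbor_predict(train, x):
    """Prediction: return the label of the training point closest to x."""
    _, closest_label = min(train, key=lambda pair: abs(pair[0] - x))
    return closest_label


if __name__ == "__main__":
    estimate, std_err = mean_with_standard_error([4.0, 5.0, 6.0])
    print("estimated mean:", estimate, "standard error:", round(std_err, 3))

    # Hypothetical labeled examples: (feature value, label)
    train = [(1.0, "low"), (2.0, "low"), (8.0, "high"), (9.0, "high")]
    print("prediction for 7.2:", nearest_neighbor_predict(train, 7.2))
```

The first function says something about the population with a measure of uncertainty; the second says nothing about the population and only tries to label new inputs correctly.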
Fill in the blank: The Big Data Pipeline consists of _______.
Data Collection → Storage (Data Lakes, Hadoop) → Processing (Spark, SQL) → Analytics (ML, AI) → Actionable Insights
This pipeline illustrates the flow of data from generation to actionable insights.
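The stages of this pipeline can be sketched as one function per stage. This is a toy, in-memory version: a Python list stands in for the data lake, a loop stands in for Spark or SQL, and a simple above-average rule stands in for an ML model. The purchase events and all function names are hypothetical.

```python
from statistics import mean


def collect():
    """Collection: hypothetical raw purchase events."""
    return [
        {"user": "a", "amount": 20.0},
        {"user": "b", "amount": 35.0},
        {"user": "a", "amount": 45.0},
        {"user": "c", "amount": None},  # incomplete record
    ]


def store(events):
    """Storage: a list stands in for a data lake or HDFS."""
    return list(events)


def process(stored):
    """Processing: drop incomplete records and total spend per user."""
    totals = {}
    for event in stored:
        if event["amount"] is None:
            continue
        totals[event["user"]] = totals.get(event["user"], 0.0) + event["amount"]
    return totals


def analyze(totals):
    """Analytics: flag users whose spend is above the mean (a rule, not ML)."""
    average = mean(totals.values())
    return sorted(user for user, total in totals.items() if total > average)


def run_pipeline():
    """Chain the stages: collection, storage, processing, analytics."""
    return analyze(process(store(collect())))


if __name__ == "__main__":
    print("high-value users:", run_pipeline())
```

The final list of high-value users is the actionable insight: it is small, cleaned, and directly usable for a decision such as a targeted offer.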
What are the key differences between Machine Learning and Statistics?
- Machine Learning: Focus on predictions, handles large datasets
- Statistics: Focus on inference, smaller datasets
Machine learning techniques are increasingly applied in various fields, including finance and healthcare.
What study tips are suggested for achieving a 90%+ score?
- Revise the 5Vs, Big Data Pipeline, and ML vs. Statistics
- Practice short-answer questions and diagrams for quick recall
- Apply real-world examples like Netflix, Amazon, Healthcare AI
- Use flashcards for memorization
- Get hands-on with Hadoop, Spark, Python, SQL
- Understand AI and ML applications in industry
These strategies enhance retention and understanding of complex concepts.