geelecds_notes Flashcards
What is Data Science?
A multi-disciplinary field that uses scientific methods, algorithms, and systems to extract knowledge and insights from data.
Data Science involves data gathering, analysis, and decision-making, impacting various daily activities.
What are the core elements of Data Science?
- Computational - Algorithmic methods and code.
- Statistical - Statistical inference for predictions.
- Real-world Problems - Solving real-world issues rather than working only with theoretical models.
What is the equation that represents the components of Data Science?
Data Science = Statistics + Data Collection + Data Preprocessing + Machine Learning + Visualization + Business Insights + Scientific Hypotheses + Big Data.
In which industries is Data Science commonly used?
- Banking
- Consultancy
- Healthcare
- Manufacturing.
What are some applications of Data Science in transport?
- Route planning
- Predictive analysis for delays
- Driverless cars.
What skills are required for a Data Scientist?
- Machine Learning
- Statistics
- Programming (Python or R)
- Mathematics
- Databases.
Fill in the blank: Data Science helps companies make _______.
[better decisions, predictive analysis, pattern discoveries].
What was one of the key milestones in the development of Data Science in 1962?
John Tukey’s influential paper, ‘The Future of Data Analysis,’ which shifted the focus to a more exploratory approach in data analysis.
What is Apache Spark?
An open-source data processing and analytics engine capable of handling large datasets, known for its fast data processing.
Originally developed as a faster alternative to MapReduce for Hadoop clusters.
What is D3.js used for?
Creating custom data visualizations in the web browser using web standards like HTML, SVG, and CSS.
What is IBM SPSS?
A suite of software tools for managing and analyzing complex statistical data.
Components include SPSS Statistics for statistical analysis and SPSS Modeler for predictive analytics.
What are the key features of the Julia programming language?
- High-performance for numerical computing
- Combines simplicity with C/Java-like performance
- Supports multiple dispatch for fast execution.
What is the purpose of Jupyter Notebook?
To give data scientists an open-source, web-based application for interactive computing and collaboration, with support for multiple programming languages.
What is Keras?
A high-level deep learning API designed for easy experimentation with neural networks.
What is the MATLAB programming language used for?
Numerical computing and data visualization, supporting machine learning and predictive modeling.
What does Matplotlib do?
It is a Python plotting library for creating static, animated, and interactive visualizations.
What was the significance of the term ‘data scientist’ in 2008?
It became a buzzword, popularized by DJ Patil of LinkedIn and Jeff Hammerbacher of Facebook.
True or False: Data Science is confined to one discipline.
False.
What has been a recent trend in Data Science programming?
A shift toward conservative programming with a focus on simpler, less risky algorithms.
What is a major use of Data Science in healthcare?
- Detecting tumors
- Drug discoveries
- Medical image analysis.
Fill in the blank: Data Science is integral to business and academic research, encompassing areas like _______.
[machine translation, robotics, speech recognition, digital economy].
What does predictive modeling in finance allow companies to do?
Predict customer lifetime value and stock market moves.
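As a rough illustration of what "predictive modeling" can mean here, the sketch below fits a straight line to past customer spend and extrapolates it. It assumes a plain least-squares fit, which is only one of many possible techniques. The function name `fit_line` and the spend figures are made up for this example and do not come from the notes.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x, and how x and y vary together
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: months since sign-up and cumulative spend per customer
months = [1, 2, 3, 4, 5]
cumulative_spend = [20.0, 40.0, 60.0, 80.0, 100.0]

slope, intercept = fit_line(months, cumulative_spend)

# Extrapolate the fitted line to month 12 as a crude lifetime-value estimate
predicted_spend_month_12 = slope * 12 + intercept
print(round(predicted_spend_month_12, 2))
```

Real lifetime-value and market models are far more involved, but the pattern is the same: learn a relationship from past data, then apply it to cases not yet observed.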
What is the purpose of the autocomplete feature, an application of Data Science?
To complete user input based on previously typed text.
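A minimal sketch of the idea, assuming autocomplete simply suggests previously typed phrases that start with the current input, with the most frequently typed first. The names `build_history` and `autocomplete` and the sample history are hypothetical. Production systems typically use more elaborate models.

```python
from collections import Counter


def build_history(previous_inputs):
    """Count how often each phrase was typed before (case-insensitive)."""
    return Counter(text.strip().lower() for text in previous_inputs if text.strip())


def autocomplete(prefix, history, limit=3):
    """Return up to `limit` earlier phrases that start with `prefix`,
    most frequently typed first; ties are broken alphabetically."""
    prefix = prefix.strip().lower()
    if not prefix:
        return []
    matches = [(phrase, count) for phrase, count in history.items()
               if phrase.startswith(prefix)]
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [phrase for phrase, _ in matches[:limit]]


# Hypothetical search history, made up for illustration
history = build_history([
    "data science",
    "data science",
    "data visualization",
    "database design",
    "deep learning",
])

suggestions = autocomplete("data", history)
print(suggestions)
```

Ranking by frequency is a design choice made for this sketch; recency or a learned language model could be used instead.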
What is the significance of the Knowledge Discovery in Databases workshop started in 1989?
It played a key role in the evolution of data science, later growing into the annual ACM SIGKDD Conference on Knowledge Discovery and Data Mining.