C1 : What is Data Science? Flashcards
Understand introductory concepts.
What is MLOps?
- Machine learning operations.
- Tools that provide ongoing monitoring of models and automated retraining of drifted models.
What is an Algorithm?
A set of step-by-step instructions to solve a problem or complete a task.
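As a small illustration, Euclid's method for finding the greatest common divisor is a classic step-by-step procedure. The function name below is chosen only for this sketch.

```python
def greatest_common_divisor(a: int, b: int) -> int:
    """Euclid's algorithm: repeat one simple step until the remainder is zero."""
    while b != 0:
        # Step: replace the pair (a, b) with (b, a mod b).
        a, b = b, a % b
    return abs(a)


print(greatest_common_divisor(48, 18))  # prints 6
```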
What is a Model?
A representation of the relationships and patterns found in data.
* They are useful for making predictions or when analyzing complex systems.
* They retain the essential elements of the data needed for analysis.
What’s an Outlier?
A data point that differs significantly from other observations, potentially indicating anomalies, errors, or unique phenomena that could impact statistical analysis or modeling.
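One common way to flag outliers, though not the only one, is the interquartile range (IQR) rule. The sketch below assumes the conventional 1.5 × IQR threshold; the function name and the sample readings are made up for illustration.

```python
import statistics


def find_outliers_iqr(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Illustrative data: six similar readings and one extreme value.
readings = [10, 12, 11, 13, 12, 11, 95]
print(find_outliers_iqr(readings))  # prints [95]
```

Whether a flagged value is an error or a genuine phenomenon still needs human judgment; the rule only points to candidates.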
What is Structured Data?
Data that is organized and formatted into a predictable schema, usually related tables with rows and columns.
What is Unstructured Data?
- Unorganized data that lacks a predefined data model.
- It is harder to analyze using traditional methods.
- This data type often includes text, images, videos, and other content that doesn’t fit neatly into rows and columns like structured data.
What does .CSV stand for?
Comma-separated values.
What does .XLSX stand for?
Microsoft Excel Open XML Spreadsheet.
What does .XML stand for?
Extensible Markup Language.
What does .PDF stand for?
Portable Document Format (developed by Adobe).
What does .JSON stand for?
JavaScript Object Notation.
What does .TSV stand for?
Tab-separated values.
What are some of the benefits of the .JSON file format?
- Language-independent data format.
- It is widely used for exchanging data between systems; binary content such as audio or video must first be encoded as text (for example, with Base64).
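A minimal sketch of why JSON travels well between programs: a Python structure is serialized to plain text and read back unchanged, using the standard `json` module. The record itself is invented for illustration.

```python
import json

# Illustrative record; any program with a JSON parser could read the text form.
record = {"name": "Ada", "scores": [91, 88], "active": True}

text = json.dumps(record)     # Python object -> JSON text
restored = json.loads(text)   # JSON text -> Python object

print(text)
print(restored == record)     # prints True
```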
What are some of the benefits of the .XLSX file format?
- XLSX uses an open file format (Office Open XML).
- It can use and save all functions available in Excel.
- It is considered more secure than macro-enabled formats because it cannot store VBA macros (those require .XLSM).
What are some of the benefits of the .XML file format?
- Readable by humans and machines.
- It is a self-descriptive language.
- It does not use predefined tags as HTML does.
- It is platform-independent.
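To illustrate the self-descriptive, author-defined tags, here is a short sketch that parses a small XML document with Python's standard `xml.etree.ElementTree` module. The tag names (`library`, `book`, `title`, `year`) are invented for this example, which is exactly what XML permits.

```python
import xml.etree.ElementTree as ET

# Illustrative document: the tags are defined by the author, not by a standard.
document = """<library>
  <book id="1"><title>Dune</title><year>1965</year></book>
  <book id="2"><title>Emma</title><year>1815</year></book>
</library>"""

root = ET.fromstring(document)
titles = [book.findtext("title") for book in root.findall("book")]
print(titles)  # prints ['Dune', 'Emma']
```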
What is a Data Visualization?
A visual way of representing data and its trends so that they are easily comprehensible.
What defines a Delimited Text File?
It is a plain text file where a specific character separates the data values.
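The sketch below shows the idea with Python's standard `csv` module: the same rows are stored once with commas and once with tabs, and only the delimiter passed to the reader changes. The helper name `parse_delimited` and the sample rows are assumptions made for this example.

```python
import csv
import io


def parse_delimited(text, delimiter):
    """Split delimited text into a list of rows, each a list of string values."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


csv_text = "name,age\nAda,36\n"    # comma-separated values
tsv_text = "name\tage\nAda\t36\n"  # tab-separated values

print(parse_delimited(csv_text, ","))   # prints [['name', 'age'], ['Ada', '36']]
print(parse_delimited(tsv_text, "\t"))  # prints [['name', 'age'], ['Ada', '36']]
```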
What is Hadoop?
An open-source framework designed to store and process large datasets across clusters of computers.
What are Jupyter Notebooks?
An IDE and type of computational notebook that allows researchers to create and share code, equations, visualizations, and explanatory text.
(Also known as Python notebooks.)
What is the Nearest Neighbor algorithm?
An algorithm that uses proximity to make classifications or predictions about how to group an individual data point.
Also known as KNN or k-NN.
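A minimal pure-Python sketch of the idea, assuming Euclidean distance and a simple majority vote among the k closest training points. The function name, the points, and the labels are all invented for illustration; real projects would normally use a library implementation.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, query, k=3):
    """Label the query point by majority vote among its k nearest neighbors."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Illustrative training data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["small", "small", "small", "large", "large", "large"]

print(knn_classify(points, labels, (1.1, 1.0)))  # prints small
```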
What is a Neural Network?
A computational model used in deep learning that mimics the structure and functioning of the human brain’s neural pathways. It takes an input, processes it using previous learning, and produces an output.
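The input, processing, and output stages can be sketched as a single forward pass through a tiny network with one hidden layer. The weights below are hypothetical numbers standing in for "previous learning"; a real network would obtain them through training, which this sketch does not show.

```python
import math


def sigmoid(x: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One forward pass: inputs -> hidden layer -> single output value."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


# Hypothetical weights and biases (not learned from any real data).
hidden_weights = [[0.5, -0.2], [0.3, 0.8]]
hidden_biases = [0.0, -0.1]
output_weights = [1.0, -1.0]
output_bias = 0.2

prediction = forward([1.0, 2.0], hidden_weights, hidden_biases, output_weights, output_bias)
print(prediction)  # a value between 0 and 1
```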
What is Pandas?
- An open-source Python library that provides tools for working with structured data.
- It is often used for data manipulation and analysis.
What is R?
An open-source programming language used for statistical computing, data analysis, and data visualization.
What is a recommendation engine?
A computer program that analyzes user input, such as behaviors or preferences, and makes personalized recommendations based on that analysis.
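One simple way to turn preferences into recommendations is to score unseen items by how similar their fans are to the target user. The sketch below assumes Jaccard similarity over sets of liked items; the function name, user names, and item names are made up, and production systems use far richer signals.

```python
def recommend(target_user, likes, top_n=2):
    """Suggest items liked by users whose tastes overlap with the target user's."""
    target_items = likes[target_user]
    scores = {}
    for other, items in likes.items():
        if other == target_user:
            continue
        union = target_items | items
        similarity = len(target_items & items) / len(union) if union else 0.0
        if similarity == 0.0:
            continue  # ignore users with no shared preferences
        for item in items - target_items:
            scores[item] = scores.get(item, 0.0) + similarity
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:top_n]


# Illustrative preference data.
likes = {
    "ana": {"A", "B", "C"},
    "ben": {"A", "B", "D"},
    "cy": {"E"},
}

print(recommend("ana", likes))  # prints ['D']
```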