Data Science Flashcards

(246 cards)

1
Q

What is Statistics?

A

The science of conducting studies to collect, organize, summarize, analyze, and draw conclusions from data.

2
Q

What are the two main types of Statistics?

A
  • Descriptive
  • Inferential
3
Q

What does Descriptive Statistics do?

A

Helps organize and summarize data using numbers and graphs to look for a pattern in the data set.

4
Q

What are the Measures of Central Tendency?

A
  • Mean
  • Median
  • Mode
5
Q

What are the Measures of Variability?

A
  • Standard Deviation
  • Variance
  • Range
6
Q

What is Inferential Statistics?

A

Making inferences or drawing conclusions about a population using sample data.

7
Q

What is a Population in statistics?

A

A collection, or set, of individuals or objects or events whose properties are to be analyzed.

8
Q

What is a Sample?

A

A subset of the population that should be representative of the population.

9
Q

What is a Variable?

A

A characteristic of each element of a population or sample.

10
Q

What does Data (singular) refer to?

A

The value of the variable associated with one element of a population or sample.

11
Q

What does Data (plural) refer to?

A

The set of values collected for the variable from each of the elements belonging to the sample.

12
Q

What is an Experiment in statistics?

A

A planned activity whose results yield a collection of data.

13
Q

What is a Parameter?

A

A numerical value summarizing all the data of an entire population.

14
Q

What is a Statistic?

A

A numerical value summarizing the sample data.

15
Q

What does Descriptive Statistics involve?

A

Organizing, summarizing, and presenting data in an informative way.

16
Q

What are the two types of data?

A
  • Categorical
  • Numerical
17
Q

What does Categorical Data represent?

A

Groups or categories.

18
Q

What does Numerical Data represent?

A

Numbers, divided into Discrete and Continuous.

19
Q

What is Discrete Data?

A

Data that can be counted; it takes distinct, separate values, usually a finite number of them.

20
Q

What is Continuous Data?

A

Data that can take any value within a range, so its possible values cannot be counted; it usually comes from measurement.

21
Q

What are the two qualitative levels of measurement?

A
  • Nominal
  • Ordinal
22
Q

What is Nominal data?

A

Categories that cannot be put in any order.

23
Q

What is Ordinal data?

A

Categories that can be ordered.

24
Q

What are the two quantitative levels of measurement?

A
  • Interval
  • Ratio
25
What distinguishes Ratio from Interval data?
Ratio data has a true zero (zero means none of the quantity), while interval data does not.
26
What is the formula for Mean?
x̄ = (Σx) / n
27
How do you calculate the Median?
Sort the data; the median is the middle value if n is odd, or the midpoint between the two middle values if n is even.
28
What is the Mode?
The most commonly occurring value in a data set.
29
What is Range?
The difference between the smallest and largest observations in the sample.
30
What is Variance?
A measure of dispersion in a data set, calculated as the average of squared deviations from the mean (dividing by n for a population, or by n - 1 for a sample).
31
What is Standard Deviation?
The square root of the variance, indicating concentration of data around the mean.
32
What does a small Standard Deviation indicate?
The data has little spread, with most points falling near the mean.
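The measures on cards 26 to 32 can be checked with Python's standard `statistics` module. This is a minimal sketch; the sample values are hypothetical and chosen only so the arithmetic is easy to follow. It uses the population forms (`pvariance`, `pstdev`), which divide by n.

```python
import statistics

# Hypothetical sample, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)            # x̄ = (Σx) / n
median = statistics.median(data)        # midpoint of the two middle values (n is even)
mode = statistics.mode(data)            # most frequently occurring value
value_range = max(data) - min(data)     # largest minus smallest observation
variance = statistics.pvariance(data)   # average squared deviation from the mean
std_dev = statistics.pstdev(data)       # square root of the variance

print(mean, median, mode, value_range, variance, std_dev)
```

For a sample rather than a whole population, `statistics.variance` and `statistics.stdev` divide by n - 1 instead.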
33
What is a Random Variable?
A variable whose possible values are outcomes of a random phenomenon.
34
What are Discrete Random Variables?
Variables that may take on only a countable number of distinct values.
35
What are Continuous Random Variables?
Variables that can take any value in an interval, so the possible values are uncountably infinite; usually measurements.
36
What is the definition of a probability p_i in the context of random variables?
P(X = x_i) = p_i. ## Footnote The probabilities must satisfy: 0 <= p_i <= 1 for each i and p_1 + p_2 + ... + p_k = 1.
37
What is a continuous random variable?
A variable that can take any value in an interval (uncountably many possible values), often represented by measurements. ## Footnote Examples include height, weight, and amounts.
38
How is a set defined?
A well-defined collection of objects.
39
What is a null set?
A set that contains zero elements.
40
What is a subset?
A set A is a subset of set B if every element of A belongs to B.
41
How is the union of two sets defined?
The set of elements that are present in one or both sets.
42
What is the intersection of two sets?
The common elements that are present in both sets.
43
What does the complement of a set represent?
The set of all elements of the sample space that are not in the set.
44
What is skewness?
A measure of the asymmetry of a dataset's distribution.
45
What is the skewness of a perfectly symmetrical dataset?
Zero.
46
What indicates positive skewness in a dataset?
S_above is larger than S_below, resulting in a longer right-hand tail.
47
What indicates negative skewness in a dataset?
S_above is smaller than S_below, resulting in a longer left-hand tail.
48
When is skewness considered fairly symmetrical?
If skewness is between -0.5 and 0.5.
49
What is the expected value of a constant?
E(C) = C.
50
What is a binomial distribution?
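The distribution of the number of successes in n independent trials, where each trial has two outcomes labeled 'Success' and 'Failure' and the same probability of success.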
51
What parameters define a binomial distribution?
p (probability of success) and n (number of trials).
52
What is the formula for binomial probability?
b(x; n, P) = nCx * P^x * (1 - P)^(n - x).
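The binomial formula can be evaluated directly with `math.comb`. The sketch below is illustrative only: the function name `binomial_pmf` and the coin-toss numbers are assumptions, not taken from the cards.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """b(x; n, p) = nCx * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Hypothetical example: exactly 5 heads in 10 tosses of a fair coin.
print(round(binomial_pmf(5, 10, 0.5), 4))   # 0.2461

# The probabilities over x = 0..n add up to 1.
print(round(sum(binomial_pmf(x, 10, 0.5) for x in range(11)), 10))   # 1.0
```

Setting x = 0 reduces the formula to (1 - p)^n, the probability of no successes in n trials.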
53
What is a probability density function?
A function describing the relative likelihood of each value of a continuous random variable; the probability of a range of values is the area under the function over that range.
54
What is normal distribution?
One of the most common continuous probability distributions.
55
What happens to the skewness if S_above equals S_below?
The skewness will be zero, indicating symmetry.
56
Fill in the blank: A set that contains elements written in brackets is defined as a _______.
[set].
57
True or False: Two sets are equal only if they have identical elements.
True.
58
What is the complement of set X = {2, 3, 4} if the sample space is {0, 2, 3, 4, 8, 9}?
X' = {0, 8, 9}.
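The set operations on cards 40 to 43 map onto Python's built-in `set` type. This sketch reuses the sample space and X from card 58; the second set `Y` is hypothetical and added only to show union and intersection.

```python
sample_space = {0, 2, 3, 4, 8, 9}
X = {2, 3, 4}
Y = {4, 8}          # hypothetical second set, not from the cards

complement_X = sample_space - X     # elements of the sample space not in X
union = X | Y                       # elements in one or both sets
intersection = X & Y                # elements common to both sets

print(complement_X == {0, 8, 9})    # True
print(union == {2, 3, 4, 8})        # True
print(intersection == {4})          # True
print(X <= sample_space)            # True: X is a subset of the sample space
```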
59
What does a positive skewness indicate about a dataset?
The right-hand tail is longer than the left-hand tail.
60
What is the probability of getting a head, or of getting a tail, when tossing a fair coin?
50% (0.5) for each outcome.
61
What is the probability that four patients will recover if the probability of recovery is 25%?
Calculated using binomial distribution.
62
What is the probability of no successes in n trials of a binomial experiment?
P(X=0) = (1 - p)^n.
63
What is the probability of getting 5 heads in a binomial distribution?
0.2035 ## Footnote This value is calculated from a simulation of 10,000 runs.
64
What is Normal Distribution also known as?
Gaussian Distribution ## Footnote It is one of the most common continuous probability distributions.
65
What are the characteristics of Normal Distribution?
Symmetric, mean = median = mode ## Footnote It is represented as X ~ N(μ, σ²).
66
What does the empirical rule state about Normal Distribution?
68%, 95%, 99.7% ## Footnote These percentages represent data falling within 1, 2, and 3 standard deviations from the mean respectively.
67
What is the formula for the empirical rule for 1 standard deviation?
μ ± 1σ ## Footnote This indicates that approximately 68% of the data falls within this range.
68
What is the process of transforming a distribution to have a mean of 0 and standard deviation of 1 called?
Standardization ## Footnote It results in a Standard Normal Distribution denoted as N(0, 1).
69
What is the formula to calculate the z-score?
z = (x - μ) / σ ## Footnote This formula helps determine how far a value is from the mean.
70
What does the Cumulative Distribution Function (CDF) describe?
The probability that a random variable takes a value less than or equal to x: F(x) = P(X <= x) ## Footnote It accumulates the probabilities of all values up to x.
71
How is the Bernoulli distribution defined?
Discrete probability distribution with outcomes 1 (success) and 0 (failure) ## Footnote Here, the outcome 1 occurs with probability p and the outcome 0 occurs with probability q = 1 - p.
72
What is the probability density function (PDF) in relation to random variables?
Defines a probability distribution for a random variable ## Footnote It shows how data is distributed around the mean.
73
What is the average number of life insurance policies sold by a salesman using Poisson Distribution?
μ = 3 ## Footnote This is the average number of policies sold per week.
74
What is the expected probability of selling 2 or more policies but less than 5 policies?
0.61611 ## Footnote This is calculated using Poisson's law.
75
What is the average number of flaws for twenty aluminum alloy sheets?
μ = 2.3 ## Footnote This is calculated based on the frequency of flaws.
76
What does the Central Limit Theorem state?
The means of samples will approximate a normal distribution ## Footnote This holds true as the sample size increases.
77
What does the Standard Normal Distribution represent?
N(0, 1) ## Footnote It is the result of standardizing a normal distribution.
78
What is the z-score for Molly's test score of 940, with a mean of 850 and a standard deviation of 100?
0.90 ## Footnote This indicates how many standard deviations Molly's score is above the mean.
79
How do you calculate the probability of more than 1 failure in a week with an average of 3 failures every 20 weeks?
Use Poisson Distribution ## Footnote The average per week is μ = 3/20 = 0.15.
80
What is the probability that a particular person who takes an IQ test will score between 90 and 110?
68% ## Footnote This is based on the empirical rule for normal distribution, and holds when 90 and 110 lie one standard deviation below and above the mean.
81
What is the height of the uniform distribution curve defined by the interval (a,b)?
1/(b-a) ## Footnote This ensures the area under the curve equals 1.
82
How is the probability of getting a specific outcome in a uniform distribution calculated?
Every outcome has the same probability, 1/k for k possible outcomes ## Footnote For example, when rolling a die numbered 1 to 8, each face has probability 1/8.
83
What is the formula for the Poisson probability?
P(x; μ) = (e^-μ)(μ^x) / x! ## Footnote This calculates the probability of a given number of events in a fixed interval.
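The Poisson formula can be used to check the earlier Poisson cards. This is a minimal sketch; the function name `poisson_pmf` is an assumption made for illustration, and the inputs come from cards 73 to 74 (μ = 3) and card 79 (μ = 0.15).

```python
from math import exp, factorial

def poisson_pmf(x: int, mu: float) -> float:
    """P(x; mu) = e^(-mu) * mu^x / x!"""
    return exp(-mu) * mu**x / factorial(x)

# Card 74: mu = 3 policies per week, P(2 <= X < 5) = P(2) + P(3) + P(4).
p_2_to_4 = sum(poisson_pmf(x, 3) for x in (2, 3, 4))
print(round(p_2_to_4, 5))   # 0.61611

# Card 79: mu = 3/20 = 0.15 failures per week, P(X > 1) = 1 - P(0) - P(1).
p_more_than_1 = 1 - poisson_pmf(0, 0.15) - poisson_pmf(1, 0.15)
print(round(p_more_than_1, 4))
```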
84
What is P(Z < 0.90)?
0.8159
85
How do you calculate P(z > 0.90)?
1 - P(z < 0.90) ## Footnote This results in 0.1841.
86
What percentage of students tested had a higher score than Molly?
18.41%
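Cards 78 and 84 to 86 form one worked example, which can be reproduced with `statistics.NormalDist` from the standard library. This sketch assumes the test scores are normally distributed, as the cards do.

```python
from statistics import NormalDist

score, mu, sigma = 940, 850, 100

z = (score - mu) / sigma                 # z = (x - μ) / σ
p_below = NormalDist().cdf(z)            # P(Z < z) under N(0, 1)
p_above = 1 - p_below                    # P(Z > z)

print(z)                     # 0.9
print(round(p_below, 4))     # 0.8159
print(round(p_above, 4))     # 0.1841
```

The last value is the share of students who scored higher than Molly, about 18.41%.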
87
What does the Central Limit Theorem state about sample means?
The sample means will approximate a Normal Distribution.
88
What is the formula for the standard deviation of the sample mean in the Central Limit Theorem?
σ/√n
89
What does 'σ' represent in the Central Limit Theorem?
The population standard deviation
90
What is the minimum sample size commonly recommended for applying the Central Limit Theorem?
More than 30 observations (a rule of thumb, not a strict requirement)
91
True or False: The distribution of the original dataset matters when applying the Central Limit Theorem.
False
92
What happens to the distribution of sample means as the sample size increases?
It gets closer to a Normal Distribution.
93
What is the mathematical explanation of the Central Limit Theorem for the mean?
The sample mean follows approximately the Normal distribution with mean µ and standard deviation σ/√n.
94
Given µ = 10 and σ = 4, what is the sample size in the example provided?
100
95
What is the probability that the sample mean 'x̄' of 100 observations is less than 9?
P(z < -2.5) = 0.0062
96
What is the maximum weight a large freight elevator can transport?
9800 pounds
97
What are the mean and standard deviation of the boxes in the cargo example?
µ = 205 pounds, σ = 15 pounds
98
What is the probability that all 49 boxes can be safely loaded onto the freight elevator?
P(T < 9800) = P(z < -2.33) = 0.0099
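The two Central Limit Theorem examples above (cards 94 to 95 and cards 96 to 98) can be sketched with `statistics.NormalDist`. The sketch assumes the normal approximation applies, and that a total of n independent values has mean nµ and standard deviation σ√n.

```python
from math import sqrt
from statistics import NormalDist

# Cards 94-95: sample mean of n = 100 observations, µ = 10, σ = 4.
n, mu, sigma = 100, 10, 4
mean_dist = NormalDist(mu, sigma / sqrt(n))      # standard error σ/√n = 0.4
p_mean_below_9 = mean_dist.cdf(9)
print(round(p_mean_below_9, 4))                  # 0.0062

# Cards 96-98: total weight of 49 boxes, µ = 205 and σ = 15 pounds per box.
boxes = 49
total_dist = NormalDist(boxes * 205, 15 * sqrt(boxes))   # mean 10045, sd 105
p_safe = total_dist.cdf(9800)
print(round(p_safe, 4))
```

The second result is computed from the unrounded z-score, so it can differ in the last digit from the card's 0.0099, which rounds z to -2.33 before using a table.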
99
In the ticket purchase example, what is the mean number of tickets a student purchases?
µ = 2.4
100
What is the probability that all 100 students will be able to buy the tickets they desire given 250 tickets available?
P(T < 250) = P(z < 0.5) = 0.6915
101
What is the probability that the sample mean will be in the interval (490, 510) given µ = 500 and σ = 80?
0.7888
102
What is the mean amount of gasoline purchased weekly at the gas station?
50,000 gallons
103
What is the standard deviation of gasoline purchased weekly?
10,000 gallons
104
What is the total supply of gasoline after 11 weeks given a starting supply of 74,000 gallons and weekly delivery of 47,000 gallons?
591,000 gallons
105
What is the probability that the supply of gasoline will be below 20,000 gallons after 11 weeks?
P(T > 571000) = P(z > 0.63) = 0.2643
106
What is the unknown scheduled delivery 'A' that ensures a 0.5% probability the supply is below 20,000 gallons?
52854.88 gallons
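The gasoline example in cards 102 to 106 can be reproduced as follows. This is a sketch under stated assumptions: weekly demand is treated as independent and normally distributed, and the variable names are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist

weeks = 11
start, delivery = 74_000, 47_000
mu_week, sigma_week = 50_000, 10_000

# Total demand over 11 weeks, assuming independent, normally distributed weeks.
demand = NormalDist(weeks * mu_week, sigma_week * sqrt(weeks))

total_supply = start + weeks * delivery          # 591,000 gallons
threshold = total_supply - 20_000                # demand above this leaves < 20,000
p_low = 1 - demand.cdf(threshold)
print(total_supply, round(p_low, 3))

# Card 106: delivery A such that P(supply < 20,000) = 0.005.
z = NormalDist().inv_cdf(1 - 0.005)
demand_cap = demand.mean + z * demand.stdev      # demand exceeded with probability 0.005
A = (demand_cap + 20_000 - start) / weeks
print(round(A))
```

Both results agree with the cards only approximately, because the cards round the z-scores (0.63 and 2.575) before looking them up.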
107
What is Data Science?
An interdisciplinary field that uses scientific methods, algorithms, and systems to extract knowledge and insights from structured and unstructured data. ## Footnote Combines principles from statistics, computer science, mathematics, and domain knowledge.
108
What are the key steps involved in Data Science?
* Data Collection * Data Cleaning * Data Analysis * Data Visualization * Decision-Making
109
What is the expected growth rate of data science jobs by 2025?
35%
110
What is structured data?
Highly organized and easily searchable data, often stored in tabular format, adhering to a strict schema. ## Footnote Examples include relational databases and spreadsheets.
111
What are the characteristics of structured data?
* Fixed Format * Schema-Dependent * Easily Searchable
112
What is semi-structured data?
Data that does not conform to a rigid schema but includes tags or markers for organization. ## Footnote Examples include XML files and JSON documents.
113
What are the characteristics of semi-structured data?
* Flexible Schema * Hierarchical * Interoperability
114
What is unstructured data?
Data that lacks a predefined format or structure, making it challenging to store and analyze. ## Footnote Examples include text, images, and videos.
115
What are the characteristics of unstructured data?
* No Fixed Format * Complex Data Types * Difficult to Analyze
116
What is time-series data?
Data points collected or recorded at specific time intervals.
117
What skills are essential for a successful career in data science?
* Programming * Statistics and Mathematics * Data Wrangling * Big Data Tools * Critical Thinking * Communication * Machine Learning * Data Visualization
118
What is the average salary for a data scientist in the United States according to Glassdoor?
$116,000 base pay
119
True or False: Data science can automate tasks like image recognition and fraud detection.
True
120
What is data visualization?
The presentation of data insights through charts, graphs, and dashboards.
121
Fill in the blank: Data science is crucial for generating insights from _______.
data
122
What are the applications of data science in healthcare?
* Analyze patient data * Predict diseases * Develop personalized treatments * Optimize hospital operations
123
What do data scientists do?
Combine skills from data analysis, engineering, and machine learning to derive insights and build predictive models.
124
What is the role of machine learning in data science?
Creating intelligent data-driven solutions.
125
What industries are being transformed by data science?
* Healthcare * Finance * Retail * Technology * Transportation * Manufacturing * Energy * Agriculture
126
What is data cleaning?
Ensuring that data is accurate, consistent, and usable by handling missing values, outliers, and errors.
127
What is the importance of data science in decision-making?
Data-driven decisions are more objective and often lead to better outcomes.
128
What are outliers in data science?
Data points that deviate significantly from the rest of the data.
129
What are missing values in data science?
Data points that are incomplete or unavailable.
130
What is the primary role of a Data Engineer?
Develops and maintains data infrastructure and architecture
131
What does a Machine Learning Engineer do?
Designs and implements machine learning models
132
What is the main responsibility of a Data Scientist?
Combines skills from data analysis, engineering, and machine learning to derive insights and build predictive models
133
List three programming languages essential for data science.
* Python * R * SQL
134
Which libraries are commonly used for data analysis and visualization?
* Pandas * NumPy * Matplotlib * Seaborn
135
Name three machine learning frameworks mentioned.
* Scikit-learn * TensorFlow * Keras
136
What big data technologies are listed?
* Hadoop * Spark * Hive * NoSQL databases
137
What is one way to prepare for a career in data science?
Take courses in programming (Python, R), statistics, and databases
138
Fill in the blank: One must gain proficiency in tools like __________ and Tableau.
[Jupyter Notebooks]
139
What are key soft skills needed in data science?
* Critical Thinking * Communication * Domain Knowledge
140
Define the term 'Variables' in the context of data.
The characteristics recorded for each individual in a dataset; for example, age, city, and salary are variables
141
What does each row represent in a dataset?
Each row represents an individual
142
What is the average age calculated from the dataset?
31.6 years
143
What is the average salary calculated from the dataset?
77,400 dollars
144
How do salaries vary across cities according to the dataset?
* New York: 80,000 dollars * Los Angeles: 80,000 dollars * Chicago: 73,500 dollars
145
What statistical measure is the middle value when sorted?
Median
146
What is the range of ages in the dataset?
15 years
147
What does a positive correlation indicate between two variables?
As one increases, the other tends to increase too
148
What technique would you use to predict salary based on age?
Linear regression
149
What is mean imputation?
Replacing missing data with the average of available data
150
List one pro and one con of mean imputation.
* Pro: Simple and quick to apply * Con: It can distort relationships in the data
151
What is one alternative to mean imputation for handling missing data?
Deleting the row with missing data
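The two ways of handling missing data described in cards 149 to 151 can be compared side by side. The records below are hypothetical, and plain Python is used rather than any particular data library.

```python
from statistics import mean

# Hypothetical records; None marks a missing age.
rows = [
    {"name": "A", "age": 25},
    {"name": "B", "age": None},
    {"name": "C", "age": 35},
]

known_ages = [r["age"] for r in rows if r["age"] is not None]
age_mean = mean(known_ages)   # average of the available values

# Option 1: mean imputation keeps every row.
imputed = [{**r, "age": age_mean if r["age"] is None else r["age"]} for r in rows]

# Option 2: deletion drops rows with a missing value.
complete = [r for r in rows if r["age"] is not None]

print(imputed)
print(len(complete))
```

Imputation keeps all three rows but makes the filled-in age look more typical than it may be; deletion keeps only observed values but loses a row.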
152
What is the benefit of standardizing data?
Prevents errors and improves analysis
153
Define standardization in data handling.
Ensuring that data is consistent across all entries
154
What is the primary goal of handling missing and inconsistent data?
To ensure that our dataset is clean, reliable, and ready for analysis
155
True or False: Deleting rows with missing data can lead to bias if missing values are common.
True
156
What is Big Data?
Big Data refers to datasets that are too large, complex, or fast-changing to be processed by traditional data management tools. ## Footnote These datasets often contain structured, semi-structured, and unstructured data, requiring special techniques to process and analyze them.
157
What are the 5 V's of Big Data?
Volume, Velocity, Variety, Veracity, Value. ## Footnote Each characteristic highlights a different aspect of Big Data.
158
Define Volume in the context of Big Data.
Large amounts of data (terabytes, petabytes). ## Footnote Volume refers to the sheer quantity of data being collected and analyzed.
159
Define Velocity in the context of Big Data.
Speed at which data is generated and processed (e.g., real-time data). ## Footnote Velocity emphasizes the need for rapid processing of incoming data.
160
Define Variety in the context of Big Data.
Different types of data (structured, unstructured, semi-structured). ## Footnote Variety indicates the diverse formats and sources of data.
161
Define Veracity in the context of Big Data.
Quality and reliability of data. ## Footnote Veracity addresses the trustworthiness of the data being analyzed.
162
Define Value in the context of Big Data.
The usefulness of data for decision-making. ## Footnote Value emphasizes the importance of deriving actionable insights from data.
163
What are some examples of Big Data?
Social media interactions, e-commerce transactions, healthcare records, Internet of Things (IoT) devices generating real-time data. ## Footnote These examples illustrate how Big Data manifests in various sectors.
164
What is Hadoop?
A framework for distributed storage and processing of large datasets. ## Footnote Hadoop is essential for handling Big Data in a scalable manner.
165
What is Spark?
A fast processing engine for Big Data. ## Footnote Spark enhances data processing speed and efficiency.
166
What are NoSQL Databases?
Databases like MongoDB and Cassandra for handling unstructured data. ## Footnote NoSQL databases provide flexible data models suited for Big Data.
167
Why is there a hype around Data Science?
Media attention, high job demand, breakthrough innovations. ## Footnote These factors contribute to the perception of Data Science as a critical field.
168
True or False: Data Science is a one-size-fits-all solution.
False. ## Footnote Data Science requires tailored approaches for different business challenges.
169
What is Datafication?
The process of turning various aspects of life and business into data that can be measured, analyzed, and acted upon. ## Footnote Datafication transforms everyday activities into quantifiable data.
170
What are some areas where Datafication is increasing?
Social Media, Healthcare, Retail. ## Footnote These areas exemplify the growing trend of data collection and analysis.
171
What factors are driving the increase in Datafication?
Technology advancements, cloud computing, AI and machine learning. ## Footnote These developments facilitate the generation and processing of large datasets.
172
What is the current industry demand for data professionals?
Growing demand for data scientists, data engineers, and analysts across industries. ## Footnote This trend is prevalent in healthcare, finance, retail, and tech sectors.
173
What are universities offering in relation to Data Science?
Dedicated programs and certifications in Data Science, Big Data, and AI. ## Footnote Academic institutions are responding to the demand for skilled professionals.
174
What research areas are being explored in the field of Data Science?
Deep learning, natural language processing, data ethics. ## Footnote These areas reflect the evolving landscape of Data Science research.
175
What type of data warehouse is Vertica?
On-Premises/Cloud Data Warehouse ## Footnote Provider: Micro Focus (now part of OpenText)
176
What is a key feature of Vertica?
Columnar storage for faster data retrieval ## Footnote Designed for high-speed querying and large-scale data analytics.
177
What type of data warehouse is Apache Hive?
Open-Source Data Warehouse (Big Data Focus) ## Footnote Provider: Apache Software Foundation
178
What is a primary feature of Apache Hive?
Supports SQL-like querying using HiveQL ## Footnote Built on Hadoop for handling big data in distributed systems.
179
What is Cloudera Data Platform (CDP)?
On-Premises/Cloud Data Warehouse (Big Data Focus) ## Footnote Provider: Cloudera
180
What is a notable feature of Cloudera Data Platform?
Built for managing large volumes of unstructured data ## Footnote Allows for integration with Hadoop and Spark.
181
What are data warehousing appliances?
Pre-configured hardware and software systems designed for data warehousing ## Footnote Examples include Oracle Exadata and IBM Netezza.
182
What is the purpose of a data mart?
A smaller, department-specific data store for focused analysis ## Footnote Tailored to the needs of specific departments like sales or marketing.
183
What is an Enterprise Data Warehouse (EDW)?
A centralized, company-wide data warehouse for all business units ## Footnote Provides a holistic view of the business.
184
What is an Operational Data Store (ODS)?
A real-time, short-term data store used for operational purposes ## Footnote Holds detailed transactional data for current operations.
185
What is a key characteristic of a data warehouse?
Time-variant ## Footnote Contains historical data for trend analysis.
186
What does non-volatile mean in the context of data warehousing?
Once data is stored, it typically doesn’t change ## Footnote Unlike operational databases that may be updated continuously.
187
What does the term 'subject-oriented' mean in data warehousing?
Data is organized around key subjects or areas ## Footnote Examples include sales, marketing, or finance.
188
What is the main goal of a data warehouse?
To consolidate and organize data from different sources for analysis ## Footnote Helps businesses make better decisions.
189
What is Amazon Redshift?
A fully managed, petabyte-scale cloud data warehouse service ## Footnote Provider: Amazon Web Services (AWS)
190
What is a feature of Google BigQuery?
Serverless architecture ## Footnote Allows for high-speed processing of big data queries.
191
What is unique about Snowflake's architecture?
It separates compute and storage for enhanced performance ## Footnote Allows for scalable, elastic resources.
192
What is a data warehouse similar to?
A central storage system for data ## Footnote Comparable to a library storing books on various topics.
193
What does the term 'hybrid cloud support' refer to?
Support for both on-premises and cloud deployments ## Footnote Seen in platforms like Teradata and IBM Db2 Warehouse.
194
What is the purpose of advanced analytics in data warehousing?
To provide insights and facilitate data-driven decision-making ## Footnote Integrated with machine learning in some solutions.
195
True or False: Data marts are larger than enterprise data warehouses.
False ## Footnote Data marts are smaller and focused on specific areas.
196
Fill in the blank: The __________ is designed for real-time analytics with HANA in-memory processing.
SAP BW/4HANA
197
What is a query?
A query is a request for data or information from a database.
198
What language is used to communicate with databases?
SQL
199
What is syntax in the context of SQL?
Syntax is the predetermined structure of a language that includes all required words, symbols, and punctuation.
200
What is the purpose of the SELECT keyword in SQL?
To choose the columns you want to return.
201
What does the FROM keyword specify in an SQL query?
It specifies the tables where the columns you want are located.
202
What is the function of the WHERE clause in an SQL query?
To filter for certain information.
203
Fill in the blank: A SQL query can be initiated with the keywords SELECT, FROM, and _______.
WHERE
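The roles of SELECT, FROM, and WHERE from cards 200 to 203 can be tried out with Python's built-in `sqlite3` module. The `employees` table and its rows are hypothetical, and `ORDER BY` is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
conn.execute("CREATE TABLE employees (name TEXT, city TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ana", "Chicago", 73500), ("Ben", "New York", 80000), ("Cy", "Chicago", 70000)],
)

# SELECT picks the columns, FROM names the table, WHERE filters the rows.
query = "SELECT name, salary FROM employees WHERE city = 'Chicago' ORDER BY name"
rows = conn.execute(query).fetchall()
print(rows)   # [('Ana', 73500), ('Cy', 70000)]
conn.close()
```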
204
What does an Operational Data Store (ODS) typically hold?
Up-to-date, detailed transactional data.
205
True or False: An ODS is meant for long-term, complex analytics.
False
206
What is a data mart?
A smaller, department-specific data store for focused analysis.
207
List three characteristics of a data mart.
* Focused on a specific area * Smaller and faster * Subset of EDW
208
What is a data warehousing appliance?
A pre-configured hardware and software solution designed to simplify and accelerate data warehousing tasks.
209
Name one example of a data warehousing appliance.
Oracle Exadata
210
What is Amazon Redshift?
A fully managed, petabyte-scale data warehouse service in the cloud.
211
What type of architecture does Google BigQuery use?
Serverless architecture
212
What is the primary feature of Snowflake's architecture?
It separates compute and storage for enhanced performance.
213
What type of data warehouse is IBM Db2 Warehouse?
On-Premises/Cloud Data Warehouse
214
What does SAP BW/4HANA leverage for fast data processing?
HANA in-memory database
215
True or False: Cloudera Data Platform is focused exclusively on structured data.
False
216
What is the primary function of Apache Hive?
To provide data summarization, querying, and analysis.
217
Fill in the blank: Vertica is designed for high-speed _______.
querying
218
What is Artificial Intelligence (AI)?
The branch of computer science that aims to create machines that can mimic human intelligence.
219
What are the key features of AI?
* Perceive * Reason * Learn * Act
220
Who laid the theoretical foundations of AI in the 1950s-1960s?
Alan Turing and John McCarthy.
221
What is Narrow AI?
AI that is designed for a specific task and cannot operate beyond it.
222
What is an example of Narrow AI?
Google Search.
223
What defines General AI?
AI that has human-like cognitive abilities and can learn and apply knowledge to different tasks.
224
What is Super AI?
Hypothetical AI that surpasses human intelligence.
225
What are Reactive Machines in AI?
AI that reacts based on predefined rules but cannot learn from experience.
226
What is Limited Memory AI?
AI that can remember past data and make better decisions.
227
What is the foundation of AI?
Data.
228
What role do algorithms play in AI?
They are sets of rules that process data to make decisions.
229
What is one application of AI in healthcare?
AI diagnoses diseases using medical imaging.
230
What are some advantages of AI?
* Automation * Efficiency * Accuracy * Personalization
231
What are some disadvantages of AI?
* High Cost * Job Loss * Bias in Data * Lack of Human-like Thinking
232
What are the three main components of an AI project?
* Data * Model * Compute
233
What inspired Artificial Neural Networks (ANNs)?
The human brain.
234
What is the purpose of the activation function in a neural network?
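To introduce non-linearity by transforming a neuron's weighted input into its output; the sigmoid function, for example, maps the input into the interval (0, 1).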
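Two common activation functions are shown below as an illustration. The sigmoid maps any input into (0, 1); ReLU is included as a contrasting example and is not mentioned on the card.

```python
import math

def sigmoid(z: float) -> float:
    """Squash any real input into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z: float) -> float:
    """Pass positive inputs through and clip negative inputs to 0."""
    return max(0.0, z)

for z in (-4.0, 0.0, 4.0):
    print(z, round(sigmoid(z), 3), relu(z))
```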
235
What is an Epoch in AI training?
One complete pass of the entire training dataset through the model.
236
What is a Batch Size in AI training?
The number of training examples fed to the model at a time, before each update of its weights.
237
What does the Learning Rate represent in AI training?
The size of the steps taken to update the weights in the model.
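Epochs, batch size, and learning rate come together in a training loop. The sketch below uses mini-batch gradient descent on a one-weight model; the data, the model y_hat = w * x, and all hyperparameter values are hypothetical choices made to keep the example small.

```python
# Hypothetical data generated from y = 2x; the model is y_hat = w * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w = 0.0               # weight to be learned
learning_rate = 0.01  # step size for each weight update
batch_size = 2        # examples per update
epochs = 200          # full passes over the dataset

for epoch in range(epochs):
    for start in range(0, len(xs), batch_size):
        bx = xs[start:start + batch_size]
        by = ys[start:start + batch_size]
        # Gradient of the mean squared error with respect to w on this batch.
        grad = sum(2 * (w * x - y) * x for x, y in zip(bx, by)) / len(bx)
        w -= learning_rate * grad

print(round(w, 3))   # 2.0
```

With 4 examples and a batch size of 2, each epoch performs two weight updates. A much larger learning rate would overshoot and fail to settle on w = 2.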
238
Fill in the blank: AI can perform tasks such as _______.
speech recognition, image processing, decision-making, and language translation.
239
True or False: Self-Aware AI is fully developed.
False.
240
What is the goal of AI training?
To find the optimal values of weights.
241
What is the purpose of a feedback mechanism in AI?
To continuously learn from past experiences to improve accuracy.
242
What is an example of AI application in retail?
AI recommends products based on user preferences.
243
What is a core challenge in AI development?
Ensuring ethical and responsible use.
244
What does the term 'Theory of Mind AI' refer to?
AI that can understand human emotions and interact accordingly.
245
What are some industries where AI is transforming processes?
Healthcare, Finance, Retail, Education.
246
What is a key characteristic of AI applications in autonomous vehicles?
AI helps self-driving cars navigate roads.