Chapter 1 Continued Flashcards
(102 cards)
What is the purpose of data transformation?
Data transformation aims to map the values of an attribute to new replacement values.
What are the common techniques used in data transformation?
Common techniques for data transformation include data normalization, standardization, and conversion.
Why might you want to combine attributes during data transformation?
Combining attributes can create more useful ratios or relationships between them.
How does scaling data benefit data mining algorithms?
Scaling attributes to the same approximate scale improves the performance of many data mining algorithms and results in better models.
What is the purpose of data conversion in data transformation?
Data conversion can involve converting categorical data to numeric values and discretizing continuous data, making it more intuitive and improving algorithm performance.
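As a rough illustration of the discretization half of this card, the sketch below bins a continuous income value into labeled ranges. The function name, bin edges, and labels are hypothetical choices made for the example, not values from the chapter.

```python
from bisect import bisect_right

# Hypothetical bin edges and labels, chosen only for illustration.
INCOME_EDGES = [30_000, 60_000]
INCOME_LABELS = ["low", "medium", "high"]

def discretize(value, edges, labels):
    """Map a continuous value to the label of the bin it falls in.

    edges must be sorted ascending, with exactly one more label than edges.
    A value equal to an edge is placed in the upper bin.
    """
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly one more label than edges")
    return labels[bisect_right(edges, value)]

for income in (12_000, 45_000, 73_000):
    print(income, discretize(income, INCOME_EDGES, INCOME_LABELS))
```

Other binning schemes (equal-frequency bins, for example) would only change how the edges are chosen.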
Why is feature scaling important in data mining?
Feature scaling is essential in data mining because variables with widely varying ranges can lead to biases in the results, favoring attributes with larger ranges.
What is the primary goal of feature scaling in data mining?
The main objective of feature scaling is to ensure that all variables or features are within the same scale, preventing attributes with large ranges from dominating those with smaller ranges.
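To make the "dominating" effect concrete, the sketch below compares the Euclidean distance between two records before and after rescaling. The two records, the age range of 20 to 70, and the helper name `scale` are assumptions made for the example. The income range of 12,000 to 98,000 is borrowed from the Min-Max card in this deck.

```python
from math import dist

# Two hypothetical records: (income in dollars, age in years).
person_a = (30_000, 25)
person_b = (31_000, 65)

# Assumed attribute ranges used for rescaling.
INCOME_MIN, INCOME_MAX = 12_000, 98_000
AGE_MIN, AGE_MAX = 20, 70

def scale(record):
    """Rescale both attributes of a record to the range [0, 1]."""
    income, age = record
    return (
        (income - INCOME_MIN) / (INCOME_MAX - INCOME_MIN),
        (age - AGE_MIN) / (AGE_MAX - AGE_MIN),
    )

raw_distance = dist(person_a, person_b)
scaled_distance = dist(scale(person_a), scale(person_b))

# Unscaled, the small income gap swamps the 40-year age gap.
print(f"raw distance:    {raw_distance:.1f}")
# After scaling, the age gap accounts for most of the distance.
print(f"scaled distance: {scaled_distance:.3f}")
```

Distance-based methods are a typical case where this matters, since every attribute contributes to the distance in its own units.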
What is feature scaling?
Feature scaling is the process of transforming the values of different variables or features onto a common scale, so that no attribute dominates simply because it has a larger range.
What are the two common methods of feature scaling?
Normalization and standardization are two common methods of feature scaling.
What is normalization?
Normalization scales the values of a feature to a range between 0 and 1.
What is standardization?
Standardization scales the values to have a mean of 0 and a standard deviation of 1.
When is normalization useful?
Normalization is useful when the distribution of the feature is not Gaussian.
When is standardization useful?
Standardization is useful when the distribution of the feature is Gaussian.
Why are normalization and standardization used?
Both techniques put features on a comparable scale so that no feature dominates merely because of its range, which improves the performance of many machine learning algorithms.
How is Min-Max Normalization calculated for a value like $73,000 in the income range of $12,000 to $98,000?
To normalize $73,000 using Min-Max Normalization, you calculate (73,000 - 12,000) / (98,000 - 12,000) = 61,000 / 86,000, which is approximately 0.709.
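The calculation on this card can be checked with a few lines of code. This is a minimal sketch and the function name `min_max_normalize` is a hypothetical helper, not something from the chapter.

```python
def min_max_normalize(value, minimum, maximum):
    """Rescale value to [0, 1] using the attribute's observed min and max."""
    if maximum == minimum:
        raise ValueError("minimum and maximum must differ")
    return (value - minimum) / (maximum - minimum)

# Income example from the card: value 73,000 in the range 12,000 to 98,000.
normalized_income = min_max_normalize(73_000, 12_000, 98_000)
print(round(normalized_income, 3))
```

The minimum maps to 0 and the maximum maps to 1, so any value inside the range lands strictly between them.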
What is the formula for Z-Score Standardization?
The formula for Z-Score Standardization is z = (x - mean) / sd, where x is the data point, mean is the mean of the attribute, and sd is its standard deviation.
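A short sketch of the z-score formula applied to a whole attribute follows. The income values are made up for illustration, and the use of the population standard deviation (`statistics.pstdev`) is an assumption, since the card does not say whether the population or sample version is intended.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize values to mean 0 and standard deviation 1.

    Uses the population standard deviation (an assumption; swap in
    statistics.stdev for the sample version).
    """
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("standard deviation is zero; cannot standardize")
    return [(x - mu) / sd for x in values]

# Hypothetical income values, for illustration only.
incomes = [12_000, 35_000, 54_000, 73_000, 98_000]
standardized = z_scores(incomes)
print([round(z, 2) for z in standardized])
```

Values below the mean get negative z-scores and values above it get positive ones.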
What is data conversion?
Changing data from one format to another
What are some data mining (DM) techniques that can handle categorical variables without transforming them?
Naïve Bayes and decision trees
Other techniques (such as neural nets and regression) accept only numeric inputs.
What is data conversion encoding?
Replacing categorical values with numeric codes, for example converting ordinal categories to numbers
How is a single categorical variable with m categories typically transformed?
Into m-1 binary dummy variables, with the omitted category serving as the baseline
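The m-1 rule can be sketched as follows. The function name, the color categories, and the choice of the first category as the baseline are all assumptions made for the example.

```python
def dummy_encode(values, categories):
    """Encode each value as m-1 binary indicators.

    The first entry of categories is treated as the baseline and gets no
    column of its own; it is represented by all zeros.
    """
    baseline, *others = categories
    rows = []
    for value in values:
        if value not in categories:
            raise ValueError(f"unknown category: {value!r}")
        rows.append([1 if value == category else 0 for category in others])
    return others, rows

# Hypothetical attribute with m = 3 categories, so 2 dummy variables.
colors = ["red", "green", "blue", "green"]
columns, encoded = dummy_encode(colors, ["red", "green", "blue"])
print(columns)
for color, row in zip(colors, encoded):
    print(color, row)
```

One category can be dropped because it is fully determined by the other indicators all being 0.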
Why is data conversion important?
To make data usable across different systems or applications
Why do we need to convert nominal fields into numeric values for techniques like neural nets and regression?
These techniques accept only numeric inputs, so nominal values must be encoded as numbers first
How can ordinal data be converted to numbers?
By assigning numeric values that preserve the natural order of the categories
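A minimal sketch of an order-preserving conversion follows. The category names, the function name, and the choice to start numbering at 1 are assumptions made for the example.

```python
def ordinal_encode(values, ordered_categories):
    """Map each ordinal value to its 1-based rank in ordered_categories."""
    rank = {category: i for i, category in enumerate(ordered_categories, start=1)}
    return [rank[value] for value in values]

# Hypothetical ordinal attribute, listed from lowest to highest.
RISK_LEVELS = ["low", "medium", "high"]

codes = ordinal_encode(["medium", "low", "high", "low"], RISK_LEVELS)
print(codes)
```

Because the ranks follow the category order, comparisons such as "medium is lower than high" still hold after conversion.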