Terms Flashcards
(23 cards)
What are Common tasks?
Tasks that Data mining algorithms address
What are the types of Common tasks?
Classification, regression, similarity matching, clustering, co-occurrence grouping, profiling, link prediction, data reduction, causal modelling.
What are some techniques for data analytics tasks?
Statistics, database query, data warehouse, machine learning, data mining
What is Big Data?
The most widely used definition of Big Data is the three Vs: Volume, Velocity, Variety.
Big data is high-volume, high-velocity and/or high-variety information assets that demand
cost-effective, innovative forms of information processing that enable enhanced insight,
decision making, and process automation.
What does Big Data consist of?
Web data, text data, time/location data, smart grid and sensor data, social network data
How is Big Data different from traditional data?
(1) Big Data can be an entirely new source of data
(2) The speed of the data feed has increased to such an extent that it qualifies as a new data source
(3) Increasingly more semi-structured and unstructured data
What is the Paradigm shift in terms of analytic focus?
From descriptive to predictive and prescriptive analytics when using Big Data
What is the Business value of Big Data?
(1) To draw insight from data
(2) To make better decisions based on the insight
(3) To automate the decisions and bake them into a business process
What are some applications of Big Data across industry sectors?
Segmentation and prediction, churn prediction, recommender systems and targeted marketing, sentiment analysis, operational analytics
What is a Data warehouse?
A data warehouse collects and combines data from across an enterprise, often from multiple processing systems, each with its own database.
How do you store Big Data?
With the Hadoop framework. Traditional databases and warehouses fall short when dealing with Big Data.
What is Machine learning?
Machine learning is the practice of getting computers to learn and act as humans do, and to improve their learning over time autonomously, by feeding them data and information in the form of observations and real-world interactions.
What is Data mining?
Data mining is the extraction of knowledge from a large amount of data. It spun off from machine learning. We often describe data mining as the process of building models.
Classification algorithms
Classification is the most frequently used data mining method for creating models of real-world problems.
Cluster analysis
A data mining method that groups similar items together in order to create models.
Association rule (Co-occurrence grouping)
A data mining method widely used in the retail industry. Association rule mining aims to find interesting relationships between items in large datasets and to create models from them.
Classification matrix
A table used to estimate the true accuracy of classification models, using measures such as true positive rate, true negative rate and accuracy.
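As a rough illustration of the measures named on this card, the sketch below counts the four cells of a two-class classification matrix and derives the rates from them. The function names, the sample labels and the choice of `1` as the positive class are assumptions made for the example, and it assumes both classes appear in the data so that no division by zero occurs.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a two-class problem."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, really positive
        elif p == positive:
            fp += 1  # predicted positive, really negative
        elif a == positive:
            fn += 1  # predicted negative, really positive
        else:
            tn += 1  # predicted negative, really negative
    return tp, fp, tn, fn


def rates(tp, fp, tn, fn):
    """Derive the usual summary measures from the four counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "true_positive_rate": tp / (tp + fn),
        "true_negative_rate": tn / (tn + fp),
    }


# Hypothetical labels, invented purely for illustration.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]

counts = confusion_counts(actual, predicted)
summary = rates(*counts)
print(counts, summary)
```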
Classification algorithms
A number of algorithms are used for classification modelling, for example KNN.
KNN
K Nearest Neighbour is a data mining algorithm mainly used for classification tasks.
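A minimal sketch of how KNN can classify a new point: find the k closest training points and take a majority vote over their labels. Euclidean distance, the function name `knn_predict` and the toy data are assumptions chosen for the example; other distance measures and tie-breaking rules are possible.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbours."""
    # Pair each training point's distance to the query with its label, nearest first.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # The most common label among the k neighbours is the prediction.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical training data, invented for illustration.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels = ["small", "small", "small", "large", "large", "large"]

prediction = knn_predict(points, labels, (2.0, 2.0), k=3)
print(prediction)
```

An odd k is a common choice for two-class problems because it avoids tied votes.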
K Means
The k-means algorithm (where k stands for the predetermined number of clusters) is one of
the most referenced clustering algorithms.
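The sketch below shows one simple way the k-means loop can be written: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. The helper names, the fixed starting centroids and the toy data are assumptions for the example; real implementations usually choose starting centroids randomly or with a seeding heuristic.

```python
import math


def assign(points, centroids):
    """Group points by the index of their nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def k_means(points, initial_centroids, max_iterations=100):
    """Run k-means, where k is the number of initial centroids supplied."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iterations):
        clusters = assign(points, centroids)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        updated = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else old
            for old, cluster in zip(centroids, clusters)
        ]
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return centroids, assign(points, centroids)


# Hypothetical data with k = 2, invented for illustration.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = k_means(points, [(1.0, 1.0), (8.0, 8.0)])
print(centroids, clusters)
```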
Supervised learning (methods)
We have X (data) together with known answers Y (labels). The algorithm learns the relationship between X and Y so that it can predict Y for new data.
Unsupervised learning (methods)
We have X (data) but we don't have any known answers (Y). It is up to the algorithm to discover structure in the data, such as groups or patterns, that we could not specify in advance.
Support, Confidence, Lift
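Association rules provide information in the form of if-then statements. Support, confidence and lift measure how strong a rule is. Support is the share of all transactions that contain both the "if" and the "then" items. Confidence is the share of transactions containing the "if" items that also contain the "then" items. Lift is the confidence divided by the share of all transactions that contain the "then" items; a lift above 1 means the items occur together more often than expected by chance.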
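The three measures can be sketched for a single if-then rule using the usual textbook definitions. The function name `rule_metrics` and the shopping baskets are hypothetical, and the sketch assumes the "if" and "then" items each appear in at least one transaction.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    # Count transactions containing the "if" items, the "then" items, and both.
    count_if = sum(1 for t in transactions if antecedent <= t)
    count_then = sum(1 for t in transactions if consequent <= t)
    count_both = sum(1 for t in transactions if (antecedent | consequent) <= t)

    support = count_both / n              # share of all transactions with both sides
    confidence = count_both / count_if    # share of "if" transactions that also have "then"
    lift = confidence / (count_then / n)  # confidence relative to the baseline rate of "then"
    return support, confidence, lift


# Hypothetical baskets, invented for illustration.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
    {"butter", "milk"},
]

support, confidence, lift = rule_metrics(baskets, {"bread"}, {"butter"})
print(support, confidence, lift)
```

Support and confidence can be read as percentages, while lift is a ratio compared against 1.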