A-kassen Flashcards
(88 cards)
What is a label?
What we want to predict.
What is the label in the paper?
Churn/loyal members.
Why did you use Jupyter?
Jupyter is a notebook environment that supports several programming languages, Python among them. It is a powerful tool that let us carry out the data mining by specifying and programming our own process steps.
What was the main purpose of the data preparation?
To take a critical look at the dataset and make it as clean and small as possible, because this gives more predictive power.
What is the first step in the data preparation?
Exploration phase
What does the exploration phase entail?
A three-step process: missing data, data types, and outliers.
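The three exploration checks can be sketched in plain Python. This is a minimal illustration, not the code used in the paper: the member records, column names, and the IQR outlier rule are all assumptions made for the example.

```python
from statistics import quantiles

# Hypothetical member records; None marks a missing value.
rows = [
    {"age": 34, "months_member": 12, "churned": "no"},
    {"age": None, "months_member": 48, "churned": "yes"},
    {"age": 29, "months_member": 7, "churned": "no"},
    {"age": 41, "months_member": 30, "churned": "no"},
    {"age": 38, "months_member": 600, "churned": "yes"},  # suspicious value
    {"age": 45, "months_member": 24, "churned": "no"},
]

def missing_counts(rows):
    """Step 1: count missing (None) values per column."""
    return {col: sum(r[col] is None for r in rows) for col in rows[0]}

def column_types(rows):
    """Step 2: collect the Python type names seen per column, ignoring None."""
    return {
        col: {type(r[col]).__name__ for r in rows if r[col] is not None}
        for col in rows[0]
    }

def iqr_outliers(values, k=1.5):
    """Step 3: flag values outside [Q1 - k*IQR, Q3 + k*IQR] (an assumed rule)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

missing = missing_counts(rows)
types = column_types(rows)
outliers = iqr_outliers([r["months_member"] for r in rows])
```

A column that shows more than one type, or a value flagged as an outlier, is a candidate for cleaning before any modelling starts.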
Why was it important to make predictions about customers' intentions or tendencies to churn?
- Enables efficient planning of customer retention, for example through specific promotional activities
- Comparing predictions with actual results makes it possible to spot meaningful indicators that are useful for improving performance
How is predictive modelling and data mining useful for companies?
They are instruments for improving the decision-making process within companies.
The more accurate and timely the knowledge is, the more likely the company is to improve its business performance and industry position.
Why do companies use machine learning algorithms and big data?
To reduce uncertainty about, for example, unforeseen market changes or customer behaviour.
What did you do in your paper?
We analysed customer data from Akademikernes A-kasse in an effort to create a predictive model that shows which attributes are relevant when customers choose to churn. However, the model we created does not predict how likely a customer is to churn; it gives a binary churn/no churn result.
What do you mean by a binary churn/no churn result?
It means that each member is assigned to one of two classes, churn or no churn, on the basis of a classification rule. The rule is built on the attributes selected for the predictive model.
What could be the next step for Akademikernes A-kasse?
To use regression to calculate the likelihood of churn within the group of predicted churners we have identified. A heat map, for example, could visualise which customers are more likely to churn in the future and which attributes are associated with it.
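As a rough sketch of what that next step could look like, the example below fits a logistic regression by gradient descent and returns a churn probability instead of a yes/no label. Logistic regression is an assumption here (the card only says "regression"), and the two features and all data values are hypothetical.

```python
import math

def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights (bias first) by batch gradient descent on the log loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def churn_probability(w, x):
    """Estimated probability that a member with features x churns."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

# Hypothetical, already scaled features: [membership length, complaints filed]
X = [[0.1, 0.9], [0.2, 0.8], [0.3, 0.7], [0.7, 0.2], [0.8, 0.1], [0.9, 0.0]]
y = [1, 1, 1, 0, 0, 0]  # 1 = churned, 0 = stayed

w = fit_logistic(X, y)
p_new = churn_probability(w, [0.15, 0.85])
p_loyal = churn_probability(w, [0.85, 0.05])
label = "churn" if p_new >= 0.5 else "no churn"  # the binary result, for comparison
```

The probability carries more information than the binary label: two members who are both labelled "churn" can still be ranked by how likely they are to leave.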
Why did you choose to use the CRISP model in your paper?
We decided to use it as a guiding principle for the paper, as we saw parallels between what the model presents and what we wanted to do. However, we were aware that the model is an exploratory tool that emphasises approaches and strategy rather than software design.
What are the six stages in the data mining process that the CRISP model defines?
Business understanding, data understanding, data preparation, modeling, evaluation, and deployment.
What does the business understanding stage present?
It concerns the practical business goals that the organisation wants to achieve. These goals were converted into a problem that the data mining seeks to solve. AK seeks to uncover the reasons behind the churn of its members and, ultimately, reduce the number of churners.
Could you mention some critical perceptions of your paper?
To begin with, we thought we would use decision trees, as they are high-quality models that generate simple rules and make it easy to understand the impact of the attributes.
Decision trees become very large and complex if there are many attributes, so we were very strict in the attribute selection because we wanted few attributes. Without this initial plan, the number of attributes might have been higher.
How can AK use your results?
Knowledge of the customers. The results show which attributes are linked to previous churners, and these tendencies can be used to spot current members who share the same attributes.
What is data mining?
A process used for discovering meaningful relationships, trends, and patterns in large amounts of data collected in a dataset.
What is clustering?
Clustering gathers groups of people with common attributes into K clusters. Clustering (a form of unsupervised learning) is the process of dividing a dataset into groups such that the members of each group are as similar (close) as possible to one another, and different groups are as dissimilar (far) as possible from one another.
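One common way to form K clusters is the k-means algorithm, sketched below in plain Python. The choice of k-means, the naive initialisation, and the member data are assumptions for illustration; the card does not say which clustering method is meant.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    centroids = [list(p) for p in points[:k]]  # naive init: the first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    assignments = [
        min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
    ]
    return centroids, assignments

# Hypothetical members described by (age, years of membership)
points = [(25, 1), (60, 30), (27, 2), (58, 28), (26, 1), (62, 31)]
centroids, assignments = kmeans(points, k=2)
```

Members with the same assignment are "close" to each other in the chosen attributes, while the two centroids end up far apart.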
How can clustering create value for AK?
The goal is to group together similar instances using some metric of similarity, that is, to create groupings where the members of a given group are similar to each other. For example, AK could group similar customers together and design different campaigns for each group.
It is like classification, but the groupings are not predefined.
It could find a way to group similar customers together. This may or may not relate to the churn question.
What can you do to present results from data mining as informative as possible?
You can sacrifice details, which is a subjective decision.
One option is switching from ROC (receiver operating characteristic) curves with AUC (area under the curve) to lift curves or cumulative response curves.
ROC curves are not the most intuitive visualisation for many business stakeholders, who really ought to understand the results. One of the most common alternative visualisations is the cumulative response curve, which is more intuitive.
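For reference, AUC can be computed without drawing the ROC curve at all: it equals the probability that a randomly chosen churner gets a higher model score than a randomly chosen non-churner. The sketch below uses that definition; the scores and labels are made up for the example.

```python
def auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical model scores and true outcomes (1 = churned, 0 = stayed)
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
area = auc(scores, labels)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which is precise but still abstract for a business audience.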
What are lift curves?
A visualisation framework that might not have all of the nice properties of ROC curves but is more intuitive. Conceptually, as we move down the list of instances ranked by the model, we target increasingly larger proportions of all the instances. The lift at each point is how many times more positives the model finds than random targeting would.
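Moving down the ranked list can be written out directly. The sketch below computes the cumulative response curve and the lift derived from it; the scores and labels are invented for the example and the function names are hypothetical.

```python
def cumulative_response(scores, labels):
    """Return (fraction targeted, fraction of churners captured) down the ranked list."""
    ranked = [
        l for _, l in sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    ]
    total_pos = sum(ranked)
    curve, found = [], 0
    for i, l in enumerate(ranked, start=1):
        found += l
        curve.append((i / len(ranked), found / total_pos))
    return curve

def lift(curve):
    """Lift = captured fraction / targeted fraction; 1.0 equals random targeting."""
    return [(x, y / x) for x, y in curve]

# Hypothetical model scores and true outcomes (1 = churned, 0 = stayed)
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
curve = cumulative_response(scores, labels)
lifts = lift(curve)
```

Read this way, a stakeholder can see that contacting the top third of the ranked members already reaches two thirds of the churners in this toy data.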
Why is your data affected by selection bias?
We only had access to the Akademikernes A-kasse dataset and not to data from other A-kasse organisations. Therefore the dataset only comprises information about AK's members and is not a representative sample of the entire population.
What makes machine learning algorithms supervised?
We know the target class, that is, what we are trying to predict. The opposite is unsupervised learning, where no purpose or target information is provided.