Exam Flashcards
(30 cards)
What does BA contribute to?
Company strategy, core competencies, prediction, internal processes, and understanding customers. Also the relationship between BA and decision makers.
What is the goal of BA?
To implement it as a strategic resource. It should be thought into every level of the organization, not treated as something you do on the side.
What are the three steps of analytics?
Descriptive - what is there?
Predictive - what will happen?
Prescriptive - what do we need to do to optimize?
Which three skills does a data scientist need?
Hacking - programming
Math - mathematics and statistics
Substantive expertise - knowledge of the subject area
What is the difference between a data analyst and a business analyst?
A business analyst is a bit more pragmatic and has more domain knowledge.
What is machine learning?
Algorithms that learn from experience (data) instead of being told exactly how to solve the task.
What is the difference between supervised learning and unsupervised learning?
Supervised learning has a predefined relationship between input and output (the data is labelled), for example regression. Good for purpose pull.
Unsupervised learning does not know in advance where it will end up (there is no predefined output). An example is clustering. Good for data push.
Describe the data-purpose paradox.
Either you have the data first or the purpose first.
Data push - you have the data first, but no purpose. Unsupervised learning, clustering.
Purpose pull - you have the purpose, but no data. Supervised learning.
Describe linear regression.
Overall use:
The next step after correlation. Used for prediction and for describing the relationship between variables.
Dependent variable - the one that is affected by the others. For example, grade.
Independent variable - the ones that affect the dependent variable. For example, study hours.
The relationship between the variables can be either positive or negative.
Continuous outcome, so it can be any numeric value: price, age and so on.
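A minimal sketch of the idea, using ordinary least squares with one independent variable. The study hours and grades are made-up numbers, and `fit_line` is a hypothetical helper name, not something from the course material.

```python
# Made-up data: study hours (independent) and grade (dependent).
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
grades = [2.0, 4.0, 5.0, 4.0, 5.0]


def fit_line(x, y):
    """Ordinary least squares for a single independent variable."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # How x and y vary together, and how x varies on its own.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(hours, grades)

# A positive slope means a positive relationship: more hours, higher grade.
predicted_grade = intercept + slope * 6.0
print(f"grade = {intercept:.2f} + {slope:.2f} * hours")
print(f"predicted grade for 6 hours: {predicted_grade:.2f}")
```

The prediction is a continuous number, which is what separates this from a classification method.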
Describe logistic regression.
Similar to linear regression, but with a binary outcome, so it can be used to classify. Will you pass the exam? Should we invest or not?
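A small sketch of how the binary outcome comes about: a linear score is pushed through the logistic (sigmoid) function to get a probability, which is then cut at a threshold. The intercept and slope below are assumed values chosen for illustration, not fitted to any data, and the function names are hypothetical.

```python
import math


def pass_probability(hours, intercept=-4.0, slope=1.0):
    """Probability of passing, given study hours (assumed coefficients)."""
    z = intercept + slope * hours          # same linear form as linear regression
    return 1.0 / (1.0 + math.exp(-z))      # sigmoid squeezes z into (0, 1)


def predict_pass(hours, threshold=0.5):
    """Turn the probability into a yes/no answer."""
    return pass_probability(hours) >= threshold


for h in (2.0, 4.0, 6.0):
    print(h, round(pass_probability(h), 3), predict_pass(h))
```

With these assumed coefficients, 4 hours gives a probability of exactly 0.5, fewer hours fall below it and more hours above it.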
Describe a decision tree.
Used for decision making with a categorical outcome, in the simplest case binary. If you do this, then that can happen. Used for prediction: you see how large the probability is of one outcome versus the other. It gives general rules that can be reused next time.
Describe a regression tree.
Same as a decision tree, but with a continuous outcome. Used to predict what will happen if you do this or that, and to derive general rules.
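To make the "general rules" concrete, here is a sketch of the smallest possible regression tree: one split with two leaves. It tries every threshold and keeps the one with the lowest squared error. The data and the names `best_split` and `predict` are made up for illustration; a real tree repeats this step inside each leaf.

```python
def sum_squared_error(values):
    """Squared error of predicting the mean for every value in a leaf."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, left_mean, right_mean) for the best single split."""
    best = None
    candidates = sorted(set(x))
    # Try a threshold halfway between each pair of neighbouring x values.
    for low, high in zip(candidates, candidates[1:]):
        threshold = (low + high) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= threshold]
        right = [yi for xi, yi in zip(x, y) if xi > threshold]
        error = sum_squared_error(left) + sum_squared_error(right)
        if best is None or error < best[0]:
            best = (error, threshold,
                    sum(left) / len(left), sum(right) / len(right))
    return best[1], best[2], best[3]


# Made-up data: the outcome jumps when x passes 3.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [10.0, 11.0, 10.0, 20.0, 21.0, 19.0]
threshold, left_mean, right_mean = best_split(x, y)


def predict(value):
    """The learned rule: one continuous prediction per leaf."""
    return left_mean if value <= threshold else right_mean


print(f"if x <= {threshold}: predict {left_mean:.2f}, else {right_mean:.2f}")
```

A decision (classification) tree works the same way, except each leaf predicts the most common class instead of a mean.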
Describe a neural network.
Algorithms loosely inspired by how the human brain learns. Can be used, for example, to predict what an image shows, such as whether it is a triangle or a square.
It needs to be trained, and it gets better the more it is trained.
Describe a support vector machine.
Good for classification. You first look for the maximal margin classifier, but that does not allow for any misclassification. If some misclassification is unavoidable, you use a soft margin classifier instead. You choose it with cross-validation, which compares the different soft margin classifiers.
Sometimes the classes in the dataset cannot be separated by a straight line. You then move the data up to a higher dimension, where a linear separator can be drawn. For this you use a so-called kernel function, of which there are many different kinds.
Effective when the dataset is not linearly separable.
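The "move it up to another dimension" step can be sketched with an explicit feature map. Note the swap: a kernel function does this implicitly without computing the new coordinates, while this sketch computes them directly to show the effect. The points, labels, and helper names are made up.

```python
# Made-up 1-D data: class 1 sits in the middle, class 0 on both sides,
# so no single cut on the line can separate the two classes.
points = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
labels = [0, 0, 1, 1, 1, 0, 0]


def separable_by_threshold(values, labels):
    """True if one cut puts each class on its own side.

    Assumes no two points with different labels share the same value.
    """
    ordered = [label for _, label in sorted(zip(values, labels))]
    changes = sum(1 for a, b in zip(ordered, ordered[1:]) if a != b)
    return changes <= 1


def lift(x):
    """Explicit map from 1-D to 2-D: add x squared as a second coordinate."""
    return (x, x * x)


lifted = [lift(x) for x in points]
second_coordinate = [p[1] for p in lifted]

print(separable_by_threshold(points, labels))             # original line
print(separable_by_threshold(second_coordinate, labels))  # after lifting
```

In the lifted space a horizontal line such as x squared = 2.5 separates the classes, which corresponds to a linear separator in the higher dimension.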
What are the different measurement scales?
Nominal - just categories that cannot be ranked. For example blue, green.
Ordinal - you can rank the values, but not say exactly by how much they differ. For example good, very good.
Interval - you can compare differences, but there is no true zero. For example temperature in Celsius: you can't say that 20 degrees is twice as hot as 10.
Ratio - for example weight. There is a true zero, so you can compare values directly. The best for statistics.
Explain mean, median and standard deviation
Mean - average
Median - the middle number when the values are sorted
Standard deviation - how far the values typically are from the mean
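A quick check of the three measures with Python's standard `statistics` module. The numbers are made up. `pstdev` is the population standard deviation; `stdev` would give the slightly larger sample version.

```python
import statistics

# Made-up data set with eight values.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean_value = statistics.mean(data)       # sum divided by count
median_value = statistics.median(data)   # middle of the sorted values
spread = statistics.pstdev(data)         # population standard deviation

print(mean_value, median_value, spread)
```

With an even number of values the median is the average of the two middle ones, here 4 and 5.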
Explain covariance and correlation.
Both describe the relationship between two variables. Covariance depends on the scale of the variables, so it works best when the numbers are on the same scale and is otherwise hard to compare.
Correlation only ranges from -1 to 1, which makes it easier to compare across different pairs of variables.
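A sketch of why correlation is easier to compare: changing the unit of one variable changes the covariance but leaves the correlation untouched. The data and the helper names `covariance` and `correlation` are made up for illustration.

```python
import math


def covariance(x, y):
    """Sample covariance (divides by n - 1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def correlation(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))


hours = [1.0, 2.0, 3.0, 4.0, 5.0]
grades = [2.0, 4.0, 5.0, 4.0, 5.0]
minutes = [h * 60 for h in hours]  # same information, different scale

print(covariance(hours, grades), covariance(minutes, grades))
print(correlation(hours, grades), correlation(minutes, grades))
```

The covariance grows by a factor of 60 when hours become minutes, while the correlation stays the same and remains between -1 and 1.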
Explain sampling from a distribution.
You want to say something broader about a population based on a sample.
Can help you to verify your statistics.
Explain statistical significance.
Whether you can trust that your result is not just due to chance. Often a p-value threshold of 5 % is used: the result is called significant if there is under 5 % probability of getting a result like yours by coincidence alone.
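A small worked example of a p-value, assuming a made-up coin experiment: how likely is it to see at least this many heads in 20 flips if the coin is actually fair? This is a one-sided exact binomial test; the function name is hypothetical.

```python
import math


def p_value_at_least(heads, flips):
    """P(at least `heads` heads in `flips` flips) for a fair coin."""
    favourable = sum(math.comb(flips, k) for k in range(heads, flips + 1))
    return favourable / 2 ** flips


p_strong = p_value_at_least(16, 20)  # 16 heads out of 20
p_weak = p_value_at_least(12, 20)    # 12 heads out of 20

for name, p in (("16 of 20", p_strong), ("12 of 20", p_weak)):
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{name}: p = {p:.4f} ({verdict} at the 5 % level)")
```

Sixteen heads is very unlikely under a fair coin, so it falls under the 5 % threshold. Twelve heads happens by coincidence about a quarter of the time, so it does not.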
Which ML methods for classification?
K-nearest neighbour, support vector machine and logistic regression.
Which ML methods for continuous outcome?
Linear regression, regression tree
Which issues can you have with digital data and analytics?
Pricing models - prices go up when there are many customers, for example Uber.
Customer classification - if a customer shops for baby clothes, you can infer that she is pregnant.
Surveillance - monitoring people, as in China.
What are the positive forces of digital data?
Citizens can become more powerful.
Algorithms can sometimes make better decisions than humans.
Which ethical issues can there be with digital data?
Legal - discrimination based on gender or race.
Homogenization of the workplace - hiring too many of the same kind of people based on AI.
Impact on business partners - trust can decrease.