Entropy Flashcards
(6 cards)
Data Science
Data Science comprises the principles, processes, and tools that facilitate the understanding and interpretation
of phenomena through (automated) data analysis techniques.
Data Science methods are key components in a new, "revolutionary"
development of the discipline of management accounting.
Predictive Analytics (PA)
Methods and techniques that help make predictions about the occurrence of future events and the
individual probabilities of those events occurring.
The ability to extract useful information from data will represent an important comparative advantage
in the future.
Entropy
Entropy is an expression of the disorder, or randomness, of a system, or of our lack of information about it.
Information entropy is defined as the average amount of information produced by a stochastic source of
data: H = -sum over outcomes of p * log2(p).
Reduce Entropy
Having less disorder (a lower density of information) in a variable is an advantage because it
makes its values more predictable.
Entropy = 0
A completely predictable value -> we do not have to focus on this information; it is already
predictable.
For example, if only people in black shirts can enter a room, then of course 100% of the people in the room wear black shirts.
Entropy = 1
A completely unpredictable value -> we have to focus on this information and try to reduce its
entropy, i.e. to make the information more predictable.
For example, a fair coin: both sides are equally likely, and this 50/50 split means the entropy is 1 bit.
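The two cards above can be checked numerically with the standard Shannon entropy formula, H = -sum of p * log2(p) over the outcomes. A minimal sketch (the function name `entropy` is our own choice, not from the source):

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a probability distribution."""
    # Outcomes with p = 0 contribute nothing (lim p*log2(p) -> 0).
    return sum(-p * math.log2(p) for p in probs if p > 0)

# The black-shirt room: one certain outcome -> entropy 0.
print(entropy([1.0]))        # 0.0
# A fair coin: two equally likely outcomes -> entropy 1 bit.
print(entropy([0.5, 0.5]))   # 1.0
```

Any distribution between these extremes (e.g. a biased coin) gives an entropy strictly between 0 and 1 bit.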
High entropy is not helpful if we want to predict future events before they occur.
High entropy is not helpful when we want to classify objects / subjects into sets that are
homogeneous / pure.
Splitting
We can reduce the "disorder" by splitting the original set into subsets.
Splitting the set -> lets us quantify how much more information is provided by the set A
compared to the set B.
Information Gain = the reduction of entropy achieved by splitting a set on all values of an attribute.
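As a sketch of the definition above: information gain is the parent set's entropy minus the size-weighted entropy of the subsets produced by the split. The shirt-colour data and the function names here are illustrative assumptions, not from the source.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n)
               for c in Counter(labels).values())

def information_gain(parent, subsets):
    """Reduction of entropy achieved by splitting `parent`
    into `subsets` on all values of an attribute."""
    n = len(parent)
    weighted = sum(len(s) / n * entropy(s) for s in subsets)
    return entropy(parent) - weighted

# Hypothetical example: a mixed room split by shirt colour.
room = ["black", "black", "white", "white"]       # entropy = 1 bit
split = [["black", "black"], ["white", "white"]]  # each subset pure
print(information_gain(room, split))  # 1.0: the split removes all disorder
```

A split into pure (homogeneous) subsets yields the maximum possible gain; a split that leaves the subsets as mixed as the parent yields a gain of 0.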