Proctor Exam 2: Secondary Data, Syndicated Data, and Big Data Flashcards

(94 cards)

1
Q

Machine learning can be applied anywhere there is a need for quick, automatic decisions based on ongoing feedback from patterns in the environment.

2
Q

Problems that lend themselves to machine learning:

  1. The data causes problems for traditional analytic techniques, such as when variables
    are highly correlated, when data is non-linear, or when there are far more variables than
    records (so-called “wide and shallow” datasets).
  2. Accuracy is more important than understanding.
  3. Potential outputs are defined, but the action depends on conditions that cannot easily be predicted or identified before the event happens.
  4. Rules and associations might be perceived or deduced, but are not easily described by logical rules.
3
Q

What makes this a good machine-learning problem is that the decisions and the variables
constantly change, and both the value of one variable and the right decision may depend on the
values of many other variables. Humans make these assessments instinctively, but it is
impossible to list every rule and situation discretely for a computer to look up and evaluate.

4
Q

Machines learn by studying data to detect patterns or by applying known rules (algorithms) to:
Categorize like or unlike people or things
Identify patterns and relationships that were unknown before analysis
Predict likely outcomes or actions based on identified patterns
Detect anomalous or unexpected behaviors

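The last of these purposes, detecting anomalous behavior, can be illustrated with one of the simplest known rules: flag any value that sits unusually far from the average. This is a minimal sketch, not a method from the course; the helper name `flag_anomalies`, the purchase counts, and the 2-standard-deviation threshold are all hypothetical choices.

```python
from statistics import mean, pstdev

# Hypothetical helper for illustration: a simple z-score rule.
def flag_anomalies(values, threshold=2.0):
    """Return indices of values more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# Hypothetical daily purchase counts for one customer; day 6 is unusual.
daily_purchases = [3, 4, 2, 3, 5, 4, 40, 3, 4, 3]
print(flag_anomalies(daily_purchases))  # → [6]
```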
5
Q

Machines learn through what is essentially an exhaustive process of trial and error: sifting through information, comparing the results to a goal, making adjustments, and trying again.

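The trial-and-error loop described above can be sketched as a model that repeatedly guesses, measures its error against the goal, and adjusts. This is a minimal illustration assuming a one-parameter model (y = w * x) fitted by gradient descent; the helper name `fit_slope`, the data, the learning rate, and the step count are hypothetical.

```python
# Hypothetical helper for illustration: fit y = w * x by repeated trial and adjustment.
def fit_slope(xs, ys, learning_rate=0.01, steps=500):
    w = 0.0  # initial guess
    n = len(xs)
    for _ in range(steps):
        # Compare current predictions with the goal (the observed ys):
        # gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        # Adjust the guess in the direction that reduces the error, then try again.
        w -= learning_rate * grad
    return w

xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]  # hypothetical data generated by y = 2x
print(round(fit_slope(xs, ys), 3))  # → 2.0
```

Each pass shrinks the remaining error by a constant factor here, which is why many small adjustments converge on the right answer.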
6
Q

Within machine learning: The traditional advanced analytic techniques you will learn about later in this course are not well
suited to the unstructured nature of some big data.

7
Q

Machine learning, however, takes advantage of a computer’s ability to
follow rules and execute swift comparisons as a fast way to understand patterns and meaning in data.

The algorithms automatically sort data, testing and comparing what they have seen in the past with what they are seeing in the present. The learning may lead to a new understanding of behavior, or it might serve as automatic input to an action executed by another computer process.

8
Q

Supervised learning always has a predetermined outcome provided by the programmer. The machine seeks faster, more efficient, or more accurate ways to meet the goal based on the data and the programmer’s input.

9
Q

Supervised Learning

Process
Machine is given pre-classified data and discovers a pattern associated with the classification. As more data becomes available, the machine adjusts its associations and gets better at classifying.

Example
Sorting out junk emails from wanted content

Limitations
Only works on one task at a time.
User may not be able to interpret the associations behind the sorting.

Traditional Statistics Tools
Regression
Classification
Decision Trees
Random Forests
Bayesian statistics
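The junk-mail example can be sketched as a tiny supervised classifier: it is given pre-classified messages, learns which words are associated with each label, and then classifies new messages. This is a minimal multinomial naive Bayes (one of the Bayesian tools listed above) with add-one smoothing; the helper names `train` and `classify`, the training messages, and the labels are hypothetical, and real spam filters are far more elaborate.

```python
import math
from collections import Counter

# Hypothetical helper: learn word counts per label from pre-classified messages.
def train(messages):
    word_counts = {}          # label -> Counter of words
    label_counts = Counter()  # label -> number of messages
    vocab = set()
    for text, label in messages:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(words)
        vocab.update(words)
    return word_counts, label_counts, vocab

# Hypothetical helper: pick the label with the highest log-probability score.
def classify(text, model):
    word_counts, label_counts, vocab = model
    total_messages = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        # Log prior: how common this label is in the training data.
        score = math.log(label_counts[label] / total_messages)
        total_words = sum(counts.values())
        for word in text.lower().split():
            # Add-one (Laplace) smoothing so unseen words do not zero out the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical pre-classified training data.
training = [
    ("win a free prize now", "junk"),
    ("claim your free money", "junk"),
    ("meeting agenda for monday", "wanted"),
    ("project report attached", "wanted"),
]
model = train(training)
print(classify("free prize money", model))        # → junk
print(classify("monday project meeting", model))  # → wanted
```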
10
Q

Unsupervised Learning

Here the data determines the outcome. The algorithm’s mission is to extract structure from the data, and to present the structure in a way that is useful to us. Data is segmented and scored based on what the computer itself decides is relevant or related.

11
Q

Unsupervised Learning

Process
Machine is given a lot of data and told to hunt for patterns and “clusters” of things related to each other. It draws its own conclusions about relationships.

Example
Recommendation engines
Loyalty card targeting

Limitation
Usually requires human input after the fact.

Traditional Statistics Tools
Factor Analysis
Cluster Analysis
Multidimensional Scaling
Principal Component Analysis
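As a sketch of the "hunt for clusters" process, here is a bare-bones k-means, a standard cluster-analysis algorithm, run on hypothetical two-feature shopper data (visits per month, average basket in dollars). The helper name `kmeans`, the number of clusters, and the starting centers are assumptions supplied by the analyst, which is one reason unsupervised results usually need human interpretation afterward.

```python
import math

# Hypothetical helper: assign points to the nearest center, move each center
# to the mean of its cluster, and repeat.
def kmeans(points, centers, iterations=10):
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Recompute each center as the mean of its points (keep it if the cluster is empty).
        centers = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical shoppers: (visits per month, average basket in dollars)
shoppers = [(2, 20), (3, 25), (2, 22), (12, 80), (11, 90), (13, 85)]
centers, clusters = kmeans(shoppers, centers=[shoppers[0], shoppers[3]])
print(centers)
print(clusters)
```

The algorithm finds two groups (occasional small-basket shoppers and frequent big-basket shoppers), but naming and acting on those groups is still a human decision.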
12
Q

Reinforcement Learning

This type of learning has no supervisor, but instead it has a reward signal that defines success. Similar to human learning, when success is rewarded, the machine tries to learn the patterns that result in receiving the reinforcement signal. The machine’s decisions affect the subsequent data it receives.

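The reward-signal idea can be sketched with an epsilon-greedy multi-armed bandit, a standard toy reinforcement-learning setup: the agent picks an option, receives a reward, updates its estimate, and its own choices determine which data it sees next. The helper name `run_bandit`, the three hypothetical "offers" and their reward probabilities, and epsilon = 0.1 are all illustrative assumptions.

```python
import random

# Hypothetical helper: epsilon-greedy learning from a reward signal.
def run_bandit(reward_probs, steps=5000, epsilon=0.1, seed=42):
    rng = random.Random(seed)  # seeded so the run is reproducible
    n_arms = len(reward_probs)
    estimates = [0.0] * n_arms  # learned average reward per option
    pulls = [0] * n_arms
    for _ in range(steps):
        # Explore occasionally; otherwise exploit the best-looking option.
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)
        else:
            arm = max(range(n_arms), key=lambda i: estimates[i])
        # The environment returns a reward only for the chosen option,
        # so the agent's decisions shape the data it receives next.
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        pulls[arm] += 1
        # Incremental update of the running average reward.
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
    return estimates, pulls

# Hypothetical offers whose success rates are unknown to the agent.
estimates, pulls = run_bandit([0.2, 0.5, 0.8])
print(pulls.index(max(pulls)))  # index of the option chosen most often
```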
13
Q

Reinforcement Learning

Process
Machine not only analyzes data but uses the output to improve efficiency or create new strategies. Learns how to apply a set of rules toward an outcome in the most efficient way.

Example
Game-playing bots
War-game simulations

Limitation
Strategies may not be understandable by humans, so they may be limited to one situation.

Traditional Statistics Tools
Game Theory
Linear Programming

14
Q

Unsupervised learning works well if we have little or limited knowledge of the data. The best examples of this application are so-called targeting engines or recommendation engines, such as when a
supermarket checkout machine issues you a coupon at checkout.

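A checkout coupon engine of this kind can be sketched as simple co-occurrence counting over past baskets: the data, not a predefined label, determines which items go together. The helper names, the baskets, and the "recommend the most frequent companion item" rule are hypothetical simplifications; real targeting engines use far richer data and methods.

```python
from collections import Counter
from itertools import permutations

# Hypothetical helper: count how often each ordered pair of distinct items shares a basket.
def build_cooccurrence(baskets):
    pairs = Counter()
    for basket in baskets:
        for a, b in permutations(set(basket), 2):
            pairs[(a, b)] += 1
    return pairs

# Hypothetical helper: the item most often bought together with `item`, or None.
def recommend(item, pairs):
    companions = {b: n for (a, b), n in pairs.items() if a == item}
    return max(companions, key=companions.get) if companions else None

# Hypothetical purchase history.
baskets = [
    ["pasta", "sauce", "parmesan"],
    ["pasta", "sauce"],
    ["bread", "butter"],
    ["pasta", "parmesan", "sauce"],
    ["bread", "jam", "butter"],
]
pairs = build_cooccurrence(baskets)
print(recommend("pasta", pairs))  # → sauce
```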
15
Q

The best examples of reinforcement machine learning are machines that play games. Typically, the
machine is taught the rules of the game and given the goal of winning.

16
Q

Big Data is “The record of all interactions with people, institutions, and things recorded and stored digitally.”
Big data, then, is the digital trail left by humans and their connected machines.

17
Q

The 7 V’s of Big Data

Volume 
Velocity
Variety
Variability
Visualization
Veracity
Value
18
Q

Within the healthcare industry, pharmaceutical syndicated services track the sales, price, and
distribution of most pharmaceuticals.

19
Q

The majority of pharmaceutical data comes from patient billing and processing at every point of
the drug supply chain.

20
Q

Unlike most CPG categories, however, the pharmaceutical sales and
distribution chain involves many government regulations, physicians, other providers, and
insurance companies as mediators of sales to the patient as the ultimate consumer. As we have
said, drug products historically have not been easily tracked using industry-standard digital codes
such as the UPC.

21
Q

Big Data Researcher Skills
These are the most common skills found on big data analytic teams:

Programming
Data Manipulation
Exploratory Data Analytics
Mathematics/Statistics
Business Skills/Domain Expertise
People Skills/Communication Skills
22
Q

The evolution of big data has produced an entirely new field called data science, an interdisciplinary field about scientific methods, processes, and systems to extract knowledge or insights from data in various forms.

23
Q

In addition to the traditional skills of qualitative techniques and the statistical skills of quantitative surveys and analytics, modern researchers are concerned with:
Data Curation
Data Governance
Data Provenance

24
Q

Data Curation

Right data is assembled for the right question

25
Data Governance | Data is secure & accurate
26
Data Provenance | Data from reputable sources & tracked through all potential uses
27
Machine learning is also being used to understand the return on investment of marketing itself (MROI), that is, measuring how much money is generated by investing in marketing.
28
The datasets that track marketing spending are large, have more variables than cases, and contain relationships that are often non-linear, mixed with responses that are well behaved and straightforward.
29
Prediction and understanding are related but independent goals of market research; it is simply a business decision, based on the problem at hand, whether one goal should be favored over the other.
30
Striking the right balance between prediction and understanding is still required.
31
The complexity of big data and the algorithms that read it have increased the frequency with which predictive models take precedence as more and more marketing occurs in digital environments where automation is possible and desirable.
32
Data collected for any purpose other than to meet the needs of your particular study, and subsequently used in research, is called “secondary” data.
33
Secondary data in some detail: data collected for
  1. any purpose other than to meet the needs of your particular study;
  2. non-specific research purposes, called “syndicated” or multi-client data; or
  3. another purpose and subsequently used in research.
34
The second most important question in your research design, just behind the original purpose of the research, is “What is already known about your research goal?”
35
To reiterate and re-emphasize: All market research designs for every research project should begin with an assessment of what is already known about your research problem. The answer to that question almost always involves the search for and use of secondary data in its many forms, whether you are merely searching the Internet or buying needed data from a broker.
36
The advantage of each imagined example of secondary research is that it costs less time, money, and effort.
37
No single organization outside of governments could fund entire population studies, such as censuses or large-scale public health studies.
38
Secondary data may be older than you would like, or it might be an annual estimate rather than the monthly trend information that would be best for your project; or the best information you can find may be at a regional level when you need data about a city in the region.
39
Two key elements to review to assess the potential for bias are the original purpose of the research or data and the methodology of the original research.
40
If possible, it is always a good idea to evaluate the original sources rather than a summary or an article describing the research.
41
The best way to evaluate accuracy is to examine multiple sources of the same estimates.
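Comparing multiple sources can start with simple arithmetic: look at the average of the estimates and how far apart they are. This is a minimal sketch; the helper name `compare_estimates`, the three market-size estimates, and the 10% spread threshold are hypothetical, chosen only to illustrate the idea.

```python
from statistics import mean

# Hypothetical helper: summarize several sources' estimates of the same quantity.
def compare_estimates(estimates, max_relative_spread=0.10):
    values = list(estimates.values())
    avg = mean(values)
    spread = (max(values) - min(values)) / avg  # range as a share of the average
    return {
        "mean": avg,
        "relative_spread": spread,
        "consistent": spread <= max_relative_spread,
    }

# Hypothetical estimates of annual category sales ($ millions) from three sources.
sources = {"government": 480.0, "trade_association": 500.0, "syndicated_vendor": 520.0}
print(compare_estimates(sources))
```

A large spread does not say which source is wrong; it signals that the original purpose and methodology of each source deserve a closer look.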
42
Finally, you should evaluate the credibility and reputation of the source of your data.
43
One part of the syndicated data industry builds these profiles and makes them available to buyers.
44
The benefit of knowing more about your customer is pretty obvious, and there are many ways to know them. One part of the syndicated data industry builds these profiles and makes them available to buyers.
45
Psychographics pertain to people’s values, attitudes, lifestyles, and personalities. These deeper, more personal, and less obviously observed criteria are helpful in understanding consumer motivation.
46
Demographics: Grouping people by age, gender, ethnicity, income, education, geographic or other physical and structural characteristics.
47
Behavioral: Grouping people by common behaviors, e.g., products purchased, websites or stores visited, shopping behaviors.
48
Psychographics: Grouping people by shared values, principles, personality, interests, and lifestyle.
49
Market share is one of the oldest and simplest measures of performance. It seems so basic that many do not realize Arthur Nielsen, Sr. actually invented it as a tracking measure in his first Nielsen syndicated database in the 1930s.
50
Simply tracking the sales of a product relative to total category sales is still a simple and powerful measure of performance.
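Market share as described is just the ratio of a product's sales to total category sales. A minimal sketch; the helper name `market_shares`, the brand names, and the sales figures are hypothetical.

```python
# Hypothetical helper: each brand's share of total category sales, as a percentage.
def market_shares(sales_by_brand):
    category_total = sum(sales_by_brand.values())
    return {brand: 100 * sales / category_total for brand, sales in sales_by_brand.items()}

# Hypothetical dollar sales for one category in one period.
category_sales = {"Brand A": 250_000, "Brand B": 150_000, "Brand C": 100_000}
print(market_shares(category_sales))  # → {'Brand A': 50.0, 'Brand B': 30.0, 'Brand C': 20.0}
```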
51
A Development Index compares the sales of a product either to population (as sales per capita) or to total possible ACV (all-commodity volume) in a market.
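A sketch of the per-capita version of a development index, assuming the common convention of indexing each market's per-capita sales to the all-market per-capita rate, so that 100 means average. The helper name `development_index`, the market names, sales, and populations are hypothetical. Fed brand sales, the same function gives a Brand Development Index (BDI); fed category sales, it gives a Category Development Index (CDI).

```python
# Hypothetical helper: per-capita sales in each market, indexed to the overall rate (100 = average).
def development_index(sales, population):
    overall_rate = sum(sales.values()) / sum(population.values())
    return {m: 100 * (sales[m] / population[m]) / overall_rate for m in sales}

# Hypothetical markets: population in thousands, brand sales in $ thousands.
population = {"North": 1_000, "South": 2_000, "West": 1_000}
brand_sales = {"North": 300, "South": 300, "West": 200}

bdi = development_index(brand_sales, population)
print({m: round(v) for m, v in bdi.items()})  # → {'North': 150, 'South': 75, 'West': 100}
```

Under the same assumption, the ACV-based variant would follow the same pattern with each market's total ACV in place of population.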
52
The Development Indices allow you to quickly spot performance gaps and detect any regional issues that might result from retail performance, sales personnel performance, or even product preference.
53
A high Category Development Index (CDI) alongside a low Brand Development Index (BDI) shows reasonable demand for products similar to ours, but for some reason we are not appealing to the market, or there is a barrier to our sales that bears further investigation.
54
The BDI/CDI chart can help visualize where our new product might have high potential (high CDI) but face strong competition (high competitive BDI). Because data for more than 50 markets is commonly available, this matrix helps organize and visualize the information.
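The BDI/CDI matrix can be sketched as sorting each market into one of four quadrants, assuming an index of 100 (the all-market average) as the cut-off between "high" and "low". The helper names and the index values are hypothetical; the `bdi` argument can hold our own brand's index or, as in the chart described, a competitor's.

```python
# Hypothetical helper: place one market in a BDI/CDI quadrant (100 = all-market average).
def bdi_cdi_quadrant(bdi, cdi, cutoff=100):
    cdi_level = "High CDI" if cdi >= cutoff else "Low CDI"
    bdi_level = "High BDI" if bdi >= cutoff else "Low BDI"
    return f"{cdi_level} / {bdi_level}"

# Hypothetical helper: group markets by quadrant. indices maps market -> (bdi, cdi).
def group_markets(indices, cutoff=100):
    quadrants = {}
    for market, (bdi, cdi) in indices.items():
        quadrants.setdefault(bdi_cdi_quadrant(bdi, cdi, cutoff), []).append(market)
    return {q: sorted(ms) for q, ms in quadrants.items()}

# Hypothetical (competitor BDI, CDI) pairs; real charts often cover 50+ markets.
indices = {"North": (150, 120), "South": (75, 130), "West": (90, 60), "East": (130, 140)}
print(group_markets(indices))
```

With many markets, grouping them this way turns a long table into a short list of places to defend, attack, or investigate.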
55
Advanced analytic techniques can precisely estimate and predict the effects of a price change for every item, product, brand, or category, but simple observational techniques and arithmetic can often provide a great deal of useful information to inform price decisions.
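One example of such simple arithmetic is an arc (midpoint) price elasticity computed from observed before-and-after prices and unit sales. This is a swapped-in standard formula, not a technique named in the text; the helper name `arc_elasticity` and the prices and volumes are hypothetical.

```python
# Hypothetical helper: midpoint (arc) price elasticity between two observed price/volume points.
def arc_elasticity(p1, q1, p2, q2):
    pct_change_q = (q2 - q1) / ((q1 + q2) / 2)
    pct_change_p = (p2 - p1) / ((p1 + p2) / 2)
    return pct_change_q / pct_change_p

# Hypothetical observation: price raised from $4.00 to $5.00, weekly units fell from 1,100 to 900.
e = arc_elasticity(4.00, 1100, 5.00, 900)
print(round(e, 2))  # → -0.9
```

An absolute value below 1 means volume fell proportionally less than price rose over this range, so revenue went up (from $4,400 to $4,500 per week in this hypothetical case).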
56
Every company generates and retains data as part of conducting its business, and much of it is useful in answering market research questions.
57
Most modern businesses compile information in customer databases, data warehouses, or enterprise decision support systems. Increasingly, this information is augmented by external information obtained from sources we will discuss later, so massive amounts of information are available internally.
58
Modern computing power is enabling businesses to track and manage millions of customers, whether consumers or businesses, in a practice called Customer Relationship Management (CRM).
59
Data mining is the application of usually automated analytic techniques to find patterns in data that can be used to grow a business.
60
Two types of external data: Syndicated Services and Big Data.
61
Technologies that promise to automate and discover new insights about customers and consumers are transforming the market research industry as well. Technology and the Internet have now automated many market research processes that once required considerable human effort; data coding, text analysis, sample selection, and questionnaire creation and management are all examples. Automated facial recognition algorithms can now detect emotional responses to advertising in real time. Advanced data fusion techniques interconnect and link attitudes and opinions from thousands of surveys that may share only a minimum of actual questionnaire content.
62
So it makes sense that primary and secondary research often work together in the same project.
63
The use of multiple sources validates your data and investigation by cross-verifying the same ideas and information. This process is sometimes known as data triangulation, a term taken from the navigation process of locating an unknown point through its geometric relationships with other known points. You can validate by triangulating data sources, research methods, or even theories.
64
Data triangulation is simply using evidence from many different types of data sources, such as interviews, documents, public records, social media conversations, or observations.
65
Theory triangulation is a bit more complicated because the theories help you understand your data better, rather than requiring the sorts of integration needed in data and methodology triangulation. Here you apply different theories to the data to help make sense of it. One theory might support your data and another might undermine it.
66
It is important to remember when working with secondary data that it often contains personal data; that is, data that can be used, either directly or indirectly (e.g., by combining it with other data), to identify a specific individual.
67
Two key responsibilities that apply to secondary data:
  1. Researchers must ensure that personal data used in research is thoroughly protected from unauthorized access and never disclosed without the consent of the data subject.
  2. Researchers must always behave ethically and not do anything that might cause harm to a data subject or damage the reputation of market, opinion, and social research.
68
This data is collected or purchased by research companies and then the curated data is sold to multiple buyers to help track or interpret their businesses. The companies collect the data once, but sell it many times to multiple buyers as a subscription. The research companies also leverage the data for research products and consulting services that may be customized for their customers.
69
Many research companies syndicate what are called “omnibus surveys,” in which questions on many topics are asked during the same interview.
70
By keeping track of unique but anonymous identifiers and using advanced data fusion techniques, CivicScience builds a massive database of opinions, preferences, attitudes, and demographic information that it provides to its customers by subscription.
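CivicScience's actual fusion methods are not described here, so the following is only a minimal Python sketch of the general idea, assuming answers arrive as `(anonymous_id, question, answer)` rows. The IDs, questions, and the helper names `build_profiles` and `crosstab` are hypothetical. Grouping by the anonymous ID lets answers given in separate sessions be analyzed together without storing who the respondent is.

```python
from collections import defaultdict

# Hypothetical rows: (anonymous_id, question, answer). The IDs stand in for
# "unique but anonymous identifiers": no names or contact details are kept.
responses = [
    ("a91f", "age_band", "25-34"),
    ("a91f", "favorite_soda", "cola"),
    ("7c2e", "age_band", "45-54"),
    ("a91f", "plans_to_travel", "yes"),
    ("7c2e", "favorite_soda", "lemon-lime"),
]


def build_profiles(rows):
    """Merge answers given at different times into one profile per anonymous ID."""
    profiles = defaultdict(dict)
    for anon_id, question, answer in rows:
        profiles[anon_id][question] = answer  # a later answer replaces an earlier one
    return dict(profiles)


def crosstab(profiles, row_q, col_q):
    """Count profiles by their answers to two questions, skipping profiles missing either."""
    counts = defaultdict(int)
    for profile in profiles.values():
        if row_q in profile and col_q in profile:
            counts[(profile[row_q], profile[col_q])] += 1
    return dict(counts)


profiles = build_profiles(responses)
print(profiles["a91f"])
# {'age_band': '25-34', 'favorite_soda': 'cola', 'plans_to_travel': 'yes'}
print(crosstab(profiles, "age_band", "favorite_soda"))
# {('25-34', 'cola'): 1, ('45-54', 'lemon-lime'): 1}
```

The cross-tab works only because answers from different sessions were first linked to the same anonymous profile.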
71
Household Panel Sales Data: The types of syndicated sales-tracking data discussed so far give very accurate estimates of sales and share in a market, but they hide some very important purchasing dynamics needed to understand trends, diagnose performance, and grow products with the right marketing plans.
72
Market Measurement: What the business does not know, however, is how much its competitors are selling, and the need for that knowledge drives the largest-scale type of syndicated data: sales tracking for consumer goods.
73
The market for consumer packaged goods (CPG), known as fast-moving consumer goods (FMCG) outside the United States, is particularly well suited to this kind of tracking. The distribution channels are well known, products are easily identified, and there are many CPG businesses that can pay for the information and support the costs of collecting the data.
74
The Universal Product Code (UPC), developed in the early 1970s to help automate inventory tracking, also proved to be an essential factor in automating the collection of market information and almost instantly amplified our ability to understand and optimize price, promotion, and distribution.
75
The UPC scanner that allowed quick checkout and payment at the supermarket also automatically entered each transaction into an electronic database that grew with every sale.
76
By the late 1970s, a research company named Information Resources Incorporated (IRI) realized the potential of using UPCs as the basis for automated and powerful data collection for market research.
77
The basic structure of all syndicated scanner databases is similar no matter who builds or sells them. Databases consist of five essential elements: products, markets, class of trade, time periods, and measures.
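One way to picture the five elements is as a table keyed by four dimensions (product, market, class of trade, time period) whose cells hold the measures. The following is a minimal Python sketch under that assumption; the field names, brands, and figures are hypothetical, and real vendor databases are far larger and structured in their own ways.

```python
from typing import NamedTuple


class CellKey(NamedTuple):
    """The four dimensions that locate one cell of a scanner database."""
    product: str
    market: str
    class_of_trade: str  # e.g. grocery, drug, mass merchandiser
    period: str          # e.g. a week identifier


# Hypothetical cells: each key maps to its measures (units and dollars sold).
facts = {
    CellKey("Brand A 12oz", "Chicago", "Grocery", "2024-W01"): {"units": 1200, "dollars": 3000.0},
    CellKey("Brand B 12oz", "Chicago", "Grocery", "2024-W01"): {"units": 800, "dollars": 2200.0},
    CellKey("Brand A 12oz", "Chicago", "Drug", "2024-W01"): {"units": 300, "dollars": 810.0},
}


def total(facts, measure, **filters):
    """Sum one measure over every cell whose dimensions match all the filters."""
    return sum(
        measures[measure]
        for key, measures in facts.items()
        if all(getattr(key, dim) == value for dim, value in filters.items())
    )


def dollar_share(facts, product, **filters):
    """A product's share of dollar sales within the cells selected by the filters."""
    category = total(facts, "dollars", **filters)
    return total(facts, "dollars", product=product, **filters) / category


print(total(facts, "units", market="Chicago"))  # 2300
print(round(dollar_share(facts, "Brand A 12oz", class_of_trade="Grocery"), 3))  # 0.577
```

Because every measure sits at the intersection of the same four dimensions, questions like "share in grocery" or "units in Chicago" become the same filter-and-sum operation.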
78
The basic measure of audience size is the audience rating.
79
Rating is defined as the number of households with their radio/TV sets tuned to a particular station/channel or program for a specified length of time, divided by the total number of households that have radio/TV.
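The rating definition is a single ratio. A minimal Python sketch, assuming the common convention of reporting ratings as percentages ("rating points"); the household counts are hypothetical.

```python
def rating(tuned_households: int, tv_households: int) -> float:
    """Audience rating in percentage points.

    tuned_households: households tuned to the station/channel or program
    during the measured period.
    tv_households: all households that have a radio/TV set.
    """
    if tv_households <= 0:
        raise ValueError("tv_households must be positive")
    return 100.0 * tuned_households / tv_households


# Hypothetical market: 1,000,000 TV households, 125,000 tuned to the program.
print(rating(125_000, 1_000_000))  # 12.5
```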
80
The other fundamental measure is audience share, which is the number of TV sets in use that are tuned to a program or commercial, divided by the total number of TV sets in use at that time.
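Rating and share differ only in their denominators: all TV households versus only the sets in use. A minimal Python sketch, assuming the usual definition of share (sets tuned to the program divided by sets in use) and, for simplicity, treating each household as one set; all counts are hypothetical.

```python
def rating(tuned: int, tv_households: int) -> float:
    """Percent of all TV households tuned to the program."""
    return 100.0 * tuned / tv_households


def share(tuned: int, sets_in_use: int) -> float:
    """Percent of TV sets in use at the time that are tuned to the program."""
    return 100.0 * tuned / sets_in_use


# Hypothetical evening: 1,000,000 TV households, 400,000 sets in use,
# 100,000 tuned to the program. Each household counts as one set here.
tv_households, sets_in_use, tuned = 1_000_000, 400_000, 100_000
print(rating(tuned, tv_households))  # 10.0
print(share(tuned, sets_in_use))     # 25.0
```

Since sets in use can never exceed all TV households under this simplification, a program's share is never lower than its rating.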
81
Nielsen has had to expand its service to cover this now very fragmented audience, and other research companies have stepped up to offer different versions of so-called “cross-platform” audience measurement.
82
Statistically, when a sample cannot or does not measure something that is real in the population, the result is called sampling error. Sampling error happens when a sample is not representative of a population or is not large enough to accurately measure a population condition.
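Both causes can be shown numerically. A minimal Python sketch, assuming simple random sampling and the textbook standard error of a proportion, sqrt(p(1 - p)/n); the population and the "first rows" convenience sample are hypothetical. A larger sample shrinks random error, but it does not fix a non-representative one.

```python
import math
import random


def standard_error(p: float, n: int) -> float:
    """Standard error of a sample proportion under simple random sampling."""
    return math.sqrt(p * (1 - p) / n)


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error (z = 1.96) for a sample proportion."""
    return z * standard_error(p, n)


# Too small: quadrupling the sample size halves the margin of error.
for n in (100, 400, 1600):
    print(n, round(margin_of_error(0.5, n), 3))

# Not representative: a hypothetical population of 10,000 in which 30% say "yes",
# stored so that every "yes" comes first.
population = [1] * 3_000 + [0] * 7_000
rng = random.Random(42)  # fixed seed so the run is repeatable

random_sample = rng.sample(population, 2_000)
first_rows = population[:2_000]  # a convenience sample: whoever is easiest to reach

print(sum(random_sample) / len(random_sample))  # close to the true 0.30
print(sum(first_rows) / len(first_rows))        # 1.0, far off despite n = 2,000
```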
83
  1. Although the datasets are large, they may not be representative, and therefore not projectable to wider situations.
  2. The datasets build so fast that they may accumulate incorrect information.
  3. Sometimes the underlying process behind the accumulation of data makes past data less predictive of the future.
  4. A predictive process that works well enough once may not work in the future.
  5. Sometimes the variables of big data are not the things closest to what we want to measure directly.
84
We are going to focus on one type of this research, sentiment analysis, as an example because of its popularity and promise. We will also see how big data can supplement and amplify primary research, and at least speculate on the value of sensor data from connected devices as a secondary research application.
85
Sentiment Analysis tries to analytically determine the attitude of a speaker or writer with respect to some topic. The attitude may be an opinion, or an intended or actual emotional state.
86
Computers read words through a process called Natural Language Processing (NLP), and they infer sentiment through a combination of NLP and machine learning. Both NLP and sentiment analysis are very complex computer processes for a task that seems obvious to humans, but the sheer scale of the data requires a machine. New algorithms are continuously improving our ability to read this data accurately in a field of computer science that is still relatively young.
87
Three sentiment analysis approaches:
  1. Knowledge based
  2. Statistically based
  3. Hybrid
88
Knowledge based:
  1. Pre-classify text by categories based on the presence of unambiguous words such as happy, sad, angry, or bored.
  2. Assign scores to more ambiguous terms based on previous search.
  3. Compare the analysis text to a pre-scored reference database.
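A minimal Python sketch of the knowledge-based approach, assuming a small hand-built lexicon that plays the role of the pre-scored reference database. The words, scores, and the 0.25 threshold are hypothetical, and the pre-classification and ambiguous-term scoring steps are collapsed into a single lookup table for brevity.

```python
import re

# Hypothetical pre-scored reference database. Unambiguous emotion words get
# strong fixed scores; ambiguous terms get weaker, context-free scores.
LEXICON = {
    "happy": 1.0, "love": 1.0, "great": 0.8,
    "sad": -1.0, "angry": -1.0, "bored": -0.7,
    "cheap": -0.2,  # ambiguous: low price or low quality?
    "long": -0.1,   # ambiguous: long battery life or long wait?
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def classify(text: str, lexicon: dict = LEXICON, threshold: float = 0.25) -> str:
    """Sum the scores of words found in the lexicon and map the total to a label."""
    score = sum(lexicon.get(token, 0.0) for token in tokenize(text))
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


print(classify("I love it, so happy!"))               # positive
print(classify("The wait was long and I got bored"))  # negative
print(classify("It arrived on Tuesday"))              # neutral
```

Words missing from the lexicon contribute nothing, which is the main weakness of a purely knowledge-based scorer.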
89
Statistically based:
  1. Use machine learning to train algorithms against known outcomes. For example, a machine can sift through reviews and learn which words in the reviews are associated with the accompanying star ratings.
  2. Compare the analysis text to a reference database built from the statistical analysis.
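A per-word average star rating is about the simplest stand-in for the machine learning described above; production systems would more typically train a classifier such as logistic regression or a neural network. In this minimal Python sketch, the reviews, the `min_count` cutoff, and the function names are hypothetical. The learned `word_scores` dictionary plays the role of the reference database.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def learn_word_scores(reviews, min_count=2):
    """Build a reference database of word scores from (text, stars) pairs.

    Each word's score is the average star rating of the reviews that contain
    it, minus the average rating of all reviews. Words seen in fewer than
    min_count reviews are dropped as too noisy.
    """
    mean_stars = sum(stars for _, stars in reviews) / len(reviews)
    totals, counts = defaultdict(float), defaultdict(int)
    for text, stars in reviews:
        for word in set(tokenize(text)):  # count a word once per review
            totals[word] += stars
            counts[word] += 1
    return {w: totals[w] / counts[w] - mean_stars for w in totals if counts[w] >= min_count}


def score(text, word_scores):
    """Average learned score of the known words in the text (0.0 if none are known)."""
    known = [word_scores[w] for w in tokenize(text) if w in word_scores]
    return sum(known) / len(known) if known else 0.0


# Hypothetical training reviews with 1-5 star ratings.
training = [
    ("sturdy and comfortable", 5),
    ("comfortable fit, great value", 5),
    ("flimsy and broke quickly", 1),
    ("broke after a week", 1),
    ("sturdy but pricey", 4),
    ("flimsy straps", 2),
]

word_scores = learn_word_scores(training)
print(word_scores["comfortable"], word_scores["broke"])  # 2.0 -2.0
print(score("flimsy, broke on day one", word_scores))     # -1.75
```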
90
Hybrid:
  1. Combine both methods, for example pre-coding unambiguous words and modeling more ambiguous terms against star ratings.
  2. Compare the analysis text to a reference database.
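A minimal Python sketch of the hybrid idea: unambiguous words keep hand-coded scores, and every other word is modeled against star ratings with the same simple per-word average used as a stand-in for machine learning. The pre-coded lexicon, reviews, and threshold are hypothetical; real hybrid systems combine richer models.

```python
import re
from collections import defaultdict

# Knowledge part: hypothetical hand-coded scores for unambiguous words.
PRECODED = {"happy": 2.0, "love": 2.0, "sad": -2.0, "angry": -2.0, "bored": -1.5}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def model_ambiguous(reviews, min_count=2):
    """Statistical part: score words that are NOT pre-coded against star ratings.

    A word's score is the mean rating of reviews containing it minus the overall mean.
    """
    mean_stars = sum(stars for _, stars in reviews) / len(reviews)
    totals, counts = defaultdict(float), defaultdict(int)
    for text, stars in reviews:
        for word in set(tokenize(text)) - set(PRECODED):
            totals[word] += stars
            counts[word] += 1
    return {w: totals[w] / counts[w] - mean_stars for w in totals if counts[w] >= min_count}


def build_reference(reviews):
    """Merge modeled scores and pre-coded scores into one reference database."""
    reference = model_ambiguous(reviews)
    reference.update(PRECODED)  # hand-coded values take precedence
    return reference


def classify(text, reference, threshold=0.5):
    """Sum reference scores for the words in the text and map the total to a label."""
    total = sum(reference.get(w, 0.0) for w in tokenize(text))
    if total > threshold:
        return "positive"
    if total < -threshold:
        return "negative"
    return "neutral"


# Hypothetical training reviews with 1-5 star ratings.
training = [
    ("love it, sturdy frame", 5),
    ("sturdy and quiet", 4),
    ("cheap plastic, sad purchase", 1),
    ("plastic smell and noisy", 2),
]

reference = build_reference(training)
print(classify("Sturdy, and I love it", reference))        # positive
print(classify("Plastic feel, bored already", reference))  # negative
```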
91
Given a target subject, then, sentiment analysis can analyze an enormous amount of text and classify whether the text is generally positive, negative, or neutral about the subject. The best techniques do this well relative to human classification. Many studies have shown that human readers generally agree about 80% of the time when classifying the same text, while the best machine analysis achieves about 70%. That is strong performance measured against the 80% human benchmark.
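In its simplest form, an agreement rate is the share of items that two coders label the same way. A minimal Python sketch with ten hypothetical comments whose labels were chosen so the rates come out at 80% and 70%; published studies may instead use chance-corrected measures such as Cohen's kappa.

```python
def percent_agreement(labels_a, labels_b):
    """Fraction of items on which two coders (human or machine) chose the same label."""
    if not labels_a or len(labels_a) != len(labels_b):
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


# Hypothetical labels for the same ten comments.
human_1 = ["pos", "neg", "neu", "pos", "pos", "neg", "neu", "pos", "neg", "pos"]
human_2 = ["pos", "neg", "pos", "pos", "pos", "neg", "neu", "neu", "neg", "pos"]
machine = ["pos", "neg", "neg", "pos", "neu", "neg", "neu", "pos", "pos", "pos"]

human_agreement = percent_agreement(human_1, human_2)    # 0.8
machine_agreement = percent_agreement(machine, human_1)  # 0.7
print(f"humans agree on {human_agreement:.0%}, machine matches a human on {machine_agreement:.0%}")
```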
92
Machines currently are not as accurate when assessing finer degrees of positivity or negativity, as in the difference between a 4- and a 5-star rating in a review.
93
Probably the biggest limitation of sentiment analysis is that all social listening data, no matter how much of it exists, is essentially qualitative. Most comments are by definition non-representative: they are generated by people passionate enough to comment, and only by people with access to social media. The large number of comments does not change this inherent fact.
94
Internet of Things (IoT) analysis is not yet a common form of data analysis for marketing organizations. It is being tested, and we are all learning.