Lecture 11 Flashcards

Text as data (29 cards)

1
Q

When does text become text-data?

A

Text becomes text-data when it is collected for use, analysis, or recording

2
Q

Examples of text structure

A

*plain sentences
*list of words
*structured table (one row per speech)
*tags/labels
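
A minimal sketch of these four structures, using one made-up speech. The speaker, year, and labels are hypothetical values chosen only for illustration.

```python
# The same (made-up) speech stored in four different structures.
speech = "We must secure the border. Asylum rules need reform."

# 1. Plain sentences
sentences = [s.strip() + "." for s in speech.split(".") if s.strip()]

# 2. List of words (lowercased, surrounding punctuation stripped)
words = [w.strip(".,").lower() for w in speech.split()]

# 3. Structured table: one row per speech (speaker and year are hypothetical)
table = [{"speech_id": 1, "speaker": "Speaker A", "year": 2015, "text": speech}]

# 4. Tags/labels attached to the speech (labels are hypothetical)
tags = {"speech_id": 1, "labels": ["immigration", "security"]}

print(sentences)
print(words)
```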

3
Q

Heterogeneous Structure

A

The way we store text data depends on how we plan to analyze it

4
Q

Qualitative Coding of Texts

A

Researchers define a set of categories (e.g., policy topic, sentiment) and assign documents or text passages to those categories by hand
*also called “manual content analysis”

5
Q

Quantitative Text Analysis

A

Translating text into numbers (e.g., word frequency)

6
Q

Examples of Quantitative Text Analysis

A

*Text classification methods
*Scaling methods
*Text Similarity
*Text reuse methods

7
Q

Why does “text as data” exist?

A

The volume and accessibility of text data have increased, making it easier to use text for quantitative analysis

8
Q

What sort of questions can we answer?

A

*Descriptive
*Inferential

9
Q

Descriptive Text Analysis

A

Describing the content of a text in a clear, factual way by summarizing or categorizing what is in it
*e.g., count how many times a politician says “security”, “border”, or “asylum”
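
The keyword count in this example can be sketched in a few lines. The speech text below is invented for illustration, and the simple regex tokenizer is an assumption, not a method named in the lecture.

```python
import re
from collections import Counter

# Made-up speech text for illustration
speech = "Security matters. Border security and asylum policy shape the border debate."
keywords = ["security", "border", "asylum"]

# Lowercase the text and pull out runs of letters as tokens
tokens = re.findall(r"[a-z']+", speech.lower())
counts = Counter(tokens)

# Keep only the keywords of interest
keyword_counts = {k: counts[k] for k in keywords}
print(keyword_counts)  # {'security': 2, 'border': 2, 'asylum': 1}
```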

10
Q

Inferential Text Analysis

A

Uses text to draw broader, generalizable conclusions about political actors, society, or hidden attitudes
*e.g., use a speech to infer a politician’s ideology

11
Q

What is a corpus in text analysis?

A

A structured collection of texts in a machine-readable format used for analysis

12
Q

5 steps to quantitative text analysis

A
  1. decide research question/objective
  2. acquire documents
  3. create a corpus or dataset (machine-readable format)
  4. pre-process text
  5. perform analysis
13
Q

What does a Document-Term Matrix (DTM) represent?

A

A table where rows = documents, columns = terms, and cells = word frequency in each document.
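
A minimal sketch of building such a table by hand, assuming three tiny made-up documents and whitespace tokenization. Real projects would more likely use a text-analysis library.

```python
from collections import Counter

# Three made-up documents
docs = {
    "doc1": "tax cuts help growth",
    "doc2": "border security and border control",
    "doc3": "tax policy and security",
}

# Word counts per document
counts = {name: Counter(text.split()) for name, text in docs.items()}

# Vocabulary, sorted so the column order is reproducible
vocab = sorted({term for c in counts.values() for term in c})

# rows = documents, columns = terms, cells = frequency of the term in the document
dtm = [[counts[name][term] for term in vocab] for name in docs]

print(vocab)
for name, row in zip(docs, dtm):
    print(name, row)
```

Most cells are zero, which is typical: each document uses only a small share of the whole vocabulary.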

14
Q

Why do we pre-process text data?

A

To reduce noise (irrelevant or meaningless information) and prepare the text for analysis

15
Q

What are common pre-processing steps in text analysis?

A

*remove stop words (and, the, in)
*standardize terms (US and USA in same format)
*lowercase words
*lemmatization (reduce a word to its base form), e.g., “running” to “run”

Note: which steps to take depends on the research objective
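
The steps above can be sketched as a single function. The stop-word list, the term mapping, and the lemma lookup are tiny hypothetical stand-ins; real lemmatization relies on a full dictionary or a library. Standardizing "US" before lowercasing is a design choice made here so that it does not collide with the pronoun "us".

```python
STOP_WORDS = {"and", "the", "in", "are"}                 # tiny illustrative list
STANDARD_TERMS = {"US": "usa", "U.S": "usa", "USA": "usa"}  # variants -> one form
LEMMAS = {"running": "run", "ran": "run"}                # toy lemma lookup
PUNCTUATION = ".,;:!?\"'()"

def preprocess(text):
    """Standardize terms, lowercase, drop stop words, and lemmatize."""
    tokens = []
    for raw in text.split():
        token = raw.strip(PUNCTUATION)                    # drop surrounding punctuation
        token = STANDARD_TERMS.get(token, token.lower())  # standardize, else lowercase
        if not token or token in STOP_WORDS:              # remove stop words
            continue
        tokens.append(LEMMAS.get(token, token))           # lemmatize if known
    return tokens

print(preprocess("The US and the USA are running in the election."))
# ['usa', 'usa', 'run', 'election']
```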

16
Q

What is a “text data frame” under tidy data principles?

A

A table where each row represents a unit of observation such as a document, sentence, or paragraph
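
A minimal sketch with the sentence as the unit of observation. The documents and the column names (`doc_id`, `sentence_id`, `text`) are assumptions for illustration, and the sentence splitter is deliberately naive.

```python
import re

# Made-up documents
documents = {
    "speech_1": "We must act now. The border is secure.",
    "speech_2": "Taxes are too high.",
}

# One row per sentence: each row records the source document,
# the sentence's position in that document, and the sentence text.
rows = []
for doc_id, text in documents.items():
    # Naive split: break on whitespace that follows ., ! or ?
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    for i, sentence in enumerate(sentences, start=1):
        rows.append({"doc_id": doc_id, "sentence_id": i, "text": sentence})

for row in rows:
    print(row)
```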

17
Q

Two common methods of quantitative text analysis

A
  1. Dictionary Analysis
  2. Topic Modelling
18
Q

What is dictionary analysis in text classification?

A

An automated method that classifies text based on a predefined list of keywords (dictionary)
The dictionary can be:
1. pre-existing
2. custom-made
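
A minimal sketch of dictionary-based classification. The categories and keywords form a hypothetical custom dictionary, and "assign the category with the most keyword hits" is one simple decision rule among several possible ones.

```python
import re
from collections import Counter

# Hypothetical custom dictionary: category -> keywords
DICTIONARY = {
    "security": {"border", "security", "police"},
    "economy": {"tax", "jobs", "growth"},
}

def classify(text):
    """Return (best category or None, hits per category) for a text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    hits = Counter()
    for category, keywords in DICTIONARY.items():
        hits[category] = sum(1 for t in tokens if t in keywords)
    best, n = hits.most_common(1)[0]
    # No keyword matched at all: leave the text unclassified
    return (best if n > 0 else None), dict(hits)

label, scores = classify("Border police need more funding for security.")
print(label, scores)  # security {'security': 3, 'economy': 0}
```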

19
Q

What are the pros and cons of pre-existing dictionaries?

A

Easy to use, but may lack validity in different contexts (e.g., movie reviews ≠ political speeches)

20
Q

What’s the benefit of creating custom dictionaries?

A

Higher validity since it’s tailored to your specific texts, but more time-consuming.

21
Q

What is topic modeling?

A

A method where the computer discovers topics in the text based on word patterns, with no predefined categories.

22
Q

In topic modeling, who assigns meaning to the topics?

A

The researcher, based on the keywords the computer groups together.

23
Q

What is text similarity and how is it measured?

A

It measures how similar documents are, often using cosine similarity based on word usage
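
Cosine similarity treats each document as a vector of word counts and computes (A · B) / (|A| |B|): 1 means the same word proportions, 0 means no shared words. A minimal sketch, assuming whitespace tokenization and raw counts with no weighting:

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Dot product over the words the two texts share
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)

print(cosine_similarity("tax cuts now", "tax cuts"))
print(cosine_similarity("tax cuts", "border security"))
```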

24
Q

What is text reuse?

A

Identifying identical or copied chunks of text across documents; used in plagiarism detection and in tracking information flow.
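
One simple way to detect reuse is to look for word sequences (n-grams) that appear in both documents. This is a sketch of that idea, not necessarily the method used in the lecture; the two texts and the choice of n = 5 are assumptions.

```python
def ngrams(text, n=5):
    """Set of all n-word sequences in a text (empty if the text is too short)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def shared_ngrams(text_a, text_b, n=5):
    """N-word sequences that occur in both texts."""
    return ngrams(text_a, n) & ngrams(text_b, n)

# Made-up texts: the second reuses part of the first
original = "the committee shall review all applications within thirty days of receipt"
later = "under the new bill the committee shall review all applications within sixty days"

overlap = shared_ngrams(original, later, n=5)
for gram in sorted(overlap):
    print(" ".join(gram))
```

A larger n makes matches rarer but more convincing as evidence of copying.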

25
Q

What determines the analysis method used in text analysis?

A

The goals of the research question (e.g., classification, topic discovery, comparison).

26
Q

What is the purpose of combining methods in quantitative text analysis?

A

To gain richer insights by using multiple techniques (e.g., querying, clustering, dictionary analysis, and frequency analysis) that each reveal different dimensions of the text data.

27
Q

What was Lawlor (2015)'s main research question?

A

To analyze how immigration is framed in Canadian and British news media from 1993 to 2013, and whether focusing events (e.g., 9/11, 2005 London bombings) changed that framing.

28
Q

Why is validation important in text analysis?

A

Validation ensures your methods and results are accurate, reliable, and meaningful, and that they reflect the concepts you aim to study.

29
Q

Types of validation techniques in text analysis?

A
  1. Content validity – do the categories reflect the concept?
  2. Inter-coder agreement – do humans agree on text labels?
  3. Comparison – does automated output match what humans would do?
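
Inter-coder agreement is often summarized with a chance-corrected statistic. The sketch below uses Cohen's kappa for two coders; this is an assumed choice, since the card does not name a measure, and the labels are made up.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two coders, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Share of items on which the two coders agree
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each coder's label frequencies
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / (n * n)
    if expected == 1:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1 - expected)

# Made-up labels from two human coders
coder_1 = ["pos", "pos", "neg", "neg", "pos", "neg"]
coder_2 = ["pos", "neg", "neg", "neg", "pos", "neg"]
kappa = cohens_kappa(coder_1, coder_2)
print(round(kappa, 3))  # 0.667
```

The same function can serve the comparison check: pass human labels as one list and the automated labels as the other.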