Lecture 11 Flashcards by Diya Ajwani

When does text become text-data?

Text becomes text-data when it is collected for use, analysis, or recording

Examples of text structure

*plain sentences
*list of words
*structured table (one row per speech)
*tags/labels

Heterogeneous Structure

The way we store text data depends on how we plan to analyze it

Qualitative Coding of Texts

It is when researchers define a set of categories (e.g., policy topic, sentiment) and assign documents or passages of text to those categories by hand
*also called “manual content analysis”

Quantitative Text Analysis

Translating text into numbers (e.g., word frequencies) so that it can be analyzed statistically

Examples of Quantitative Text Analysis

*Text classification methods
*Scaling methods
*Text similarity methods
*Text reuse methods

Why does text as data exist?

The volume and accessibility of text data have increased, making it easier to use text for quantitative analysis

What sort of questions can we answer?

*Descriptive
*Inferential

Descriptive Text Analysis

Describes the content of a text in a clear, factual way by summarizing or categorizing what is in it
*e.g., count how many times a politician says “security”, “border”, “asylum”
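
A minimal sketch of the keyword count described above, using only the Python standard library. The speech text is invented for illustration and is not from the lecture.

```python
import re
from collections import Counter

# Hypothetical speech text, invented for illustration only.
speech = "Border security matters. We will secure the border and reform asylum rules."

keywords = ["security", "border", "asylum"]

# Lowercase first so that "Border" and "border" are counted together,
# then pull out alphabetic word tokens.
tokens = re.findall(r"[a-z']+", speech.lower())
counts = Counter(tokens)

# Counter returns 0 for words that never appear, so missing keywords are safe.
keyword_counts = {word: counts[word] for word in keywords}
print(keyword_counts)  # {'security': 1, 'border': 2, 'asylum': 1}
```

Note that "secure" is not counted as "security"; whether related word forms should be merged is a pre-processing decision.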

Inferential Text Analysis

Uses text to draw broader, generalizable conclusions about political actors, society, or hidden attitudes
*e.g., use a speech to estimate a politician’s ideology

What is a corpus in text analysis?

A structured collection of texts in a machine-readable format used for analysis

5 steps to quantitative text analysis

1. decide on the research question/objective
2. acquire documents
3. create a corpus or dataset (machine-readable format)
4. pre-process the text
5. perform the analysis

What does a Document-Term Matrix (DTM) represent?

A table where rows = documents, columns = terms, and cells = word frequency in each document.
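
As a rough illustration, a DTM can be built by hand with the standard library. The two documents and their names (`speech_1`, `speech_2`) are made up for this sketch; real projects usually rely on a text-analysis package instead.

```python
from collections import Counter

# Hypothetical mini-corpus, invented for illustration only.
docs = {
    "speech_1": "tax cuts and jobs",
    "speech_2": "jobs jobs and health care",
}

# Count how often each term occurs in each document.
doc_counts = {name: Counter(text.split()) for name, text in docs.items()}

# Columns of the matrix: every distinct term in the corpus, sorted.
vocabulary = sorted({term for counts in doc_counts.values() for term in counts})

# Rows of the matrix: one list of term frequencies per document.
dtm = [[doc_counts[name][term] for term in vocabulary] for name in docs]

print(vocabulary)  # ['and', 'care', 'cuts', 'health', 'jobs', 'tax']
for name, row in zip(docs, dtm):
    print(name, row)
# speech_1 [1, 0, 1, 0, 1, 1]
# speech_2 [1, 1, 0, 1, 2, 0]
```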

Why do we pre-process text data?

To reduce noise (irrelevant or meaningless information) and prepare the text for analysis

What are common pre-processing steps in text analysis?

*remove stop words (and, the, in)
*standardize terms (e.g., write US and USA in the same format)
*lowercase words
*lemmatization (reduce a word to its base form), e.g., “running” to “run”

note: which steps to take depends on the context and the research objective
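
The steps above can be sketched as a small pipeline. This is only an illustration: the stop-word list, the `STANDARD_FORMS` mapping, and the `LEMMAS` lookup table are tiny hypothetical stand-ins, and a real lemmatizer would come from an NLP library rather than a hand-written dictionary.

```python
import re

# Hypothetical, deliberately tiny resources for illustration only.
STOP_WORDS = {"and", "the", "in"}
STANDARD_FORMS = {"usa": "us"}               # applied after lowercasing
LEMMAS = {"running": "run", "ran": "run"}    # toy stand-in for a real lemmatizer

def preprocess(text):
    """Lowercase, tokenize, standardize terms, drop stop words, lemmatize."""
    tokens = re.findall(r"[a-z]+", text.lower())
    tokens = [STANDARD_FORMS.get(t, t) for t in tokens]
    tokens = [t for t in tokens if t not in STOP_WORDS]
    tokens = [LEMMAS.get(t, t) for t in tokens]
    return tokens

print(preprocess("The candidate is running in the USA and the US"))
# ['candidate', 'is', 'run', 'us', 'us']
```

The order of the steps matters here: terms are standardized after lowercasing, so the mapping only needs lowercase keys.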

What is a “text data frame” under tidy data principles?

A table where each row represents a unit of observation such as a document, sentence, or paragraph
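
A minimal sketch of such a table with the sentence as the unit of observation, stored as a list of row dictionaries. The documents and the column names (`doc_id`, `sentence_id`, `text`) are assumptions made for this example; the sentence splitter is a naive regular expression, not a robust tokenizer.

```python
import re

# Hypothetical documents, invented for illustration only.
documents = {
    "speech_1": "We will cut taxes. We will create jobs.",
    "speech_2": "Health care comes first.",
}

rows = []
for doc_id, text in documents.items():
    # Naive split: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    for i, sentence in enumerate(sentences, start=1):
        # One row per sentence, with identifiers kept as separate columns.
        rows.append({"doc_id": doc_id, "sentence_id": i, "text": sentence})

for row in rows:
    print(row)
```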

Two common methods of quantitative text analysis

Dictionary Analysis
Topic Modelling

What is dictionary analysis in text classification?

An automated method that classifies text based on a predefined list of keywords (a dictionary). Dictionaries can be:
1. pre-existing
2. custom-made
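
A minimal sketch of dictionary-based classification with a custom-made dictionary. The categories and keywords in `DICTIONARY` are hypothetical; a real dictionary would be much larger and validated against the texts being studied.

```python
import re

# Hypothetical custom dictionary: category -> set of keywords.
DICTIONARY = {
    "economy": {"tax", "taxes", "jobs", "budget"},
    "immigration": {"border", "asylum", "visa"},
}

def classify(text):
    """Return (best_category, scores); best_category is None if no keyword matches."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Score each category by the number of tokens found in its keyword list.
    scores = {
        category: sum(token in words for token in tokens)
        for category, words in DICTIONARY.items()
    }
    best = max(scores, key=scores.get)
    return (best if scores[best] > 0 else None), scores

label, scores = classify("We will secure the border and reform asylum rules")
print(label, scores)  # immigration {'economy': 0, 'immigration': 2}
```

In this sketch a tie goes to whichever category is listed first in the dictionary; a real study would need an explicit rule for ties.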

What are the pros and cons of pre-existing dictionaries?

Pro: easy to use. Con: may lack validity in a different context (e.g., a dictionary built for movie reviews may not fit political speeches)

What’s the benefit of creating custom dictionaries?

Higher validity since it’s tailored to your specific texts, but more time-consuming.

What is topic modeling?

A method where the computer discovers topics in the text based on word patterns—no predefined categories.

In topic modeling, who assigns meaning to the topics?

The researcher, based on the keywords the computer groups together.

What is text similarity and how is it measured?

It measures how similar documents are, often using cosine similarity based on word usage
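
Cosine similarity treats each document as a vector of word counts and computes (A · B) / (|A| × |B|), giving 1 for identical word usage and 0 for no shared words. A minimal sketch, assuming simple whitespace tokenization and raw counts (no weighting such as TF-IDF):

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Dot product only needs the terms that appear in both texts.
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)

print(cosine_similarity("tax cuts", "tax reform"))    # about 0.5
print(cosine_similarity("tax cuts", "health care"))   # 0.0
```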

What is text reuse?

Identifying identical or copied chunks of text across documents. It is used in plagiarism detection and in tracking how information flows between sources.
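
One simple way to approximate this is to look for word sequences (n-grams) that two documents share. This is a sketch of that idea, not the specific method from the lecture; the example sentences and the choice of n = 3 are assumptions.

```python
def ngrams(text, n=3):
    """Set of all n-word sequences in a text (lowercased, whitespace-split)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def shared_ngrams(text_a, text_b, n=3):
    """N-grams that appear in both texts, a rough signal of reused passages."""
    return ngrams(text_a, n) & ngrams(text_b, n)

# Hypothetical sentences, invented for illustration only.
a = "the minister said the bill will pass soon"
b = "reports claim the bill will pass soon enough"

for gram in sorted(shared_ngrams(a, b)):
    print(" ".join(gram))
# bill will pass
# the bill will
# will pass soon
```

Longer n-grams make matches stricter, since a longer run of identical words is less likely to occur by chance.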

What determines the analysis method used in text analysis?

The goals of the research question (e.g., classification, topic discovery, comparison).

What is the purpose of combining methods in quantitative text analysis?

To gain richer insights by using multiple techniques (e.g., querying, clustering, dictionary analysis, and frequency analysis) that each reveal different dimensions of the text data.

What was Lawlor (2015)'s main research question?

To analyze how immigration is framed in Canadian and British news media from 1993 to 2013, and whether focusing events (e.g., 9/11, 2005 London bombings) changed that framing.

Why is validation important in text analysis?

Validation ensures your methods and results are accurate, reliable, and meaningful, and that they reflect the concepts you aim to study.

Types of validation techniques in text analysis?

1. Content validity – do the categories reflect the concept?
2. Inter-coder agreement – do humans agree on text labels?
3. Comparison – does automated output match what humans would do?
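
Inter-coder agreement can be quantified in several ways. The sketch below shows raw percent agreement and Cohen's kappa, which corrects for agreement expected by chance. The choice of kappa is an assumption here (the card does not name a specific statistic), and the labels are invented.

```python
from collections import Counter

def percent_agreement(labels_a, labels_b):
    """Share of items on which two coders chose the same label."""
    matches = sum(x == y for x, y in zip(labels_a, labels_b))
    return matches / len(labels_a)

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two coders, corrected for chance."""
    n = len(labels_a)
    observed = percent_agreement(labels_a, labels_b)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both coders pick the same label at random,
    # given how often each coder used each label.
    expected = sum(
        counts_a[label] * counts_b[label]
        for label in counts_a.keys() | counts_b.keys()
    ) / (n * n)
    if expected == 1:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical labels from two human coders for four documents.
coder_1 = ["pos", "pos", "neg", "neg"]
coder_2 = ["pos", "neg", "neg", "neg"]

print(percent_agreement(coder_1, coder_2))  # 0.75
print(cohens_kappa(coder_1, coder_2))       # 0.5
```

The same two functions can also serve the third check, by comparing automated labels against human labels instead of comparing two humans.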

Text as data (29 cards)