Corpus Linguistics Flashcards
(21 cards)
Generalizability
- The extent to which research results can be applied to a larger population than the one studied.
Descriptive Analysis
- A method that summarizes data with measures such as averages, ranges, and frequencies.
Statistical Analysis
- The process of analyzing data with mathematical methods to find patterns and test hypotheses.
Dependent Variables
- The outcome measured in a study (e.g. test scores).
Independent Variables
- The factor expected to influence the outcome (e.g. study time).
Control Variables
- Factors kept constant so that they do not distort the effect of the independent variable on the dependent variable.
Intervening Variables
- Hidden factors that come between the independent and dependent variables and may affect the outcome.
Nominal Scale
- A scale used for categories that have no numerical value or order (e.g. gender).
Numeric Scale
- A scale whose values are numbers (e.g. weight).
Interval Scale
- A numerical scale with equal spacing but no true zero (e.g. temperature in degrees Celsius).
Boxplot
- A graph that shows the spread of data, including the median, the quartiles, and outliers.
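A boxplot is drawn from five numbers: minimum, first quartile, median, third quartile, and maximum. The sketch below computes them with Python's standard `statistics.quantiles` and flags outliers with the common 1.5 × IQR rule; that rule, the helper names `five_number_summary` and `tukey_outliers`, and the sentence-length data are illustrative assumptions, and plotting tools may use other quartile methods.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the numbers a boxplot draws."""
    # quantiles(n=4) returns the three quartile cut points ('exclusive' method by default)
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)

def tukey_outliers(values):
    """Flag values more than 1.5 * IQR below Q1 or above Q3 (a common convention)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

sentence_lengths = [12, 15, 9, 14, 11, 13, 10, 41]  # hypothetical words per sentence
print(five_number_summary(sentence_lengths))  # (9, 10.25, 12.5, 14.75, 41)
print(tukey_outliers(sentence_lengths))       # [41]
```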
Mean (Average)
- The sum of all values divided by the number of values.
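The definition translates directly into code. This minimal sketch uses a hypothetical helper `mean` and made-up word counts, and checks the result against the standard library's `statistics.mean`.

```python
import statistics

def mean(values):
    """Arithmetic mean: the sum of all values divided by the number of values."""
    if not values:
        raise ValueError("the mean of no values is undefined")
    return sum(values) / len(values)

word_counts = [120, 95, 143, 102]  # hypothetical number of words in four texts
print(mean(word_counts))             # 115.0
print(statistics.mean(word_counts))  # the standard library gives the same value
```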
Variance
- A measure of how much data points differ from the mean: the average of their squared differences from the mean.
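A short sketch of the calculation, assuming a hypothetical helper `variance` and toy scores. It shows both conventions: dividing by n gives the population variance, while dividing by n - 1 gives the sample variance that Python's `statistics.variance` reports.

```python
import statistics

def variance(values, sample=True):
    """Average squared distance from the mean.

    sample=True divides by n - 1 (sample variance, like statistics.variance);
    sample=False divides by n (population variance, like statistics.pvariance).
    """
    n = len(values)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough values")
    m = sum(values) / n
    squared_deviations = [(x - m) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1 if sample else n)

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # toy data with mean 5
print(variance(scores, sample=False))  # 4.0
print(variance(scores))                # about 4.571 (32 / 7)
print(statistics.variance(scores))     # matches the sample version
```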
Standard Deviation
- The square root of the variance; it shows how spread out the data are around the mean, in the same units as the data.
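Because the standard deviation is the square root of the variance, it can be computed in one extra step. This sketch uses a hypothetical helper `standard_deviation` and the same toy scores as a worked example, with the same population/sample switch.

```python
import math

def standard_deviation(values, sample=True):
    """Square root of the variance, so the result is in the data's own units."""
    n = len(values)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough values")
    m = sum(values) / n
    sum_of_squares = sum((x - m) ** 2 for x in values)
    return math.sqrt(sum_of_squares / (n - 1 if sample else n))

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # toy data: population variance is 4.0
print(standard_deviation(scores, sample=False))  # 2.0
print(standard_deviation(scores))                # about 2.138 (sample version)
```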
KWIC (Key Word in Context)
- A concordance display that lists every occurrence of a search word together with the words around it, so you can see how it is used in different contexts.
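A concordance line is just the search word plus a few words on each side. The sketch below is a toy version, not how any particular corpus tool works: the helper `kwic`, the three-word context `width`, and the lowercasing regex tokenizer (which drops punctuation and lets context run across sentence breaks) are all assumptions.

```python
import re

def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for each hit of keyword.

    width is the number of words shown on each side. The text is lowercased and
    split with a simple regex, so punctuation and sentence breaks are dropped.
    """
    tokens = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits

text = "We make a decision. They make progress. She will make a cake today."
for left, word, right in kwic(text, "make"):
    # right-align the left context so the keyword lines up in one column
    print(f"{left:>20} [{word}] {right}")
```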
Collocates
- Words that occur near a given word more often than chance would predict (e.g. “decision” with “make” in “make a decision”).
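The first step in finding collocates is counting which words appear within a few positions of the node word. This sketch counts raw co-occurrence only; the helper `collocates`, the two-word window, and the regex tokenizer are assumptions, and real corpus tools usually rank candidates with an association measure such as mutual information or log-likelihood rather than raw frequency.

```python
import re
from collections import Counter

def collocates(text, node, window=2):
    """Count words occurring within `window` words to the left or right of `node`.

    Raw co-occurrence frequency only; no statistical association measure is applied.
    """
    tokens = re.findall(r"\w+", text.lower())
    target = node.lower()
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            span = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            counts.update(span)
    return counts

text = "They make a decision. We make a plan. You make a decision quickly."
print(collocates(text, "make").most_common(3))
# [('a', 3), ('decision', 3), ('plan', 2)]
```

Note that function words such as “a” score highly on raw counts, which is one reason association measures are preferred in practice.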
N-grams
- Sequences of n consecutive words: a 1-gram is one word, a 2-gram is two words, and so on.
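N-grams can be produced by sliding a window of size n along a list of tokens. A minimal sketch, with `ngrams` as a hypothetical helper and a pre-tokenized toy sentence:

```python
def ngrams(tokens, n):
    """Return every sequence of n consecutive tokens as a tuple."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # a text with k tokens has k - n + 1 n-grams (none if n > k)
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = ["make", "a", "decision", "now"]
print(ngrams(tokens, 1))  # 1-grams: [('make',), ('a',), ('decision',), ('now',)]
print(ngrams(tokens, 2))  # 2-grams: [('make', 'a'), ('a', 'decision'), ('decision', 'now')]
print(ngrams(tokens, 3))  # 3-grams: [('make', 'a', 'decision'), ('a', 'decision', 'now')]
```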
POS-tags (Parts of Speech tags)
- Labels assigned to words according to their grammatical category (e.g. noun, verb).
Token
- A single occurrence of a word in a text or corpus; every repetition of a word counts as a new token.
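Counting tokens requires a rule for splitting text into words. The sketch below uses a deliberately simple, assumed rule (lowercase, keep runs of letters, digits, and apostrophes); `tokenize` is a hypothetical helper, and real corpus tools treat punctuation, contractions, and hyphens differently, so their token counts can differ.

```python
import re

def tokenize(text):
    """Split text into lowercased word tokens made of letters, digits, or apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())

tokens = tokenize("The cat sat on the mat. The end.")
print(tokens)       # ['the', 'cat', 'sat', 'on', 'the', 'mat', 'the', 'end']
print(len(tokens))  # 8 tokens: each 'the' counts separately
```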
Type
- A distinct word form in a text or corpus; repeated tokens of the same word count as one type.
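Types are the distinct tokens, so a set collects them. This sketch assumes the same simple regex tokenizer and a hypothetical helper `types`; lowercasing is a choice here, so “The” and “the” count as one type.

```python
import re

def types(text):
    """Return the distinct word forms: repeated tokens collapse into one type."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())  # simple tokenizer (assumption)
    return set(tokens)

found = types("The cat sat on the mat. The end.")
print(sorted(found))  # ['cat', 'end', 'mat', 'on', 'sat', 'the']
print(len(found))     # 6 types from 8 tokens
```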
TTR (Type-Token Ratio)
- The number of types divided by the number of tokens; a measure of word variety in a text (higher TTR = more distinct words relative to text length).
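The ratio combines the two counts above. A minimal sketch, assuming the same simple regex tokenizer and a hypothetical helper `type_token_ratio`:

```python
import re

def type_token_ratio(text):
    """Number of distinct word forms (types) divided by number of words (tokens)."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())  # simple tokenizer (assumption)
    if not tokens:
        raise ValueError("text contains no tokens")
    return len(set(tokens)) / len(tokens)

print(type_token_ratio("The cat sat on the mat. The end."))  # 6 / 8 = 0.75
```

Because frequent words keep repeating as a text grows, TTR tends to fall with text length, so it is most meaningful when comparing texts of similar size.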