10 Evaluation Flashcards
Why Evaluate?
Well-designed products sell
To ensure that the system matches the users' needs
To discover unforeseen problems
To compare your solution against competitors ("We are x % better than…")
Where to Evaluate?
Naturalistic Approach: Field Studies
Controlled Approach: Usability Lab
When to Evaluate, and Who Evaluates?
Evaluation should happen throughout the entire software development process
Early designs: evaluated by the design team, analytically and informally
Later implementations: evaluated by users, experimentally and formally
Evaluation Methods
- Determine the Goals
- Explore the Questions
- Choose the Approach and Methods
- Evaluate, Interpret & Present Data
Important aspects when creating an evaluation process?
Reliability: can the study be replicated?
Validity: is it measuring what you expected?
Biases: is the process creating biases?
Scope: can the findings be generalized?
Ethics: are ethics standards met?
External vs Internal Validity
External validity
-> confidence that results apply to real situations
-> usually good in natural settings
Internal validity
-> confidence in our explanation of experimental results
-> usually good in experimental settings
Ethics Approval
Researchers must respect the safety, welfare, and dignity of human participants in their research and treat them equally and fairly
Criteria for approval:
- research methodology
- risks and benefits
- the right not to participate, to terminate participation, etc.
- the right to anonymity and confidentiality
Ethics - Before the test (5 things)
Only use volunteers
Inform the user
Maintain privacy
Make users feel comfortable
Don’t waste the user’s time
Ethics - During the test (4 things)
Maintain privacy
Make users feel comfortable
Don’t waste the user’s time
Ensure participant health and safety
Ethics - After the test (3 things)
Inform the user
Maintain privacy
Make users feel comfortable
Usability Testing
Focus on: how well users perform tasks with the product (time to complete task and number & type of errors)
-> Controlled environmental settings
Signal & Noise Metaphor
Experiment design seeks to enhance the signal (the variable of interest),
while minimizing the noise (everything else, i.e., random influences)
Controlled Experiment: Steps
- Determine the goals, explore the questions, then formulate hypothesis
- Design experiment, define experimental variables
- Choose subjects
- Run pilot experiment
- Iteratively improve experiment design
- Run experiment
- Interpret results to accept or reject hypothesis
Experimental Variables
- Independent Variables
- Dependent Variables
- Control Variables
- Random Variables
- Confounding Variables
Independent Variable - Definition & Examples
An independent variable is under your control
Independent because it is independent of participant behavior
Interface, device, button layout, visual layout, feedback mode, age, gender, background noise, expertise, etc.
Must have at least two levels (values/settings) -> test conditions
Dependent Variable - Definition & Examples
A measured human behavior: it depends on what the participant does
Is measured during the experiment
Task completion time, speed, accuracy, error rate, throughput, target re-entries, task retries, presses of backspace, etc.
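As a minimal sketch of how one such dependent variable could be logged, assuming task completion time is recorded in code (the task itself is only a placeholder):

```python
# Minimal sketch: logging task completion time, one common dependent
# variable. The task body below is a placeholder, not a real task.
import time

start = time.perf_counter()
# ... participant performs the task here ...
elapsed = time.perf_counter() - start

print(f"Task completion time: {elapsed:.2f} s")
```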
Control Variable - Definition & Examples
a circumstance that is kept constant (e.g., the same apparatus and instructions for all participants)
more control -> less variability, but less generalizable results
Random Variable - Definition & Examples
a circumstance that is allowed to vary randomly (e.g., individual differences between participants) -> more variability (bad), but more generalizable results
Confounding Variable - Definition & Examples
a circumstance that varies systematically with an independent variable (e.g., if every participant tests condition A before condition B, practice effects are confounded with the conditions)
Experiment Task - Good Task Qualities:
Represent activities people do with the interface
Discriminate among the test conditions
Hypothesis vs Claim
A claim predicts the outcome of an experiment
Example: Reading a text in upper case takes longer than reading it in sentence case
A hypothesis claims that changing independent variables influences dependent variables
Example: Changing the case (independent variable) influences reading time (dependent variable)
-> Experiment goal: Confirm hypothesis
-> Statistical approach: Reject null hypothesis
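A minimal sketch of the statistical approach for the example above, assuming two independent groups of participants and invented reading times; scipy's independent-samples t-test stands in for whatever test fits the actual design:

```python
# Minimal sketch: testing the null hypothesis that case style does not
# affect reading time. All data below are invented for illustration.
from scipy import stats

# Reading times in seconds (hypothetical measurements)
upper_case    = [12.1, 13.4, 11.8, 14.0, 12.9, 13.2]
sentence_case = [10.2, 11.1, 10.8, 11.5, 10.4, 11.0]

# Independent-samples t-test: H0 = "the mean reading times are equal"
t_stat, p_value = stats.ttest_ind(upper_case, sentence_case)

# Reject H0 at the conventional 5% significance level
if p_value < 0.05:
    print(f"Reject H0 (p = {p_value:.4f}): case influences reading time")
else:
    print(f"Cannot reject H0 (p = {p_value:.4f})")
```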
Statistical Tests - 2 Types
Parametric
-> Data are assumed to come from a distribution, such as the normal distribution, t-distribution, etc.
Non-parametric
-> Data are not assumed to come from a distribution
Statistical Tests - Which test for nominal and ordinal data (gender, age groups, …)
Non-parametric tests (e.g., Chi-square test)
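A hedged sketch of a Chi-square test on nominal data, using an invented contingency table (interface preference by age group):

```python
# Minimal sketch of a Chi-square test on nominal data (invented counts):
# does preferred interface (A vs. B) depend on age group?
from scipy.stats import chi2_contingency

# Contingency table: rows = age groups, columns = preferred interface
observed = [[30, 10],   # younger participants: prefer A, prefer B
            [15, 25]]   # older participants:   prefer A, prefer B

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")  # small p -> preference depends on age group
```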
Statistical Tests - Which test for interval and ratio data (temperature in °C or K, …)
Parametric tests (e.g., t-test, ANOVA), or Non-parametric tests
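A hedged sketch of a one-way ANOVA on ratio data, comparing invented task times across three hypothetical input devices:

```python
# Minimal sketch of a one-way ANOVA on ratio data: invented task times
# in seconds for three hypothetical input devices.
from scipy.stats import f_oneway

mouse    = [8.2, 7.9, 8.5, 8.1, 7.7]
touchpad = [9.4, 9.8, 9.1, 9.6, 9.3]
stylus   = [8.0, 8.3, 7.8, 8.4, 8.1]

f_stat, p_value = f_oneway(mouse, touchpad, stylus)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")  # small p -> device affects task time
```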