10 Evaluation Flashcards
Why Evaluate?
Well-designed products sell
To ensure that the system matches the users' needs
To discover unforeseen problems
To compare your solution against competitors ("We are x % better than…")
Where to Evaluate?
Naturalistic Approach: Field Studies
Controlled Approach: Usability Lab
When to Evaluate, and Who Evaluates?
Evaluation should happen throughout the entire software development process
Early designs: evaluated by the design team, analytically and informally
Later implementations: evaluated by users, experimentally and formally
Evaluation Methods
- Determine the Goals
- Explore the Questions
- Choose the Approach and Methods
- Evaluate, Interpret & Present Data
Important aspects when creating an evaluation process?
Reliability: can the study be replicated?
Validity: is it measuring what you expected?
Biases: is the process creating biases?
Scope: can the findings be generalized?
Ethics: are ethics standards met?
External vs Internal Validity
External validity
-> confidence that results apply to real situations
-> usually good in natural settings
Internal validity
-> confidence in our explanation of experimental results
-> usually good in experimental settings
Ethics Approval
Researchers must respect the safety, welfare, and dignity of human participants in their research and treat them equally and fairly
Criteria for approval:
- research methodology
- risks and benefits
- the right not to participate, to terminate participation, etc.
- the right to anonymity and confidentiality
Ethics - Before the test (5 things)
Only use volunteers
Inform the user
Maintain privacy
Make users feel comfortable
Don’t waste the user’s time
Ethics - During the test (4 things)
Maintain privacy
Make users feel comfortable
Don’t waste the user’s time
Ensure participant health and safety
Ethics - After the test (3 things)
Inform the user
Maintain privacy
Make users feel comfortable
Usability Testing
Focus on: how well users perform tasks with the product (time to complete task and number & type of errors)
-> Controlled environmental settings
Signal & Noise Metaphor
Experiment design seeks to enhance the signal (the variable of interest),
while minimizing the noise (everything else, i.e., random influences)
Controlled Experiment: Steps
- Determine the goals, explore the questions, then formulate hypothesis
- Design experiment, define experimental variables
- Choose subjects
- Run pilot experiment
- Iteratively improve experiment design
- Run experiment
- Interpret results to accept or reject hypothesis
Experimental Variables
- Independent Variables
- Dependent Variables
- Control Variables
- Random Variables
- Confounding Variables
Independent Variable - Definition & Examples
An independent variable is under your control
Independent because it is independent of participant behavior
Interface, device, button layout, visual layout, feedback mode, age, gender, background noise, expertise, etc.
Must have at least two levels (values/settings) -> test conditions
Dependent Variable - Definition & Examples
A measured human behavior: it depends on what the participant does
Is measured during the experiment
Task completion time, speed, accuracy, error rate, throughput, target re-entries, task retries, presses of backspace, etc.
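As a minimal sketch of how one such dependent variable could be logged, assuming task completion time is recorded in code (the task itself is only a placeholder):

```python
# Minimal sketch: logging task completion time, one common dependent
# variable. The task body below is a placeholder, not a real task.
import time

start = time.perf_counter()
# ... participant performs the task here ...
elapsed = time.perf_counter() - start

print(f"Task completion time: {elapsed:.2f} s")
```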
Control Variable - Definition & Examples
a circumstance that is kept constant (e.g., the same apparatus and instructions for all participants)
more control -> less variability, but less generalizable results
Random Variable - Definition & Examples
a circumstance that is allowed to vary randomly (e.g., individual differences between participants) -> more variability (bad), but more generalizable results
Confounding Variable - Definition & Examples
a circumstance that varies systematically with an independent variable (e.g., if every participant tests condition A before condition B, practice effects are confounded with the conditions)
Experiment Task - Good Task Qualities:
Represent activities people do with the interface
Discriminate among the test conditions
Hypothesis vs Claim
A claim predicts the outcome of an experiment
Example: Reading a text in upper case takes longer than reading it in sentence case
A hypothesis claims that changing independent variables influences dependent variables
Example: Changing the case (independent variable) influences reading time (dependent variable)
-> Experiment goal: Confirm hypothesis
-> Statistical approach: Reject null hypothesis
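A minimal sketch of the statistical approach for the example above, assuming two independent groups of participants and invented reading times; scipy's independent-samples t-test stands in for whatever test fits the actual design:

```python
# Minimal sketch: testing the null hypothesis that case style does not
# affect reading time. All data below are invented for illustration.
from scipy import stats

# Reading times in seconds (hypothetical measurements)
upper_case    = [12.1, 13.4, 11.8, 14.0, 12.9, 13.2]
sentence_case = [10.2, 11.1, 10.8, 11.5, 10.4, 11.0]

# Independent-samples t-test: H0 = "the mean reading times are equal"
t_stat, p_value = stats.ttest_ind(upper_case, sentence_case)

# Reject H0 at the conventional 5% significance level
if p_value < 0.05:
    print(f"Reject H0 (p = {p_value:.4f}): case influences reading time")
else:
    print(f"Cannot reject H0 (p = {p_value:.4f})")
```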
Statistical Tests - 2 Types
Parametric
-> Data are assumed to come from a distribution, such as the normal distribution, t-distribution, etc.
Non-parametric
-> Data are not assumed to come from a distribution
Statistical Tests - Which test for nominal and ordinal data (gender, age groups, …)
Non-parametric tests (e.g., Chi-square test)
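A hedged sketch of a Chi-square test on nominal data, using an invented contingency table (interface preference by age group):

```python
# Minimal sketch of a Chi-square test on nominal data (invented counts):
# does preferred interface (A vs. B) depend on age group?
from scipy.stats import chi2_contingency

# Contingency table: rows = age groups, columns = preferred interface
observed = [[30, 10],   # younger participants: prefer A, prefer B
            [15, 25]]   # older participants:   prefer A, prefer B

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")  # small p -> preference depends on age group
```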
Statistical Tests - Which test for interval and ratio data (temperature in °C or K, …)
Parametric tests (e.g., t-test, ANOVA), or Non-parametric tests
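A hedged sketch of a one-way ANOVA on ratio data, comparing invented task times across three hypothetical input devices:

```python
# Minimal sketch of a one-way ANOVA on ratio data: invented task times
# in seconds for three hypothetical input devices.
from scipy.stats import f_oneway

mouse    = [8.2, 7.9, 8.5, 8.1, 7.7]
touchpad = [9.4, 9.8, 9.1, 9.6, 9.3]
stylus   = [8.0, 8.3, 7.8, 8.4, 8.1]

f_stat, p_value = f_oneway(mouse, touchpad, stylus)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")  # small p -> device affects task time
```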