L7 - Evaluating interactive systems Flashcards

1
Q

What is the difference between formative and summative evaluation?

A

Formative evaluation is used in the early stages of a project to compare, assess and refine design ideas.
Formative evaluation often involves OPEN research questions, where the researcher is interested in learning further information that may inform the design
Summative evaluation is more likely to be used in the later stages of a project and involves CLOSED research questions, with the purpose of testing and evaluating systems according to predefined criteria

2
Q

What’s the difference between analytical and empirical evaluation methods?

A

Analytical: based on applying a theory to analysis and discussion of the design, in the absence of real-world users
Empirical: based on making observations and measurements of users

3
Q

What’s the difference between quantitative and qualitative evaluation?

A

Quantitative evaluation deals in numbers; qualitative evaluation deals in words, pictures, audio or video

4
Q

Why are analytical methods useful for formative evaluation?

A

Analytical methods are useful for formative evaluation, because if the system design has not yet been completed, it may be difficult to observe how it is used (although low fidelity prototypes can be helpful here)

5
Q

Give some examples of qualitative evaluation methods

A

Qualitative analytic methods include Cognitive Walkthrough (useful for closed research questions) and the Cognitive Dimensions of Notations framework (useful for open research questions).

6
Q

Give examples of quantitative analytic methods

A

The Keystroke Level Model (KLM) is a quantitative analytic method, which can be used to make numerical comparisons between designs in answer to closed research questions (see the sketch below).
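As an illustration, here is a minimal Python sketch of a KLM comparison between two hypothetical designs for the same task. The operator times are the commonly quoted approximate values from the KLM literature, and the operator sequences are invented for illustration only.

# Minimal KLM sketch: compare two hypothetical UI designs for the same task
# by summing standard operator times (in seconds). The values below are the
# commonly quoted approximate KLM estimates; treat them as rough assumptions.

OPERATORS = {
    "K": 0.2,   # press a key (skilled typist)
    "P": 1.1,   # point at a target with the mouse
    "B": 0.1,   # press or release a mouse button
    "H": 0.4,   # move hands between keyboard and mouse
    "M": 1.35,  # mental preparation before an action
}

def klm_time(sequence):
    """Predicted expert execution time for a string of KLM operators."""
    return sum(OPERATORS[op] for op in sequence)

# Hypothetical task encodings: delete a file via a menu, or via a shortcut key.
menu_design = "MHPBPB"
shortcut_design = "MKK"

print(f"Menu design:     {klm_time(menu_design):.2f} s")
print(f"Shortcut design: {klm_time(shortcut_design):.2f} s")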

7
Q

Give examples of qualitative empirical methods

A

Think-aloud protocols, interviews, and field observation (ethnographic approaches)
These are usually associated with open research questions, where the objective is to learn new information relevant to system design or use

8
Q

Give examples of quantitative empirical methods

A

Quantitative empirical methods generally require a working system, so they are most often summative
Examples include the use of analytics and metrics in A/B experiments, and also controlled laboratory trials (see the sketch below for a typical A/B significance check)
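A minimal sketch, with invented counts, of how a hypothetical A/B experiment's conversion data might be checked for significance using a standard chi-squared test on a 2x2 table:

# Hypothetical A/B data: did variant B change the conversion rate vs variant A?
# Counts are invented for illustration; the test is a standard chi-squared
# test of independence on a 2x2 contingency table.
from scipy.stats import chi2_contingency

# rows: variant A, variant B; columns: converted, did not convert
table = [[120, 880],   # A: 12.0% conversion
         [150, 850]]   # B: 15.0% conversion

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p_value:.3f}")
# A small p suggests the difference is unlikely to be pure random variation,
# but it says nothing about WHY user behaviour changed.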

9
Q

Explain how to run RCTs

A

Decide on a performance measure
Find a representative sample of the target population (who have given informed consent to participate)
Find an experimental task that can be used to collect performance data
Randomly allocate participants to the control and treatment conditions, then compare their performance (see the sketch below)
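A minimal Python sketch of the random allocation step; the participant IDs and group sizes are hypothetical:

# Randomly allocate a hypothetical participant pool to two conditions.
# Shuffling gives every participant an equal chance of either condition,
# which is what makes the trial "randomised".
import random

participants = [f"P{i:02d}" for i in range(1, 21)]   # 20 consenting participants
random.shuffle(participants)

half = len(participants) // 2
control = participants[:half]      # e.g. existing design
treatment = participants[half:]    # e.g. new design

print("Control:  ", control)
print("Treatment:", treatment)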

10
Q

How might we measure the results of an RCT?

A

Effect size: the impact of the treatment on mean performance
Measure correlation with factors that might improve performance
Report significance measures to check whether the observed effects might have resulted from random variation or from factors other than the treatment (see the sketch below)
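A minimal sketch of this analysis on invented task-completion times, reporting an effect size (here Cohen's d) alongside a significance test:

# Hypothetical RCT results: task-completion times in seconds for two groups.
import numpy as np
from scipy import stats

control = np.array([41.2, 38.5, 44.0, 39.8, 42.1, 40.3, 43.7, 37.9])
treatment = np.array([35.1, 36.4, 33.9, 37.2, 34.8, 36.0, 32.7, 35.5])

# Effect size: difference in means scaled by the pooled standard deviation.
pooled_sd = np.sqrt((control.var(ddof=1) + treatment.var(ddof=1)) / 2)
cohens_d = (control.mean() - treatment.mean()) / pooled_sd

# Significance: could the observed difference plausibly be random variation?
t_stat, p_value = stats.ttest_ind(control, treatment)

print(f"Cohen's d = {cohens_d:.2f}, t = {t_stat:.2f}, p = {p_value:.4f}")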

11
Q

What problems are associated with RCTs?

A

Overcoming natural variation needs large samples
RCTs don’t provide understanding of why a change occurred
This means that it is hard to know whether the effect will generalise (for example to commercial contexts)
If there are many relevant variables that are orthogonal to each other, such as different product features or design options, many separate experiments might therefore be required to distinguish between their effects and interactions
Thus RCTs aren’t often used for design research in commercial products
In commercial settings a more justifiable performance measure is profit maximisation, but sales and profit are often hard to measure with useful latency
Companies therefore tend to use PROXY MEASURES, such as the number of days that customers continue actively to use the product

12
Q

What is internal validity?

A

Was the study done right?
Reproducibility
Scientific integrity
Refutability

13
Q

What is external validity?

A

Does the study tell us useful things?
Focussing on whether the results can be generalised to real-world situations, including factors such as the representativeness of the sample population, the experimental task and the application context

14
Q

Describe two ways of analysing qualitative data

A

While statistical comparison can be applied to quantitative measures from controlled experiments, interviews and field studies require analysis of qualitative data
Qualitative data is often recorded and transcribed as written text, so the analysis can proceed using a reproducible scientific method
Two such methods are categorical coding (for closed questions) and grounded theory (for open questions), described in the next two cards

15
Q

What is categorical coding, and how do you do it?

A

Categorical coding is a qualitative data analysis method that can be used to answer ‘closed’ questions, for example, comparing different groups of people or users of different products
The first step is to create a “coding frame” of expected categories of interest
The text data is then segmented (for example on phrase boundaries)
Each segment is assigned to one category, so that frequency and correspondence can be compared
In a scientific context, categorical coding should incorporate some assessment of inter-rater reliability, where two or more people make the coding decisions independently to avoid systematic bias or misinterpretation
Agreement relative to chance is assessed with a statistical measure such as Cohen’s kappa for two raters, or Fleiss’ kappa for more, and compared to typical benchmark levels (0.6-0.8 is considered substantial agreement); see the sketch below
Reporting inter-rater reliability may also take account of how many decisions still disagreed after discussion, which may involve refining and iterating the coding frame to resolve decision criteria
It is often useful to ‘prototype’ the coding frame by having the independent raters discuss a sample before proceeding to code the main corpus
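A minimal sketch of Cohen's kappa for two raters, using an invented coding frame ("praise", "confusion", "suggestion") and invented segment codes:

# Cohen's kappa for two independent raters coding the same text segments.
from collections import Counter

rater_a = ["praise", "confusion", "confusion", "suggestion", "praise",
           "confusion", "suggestion", "praise", "praise", "confusion"]
rater_b = ["praise", "confusion", "suggestion", "suggestion", "praise",
           "confusion", "confusion", "praise", "praise", "confusion"]

n = len(rater_a)
observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Chance agreement: probability both raters pick the same category at random,
# estimated from each rater's own category proportions.
count_a, count_b = Counter(rater_a), Counter(rater_b)
expected = sum((count_a[c] / n) * (count_b[c] / n)
               for c in set(rater_a) | set(rater_b))

kappa = (observed - expected) / (1 - expected)
print(f"Observed = {observed:.2f}, chance = {expected:.2f}, kappa = {kappa:.2f}")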

16
Q

What is grounded theory?

A

Qualitative data analysis method that can be used to explore open questions where there is no prior expectation or theoretical assumption of the insights that the researcher is looking for
First step: read the data closely, looking for interesting categories (‘open coding’)
The researcher then collects fragments, writing ‘memos’ to capture insights as they occur
Emerging themes are organised using ‘axial coding’ across different sources of evidence
It is important to constantly compare memos, themes and findings to the original data, in order to ensure that these can be objectively justified
The process ends when the theoretical description has reached ‘saturation’ in relation to the original data, with the main themes complete and accounted for

17
Q

Explain how to get ethical clearance

A

Inform the ethics committee before you collect any data or recruit any participants
Describe the study: who will participate, what you will ask them to do, and what data you will collect
Explain what precautions are being taken, as appropriate to the nature of the research, including the approach taken to informed consent and whether participants will be anonymous

18
Q

What are three analytical evaluation options?

A

Cognitive walkthrough
KLM/GOMS
Cognitive Dimensions

19
Q

When would you use cognitive walkthrough?

A

Cognitive Walkthrough is normally used in formative contexts: if you do have a working system, why aren’t you observing a real user, which is far more informative than simulating or imagining one? However, Cognitive Walkthrough can be a valuable time-saving precaution before user studies start, to fix blatant usability bugs.

20
Q

When would you use KLM/GOMS?

A

KLM/GOMS: it is unlikely that you’ll have alternative detailed UI designs in advance, so there is not much to be learned from using these methods in the context of a Part II project. If you do have a working system, a controlled observation is superior

21
Q

When would you use Cognitive Dimensions?

A

Cognitive Dimensions is better suited to less structured tasks than Cognitive Walkthrough and KLM/GOMS, which rely on a predefined user goal and task structure

22
Q

What empirical approaches could you choose from?

A

Interviews/ethnography
Think-aloud / Wizard of Oz
Controlled experiments

23
Q

When would you collect data using interviews/ethnography?

A

Useful in the formative/preparation phase, where an open research method is helpful in developing design ideas or capturing user requirements

24
Q

When would you use think-aloud/wizard of oz?

A

Valuable for both paper prototypes and working systems
Highly effective at uncovering usability bugs as long as the verbal protocol is analysed rigorously using qualitative methods

25
Q

When would you use controlled experiments?

A

Can help to establish the engineering aspects of the work
Important to ensure you can measure the important attributes in a meaningful way (with both internal and external validity)
Need to test significance and report confidence intervals for the observed means and effect sizes (see the sketch below)
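A minimal sketch of reporting a 95% confidence interval for an observed mean, using invented task times:

# Hypothetical task times (seconds) measured in a controlled experiment.
import numpy as np
from scipy import stats

times = np.array([12.3, 14.1, 11.8, 13.5, 12.9, 15.2, 13.0, 12.4])

mean = times.mean()
sem = stats.sem(times)   # standard error of the mean
# 95% confidence interval based on the t-distribution (n - 1 degrees of freedom)
ci_low, ci_high = stats.t.interval(0.95, len(times) - 1, loc=mean, scale=sem)

print(f"Mean = {mean:.1f} s, 95% CI = [{ci_low:.1f}, {ci_high:.1f}] s")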

26
Q

When would you use surveys and informal questionnaires?

A

Be clear what you are measuring
Is self-reporting likely to be accurate?
Use a mix of open questions, which capture richer qualitative information, and closed questions that make it easier to aggregate and test hypotheses
Open questions require a coding frame to structure and compare data, or grounded theory methods (if you have a broader research question)
Collecting survey data via interviews is likely to give more insight but questionnaires are faster so that you can collect data from a larger sample
Remember to test questionnaires with a pilot study, as it is easier to get questionnaires wrong than interviews

27
Q

When would you use field testing?

A

If a working product exists it may be possible to make a controlled release and collect data on how it is used
Make a risk assessment
Seek ethics approval before proceeding

28
Q

When would you use standardised survey instruments?

A

These are standard psychometric instruments to evaluate mental states such as fatigue, stress, confusion and emotional state
There are also standard methods to assess individual differences (e.g. personality, intelligence)
Use standardised approaches wherever possible, so your results can be compared to the existing scientific literature
Making changes to these standardised surveys generally invalidates the results (the scoring sketch below illustrates how a standard scoring rule depends on the exact items)
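As one concrete example of a standardised instrument used in HCI practice, here is a minimal sketch of the standard scoring rule for the System Usability Scale (SUS); the response values are invented:

# Score one hypothetical response to the System Usability Scale (SUS).
# Standard rule: responses are 1-5; odd items contribute (response - 1),
# even items contribute (5 - response); the total is multiplied by 2.5
# to give a score from 0 to 100.

responses = [4, 2, 5, 1, 4, 2, 5, 2, 4, 1]   # items 1..10 for one participant

score = 0
for item, value in enumerate(responses, start=1):
    score += (value - 1) if item % 2 == 1 else (5 - value)

sus_score = score * 2.5
print(f"SUS score: {sus_score}")   # altering the items would invalidate this scale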

29
Q

What are some bad evaluation techniques?

A

Don’t use purely affective reports
Don’t ask a biased group (e.g. your friends); this introduces experimental demand
Don’t make claims that sound as though they result from a formative analytic process but are actually subjective
Don’t use introspective reports made by a single subject – might be biased and subjective

30
Q

How would you evaluate a non-HCI project?

A

Approach testing as a scientific exercise
Define goals and hypotheses and understand the boundaries and performance limits of your system by exploring them
Keep in mind that it’s often necessary to test to point of failure so that you can make comparisons or explain limits
For non-interactive projects, it is still necessary to decide whether evaluation should be analytic or empirical. Analytic evaluation proceeds by reasoning and argument: ask how consistent and well-structured your analytic framework is. Empirical evaluation proceeds by measurement and observation: ask what you are measuring and why, and ensure that you have achieved scientific validity, i.e. that the measurements are compatible with your claims.
All projects can include a mix of formative and summative evaluation
If you only evaluate formatively – did you finish your project?
If carrying out summative evaluation, be clear whether the evaluation criteria are internal (derived from some theory) or external (addressing some problem)
Need to establish objectivity of qualitative data (i.e. that it isn’t simply your own opinion).