Assessment Flashcards

(68 cards)

1
Q

Practicality

A

Time-efficient

Not excessively expensive

2
Q

★ Reliability

A

No errors in scoring
Consistent and dependable: a reliable test should yield similar results.
Subjectivity

3
Q

Inter-rater reliability

A

Two Ts evaluate by using the same rating scale.
Failure stems from lack of scoring criteria.

Subjectivity of the raters

Subjectivity doesn’t enter into the scoring process.

4
Q

Intra-rater reliability: why it is violated and the solution

A

Violation of such reliability can occur in cases of unclear scoring criteria, fatigue, bias, etc.

* Solution: careful specification of an analytic scoring instrument can increase both inter- and intra-rater reliability.

5
Q

test reliability

A

items that have more than one correct answer

6
Q

student-related reliability

A

temporary illness, fatigue

7
Q

Validity

A

Test measures exactly what it is supposed to measure.

8
Q

Authenticity

A

Language is natural; items are contextualized
Includes meaningful, relevant, interesting topics
Simulates real-world tasks
Provides some thematic organization to items through episodes
e.g.) reading passages selected from real-world sources that test-takers are likely to encounter /
listening comprehension sections that feature natural lg with hesitations, white noise, and interruptions

Topics and situations are interesting and relevant to my life.
Tasks replicate, or clearly approximate, real-world tasks.

9
Q

Washback

A

formative
Give learners feedback that enhances their lg development.
How a test influences both teaching and learning

Ts can provide information that washes back to Ss in the form of useful dialogues of strengths and weaknesses.

I expected the teacher to go over the test and give “advice” on what I should focus on in the near future.

No “feedback or comments” from the teacher were given.

10
Q

How to increase washback?

A

to comment generously and specifically on test performance.
“comments and feedback”
Letter grades and numerical scores give no information of intrinsic interest to the S.

Formative tests, by definition, provide washback in the form of information to the learner on progress and goals.

Informal assessment: T provides interactive feedback -> higher washback
Formal assessment: T provides information on Ss’ progress toward goals -> higher washback

11
Q

Criterion validity: definition, two types, and examples

A

Validity is measured by comparing a new test with an existing one: the extent to which the criterion of the test has actually been reached.

1) Predictive validity: e.g.) placement tests, admissions assessment batteries, and achievement tests designed to determine Ss’ readiness to move on to another unit.
2) Concurrent validity: e.g.) a high score -> actual proficiency in the lg.

12
Q

Formative test

A

Formative tests, by definition, provide washback in the form of information to the learner on progress and goals.

Evaluating Ss in the process of forming their competencies and skills.
The delivery (by the T) and internalization (by the S) of feedback

All kinds of informal assessment are formative.

Gather information on the developmental “process” of their speaking
Assess their performance regularly

13
Q

Summative test

A

Measure what Ss have grasped at the end of a course or unit of instruction.
* Evaluate only product not process

Summative test fails to provide crucial info.
(cf. a formative test does provide information)

One major test at the end of semester

14
Q

Norm-referenced

A

Purpose: to place test-takers along a mathematical continuum in rank order
Primary concern: practicality, reliability, validity

Such tests must have fixed, predetermined responses.

Use the test results to award scholarships to the top 10%.

15
Q

Criterion-referenced

A

The test is criterion-referenced, assessing the extent to which the students achieved the goals of the class.
Primary concern: authenticity, washback
(The goal is to use the ability in real life. That is, authenticity is the degree of match between the test and real life; washback concerns feedback.)

Give test-takers feedback in the form of grades.
The distribution of Ss’ scores across a continuum may be of little concern as long as the instrument assesses appropriate objectives.

The Ss who get over 10 out of 16 will pass the conversation course.
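
The two decision rules in this deck (a fixed cut score here, rank order on the norm-referenced card) can be contrasted in a minimal sketch. The function names and the scores are hypothetical; the cut score of 10 out of 16 and the top 10% figure come from the cards.

```python
import math

def criterion_referenced_pass(scores, cut_score):
    """Map each S to True if the score is over the cut score, regardless of rank."""
    return {name: score > cut_score for name, score in scores.items()}

def norm_referenced_top(scores, fraction):
    """Return the names of the top `fraction` of Ss by rank (ties are not handled)."""
    n = max(1, math.ceil(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n]

# Hypothetical scores out of 16 for ten Ss.
scores = {"S1": 15, "S2": 12, "S3": 11, "S4": 11, "S5": 10,
          "S6": 9, "S7": 14, "S8": 8, "S9": 13, "S10": 16}

passed = criterion_referenced_pass(scores, cut_score=10)
top = norm_referenced_top(scores, fraction=0.10)
print(sum(passed.values()), "Ss pass;", "top 10%:", top)
```

In the criterion-referenced rule every S could pass or fail; in the norm-referenced rule the number of winners is fixed in advance by the fraction.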

16
Q

Test administration reliability

A

Classroom conditions for the test are equal for all students.

ex) aural comprehension test -> street noise

17
Q

Content validity

A

1) The tests assess real course objectives, direct testing
2) It requires test-takers to perform the behavior that is being measured.

Items focus on previously practiced in-class reading skills.

18
Q

Construct validity

A

e.g.) conducting an oral interview
major components of oral proficiency: pronunciation, fluency, grammatical accuracy, vocab use, sociolinguistic appropriateness

e.g.) a simple written vocab quiz covering the content of a recent unit -> have Ss correctly define a set of words.
However, if the objective is the communicative use of words, writing definitions certainly fails to match a construct of communicative lg use.

19
Q

Face validity

A

Whether the test looks as if it is measuring what it is supposed to measure.

Tests that relate to their course work./ familiar task/ directions are clear

The printing was too small. I had to read five pages in one hour.

Lots of tasks were unfamiliar.
I’ve never done those kinds of tasks in class.
Material that she had not dealt with in class
It seemed like a writing test rather than a listening test.

The exam “looks like” one that high school Ss normally take.

20
Q

needs analysis (needs assessment)

A

process of assessing the needs of Ss

Before designing a course, it is necessary to make decisions about what will be taught and how it will be taught.

survey and interview

Info about what my Ss needed to learn or change, their learning styles, interests, proficiency levels, etc.

Based on the info, I decided on the course objectives, contents and activities.

21
Q

a proficiency test/ standardized test

A

Not linked to any particular textbook or specific course of study. (Not limited to a single skill in the lg. Rather, it tests “overall proficiency”.)

Summative and norm-referenced: provides results in the form of a single score and measures performance against a norm (w/ equated scores and percentile ranks)

Does not provide diagnostic feedback

22
Q

summative feedback

A

Ss will receive a total score for the reading section

23
Q

constructed-response item

A

-

24
Q

Item Response Distribution

A
  1. A certain wrong alternative was chosen by more high-group Ss than low-group Ss.
  2. More Ss chose the wrong alternative than chose the correct answer.
  3. A certain wrong alternative did not work as a distractor.
25
the reliability of the test ***
Item 18 deteriorates the internal consistency of the test. | This happens when low-ability group Ss answer the item correctly more often than high-ability group Ss.
26
Item Facility
Also called item difficulty: the extent to which an item is easy or difficult for the proposed group of test-takers.
It shows the proportion of Ss who chose the correct answer.
e.g.) Mr. Park divided the number of Ss who correctly answered a particular item by the total number of Ss who took the test.
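
The division described on this card can be written as a short sketch. The function name `item_facility` and the response data are hypothetical, chosen only for illustration.

```python
def item_facility(responses, key):
    """Proportion of test-takers who answered the item correctly (0.0 to 1.0)."""
    if not responses:
        raise ValueError("need at least one response")
    correct = sum(1 for r in responses if r == key)
    return correct / len(responses)

# Hypothetical data: 20 Ss answered one multiple-choice item whose key is "B".
responses = ["B"] * 14 + ["A"] * 2 + ["C"] * 2 + ["D"] * 2
print(f"IF = {item_facility(responses, 'B'):.2f}")  # 14 of 20 correct
```

A value near 1.0 means almost everyone got the item right (an easy item); a value near 0.0 means almost no one did.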
27
Item Discrimination
The extent to which an item differentiates btw high- and low-ability test-takers.
e.g.) Item 20 shows the highest discrimination among the five items. / Item 2 does not distinguish the upper-level Ss from the lower-level Ss.
e.g.) If high- and low-ability Ss got the same score on an item, the item has poor ID because it didn't discriminate btw the two groups.
INTERNAL CONSISTENCY: many upper-group Ss incorrectly chose option C. (Item 2 does not distinguish the upper-level Ss from the lower-level Ss.)
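
The card gives no formula, so the sketch below assumes one commonly used index: the number of correct answers in the high group minus the number in the low group, divided by the size of one group (both groups are assumed to be the same size). The function name and the counts are hypothetical.

```python
def item_discrimination(high_correct, low_correct, group_size):
    """ID = (high-group correct - low-group correct) / size of one group.

    Assumes equal-sized high and low groups. Ranges from -1.0 to 1.0.
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return (high_correct - low_correct) / group_size

# Hypothetical counts with 10 Ss in each group.
good_item = item_discrimination(high_correct=7, low_correct=2, group_size=10)
flat_item = item_discrimination(high_correct=5, low_correct=5, group_size=10)
bad_item = item_discrimination(high_correct=3, low_correct=6, group_size=10)
print(good_item, flat_item, bad_item)
```

Under this index, `flat_item` matches the card's "same score" case (no discrimination), and a negative value like `bad_item` matches the case where the low group outperforms the high group.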
28
Distractor
No one from the upper group or the lower group chose option B.
Distractors a and b seem to be fulfilling their function of attracting some attention from lower-ability Ss.
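
A response-distribution table makes these judgments easy to check. The sketch below is only an illustration: the function names, the flag wording, and the response data are hypothetical. It flags a distractor that no one chose and a distractor that attracts more upper-group than lower-group Ss.

```python
from collections import Counter

def distractor_table(upper, lower, options="ABCD"):
    """Return {option: (upper-group count, lower-group count)}."""
    up, low = Counter(upper), Counter(lower)
    return {opt: (up[opt], low[opt]) for opt in options}

def flag_distractors(table, key):
    """Flag wrong options that look problematic under the two rules above."""
    flags = {}
    for opt, (u, l) in table.items():
        if opt == key:
            continue
        if u + l == 0:
            flags[opt] = "chosen by no one: not working as a distractor"
        elif u > l:
            flags[opt] = "attracts more upper-group than lower-group Ss"
    return flags

# Hypothetical responses from 10 upper-group and 10 lower-group Ss; key is "C".
upper = ["C"] * 7 + ["A"] * 2 + ["D"] * 1
lower = ["C"] * 2 + ["A"] * 4 + ["D"] * 4

table = distractor_table(upper, lower)
flags = flag_distractors(table, key="C")
print(table)
print(flags)
```

Here options A and D draw mostly lower-group Ss, which is what a working distractor does, while option B draws no one.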
29
portfolios
Collections of Ss' work, useful for assessing student performance: 1. Ss have ownership over the process of learning. 2. Portfolios allow the T to pay attention to Ss' progress as well as achievement.
30
Alternatives
portfolios, conferences, journals, self-assessment / peer-assessment, observation
31
Alternatives
portfolios, performance-based assessment, conferences, journals, self-assessment / peer-assessment, observation
32
performance-based assessment
The T observes the performance | The task is evaluated through "direct observation" by the T.
33
performance-based assessment
The T observes the performance The task is evaluated through "direct observation" by the T. The task calls for the integration of language skills.
34
analytic rating scales
Provide diagnostic information
35
holistic rating scales (holistic scoring method)
-
36
discrete point test
Assesses one point at a time, on the assumption that lg can be broken down into component parts and that those parts can be tested successfully.
e.g.) grammar and vocab items in multiple-choice format / large-scale standardized entrance
37
Integrative tests: types and what they emphasize
Types: cloze test, dictation
Emphasis: communication and authenticity / communicative competence
38
Cloze test: types / characteristics
Fixed-ratio cloze: every nth word is deleted in a text.
Rational-deletion cloze: words are deleted on a rational basis (e.g., prepositions, sentence connectors) to assess specified grammatical or rhetorical categories.
Rational deletion gives more washback; it draws on expectancy grammar (the ability to predict the next item).
Characteristics: integrative; indirect testing that measures reading ability.
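
The two deletion procedures can be sketched as follows. This is a simplified illustration: the function names are hypothetical, words are split on whitespace only, and punctuation attached to a word is deleted along with it.

```python
def fixed_ratio_cloze(text, n):
    """Replace every nth word with a numbered blank; return (cloze_text, answers)."""
    out, answers = [], []
    for i, word in enumerate(text.split(), start=1):
        if i % n == 0:
            answers.append(word)
            out.append(f"({len(answers)})____")
        else:
            out.append(word)
    return " ".join(out), answers

def rational_deletion_cloze(text, targets):
    """Replace only words in `targets` (e.g. prepositions) with numbered blanks."""
    targets = {t.lower() for t in targets}
    out, answers = [], []
    for word in text.split():
        if word.lower() in targets:
            answers.append(word)
            out.append(f"({len(answers)})____")
        else:
            out.append(word)
    return " ".join(out), answers

fixed_text, fixed_answers = fixed_ratio_cloze(
    "one two three four five six seven eight nine ten", n=3)
rational_text, rational_answers = rational_deletion_cloze(
    "The cat sat on the mat in the hall", targets={"on", "in"})
print(fixed_text)
print(rational_text)
```

In the fixed-ratio version the test writer has no control over which words are blanked; in the rational version the `targets` set decides what is assessed.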
39
Rational deletion cloze
Specific content words are chosen to be deleted -> more washback; expectancy grammar (ability to predict the next item).
Scoring is more difficult in rational-deletion cloze than in a C-test.
40
Cloze test scoring methods: acceptable word method
A scoring method that accepts any suitable, grammatically and rhetorically acceptable word that fits the blank in the original text. (High face validity)
41
C-test: definition and characteristics
The second half of every other word is deleted.
It has higher scoring reliability / lower validity.
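
A minimal sketch of the deletion rule follows. Details the card does not specify are assumptions here: deletion starts at the second word, words are counted by position, one-letter words and words with attached punctuation are left intact, and for odd-length words the larger half is deleted. The function name is hypothetical.

```python
def c_test(text):
    """Blank the second half of every other word; return (c_test_text, answers)."""
    out, answers = [], []
    for i, word in enumerate(text.split(), start=1):
        # Every second word by position; skip one-letter and non-alphabetic words.
        if i % 2 == 0 and len(word) > 1 and word.isalpha():
            keep = len(word) // 2
            out.append(word[:keep] + "_" * (len(word) - keep))
            answers.append(word)
        else:
            out.append(word)
    return " ".join(out), answers

mutilated, answers = c_test("Tests should measure what they claim")
print(mutilated)
```

Because the first half of each word is given, usually only one restoration fits, which is consistent with the higher scoring reliability noted on the card.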
42
Cloze test: definition / which competences Ss use / types
An integrative measure not only of reading ability but of other lg abilities
* Ss use linguistic competence (formal schemata) / background experience (content schemata) / strategic competence
Types: fixed-ratio deletion, rational deletion
43
Cloze test scoring methods: exact word method
a scoring method that is limited to accepting the same word found in the original text
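
The difference between exact word scoring and the acceptable word method on the earlier card can be sketched in one function. The function name, the blanks, and the list of acceptable alternatives are hypothetical; in practice the T decides which alternatives count.

```python
def score_cloze(responses, originals, acceptable=None):
    """Count correct blanks.

    Without `acceptable`, only the original word scores (exact word method).
    With `acceptable` ({blank index: [alternatives]}), listed alternatives
    also score (acceptable word method).
    """
    score = 0
    for i, (resp, orig) in enumerate(zip(responses, originals)):
        allowed = {orig.lower()}
        if acceptable is not None:
            allowed |= {w.lower() for w in acceptable.get(i, ())}
        if resp.strip().lower() in allowed:
            score += 1
    return score

originals = ["big", "quickly", "on"]
responses = ["large", "quickly", "in"]
acceptable = {0: ["large", "huge"]}  # T-approved alternatives for blank 0

exact_score = score_cloze(responses, originals)
acceptable_score = score_cloze(responses, originals, acceptable)
print(exact_score, acceptable_score)
```

The exact word method needs no rater judgment, while the acceptable word method depends on who compiles the alternatives.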
44
dictation
It taps into grammatical and discourse competence
45
Subjective testing
Low reliability / high validity. Constructed-response items, e.g.) open-ended responses
46
Objective testing
It has predetermined, fixed responses. High test reliability, low validity. Selected-response items, e.g.) T/F, multiple-choice items
47
Direct testing
It involves the test-taker in actually performing the target task. High content validity. e.g.) oral presentation, to test performance directly
48
Indirect testing
Learners are not performing the task itself but rather a task that is related in some way
49
Achievement tests
Limited to particular material and offered after a course has focused on the objectives in question
Determine whether the course objectives have been met by the end of a given period of instruction
Summative: administered at the end of a lesson, unit, or term of study
Formative: when offering feedback about the quality of a learner's performance
50
Placement tests
To place a student into a particular level of a lg curriculum or school
Diagnostic
Formative (correct/incorrect responses provide Ts with useful information on what may or may not be emphasized in the weeks to come)
51
Diagnostic tests
To diagnose aspects of a lg that a S needs to develop or that a course should include -> Should elicit info on what Ss need to work on in the future. Therefore, a diagnostic test will typically offer more detailed, subcategorized information on the learner.
52
Constructed-response items
A type of test item or task that requires test-takers to respond to a series of open-ended questions by writing, speaking, or doing something rather than choosing answers from a ready-made list.
53
computer adaptive testing
computer testing software that adjusts the questions depending on Ss' performance on previous test items.
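
The adjustment idea can be illustrated with a deliberately simple staircase rule: move one difficulty level up after a correct answer and one level down after an incorrect one. This rule, the function name, and the item bank are assumptions made for illustration; operational adaptive tests typically use statistical models to choose items.

```python
def run_adaptive_test(bank, answer_fn, n_items):
    """Administer up to n_items, adapting difficulty to each response.

    bank: {difficulty level (int): [item ids]}
    answer_fn: callable taking an item id and returning True if answered correctly
    Returns a list of (item, level, correct) tuples.
    """
    levels = sorted(bank)
    idx = len(levels) // 2  # start in the middle of the difficulty range
    used, history = set(), []
    for _ in range(n_items):
        level = levels[idx]
        candidates = [item for item in bank[level] if item not in used]
        if not candidates:
            break  # no unused items left at this level
        item = candidates[0]
        used.add(item)
        correct = answer_fn(item)
        history.append((item, level, correct))
        # Harder after a correct answer, easier after an incorrect one.
        idx = min(idx + 1, len(levels) - 1) if correct else max(idx - 1, 0)
    return history

# Hypothetical bank: "e" = easy, "m" = medium, "h" = hard items.
bank = {1: ["e1", "e2", "e3"], 2: ["m1", "m2", "m3"], 3: ["h1", "h2", "h3"]}
# Hypothetical S who can answer easy and medium items but not hard ones.
history = run_adaptive_test(bank, lambda item: item[0] in "em", n_items=4)
print(history)
```

The S in this example oscillates between levels 2 and 3, which suggests an ability between medium and hard.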
54
Alternative tests (Performance-based assessment )
- requires Ss to perform, create, produce, or do something
- uses real-world contexts
- focuses on process as well as product
- taps into higher-level thinking and problem-solving skills
- provides info about both strengths and weaknesses of Ss
- involves "an integration of lg skills"
55
Performance-based assessment: points for the T to keep in mind
- State the overall goal of the performance.
- Specify the objectives (criteria) of the performance in detail.
- Prepare Ss for performance in stepwise progress.
- Use a reliable evaluation form or checklist.
- Treat performances as opportunities for giving feedback, and provide that feedback systematically.
- If possible, utilize self- and peer-assessment judiciously.
56
Rubrics
Validity ↑, reliability ↑
A rubric is a device used to evaluate open-ended oral and written responses of learners.
- usually composed of a set of criteria or competencies, each with descriptions of levels of expectation
- some rubrics involve scaling
57
Rubric-based assessment
Not only were rubrics beneficial for Ts, but Ss were also able to better focus their efforts, produce work of higher quality, earn better grades, and feel less anxious about assignments.
Pro) Rubrics provide points for Ss to focus on and goals to pursue.
Con) Simplicity (marking a few points on a chart and considering our job done!) may mask the depth and breadth of a S's attainment.
58
Portfolios
A purposeful collection of Ss' work that demonstrates their efforts, progress, and achievements.
Advantages)
- foster intrinsic motivation, responsibility, and ownership
- promote S-T interaction w/ the T as a facilitator
- facilitate critical thinking, self-assessment, and revision processes
- offer opportunities for collaborative work w/ peers
59
Portfolios: points to keep in mind
- State objectives clearly.
- Give guidelines on what materials to include (a sample portfolio from a previous S can help stimulate some thoughts on what to include).
- Communicate assessment criteria to Ss (self-assessment: formative).
- Provide positive washback-giving final assessments, e.g.) a holistic scoring scale ranging from 1 to 6, or a narrative evaluation of perceived strengths and weaknesses by the T.
60
Journals
The most formative of all the alternatives in assessment
CONTENT VALIDITY ↑, WASHBACK ↑↑
A log of one's thoughts, feelings, reactions, assessments, ideas, or progress toward goals, usually written w/ little attention to structure, form, or correctness.
"written conversation between T and Ss"
61
Dialogue journals
They imply an interaction between the T and the S through dialogues or responses.
Advantages) practice in writing fluently, using writing as a thinking process, and emphasizing a student's own voice; they afford a unique opportunity for a T to offer various kinds of feedback.
* Ts become better acquainted with their Ss in terms of both their learning progress and their affective states: they can meet Ss' individual needs.
Disadvantage) It's difficult to set up criteria for evaluation.
Caution) The T should provide optimal feedback in responses:
- cheerleading feedback
- instructional feedback, in which you suggest strategies or materials
- reality-check feedback -> helps Ss set more realistic expectations for their lg abilities
62
self-assessment | /peer-assessment
autonomy, develop motivation | / cooperative learning
63
Observation
Observe Ss in the classroom; assess Ss w/o their awareness
Naturalness of their linguistic performance is maximized
Can take the form of recordings, checklists, rating scales
64
Holistic scoring
An approach that uses a "single general scale" to give a global rating for a test-taker's lg production
Pro) fast evaluation
Con) no diagnostic info is available (no washback potential); raters need to be extensively trained to use the scale accurately
65
Analytic scoring
An approach that separately rates a number of predetermined aspects (e.g., grammar, content, organization) of a test-taker's lg production (e.g., writing) => enables learners to home in on weaknesses and capitalize on strengths
PRACTICALITY ↓, in that more time is required for the T to attend to details, but ultimately Ss receive more information about their writing
66
Primary trait scoring
e.g.) persuasive writing -> score by focusing only on how well the text persuades
It allows both writer and evaluator to focus on function.
67
Multiple choice items
Practicality ↑: time-saving scoring procedures; Reliability ↑: predetermined correct responses
Multiple-choice items are all receptive, or selective, response items in that the test-taker chooses from a set of responses.
STEM: the body of the item that presents a stimulus
Options / Alternatives - KEY
68
Guidelines for designing multiple choice items.
1. Design each item to measure a single objective. e.g.) If WH-questions are the objective, measure only that. +) Do not provide inadvertent (unintentional) clues.
2. State both stem and options as simply and directly as possible: remove needless redundancy from options and stem.
3. Make certain that the intended answer is clearly the only correct one (only one correct answer).
From past exams) Make sure the distractors are the same grammatical class as the key / make sure the key cannot be selected based on Ss' world knowledge.