IS IT A TEST OR JUST A WRITTEN EXERCISE TO REINFORCE KNOWLEDGE? TEACHERS DECIDE.
By: Rosalía Nallelí Pérez-Estrada.
rosalia_na@hotmail.com
MADISON SCHOOL, COME TO BE THE BEST.
SANTANDER UNIVERSITY.
MEXICO
Abstract.
This paper addresses, in general terms, the importance of distinguishing between tests and written exercises when measuring students' progress, especially when the aim is to find out how much of a language they have learned.
To do so, it presents a brief study carried out with a group of foreign language learners, and the results are reported.
Key words.
Testing, objectives, reliability, validity, written exercises.
Introduction.
Testing is an activity carried out in different teaching areas. It is used to measure the knowledge a learner has developed about a topic or subject. It is also a means of analyzing teaching results in order to improve or change the curriculum design of a given course.
Teachers who decide to construct a test have to set some basic goals from the beginning. They must know what they want to test for and which results they are interested in discovering. On the one hand, if they do not know what they are testing for, they run the risk of creating classroom exercises rather than tests. On the other hand, if they plan a test with well-established, measurable objectives, and they consider its content, form, and analysis, the outcome of their testing will be more reliable and valid. Heaton (1988) says that the validity of a test is the extent to which it measures what it is supposed to measure and nothing else, and that reliability is a necessary characteristic of any good test.
Reliability and validity help to differentiate good tests from poor ones. They are basic criteria that must be considered while a test is being planned. Moreover, reliability and validity can help a teacher distinguish whether what he is planning is a test or just a simple practice exercise. Unfortunately, reliability and validity do not always seem to be considered in test planning. As a result, teachers produce exercises instead of tests.
This work intends to discuss this situation. It establishes a difference between a test and an exercise through the analysis of validity and reliability. It also discusses how far a test's purpose can be managed according to the teacher's goals and how tests may be differentiated from simple exercises. Thus, at the beginning of this work, there is a brief review of what testing is and what it must include to be considered valid and reliable. After this, a test designed to measure some students' progress in English is discussed: a test applied to a group of students at a private English school offering foreign language courses in Tlaxcala, Mexico.
The selected group consisted of 15 students, 9 male and 6 female, who worked in different areas and places. Their ages ranged from 24 to 35 years old, and they had been learning English as a foreign language in order to use it in different contexts. They had been studying general English for more than a year and, as a group, had advanced little by little. Their course was directed at developing communicative skills, but the test planned for them contained few, if any, communicative samples. It somewhat fulfilled the requirements and content of the curriculum, but it had little communicative content. Hence, a modification of the intended test is also presented.
In the third part, there is an explanation of how the test was modified, why it was modified in that form, and the suggestions made to redirect its first version toward a more communicative goal. Finally, some results and comments are presented at the end of this work.
What is testing?
Testing is a helpful resource for knowing how well learners have developed knowledge in a given area or subject. It is also a helpful resource for knowing how effective the teaching was.
It can be applied at the beginning or at the end of a course and helps to identify the strengths and weaknesses of the subjects tested. Regardless of when it is applied, it usually pursues the same objective: to determine what the learners know or have learned and to obtain more objective evidence of that learning. Among other important aspects, tests also help to validate didactic interventions socially and to build an institution's identity. Besides, they are a political tool, because they are established by institutions, schools, or governments to measure the progress of a given population, especially in the education area.
In the language teaching area, testing has been established as a means of knowing the students' progress in the language and identifying the different skills they have developed. Tests are also used to modify curriculum content.
If a test gives unsatisfactory results, the language teaching path is modified and the linguistic content may be analyzed, which can give way to different proposals for language teaching.
In foreign language teaching and second language teaching there are different types of tests with different designs and objectives: achievement/attainment tests, proficiency tests, aptitude tests, and diagnostic tests. Some are created to analyze a method or a purpose. When they are created to analyze a method, multiple-choice formats with fixed responses are common, and production is left aside.
Other tests aim to analyze the performance of the learners. In this type of test, production is one of the main factors tested. Candidates are asked questions which they have to answer orally or in writing.
The tests created to test a purpose usually identify achievement and proficiency. A proficiency test may be associated with how a syllabus is determined and what it is expected to achieve, while an achievement test measures what was expected to be learned. For Heaton (1988:172), “achievement tests, though similar in a number of ways to progress tests, are far more formal tests and are intended to measure achievement on a larger scale”. In these tests, results emerge when what is taught in class becomes visible evidence of learning. They also help teachers to monitor progress against the established curriculum. Proficiency tests, on the other hand, help to identify a student's ability to do a specific task. Both achievement and proficiency tests help a teacher to notice the learner's expected progress, whether already achieved or expected in a future situation.
However, if the person who needs to construct a test does not know exactly what his testing goals are, or which parts of the learning progress he needs to identify, his decision will likely lead him to construct a simple exercise for students.
When foreign language or second language teachers decide what to practice in a class, they create written or oral exercises. Their teaching progress is their concern. They imagine their students' reactions and the type of result that the exercise will bring. They are not thinking about the validity or reliability of the result of that exercise, because they just need their learners to keep practicing a topic or a structure. Thus, for some language teachers, if an exercise does not work out in class, they may simply not repeat it, or they modify it for the next time they use it.
The real problem begins when, trying to construct a test, they create yet another written or oral exercise to measure their learners' proficiency and/or achievement. They do not think about the validity of their tests, which Heaton (1988) defines as the extent to which a test measures what it is supposed to measure and nothing else. One of the most important stages in giving validity to a test begins the moment the teacher decides what he is going to test (content validity) and how he is going to present the information (face validity). Another important moment is when the tester considers how to ensure reliability. Reliability, as Heaton (1988) says, “is a necessary characteristic of any good test: for it to be valid at all, a test must first be reliable as a measuring instrument”. Reliability is therefore influenced by the size of the test and by how it is administered.
Creating validity and reliability in a test (something to start with)
To have a test which measures achievement or proficiency, it is essential to design useful and efficient instruments that allow the teacher to evaluate the learners' achievements and capacities. A teacher must plan the items to be included in order to achieve a certain degree of precision. He also needs to decide how he is going to evaluate it through its content.
To achieve this goal, validity must be considered, including at least face, construct, and content validity.
Validity has to do with how valid the test is for its teaching and learning purpose; that is, whether the test really measures what it is expected to measure. It is thus closely related to the construct of the test. Regarding its purpose, content validity is a means of knowing how valid the test is, and its specifications may guide us in identifying it. Hughes (1989) says that a test is said to have content validity if its content constitutes a representative sample of the language skills, structures, etc., with which it is meant to be concerned. For example, if a student has been learning in a communicative language course, and the test asks him to answer only grammar questions, it can be said that there is no validity at all.
Validity has a close relationship with the purpose of the test. Does it intend to measure content? Which skills are intended to be measured? What is it done for? On this point, Clapham and Wall (1995) state that “validity is the extent to which a test measures what it is intended to measure: it relates to the uses made of test scores and the ways in which scores are interpreted, and is always relative to test purpose”. To address the test purpose, the questions to be asked are: what is this test for? What is it intended to test? For Davies and Pearse (2000:172), validity may be determined in an achievement test if “it contains only forms and uses the learners have practiced in the course” and if “it employs only exercises and tasks that correspond to the general objectives and methodology of the course”.
Content validity, another important aspect of validity, can also be checked by experts in the testing area who make judgements in a systematic form (Alderson et al., 1995) and who compare what the test includes with what it should include. When a test intended to measure a specific skill includes many repeated items with no specifications at all, it is considered invalid.
Besides content validity, another aspect which needs to be considered in a valid test is its face validity. Face validity may guide a testee to answer the test accurately or lead him to get lost in the attempt. For Hughes (1989:27), “a test is said to have face validity if it looks as if it measures what it is intended to measure”. For example, if a test intends to measure the listening skill, but most of the audio appears on the test in written form (the tape script) with blanks to be filled in, its face validity would be compromised: instead of measuring only listening skills, the test would be measuring reading too. Hence, its face validity, together with its content validity, would not be real at all. Moreover, if the test specifications state that communicative competence is going to be measured, but no situation is given to develop conversation and there are just unrelated sentences, there would be no face validity at all. For Heaton (1988), “if a test item looks right to other testers, teachers, moderators, and testees, it can be described as having at least face validity”. Face validity may thus be detected at first sight: a test with 20 open questions to be answered freely (according to the learners' different knowledge) would not show face validity, because any expert would immediately know that the learners' production would differ widely and the criteria to grade the answers would vary with every sentence answered. Once more, such questions would simply turn what is supposed to be a test into language practice.
Besides validity and some basic aspects of it, we have reliability. Reliability, as well as validity, is another important aspect to be considered when a language teacher is going to test his learners.
Reliability is closely related to proficiency and achievement tests. It indicates how dependable a test is as a measuring instrument. It has to do with different aspects of the test, such as its length (the more items on a single topic, the more reliable the test can be considered); its administration (can the test be administered at different moments to different groups? What are its results? Are they alike?); its instructions (are they clear? Is the time specified in the rubrics enough?); its scores (do they give the right value to each item tested? Are the scores of two different administrations similar?); and time (if two groups are given a similar test, are they given the same amount of time? If one group is given one hour and another two hours, reliability would not exist at all). For Davies and Pearse (2000:173), “reliability is a matter of how far we can believe or trust the results of a test”. And the results can be obtained and studied from the different factors mentioned above.
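The idea of consistent scores across two administrations mentioned above can be quantified as a test-retest correlation: if the same group takes the same test twice and the score orderings agree closely, the coefficient approaches 1. The sketch below is only an illustration of that calculation; the score lists are hypothetical, not data from the study reported in this paper.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance between the two administrations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Spread of each administration's scores.
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical scores for five testees on two administrations of one test.
first_run = [62, 75, 80, 55, 90]
second_run = [60, 78, 82, 57, 88]

r = pearson(first_run, second_run)
print(f"test-retest reliability estimate: r = {r:.2f}")
```

A coefficient near 1 would support the claim that the test is behaving consistently as a measuring instrument; a low coefficient would suggest that something in its length, instructions, scoring, or timing is undermining reliability.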
Tests are also important in second and foreign language teaching because they guide the teacher to consider the differences between instruments and the results they show as measuring tools. A teacher may decide to construct a test, but he must be sure to consider at least the aspects mentioned above. If he does not consider any of them, his work could be just another classroom exercise intended to be a test.
Reporting on a proficiency test taken at a private language school in Tlaxcala:
This test was applied twice to a basic group, level IV: once with no changes (attachment 1) and once with many changes (attachment 2). It had many things that needed changing. At the beginning of its application everything seemed to be all right. However, it was later discovered that some parts of it needed to be modified and improved. Besides, it had no specifications at all, although it was possible to detect some specifications established implicitly:
Implicit Specifications
Test's first purposes (with no changes):
To detect whether students knew how to use verbs in the past tense and whether they were able to identify different tenses in different sentences.
To have them read some definitions and recall vocabulary previously seen, by finding words in a word-search game.
To have the students complete some questions by adding the corresponding auxiliary word.
To make the students write a presentation.
Description of the test takers:
The test was proposed to be answered by a group of 15 students who had studied general English as a foreign language for more than a year. Throughout this time, they had been in contact with the English language for 4 hours every week. They began this foreign language at the very basic level and, little by little, have been making progress in the language.
Test level:
The test is level four of general English. It may be used with young adults because of the items included in each part. Its content goes from a very basic level to an upper-intermediate level. It includes examples of the grammar covered progressively in the course.
Sections:
It included six sections, divided into grammar, linguistic items, reading, vocabulary recognition, and writing. No time was specified for fulfilling each task. There were no rubrics specifying the length of the expected production, the number of words to be written, etc. No weighting was assigned to the sections of the test, although they were of different sizes and contained different numbers of items.
Target language situation:
It was a progress test (or a single exercise) which intended to measure knowledge of vocabulary and verb tenses, and it was later modified into a proficiency test. Perhaps one target language situation could be identified in the writing section, in which the testee was asked to combine the five sections.
Language skills to be tested: Reading and writing.
Language elements: Verbs, auxiliary verbs, general vocabulary, collocations.
Text types:
Test tasks: To fill in spaces with provided words, to look for some words in a word search game, to write a short profile.
Rubrics:
Just basic directions to do the task.
Criteria for marking:
Not specified
Other problems detected in the first test (with no changes):
Face validity: this test seemed to be a classroom exercise rather than a test, because it included many similar questions on a single item and the examples were repeated.
Content validity: the content of the test was limited to identifying vocabulary knowledge and reading comprehension, with little production and no listening section at all. Although it intended to measure some topics seen in class, some content, such as the word search, was not among them.
It was divided into sections which included grammar; linguistic terminology (directions which asked the learners to use specific words, such as should or shouldn't); vocabulary; and writing. It did not give the learners a chance to decide which forms to use in a communicative situation. These sections made the learners aware of every skill being tested, which made the test unnatural.
The instructions were too short. Besides, the rubrics did not include examples, nor did they specify the time allowed for each activity. The test did not include any listening exercise or a speaking section either.
The exercises were repetitive, and the students just had to fill in blanks, look for words, or join sentences. They were not prototypes of communicative contexts.
The content validity seemed questionable because the abilities to be measured were not specified at all. For example, in the section where the students had to look for words, they just left that space blank. When they were asked why they had not looked for the words, they said that they did not like looking for words in those games.
It had unrelated sentences; besides, it did not include real communicative tasks.
Comments: if the intended goal was to discover whether the students were able to find words among other words, this simply did not work.
Test 2 (modified)
Specifications
Test's modified purposes:
One of the first purposes was to give this test face validity. Because it was intended to be a test and not a simple exercise, a title was included, and a space was left for the learner's name, group, and date. In the writing part, some lines were added to give the learner space to write on. An order was established and the instructions were set in black. Besides, the test intended to measure:
Whether students were able to use verbs in different tenses according to the context established for them.
To make them read some definitions and recall vocabulary previously seen in their classes, and then use some of the vocabulary included in part 4.
To make the students order some questions, setting the appropriate auxiliary word in its place.
To make the students write a presentation, following some specifically requested information.
Description of the test takers:
The test was proposed to be answered by the same group of 15 students who had studied general English as a foreign language for a year and who had previously answered the unchanged version of the test.
Test level:
The test is level four of general English. It may be used with young adults because of the items included in each part. Its content goes from a very basic level to an upper-intermediate level. It included examples of the grammar covered progressively in the course, and context was provided for the exercises where communicative tasks were established.
Sections:
It included three sections. Section 1 included dialogues in which some linguistic items had to be used. Section 2 included a very basic reading covering vocabulary definitions, recognition, and writing. Section 3 included a listening part.
The writing part was set to identify the writer's production performance. The time for fulfilling the tasks of each section was specified. There were rubrics intended to guide the testee toward a better result. Weighted scores were established for each section of the test.
Target language situation:
It was a progress test which intended to measure knowledge of language use in different contexts. Perhaps one target language situation could be identified in the writing section, in which the testee was asked to combine different parts of the test and of the course in a single presentation.
Language skills to be tested:
Reading comprehension, writing, listening. Oral production was left aside.
Language elements:
Verbs, auxiliary verbs, general vocabulary, etc.
Text types:
Test tasks:
To fill in some blanks, to order some questions, to write a complete presentation, to fill in spaces with provided words, to establish a dialogue with an unknown imaginary person, etc.
Rubrics:
They tried to cover the most important aspects to be followed by the learner, in order to lead him to answer the test accurately.
Criteria for marking:
Each item was assigned two points in order to set the score.
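The marking scheme described above (two points per item) combines naturally with the weighted sections mentioned earlier. The sketch below is only an illustration of that arithmetic; the section breakdown, weights, and scores are hypothetical examples, not the actual test's figures.

```python
def section_score(correct_items, total_items, points_per_item=2):
    """Raw score of a section: two points per correct item.

    Returns (points earned, points possible)."""
    return correct_items * points_per_item, total_items * points_per_item

def final_grade(sections):
    """Weighted final grade on a 0-100 scale.

    sections: list of (correct, total, weight) tuples; the
    weights are assumed to sum to 1.0."""
    grade = 0.0
    for correct, total, weight in sections:
        earned, possible = section_score(correct, total)
        grade += (earned / possible) * weight * 100
    return grade

# Hypothetical testee: dialogues 8/10, reading 5/6, listening 4/5,
# with assumed section weights of 40%, 30%, and 30%.
result = final_grade([(8, 10, 0.4), (5, 6, 0.3), (4, 5, 0.3)])
print(f"final grade: {result:.1f}")
```

Making the per-item value and the section weights explicit like this is one way to keep scoring consistent across administrations, which supports the reliability goals discussed earlier.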
Comments:
There were several changes in the second version of the test. Although the intention was to respect its goals as much as possible, some parts were eliminated in order to test communicative language use. The new version also intended to test the learners' knowledge of the language rather than just their grammar knowledge. To achieve this, several parts were modified, more information was added, and some other parts were changed or eliminated altogether.
The first change concerned its face validity. The first test seemed to be an activity to keep students occupied rather than a test. In that first test there were no divisions between activities, there were many exercises testing a single item, and the rubrics seemed too short and general. The word-search square was a real distracter for students: many of them preferred not to do it, or took too much time looking for the words in it and forgot to go on answering the test. The course's goal was to measure four skills, yet the test had no listening section; one was added to the modified version.
Regarding its content validity:
Because the general objective was to test the students' progress and knowledge according to the course curriculum, the content was modified to measure communicative situations, reading of complete ideas, the writing of a presentation, and a listening exercise.
EXPLANATION PART 1.- Test with general modifications (and comparisons with test 1):
In this first part, the word grammar was eliminated altogether. The learner had to show whether he was able to recognize the verb tense and its use according to the given situation; it was not necessary to tell him explicitly that his grammar knowledge of verbs was going to be tested. Because there were many sentences (or examples), it was a very long exercise, even though the tenses (present simple, past simple) and the conjugation of the verbs only needed to be tested once; therefore, all the repeated items were eliminated.
All the verbs were set in a single column to make their identification and application easier for the learner, and the word verb was replaced by the term word.
EXPLANATION PART 2.- The original part wanted the learner to use the auxiliary words should or shouldn't, and there were 5 examples for practicing these words in the test. To make it of real use, the directions were modified and a dialogue was included. The objective of this dialogue was to make the learner think about what recommendations he could give to a person in trouble. Thus, when the direction says “recommend Mario to do something different”, the objective is to make the testee solve some problems by using not only the intended words should or shouldn't, but possibly other structures such as why don't you...?, you have to..., and possibly you must..., etc.
What was intended to be tested was now presented in a dialogue, and real communication could be identified in it. It was necessary to set a scene and include the learner in that conversation.
EXPLANATION PART 3.- In the first test, the learners had to complete the sentences with an auxiliary word. In that exercise they just needed to read key words like yesterday and they could complete the question with the auxiliary DID. This part of the test did not help much to identify their real knowledge of how to form questions in a natural context, since the auxiliary words were included in the questions (the learners did not have to select the appropriate word for each question). In test 2, they had to order the complete question: they were given all the elements of the question out of order so that they could arrange them. To help them a little more, compound expressions such as street food, a good time, computer games, and on vacation were not separated. An interview situation was created so that the students could identify the purpose of the ordering exercise.
EXPLANATION PART 5.- The directions were completely modified. This part of the test was now directed at making the learner read and understand what was written there and show understanding by underlining the word appropriate to each definition given.
It was also intended to test vocabulary knowledge. Having the students look the words up in the word search was considered time-consuming, because there was no production at all when they had to find the required items, nor did they have to use the words in a context. It was considered necessary to eliminate that part and simply test reading comprehension of some definitions and the ability to discriminate some vocabulary at that level.
EXPLANATION PART 6.- The writing part in test 1 required a short profile of the learner, but the directions were too vague and broad. First of all, there was no specific goal to accomplish: the learner just had to fulfill the task, but there was no real objective for the text. Then, there was no word limit, so it was going to be difficult to grade: some students would have preferred to write 20 words while others would have preferred to write a complete story of up to 200 words (in fact, the test provided 20 lines). The instructions were too general and the person had to write about a variety of topics at the same time (personal information, the family, the last vacation, plans for the future), which did not seem naturalistic, because nobody can write from nothing. There was no example to guide the testee either. It was therefore necessary to add the number of words requested and more specific instructions.
EXPLANATION PART 7.- It was necessary to test the learners' listening skills too, so a listening part was added. The objective was to give the learners a more complete test and to have them show use of language and communicative competence at different levels and in different aspects.
In summary, it is possible to talk about some general differences between these two tests:
1.- Test 2 is longer than test 1 (although many exercises were eliminated from test 1).
2.- Test 2 includes listening comprehension and dialogues.
3.- Test 2 has participants and dialogues; it establishes a situation or a dialogue.
4.- Test 2 mixes grammatical structures but does not mention them explicitly; test 1 tests them separately.
5.- Both test 1 and test 2 explain what is being tested, but only test 1 does so explicitly.
6.- Test 2 includes more instructions and seems better than test 1 as a progress test. There is a listening exercise with more directed production, and there are not many repetitive sentences, which could favor some learners but damage others. The situation places the dialogues in a context, focusing attention more on content than on grammar rules. This makes test 2 more valid as a communicative test, because it is a little longer and gives more instructions with some examples.
Results from test 1, with no changes: the grades of the 15 students were added up. In the chart, blue represents the number of students and red the total grade.
Results from test 2, with changes: the grades of the 15 students were added up. There is a small difference: the results of test 2 improved. In the chart, blue represents the number of students and red the total grade.
Conclusions:
Testing is a very important means of measuring the learner's progress and the teaching results. However, if the person who constructs the test does not consider the important aspects discussed above, that test may become a simple exercise rather than a test. This person needs to establish the goals of the testing and the specifications to be covered in the test. It would not be fair to give the students a test just because a teacher has to report a grade, which in many cases is requested by the school's authorities. If the intention is simply to give a grade, many other aspects can be considered too, such as participation in class, homework, reports, and teamwork. The differences between the two tests applied (one with no changes and a modified one) showed that a well-planned test can give better results than an improvised one.
Moreover, tests must also be seen as a means of improving the teacher's teaching. If professors set goals to be achieved in a test, they would increase the test's validity, and if that test had to be reapplied to a similar group, reliability would also represent a great advantage for that test.
References:
Alderson, J.C., Clapham, C. and Wall, D. (1995). Language Test Construction and Evaluation. Cambridge University Press.
Davies, P. and Pearse, E. (2000). Success in English Teaching. Oxford University Press.
Dos Santos, M. (1980). Welcome to My World. McGraw-Hill.
Dunne, A.R. (2007). "The Exaver Project: Conception and Development." MEXTESOL Journal, Vol. 31, No. 2.
Edwards, L. (2007). Elevator Resource Bank. Richmond Publishing.
Heaton, J.B. (1988). Writing English Language Tests. Longman.
Hughes, A. (1989). Testing for Language Teachers. Cambridge University Press.
The complete article may be read at: https://www.academia.edu/17278653/TESTING-MIXING