Assessment of Higher Order Thinking Skills
15 Feb 2012
Higher order thinking skills include critical thinking, problem solving, decision making, and creative thinking (Lewis & Smith, 1993). They encompass the skills defined in Bloom's Taxonomy of Educational Objectives (Bloom, 1956); the hierarchy of learning capabilities propounded by Briggs and Wager (1981), Gagné (1985), and Gagné, Briggs, and Wager (1988); and a number of other less well-known conceptualizations. An example is Gubbins' Matrix of Critical Thinking Skills (as cited in Legg, 1990), which includes (1) problem solving, (2) decision making, (3) inferences (inductive and deductive reasoning), (4) divergent thinking, (5) evaluative thinking, and (6) philosophy and reasoning.
Assessment methods for measuring higher order thinking include multiple-choice items, multiple-choice items with written justification, constructed-response items, performance tests, and portfolios. These methods can be used in both classroom and statewide assessments, but for convenience the two kinds of assessments are considered separately.
Validity and Generalizability of Higher Order Thinking Skills and Dispositions
Assessing the validity of measures of higher order thinking skills is more difficult than assessing measures of lower order skills, because it is necessary to verify that higher order processes were actually used in arriving at correct answers. For example, an item (especially a multiple-choice item) may demand higher order thinking from students who have not previously encountered the problem it presents, while other students can arrive at the correct answer to the same item simply by calling on prior knowledge. Beyond those related to the influence of prior knowledge, questions concerning the generalizability of higher order skills also remain to be answered.
A comprehensive definition of validity was formulated by Messick (1995).
Validity is an overall evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of interpretations and actions based on test scores or other modes of assessment (Messick, 1989b). Validity is not a property of the test or assessment as such, but rather of the meaning of the test scores. The scores are a function not only of the items or stimulus conditions, but also of the persons responding as well as the context of the assessment. In particular, what needs to be valid is the meaning or interpretation of the scores as well as any implications for action that this meaning entails (Cronbach, 1971). The extent to which score meaning and action implications hold across persons or population groups and across settings or contexts is a persistent and perennial empirical question. This is the main reason that validity is an evolving property and validation a continuing process. (p. 5)
Norris (1989) considered two questions of importance in determining the validity of tests of critical thinking: (a) Is critical thinking generalizable? and (b) What is a critical thinking disposition? (p. 21). He also raised the question of whether critical thinking dispositions are generalizable. Students may have the skills to think critically but may not employ them in testing situations because of other factors such as lack of subject-specific knowledge or their religious or political beliefs.
Generalizability of critical thinking skills has two aspects: epistemological and psychological. Epistemological generalizability holds that there are skills, such as inductive reasoning, that apply to all subject matter contents. Critics of this point of view argue that each subject matter area has a unique epistemology and therefore its own set of critical thinking skills. Psychological generalizability presumes that epistemological generalizability exists and that skills acquired in one subject matter can be applied in others.
An important goal of education should be the production of critical thinking dispositions in students.
Critical thinkers are disposed to seek reasons, try to be well informed, use credible sources and mention them, look for alternatives, consider seriously points of view other than their own, withhold judgment when the evidence and reasons are insufficient, seek as much precision as the subject permits, among other activities. (Norris, 1989, p. 22)
Much of the evidence for generalizability of higher order thinking skills comes from psychological studies of transfer. Perkins and Salomon (1989) summarized a considerable amount of research conducted during the last 30 years that indicated that cognitive (higher order) skills are context bound. However, they pointed out that
. . . recent research shows that, when general principles of reasoning are taught together with self-monitoring practices and potential applications in varied contexts, transfer often is obtained (e.g., Nickerson, et al., 1985).
In summary, recent research and theorizing concerning transfer put the negative findings cited earlier in a different light. These findings do not imply either that people have little ability to accomplish transfer or that skill is almost entirely context bound.
Rather, the negative results reflect the fact that transfer occurs only under specific conditions, which often are not met in everyday life or laboratory experiments (Brown, Kane, & Long, in press). When the conditions are met, useful transfer occurs. (p. 22)

The work of Perkins and Salomon suggests that instruments can be constructed that are both valid for the measurement of higher order skills and sensitive to instruction.
Lohmann (1993) argued that while intelligence test scores have most often been used as predictors of educational attainment, their most important use may be as measures of educational outcomes. Many studies have shown that intelligence and educational attainment are positively correlated, and it is reasonable to conclude that increases in education cause increases in intelligence. Intelligence tests (particularly the so-called performance variety) often measure something Cattell (1963) and others call fluid ability (Gf); general academic achievement tests, on the other hand, usually measure something Cattell calls crystallized abilities (Gc) (p. 13). Lohmann pointed out that transfer of old learning to new situations is greater for fluid than for crystallized intelligence. Thus, both kinds of abilities are products of education, but fluid abilities are more closely akin to higher order thinking skills than crystallized abilities are. He cited the results of the Follow-Through Study, in which highly structured projects were more successful in producing crystallized abilities than more unstructured ones, while the reverse was true for fluid abilities. Similar results were cited for other investigations.
Haladyna (1997) and Sternberg (1998) adopted much the same view as Lohmann (1993). Haladyna characterized abilities (Gf) as being developed over long periods of time, compared to achievement (Gc), which can be developed in a shorter time frame. He defined abilities as "complex combinations of what we have called knowledge and skills, but they also include affective components like motivation and attitude" (p. 14). Examples of abilities are critical thinking, problem solving, and creativity.
Sternberg (1998) conceptualized abilities as being forms of developing expertise. He pointed out that ability tests measure achievement of content that students encountered in previous grades. He viewed abilities as educational outcomes and not as the causes of such outcomes.
Individual differences in developing expertise result in much the same way they result in most kinds of learning: from (a) rate of learning (which can be caused by amount of direct instruction received, amount of problem solving done, amount of time and effort spent in thinking about problems, and so on) and from (b) asymptote of learning (which can be caused by differences in numbers of schemas, organization of schemas, efficiency in using schemas, and so on; see Atkinson, Bower, & Crothers, 1965). (p. 14)
Peterson (1986) employed a real-life problem in each of 3 academic content domains (social sciences and humanities, social sciences and natural sciences, and social sciences and psychology) crossed with 6 generic problem-solving skills (decision making, communication, analysis, synthesis, valuing, and execution) to conduct a multitrait-multimethod study of the
structure of these 18 skill/task observations. Subjects were university students: lower level, 20; upper level, 26; and graduate level, 16. Confirmatory factor analysis was used to evaluate a model that contained a general second-order factor, 6 skill factors, and 3 subject content factors.
This model provided a moderately good fit to the data, but most of the common variance was accounted for by the general factor. A model containing only skill and subject matter factors did not provide a good fit. Peterson concluded that a general reasoning test, a vocabulary test, or a general knowledge test would provide as much information as a test of generic problem-solving skills. However, he pointed out that because all variables were measured by written responses, a general writing ability could have inflated the influence of the general problem-solving factor.
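Peterson's finding, that once a dominant general factor is in the model the specific skill and content factors add little, can be illustrated with a small simulation. The sketch below is hypothetical: the factor loadings, sample size, and the assumption of orthogonal latent factors are all assumed for illustration and are not Peterson's data. It generates scores for 18 tasks (6 skills crossed with 3 content domains) that load heavily on a general factor and only weakly on their skill and content factors, then shows that the observed intercorrelations are dominated by the general factor.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 5000                        # simulated examinees (hypothetical)
n_skills, n_contents = 6, 3     # 6 skill factors x 3 content factors, as in Peterson (1986)

# Orthogonal latent factors: one general factor plus skill and content factors.
g = rng.standard_normal(n)
skills = rng.standard_normal((n, n_skills))
contents = rng.standard_normal((n, n_contents))

# Assumed loadings: each observed task loads 0.7 on the general factor
# but only 0.3 on its skill factor and 0.3 on its content factor.
lg, ls, lc, le = 0.7, 0.3, 0.3, 0.55
obs = np.empty((n, n_skills * n_contents))
for s in range(n_skills):
    for c in range(n_contents):
        j = s * n_contents + c
        obs[:, j] = (lg * g + ls * skills[:, s] + lc * contents[:, c]
                     + le * rng.standard_normal(n))

# 18 x 18 correlation matrix of the observed task scores.
R = np.corrcoef(obs, rowvar=False)
off = R[~np.eye(R.shape[0], dtype=bool)]

# Even task pairs sharing no skill or content factor correlate about
# lg**2 / total variance; the general factor carries most of the common variance.
print(round(off.mean(), 2))
```

Under these assumed loadings, the average off-diagonal correlation is close to what the general factor alone would produce, which mirrors Peterson's conclusion that a single general reasoning test would provide about as much information as a battery of generic problem-solving measures.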
Published Measures of Higher Order Thinking Skills
Ennis (1993) listed 7 possible purposes for which published tests of critical thinking may be used:
1. Diagnosing the levels of students' critical thinking.
2. Giving students feedback about their critical thinking prowess.
3. Motivating students to be better at critical thinking.
4. Informing teachers about the success of their efforts to teach students to think critically.
5. Doing research about critical thinking instructional questions and issues.
6. Providing help in deciding whether a student should enter an educational program.
7. Providing information for holding schools accountable for the critical thinking prowess of their students.
… to be continued …