Developing A Computer-Based Assessment of Complex Problem Solving in Chem

Developing a Computer-based Assessment of Complex Problem Solving in Chem

Uploaded by

Science SHS Department

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

52 views

Developing A Computer-Based Assessment of Complex Problem Solving in Chem

Developing a Computer-based Assessment of Complex Problem Solving in Chem

Uploaded by

Science SHS Department

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 15

Scherer et al.

International Journal of STEM Education 2014, 1:2

http://www.stemeducationjournal.com/content/1/1/2

RESEARCH Open Access

Developing a computer-based assessment of

complex problem solving in Chemistry
Ronny Scherer1*, Jenny Meßinger-Koppelt2 and Rüdiger Tiemann3

Abstract
Background: Complex problem-solving competence is regarded as a key construct in science education. But due
to the necessity of using interactive and intransparent assessment procedures, appropriate measures of the construct
are rare. This paper consequently presents the development and validation of a computer-based problem-solving
environment, which can be used to assess students' performance on complex problems in Chemistry. The test consists
of four scales, namely, understanding and characterizing the problem, representing the problem, solving the problem,
and reflecting and communicating the solution. Based on this four-dimensional framework, the computer-based
assessment has been evaluated with the data of N = 395 10th grade high school students.
Results: Result showed that students' complex problem-solving competence could be modelled by four related but
empirically distinct factors with moderate to high intercorrelations. The construct showed substantial relations with fluid
intelligence and prior domain knowledge in Chemistry, indicating that construct validity and domain specificity were
given. Processes of understanding and characterizing the problem were substantially related to subsequent processes
in complex problem solving.
Conclusions: Due to the complexity of complex problem-solving processes in Chemistry, multidimensionality of the
construct could be assumed. Consequently, science educators should take into account abilities of understanding,
representing, solving the problem, and finally reflecting and communicating the solution when developing
instructional approaches and valid computer-based assessments.
Keywords: Chemistry education; Complex problem solving; Computer-based assessment; Interactivity; Validity

Background tered with the help of computer-based assessments

The assessment of competences has been shifted towards (CBAs) (Honey and Hilton 2011). Moreover, due to the
the use of computer-based procedures (Jurecka 2008). Even availability of different representation modes and item
in the field of science education, there are different ap- formats, students' conceptual understanding of scientific
proaches to use computers in order to facilitate the work on phenomena can be assessed. Furthermore, computer-
complex problems (Jonassen 2004). Van Merriënboer (2013), based assessments enable teachers and researchers to col-
one of the leading educational researchers in problem solv- lect different types of data. In addition to traditional scores
ing, stressed the importance of using computers as assess- on multiple-choice or constructed-response items, data
ment and instructional tools, as they simulate real-world on the time needed to perform a number of interactions
problems, which are ill-structured and complex in nature. and the sequence of operations are accessible (Wirth
The major advantage of computer-based tests lies in the 2008). Hence, researchers have the possibility to design
assessment of new content areas and constructs (Drasgow meaningful and motivating real-life scenarios, in which
and Chuah 2006; Wirth 2008). Furthermore, different students can solve complex and interactive problems
kinds of skills such as scientific processing and the ability (Funke 2010; Greiff et al. 2013). Additionally, the use
to design and execute scientific investigations can be fos- of CBAs is advantageous due to test economics, im-
provements in objectivity, and test reliability (Wirth
* Correspondence: ronny.scherer@cemo.uio.no 2008). But the efficiency of CBA procedures strongly
1
Centre for Educational Measurement at University of Oslo (CEMO), Faculty
of Educational Sciences, Postbox 1161 Blindern, 0318 Oslo, Norway depends on the application of common design charac-
Full list of author information is available at the end of the article teristics, which determine reliability and validity measures

© 2014 Scherer et al.; licensee Springer. This is an Open Access article distributed under the terms of the Creative Commons
Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction
in any medium, provided the original work is properly credited.
Scherer et al. International Journal of STEM Education 2014, 1:2 Page 2 of 15
http://www.stemeducationjournal.com/content/1/1/2

(Rigas et al. 2002). Also, CBAs need to be based on educa- we present a framework and an assessment which sys-
tional frameworks, taking into account different types of tematically combine cognitive psychology, science educa-
knowledge and cognitive processes (Van Merriënboer tion, and modern assessments.
2013). In science education, there is still a need for con-
ceptual models of inquiry- and problem-oriented abilities Problem solving from a psychological perspective
which could be used to design meaningful and valid as- In this study, we focus on the evaluation of complex
sessments (Abd-El-Khalick et al. 2004; Scherer 2012). problem-solving competence in the domain of Chemistry.
However, there are difficulties within the evaluation According to the PISA problem-solving framework
process. Due to a high complexity of computer-based as- (OECD 2004, 2013), this project refers to ‘problem solving
sessments, it might be difficult for test takers to find an competence’ as
optimal or correct solution (Sager et al. 2011). Further-
more, the amount of collected data could become prob- […] an individual’s capacity to engage in cognitive
lematic within the evaluation process. processing to understand and resolve problem
Understanding the cognitive processes, which are in- situations where a method of solution is not
volved in problem solving, and analyzing the structure immediately obvious. It includes the willingness to
of the problem solving construct have been a challenge engage with such situations in order to achieve one’s
in psychology and science education (Greiff et al. 2013; potential as a constructive and reflective citizen.
Kind 2013). Consequently, the present study investigates (OECD 2013, p. 122).
students' complex problem-solving competence (CPS)
by taking into account different abilities, which determine This definition underlines the term ‘competency’ by
their problem solving success. taking into account its domain-specific, contextual, and
The present study, first, proposes a theoretical model situational characteristics. In contrast to analytical prob-
of CPS in Chemistry, which formed the basis for devel- lem solving, complex problem solving (CPS) is defined
oping a computer-based assessment. Second, the design by the following characteristics: complexity and connect-
and characteristics of the assessment tool are described. ivity of system variables, temporal dynamics and system
By means of item response theory, the CBA is empiric- changes, interactivity, disclosed structure of the system
ally evaluated and tested for construct validity in a third or the problem situation (intransparency), and polytely
step (Quellmalz et al. 2012). In this regard, we check in the presence of competing goals (Funke 2010; OECD
whether or not a theoretical model, which distinguishes 2013). Obviously, traditional paper-and-pencil tests are
between four components of CPS, is supported by the not able to assess CPS due to its dynamic and interactive
data. This empirical evaluation is mainly concerned with character (Wirth and Klieme 2004).
the dimensionality of the construct. Although there have While performing a problem-solving process, different
been approaches to describe cross-curricular CPS by kinds of cognitive operations and influences of covari-
empirical means (e.g., Bühner et al. 2008; Kröner et al. ates such as prior knowledge, experience, and motivation
2005), this study incorporates domain-specific operatio- come together. For instance, due to the complex charac-
nalizations of the construct. We also address the import- ter and intransparency, problem solvers must interact
ance of taking into account the many components of with given systems in order to obtain information about
designing computer-based assessments in science (Kuo variables and their connectivity, and, finally, use these
and Wu 2013; Quellmalz et al. 2012). information to solve the problem successfully (Funke
The paper, thus, contributes to the development of a 2010). Consequently, feedback and supportive informa-
theory-driven assessment of students' complex problem- tion are needed to reach a given goal state by using
solving competences which reflects different compo- problem-solving strategies (Taasoobshirazi and Glynn
nents of the construct and provides meaningful insights 2009). These strategies mostly involve controlling for
into the determining factors and the opportunities to variables in order to obtain information on their effects
model CPS in Chemistry. Such an assessment could be on the outcome. In doing so, students build up a mental
used to provide a detailed feedback for teachers and model which represents the structure of variables and
learners on the different abilities involved in problem solv- their relations (Künsting et al. 2011). This knowledge
ing. Consequently, this approach systematically extends the can subsequently be used to achieve a goal state, repre-
domain-general problem-solving framework of problem senting a problem solution. Finally, after finding an appro-
solving within the Programme for International Student priate solution, students must evaluate and communicate
Assessment (PISA) studies towards the domain of Chem- their solutions (Kapa 2007; OECD 2004). These (meta)
istry. In light of the upcoming importance of theoretically cognitive skills are essential in order to monitor the
sound and empirically valid assessment in technology- problem-solving process, and, subsequently, publish the
rich environments (OECD 2013; Quellmalz et al. 2012), results (Scherer and Tiemann 2012). These processes
Scherer et al. International Journal of STEM Education 2014, 1:2 Page 3 of 15
http://www.stemeducationjournal.com/content/1/1/2

and cognitive requirements are essential in CPS and Problem solving in science
were, at least to some degree, studied in domain-general Contextualized and domain-specific assessment proce-
settings (Greiff et al. 2013; Sonnleitner et al. 2013). dures have gained importance in many subjects (Funke
Taken together, there are at least four problem-solving and Frensch 2007; Jonassen 2004). They are regarded as
components that can be distinguished: exploring and powerful tools to evaluate problem-solving skills and
understanding, representing and formulating, planning curriculum-related competences (Koeppen et al. 2008).
and executing, and monitoring and reflecting (OECD As some researchers discussed (Gabel and Bunce 1994;
2013). Jonassen 2004; Kind 2013), the concept of domain speci-
There have been further attempts to describe the struc- ficity does not only manifest in the effects of domain
ture of cross-curricular problem solving (Funke 2010). For knowledge on performance but also in specific problem-
instance, Kröner et al. (2005) operationalized the con- solving strategies. Scherer and Tiemann (2012) further
struct as a measure of intelligence and distinguished argued that even knowledge about strategies would be
between three components: rule identification, rule know- domain-specific. It is, thus, indicated that domain-general
ledge, and rule application. In the first step, students have and domain-specific CPS are related but distinct con-
to identify the connections between variables in order to structs (Molnár et al. 2013). This argument stressed the
acquire system knowledge. Subsequently, they apply their need for contextualized assessments (Koeppen et al. 2008)
knowledge to solve a complex problem. In this regard, it and led, for instance, to the development of simulations in
has also been investigated whether interactive problem which students could explore structure–property rela-
solving could be regarded as a component of intelligence tionships and basic concepts in Chemistry (Cartrette
(Danner et al. 2011; Funke and Frensch 2007; Kröner and Bodner 2010). These concepts focused on scientific
et al. 2005; Leutner 2002). But so far, the results on the re- inquiry and the conceptual understanding in science
lationship between the two constructs have been quite (Abd-El-Khalick et al. 2004; Flick and Lederman 2006).
contradictory. The correlations differed according to the Moreover, Taasoobshirazi and Glynn (2009) showed that
types and factors of intelligence (Leutner 2002). Danner affective and motivational constructs play an important
et al. (2011) argued that dynamic decision-making, which role in problem solving in science. It appears reasonable
is often regarded as complex problem solving, significantly that students' attitudes towards science affect their suc-
correlated with general intelligence but required further cess in domain-specific problem solving. These relation-
abilities. Therefore, they concluded that problem-solving ships are often mediated by students' goal orientations
competence was distinct from intelligence, although it (Künsting et al. 2011).
determines processes of knowledge acquisition. Accord- Besides domain specificity, different cognitive variables
ingly, psychological research on problem solving identi- affect the results of problem-solving processes (Kröner
fied, first, domain-general cognitive processes involved et al. 2005). As mentioned previously, the understanding
in problem solving, and, second, analyzed the empirical and characterization of the problem requires reasoning
distinction between intelligence and complex problem- abilities and relates to the analytical properties of problem-
solving competence. solving competences. The interpretation of system outputs
Other approaches such as the MicroDYN framework and information represented by tables, texts, and diagrams
distinguish between three types of competences which allow students to understand complex tasks. Furthermore,
are necessary to solve complex problems: model build- the ability to extract and apply information is regarded as
ing, forecasting, and information retrieval (Wüstenberg one of the key components within the PISA framework of
et al. 2012). This approach was implemented for cross- scientific literacy (Nentwig et al. 2009) and is an integral
curricular problem solving rather than domain-specific part of science education (Jones 2009). Kind (2013)
dimensions of the construct. Although the MicroDYN stressed the importance of these competences as compo-
framework captures essential cognitive abilities, it does nents of scientific reasoning. In his review, he identified
not reflect educational demands of problem solving. For different aspects and curricular demands in reasoning and
instance, processes of monitoring, reflecting, and com- problem solving situations: hypothesizing, experimenting,
municating a solution in science are essential parts of and evaluating evidence. He also argued that these pro-
the scientific problem-solving process (e.g., Bernholt et al. cesses could be regarded as domain-general, whereas dif-
2012; Klahr 2000). These components have not been ferent types of knowledge involved in scientific reasoning
taken into account explicitly in domain-general models are highly domain-specific. In this context, content know-
such as MicroDYN. Furthermore, the approaches de- ledge and epistemological knowledge about science inter-
scribed above have rarely been transferred to complex fere with solution strategies (see also Abd-El-Khalick et al.
problem situations in the domain of science, in which 2004). In contrast, domain-general CPS was often found to
prior knowledge and ‘strong’ solution strategies gain im- predict grades and school achievement in specific domains
portance (Jonassen 2004). (e.g. Wüstenberg et al. 2012). But these findings might be
Scherer et al. International Journal of STEM Education 2014, 1:2 Page 4 of 15
http://www.stemeducationjournal.com/content/1/1/2

due to the strong relationships among reasoning, domain- To sum up, there are different processes involved in
specific, and domain-general CPS (e.g. Molnár et al. 2013). scientific problem solving. These refer to the abilities
Gilbert and Treagust (2009) further argued that the role such as information retrieval, problem representation,
of adequate problem representations at a macroscopic, model building, strategic behavior, knowledge application,
microscopic, and sub-microscopic level was significant in and the phases of scientific inquiry such as planning an
chemistry education in order to foster conceptual under- experiment and evaluating the results (e.g., Bernholt et al.
standing. Furthermore, Lee (2010) and Lee et al. (2011) 2012; Klahr 2000). Based on these and the outcomes of
supported this argumentation by stressing that developing domain-general studies, which provided evidence on fun-
a mental model that represents the problem structure is damental cognitive processes independent from the do-
crucial for subsequent solution steps. Consequently, one main, we propose a model of complex problem solving
can argue that if students' problem-solving strategies are consisting of four factors (based on Koppelt 2011, OECD
built upon an adequate representation of their conceptual 2013 and Scherer 2012): First, students have to under-
knowledge in a specific domain, they are more likely to stand, characterize, and simplify the problem situation
develop expertise (Taasoobshirazi and Glynn 2009). (PUC) in order to build an adequate mental model which
As mentioned above, the transfer of problem-solving is represented in the second step (PR). Based on this
competences into domain-specific areas gains importance, model, strategies and methods for solving the problem are
especially in science education because the underlying developed (PS). Finally, students evaluate and communi-
processes and skills are closely related to scientific inquiry cate the problem solution (SRC). We note that this model
(Friege and Lind 2006; Klahr 2000; Künsting et al. 2011; is cyclic and that students could repeat the different steps.
Schmidt-Weigand et al. 2009). Especially the features of A detailed summary of these factors is given in Table 1.
interactivity of experimental systems are powerful tools
for the assessment of scientific process skills (Jurecka The present study
2008; Rutten et al. 2011; Kim and Hannafin 2011; Wu and The purpose of this study is twofold: First, we de-
Pedersen 2011). Together with Jonassen (2004), we argue velop a computer-based problem-solving environment
that problem-solving skills are domain-specific and em- with interactive features according to the proposed
bedded in specific contexts. Van Merriënboer (2013) sup- model of complex problem solving in Chemistry. Sec-
ported this argumentation and distinguished between ond, we evaluate this tool by checking the fit between
‘weak’ and ‘strong’ problem-solving strategies. The latter the empirical data and the theoretical framework. As
refer to strategies and sequences of actions which only a first attempt, we analyze whether or not the differenti-
occur in specific domains or contexts (Gabel and Bunce ation of four problem-solving steps leads to adequate
1994). However, until now, the issue of domain specificity measurement models. We analyze whether the theoretic-
has not yet been clarified for solving complex and ill- ally implied structure of the construct could be repre-
defined problems (Scherer 2012). sented by the data and, thus, address construct validity

Table 1 Cognitive dimensions of problem-solving processes

Cognitive dimension Description
Understanding and characterizing the problem (PUC) -Understanding the situation in which the problem occurs
-Identifying relevant information
-Extracting information from scientific texts, tables, and/or figures
-Developing scientific hypotheses
Representing the problem (PR) -Creating adequate representations of the problem situation (e.g., mind maps, structural
representations of chemical substances)
-Shifting between different types of problem representations (graphical, verbal, symbolic,
and tabular information)
Solving the problem (PS) -Performing systematic and strategic methodologies in order to achieve a goal state (e.g.,
by controlling variables in a scientific experiment)
-Planning and executing scientific investigations and experiments; testing hypotheses
Reflecting and communicating the solution (SRC) -Evaluating and reflecting problem solutions and scientific evidence
-Finding alternative solutions
-Communicating solutions and addressing different audiences (e.g., the scientific
community, the public)
-Distinguishing between scientific and everyday language
Scherer et al. International Journal of STEM Education 2014, 1:2 Page 5 of 15
http://www.stemeducationjournal.com/content/1/1/2

(Messick 1995). Finally, relationships among CPS and re- exploration phases of 10 min each, students, first, had to
lated constructs such as intelligence and domain know- identify unknown chemicals by using an analysis and a
ledge are assessed in order to obtain evidence on external synthesis machine (for a discussion on the importance of
validity. The newly developed computer-based assess- exploration phases in CPS assessments, see Leutner et al.
ment is used to evaluate the theoretical framework of 2005). Second, students had to identify and synthesize a
CPS in Chemistry. Our modelling approach provides a flavoring substance (in this case, methyl butyrate) fulfilling
combination between the assessment of cognitive abil- different criteria. In order to design an attractive and mo-
ities of problem solving and educational demands in tivating problem-solving environment, the task was em-
Chemistry lessons. It therefore contributes to the areas bedded into a contextual framework.
of educational assessment in science and the develop- Within the computer-based assessment, there were dif-
ment of frameworks of higher-order thinking skills. ferent machines representing complex systems. Students
had to interact with these systems during the problem-
Methods solving process in order to obtain information about their
Sample and procedure functionalities. This design feature requires the acquisition
Participants were 10th grade students attending the German of system knowledge, which is crucial for solving complex
Gymnasium in the federal state of Berlin (N = 420; 52.4% fe- problems (Goode and Beckmann 2010; Sonnleitner et al.
male). These students worked on a computer-based assess- 2013). As the relationships between dependent and inde-
ment of complex problem-solving competence. Additionally, pendent variables were disclosed at the beginning (Funke
paper-and-pencil tests were administered, which assessed co- 2010), the systems allowed different adjustments to identify
variates such as prior domain knowledge in Chemistry, fluid the number of correlated variables and their complexity.
intelligence (Gf), and general interest. The pupils' mean age In order to utilize system interactivity, the identifica-
was 15.8 years (SD = 0.7 years) ranging from 14 to 18 years. tion of unknown chemical substances had to be accom-
Some of the students reported that their mother language plished with the help of analytical spectra. This kind of
was not German (23.8%). However, for the present data, supportive and indirect feedback was necessary in order
Koppelt (2011) has shown that this did not affect the results to foster students' interactions with the systems. Never-
of the assessment. The assessment procedure was divided theless, system interactivity played an important role in
into two sessions of 90 min each, which were conducted on assessing CPS and in simulating domain-specific prob-
two adjacent days, leading to 395 complete data sets. lems by computational means (Jonassen 2004; Scherer
and Tiemann 2012). Due to the administration of different
Measures subtasks, which referred to the problem-solving steps, stu-
Dependent variable: complex problem-solving competence dents had to focus on the main task in the presence of
The following section provides a detailed description of competitive goals. This program feature referred to goal
the CBA's design characteristics and shows examples of orientation and polytely (Blech and Funke 2010).
items and indicators which were used to assess the four Moreover, students had to overcome some difficulties
dimensions of CPS. within the environment: The name of the chemical sub-
stance was unknown, and students had to suggest a
Development of a computer-based assessment of CPS plausible synthesis. Furthermore, the reaction yield had
The computer-based assessment was implemented with to be optimized. The problem-solving task was, thus,
the easy-to-use developmental environment ChemLab- complex and constructed in a way that students with low
Builder (Meßinger 2010). In this environment, one can prior knowledge would also be able to solve the task suc-
define various laboratory tools such as chemical sub- cessfully (Scherer and Tiemann 2012).
stances, machines for syntheses or analyses, and differ- To sum up, the main characteristics of complex
ent forms of information materials. The tool requires problem-solving environments were taken into account
the operationalization of inputs, outputs, and the data within the test development procedure (see, for example,
which is shown in the resulting log files. Furthermore, Funke 2010). However, due to our focus on a fixed chem-
the degree of interactivity can be adapted according to ical system with defined variables, temporal dynamics
the number of variables and their relationships. have not been implemented. Accordingly, our environ-
In order to assess complex problem-solving compe- ment was based on curricular demands of teaching the
tence, different tasks were implemented which could be concepts of chemical equilibria and the chemistry of esters
assigned to one of the four problem-solving steps pre- in grade 10, which did not refer to time-varying settings.
sented by Koppelt (2011) (see ‘Problem solving in sci-
ence’ section). In these tasks, students were able to solve Measurement of ‘understanding and characterizing the
single items independently from their performance on problem’ The evaluation of students' performance on
previous ones. After completing two evaluation-free ‘understanding and characterizing the problem’ included
Scherer et al. International Journal of STEM Education 2014, 1:2 Page 6 of 15
http://www.stemeducationjournal.com/content/1/1/2

the analysis of unknown chemical substances. Students' syntheses. As students' activities in the laboratory as well
responses were evaluated against a direct solution (vari- as system inputs and outputs were recorded, they were
ables PUC01 to PUC07 in Table 2). given the opportunity to monitor or recall settings and
To identify the substances, students had to consider the outputs during the tasks. We consequently coded the
information given in the lab books (‘spectra’ and ‘molar PS02 items to zero if an analysis or synthesis was applied
mass’), extract relevant properties, and, finally, relate them more than twice. This feature was in line with the opera-
to each other in order to suggest the substance's name. tionalization of systematic problem-solving behavior
For example, a chemical substance showed a molar mass proposed by Künsting et al. (2011). Moreover, if students
of 46 g/mol. This substance could either be ethanol or for- performed syntheses with substances, which were not
mic acid. Only by the determination of the substance class identified previously, the entries of ‘syntheses (student's
with the help of the spectral output and information view)’ appeared with a question mark, and the variable
depicted in a table, students were able to suggest the cor- PS04 was coded to zero (see Figure 2).
rect name. Figure 1 shows a screenshot of this subtask. Taken together, the PS items required different cognitive
The items of PUC varied in difficulty according to the processes such as tactical and goal-oriented actions, the
amount of information needed for the identification pro- achievement of required conditions, and the systematic
cess (see Table 2). The evaluation of the students' answers variation of factors (Scherer and Tiemann 2012). Further
was applied by using computer-based means. Each of the aspects of measuring PS are reported in Koppelt (2011).
seven items (i.e., unknown substances) was dichotomously
coded, which resulted in a maximum score of seven.
Measurement of ‘reflecting and communicating the
solution’ In order to operationalize this step, we developed
Measurement of ‘representing the problem’ The com-
static tasks, which assessed the various abilities of reflecting
puter environment enabled us to develop interactive and
and communicating the solution (SRC). First, the students
static tasks. In order to provide measures of representing
had to answer multiple-choice items, in which they had to
the problem (PR), we implemented static items with
recall their solution strategies. Second, they had to choose
multiple-choice options, in which students had to complete
among given representations of the problem solution (e.g.,
concept maps or tables with different formats of structural
a journal article) and decide whether they were appropriate
representations of chemical substances. The resulting items
for different audiences (for further information on this
were polytomously coded (for further information, refer to
approach, see Bernholt et al. 2012). Again, the items were
Koppelt 2011 and Scherer 2012).
dichotomously and polytomously coded.
Measurement of ‘solving the problem’ To analyze the
performance on ‘solving the problem (PS)’, the achievement Log file data For each student, a single log file has been
of a goal state (PS01a, PS01b) and students' general obtained, which contained the following information:
problem-solving behavior (PS02a to PS04) were examined. responses on multiple-choice, multiple-select, and con-
In this operationalization, the goal state referred to an opti- structed response items (provided as numbers or nominal
mal solution, which could be achieved by different types of entries), time needed to solve the task, number of actions,
system settings. Subsequently, the students' solution was a list of chemical substances used in the analyses and syn-
directly compared with an optimal goal state. Table 3 con- theses, the sequence of action within the computer-based
tains examples of variables measuring PS, which were eval- assessment, and the sequence of analyses and syntheses.
uated by analyzing the students' log files. To filter these raw data, we set up variables that were
Within this record of problem-solving behavior, a sep- assigned to measure the abovementioned factors of CPS.
arate table was given that contained a list of analyses In doing so, the entries on constructed response items
and syntheses. With the help of these entries, it was pos- and sequences of actions have been coded according to
sible to determine the number of replicated analyses and their correctness and efficiency. For further details, refer

Table 2 Examples of variables measuring the competence of ‘understanding and characterizing the problem (PUC)’
Variable Analysis of Description Coding
PUC01-02 Methanol and acetic acid Analyzing an unknown substance by taking into account system outputs (spectra, 0, 1 each
molar mass) and additional information given by tables
PUC03-06 Ethanol, pentanol, formic acid, Analyzing an unknown substance by taking into account system outputs (spectra, 0, 1 each
and butyric acid molar mass) and further information given by tables
Different types of information have to be combined (tabular and graphical sources)
PUC07 Sulfuric acid Analyzing an unknown substance with information that is explicitly given (molar mass) 0, 1
Scherer et al. International Journal of STEM Education 2014, 1:2 Page 7 of 15
http://www.stemeducationjournal.com/content/1/1/2

Figure 1 Screenshot of the CBA showing an item example for the dimension of PUC. PUC, understanding and characterizing the problem.

to Koppelt (2011), Scherer (2012), and Scherer and interrater reliability and attained statistically significant
Tiemann (2012). values (p < 0.05) ranging from 0.85 to 1.00 across the four
steps. Values of 1.00 occurred due to the fact that selected
Quality of the coding scheme In order to check the tasks have been analyzed by computer-based means only.
coding scheme for explicitness and reliability, 20% of the Consequently, there was no need for further interpretation
log files were coded by two independent raters. By these of students' responses in these tasks.
means, objectivity of the coding procedure has been
ensured. We determined Cohen's κ as a measure for Covariates of complex problem-solving competence
In order to investigate the external validity of the CPS
Table 3 Examples of variables measuring the competence assessment, we administered tests on related constructs
of ‘solving the problem (PS)’ of CPS. This approach is common in evaluating discrimin-
Variable Description Coding ant validity and provides information on the uniqueness
PS01a Choosing three correct substances in order 0, 1, 2, 3 of constructs and the quality of the assessment tools. In
to synthesize the required substance this context, validity is referred to as a test characteristic,
PS01b Choosing the correct system settings of 0, 1, 2 supporting the interpretation of the relationships between
concentration and distillation in order to
maximize the reaction yield
test scores and empirical evidence (Messick 1995). Reli-
abilities and descriptive statistics of the tests on covariates
PS02a Replicating analyses only once 0, 1
are reported in the Table 4.
PS02b Replicating syntheses only once 0, 1 First, a domain-specific prior knowledge test was ad-
PS03 Applying optimization steps of the reaction 0, 1 ministered, which comprised three scales in different
yield for meaningful reactions only
content areas: the chemistry of esters, structure–property
PS04 Synthesizing substances with previously 0, 1 relationships, and the nature of chemical equilibria.
identified and analyzed precursors only
The test consisted of 22 multiple-choice items, which
Scherer et al. International Journal of STEM Education 2014, 1:2 Page 8 of 15
http://www.stemeducationjournal.com/content/1/1/2

Figure 2 Lab journal taken from a student's log file.

were dichotomously scored. The resulting sum score and summed up to a final score for each scale (0 = I totally
was regarded as a measure of students' prior domain disagree to 3 = I totally agree).
knowledge.
Second, the students had to work on a test of fluid Data analyses
intelligence, which contained 36 items. These referred to Application of probabilistic measurement models
a figural, a numerical, and a verbal scale of 12 items To estimate students' problem-solving abilities, raw
each. The students had to work on figural and verbal scores were scaled with the help of item response theory
analogies (e.g., ‘apples:juice = potatoes:?’) as well as math- (IRT) models, which are implemented in the software
ematical reasoning problems (for details, see Schroeders package ACER ConQuest 2.0 (Wu et al. 2007). These
et al. 2010). All items were dichotomously scored and models are advantageous in modelling multidimensional
comprised to a general factor (Gf ). competences because they allow direct comparisons be-
Finally, we checked the students' general interest by tween competing models and show the relationship be-
administering two scales of the AIST test (German: All- tween item difficulties and person ability parameters.
gemeiner Interessens-Struktur-Test; Bergmann and Eder They have become prominent in test development and
2005). We chose the realistic (AIST-r) and the investiga- science education (Neumann et al. 2011). We chose the
tive (AIST-i) scales for further analyses, as they reflect IRT analysis as a state-of-the-art approach in modelling
facets of scientific interest with acceptable psychometric complex and multidimensional constructs (Bond and
properties. The test on general interest contained 20 state- Fox 2007). The IRT models are appropriate in modelling
ments, which were ranked on a four-point Likert scale categorical data without the assumption of normally
Table 4 Descriptive statistics and correlations of tests on covariates
Variables NItems M SD Min Max α Correlations
1. 2. 3.
1. Domain-specific prior knowledge 22 6.53 3.18 0 18 0.73 1.00
2. Fluid intelligence (Gf) 36 12.03 5.53 0 30 0.84 0.30** 1.00
3. General interest (AIST) 20 43.60 16.37 0 74 0.94 0.02 ns −0.15*** 1.00
Correlations were based on the sum scores. NItems, number of items; Min, Max, achieved minimum/maximum; ns, statistically insignificant (p > 0.05); α, Cronbach's
alpha; Gf, general factor; AIST, Allgemeine Interessens-Struktur-Test. **p < 0.001; ***p < 0.01.
Scherer et al. International Journal of STEM Education 2014, 1:2 Page 9 of 15
http://www.stemeducationjournal.com/content/1/1/2

distributed variables and allow for non-linear relation- Markov chain Monte Carlo algorithm for multidimen-
ships between latent variables and items (Wirth and sional IRT models has been applied (Wu et al. 2007).
Edwards 2007). Following the modelling approaches of To evaluate model fit, common information criteria
problem solving and inquiry-based abilities proposed by such as the Akaike's Information Criterion (AIC) and the
many science educators and educational psychologist Bayesian Information Criterion (BIC) were taken into ac-
(e.g., Kuo and Wu 2013; Neumann et al. 2011; Scherer count. These indexes are given as follows:
2012; Quellmalz et al. 2012; Wüstenberg et al. 2012),
these analyses appeared appropriate in applying a con- AIC ¼ dev þ 2np
firmatory analysis of dimensionality, especially because BIC ¼ dev þ logðN Þ np ;
of the categorical nature of students' scores (Bond and
Fox 2007). Since items were dichotomously and polyto- where ‘dev’ represents the final deviance of the model, N
mously scored, the partial credit model has been used in the sample size, and np the number of parameters for
this study. This model accounts for thresholds between model estimation. Models with smaller values of AIC
categories and can be generalized to multidimensional and BIC are empirically preferred. In order to compare
analogues, interpreting person abilities and item difficul- competing models, the information criteria and a χ2 like-
ties as multidimensional vectors. Additionally, the struc- lihood ratio test of final deviances were used.
ture of the variance-covariance matrix was taken into Referring to our theoretical assumption of four problem-
account. We further note that item dependencies within solving factors, we only tested whether or not the four-
the students' responses were neglectable (for further infor- dimensional model outperformed the model with one
mation, refer to Scherer 2012). factor. It would have been possible to test other models of
The IRT scaling was applied in two steps: The first CPS with fewer dimensions, but we focused on the valid-
step included an analysis based on all administered items ation of the proposed model with four factors because
with the aim of identifying items which did not fit the there was no conceptual and empirical evidence for com-
IRT model. Uni- and four-dimensional models were used bining or differentiating these steps (Koppelt 2011; OECD
to investigate the latent structure of the construct and reli- 2013; Scherer 2012).
abilities of each factor (see Figure 3).
Subsequently, only items with a moderate discrimination, Structural equation modelling
a weighted mean square value (wMNSQ) between 0.75 In order to analyze the relationships between CPS and
and 1.33 and an absolute t value lower than 1.96, were related constructs, we established structural equation
included in the second step (Adams and Khoo 1996). models, which were analyzed in Mplus 6.0 (Muthén and
Again, the uni- and four-dimensional models were applied Muthén 2010). In these analyses, the person parameters
to the data. The resulting person and item parameters were of the IRT scaling procedure have been used as indica-
used to obtain descriptive statistics. In this procedure, the tors of CPS. Within the estimation process, missing

Figure 3 Two competing model structures. (a) Single-factor model and (b) faceted model with four correlated factors representing the
problem-solving steps. Letters l, m, n, and o denote the numbers of items in each of the four steps.
Scherer et al. International Journal of STEM Education 2014, 1:2 Page 10 of 15
http://www.stemeducationjournal.com/content/1/1/2

values were handled by applying the full information persons variance (EAP/PV) reliability of 0.83. Further-
maximum likelihood procedure (FIML) (Enders 2010). more, the internal consistency, which is based on assump-
According to Little's MCAR test, the data were likely to tions of classical test theory, was acceptable for this model
follow the missing completely at random mechanism (χ2 (α = 0.79). However, these data did not provide informa-
(309) = 335.16, p = 0.15), which legitimized the use of tion on the measurement accuracy for each problem-
FIML. In this modelling approach, CPS served as the solving step. Therefore, we conducted an IRT analysis by
outcome and covariates as predictors. We have finally establishing a four-dimensional partial credit model and
chosen structural equation modelling (SEM) to obtain a checked for EAP/PV reliability of these four scales. Table 5
quite parsimonious model that reflects the relations contains the scaling outcomes of this analysis. We note
among CPS and covariates. In light of the relatively small that we constrained the means of item parameters of each
sample size, alternative approaches such as IRT modelling scale to zero in order to identify the scale of person
with latent regression could have yielded more biased abilities (Bond and Fox 2007). In this context, students
estimates. showed lowest performance in PUC and PS, whereas they
performed better in representing the problem and com-
Results municating the solution. By and large, the ability distribu-
In this section, we address the result in relation to our tions were broad.
research goals. First, the results on the IRT scaling are The PUC and PS scales showed acceptable reliabilities
presented. Second, the relationships between CPS and above 0.70, whereas the PR and SRC scales provided
related constructs are described, obtaining information reasonable values of 0.65. However, we argue that these
on the external validity of the CPS test. values were substantial in order to assess a quite complex
construct which is composed of further factors (Brunner
IRT scaling outcomes and Süß 2005; Yang and Green 2011).
Item selection and estimation of reliability
One of the prerequisites of IRT analyses is that the cat- Structure of CPS
egorical items are locally independent from each other By establishing uni- and four-dimensional models of CPS,
(Bond and Fox 2007). Therefore, we conceptually checked we evaluated the differences in model fit criteria (Table 6).
whether or not the solutions of two adjacent tasks affected A χ2 likelihood ratio test of final deviances (dev) was ap-
each other. After this evaluation process, 32 out of 39 in- plied to test for significant differences between the models.
dependent items resulted, which were used for further Again, the faceted model with four correlated dimensions
analyses. outperformed the single-factor approach, as the difference
Further results on unidimensional IRT scaling of the in the final deviances was statistically significant (Δdev =
32 items suggested that only 1 item of the PUC step did 595.17, Δdf = 9, p < 0.001).
not meet the item fit criteria (wMNSQ value above 1.33; Information criteria supported this finding: The AIC
Bond and Fox 2007) and was therefore excluded. The value of the unidimensional model was greater than the
exclusion of this item led to a significant improvement AIC of the four-dimensional model (AIC1dim > AIC4dim).
in final deviance and favored the IRT model with 31 items Exactly the same relation was found for the BIC, which
(model with 32 items: final deviance dev = 18,085.76, np = took into account the sample size: BIC1dim > BIC4dim.
56; model with 31 items: dev = 17,344.36, np = 53; model Both indices favored the CPS model with four separable
comparison: Δdev = 741.1, Δnp = 3, p < 0.001). Since the steps. Finally, this model has been accepted.
remaining items showed a sufficient fit (wMNSQ values Furthermore, the latent correlations between the four
between 0.79 and 1.26, absolute t values below 1.96), we problem-solving factors were statistically significant and
accepted the model with 31 items. In item response theory ranged between low and moderate values (Table 7). The
modelling, one assumption refers to the local independ- strongest relationship has been found between PUC and
ence of items (Bond and Fox 2007). Major violations of SRC, followed by PS and SRC. The lowest value oc-
this assumption could lead to biased estimates of item curred for the dimensions of PR and PS.
difficulties, thresholds, and further model parameters
(Wainer et al. 2007). In the present study, we addressed Relationships among CPS and covariates
this issue in two ways: First, we designed items that did In order to analyze the relationships between CPS and
not necessarily require correct responses from previous related constructs, we established measurement models
items (Koppelt 2011). Second, Scherer (2012, 2014) quan- within a structural equation framework with CPS as a la-
tified item dependencies and showed that they were tent variable, measured by the four scales PUC, PR, PS,
neglectable for the present assessment. and SRC. We used person parameters as indicators of stu-
The application of the unidimensional partial credit dents' performance on these scales. By introducing fluid
model revealed a sufficient expected a posteriori over intelligence, domain-specific prior knowledge, and general
Scherer et al. International Journal of STEM Education 2014, 1:2 Page 11 of 15
http://www.stemeducationjournal.com/content/1/1/2

Table 5 Descriptive statistics of item and person parameters

Descriptive statistics Item parameters Person parameters
PUC PR PS SRC SUM PUC PR PS SRC SUM
M 0.00 0.00 0.00 0.00 0.00 −0.45 0.12 −0.47 0.60 −0.04
SD 0.77 1.15 0.89 1.14 0.87 1.11 1.51 1.70 0.95 0.72
Min −0.83 −1.49 −1.37 −1.91 −2.56 −3.17 −3.81 −3.70 −2.14 −2.01
Max 1.14 2.32 1.51 2.77 1.56 3.30 4.01 2.52 3.96 1.89
EAP/PV reliability 0.71 0.65 0.76 0.65 0.83
NItems 7 7 11 6 31
SUM, overall problem-solving performance; EAP/PV, expected a posteriori over persons; NItems, number of items. In order to identify the scale, the means of the
item parameters were constrained to zero.

interest, we checked for their effects on CPS. This ap- Discussion

proach has the major advantage of correcting from meas- Test development and the structure of complex
urement error (Rigas et al. 2002). In this test, fluid problem solving
intelligence was measured by three scales which are based The present study focused on the development of a
on the manifest raw scores. Finally, we specified a regres- computer-based assessment of CPS, which was based on
sion model in which CPS ability was the criterion, fluid a theoretical framework. In this regard, a coding scheme
intelligence, prior knowledge, and general interest (AIST) has been proposed which referred to the four problem-
were the predictors (Figure 4). solving components and enabled us to score the data of
In order to check whether or not this model repre- students' log files. Moreover, the evaluation of these log
sented the data, we used different fit indexes such as files was substantially objective and resulted in a reliable
the χ2 value, the comparative fit index (CFI), the root IRT scale. However, as students had to cope with differ-
mean square error of approximation (RMSEA), and the ent virtual devices and tasks, the assessment procedure
standardized root mean square residual (SRMR). The was quite complex. Therefore, the psychometric proper-
models show a reasonable fit in the case of a CFI above ties could be affected by interactions between different
0.90, an RMSEA below 0.08, and a SRMR below 0.09 assessment modes, item formats, or biases deriving from
(Hu and Bentler 1999). The resulting model showed a missing data (Bond and Fox 2007; Enders 2010). In future
reasonable goodness of fit (χ2(49, N = 395) = 108.91, p < research, these problems need further attention and might
0.001, χ2/df = 2.22; CFI = 0.93, RMSEA = 0.06, SRMR = be modelled explicitly (Scherer 2012). Also, due to the
0.05) and was thus accepted. In this model, only prior complex operationalization of the problem-solving steps,
domain knowledge and Gf showed substantial regres- there is a need for in-depth analyses of their structure. It
sion weights, whereas interest (AIST) did not signifi- should be investigated whether different competences are
cantly predict CPS. In sum, 18% of variance in CPS subsumed by each step or the steps are strictly unidi-
could be explained, whereby prior knowledge in Chem- mensional. A further differentiation in the model struc-
istry showed larger effects than Gf. Interestingly, fluid ture might be beneficial (Kröner et al. 2005; Scherer
intelligence and interest were negatively associated, 2012; Sonnleitner et al. 2013).
meaning that students with higher values of fluid The computer-based assessment was implemented for
intelligence showed lower interest in realistic and inves- one content area of the German National Curriculum
tigative actions. for Chemistry, namely, esters and organic compounds,
and was evaluated for a sample of N = 395 students.
Therefore, it is of importance to validate our CPS model
with further content areas in order to check for content
Table 6 Final deviances and information criteria of the
specificity by empirical means. Further research should
uni- and the four-dimensional partial credit model
also focus on the transfer of the assessment procedure
Model Unidimensional Four-dimensional
model model to different age groups with a much greater sample size.
Final deviance (dev) 17,344.36 16,749.19
In our study, we also focused on the evaluation of
construct validity. In this regard, we first analyzed the
Number of parameters (np) 53 62
dimensionality of CPS by referring to two different as-
AIC 17,450.36 16,873.19 sumptions: (a) CPS as a single factor (unidimensionality)
BIC 17,481.98 16,910.18 and (b) CPS as a construct which comprises four separ-
AIC, Akaike's Information Criterion; BIC, Bayesian Information Criterion. able factors (multidimensionality). In order to address
Scherer et al. International Journal of STEM Education 2014, 1:2 Page 12 of 15
http://www.stemeducationjournal.com/content/1/1/2

Table 7 Latent correlations among the four problem- structure of a problem are strongly associated with pro-
solving components cesses of reflecting and evaluating a solution. This rela-
1. 2. 3. 4. tionship appears reasonable, as students need to recall
1. PUC 1.00 0.55*** 0.57*** 0.78*** criteria of an appropriate solution and activate their
2. PR 1.00 0.20*** 0.43*** knowledge about the problem (Jones 2009). Also, solving
the problem and evaluating a solution showed a moder-
3. PS 1.00 0.60***
ate correlation. Again, this finding appears reasonable,
4. SRC 1.00
as evaluating a solution against given criteria requires
***p < 0.001.
that a solution has been generated previously. In terms
of metacognition, the processes of monitoring and elab-
this aspect, model fit criteria of partial credit models oration are related to PS and SRC (Kapa 2007; Scherer
were evaluated and compared. It was found that the and Tiemann 2012). Unexpectedly, the dimensions of PR
four-dimensional model outperformed the model with a and PS showed the lowest correlation. This finding might
single factor. Therefore, the conclusion is eminently rea- be due to the design of items in the computer-based as-
sonable that CPS can be regarded as a competence con- sessment. Items referring to PR required students to, first,
sisting of multiple factors, which represent the four shift between different levels of representation (e.g., trans-
problem-solving steps (Koppelt 2011; Scherer 2012). Al- forming a structural formula into a sum formula) and, sec-
though the multidimensional framework has been sup- ond, build representations of the problem structure (e.g.,
ported empirically, we argue that other frameworks of by using concept maps). As these items do not necessarily
CPS could also work better than the unidimensional ap- involve aspects of systematicity or the strategy of control-
proach. For instance, Scherer (2014) was able to show ling variables, their direct relation to solution processes
that two- and three-dimensional models of CPS, which was comparably weak. However, further attention on this
were based on the ‘problem solving as dual search’ as- relationship is needed, as current research on domain-
sumption (Klahr 2000), significantly outperformed the general CPS identified the two processes as strongly asso-
unidimensional model. However, the four-dimensional ciated (e.g., Wüstenberg et al. 2012).
model was superior to less dimensional approaches. Fur- Furthermore, it can be concluded that the process of
thermore, Koppelt (2011) clarified that from a theoret- understanding and characterizing the problem is strongly
ical and conceptual perspective, a further differentiation associated with subsequent problem-solving steps. Jones
of the problem-solving factors would lead to unreliable (2009), Goode and Beckmann (2010), and Greiff et al.
fits of items to these factors. To this extent, we regard (2013) argued that building an adequate mental model of
our model as appropriate for describing the structure of the complex system by understanding the relationships
CPS in domain-specific settings. among variables is crucial for solving the complex prob-
Moreover, the relationships between the four latent lem. Hence, the empirical data obtained from the present
factors indicated their empirical distinction. Interest- study supported this argumentation. Although low to
ingly, the steps of PUC and SRC showed the highest cor- moderate correlations were found, the structure of CPS
relation, meaning that processes of understanding the could confound with a general factor, which underlies the

Figure 4 Regression model with correlated predictors of CPS. equi., equilibrium; real, realistic AIST scale; invest, investigative AIST scale;
numeric, numerical scale of fluid intelligence. ns, statistically insignificant (p > 0.05). **p < 0.01, ***p < 0.001.
Scherer et al. International Journal of STEM Education 2014, 1:2 Page 13 of 15
http://www.stemeducationjournal.com/content/1/1/2

four latent dimensions. Therefore, further analyses are ne- Chemistry, underlining the importance of knowledge
cessary in order to validate the higher-order structure of about the system and Chemistry as a domain (e.g.,
our model (Scherer 2012; Sonnleitner et al. 2013). Jonassen 2004; Koppelt 2011). In this model, the effect
of fluid intelligence was strong, as expected and pro-
External validation of the CPS assessment posed by previous research (e.g., Funke 2010). Again,
In our study, we were able to replicate the findings on general interest, as measured by an investigative and a
the relationship between intelligence and complex prob- realistic scale, did not show significant regression coeffi-
lem solving by obtaining a latent correlation of ρ = 0.57 cients on CPS, indicating that motivational variables do
which was comparable to the findings obtained by Bühner not necessarily play an important role in solving com-
(2008) and Kröner et al. (2005). Although Rigas et al. plex problems in Chemistry. This is in contrast to the
(2002) found partial correlations between fluid intelligence argumentation of science educators such as Taasoob-
and CPS performance between 0.38 and 0.51, they argued shirazi and Glynn (2009) who proposed strong effects of
that there was a unique competence which accounted for students' attitudes towards science in problem solving.
the specificity of computer-based assessments. Further- However, we argue that previous results on this relation-
more, they concluded that general CPS measures would ship were mainly focused on analytical and static prob-
become more similar to intelligence tests if researchers lems, whereas our study used complex and interactive
changed the characteristics of CBAs (Funke 2010). We, problems with different task characteristics (Wirth and
finally, argue that intelligence has a moderate effect on Klieme 2004). In computer-based scenarios, it appears
CPS performance, but both constructs could be sepa- more likely that students already have a certain level of
rated. This result is in line with research conducted by interest which subsequently leads to a weaker relationship
Danner et al. (2011) and Scherer (2012). However, dif- with performance (Jonassen 2004). Again, further research
ferent forms of intelligence could be taken into account on the different effects of interest and motivational vari-
in future research. ables on CPS is necessary.
Moreover, we argue that students' complex problem- Another issue that needs to be addressed in future re-
solving ability was related to their prior domain know- search is that students' personal background (e.g., the
ledge. This result underpins the significance of knowledge mother language) could affect the results of the present
acquisition by interacting with a given system (Goode and assessment. As discussed by Scherer (2012), measure-
Beckmann 2010; Sonnleitner et al. 2013). Although the as- ment invariance across different subgroups of students
sessments were designed in such a way that students with might be compromised by differential item functioning.
low prior knowledge could successfully solve the problem, It would consequently be desirable to investigate these
prior knowledge on the chemical concepts, implemented effects for a larger and more representative sample of
in the CBA, was beneficial (supporting Friege and Lind German students.
2006; Hambrick 2005; Schmidt-Weigand et al. 2009). To
some degree, this finding indicates the domain depend- Conclusions
ence of CPS. To sum up, our findings suggested that As a conclusion, our model of complex problem solving
intelligence and prior knowledge were determining factors with four factors represents a theoretical framework
of CPS. The effect of prior domain knowledge is often which describes the complex structure of CPS. We
underestimated in problem solving (Jonassen 2004) but exemplarily showed how this framework could be trans-
revealed moderate relationships in our study. ferred to specific tasks and item responses which subse-
As considered, the analysis of the relationships between quently led to appropriate measurement models (Kuo
CPS and related constructs was conducted by using per- and Wu 2013). The underlying construct map served as
son parameters which resulted from IRT scaling. By estab- a guideline for developing the computer-based assess-
lishing latent variables, the resulting correlations were ments (Kuo and Wu 2013; Pellegrino 2012). CPS could,
corrected from measurement error. These analyses were thus, be assessed by taking into account all four steps in
considered as adequate for the investigation of discrimin- order to investigate students' strengths and weaknesses
ant validity (Kuo and Wu 2013). Furthermore, model fit within the problem-solving process. Consequently, we
indexes were obtained, which provided additional infor- argue that differentiating into the factors of CPS yields
mation on how well the data represented the proposed more diagnostic information than a unidimensional ap-
theoretical framework. But due to high and statistically proach. Finally, computer-based assessments are power-
significant correlations among CPS, intelligence, and prior ful measurement tools, but researchers must keep an
domain knowledge, further models were established in eye on psychometric properties within the process of
order to control for the relationships among the covariates test development in order to establish valid and mean-
of CPS. The resulting regression model revealed that do- ingful assessments (Quellmalz et al. 2012; Wüstenberg
main knowledge was the strongest predictor of CPS in et al. 2012).
Scherer et al. International Journal of STEM Education 2014, 1:2 Page 14 of 15
http://www.stemeducationjournal.com/content/1/1/2

The results on the relationships among covariates and Author details

1
CPS in Chemistry provided evidence on construct valid- Centre for Educational Measurement at University of Oslo (CEMO), Faculty
of Educational Sciences, Postbox 1161 Blindern, 0318 Oslo, Norway. 2Joachim
ity (Messick 1995). Based on our framework, which sys- Herz Stiftung, Langenhorner Chaussee 384, 22419 Hamburg, Germany.
tematically combined approaches of scientific inquiry 3
Department of Chemistry, Humboldt-Universität zu Berlin, Brook-Taylor-Str. 2,
and psychological theories of problem solving, the theor- 12489 Berlin, Germany.

etical assumptions on the structure of CPS and the Received: 16 October 2013 Accepted: 29 January 2014
relationships with covariates such as intelligence and do- Published: 27 August 2014
main knowledge have been confirmed. Hence, fostering
problem-solving abilities requires knowledge acquisition References
and the development of reasoning abilities (Kuhn 2009). Abd-El-Khalick, F, Boujaoude, S, Duschl, R, Lederman, NG, Mamlok-Naaman, R,
Hofstein, A, Niaz, M, Treagust, D, & Tuan, H-L. (2004). Inquiry in science
Furthermore, students' abilities to understand the com- education: International perspectives. Science Education, 88, 397–419.
plex problem (PUC) and find an appropriate solution by Adams, RJ, & Khoo, ST. (1996). ACER Quest [computer software]. Melbourne: ACER.
applying systematic strategies (PS) are crucial in struc- Bergmann, C, & Eder, F. (2005). Allgemeiner Interessen-Struktur-Test mit
Umwelt-Struktur-Test (UST-R) – Revision (AIST-R). Göttingen: Beltz.
turing problem-solving processes. It might therefore be Bernholt, S, Eggert, S, & Kulgemeyer, C. (2012). Capturing the diversity of
beneficial to specifically enhance these two factors (Kim students' competences in science classrooms: Differences and commonalities
and Hannafin 2011; Klahr 2000). of three complementary approaches. In S Bernholt, K Neumann, & P Nentwig
(Eds.), Making it tangible – learning outcomes in science education
The present study provided a new computer-based (pp. 173–200). Münster: Waxmann.
assessment of complex problem-solving competence in Blech, C, & Funke, J. (2010). You cannot have your cake and eat it, too: How
Chemistry, which was developed by taking into account induced goal conflicts affect interactive problem solving. The Open
Psychology Journal, 3, 42–53.
domain-general and domain-specific processes of problem Bond, TG, & Fox, CM. (2007). Applying the Rasch Model: Fundamental measurement
solving. This assessment could be used in science class- in Human Sciences (2nd ed.). Mahwah: Lawrence Erlbaum.
rooms in order to evaluate students' competences and to Brunner, M, & Süß, H-M. (2005). Analyzing the reliability of multidimensional
measures: An example from intelligence research. Educational and
give specific feedback to learners (Kim and Hannafin Psychological Measurement, 65(2), 227–240.
2011; Quellmalz et al. 2012). As the domain-general PISA Bühner, M, Kröner, S, & Ziegler, M. (2008). Working memory, visual-spatial-intelligence
2012 assessment of problem solving was based on pro- and their relationship to problem-solving. Intelligence, 36, 672–680.
Cartrette, DP, & Bodner, GM. (2010). Non-mathematical problem solving in
cesses similar to those described in our study (OECD organic chemistry. Journal of Research in Science Teaching, 47(6), 643–660.
2013), the CBA could add more diagnostic information Danner, D, Hagemann, D, Holt, DV, Hager, M, Schankin, A, Wüstenberg, S, &
for students' competences in science. Using such an as- Funke, J. (2011). Measuring performance in dynamic decision making.
Reliability and validity of the tailorshop simulation. Journal of Individual
sessment could also foster the incorporation of computers Differences, 32(4), 225–233.
in science. It also shows how evidence-centered assess- Drasgow, F, & Chuah, SC. (2006). Computer-based testing. In M Eid & E Diener
ments and model-based learning approaches could be (Eds.), Handbook of multimethod measurement in Psychology (pp. 87–100).
Washington, DC: American Psychological Association.
combined for the construct of complex problem solving Enders, CK. (2010). Applied missing data analysis. New York: The Guilford Press.
(Kuo and Wu 2013; Quellmalz et al. 2012). Also, the pro- Flick, LB, & Lederman, NG (Eds.). (2006). Scientific inquiry and nature of science.
posed model of problem solving could serve as a teaching Dordrecht: Springer.
Friege, G, & Lind, G. (2006). Types and qualities of knowledge and their relations
tool which forms a guideline for science lessons and devel- to problem solving in physics. International Journal of Science and
oping instructional material (Van Merriënboer 2013). Mathematics Education, 4, 437–465.
Funke, J. (2010). Complex problem solving: A case for complex cognition?
Cognitive Processing, 11, 133–142.
Abbreviations Funke, J, & Frensch, PA. (2007). Complex problem solving: The European
AIC: Akaike's information criterion; AIST: Allgemeiner-Interessens-Struktur-Test perspective – 10 years after. In DH Jonassen (Ed.), Learning to solve complex
(German general test on the structure of interest); BIC: Bayesian information scientific problems (pp. 25–47). New York/London: Lawrence Erlbaum.
criterion; CBA: computer-based assessment; CFA: confirmatory factor analysis; Gabel, DL, & Bunce, DM. (1994). Research on problem solving: Chemistry. In DL
CFI: comparative fit index; CPS: complex problem solving; dev: final deviance; Gabel (Ed.), Handbook of research on science teaching and learning
EAP/PV: expected a posteriori over persons variance; FIML: full information (pp. 301–326). New York: Macmillan.
maximum likelihood; Gf: general factor (of fluid intelligence); IRT: item Gilbert, JK, & Treagust, D (Eds.). (2009). Multiple representations in chemistry
response theory; MCAR: missing completely at random; PR: representing the education. New York: Springer.
problem; PS: solving the problem; PUC: understanding and characterizing Goode, N, & Beckmann, JF. (2010). You need to know: There is a causal
the problem; RMSEA: root mean square error of approximation; relationship between structural knowledge and control performance in
SRC: reflecting and communicating the solution; SRMR: standardized root complex problem solving tasks. Intelligence, 38, 345–352.
mean square residual. Greiff, S, Holt, DV, Wüstenberg, S, Goldhammer, F, & Funke, J. (2013). Computer-based
assessment of complex problem solving: Concept, implementation, and
application. Educational Technology Research & Development, 61, 407–421.
Competing interests Hambrick, DZ. (2005). The role of domain knowledge in higher-level cognition. In
The authors declare that they have no competing interests. O Wilhelm & RW Engle (Eds.), Handbook of understanding and measuring
intelligence (pp. 361–372). Thousand Oaks: Sage Publications.
Honey, MA, & Hilton, ML (Eds.). (2011). Learning science through computer games
Authors’ contributions and simulations. Washington, DC: The National Academic Press.
JM-K and RS participated in the design of the study, performed the statistical Hu, L, & Bentler, PM. (1999). Cutoff criteria for fit indexes in covariance structure
analyses, and drafted the manuscript. RT conceived the study, and participated in analysis: Conventional criteria versus new alternatives. Structural Equation
its design and coordination. All authors read and approved the final manuscript. Modeling, 6, 1–55.
Scherer et al. International Journal of STEM Education 2014, 1:2 Page 15 of 15
http://www.stemeducationjournal.com/content/1/1/2

Jonassen, DH. (2004). Learning to solve problems: An instructional design guide. San Sager, S, Barth, CM, Diedam, H, Engelhart, M, & Funke, J. (2011). Optimization as
Francisco: John Wiley & Sons. an analysis tool for human complex problem solving. Journal of Optimization,
Jones, GJF. (2009). An inquiry-based learning approach to teaching information 21(3), 936–959.
retrieval. Information Retrieval, 12, 148–161. Scherer, R. (2012). Analyse der struktur, messinvarianz und ausprägung komplexer
Jurecka, A. (2008). Introduction to the computer-based assessment of problemlösekompetenz im fach Chemie [Analyzing the structure, invariance, and
competencies. In J Hartig, E Klieme, & D Leutner (Eds.), Assessment of performance of students' complex problem-solving competencies in Chemistry].
competencies in educational contexts (pp. 193–214). Cambridge/Göttingen: Berlin: Logos.
Hogrefe & Huber Publishers. Scherer, R. (2014). Psychometric challenges in modeling scientific problem-solving
Kapa, E. (2007). Transfer from structured to open-ended problem solving in a competency: An item response theory approach. In H Bock (Ed.), Studies in
computerized metacognitive environment. Learning and Instruction, 17, 688–707. classification, data analysis, and knowledge organization. New York: Springer.
Kim, MC, & Hannafin, MJ. (2011). Scaffolding problem solving in technology-enhanced in press.
learning environments (TELEs): Bridging research and theory with practice. Scherer, R, & Tiemann, R. (2012). Factors of problem-solving competency in a
Computers & Education, 56, 403–417. virtual chemistry environment: The role of metacognitive knowledge about
Kind, PM. (2013). Establishing assessment scales using a novel disciplinary strategies. Computers & Education, 59(4), 1199–1214.
rationale for scientific reasoning. Journal of Research in Science Teaching, Schmidt-Weigand, F, Hänze, M, & Wodzinski, R. (2009). Complex problem solving
50(5), 530–560. and worked examples. Zeitschrift für Pädagogische Psychologie, 23, 129–138.
Klahr, D. (2000). Exploring science. Cambridge: MIT Press. Schroeders, U, Wilhelm, O, & Bucholtz, N. (2010). Reading, listening, and viewing
Koeppen, K, Hartig, J, Klieme, E, & Leutner, D. (2008). Current issues in comprehension in English as a foreign language: One or more constructs?
competence modeling and assessment. Journal of Psychology, 216(2), 61–73. Intelligence, 38, 562–573.
Koppelt, J. (2011). Modellierung dynamischer problemlösekompetenz im Sonnleitner, P, Keller, U, Martin, R, & Brunner, M. (2013). Students' complex
Chemieunterricht [Modeling complex problem-solving competence in Chemistry]. problem-solving abilities: Their structure and relations to reasoning ability
Berlin: Mensch & Buch. and educational success. Intelligence, 41(5), 289–305.
Kröner, S, Plass, JL, & Leutner, D. (2005). Intelligence assessment with computer Taasoobshirazi, G, & Glynn, SM. (2009). College students solving chemistry
simulations. Intelligence, 33, 347–368. problems: A theoretical model of expertise. Journal of Research in Science
Kuhn, D. (2009). Do students need to be taught how to reason? Educational Teaching, 46(10), 1070–1089.
Research Review, 4(1), 1–6. Van Merriënboer, JJG. (2013). Perspectives on problem solving and instruction.
Künsting, J, Wirth, J, & Paas, F. (2011). The goal specificity effect on strategy use Computers & Education, 64, 153–160.
and instructional efficiency during computer-based scientific discovery Wainer, H, Bradlow, E, & Wang, X. (2007). Testlet response theory and its
learning. Computers & Education, 56, 668–679. applications. Cambridge: Cambridge University Press.
Kuo, C-Y, & Wu, K-H. (2013). Toward an integrated model for designing Wirth, J. (2008). Computer-based tests: Alternatives for test and item design. In J
assessment systems: An analysis of the current status of computer-based Hartig, E Klieme, & D Leutner (Eds.), Assessment of competencies in educational
assessments in science. Computers & Education, 68, 388–403. contexts (pp. 235–252). Cambridge/Göttingen: Hogrefe & Huber Publishers.
Lee, CB. (2010). The interactions between problem solving and conceptual Wirth, RJ, & Edwards, MC. (2007). Item factor analysis: Current approaches and
change: System dynamic modelling as a platform for learning. Computers & future directions. Psychological Methods, 12, 58–79.
Education, 55, 1145–1158. Wirth, J, & Klieme, E. (2004). Computer-based assessment of problem solving
Lee, CB, Jonassen, DH, & Teo, T. (2011). The role of model building in problem solving competence. Assessment in Education: Principles, Policy and Practice, 10(3),
and conceptual change. Interactive Learning Environments, 19(3), 247–265. 329–345.
Leutner, D. (2002). The fuzzy relationship of intelligence and problem solving in Wu, H-L, & Pedersen, S. (2011). Integrating computer- and teacher-based scaffolds
computer simulations. Computers in Human Behavior, 18, 685–697. in science inquiry. Computers & Education, 57, 2352–2363.
Leutner, D, Klieme, E, Meyer, K, & Wirth, J. (2005). Die Problemlösekompetenz in Wu, ML, Adams, RJ, Wilson, M, & Haldane, S. (2007). ACER Conquest 2.0:
den Ländern der Bundesrepublik Deutschland. In D PISA-Konsortium (Ed.), Generalized item response modeling software [computer software]. Hawthorn:
PISA 2003 – Der zweite Vergleich der Länder in Deutschland (pp. 125–146). ACER.
Münster: Waxmann. Wüstenberg, S, Greiff, S, & Funke, J. (2012). Complex problem solving – more
Messick, S. (1995). Standards of validity and the validity of standards in performance than reasoning? Intelligence, 40(1), 1–14.
assessment. Educational Measurement: Issues and Practice, 14(4), 5–8. Yang, Y, & Green, SB. (2011). Coefficient alpha: A reliability coefficient for the 21st
century? Journal of Psychoeducational Assessment, 29(4), 377–392.
Meßinger, J. (2010). ChemLabBuilder [computer software]. Chemnitz: mera.bit
Meßinger & Rantzuch GbR.
Molnár, G, Greiff, S, & Csapó, B. (2013). Inductive reasoning, domain specific and doi:10.1186/2196-7822-1-2
complex problem solving: Relations and development. Thinking Skills and Cite this article as: Scherer et al.: Developing a computer-based
assessment of complex problem solving in Chemistry. International Journal
Creativity, 9, 35–45.
of STEM Education 2014 1:2.
Muthén, B, & Muthén, L. (2010). Mplus 6 [computer software]. Los Angeles:
Muthén & Muthén.
Nentwig, P, Roennebeck, S, Schoeps, K, Rumann, S, & Carstensen, C. (2009).
Performance and levels of contextualization in a selection of OECD countries
in PISA 2006. Journal of Research in Science Teaching, 46(8), 897–908.
Neumann, I, Neumann, K, & Nehm, R. (2011). Evaluating instrument quality in
science education: Rasch-based analyses of a nature of science test.
International Journal of Science Education, 33, 1373–1405.
OECD. (2004). Problem solving for tomorrow's world – First measures of cross
Submit your manuscript to a
curricular competences from PISA 2003. Paris: OECD. journal and beneﬁt from:
OECD. (2013). PISA 2012 assessment and analytical framework. Paris: OECD.
Pellegrino, JW. (2012). Assessment of science learning: Living in interesting times. 7 Convenient online submission
Journal of Research in Science Teaching, 49(6), 831–841. 7 Rigorous peer review
Quellmalz, ES, Timms, MJ, Silberglitt, MD, & Buckley, BC. (2012). Science 7 Immediate publication on acceptance
assessments for all: integrating science simulations into balanced state 7 Open access: articles freely available online
science assessment systems. Journal of Research in Science Teaching, 49(3),
7 High visibility within the ﬁeld
363–393.
Rigas, G, Carling, E, & Brehmer, B. (2002). Reliability and validity of performance 7 Retaining the copyright to your article
measures in microworlds. Intelligence, 30, 463–480.
Rutten, N, Van Joolingen, WR, & Van der Veen, JT. (2011). The learning effects of
Submit your next manuscript at 7 springeropen.com
computer simulations in science education. Computers & Education, 58, 136–153.