Anat Ben-Simon is the CEO of the National Institute for Testing and Evaluation (NITE) in Jerusalem. Prior to this position, Dr. Ben-Simon headed the Department of Computer-Based Testing and directed the Israeli National Assessment of Educational Progress, the development of MATAL, a computerized test battery for the diagnosis of learning disabilities, and the Hebrew Language Project (HLP), which involves the development of tools for automated essay scoring and text analysis. Dr. Ben-Simon obtained her Ph.D. in Psychology, specializing in Psychometrics, from the Hebrew University of Jerusalem. She has also taught in the Psychology Department of the Hebrew University for the past 15 years. Dr. Ben-Simon is also the vice president of IAEA.
This study examined the effect of incorporating environmental distractors in a computerized continuous performance test (CPT) on the ability of the test to distinguish ADHD from non-ADHD children. It was hypothesized that children with ADHD would display more distractibility than controls while performing the CPT, as measured by omission errors in the presence of pure visual, pure auditory, and combined visual and auditory distracting stimuli. Participants were 663 children aged 7–12 years, of whom 345 were diagnosed with ADHD and 318 were without ADHD. Results showed that ADHD children demonstrated more omission errors than their healthy peers in all CPT conditions (no distractors, pure visual or auditory distractors and combined distractors). However, ADHD and non-ADHD children differed in their reaction to distracting stimuli; while all types of distracting stimuli increased the rate of omission errors in ADHD children, only combined visual and auditory distractors increased it ...
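The outcome measure named above, the omission error rate per CPT condition, can be illustrated with a minimal sketch. It assumes the usual CPT definition of an omission error (a target stimulus that receives no response). The trial format, the condition labels and the function name `omission_rates` are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def omission_rates(trials):
    """Return the omission error rate for each distractor condition.

    `trials` is an iterable of (condition, is_target, responded) tuples.
    This layout is a hypothetical illustration, not the study's data format.
    An omission error is counted when a target trial receives no response.
    """
    targets = defaultdict(int)  # number of target trials per condition
    misses = defaultdict(int)   # number of missed targets per condition
    for condition, is_target, responded in trials:
        if is_target:
            targets[condition] += 1
            if not responded:
                misses[condition] += 1
    # Conditions without any target trial are left out of the result.
    return {c: misses[c] / targets[c] for c in targets}


# Toy data for illustration only (not results from the study).
example_trials = [
    ("no_distractors", True, True),
    ("no_distractors", True, False),
    ("combined", True, False),
    ("combined", False, True),  # non-target trial, ignored for omissions
]
rates = omission_rates(example_trials)
```

Comparing such per-condition rates between groups is the kind of contrast the abstract describes; the statistical analysis the authors actually used is not stated in the text above.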
The diagnosis of learning disabilities (LD) is a very complex undertaking. It is especially challenging when the main purpose is determining eligibility for accommodations in high-stakes tests, a context in which standardization, objectivity and fairness must not be compromised. The current paper describes an endeavor to develop policy and procedure for standardizing and regulating the diagnosis of LD both in applicants to higher education institutions and in currently enrolled students, and for regulating the provision of test accommodations and other types of assistance. This endeavor, conducted by The National Institute for Testing and Evaluation (NITE) in cooperation with the Council of Higher Education in Israel, included the following: (1) development, validation and norming of a computer-based test battery for the diagnosis of LD; (2) development of a statistical decision rule for determining diagnosis based on a combination of test results; (3) development of a set of guidelin...
This paper focuses on the relationship between different aspects of the linguistic structure of a given language and the complexity of the computer program, whether existing or prospective, that is to be used for the scoring of essays in that language. The first part of the paper discusses common scales used to assess writing products, then briefly describes various methods of Automated Essay Scoring (AES) and reviews several AES programs currently in use. It also presents empirical results attesting to the reliability and validity of these programs, principally with regard to essays written in English. The second part of the paper presents various linguistic features that may vary extensively across languages and examines the ramifications of these features on the complexity of the AES operational system. This analysis is presented chiefly with regard to Hebrew and English, which are used to illustrate the differences that may exist between languages. (Contains 5 tables and 30 refe...
This study evaluated a “substantively driven” method for scoring NAEP writing assessments automatically. The study used variations of an existing commercial program, e-rater®, to compare the performance of three approaches to automated essay scoring: a brute-empirical approach in which variables are selected and weighted solely according to statistical criteria, a hybrid approach in which a fixed set of variables more closely tied to the characteristics of good writing was used but the weights were still statistically determined, and a substantively driven approach in which a fixed set of variables was weighted according to the judgments of two independent committees of writing experts. The research questions concerned (1) the reproducibility of weights across writing experts, (2) the comparison of scores generated by the three automated approaches, and (3) the extent to which models developed for scoring one NAEP prompt generalize to other NAEP prompts of the same genre. Data came ...
Content is one of the main writing dimensions on which essays are judged and rated. Since no automated essay scoring (AES) system is capable (yet) of truly understanding the content of an essay and assessing its breadth, depth and relevance, AES systems use indirect methods and proxy indices for judging its quality. Most such indices are based on measures of semantic similarity between a given essay and some gold standard. The purpose of this study is to examine the efficiency (validity) of five computer-generated semantic indices used by NiteRater – an AES system for text analysis and essay scoring of Hebrew texts (NiteRater, 2007). These indices can be classified into three categories: (1) indices based on semantic proximity between essays – the similarity of an essay's vocabulary to that of essays in various score-categories; (2) indices based on Principal Component Analysis (PCA) of semantic similarities; and (3) indices based on prompt-related vocabulary – the similarity of ...
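The first category of index, the similarity of an essay's vocabulary to that of essays in each score category, can be sketched as follows. This is a minimal illustration and not NiteRater's implementation: it assumes whitespace tokenization, raw word counts and cosine similarity, none of which the abstract specifies, and all function names are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Count lower-cased, whitespace-separated tokens (an assumed tokenization)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors stored as Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


def similarity_by_score_category(essay, scored_essays):
    """Compare an essay's vocabulary with the pooled vocabulary of each score category.

    `scored_essays` maps a score category to a list of essay texts that
    received that score (a hypothetical training-set layout).
    """
    essay_vec = bag_of_words(essay)
    result = {}
    for score, texts in scored_essays.items():
        pooled = Counter()
        for text in texts:
            pooled.update(bag_of_words(text))
        result[score] = cosine_similarity(essay_vec, pooled)
    return result


# Toy English data for illustration only.
training = {
    1: ["school is good"],
    5: ["education shapes society", "society values education"],
}
sims = similarity_by_score_category("education and society", training)
```

In this sketch the essay resembles the high-score category more closely than the low-score one. Morphologically rich languages such as Hebrew would likely need lemmatization before counting, which is omitted here.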
The vast diversity of operational definitions of learning disabilities (LD) and practices used for its diagnosis threaten standardization, objectivity and fairness in the diagnosis of LD and the provision of test accommodations. The current paper describes an endeavor to overcome this problem by regulating and standardizing the diagnosis of learning disability (LD) in tertiary education and the provision of test accommodations. This endeavor, conducted by The National Institute for Testing and Evaluation (NITE) in cooperation with the Council of Higher Education in Israel, included: (1) development, validation and norming of MATAL: a computer-based test battery for the diagnosis of LD; (2) development of statistical decision rules for determining diagnosis based on test results; (3) development of guidelines for the provision of test accommodations; (4) establishment of diagnostic centers within institutions of higher education; and (5) establishment of a professional network of all...
Forty years have passed since educational achievements were first compared on an international scale. What began as a hesitant and sporadic attempt to compare scholastic achievements in various countries has grown into a well-established enterprise encompassing close to 50 countries worldwide. Perhaps as a function of globalization and increasing awareness of the role human capital plays in furthering economic development, policy makers around the world are expressing growing interest in the results of such surveys, realizing their importance for precipitating educational reform. The quality of international comparisons of educational achievements has improved consistently as experience in the field has accumulated. Nevertheless, policy makers in many countries still fail to interpret the results of cross-national surveys in an accurate and useful manner, partly because they are unaware of the potential influence that diverse methodological factors have on the results of the tests. ...
This study examined the predictive role of learning difficulties in the academic self-efficacy of students enrolled in higher education institutions and the serial multiple mediation of inner and external resources. The sample consisted of 2,113 students (age range = 18–35) at 25 higher education institutions in Israel. Participants were divided into four groups: (a) 668 typical students (without learning difficulties or attention deficit hyperactivity disorder [ADHD]), (b) 370 students with self-reported but undiagnosed academic difficulties, (c) 372 students diagnosed with specific learning disabilities (SLDs), and (d) 703 students diagnosed with attention deficit disorders (ADHD). Implicit theories on accommodations, perceptions of social support, hope, and academic self-efficacy were examined. Results demonstrated that students with SLD and ADHD had higher beliefs in the value of expectations, yet they experienced lower levels of academic self-efficacy than their typical peers. ...
th grade. The primary motivation for this study was to develop a scale for measuring progress which would serve as a "national yardstick" and later be used to assess the effectiveness of various educational interventions. In this study the same tests (assessing proficiency in Hebrew, Arabic and mathematics) were administered twice, one year apart, to the same schools and classes.
Dr. Ibrahim Asadi, Prof. Michal Shany and Prof. Rafiq Ibrahim of the Edmond J. Safra Brain Research Center for the Study of Learning Disabilities, together with Dr. Anat Ben-Simon of the National Institute for Testing and Evaluation, present the pioneering development of a standardized Arabic-language diagnostic battery for the language and reading abilities of children in grades 1 to 6 (لغة ألقراءه, Lughat al-Qira'a). A publication about the battery can be found on "Hebrew Psychology", the website of the psychologists' association. The rationale for developing the battery is presented first, with reference to the unique character of the Arabic language in both its diglossic and its orthographic aspects. The development process is presented next, with reference to the construction of the tools in the pilot phase, item analysis and norming. Finally, the stages of the national sampling procedure are presented, along with examples of the linguistic tools, the cognitive tools and the reading tests. The battery will serve all professionals authorized to engage in the diagnosis and treatment of learning (reading) disabilities. To read the article, follow the link http://www.hebpsy.net/articles.asp?id=3160
In 2000, NITE launched the Hebrew Language Project (HLP). The goal of the project is to develop computational tools for the analysis and evaluation of Hebrew texts. The current paper reports the results of two studies. The first study examined the differential contribution of quantified text features to the automated scoring of essays elicited in three different contexts: essays written by 8th-grade native Hebrew-speakers who took part in the Israeli National Assessment of Educational Progress (n=1413); essays written by 12th-grade indigenous students in an instructional writing program (n=662); and essays written by applicants to higher education who took the YAEL test of Hebrew as a foreign language (n=980). The study also examined the effects of the size of the training sample used to develop the prediction model, and the effect of the text feature clustering model on the precision of the automated score. The second study examined the feasibility of assessing the difficulty (readab...
In the last two decades there has been an increase in the number of university applicants who are diagnosed as learning disabled (LD) and for whom test accommodations on university entrance exams are provided. The most frequent recommendation in the diagnostic reports of LD applicants is to extend the time limits of their tests. In the context of high-stakes testing, this kind of accommodation raises the question of equity: is it fair to extend the time limit of a speeded test to a particular group of examinees? Does it really give the LD a fair chance? And if so, by how much should the time limit be extended? Administering a computer-based version of the test to the LD can largely circumvent these issues. University applicants in Israel were required, until recently, to submit scores on the Psychometric Entrance Test (PET) to universities. This paper discusses the issues associated with test accommodations in general and with PET accommodations in particular. It then describes the ...