Melek Gülşah Şahin
Gazi University, Educational Measurement and Evaluation, Faculty Member
Computer adaptive multistage testing (ca-MST), which takes advantage of computer technology and an adaptive test format, is widely used and has become a popular topic in assessment and evaluation. This study analyzes the effect of different panel designs, module lengths, sequences of a parameter values across stages, and b parameter ranges on measurement precision in ca-MST implementations. The study was carried out as a simulation using the MSTGen software. A total of 5000 simulees were drawn from a standard normal distribution, N(0, 1). Sixty conditions were examined: two panel designs (1-3-3; 1-2-2), three module lengths (10, 15, 20), five a parameter sequences ("0.8; 0.8; 0.8", "1.4; 0.8; 0.8", "0.8; 1.4; 0.8", "0.8; 0.8; 1.4", "1.4; 1.4; 1.4"), and two b parameter difference conditions (small; large). Correlation, RMSE, and AAD values were calculated for each condition, and conditional RMSE values at each ability level are presented in a graph. Unlike other studies in the literature, this study examines the b parameter difference condition in three-stage tests and its interaction with the a parameter sequence. The results show that measurement precision increases as the number and length of the modules increase. Measurement error decreases as item discrimination values increase in all stages. Including highly discriminating items in the second or last stage contributes to measurement precision. At extreme ability levels, the large difficulty difference condition produces lower error values than the small difficulty difference condition.
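The precision indices named in this abstract can be sketched in a few lines. This is a minimal illustration, not the study's MSTGen workflow: it assumes AAD denotes the average absolute difference between true and estimated ability, and the function names and sample values are hypothetical.

```python
import math

def rmse(true_theta, est_theta):
    """Root mean squared error between true and estimated abilities."""
    n = len(true_theta)
    return math.sqrt(sum((e - t) ** 2 for t, e in zip(true_theta, est_theta)) / n)

def aad(true_theta, est_theta):
    """Average absolute difference (assumed meaning of AAD)."""
    n = len(true_theta)
    return sum(abs(e - t) for t, e in zip(true_theta, est_theta)) / n

def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values for illustration only (not data from the study).
true_theta = [0.0, 1.0, -1.0]
est_theta = [0.5, 1.0, -1.5]
print(rmse(true_theta, est_theta), aad(true_theta, est_theta),
      pearson_r(true_theta, est_theta))
```

Lower RMSE and AAD, together with a higher correlation, indicate higher measurement precision for a given simulation condition.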
This post-hoc simulation study investigates the effect of different item difficulty distributions, sample sizes, and test lengths on measurement precision when estimating examinee parameters in right-skewed and left-skewed distributions. First, examinee parameters were obtained from real 20-item test results for right-skewed and left-skewed sample groups of 500, 1000, 2500, 5000, and 10000. In the second phase, four tests were formed according to the distribution of b parameter values: normal, uniform, left-skewed, and right-skewed. With 20-item and 30-item tests as the test length variable, a total of 80 conditions were formed. RMSE and AAD values were calculated to determine measurement precision, and the results were evaluated in terms of item difficulty distribution, sample size, and test length. In the right-skewed examinee distribution, the highest measurement precision was obtained with the normal b distribution and the lowest with the right-skewed b distribution. Measurement precision was higher in the 30-item test; however, changes in sample size did not significantly affect measurement precision in the right-skewed examinee distribution. In the left-skewed examinee distribution, the highest measurement precision was obtained with the normal b distribution and the lowest with the left-skewed b distribution. In addition, changes in sample size and test length did not significantly affect measurement precision in the left-skewed distribution.
Abstract This study aimed to reveal the opinions of preservice teachers on self-, peer, and teacher assessment. The study was designed in line with the qualitative research approach. The study group consisted of 37 preservice teachers at a public university. The preservice teachers' opinions about self-, peer, and teacher assessment were collected, and content analysis was used to analyze the data. According to the findings, thirteen subcategories were identified for the strengths of self-, peer, and teacher assessment (three, four, and six, respectively), and seven subcategories were identified for their limitations (two, three, and two, respectively).
The objective of this research is to compare the objective methods commonly used in the literature for detecting differential item functioning (DIF) with a subjective method based on expert opinion, which is used less often. Mantel-Haenszel (MH), Logistic Regression (LR), and SIBTEST were chosen as the objective methods. Data from a large-scale examination administered in Turkey were used for the objective methods, and data obtained from an Expert Opinions Form, used to evaluate the items of the same examination, were used for the subjective method. The objective methods were applied to data from 5077 female and 5271 male students, and the subjective method drew on the opinions of 23 experts. Concordance between the objective and subjective methods was calculated using the agreement rate and Cohen's kappa coefficient. Among the objective methods, the highest concordance on the existence of DIF was obtained between MH and SIBTEST (.90; κ = 0.79) and the lowest between LR and SIBTEST (.75; κ = 0.50). When the concordance of the objective methods with the subjective method was examined, at least moderate concordance (.75; κ = 0.47) was obtained in the decisions. When the items showing DIF were examined by DIF level, three items indicated a low level of DIF and one item indicated a moderate or high level of DIF for both methods. In addition, for the subjective method, a decision study on the number of experts was conducted within generalizability theory, and an acceptable reliability value was reached with the opinions of 13 experts.
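The two concordance indices mentioned here, the agreement rate and Cohen's kappa, can be illustrated as follows. This is a minimal sketch under the assumption that each method yields one categorical decision per item (for example 1 = DIF, 0 = no DIF); the function names and sample decisions are hypothetical and are not data from the study.

```python
def agreement_rate(a, b):
    """Proportion of items on which two methods make the same decision."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_observed = agreement_rate(a, b)
    labels = set(a) | set(b)
    # Chance agreement from each method's marginal proportions.
    p_expected = sum((a.count(lab) / n) * (b.count(lab) / n) for lab in labels)
    if p_expected == 1.0:
        return 1.0  # both methods always give the same single label
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical DIF decisions for four items (1 = DIF flagged, 0 = not flagged).
method_a = [1, 1, 0, 0]
method_b = [1, 0, 0, 0]
print(agreement_rate(method_a, method_b), cohens_kappa(method_a, method_b))
```

Because kappa subtracts the agreement expected by chance, it is usually lower than the raw agreement rate, which is consistent with the pairs of values reported in the abstract.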
ABSTRACT Unlike conventional tests, computerized adaptive tests involve a test algorithm. The test algorithm consists of three parts: starting the test, continuing the test, and terminating the test. The aim of this study is to examine and compare the effects of different termination rules on measurement precision and test length in computerized adaptive testing (CAT) applications. The research was conducted as a simulation study. Five termination rules were used: fixed length, standard error, standard error with a minimum number of items, theta convergence, and theta convergence with a minimum number of items. Each termination rule involved different conditions, and a total of 12 conditions were compared. In addition, different item pool sizes (250 and 500 items) and ability estimation methods (Maximum Likelihood Estimation and Expected a Posteriori), which have an important place in the CAT test algorithm, were selected for comparing the termination rules. For measurement precision in each CAT application, RMSE, bias, and ... Note: This article was produced from the doctoral dissertation of the same title.
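The five termination rules listed in this abstract can be sketched as a single decision function. This is an illustrative sketch only: the rule names, the argument names, and all threshold values (test length, standard error cutoff, theta change cutoff, minimum item count) are assumptions chosen for the example, not the conditions used in the study.

```python
def should_stop(rule, n_items, se, theta_change,
                fixed_len=30, se_cut=0.30, theta_cut=0.02, min_items=10):
    """Decide whether a CAT session should end under a given termination rule.

    n_items      -- number of items administered so far
    se           -- current standard error of the ability estimate
    theta_change -- change in the ability estimate after the last item
    All default thresholds are hypothetical.
    """
    if rule == "fixed_length":
        return n_items >= fixed_len
    if rule == "standard_error":
        return se <= se_cut
    if rule == "standard_error_min_items":
        return n_items >= min_items and se <= se_cut
    if rule == "theta_convergence":
        return abs(theta_change) <= theta_cut
    if rule == "theta_convergence_min_items":
        return n_items >= min_items and abs(theta_change) <= theta_cut
    raise ValueError(f"unknown termination rule: {rule}")

# With 5 items administered, a small standard error ends the test under the
# plain rule but not under the variant that also requires a minimum length.
print(should_stop("standard_error", 5, 0.25, 0.10))
print(should_stop("standard_error_min_items", 5, 0.25, 0.10))
```

The minimum-item variants guard against stopping early when the standard error or the theta change happens to be small after only a few items.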
This study analyses peer assessment using the many-facet Rasch model (MFRM). The research was conducted with 91 undergraduate students and the lecturer teaching the course. The data were collected with a holistic rubric used by 6 peers and the lecturer to rate the projects prepared by 85 students taking the course. The study examines the raters, the measures for the students who were rated, the criteria used in rating, and the extent to which the rubric fulfilled its function. It also investigates the effect of peers' achievement levels on the process. The results showed that raters differed in severity and leniency, and that students were adequately distinguished in terms of the property measured. In addition, a very high reliability value was estimated for the criteria, which was interpreted to mean that they functioned reliably in distinguishing between students' performances. The analysis of the achievement levels of the peers who took part in peer assessment showed that ratings made by students with high achievement differed significantly from those made by students with medium or low achievement. Finally, views about peer assessment were generally positive. Keywords: peer assessment, many-facet Rasch model, levels of peer achievement, rubric
ABSTRACT: This study investigates the validity and reliability of the concept map and the structured communication grid used to assess students' achievement in the unit "Newton's Laws of Motion". The study group consisted of 102 first-year students in the Science Teaching and Physics Teaching programs during the 2009-2010 academic year. A short-answer test developed on the same topic served as the criterion for checking the validity of the concept map and the structured communication grid. The Pearson correlation coefficient between the concept map scores and the short-answer test scores was 0.57 (p<0.05). Cronbach's alpha, the internal consistency reliability coefficient, was 0.69 for the concept map. The correlation between the structured communication grid scores and the short-answer test scores was 0.69 (p<0.05). For the reliability of the structured communication grid, Cronbach's alpha was calculated as 0.77. A positive, moderate, and significant relationship was observed between the structured communication grid and the concept map (0.51; p<0.05). Keywords: concept map, structured grid, complementary assessment tools, validity, reliability
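Cronbach's alpha, the reliability coefficient reported here, follows the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The sketch below uses only the Python standard library; the function name and the small score matrix are hypothetical and unrelated to the study's data.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(scores[0])  # number of items
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the respondents' total scores.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical scores: three respondents, two items.
scores = [[1, 2], [2, 1], [3, 3]]
print(round(cronbach_alpha(scores), 3))
```

Alpha approaches 1 when the items rank respondents consistently, so that the variance of the total scores is large relative to the summed item variances.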
This study sought to determine the effectiveness of peer assessment, which has an important role in measurement and evaluation. For this purpose, a performance task, one of the alternative assessment techniques, was evaluated using a scoring rubric prepared by the researchers. The study was designed as basic research, and the study group consisted of 41 sophomore students and their instructor. Three of the 41 students acted as raters and, together with the instructor, rated the performances of their 38 peers. The data were analyzed using the fully crossed two-facet design (s x t x r) of generalizability theory in three steps: G studies for the peer ratings and the peer-instructor ratings, and a D study for the peer ratings. According to the G study results, the reliability coefficients obtained from the peer ratings and the peer-instructor ratings were quite high (0.86 and 0.82, respectively). According to the D study of the peer ratings, just two peer raters are enough to obtain a high reliability coefficient. Based on these results, it is suggested that peer assessment, which supports students' learning and decision-making processes, be used more often in education systems.
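A D study of this kind asks how the reliability coefficient changes as the number of raters changes. The sketch below uses the standard relative G coefficient for a fully crossed two-facet s x t x r design. The variance components are hypothetical placeholders, since the abstract does not report them, and the function name is an assumption.

```python
def g_coefficient(var_s, var_st, var_sr, var_str_e, n_t, n_r):
    """Relative G coefficient for a fully crossed s x t x r design.

    var_s     -- variance of the object of measurement (students)
    var_st    -- student-by-t interaction variance
    var_sr    -- student-by-rater interaction variance
    var_str_e -- three-way interaction confounded with residual error
    n_t, n_r  -- numbers of t conditions and raters in the D study
    """
    relative_error = var_st / n_t + var_sr / n_r + var_str_e / (n_t * n_r)
    return var_s / (var_s + relative_error)

# Hypothetical variance components; vary the number of raters from 1 to 4.
for n_raters in range(1, 5):
    coef = g_coefficient(1.0, 0.2, 0.2, 0.4, n_t=2, n_r=n_raters)
    print(n_raters, round(coef, 3))
```

Because the rater-related error terms are divided by the number of raters, the coefficient rises with each added rater and then flattens, which is how a D study identifies the smallest rater count that still gives acceptable reliability.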