Equating of standardized science subjects tests using various methods: which is the most profitable?

Muh. Asriadi AM, Heri Retnawati

Abstract


The quality of a test set is reflected in the quality of its items: a good set measures test takers' ability reasonably well even when the items are distributed across several question packages. This study uses an exploratory, descriptive method to determine the equivalence of standardized test sets in science subjects for junior high schools in Indonesia. The data were obtained from the database of Junior High School National Examination results in Natural Sciences, which consisted of 5 question packages with 40 items per package. Equating was carried out under the three-parameter logistic (3PL) item response theory model using RStudio. The results show that the items in the 5 national examination packages have good difficulty and guessing parameters. However, the discrimination index of several items was unfavorable. In addition, when the equating results are compared graphically by the closeness of the test characteristic curves, the Stocking & Lord method produces the most equivalent scores. These findings can serve as a reference for test developers and measurement researchers seeking to produce better and more accurate test instruments.
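As an illustration of the quantities named in the abstract, the following minimal Python sketch computes a 3PL test characteristic curve and the Stocking & Lord criterion for a pair of linking coefficients. It is not the authors' RStudio code: the item parameters, the scaling constant D = 1.7, the ability grid, the existence of a shared set of items between two forms, and all function names are assumptions made only for this example.

```python
import math

def p_3pl(theta, a, b, c, D=1.7):
    """3PL probability of a correct response (D = 1.7 is an assumed scaling constant)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def tcc(theta, items, D=1.7):
    """Test characteristic curve: expected number-correct score at ability theta."""
    return sum(p_3pl(theta, a, b, c, D) for a, b, c in items)

def rescale(items, A, B):
    """Put (a, b, c) item parameters on the target scale via theta* = A * theta + B."""
    return [(a / A, A * b + B, c) for a, b, c in items]

def stocking_lord_loss(A, B, items_x, items_y, grid):
    """Sum over the ability grid of squared differences between the two TCCs."""
    items_x_on_y = rescale(items_x, A, B)
    return sum((tcc(t, items_y) - tcc(t, items_x_on_y)) ** 2 for t in grid)

# Hypothetical item parameters (a, b, c) on the target scale; not data from the study.
items_y = [(1.0, -0.5, 0.20), (0.8, 0.0, 0.25), (1.4, 0.7, 0.15)]

# Simulate the same items expressed on another scale with known coefficients.
true_A, true_B = 1.2, 0.5
items_x = [(a * true_A, (b - true_B) / true_A, c) for a, b, c in items_y]

# Evenly spaced ability grid from -4 to 4 (an assumed choice).
grid = [-4.0 + 0.2 * k for k in range(41)]

loss_true = stocking_lord_loss(true_A, true_B, items_x, items_y, grid)
loss_identity = stocking_lord_loss(1.0, 0.0, items_x, items_y, grid)
print(loss_true, loss_identity)
```

In practice the coefficients A and B are found by numerically minimizing this criterion; the sketch only evaluates it at two candidate pairs to show that the criterion is essentially zero when the two test characteristic curves coincide.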


DOI: http://dx.doi.org/10.21043/thabiea.v6i1.19503
