Validity and Reliability Analysis of HOTS Multiple Choice Questions in a Chemistry Course at a Senior High School


  • Anis Syafitri, Chemistry Education, Universitas Negeri Medan, Jl. Willem Iskandar Pasar V Medan, North Sumatra 20221, Indonesia
  • Murniaty Simorangkir, Chemistry Education, Universitas Negeri Medan, Jl. Willem Iskandar Pasar V Medan, North Sumatra 20221, Indonesia
  • Ajat Sudrajat, Chemistry Education, Universitas Negeri Medan, Jl. Willem Iskandar Pasar V Medan, North Sumatra 20221, Indonesia



This study examined the validity and reliability of a newly developed multiple-choice evaluation system for measuring students’ higher-order thinking skills (HOTS). The test instrument consisted of 45 multiple-choice items and was developed based on the cognitive domain of Bloom’s Taxonomy. A quantitative method was used, consisting of three phases: content validity by inter-rater agreement, construct validity by principal component analysis (PCA), and reliability by Cronbach’s alpha. The inter-rater agreement indicated that the instrument was valid in content. The PCA indicated that the items in the instrument measured a single dimension (unidimensionality), which supports its use as an evaluation instrument. Reliability was high, with a Cronbach’s alpha of 0.94. The study thus produced a valid and reliable HOTS multiple-choice evaluation instrument that is ready to be tested in a small sample to examine its empirical quality.
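As an illustration of the reliability phase, the sketch below computes Cronbach's alpha from a matrix of item scores using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). It is a minimal sketch, not the authors' procedure: the function name `cronbach_alpha`, the use of sample variance, and the toy response data are assumptions made for the example, and the abstract does not state which software was used.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (one row per student).

    Each row holds that student's score on every item, e.g. 1 for a
    correct answer and 0 for an incorrect one. Sample variance
    (n - 1 denominator) is an assumption of this sketch.
    """
    k = len(scores[0])  # number of items
    if k < 2:
        raise ValueError("at least two items are required")

    # Transpose rows (students) into columns (items).
    items = list(zip(*scores))
    item_variances = [variance(col) for col in items]

    # Variance of each student's total score across all items.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")

    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical toy data: 4 students, 3 items, perfectly consistent answers.
toy_scores = [
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]
alpha = cronbach_alpha(toy_scores)
```

With real data, each row would hold one student's responses to the 45 items. Values close to 1, such as the 0.94 reported here, indicate high internal consistency.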

Keywords: validity, reliability, multiple-choice, evaluation system






How to Cite

Syafitri, A., Simorangkir, M., & Sudrajat, A. (2024). Validity and Reliability Analysis of HOTS Multiple Choice Questions in a Chemistry Course at a Senior High School. KnE Social Sciences, 9(8), 367–376.