The Applicability of a Scale on Self-Regulated Writing Strategies in English for High School Students

Abstract: Writing is crucial in learning foreign languages. This study aimed to examine the applicability of a measure of self-regulated writing strategies in terms of its psychometric properties. The scale items were designed from theoretical foundations and previous relevant studies. Questionnaires were distributed online to high school students, and 106 responses were collected and analyzed. Correlation analysis, confirmatory factor analysis, and Cronbach's alpha were used to assess the applicability of the scale. The findings showed five main constructs to be applicable: self-initiating, planning, text-generating, revising, and acting on feedback.

Third, the Self-Regulated Learning Strategy Questionnaire (SRLSQ) (Abadikhah et al., 2018) contains 60 items on a 5-point Likert scale. It consists of 6 dimensions, namely motive, method, time, performance, physical environment, and social environment. The reliability of this scale, according to Abadikhah et al. (2018), is 0.95 for Persian speakers who learn English. Even though its reliability is high, answering 60 questions takes participants a long time, so the number of items needs to be reduced to make the questionnaire more practical.
Fourth, the Questionnaire of English Writing Self-Regulated Learning Strategies (QEWSRLS) (Sun & Wang, 2020) contains 26 questions on a 4-point Likert scale (from 0, meaning 'never', to 3, meaning 'often'). This questionnaire is an adaptation of the Questionnaire of English Self-Regulated Learning Strategies (Wang & Bai, 2017). The QEWSRLS consists of 3 categories, namely environmental SRL strategies, behavioural SRL strategies, and personal SRL strategies. The internal reliability of this scale ranges from 0.65 to 0.88 (Sun & Wang, 2020).
Based on an online literature search, the authors found few studies that examine the use of SRW strategies in Indonesia. So far, only one has mentioned the use of SRW by university students (Umamah & Cahyono, 2020), and that study used only an SRL questionnaire, not an SRW questionnaire. The authors were not able to find any literature that discusses the SRW strategies of high school students in Indonesia. High school is an important level of education because it prepares students to enter higher education. This shows that, even though it is needed, no SRW instrument has yet been developed for research purposes in Indonesia. Therefore, this study aimed to examine the applicability of the Writing Strategy Scale (WSS), which we developed by adapting previous research instruments. It is hoped that this instrument will be valid and applicable for SRW research in Indonesia. To that end, the study poses the following research question: Is the SRW instrument developed in this study valid, meaningful, and reliable?
This study aimed to examine the applicability of a scale that we developed by adapting constructs from previous studies. The adaptations mainly concerned the number of questions and the range of the Likert scale. The adapted scale was then tested to determine its validity and reliability when applied in another country. The applicability of a scale can refer to validity and reliability (Cipora, Szczygieł, Willmes, & Nuerk, 2015; Halse, Bjørkløf, Engedal, Rokstad, Persson, Eldholm, Selbaek, & Barca, 2020; Morgado, Meireles, Neves, Amaral, & Ferreira, 2017); performance, safety, time and cost efficiency (Al-Bahlani & Babadagli, 2011; Guida, 2021); and sharing the same philosophical orientation or cultural values towards scale items (Yu, de Maria, Barbaranelli, Vellone, Matarese, Ausili, Rejane, Osokpo, & Riegel, 2021). The present study used the most basic and widely used sense of the term, i.e., appropriate validity, meaningfulness, and reliability, known together as psychometric properties. The scale was written in Indonesian to address the cultural and linguistic matters stressed by Yu et al. (2021). However, the meaningfulness, validity, and reliability of the scale needed to be assessed before the scale could be used.
Based on the four kinds of questionnaires developed in previous research, this study used 7 constructs, namely self-initiating, planning, text-generating, self-monitoring and management, revising, acting on feedback, and resourcing. The seven constructs express writing skills. Two of them, self-initiating and self-monitoring and management, are related to self-regulation. The other five constructs (planning, text-generating, revising, acting on feedback, and resourcing) are often used in writing at various educational levels.

METHODS
This research is quantitative and applies the idea of item response theory (IRT), which has long been used in psychometrics and in which the characteristics of the items as a data source determine the meaning of groups called latent variables (De Ayala, 2013; Bock & Gibbons, 2021; Gorsuch, 2015; McDonald, 2014; Nering & Ostini, 2011). The applicability of items and constructs was examined in stages by taking into account the self-regulated writing scale grid, the research sample, and the analytical strategies described below.
This study was conducted at SMA Kolese St. Yusuf Malang, popularly called Kosayu High-School, in the city of Malang, Indonesia. According to the school website (https://www.smakkosayu.sch.id/v3/about-us/), Kosayu High-School is one of the largest private schools in Malang, with a population of 1,293 students originating from all provinces of Indonesia. The school is coeducational, with an almost equal number of male and female students. Like most high schools in Indonesia, it offers three major streams, namely science, social studies, and language. The two dominant majors in this school are science and social studies. The language major, on the other hand, has only one class in grades 11 and 12, with the number of students ranging from 10 to around 20. Of the study sample, 57.5% were science majors and 42.5% were social studies majors. To address sample adequacy, a Kaiser-Meyer-Olkin (KMO) analysis was performed. The results can be seen in Table 1.
The instrument was arranged according to the grid outline and comprised 7 dimensions with 38 items. The questionnaire, composed of several questions about the respondent's background and all the statements from the grid, was set up in Google Form (GF). The GF link was distributed via mobile phone to a random sample of respondents through the subject teacher after obtaining permission from the School Principal. Two weeks after the online questionnaire was distributed, 107 students had filled it out, but one student was not willing for his data to be used in this research. Thus, this study analyzed data from a sample of 106, of whom 42.1% were male and 57.9% female, born between 2003 and 2006. The respondents came from 14 different classes, and all had experience of doing writing assignments (between 5 and 24 times) given by their English teacher.
Data analysis was carried out using two software packages, IBM-SPSS and LISREL, in that order. To safeguard the meaningfulness of the scale and its dimensions, we used the Indonesian language and the grid outline for scale development to make sure everything was clear and free of misconceptions. Three processes followed. First, item analysis with IBM-SPSS was conducted to sort out valid items (as represented by the item-total correlation) and their contribution to reliability (alpha criterion not exceeding 0.65 if the item is deleted). Second, constructs and items were confirmed through confirmatory factor analysis (CFA) in LISREL, which included considering goodness of fit and the inter-correlations between factors. CFA was chosen because the SRW scale had been arranged using a grid, so the number of dimensions and their related items were already known. Third, the reliability of the final selection was checked by calculating Cronbach's alpha for the results of the CFA. In addition, the criteria for forming a dimension were a minimum of 3 items, each with a loading (λ) > 0.3 and a t value > 1.96, the latter indicated by a black (not red) line (see Figure 1).
[Scale grid, partially recovered; dimension (number of items): … (8); 3. Text-generating (4); 4. Self-monitoring (5); 5. Revising (4); 6. Acting on feedback (6); …]
Regarding sample size, factor analysis usually requires a large number of subjects, ideally 1,000 or at least 300 (Tabachnick & Fidell, 2013). However, Table 1 showed that the KMO calculation was accompanied by a significant Bartlett's test of sphericity, which was taken to mean that a sample of 106 was sufficient for further analysis of the seven dimensions with all their items.
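The first step above, item analysis, can be illustrated with a minimal sketch. It assumes that the item-total correlation is the corrected variant (each item against the sum of the remaining items) and that alpha is recomputed with each item removed in turn. The function names and the toy responses are hypothetical and are not taken from the study data or from SPSS.

```python
from statistics import mean, variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of item columns (one list of scores per item)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    item_variance_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def pearson(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def item_analysis(items):
    """Return (corrected item-total r, alpha if item deleted) for every item."""
    results = []
    for i, col in enumerate(items):
        rest = items[:i] + items[i + 1:]  # all items except item i
        rest_total = [sum(scores) for scores in zip(*rest)]
        results.append((pearson(col, rest_total), cronbach_alpha(rest)))
    return results


# Hypothetical responses: 4 items x 6 respondents on a 1-4 scale (not study data).
toy_items = [
    [1, 2, 3, 4, 4, 2],
    [1, 2, 3, 4, 3, 2],
    [2, 2, 3, 4, 4, 1],
    [4, 1, 3, 2, 1, 4],  # deliberately inconsistent with the other three items
]

for number, (r, alpha_deleted) in enumerate(item_analysis(toy_items), start=1):
    print(f"item {number}: r_item-rest = {r:.2f}, alpha if deleted = {alpha_deleted:.2f}")
```

In this toy example the fourth item correlates negatively with the rest and alpha rises sharply once it is removed, which is the pattern that would flag an item for deletion.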

FINDINGS AND DISCUSSION
The applicability of the scale was assessed in the three stages mentioned above, namely item analysis, confirmation of the validity of the dimensions and model fit, and reliability testing. Each stage is presented and discussed in turn. First, the item analysis considered the item-total correlation and the Cronbach's alpha value if the item was deleted, in order to decide whether the item should be retained or discarded. Table 2 shows that of the 38 items formulated, eleven did not meet the specified criteria and were dropped. The dropped items were 5, 6, 11, 13, 18, 19, 22, 32, 33, 35, and 38, because their item-total correlation or their alpha coefficient if the item was deleted did not meet the predetermined criteria. Two dimensions, namely self-monitoring and resourcing, were also removed because they failed to retain the minimum number of items. The deletion of these two dimensions does not mean that they are unimportant; rather, it is possible that the content of these items and dimensions was not commonly recognized or practiced by high school students, so that their responses varied widely and were not consistent with each other. In addition, the reversed items were retained, but researchers need to be aware of items with an inverse meaning (unfavorable items). For the retained items of this kind, the values were reverse-coded (4 changed to 1, 3 to 2, 2 to 3, and 1 to 4) before being factor-analyzed (CFA) in the second stage. Because these items were recoded, their CFA results need to be read and interpreted in reverse of their literal meaning so as not to cause misunderstandings for future users. The findings shown in Figure 1 are the results for all retained items analyzed through the CFA. All items loaded significantly (t value greater than the criterion of 1.96).
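The reverse coding of unfavorable items can be written as a one-line rule: the recoded value is the sum of the lowest and highest scale points minus the original response. The sketch below assumes a 1 to 4 response range, as implied by the values listed above; the function name is hypothetical.

```python
def reverse_code(score, low=1, high=4):
    """Reverse a Likert response so that low and high swap around the midpoint."""
    if not low <= score <= high:
        raise ValueError(f"score {score} is outside the {low}-{high} range")
    return low + high - score


responses = [4, 3, 2, 1]
print([reverse_code(s) for s in responses])  # [1, 2, 3, 4]
```

Applying the rule twice returns the original response, which is a quick check that a recode has not been applied to the same column more than once.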
The correlations between the factors in Table 3 varied: some were high and others low, and some were significant while others were not (the latter appear in red in Figure 1). This confirmed that the selection of maximum likelihood extraction in the CFA was correct.
Notes to Table 3: All the correlation coefficients were significant, p = 0.00.
In terms of goodness of fit, Table 4 provides an interesting illustration. Not all coefficients from the analysis (actual values) satisfied the required criteria, for example Chi-square, GFI, NFI, CFI, IFI, and RFI. However, as a rule of thumb, if one of the criteria was met, the resulting model was considered to have fulfilled the requirements (Brown, 2015; McNeish & Wolf, 2020). Because the other three criteria, RMSEA, AGFI, and RMR, met the fit requirements, the CFA model was considered to fit and to be applicable. As previously mentioned, a construct needs at least three items to provide minimum coverage (Hair, Black, Babin, & Anderson, 2019; McDonald, 2014), although it could be argued that one or two valid items should be kept. In terms of meaningfulness, a construct is considered under-identified if it consists of fewer than three indicators. In addition, with two or fewer indicators the analysis runs into several deficits: negative degrees of freedom, limited bivariate correlation, poor identification of meaning, and a lack of appropriate analysis, particularly in CFA (Bonifay & Cai, 2017; Brown, 2015; Mair, 2018; Heninger & Meiser, 2020; Tabachnick & Fidell, 2013). Thus, the two constructs (self-monitoring and resourcing) that had only two items each were dropped. In general, Table 5 reveals that of the seven dimensions of the WSS compiled, only five met the requirements and could therefore be applied to high school students. The five dimensions were self-initiating, planning, text-generating, revising, and acting on feedback. Cronbach's alpha ranged between 0.654 and 0.835, which indicates that the dimensions are reliable. It should be borne in mind that this instrument is not intended to be used as a diagnostic tool for finding weaknesses, but to reveal the strategies commonly used by high school students when working on writing assignments in a foreign language.
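The three-indicator minimum can be illustrated by counting degrees of freedom. The sketch below assumes the simplest case: a single factor analyzed on its own, with the factor variance fixed at 1 and uncorrelated errors, so that the free parameters are one loading and one error variance per indicator. Within a larger multi-factor model the count differs, so this illustrates the argument and does not reproduce the LISREL model used here. The function name is hypothetical.

```python
def one_factor_df(n_indicators):
    """Degrees of freedom of a stand-alone one-factor model."""
    p = n_indicators
    known = p * (p + 1) // 2  # unique variances and covariances among the indicators
    free = 2 * p              # p loadings + p error variances (factor variance fixed at 1)
    return known - free


for p in (2, 3, 4):
    print(f"{p} indicators: df = {one_factor_df(p)}")
```

Under these assumptions two indicators give negative degrees of freedom (under-identified), three give zero (just-identified), and four or more give positive degrees of freedom, so that model fit can be tested.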
Teachers could use the information from this scale for teaching purposes by starting from what is commonly practiced by students. The final version of the WSS is attached in the Appendix. This study aimed to test the applicability of the self-regulated writing scale to senior high school students in Indonesia. The results indicated that the overall scale could be applied in terms of the validity and reliability of both the items and their dimensions. This result confirmed the earlier ideas adapted during the initial preparation, which were assumed to describe the general school context in this country. The loadings (λ) from the factor analysis showed that all retained constructs were valid. Five constructs related to self-regulated writing were expressed by students, namely self-initiating, planning, text-generating, revising, and acting on feedback.
The authors recognize that there are methodological limitations that deserve to be discussed here. Item response theory (De Ayala, 2013; Nering & Ostini, 2011), adopted in this study, is popular but still raises doubts and is currently being developed both conceptually and in software. Three limitations were recognized in this study. The first relates to the emergence of the idea of hierarchical factor analysis. The data of this study could not be analysed hierarchically because level indicator variables were absent, so the analysis of flat data remains open to doubt. The second weakness relates to local cultural values. Human behaviour varies with cultural background, which allows for different interpretations of each statement in the instrument (Rahman, 2020). Unfortunately, this consideration was overlooked early on in the preparation of the instrument. Third, due to the outbreak of the corona virus, educational practices around the world have changed from traditional patterns to heutagogy and cybergogy, although the data were collected during this pandemic. Therefore, the applicability of the instrument may well be questioned when patterns of educational practice change and vary widely across society.
Despite these limitations, the WSS developed in this study is a valid instrument for use in future studies. The strength of this scale is its limited number of items (22), compared with the 60 items in Abadikhah et al. (2018). Researchers could use the scale to measure students' strategies in writing, specifically in the Indonesian context.

CONCLUSION
From the findings of this study, it is clear that the WSS is applicable because it is a valid and reliable scale after going through the process of item analysis and factor confirmation. The scale retains five of the original seven dimensions. These results indicate that the scale can be applied to high school students in Indonesia. The retained dimensions appear valid and reliable, although the indicator structure has changed because some items did not meet the criteria. This experience shows what applicability means in different educational environments and locations, each with its own cultural context and systematic structure.