Abstract
For the last two decades Guatemala has developed an educational assessment system for accountability purposes following a continuous improvement cycle. The system is currently led by the Ministry of Education’s Dirección General de Evaluación e Investigación Educativa [General Directorate for Evaluation and Educational Research] (DIGEDUCA). Data are now available to analyze whether current assessment practices provide the information needed for effective policy decisions. The set of studies in this thesis explores these issues with three objectives: 1) to use data obtained from national assessment projects of elementary grades in Guatemala to answer questions with direct implications for the management of the education system; 2) to conduct up-to-date statistical analyses so that comparisons between groups differing in background characteristics have a solid psychometric foundation; 3) to provide recommendations for moving national assessment forward. To meet these objectives, four studies were developed based on DIGEDUCA’s experience, taking into consideration substantive factors known to be relevant in Guatemala: ethnicity, language, socio-economic conditions, gender, and the urban-rural divide.
The first study reviews the assessment experience in Guatemala, exploring the country’s context and the logistic and technical demands of educational testing. The paper describes how assessment units in heterogeneous contexts need to produce valid, unbiased information to orient and support the decisions made by policy administrators. However, local stakeholders (especially parents and teachers) and policy administrators will interpret assessment results according to their own background, experience, and needs. Policy administrators will seek to adhere to political and budgetary calendars, adopt low-cost procedures, and work with aggregate data. At the local level, teachers and parents will prefer results that provide information about individual pupils and that are easily aligned with teaching and learning activities. These divergent demands may create tensions between the consumers and producers of assessment information. When negotiating the design of the assessment, psychometricians and test developers will have to distinguish quality considerations that affect the interpretation of the data (such as sampling strata) from those that would invalidate them (such as administering instruments that have not been appropriately piloted). The outcome of this negotiation may alter the scope of the assessment, but should not alter its scientific approach. A common perspective is required to link the underlying assumptions that justify the policy with the psychometric and theoretical characteristics of the assessment.
The second paper is the first of three empirical studies, all of which use secondary data from large-scale educational testing. It explores Differential Item Functioning (DIF) and educational risk factors in Guatemalan reading tests. Several DIF estimation methods (chi-square, Rasch, and logistic regression) and effect size measures are used to identify patterns in the data. There is evidence of substantial DIF, but purification (i.e., removal of items flagged for DIF) may not change the conclusions drawn from group comparisons. Differences remain between groups defined by four risk factors (over-age, urban/rural area of residence, ethnicity, and gender), which highlights the impact of exclusion factors on the education system. Risk factors act in concert to create sources of bias, but the differences in test scores between “privileged” and “underprivileged” pupils cannot be explained in terms of item bias.
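The chi-square family of DIF methods can be illustrated with the Mantel-Haenszel procedure, one widely used chi-square-based approach. The abstract does not say which variant the thesis applied, so this is only a minimal sketch under that assumption. It computes the Mantel-Haenszel common odds ratio within total-score strata, converts it to the ETS delta effect size (Δ = -2.35 ln α, negative values favouring the reference group), and flags items with |Δ| ≥ 1.5, omitting the significance test that the full ETS rule also requires. The function names, data layout, and toy data are hypothetical. The toy data also show why purification matters: a strongly biased item left in the matching score can make clean items look biased.

```python
import math
from collections import defaultdict

# Hypothetical data layout: one (group, responses) pair per pupil, where group
# is "ref" (reference group) or "focal", and responses holds 0/1 item scores.


def mh_alpha(data, item, matching_items):
    """Mantel-Haenszel common odds ratio for one studied item.

    Pupils are stratified by their total score on the matching items; the
    studied item is always added to the matching set.
    """
    criterion = set(matching_items) | {item}
    # score stratum -> [ref correct, ref wrong, focal correct, focal wrong]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for group, responses in data:
        score = sum(responses[j] for j in criterion)
        column = (0 if group == "ref" else 2) + (0 if responses[item] == 1 else 1)
        tables[score][column] += 1
    num = den = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    if den == 0.0:
        # Undefined odds ratio: infinite if only the numerator carries data.
        return math.inf if num > 0 else math.nan
    return num / den


def mh_delta(data, item, matching_items):
    """ETS delta effect size; negative values favour the reference group."""
    alpha = mh_alpha(data, item, matching_items)
    if math.isnan(alpha):
        return math.nan
    if alpha == 0.0:
        return math.inf
    if math.isinf(alpha):
        return -math.inf
    return -2.35 * math.log(alpha)


def flag_dif(data, n_items, matching_items, threshold=1.5):
    """Items with |delta| >= threshold (simplified: no significance test)."""
    return {
        i for i in range(n_items)
        if abs(mh_delta(data, i, matching_items)) >= threshold
    }


if __name__ == "__main__":
    # Hypothetical toy data: items 0-2 show identical response patterns in
    # both groups; item 3 is answered correctly by every reference pupil and
    # by no focal pupil.
    base = [tuple(int(bit) for bit in f"{k:03b}") for k in range(8)]
    data = [("ref", p + (1,)) for p in base] + [("focal", p + (0,)) for p in base]
    print("item 0, all items matched: ", mh_delta(data, 0, range(4)))
    print("item 0, item 3 dropped:    ", mh_delta(data, 0, [0, 1, 2]))
    print("item 3, items 0-2 matched: ", mh_delta(data, 3, [0, 1, 2]))
```

With this toy data, item 0 shows a spurious Δ of about 4.57 when item 3 contaminates the matching score, and a Δ of 0 once item 3 is dropped from it, while item 3 itself stays flagged.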
The second of the empirical studies investigates the relationship of exposure to Spanish and socioeconomic status (SES) with reading and math achievement in third and sixth grade. The results of a multigroup structural equation model (SEM) and multivariate analysis of covariance lead to the conclusion that SES and exposure to Spanish are associated with achievement across grades, ethnicity, and area of residence. Exposure to Spanish is more relevant in lower grades, whereas area of residence is a consistent predictor across grades. An analysis of the invariance of the associations across the various groups suggests that background factors and educational achievement are linked in the same way across ethno-linguistic groups. These results confirm previous research indicating that socioeconomic status is a positive predictor of school achievement and that the pupils’ familiarity with the language in which they are tested is positively related to their achievement results.
The third empirical study employed data from the Second Regional Comparative and Explanatory Study of the Latin American Laboratory of Evaluation of Educational Quality (SERCE of LLECE). It analyzed the role that reading, tested in Spanish, plays in the relationship between pupils’ mother language and area of residence (urban or rural), on the one hand, and math achievement, also tested in Spanish, on the other, in four Latin American countries with sizable indigenous populations: Ecuador, Guatemala, Mexico and Peru. Findings indicate that non-indigenous pupils, whether urban or rural, outperform indigenous pupils, whether urban or rural; that reading mediates the relationship between language and math and between area and math; and that this pattern is consistent across countries. These findings stress the need for early literacy instruction to ensure good results in other curriculum areas (math in this case). Since contemporary research supports the relevance of the mother language in literacy acquisition, it is advisable that this early reading instruction take place in the pupils’ mother language (i.e., their dominant language). The findings also suggest that measures of pupils’ heterogeneous linguistic competency need to be introduced in large-scale standardized testing of bilingual populations.
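The mediation finding can be made concrete with the product-of-coefficients decomposition: if a is the effect of mother language on reading, b the effect of reading on math holding language fixed, and c' the remaining direct effect, then in ordinary least squares the total effect of language on math equals c' + a·b. The sketch below is a minimal observed-variable illustration under that assumption. The abstract does not describe the study's estimation method, which is not reproduced here; the function names, variable coding, and data are hypothetical. Area of residence would be handled the same way by substituting it for x.

```python
from statistics import fmean


def _centered(values):
    mean = fmean(values)
    return [v - mean for v in values]


def _dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def simple_slope(x, y):
    """OLS slope of y on x (with intercept)."""
    xc, yc = _centered(x), _centered(y)
    return _dot(xc, yc) / _dot(xc, xc)


def two_predictor_slopes(x, m, y):
    """OLS slopes of y on x and m together; centering absorbs the intercept."""
    xc, mc, yc = _centered(x), _centered(m), _centered(y)
    sxx, smm, sxm = _dot(xc, xc), _dot(mc, mc), _dot(xc, mc)
    sxy, smy = _dot(xc, yc), _dot(mc, yc)
    det = sxx * smm - sxm ** 2
    if det == 0:
        raise ValueError("x and m are perfectly collinear")
    direct = (smm * sxy - sxm * smy) / det  # c': effect of x holding m fixed
    b = (sxx * smy - sxm * sxy) / det       # b: effect of m holding x fixed
    return direct, b


def mediation(x, m, y):
    """Product-of-coefficients decomposition: total = direct + a * b."""
    a = simple_slope(x, m)                  # x -> mediator
    direct, b = two_predictor_slopes(x, m, y)
    total = simple_slope(x, y)              # x -> outcome, ignoring m
    return {"a": a, "b": b, "direct": direct, "indirect": a * b, "total": total}


if __name__ == "__main__":
    # Hypothetical coding: x = 1 if Spanish is the pupil's mother language,
    # else 0; m = reading score; y = math score (toy values).
    x = [0, 0, 0, 0, 1, 1, 1, 1]
    m = [1, 2, 2, 3, 3, 4, 4, 5]
    y = [2 * r + 1 for r in m]
    print(mediation(x, m, y))
```

In the toy data, math depends only on reading, so the decomposition attributes the whole language effect to the indirect path (direct effect 0, indirect effect 4). Note that the decomposition describes associations; it does not by itself establish a causal pathway.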
Together, the studies suggest that Guatemalan elementary-level assessment meets basic psychometric standards. There is significant item bias, but it does not change the conclusions drawn about group differences, because socioeconomic factors exert a very large influence that is not eliminated by removing biased items. Indigenous, rural pupils with low exposure to Spanish show lower achievement, even when item bias is reduced. Reading instruction in the pupils’ dominant language should be a priority, and testing, when possible, should be carried out in the same language. However, since instruction in the dominant language is not always possible for financial and practical reasons, control measures of competency in the language of assessment should be developed to give a clearer picture of pupils’ achievement.
In terms of the assessment projects, the findings indicate that Guatemalan national assessment has followed a consistent path of development. The particular tests included in this set of studies seem close to Pareto efficiency with respect to item bias, i.e., a level at which the costs of further improvements would not translate into commensurate improvements in the quality of results. However, there is still significant room to improve the background information collected on pupils, their families, their schools, and the processes taking place in their classrooms. Concrete examples from the empirical studies are the weak socioeconomic status scales and the lack of linguistic competency scales.
In terms of alignment with national policy, the tests seem to meet the requirements set out in the national “quality assurance” framework. However, given the current state of research on background variables and their links to achievement results, the assessment is not yet ready to provide the evidence on which a wider range of policies could be based.
These conclusions lead to four recommendations: (1) to continue collecting, and to improve, data on risk factors; (2) to develop measures of linguistic competence; (3) to develop links between the assessment and multiple policy initiatives; (4) to inform stakeholders about the technical issues of the assessment in a way that is attuned to their policies.
Original language | English |
---|---|
Qualification | Doctor of Philosophy |
Award date | 20 Jun 2017 |
Place of Publication | S.L. |
Publication status | Published - 2017 |