Development and evaluation of the Ingwavuma receptive vocabulary test: A tool for assessing receptive vocabulary in isiZulu-speaking preschool children

Background This study used local resources- community members, photographer and speech therapists to develop a new test for screening receptive language skills and sought to determine its feasibility for use with a larger population in KwaZulu-Natal province, South Africa. Objectives The aim of this study was to develop a one-word receptive vocabulary test appropriate for screening and diagnosis of isiZulu-speaking preschool-aged children. The objectives were (1) to determine sensitivity and specificity of the Ingwavuma Receptive Vocabulary Test (IRVT) and (2) to determine the relationship of IRVT scores with age, gender, time and the confounding variables of stunting and school. Method The study was quantitative, cross-sectional and descriptive in nature. The IRVT was piloted before being administered to 51 children (4–6 years old). Statistical analysis of test item prevalence, correlations to confounding variables and validity measurements were conducted using Statistical Package for Social Scientists version 25 (SPSS 25). Results The IRVT was able to profile the receptive skills for the preschool children in Ingwavuma. The mean raw score for boys was 35, and 32 for girls. There was a significant Pearson correlation between test scores and age (0.028, p < 0.05) with a high effect size (Cohen’s d = 0. 949), gender (r = –0.032, p < 0.05) with a medium effect size (Cohen’s d = 0.521) and school (r = 0.033, p < 0.05) with a small effect size (Cohen’s d = 0.353). The sensitivity and specificity values were 66.7% and 33%, respectively. The test reliability (Cronbach’s alpha) was 0.739, with a good test–retest reliability. Conclusion The IRVT has potential as a screening test for isiZulu receptive vocabulary skills amongst preschool children. This study contributes to a development of clinical and research resources for assessing language abilities.


Background
Language is a combination of expressive and receptive communication skills, including the ability to speak, listen, read and write. Language development informs many aspects of child development, including emotional wellness, cognitive abilities and literacy. Information on language development can be sensitive in determining the effects of various medical conditions, such as meningitis, cerebral malaria and human immunodeficiency virus (HIV) (Alcock & Alibhai, 2013). Both expressive and receptive skills are important to understand a child's overall language development. However, receptive language development precedes language production and ultimately launches expressive abilities, thus building the foundation for the development of literacy (Muller & Brady, 2016). childhood (Gilkerson et al., 2018). Language prediction is important for the development of intervention programmes for supporting parents, preschool teachers and therapists to create optimal learning environments. Strong receptive language skills give preschool children a head start in incremental language processing (Venker, Edwards, & Weismer, 2019). Incremental language processing allows the child, as a listener, to understand pieces of conversation before the utterance is even finished and is a discriminating feature in differential diagnosis of autism spectrum and developmental disorders (Venker et al., 2019). Receptive vocabulary has a significant effect on literacy skills in South African studies (Pretorius & Stoffelsma, 2017;Wilsenach, 2015). Children with severe reading fluency and comprehension problems showed poor receptive language, which lead to problems with expressive language (Frazier, 2011). Hence, when probing into language development and preliteracy skills in preschool children, receptive vocabulary is one of the crucial skills to evaluate.
As much as there is a strong case for obtaining information on children's receptive language skills and despite available policy frameworks, there is a challenge in gathering such information using formal clinical assessment methods. There are difficulties in developing clinical tests for assessing children using their own home languages taking into account various cultures and multilingualism in South Africa. We acknowledge the known historical challenges in South Africa relating to the paucity of research on language acquisition, limited publications on minority indigenous languages, few therapists who are able to service clients from various indigenous African languages and an overburdened healthcare system (HPCSA, 2019). Thus, not much has been reported on preschool children in relation to language acquisition for clinical purposes to assist with setting realistic therapeutic expectations for isiZulu speakers (Jordaan & Kunene-Nicolas, 2016).
It is therefore imperative that progressive work conducted in isiZulu should be highlighted and lessons learnt from those processes should be used. These include the Zulu Expressive Receptive Language Assessment, which looked into various aspects of morphology, syntax and semantics for preschool children (Bortz, 1995); the problem-solving Test of Ability to Explain for Zulu-speaking children (Solarsh & Alant, 2006); the vocabulary checklist, used mainly in research, for children aged 2-4 years called the Communicative Development Inventory, which has been adapted to various African languages including isiZulu (Alcock et al., 2015); as well as adaptations of language tests into isiZulu, including the Renfrew Action Picture Test (Mdlalo, 2015) and the British Picture Vocabulary Scale (Cockcroft, 2016).
Challenges such as translations influencing sensitivity (Knauer, Karinger, Jakiela, Ozia, & Fernald, 2019), bias and limited comparison to norms (Weber, Fernald, Galasso, & Ratsifandrihamanana, 2015) experienced in test adaptations justify the need to develop local tests especially for isiZulu, the most widely spoken South African language. The aim of our study was to develop a one-word receptive vocabulary test appropriate for screening and diagnosis of isiZuluspeaking preschool children. The objectives were (1) to determine the sensitivity and specificity of the Ingwavuma Receptive Vocabulary Test (IRVT) and (2) to determine the relationship of IRVT scores with age, gender, time and the confounding variables of stunting and school.
The Bio-ecological Model of Development framed this study considering that a multitude of micro-and macro-systems interact to affect cognitive and language development of any typical child (Bronfenbrenner & Morris, 2007). Since some factors have a distal and some proximal influences on children's cognitive and language development, our study considered elements from the school-, community-and home-level systems. These include poverty, child-headed homes, poor educational settings and the shortage of water and sanitation ( Figure 1). Policy-level issues in the political, educational, health and socio-economic systems affect variability in the linguistic system and have a distal effect on culture and language in the area (Mazibuko, Flack, & Kvalsvig, 2019). Considering that receptive language has an effect on literacy, its impairment may contribute to a language-based learning disability (Lb-LD), a condition highly influenced by genetic or neurological conditions and other systemic imports as indicated in Figure 1.
The interaction of microsystems contributes to not only variations in lateral extensions (variability across all children of a specific age) during development but also in temporal variations (normal variations over time) on language development (McGuinness, 2005).Therefore, socio-economic status (SES) was considered an influencing demographic factor for language development and formed part of risk factors considered in this study as supported by literature on language development in children (Gurgel, Vidor, Joly, & Reppold, 2014  causes irreversible impact on physical and cognitive growth in children (UNICEF, 2020). The effect of stunting on receptive vocabulary was therefore considered in this study because of its direct link with SES and its effect on cognitive development.
We considered that isiZulu-speaking children's inventory has been linked to linguistic input at home (Kunene-Nicolas & Ahmed, 2016). Therefore, the influence of a child's background and developmental history on both the quality and quantity of language development are critical to language development (McLaughlin, Sheridan, & Nelson, 2017). Since the risk factors for speech and language impairment include gender, ongoing hearing problems and a reactive temperament whilst protective factors are maternal well-being, biological and psychosocial factors intrinsic to the child, we sought to investigate the effect of gender, age and school on receptive vocabulary (Harrison & McLeod, 2010).

Study design
This study was a quantitative, cross-sectional, descriptive study. The study area was Ingwavuma, a small town located in the northern KwaZulu-Natal province ( Figure 2). It is close to the borders of eSwatini to the west and Mozambique to the north. It is managed by the Jozini Municipality under the uMkhanyakude district municipality. This municipality is rural, with 89% of the population residing under the jurisdiction of traditional authorities (Statistics South Africa, 2011).
Participants were recruited from 15 preschools and four early childhood development (ECD) centres located in the traditional rural areas of Ndumo, Makwakwa, Manyiseni and Skhemelele within Ingwavuma (red flags in Figure 2). All children above 4 years of age in each preschool were considered for the study, but only those who returned the consent forms and satisfied the inclusion criteria participated. Stratified sampling was used in order to achieve adequate gender and age representation. To be eligible for the study, a child had to be of age between 4.0 and 6.11 years, attend an isiZulu medium preschool or ECD centre in the target area, have no developmental delays, be monolingual isiZulu speaker and pass a hearing screening test.
The government-supported ECD centres in Ingwavuma are located in communities as places of care for children between 3 and 5 years old, operating independently from schools or other non-governmental structures like churches. Based on the teachers's reports, children between 5 and 6 years, depending on their birthday, are placed in grade R at the nearest preschool attached to a primary school and sent to  grade 1 on the year they turn 7 years. The teachers from ECD centres had high school education and were trained locally through workshops organised annually by the Department of Basic Education, whilst teachers placed in preschools attached to a primary school had a basic teaching diploma. The ratio of teacher to students was 1:60 in ECD centres and 1:45 in grade R classes. All preschools that participated had pit toilets and no running tap water. They also had at least one rainwater harvest tank within the facility and provided one meal a day for the children. In our observations on all test dates, the meals did not include green leafy vegetables or meat. Although all schools had government-supplied grade R books, only eight preschools had puzzles or board games and only two schools had a computer for the use of the teacher. None of the preschools had a TV for the children to watch and none had Internet access.

Procedures
The first step of the assessment process was hearing screening, which entailed otoscopic examination and tympanometry (Harrison & McLeod, 2010). A nutritionist determined the participant's anthropometric data by calculating body mass index for age (weight in kg) and height for age (height and arm circumference in cm) to determine stunting (World Health Organization, 2011). The children were classified as having: (1) mild, (2) moderate or (3) severe levels of stunting. The prevalence of stunting was 26% with the majority of the participants found to have adequate nutrition. The IRVT was then administered to each child in a separate area or room. Each participant was tested individually by a clinician, in the presence of a community research assistant to make observations of behaviour. The child sat side by side so that both the child and the clinician could see the pictures.

Data analysis
Data analysis measured Pearson's chi-square and independent samples T-tests identified relationships and effect size of those correlations between receptive vocabulary scores and dependent variables (Price & Jhangiani, 2013). Multivariate logistic regression model was fitted to identify independent predictors of receptive vocabulary amongst the children. Scale analysis was conducted via Cronbach's alpha and split half correlations, where a p-value less than 0.05 was considered statistically significant. The study was approved by the University of KwaZulu-Natal. Parental consent forms were signed for all participants.

Developed test description
The IRVT was designed to evaluate receptive vocabulary in isiZulu, including nouns, verbs, adjectives and categories. The blueprint of the test was influenced by typical one-word vocabulary tests (Dunn & Dunn, 2006;Brownwell, 2000). It was essential that the test was developed in conjunction with mother-tongue speakers of the language to increase cultural sensitivity and the development of procedures and items evolved from our pilot study (Holding, Abubakar, & Kitsao-Wekulo, 2010). Appropriate isiZulu words for the age group of 4-6 years were selected by the principal researcher, as a first-language isiZulu speaker and a speech-language therapist with a 20-year practice experience. The word choices were informed by the northern Zululand dialect and isiZulu grammatical structure, which entails a high number of noun classes and a prefixed grammatical morphology, that is, use of vowels as prefixes before nouns and verbs (Magagula, 2009). Code-switching between English and isiZulu is very common in townships and urban areas but not in rural areas or small towns as it is influenced by local culture (Magagula, 2009). However, code-switching was allowed in the IRVT to accommodate learners from different backgrounds and the sound R was included in the word Zebra as this was an acceptable choice of the word in our sample.
High-frequency vocabulary was selected from a South African Curriculum Assessment Policy Statements (CAPS) grade R and grade 1 life skills student book, whilst pictures were collected from public inventory on the Internet (Department of Basic Education, 2019). Words in noun class were the majority in the target list based on the findings that nouns formed the majority of initial words used by toddlers from ethnolinguistically diverse backgrounds in South Africa (Gonasillan, Bornman, & Harty, 2013).

Pilot study
The first pilot study was conducted in Durban with 10 preschool children from a crèche in Inanda and five PhD students involved in developmental studies. This entailed development of the initial blue print of the test. The second part of the pilot study was conducted on the study site in Ingwavuma in order to finalise the word list adjust for dialectal changes and determine grading of the word list. Observations and experiences of two community research assistants, a second speech therapist and local preschool teachers were used to evaluate the vocabulary choices. The 80% acceptance level was used for inclusion of word choices in the pilot study. The word list was then graded in complexity by the principal researcher, using the grade R and grade 1 CAPS life skills books, with 20 targets at level 1 approximately to 4-5 years of age, 15 at level 2 approximately to ages 5-6 years and 15 at level 3 approximately to 6 years and above. A 50-page picture book was prepared with four pictures per page, full colour and black-and-white pictures, arranged in graded level of difficulty (Table 1). The illustrations contained one target and three distractors where the primary distractor had to be in the same category; the secondary distractor could be associated with the target and one random picture. The instruction was for the child to point to the correct picture amongst three other distractors. A raw score of one point per target was allocated using a kobo collect app, an open-source platform used for collecting and analysing data (Palla, LeBel, & Chavernac, 2016).
For the purpose of this research, all children were started off in item 1 as a baseline and completed all 50 items bearing no ceiling item. Using floor and ceiling items adjusts a starting level according to age and allows the tester to stop the test when a specific number of errors are made (Weitz, 2008). However, in feasibility studies, using floor and ceiling items is known to reduce the ability to distinguish between children of differing abilities (Prado et al., 2010).

Test evaluation
The pilot study indicated challenges with picture legibility and vocabulary choices because of the lack of exposure, attention to detail or a combination of distractors. Example of distractor error was the 'doctor versus nurse' pictures where the presence of both choices introduced confusion. Similarly, for the target word ladder, it included a picture of stairs as a distractor, which led to confusion as the isiZulu word isi-tebhisi could apply to both ladder and stairs. The picture of a net resulted in errors because of the lack of familiarity with the western image of a handheld net as opposed to a fishing net. For the target word vegetable or imifino, the picture choice was initially a carrot, which turned out to be an unpopular choice as imifino usually refers to all green leafy vegetables in isiZulu culture. A picture of spinach was then a solution. Fourteen words matched to easy level with a prevalence between 90% and 100%, whilst moderate had a prevalence between 60% and 80% and 10 difficult items had a prevalence below 50%. These items were not in the expected sequence, implying that some words were more difficult than expected as shown in Appendix 1.

Ethical consideration
This article followed all ethical standards for a research article.

Sample characteristics
The majority of the children fell within the 5-year-old group (mean age = 5.7 years, standard deviation [SD] = 0.852 years). There was less representation of children in the 4-year-old group and above 6.0 years of age. A Shapiro-Wilk's test (p < 0.05), a visual inspection of the histograms, normal Q-Q plots and box plots showed that the test scores were approximately normally distributed for both males and females, with a skewness of -0.401 (standard error [SE] = 0.464) and a kurtosis of −0.288 (SE = 0.902) for girls and a skewness of 0.107 (SE = 0.464) and a kurtosis of 0.514 (SE = 0.902) for boys. The mean for the total raw score was 34 (SD = 5.13) and the range was 26. Figure 3 shows the distribution of mean scores, indicating a minimum raw score of 20 and a maximum of 46. The 25th percentile was 30, 50th percentile was 34.5 and 75th percentile was 38.
The results indicate significant differences in performance by gender (r = 0.032, 1 tailed p < 0.05), which is interesting since gender was equally represented in the sample. The males had a higher mean than females, with a correlation to gender was supported by a moderate Cohen's d (0.521). The results show that children between 4.0 and 5.0 were able to achieve a minimum raw score of 20. Table 2 shows the performance of children in different age bands on the IRVT age bands.
The results indicate significant differences in performance by age with Pearson's correlation of r = 0.028 (1 tailed, p < 0.05), which was supported by a high effect size as Cohen's d = 0.949. The mean time taken to complete the test (TTT) was 23.14 min, with no correlation of TTT to gender or age.

Effect of socio-economic factors: Stunting and school
The prevalence of stunting was 26% in our sample and the results showed no significant correlation of test scores to stunting. Bivariate correlation measurement of school attended to test scores was significant as r = -0.262 (1 tailed, p < 0.05.) and showed a small effect as Cohen's d = 0.353. There was no correlation between school to age and TTT.

Reliability
Reliability across items was measured through two tests of internal consistency, namely, the Cronbach's alpha and split half correlation (Price & Jhangiani, 2013). The results indicated adequate Cronbach's alpha (α = 0.74) and less than ideal Guttman's s split half coefficient (0.642), whilst the split half range was satisfactory at 0.834 for part 1 and 0.957 for part 2. For analysis of reliability over time, we used testretest reliability. Eight children (15%) were available for retesting and the results shown in Table 3 indicate a correlation above the ideal of +0.8 and a standard error of measure close to a score of 1.

Validity
A diagnostic test is usually validated against a golden standard that evaluates the same subjects (Wong & Lim, 2011), and in the absence of such an equivalent standard, four measurements of test performance can be used, namely, sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) where the ideal result would be 100% (Wong & Lim, 2011). For our study, the PPV indicated the likelihood that the child who scores well on the IRVT has good receptive vocabulary and the NPV indicated the proportion of the children who have a poor vocabulary amongst those who test negatively on the test. The results showed that five items had both the PPV and NPV less than 50%, thus requiring a review. The IRVT showed an overall sensitivity of 66.7%, reflecting that those who tested positively were true positives, whilst a 33% specificity reflects a correct identification of true negative results or delayed receptive vocabulary.

Discussion
The IRVT included all possible candidates for assessment whilst aiming for absence of bias in language, equal opportunities, equitable content and use of comprehensive but clear illustrations. Legibility of illustrations (how illustrations and response format have the capability of being deciphered with ease) was ensured though qualitative observations by local people (Johnstone, Altman, Thurlow, & Thompson, 2006). We also monitored item length, avoided words with dual meanings and used high-frequency local vocabulary in the blue print as recommended and these were reflected in the high item prevalence scores (Dempster & Reddy, 2007).
For the analysis of reliability, test items that carried a variance of 1% or 100% score were automatically excluded by the software which negatively influenced the Cronbach's alpha but the average was still adequate, suggesting that all remaining items measured the same construct. Since there was only one tester, the inter-rater reliability could not be measured; to compensate for this limitation, rigorous analysis of internal consistency and validity were done, which showed satisfactory reliability and sensitivity for language scores.
The TTT mean of 23 min was much higher than the reported mean of 8 min with similar standardised tests (Prado et al., 2018;Weber et al., 2015). The longer TTT can be explained by the lack of basal or floor and ceiling items when the test was administered. By not using floor and ceiling items, the IRVT matched methodological modifications seen in test adaptation studies (Pakendorf & Alant, 1997). Qualitative observation by the tester also indicated extended verbal interaction between the examiner and the children, when establishing rapport, especially the younger participants who contributed to longer TTT. Additionally, in our study, the test was untimed and allowed children enough time to explore the pictures. These are factors that are usually controlled in standardised tests (Educational Psychological Service, 1991). Our findings show, through item prevalence reports, that item grading and test item sequences need adjustments for possible determination of floor and ceiling items with a larger sample.
The prevalence of stunting in our study matched trends in other African countries and similar to other studies conducted in other rural areas of KwaZulu-Natal, where there was a need for additional school resources, poor infrastructure, overcrowded classrooms, a need for inservice training for teachers and a need for more meaningful parental involvement (Hannaway, Govender, Marais, & Meier, 2019). We noted that children from low SES background performed below the normative mean in receptive vocabulary tests, such as the Peabody Picture Vocabulary Test (PPVT) (Allison, Robinson, Hennington, & Bettagere, 2011;Horton-Ikard & Weismer, 2007). Our results showed no correlation between test performance and stunting, probably because of the mild degree of stunting in the sample. We noted a high risk of poverty conditions in the area and that the participant's home nutrition needs further investigation.
Similarly, the school factor introduced no variation in the performance of children on the IRVT. This similarity does not reduce the significance of considering the children's context and bio-psycho-social factors when assessing language. Factors such as teacher experience, level of skills, play behaviour and learning resources have been proven to directly impact learning outcomes, whilst factors like water source and sanitation contribute to cultural variations within societies (Feza, 2018;Spencer et al., 2017). The performance of the children on this test is comparable to other studies in terms of the age effect on the vocabulary (Kunene-Nicolas & Ahmed, 2016). The results are unique in that boys performed better than girls, whilst being male has been associated with risk factors of speech and language impairment (Harrison & McLeod, 2010).

Limitations of the study
The IRVT needs to be evaluated with a larger sample size for evaluation of the adjusted word list, illustrations and for generalisability of the results. Although the IRVT achieved moderate sensitivity, there is an indication from the PPV and NPV that adjustments to the vocabulary choice, grading and presentation order of items should provide better results. Furthermore, the test item selection was based on the opinions of experienced adult language users and CAPS books. We acknowledge the subjectivity associated with the process and the technical exclusion of the 4-year-old group that was not the target for the books. Another limitation of the study was a small sample size and a lack of an in-depth individualised socio-economic questionnaire which could have contextually ascertained the level of risk for speech and language delays (Hoff, 2013). The strength of the IRVT is that it has a single purpose and does not require in-depth training.

Conclusion
Receptive language lays the foundation for expressive language development and allows for scaffolding of literacy skills, such as reading and writing. A test of receptive vocabulary in isiZulu is a valuable tool in baseline school readiness assessment of isiZulu-speaking preschool children to achieve early intervention. The children in this study took an average of 23 min to complete the one-word receptive vocabulary test in isiZulu and obtained an average score of 35, with boys performing statistically better than girls, whilst a significant variation in scores depended on age and school attended. The IRVT is a promising screening tool appropriate for preschool children from low SES communities. However, there is a need to adjust the test structure, evaluate it using a larger sample size, develop a complimentary intervention programme and align it with educational goals.