Narrative Discourse Productions in Older Language Impaired Learning Disabled Children: Employing Stricter Reliability Measures

This study aimed to describe the narrative discourse productions of older language impaired learning disabled (LILD) children, using stringent reliability measures. Coherence and cohesion were the measures of analysis employed. Content and clarity ratings provided a subjective analysis of narrative productions. Interrater and intrarater reliability measures were calculated, and testing for stability of scores across three testing sessions was undertaken. The results indicated subtle differences in the coherence and cohesion of narrative productions in the LILD subjects compared with controls. The findings of this study support past literature, which calls for greater research in this area using stricter reliability measures.


If narrative is an organiser of human experience, what better and more relevant mode of expression can we use to probe the abilities and disorders that shape human communication? (Bruper, 1990, pg 286)
Assessment of discourse in older children and adolescents allows a clinician to examine a child's ability to manage larger units of discourse as well as to examine their ability to process coherent oral and written texts (Nelson, 1993). Learning disabled children have been found to present with problems in discourse forms, which either supersede linguistic deficits or occur in the absence of structural linguistic problems (Roth and Spekman, 1986).
According to Klein (1991) learning disabled children may find it difficult to communicate in a free flowing, creative and interesting manner. Children who present with expressive difficulties may produce circumlocutions and confabulations while searching for the correct word, while others may talk endlessly but their language is empty and repetitive (Klein, 1991).
The popularity of obtaining and analysing samples of children's discourse in combination with traditional language assessment procedures has increased over the past two decades (Gallagher, 1983; Stickler, 1987; Owens, 1995 in Hux, Sanger, Ried & Maschka, 1997). The popularity can be attributed to the limitations of standardised tests and the limited information these tests provide a clinician (Hux, Sanger, Ried & Maschka, 1997).
Narrative discourse tasks have been found to be good instruments to assess higher level language and cognitive skills (Paul & Smith, 1993). Past research on narrative abilities in learning disabled children has produced many contradictory results. Roth and Spekman (1986) and Ripich and Griffith (1988) found that learning disabled students perform more poorly than normal peers in some aspects of narrative productions, but in other aspects perform equally well. Other studies have stated that learning disabled children produce less cohesive narratives than non-learning disabled persons, with the main difference being attributed to the use of pronouns as referents and conjunctions as tie elements (Strong & Shaver, 1991; Liles, Duffy, Merritt & Purcell, 1995).
On the other hand, some studies have found that in many aspects learning disabled children do not produce narratives that are significantly different from those of normal children. Ripich and Griffith (1988) found that children with learning disabilities are able to organise their stories according to an appropriate story grammar. Liles et al. (1995) support these findings, as they state that learning disabled children appear to be using, or attempting to use, information typical of a normally developing child regarding production of narrative discourse. In addition, past studies have shown that the differences between the language impaired learning disabled and control groups are small and insignificant or none at all (Ripich & Griffith, 1988). This indicates there are often more similarities between the groups than differences (Roth & Spekman, 1986). Strong and Shaver (1991) attribute the conflicting results to the unreliability of scores given to the analysed elements of the discourse. With all the attention being placed on the method of sampling language, there is still little agreement as to the most reliable and valid procedure (Morris-Friehe & Sanger, 1992).
In more recent studies validity and reliability have become increasingly important aspects of language sampling and analysis of discourse productions (Strong & Shaver, 1991; Morris-Friehe & Sanger, 1992). Hux et al. (1997) highlight many methodological issues that can contribute to the reduced reliability in such research. The variability of language across context and tasks, the diverse nature of language, little normative data on older children and the heterogeneous nature of clients all contribute to the controversy in this area of research (Morris-Friehe & Sanger, 1992; Nelson, 1993; Hux et al., 1997).
Reliability becomes an important concern when discourse analysis highlights the pragmatic nature of language, resulting in analysis that sometimes offers vague descriptions (Hux et al., 1997). In order for results to be interpreted with confidence, researchers have collected data across a number of different testing times spaced at brief intervals to determine whether the scores fluctuate greatly across testing sessions (Strong & Shaver, 1991; Morris-Friehe & Sanger, 1992). Strong and Shaver (1991) further recommended the use of important reliability measures such as intra-coder agreement, internal consistency of responses and stability of scores across testing sessions before the researcher can generalise any results to the general population.
In summary, a review of the literature reveals conflicting findings that show a clear need for further research into the production of discourse in the language impaired learning disabled population. Furthermore, strict reliability measures are required. This study has two aims: firstly, to assess and describe the narrative abilities of older language impaired learning disabled children and, secondly, to employ strict reliability measures.

AIMS

Primary Aims
The primary aims of this study are: 1) to describe the narrative discourse productions of older language impaired learning disabled children; 2) to employ stringent measures of reliability.

More Specifically
The specific aims of this study are: 1) to analyse the language impaired learning disabled narrative discourse productions on a macrostructure level, using the measures of coherence and cohesion; 2) to determine inter- and intracoder agreement and stability of scores across testing sessions.

RESEARCH DESIGN
A parallel case study design was employed in this study, as it was considered the most appropriate method of research in the learning disabled population owing to the heterogeneity of this population. The heterogeneous nature of the learning disabled population has been recognised for many years (Wiig & Semel, 1986; Morris-Friehe & Sanger, 1992; Nelson, 1993).

Sample size
Three language impaired learning disabled children (LILD 1, LILD 2 and LILD 3) between the ages of 11 and 13 years were assessed in this study. Three non-learning disabled persons with no history of learning disabilities or any other neurological or behavioural disorders were included in this study as control subjects (C1, C2 and C3). These control subjects were matched for age and sex. The control subjects were not required to match for grade, as the LILD children had repeated grades at school. The inclusion of these non-learning disabled subjects allows for the validation of the tasks and the obtained results.

Subject selection criteria
A number of criteria were applied in the process of selecting the three subjects. The subjects were required to: 1) have been diagnosed as language impaired learning disabled by a speech and language pathologist and other professionals; 2) be attending a school for the learning disabled; 3) be impaired in the receptive and/or expressive areas of language; 4) have a diagnosis of learning disability not attributed to cultural differences; 5) have an average or above average non-verbal I.Q., ranging from 85 onwards, as determined by a formal intellectual ability test; 6) be in the 11.0 to 13.0 year age range; and 7) be first language English speakers. The choice of age group was influenced by developmental aspects of narrative discourse. Applebee (1978) stated that children over the age of six years are able to produce an ideal or adult-like narrative structure, although development continues over the age of ten years.

Subject Description
The biographical and clinical information of the three LILD subjects is presented in Table 1.

DATA COLLECTION AND INSTRUMENTATION
The researcher met with each subject and control subject three times. In each session one story was elicited. A total of 18 narrative samples were collected, three from each subject. The procedure of collecting a narrative sample three times was undertaken for the following reasons: Firstly, to examine the stability of narrative productions across a number of testing sessions. Secondly, to determine whether more than one sample of the child's narrative yielded more reliable results when analysing the appropriate elements.
Thirdly and finally, three narrative samples were collected to increase the length of the narratives by combining the length of all three stories. Cole stated that multiple samples taken over a short period of time were more useful than one sample taken at a single point in time and presented with more lexical information than one sample. Furthermore, Liles (1993) stated that a story of substantial length ensures the narrative is representative of the child's narrative ability.

Reproduced by Sabinet Gateway under licence granted by the Publisher ( dated 2012)

STIMULUS MATERIAL
The narrative samples were elicited using three wordless picture books: Story A: "Moonlight" by Jan Ormerod; Story B: "The Snowman" by Raymond Briggs; and Story C: "The Angel and the Soldier Boy" by Peter Collington.

NARRATIVE LANGUAGE SAMPLE
A story-retelling task was employed in this study for the following reasons: past research has shown that story retelling is much easier than story generation or creation (Ripich & Griffith, 1988). Story generation requires a person to formulate a story without a model, which can lead to problems in the LILD where their ability to organise events on their own and construct cohesive narrative language may be reduced (Ripich & Griffith, 1988). Story retelling also provides information regarding the more salient features of narrative such as memory for structure and cohesive devices (Ripich & Griffith, 1988).
An advantage of story retelling is that it allows the child to tell a story that is more complete and has sufficient length (Strong & Shaver, 1991). Merritt and Liles (1989) state that story retelling was found to be more useful, as the stories contained more grammatical components and more complete episodes for both LILD and normal language subjects. They also found that retold stories allowed for easier transcription and were more reliably scored than generated stories.

LISTENER FAMILIARITY
In this study the naive listener condition was adopted. Research has found that when the listener adopts a naive listener role the narrator will produce a higher number of cohesive devices, especially personal references (Liles, 1993). In addition, Purcell and Liles (1992) state that the naive listener condition also results in longer narratives and more coherent productions than a non-naive listener condition.

Transcription of the narrative sample
The researcher transcribed the videotaped discourse samples. The transcribed data was prepared for analysis by bracketing all false starts, repetitions and unintelligible utterances. The bracketed words were not included in the final word count but counted as a separate unit. Rules for counting the number of words were adopted from Strong and Shaver (1991). The transcribed data was segmented according to an "event". For the purpose of this study an event was defined as a change in place, character or action.

Coherence
The narrative samples obtained in this study were analysed according to Labov's (1977) narrative structure elements that constitute a well-formed narrative:
• Abstract: This refers to the one or two clauses summarising the whole story.
• Orientation: At the beginning of the narrative the time, place, persons and their activity or the situation are defined.
• Complicating action: This is the sequence of events, which is presented chronologically.
• Evaluation: Various elements are used to express the narrator's feelings about the characters or events.
• Resolution: These are one or more statements that reflect the final events or end the experience.
• Coda: These are the free clauses that indicate the narrative is finished.

Evaluation
Evaluation functions to specify why the narrative is being told and what the point of the narrative is (Labov, 1977). This is achieved through the narrator's comments. Evaluation is also a means by which the narrator can attribute emotion to any character or event (Liles, 1993). Labov (1977) proposed that evaluation consists of four major categories: intensifiers, comparators, correlatives and explication. Each category consists of a number of subtypes. See Labov (1977) for a full description of the subtypes.

Cohesion
According to Halliday and Hasan (1976) cohesion refers to the relations of meaning that exist within the text and define it as a text. Cohesion is expressed through grammar and vocabulary, which are referred to as grammatical cohesion and lexical cohesion, respectively. Grammatical cohesion consists of different types of ties, namely reference ties, substitution ties and ellipsis ties, while conjunction ties are mainly grammatical but contain a lexical component. Lexical ties are another important tie in cohesion (Halliday and Hasan, 1976).

Cohesive Ties
This refers to the percentages of reference ties, substitution ties, ellipsis ties, conjunction ties and lexical ties used in the text (Halliday & Hasan, 1976). The percentage of each tie was calculated for each narrative sample and statistically analysed. The types of cohesive ties analysed in this study (as described by Halliday and Hasan, 1976) were as follows:
• Anaphoric reference: This is defined as pronouns that refer to previously identified nouns. For example: ... Suzie was at home playing with her dolls.
• Demonstrative reference: This refers to the use of terms 'this, that, here and there'. For example: ... there was a girl and her mother and her father.
• Ellipsis: This allows the speaker to reduce redundancy in a message by only encoding the essential elements. For example: ... she went back.
• Substitution: This refers to items other than personal pronouns that replace previously identified elements. For example: ... she likes big boats because the little one was small.
• Conjunctions: These serve a cohesive function as they relate successive utterances to each other. For example: they were quite happy because it was quite big.
• Lexical ties: A lexical item refers back to another lexical item, and is related by having a common referent. For example: ... he steals some fruit for the nose and some berries for the eyes. The items fruit and berries are a lexical tie, as are nose and eyes.

Luanne Henshilwood and Dale Ogilvy

Cohesive adequacy
Cohesive adequacy refers to the percentages of complete, incomplete, and erroneous ties (Liles, 1985). The definitions adopted for this study for complete, incomplete and erroneous ties were as follows (Liles et al., 1995):
• Complete tie: A complete tie is determined when the cohesive item that the tie refers to is easily identifiable and can be defined with no ambiguity.
• Incomplete tie: A tie is incomplete if the item referred to by the cohesive marker is not given in the text.
• Erroneous tie: A tie is erroneous if the cohesive marker refers to an ambiguous or erroneous item.
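As an illustration only, the percentages of complete, incomplete and erroneous ties defined above could be tallied as in the following Python sketch. The function name `adequacy_percentages`, the label strings and the sample counts are hypothetical and are not taken from the study; the sketch simply assumes that each category count is divided by the total number of ties judged.

```python
from collections import Counter

# Labels for the three adequacy judgements (hypothetical spellings).
ADEQUACY_LEVELS = ("complete", "incomplete", "erroneous")


def adequacy_percentages(judgements):
    """Return the percentage of ties falling into each adequacy level."""
    counts = Counter(judgements)
    unknown = set(counts) - set(ADEQUACY_LEVELS)
    if unknown:
        raise ValueError(f"unexpected adequacy label(s): {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        # No ties were judged, so every percentage is reported as zero.
        return {level: 0.0 for level in ADEQUACY_LEVELS}
    return {level: 100.0 * counts[level] / total for level in ADEQUACY_LEVELS}


# Invented judgements for the ties in one narrative sample (not study data).
sample = ["complete"] * 16 + ["incomplete"] * 3 + ["erroneous"] * 1
percentages = adequacy_percentages(sample)
print(percentages)
```

The same tally could be run separately for each tie type (for example, anaphoric reference only) by passing in only the judgements for that type.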

CONTENT AND CLARITY RATINGS
To supplement the objective analysis of this study, a rating system was devised to evaluate the "content" and "clarity" of the subjects' narrative discourse samples. This rating system also allows the researcher to compare the objective analysis with the subjective evaluation of the listener (Ulatowska et al., 1983). Ulatowska et al. (1983) suggested that "content" could be referred to as a rough measure of coherence, and "clarity" as a rough estimate of cohesion. The rating system used in this study was adapted from Ogilvy (1995) and Ulatowska et al. (1983). Three raters rated the content of the narrative, using a three-point scale and answering specific questions. Five questions were used in this section. Examples include "Do you know what is happening in this story?" and "Does the sequence of events make sense?" The clarity of the narrative was also rated using a three-point scale. Three questions were used in this section. An example of a question is "Is it clear to whom the narrator is referring throughout the story?"

RESULTS AND DISCUSSION
The findings of this study involve two parts. Firstly, the results of the reliability measures, and the implications thereof, will be presented. This will be followed by a discussion of the results of the analyses of the LILD and control subjects' discourse productions.

MEASURES OF RELIABILITY
Three raters, who are qualified speech and language pathologists, were involved in this study. Each rater received training regarding the methods of analysis. The subjective content and clarity ratings were made by adults who were not qualified speech and language pathologists, as formal training was not required.

Interrater reliability
In this study, inter-observer agreement was determined. This served to assess the extent to which different observers or raters agree that they 'see' the same phenomena (Hux et al., 1997). According to Hux et al. (1997) inter-observer agreement rests on a number of assumptions. Firstly, the raters must share an understanding of what trait is being rated. Secondly, the raters must be able to determine the occurrence or non-occurrence of what is being measured. Finally, the raters must have a common means of recording the occurrence of the targeted behaviour or trait. Pearson product-moment correlation coefficients were calculated to determine transcription and score reliability.

Transcription reliability
Transcription accuracy was determined by word-by-word reliability. This kind of transcription agreement is most commonly used in speech pathology research that uses transcriptions as a means of analysing data. Transcription agreement indices are generally calculated by a word-by-word agreement procedure. Twenty percent of each sample was randomly selected by two transcribers, who independently transcribed each selected sample of narrative productions (Strong & Shaver, 1991).

Coder reliability
An additional coder was trained to analyse a randomly selected 20% of each narrative sample (Strong & Shaver, 1991). The coder was trained for approximately 4 to 6 hours before interrater coding began. The coder then randomly selected 20% of each sample and independently coded each transcription for coherence and cohesion. This allowed the researcher to determine whether the method of analysis used was consistent across all narrative samples.
Point-by-point agreement was calculated, and an agreement percentage of 95% to 98% for intercoder reliability was obtained. A correlation coefficient of .99 was obtained, indicating high intercoder reliability.
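The study does not state the formula used for point-by-point agreement. A common convention, assumed here, is the number of agreements divided by the total number of coded points, multiplied by 100. The Python sketch below illustrates that convention; the function name `point_by_point_agreement` and the coded values are hypothetical.

```python
def point_by_point_agreement(codes_a, codes_b):
    """Percentage of coding points on which two coders assigned the same code.

    Assumes both coders judged the same points in the same order.
    """
    if len(codes_a) != len(codes_b):
        raise ValueError("both coders must code the same number of points")
    if not codes_a:
        raise ValueError("at least one coded point is required")
    agreements = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * agreements / len(codes_a)


# Invented codes for 20 cohesive ties (not study data).
researcher = (["reference"] * 8 + ["conjunction"] * 6
              + ["lexical"] * 4 + ["ellipsis"] * 2)
second_coder = list(researcher)
second_coder[19] = "substitution"  # the coders disagree on the final tie

agreement = point_by_point_agreement(researcher, second_coder)
print(f"{agreement:.1f}% agreement")
```

The same function could be applied to two word lists of equal length to approximate a word-by-word transcription check, although transcripts that differ in length would first need to be aligned.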

Intrarater reliability
Transcription reliability
Fifteen percent of each narrative sample was randomly selected for transcribing a second time by the researcher. Word-by-word reliability measures resulted in intrarater reliability for transcription averaging 97%. A correlation coefficient of .996 was obtained for intrarater transcription reliability.

Coder reliability
Fifteen percent of each narrative sample was randomly selected and scored a second time by the researcher. Word-by-word reliability measures resulted in intrarater reliability for coding averaging 98%. The correlation coefficient for intracoder reliability was .96, indicating high reliability.

Stability coefficients
The main function of the reliability measures presented in this research was to determine the relationship among scores obtained across the three testing times, i.e., to determine whether the measurement was consistent over the testing sessions. According to Hux et al. (1997) and Strong and Shaver (1991) calculating correlation coefficients is a common way of assessing the reliability of scores in discourse analysis. For the purpose of this study Pearson product-moment correlation coefficients and an Analysis of Variance (ANOVA) were computed to examine the reliability of the obtained scores. An alpha level of .05 was used for all statistical tests. Correlation coefficients were computed across the three stories to establish the stability of scores of evaluation, cohesive ties and cohesive adequacy across the testing sessions. As the stories were administered with identical instructions, they were considered suitable to be used as parallel forms for the calculation of the coefficients of stability.
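For readers unfamiliar with the statistic, the Pearson product-moment correlation between scores from two testing sessions can be computed as in the Python sketch below. This is a textbook formulation and not the authors' own procedure; the function name `pearson_r` and the scores for stories A and B are invented for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cross / sqrt(ss_x * ss_y)


# Invented percentages of one tie type for six subjects on two stories.
story_a = [10, 12, 15, 9, 14, 11]
story_b = [11, 13, 15, 10, 15, 12]
r_ab = pearson_r(story_a, story_b)
print(f"r between story A and story B: {r_ab:.3f}")
```

Repeating the calculation for each pair of stories (A with B, A with C, B with C) would give one coefficient per pair of testing sessions.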

Stability coefficients for evaluation
Analysis of variance (ANOVA) was computed for each subcategory of evaluation. These results are presented in Table 2. No significant differences were obtained, indicating there was no fluctuation of these scores across the three testing sessions. Therefore, it can be stated that there was stability across all testing sessions for evaluative elements.

Cohesive ties
Pearson product-moment correlation coefficients were computed to determine the stability of cohesive ties and cohesive adequacy across the three testing sessions. The results are presented in Table 3. The results indicate that the values for demonstrative reference, substitution, conjunction and lexical ties were not significantly different and therefore were stable across the three testing sessions. Values for anaphoric reference and ellipsis were significant and therefore were not found to be stable across testing sessions.

Cohesive adequacy
Statistical analysis of cohesive adequacy revealed significant correlation for most of the anaphoric reference and lexical ties. This shows that these scores were not stable

Die Suid-Afrikaanse Tydskrif vir Kommunikasieafwykings, Vol. 46, 1999

Substitution was the least used category, and was only used by C2, C3 and LILD1. Ellipsis and substitution were used less frequently than the other categories such as reference and conjunctions. This finding supports research documented by Rumble and Malan (1990).

Cohesive adequacy
No marked differences were noted amongst the LILD subjects and the control subjects regarding the cohesive adequacy of their narrative productions. In addition, as noted previously, measures of reliability indicated that these scores were unstable across testing sessions and need to be considered with caution.
However, it was of interest that the results from the analysis of the narrative productions revealed erroneous use of anaphoric referencing by LILD2, ellipsis by LILD1 and lexical ties by LILD3. These errors may account for the lower clarity ratings obtained by the LILD subjects. This is discussed in more detail below.

CONTENT AND CLARITY RATING
Content and clarity ratings were developed to compare the results of the objective analysis with a subjective analysis regarding the content and clarity of each subject's narrative sample of story B. In order to determine the interrater reliability for the content and clarity ratings, Pearson product-moment correlation coefficients were computed. A significant value was found between rater 1 and rater 2 and between rater 2 and rater 3 for cohesion. These raters therefore presented with high correlation values, and these ratings can be considered reliable. Most of the subjects received mixed ratings for content except for C1, who received "high" ratings. This could be explained by the fact that C1 produced the most accurate narrative according to the model, including all or most of the appropriate information. The other subjects were reported by the raters to omit information, or to include information that was not part of the model story.
The LILD subjects received lower ratings for clarity than the control group. This means that the raters judged the LILD group to produce narratives without the appropriate use of language, referencing and conjunctions. Objective analysis of the LILD narratives regarding cohesion concurs with the clarity ratings, as the LILD group tended to use a higher number of incomplete and erroneous cohesive ties.
LILD2 was shown to use the lowest percentage of complete anaphoric ties, and was judged to have the lowest rating of cohesion in his narrative of story B. LILD3 produced the fewest complete lexical ties in story B. This could explain why LILD3 received a lower rating of cohesion than LILD1 and the control group.

SUMMARY OF THE FINDINGS
A summary of the findings of the subjective and objective analysis is presented below:
• Amount of information: No difference was noted between the LILD and control subjects.
• Length of the narrative discourse productions: No variations were noted regarding the length of narratives produced.
• Temporal sequence: All subjects showed general preservation of temporal sequence.
• Coherence: According to the objective and subjective analysis of coherence, the LILD and control subjects generally used the necessary items needed for a well-formed narrative.
• Evaluation: All subjects used necessary elements of evaluation, although subtle variations were noted between the LILD and control subjects. Furthermore, the LILD subjects tended to use less complex elements and more ritual utterances and intensifiers than the control subjects.
• Cohesion: No marked differences were noted amongst the subjects for cohesive ties. Analysis of cohesive adequacy indicated that there could be difficulties with anaphoric referencing and lexical ties in the LILD subjects' narrative productions. These subtle deficits noted across all the LILD subjects in the objective analysis were supported by the lower subjective ratings received by all three LILD subjects on the clarity ratings.

CONCLUSION AND IMPLICATIONS
The use of reliable measures in the analysis of narrative discourse productions in this study allowed for findings to be adopted with greater confidence. Stability coefficients enable the researcher to know that the scores are representative of the samples produced by the subjects. Results that are not stable across testing sessions need to be considered with caution until further investigation can clarify the reliability of the scores being analysed.
One of the most important clinical findings of this study was that the LILD subjects were able to tell stories. This supports previous research, which states that LILD children possess the basic ability to produce stories (Roth & Spekman, 1986; Morris-Friehe & Sanger, 1992). These findings further support Liles et al. (1995), who state that LILD children appear to be using, or attempting to use, information typical of a normally developing child regarding the production of narrative discourse.
At times, objective analysis of the coherence and cohesion of all the subjects' narrative productions did not present with marked differences but rather indicated subtle differences. These findings support past studies which have shown that differences between LILD subjects and control subjects are small and insignificant or none at all (Ripich & Griffith, 1988).
Furthermore, the heterogeneous nature of the learning disabled population has often been highlighted in past literature. The variations found within learning disabled children often overlap with the variations found within the non-learning disabled population (Morris-Friehe & Sanger, 1992; Nelson, 1993).
Possible factors thought to have contributed to the findings presented in this study include the nature and complexity of the task employed. The material used in this study could have been too simple, with too few events or lacking the appropriate complexity to tap any existing linguistic deficits. It is believed that additional discourse tasks, and perhaps more complex tasks, are required in order to tap the more subtle language deficits found in older language impaired learning disabled children.
Another variable that could have influenced the findings in this study is the amount of speech and language therapy the subjects received prior to the study. Ulatowska, Hill, Thompson, Parsons and Wertz (1998) stated that this could play an important role in this area of narrative discourse research.

The South African Journal of Communication Disorders, Vol. 46, 1999

Finally, the amount of variation in the normal controls needs to be highlighted. The results presented in this study not only illustrated the subtle differences in the LILD subjects but also quite noticeably highlighted the individual variations found within the control subjects.
The differences amongst the LILD and control subjects were not always marked and sometimes the differences related merely to a matter of degree. Sonnenberg and Penn (1998) state that there is a wide variability across discourse measures that illustrate the features of normal subjects' discourse that are found to overlap with those of the clinical subjects. Sonnenberg and Penn (1998) provided a quote that captures the morality and issues involved in the research of narrative discourse:
Whose culture, experience, and value system are we to follow in making judgements of propriety? On what basis are we to say that a narration is too much or too little? One person's embellishment is another person's ingenuity. One person's digression is another person's interesting trip. (Davis, 1993 in Sonnenberg & Penn, 1998)

FUTURE RESEARCH IMPLICATIONS
Numerous future research implications have emerged from this study. A need to continue to establish valid and reliable indices of measurement for narrative ability has been highlighted. In addition, further research regarding the normal variation in narrative discourse productions in the non-learning disabled population is needed. Without this, interpretation of the analysis of the narrative discourse productions of language impaired populations is limited, and the diagnosis of language impairments from the analysis of narrative discourse remains questionable.
As stated previously, language impairments in the learning disabled become increasingly subtle and harder to identify with age. At present little information is available on narrative discourse in older language impaired children. Hence, further investigation in this population would enhance the treatment and rehabilitation of such a disorder.
Other aspects that could be considered for further investigation are the comparison of spoken and written discourse samples in the LILD population and the measurement of discourse across different elicitation tasks. Measurement of discourse ability across various levels of complexity could yield interesting findings and may even provide a means to tap the more subtle deficits noted in older LILD children's narratives.