Assessment of Speech in Neurological Disorders: Development of a Swahili Screening Test

Communication changes are a feature of many neurological conditions. Yet for many African languages there is a lack of specifically designed assessments of communication. Swahili is one such language. It is spoken as a first or second language by over 60 million people, predominantly in East Africa, and serves as a national language in Tanzania, Kenya, Uganda, the Comoros and the Democratic Republic of Congo. Nevertheless, there are no standardised Swahili assessments for motor-speech disorders. This article reports the development and initial validation of a Swahili protocol to evaluate speech changes in neurological disorders. It is intended for use by health professionals involved in managing people with neurological problems. As such it is an example of the challenge of devising what Pascoe and Norman (2011) have termed contextually relevant resources.
Speech (taken here to include voice) disorders affect an individual's ability to pronounce the sounds they need to say words and make themselves understood. Language disorders affect the ability to understand or retrieve words and sentences, whether in spoken or written form. Language and speech impairments can appear together or arise independently of each other. Our focus here is on speech.
Speech changes may represent the first or only signs of neurological change. Typically, though, they co-occur with a range of other motor and cognitive impairments. Speech assessment is therefore important from a (differential) diagnostic perspective. Clinically it can also provide sensitive, non-invasive outcome measures of decline or recovery and of the effects of drug, surgical and behavioural interventions. Speech evaluation plays an important role too because communication changes exercise a profound social and psychological influence on the person with the disorder and their family, even when changes are not so severe as to make speech unintelligible. Speech impairment can be measured in different ways.
A subsystems approach (Duffy, 2005; Kent, 2009) singles out breathing (capacity; control over inspiration-expiration; coordination with vocalisation), voice (laryngeal function; voice loudness, pitch and quality), velopharyngeal efficiency (degree of hyper- or hyponasality) and articulation (pronunciation of sounds and syllables) to assess underlying speech motor performance. It is also vital to assess the consequences that changes across these subsystems might have for overall intelligibility. In so far as breathing for speech, voice loudness, rate of tongue movements in producing sounds and similar measures would appear to reflect universal aspects of motor performance, it is tempting to assume that these batteries can be applied without further consideration to any language. This …

in which positions in words they differ in the nature and functions of stress and intonation patterns used. So, for instance, English does not have the implosives of Igbo (Nigeria) and Hausa (Nigeria, Niger and regions of West Africa) or the clicks of Nama (Namibia, Botswana). Luganda (Uganda) operates with 5 vowels, Twi (Ghana, Ivory Coast) with 15, English, including diphthongs, with nearer 20. Hindi (India) and English both use aspirated and non-aspirated /p/ sounds. While in Hindi they signal differences in meaning (/kapi/ (copy) - /kapʰi/ (meaningful)), in English they do not (their distribution is determined by surrounding sounds). Swahili and English both have a /ŋ/ sound, but in English this cannot occur in initial position in words. Both languages have /m/, /tʃ/ and /t/ sounds, but while the combinations /mtʃ/ (mchuzi, curry/sauce) and /mt/ (mtori, banana soup/porridge) at the start of a word are permissible in Swahili, they are not in English.
Contrasting intonation patterns form part of the grammatical systems of languages (e.g. signalling a statement versus a question), but how these are applied across languages is not uniform. Further, in most European languages changes in tone on a vowel signal predominantly affective nuances. In many other languages a system of tones operates that signifies contrasts in word and grammatical meaning, depending on whether the word is spoken with a rising, falling, high or low pitch (Wong, Perrachione, Gunasekera & Chandrasekaran, 2009).
A further important observation regarding cross-language differences in speech disorders centres on cultural variables in perceptions of change. While one may be able to ascertain that a person can generate a given force in the lips when pronouncing /p/ or sustain /a:/ at 60 decibels for 10 seconds, such measures bear no direct relationship to the speaker's and listeners' perception of whether, how and why that represents a problem. The acceptability of different rates of speech or loudness levels, the significance of different alterations to voice quality (breathy, creaky; harsh, soft) and the tolerance for amount and degree of dysfluencies are all strongly rooted in language- and culture-specific variables (Altenberg & Ferrand, 2006; Bebout & Arthur, 1992; Kita, 2009; Mackey, Finn & Ingham, 1997; Yiu, Murdoch, Hird, Lau & Ho, 2008).
Given that languages differ in their sound structure and use, and that the perception of change in neurological disorders is strongly influenced by sociolinguistic and cultural variables, it follows, as argued above, that an assessment devised around the structure and rules of one language will be invalid if it is simply translated and applied to a structurally different one. The solution is either to adapt an extant test to the structure of the new language or to create a test specifically tailored to the new language.
We aimed to create a screening assessment for speech-motor dysfunction in Swahili speakers covering quantification of underlying speech-motor impairment and activity limitation levels (intelligibility and speech naturalness) (Kent & Kent, 2000; Yorkston, Strand & Kennedy, 1996). The assessment is designed for health services with minimum resources and for clinicians from all backgrounds, since it may well be the case that no trained speech-language therapist (SLT) is available. The aim was therefore to select measures that could be accomplished with minimum training in application, scoring and interpretation, and without technical equipment beyond paper, pencil and (stop)watch and, if possible, a simple audio-recording device. In this initial development phase we also aimed to ascertain whether the assessment was fit for purpose, i.e. able to detect differences between speakers with and without a neurological illness and sensitive to possible changes in speakers over time.

Impairment measures
The underlying impairment to speech in neuromuscular disorders stems from alteration in the range, strength, sustainability, stability and co-ordination of movements of the muscles involved in breathing, phonation, velopharyngeal function and pronunciation (Duffy, 2005). Assessment of these variables is typically achieved through maximum performance speech tasks (Duffy, 2005; Kent, Kent & Rosenbek, 1987) that challenge the patient to produce a sound or word as fast, loud, long, high or low as possible. We followed a subsystems approach to assessment, adapting standard recommended clinical tasks with demonstrated validity (Kent, 2009) that assess breath capacity and control for speech, voice loudness, pitch and stability, and tongue and lip movement, and that fulfil the conditions required for minimal equipment and training (Appendix A). The following paragraphs elucidate.
Prolongation of /a:/ for as long as possible gives an estimate of air reserve for speech (Kent, 2009). The task can also serve as a basis for voice assessment through attention to perceived control and appropriateness of loudness and pitch, stability (tremor; inappropriate swings in pitch or loudness), and voice quality (e.g. harsh/strained (spastic) v. breathy/weak (flaccid), diplophonic (cord palsies)). These can be measured instrumentally, but for present purposes of minimal technical outlay they can be scored perceptually (naked ear) using rating scales. Control over loudness, pitch level and range can be further evaluated by asking the speaker to produce /a:/ at gradually increasing and decreasing loudness and gradually rising and lowering pitch levels.
Speech diadochokinetic tasks (Ackermann, Hertrich & Hehr, 1995; Gadesmann & Miller, 2008; Ziegler, 2002) gauge tongue and lip movement parameters. The sound in the syllable is chosen to challenge a given movement, e.g. /pa/ for lips, /ta/ for tongue tip, /ka/ for tongue dorsum. One can measure time to produce 10 repetitions or number of repetitions in 5 seconds. Qualitative observations record how well the individual is able to remain on target. For instance, do repetitions of /ba/ drift to what is heard as /ma/ because of velopharyngeal insufficiency? Does /pa/ drift to and from /ba/ from misco-ordination between oral and laryngeal gestures, or to /fa/ from decreased excursion or strength of lip movements? Important information (co-ordination of movements; speech planning; apraxic difficulties v. neuromuscular, dysarthric impairment) can be gained through alternating syllable tasks (see diadochokinetic tasks above). The individual repeats sounds as fast as possible that contrast in place of articulation, e.g. /pa-ta-ka/. Time to produce 5 or 10 repetitions and ability to remain fluent and on target are measured. Ideally real words are used (as done in the protocol with Swahili words paa, taa, kaa), (i) since this aids understanding of the task, and (ii) because they relate more closely to real speech performance (Clark, 2003; Kent, 2004).
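The two timing conventions mentioned above (a count within a fixed window, or the time taken for a fixed number of repetitions) can be put on a common scale of repetitions per second. The following Python fragment is a minimal sketch of that arithmetic only. The function names and the 5-second default window are illustrative assumptions and are not part of the protocol.

```python
def rate_from_count(repetitions: int, window_s: float = 5.0) -> float:
    """Repetitions per second when counting within a fixed time window.

    The 5 s default mirrors the 'number of repetitions in 5 seconds'
    convention; it is an assumption for illustration.
    """
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    if repetitions < 0:
        raise ValueError("repetitions cannot be negative")
    return repetitions / window_s


def rate_from_time(repetitions: int, elapsed_s: float) -> float:
    """Repetitions per second when timing a fixed number of repetitions."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    if repetitions <= 0:
        raise ValueError("repetitions must be positive")
    return repetitions / elapsed_s
```

Expressing both conventions as a rate allows scores from a counted 5-second trial and a timed 10-repetition trial to be compared directly, although the two elicitation methods may not be clinically interchangeable.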

Activity limitation
Impairment measures do not necessarily relate to how far changes affect communication (Hartelius & Miller, 2010). To gauge the impact of changes on day-to-day activity other assessments are required. For this purpose we included a diagnostic intelligibility screen and speech naturalness rating (Appendices B & C).
Diagnostic intelligibility tests (DIT) (Kent, Weismer, Kent & Rosenbek, 1989; Weismer & Martin, 1992) address the problem of extremely poor intra- and inter-rater reliability of rating scales for assessment of intelligibility (Schiavetti, 1992). In DIT patients repeat a list of words and a listener (with no knowledge of what the intended words are) responds with what they believe has been said. Depending on availability and/or aims of the assessment, listeners can be clinical colleagues, family members, or untrained strangers unfamiliar with the person's speech. Since scores can differ between listener groups it is essential that on retest (e.g. after therapy) the same scorers are used. By totalling words recognised and analysing the pattern of mishearings one achieves a measure of intelligibility, as well as suggestions for sound contrasts that a speaker might have difficulty signalling.
DIT depend on devising matched parallel lists of words that differ by one sound from each other (minimal pairs, e.g. tea-pea, pay-pie, coat-code), and that reflect the sound distributions, combinations and range of sound frequencies of the language. The protocol offers four parallel lists for Swahili following these principles. For administration, the examiner can either select one of the lists or, to minimise rater learning effects or retest familiarity effects where the test is frequently applied, select one word randomly from each row to arrive at varied but matched sets of 25 words.
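The row-wise random selection just described can be sketched in a few lines of Python. This is an illustration under stated assumptions: the lists are taken to be arranged as 25 rows of four matched words, and the placeholder strings produced by the hypothetical helper `placeholder_rows` stand in for the real Appendix B items, which are not reproduced here.

```python
import random


def placeholder_rows(n_rows: int = 25, n_lists: int = 4) -> list:
    """Build dummy rows; each row stands for four matched words, one per list.

    These strings are placeholders, not the Swahili words of the protocol.
    """
    return [[f"row{r}_list{c}" for c in range(1, n_lists + 1)]
            for r in range(1, n_rows + 1)]


def draw_word_set(rows, rng=None):
    """Pick one word at random from each row, preserving row order."""
    rng = rng or random.Random()
    return [rng.choice(row) for row in rows]
```

Passing a seeded generator, for example `draw_word_set(rows, random.Random(7))`, makes a particular draw reproducible, which may help when the same set has to be documented for a later retest.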
To estimate the overall impression of speech acceptability in the context of the gender, age and cultural expectations of the community, the screening test employs a 1-5 naturalness/disorderedness rating scale (1 = definitely a problem with speech; 5 = definitely no problem with speech) (Appendix C). Evaluation may be based on impressions from speech during general case history taking. The sentences included in Appendix B provide a more controlled task for comparisons across time and persons which are attuned to syllable (articulatorily simple v. complex), word (frequency), phrase length (shorter v. longer) and grammar and associated intonation (commands v. questions v. statements) patterns that constitute prominent variables in change perception. The sentences can also provide data to supplement ratings of pitch, loudness, stability of speech/voice (see above). Time to say the sentence(s) can be used as a reliable measure of speaking rate alongside previous diadochokinetic timing for maximum syllable rate.
In many neurological disorders speech output is more greatly affected when a speaker has to formulate responses themselves rather than repeat or read a prepared sentence (Bunton & Keintz, 2008; Ho, Iansek, & Bradshaw, 2002). Accordingly, speech examinations commonly include having a person describe an everyday activity (Kent, 2009). This affords a more realistic appraisal of the impact of the underlying impairment on day-to-day communication, as well as how speech production interacts with broader language and cognitive status. In the present protocol the tasks offer the possibility of examining contrasts in loudness, pitch, stability, voice quality and naturalness between simple repetition tasks (saying 'paa', repeating single words in the intelligibility test) and self-formulated speech while describing a common activity (Appendix A, making porridge).

Swallowing assessment
Swallowing and speech disturbances are not directly related to each other, but they do frequently co-occur and management of both often falls to the same person, in westernised countries typically an SLT. Hence we included as part of the screen the 150 ml water swallow test (Nathadwarawala, Nicklin, & Wiles, 1992) that has been shown to be valid and reliable at quantifying swallowing efficiency.

Validation procedure
The speech-motor and intelligibility measures chosen for the test have proven validity as measures of speech performance (Duffy, 2005; Kent, 2009; Kent et al., 1989; Ziegler, 2002). The current protocol was also reviewed independently by six SLTs specialising in acquired neurological speech disorders to judge its suitability as a screen for assessing acquired motor-speech disorders.
Our aim was also to examine whether conducting the test was feasible in a community with minimal training (given the lack of SLTs and requirement to conduct brief training of other professionals); whether it was acceptable to users; and whether the newly devised materials could potentially detect differences in performance across individuals with and without a neurological disorder. To this end we piloted the tasks on a group of people with Parkinson's disease (PD) and a group matched overall for age and gender who were non-neurologically impaired. This was to establish the feasibility of the materials and tests, not specifically to examine differences between people with and without PD, which is the subject of a separate report.

Participants
We assessed people with PD and control participants. They were recruited from a community-based prevalence survey (Dotchin et al., 2008) in Hai District, Tanzania. Participation was by voluntary informed consent following UK and Tanzanian Ethics Committee approved procedures.
Results are based on 26 people with PD (7 female) and the overall matched group without PD from the same district. The people with PD were assessed before they commenced medical therapy, and 19 of them were assessed again 12 months later, after 3 months on medication. Four had died and 3 were too ill for reassessment.

Procedures
A Tanzanian PD nurse specialist received 3 hours' induction and training from an experienced UK SLT covering the rationale and procedures for the test, how to make the sound recordings (Edirol R1 and AKG C420 head-mounted microphone) and scoring of items. Laminated directions sheets for field use for all tasks and for live scoring were provided.
People with PD were assessed before they commenced pharmacotherapy and approximately 12 months later after 3 months of levodopa treatment. Speech assessment by the nurse specialist took place in the participant's home at the same time as assessments of their motor, cognitive, mood and social status (often in the presence of other family members). Recordings were downloaded to laptop computer on site and returned on compact discs for cross-checking and analysis in the UK.

Data processing
Time for sustained 'ah' was noted at the time of assessment. Counts for number of repetitions in 5 seconds of 'paa' (roof in Swahili), 'kaa' (charcoal) and 'paa-taa (light)-kaa' were made at the time of assessment. Following standard practice speakers attempted each maximum performance task twice. The better performance was taken as their score. Results for /a:/ and syllable repetitions were compared across groups and time.
Speech rate for the sentence 'Wale watoto wanafanya kazi kwa bidii shambani' (those children are working hard in the field) was calculated from the acoustic waveform in syllables per second using PRAAT (Boersma & Weenink, 2011). Sound pressure level variability (standard deviation (SD) of mean fundamental frequency) was measured from PRAAT based on the same sentence.
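The syllables-per-second calculation can be illustrated as follows. This is a sketch of the arithmetic only: in the study, sentence duration was read from the waveform in PRAAT, whereas here it is simply passed in as a number. Counting vowel letters as syllable nuclei is an assumption that works for this particular sentence but would miss syllabic nasals (for example a word-initial m before a consonant), so it should not be read as a general Swahili syllabifier. The function names are hypothetical.

```python
SENTENCE = "Wale watoto wanafanya kazi kwa bidii shambani"
VOWELS = set("aeiou")


def count_vowel_nuclei(text: str) -> int:
    """Approximate the syllable count by counting vowel letters.

    Assumption: every vowel letter is its own syllable nucleus and there
    are no syllabic consonants, which holds for SENTENCE above.
    """
    return sum(1 for ch in text.lower() if ch in VOWELS)


def syllables_per_second(n_syllables: int, duration_s: float) -> float:
    """Speech rate given a syllable count and a measured duration in seconds."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return n_syllables / duration_s
```

Under this approximation the test sentence has 18 syllables, so a hypothetical production lasting 4.5 seconds would correspond to a rate of 4 syllables per second.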
To complete intelligibility test scoring, six native Swahili-speaking Tanzanian medical elective students studying in the UK heard recordings in random order of participants with and without PD saying words from the four parallel word lists (Appendix B). Recordings were played free field (Dell Inspiron laptop connected to Fostex Personal Monitor 6301B loudspeaker) in a quiet clinic office with volume setting the same for all tracks. They were blind to word-list number, speaker identity and group. Half the listeners heard tracks in reverse order. For each speaker they wrote down which word they believed they heard. The derived intelligibility score was the percentage of words correctly recognised across all listeners.
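The derived score, the percentage of words correctly recognised pooled over listeners, can be sketched as below. The matching rule (exact match after trimming whitespace and ignoring case) and the function name are assumptions made for illustration; the source does not state how spelling variants in listeners' written responses were handled.

```python
def intelligibility_score(targets, responses_by_listener):
    """Percentage of target words correctly written down, pooled over listeners.

    targets: the words the speaker was asked to say, in order.
    responses_by_listener: one list of written responses per listener,
    each aligned with targets.
    """
    correct = 0
    total = 0
    for responses in responses_by_listener:
        if len(responses) != len(targets):
            raise ValueError("each listener needs one response per target word")
        for target, heard in zip(targets, responses):
            total += 1
            # Assumed matching rule: case-insensitive exact match.
            if heard.strip().lower() == target.strip().lower():
                correct += 1
    if total == 0:
        raise ValueError("no responses to score")
    return 100.0 * correct / total
```

Keeping the individual target-response pairs, rather than only the total, would also allow the pattern of mishearings to be tabulated to suggest which sound contrasts a speaker has difficulty signalling.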
Swallowing rate on the 150 ml water swallow test (ml per second) was calculated from the volume drunk and the time taken, as recorded at the time of testing.

Validity and feasibility
The screening test was deemed to have sufficient face validity as independently judged by health and other workers in the community where it was to be applied. Content validity was independently confirmed by review from a panel of speech-language pathologists experienced in neurological speech disorders asked to judge whether the test adequately and appropriately screened key dimensions for a speech-voice assessment in people with neurological disturbances. Neither group recommended any changes to the content or delivery of the protocol.

Feasibility was confirmed. Completing all sections typically took around 15-25 minutes. With the exception of one control speaker who did not understand the nature of the syllable repetition task, all participants were able to comprehend instructions and carry out the tasks correctly.
The nurse specialist was able to detail performance counts/times for items requiring live scoring. Ten per cent of counts and timings for 'paa'/'kaa' and 'paa-taa-kaa' were randomly selected and calculated from the audio-recordings by an experienced SLT blind to initial measurements. There was a high correlation (Spearman's r = 0.96) for counts between raters, with no significant difference for either control speaker recordings or people with PD. The correlation between raters for time to repeat 'paa-taa-kaa' was similarly high (r = 0.95) and there was no statistically significant difference between raters. There were some issues around audio-recording (see Discussion) which impacted on the quality and completeness of some data sets. For this reason there were variable numbers of individual scores employed for the analyses that follow. Group comparisons were conducted only on pairs where there were valid matched recordings available.
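For readers wishing to run a comparable inter-rater check on their own counts, Spearman's rank correlation can be computed as the Pearson correlation of the two sets of ranks, with tied values given their average rank. The pure-Python sketch below illustrates that standard definition. It is not the analysis script used in the study, the function names are hypothetical, and any values supplied to it in examples are invented rather than study data.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5
```

A high rank correlation shows that two raters order the speakers similarly, but not that their counts agree in absolute terms, which is why a test of difference between raters is reported alongside it.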

Differences between groups
Table 1 (summary statistics for groups) displays results obtained from the participants with and without PD. Columns 2, 3, and 5 present the descriptive summaries for the different groups/times while columns 4 and 6 record results for statistical tests looking at possible differences between groups/times. Statistically significant results are shown in bold.
On the prolonged /a:/ task, single-syllable repetition rates and overall speech rate, people with and without PD did not differ statistically significantly as groups. On the more taxing multisyllable alternation task (paa-taa-kaa) there was a statistically significant difference between people with PD and controls, and between baseline and follow-up for people with PD.
The effects of altered voice and articulation were also clear in the intelligibility and disorderedness ratings. People with versus without PD scored significantly differently (p=0.03) on words correctly recognised by naïve listeners. People with PD were perceived by listeners to be significantly more disordered (p=0.007) in their speech than those without, based on perceptual rating of the sentence repetitions.
Differences between people with and without PD were statistically significant for the water swallow test (ml/sec). There was no statistically significant change (p=0.09) in performance between baseline and follow-up for the people with PD.

Discussion
We have developed the first preliminarily validated screening test in Swahili for speech changes in neurological disorders that is not simply an unadapted translation from English and that addresses activity limitation measures (intelligibility; naturalness) as well as impairment performance. The nurse specialist was able to acquire the skills to apply and score the test after minimal training, as indicated by the absence of any data loss due to incorrect task instructions, misapplication of tasks or mis-scoring.
The test was acceptable to participants. There were no objections to or questioning of words and tasks used. No one refused to carry it out, whether for ethical, practical or comprehension reasons. From the large variability in performance on some tasks it appears that participants occasionally gave suboptimal responses. It remains to be established from further observation and analysis whether this relates to cultural influences in carrying out unaccustomed testing or whether it pertains to issues around examiner training in eliciting maximum performance.
A major problem encountered, which affected quality and analysis of data, concerned difficulties with audio-track labelling and with simultaneously controlling audio-recording equipment and attending to speech performance in order to deliver live scores. These point to issues in training and in the methods employed for detailing live performance that must be addressed in later training development.
The test tasks were able to differentiate performance levels and correctly detected differences that were expected between people with and without PD in 'paa-taa-kaa' repetitions (Ackermann et al., 1995; Ho, Bradshaw, Cunnington, Phillips & Iansek, 1998; Ziegler, 2002), intelligibility (Miller et al., 2007) and swallowing (Miller et al., 2009). The fact that between- and within-group differences on the single-syllable repetition tasks did not reach statistical significance is unsurprising, given that speech changes in PD may not be sufficiently severe to register on the less demanding impairment measures. The fact that the more challenging syllable-alternating task (Ho et al., 1998) did detect significant differences supports this interpretation. Similarly, on the water swallow test, the time between baseline and follow-up may not have been sufficient to expect significant changes in people with PD, especially given that they received medical intervention in the interim. As PD is a progressive condition one would expect deterioration in function over 1 year, but this is likely to be counter-balanced by the drug therapy they received for 3 months before the second assessment, which had a major impact on motor function in some cases (Dotchin, Jusabani, & Walker, 2011).
The protocol is ready to use. However, there are several features that ideally require further development or need local norms against which to interpret performance. The screen should also be tested on larger numbers and on other groups of people with neurological illness (e.g. stroke). Monitoring the ability of a wider group of people to apply the protocol would also be helpful rather than the one trained tester employed here. Next steps also include the development of culturally appropriate questionnaire measures of perceptions of change and perceived impact of speech changes.