SELECTED ACOUSTIC CHARACTERISTICS OF EMERGING ESOPHAGEAL SPEECH: CASE STUDY



SUMMARY
The development of esophageal speech was examined in a laryngectomee subject to observe the emergence of selected acoustic characteristics and their relation to listener intelligibility ratings. Over a two-and-a-half month period, data from five recording sessions were used for spectrographic and perceptual (listener) analysis. There was evidence to suggest a fairly reliable correlation between emerging acoustic characteristics and increasing perceptual ratings. Acoustic factors coincident with increased intelligibility ratings appeared related to two dimensions: firstly, increasing pseudoglottic control over esophageal air release; secondly, the presence of a mechanism of pharyngeal compression. Increased pseudoglottic control manifested in a reduction of tracheoesophageal turbulence and a more efficient burping mode of vibration with clearer formant structure. Spectrographic evidence of a fundamental frequency did not emerge. These dimensions appeared to have potential diagnostic and therapeutic value, rendering an analysis of the patient's developing vocal performance more explicit for both clinician and patient.
Esophageal speech is one of two alternative methods of functional postlaryngectomy rehabilitation. Investigators over the past few decades have realised the importance of testing its efficacy both subjectively and objectively, one of the more recent objective methods being spectrographic analysis. 2, 5, 6, 7, 10, 11, 13, 17, 28, 29 The purpose of this article is (a) to discuss how spectrographic analysis may be used as a diagnostic and therapeutic tool, and (b) to examine the relationship between the subjective and objective data obtained.

'It would be interesting to follow the development of speech from a mechanism that at first is halting, and seemingly requires a great deal of effort to produce faltering sounds, to the effortless smooth production of sustained speech.' 4 Despite Brooks Hunt and Va's 4 valid comment, more than a decade has passed with a dearth of information on the developmental aspects of esophageal speech acquisition. The importance of measuring a laryngectomized patient's progress in treatment has likewise been stressed by very few writers. 3, 31

However, in the general field of vocal pathology, more recent impetus has been provided by Rontal, Rontal and Rolnick, 20 who have advocated the use of serially made spectrograms. These would provide the referring Ear, Nose and Throat Specialist, the clinician and the patient with a visual display reflecting the nature of the vocal rehabilitation process. The value of applying the approach of Rontal et al. 20 specifically to the progression of esophageal speech development was therefore strongly indicated, given the apparent paucity of literature on the acoustic and perceptual developmental aspects of esophageal speech acquisition. A description of emerging acoustic characteristics, correlated with subjective listener ratings, would be of diagnostic, prognostic and therapeutic value. It would provide the speech therapist with objective, quantitative criteria against which each laryngectomee's esophageal speech development could be assessed. Furthermore, an understanding of the relationship between the emerging dimensions would facilitate a more explicit approach to therapy for both clinician and client.

The term 'esophageal speech' refers to speech produced with an air supply from the vicarious air chamber located within the esophageal lumen. The cricopharyngeus has been localized as the pseudoglottic site. The outflowing air is interrupted by crude pseudoglottic vibrations which alternately open and close the esophageal lumen. The Kay Sonagraph has been used to provide spectrograms which permit an objective, reliable analysis of the phonetic-acoustic elements of
esophageal speech. The Sonagraph provides (a) narrow-band displays, in which harmonic structure is highlighted, and (b) broad-band displays, which highlight formant structure (represented as dark, thick horizontal bands). Formants are resonances of the supraglottal vocal tract, 'upon which the distinctive quality of vowels and resonant consonants depend.' 9 (See Figure 5, 5-9 and 11-14, a and c, along the horizontal and vertical axes respectively.) Kerr and Lanham 9 have used broad-band spectrograms to analyze esophageal speech, and have described the following phenomena, which have been corroborated by other investigators: 10, 11, 13, 29

Parameter I: Vibratory cycles resulting from the manner of esophageal sphincter control over air release.
A. Rapping: A term used by Kerr and Lanham 9 to refer to the slow, irregular, rap- and tap-like pulses received by the vocal tract functioning as a resonating system. (See Figure 6, 1, 8, 9, 10, 11, 17, 18, 19 along the horizontal axis.)

B. Esophageal burping:
'The raps smooth out into a succession of air puffs more rapid and regular, and lower in amplitude . . . still lacking in harmonic structure.' 9 These more rapid cricopharyngeal vibrations facilitate the emergence of a clearer formant structure (See Figure 3, 6-11, a and b, along the horizontal and vertical axes respectively).
C. Friction Noise: This random, aperiodic, ill-defined turbulence emanates from the more or less open esophageal sphincter. Friction noise has been found to play a major role in producing quality changes 9, 10 (See Figure 6, 20-22 along the horizontal axis).

D. Possible absence of fundamental frequency:
Fundamental frequency is the physical correlate of pitch perception 10 and is the repetition rate of a periodic wave, that is, the frequency of its lowest harmonic component. Despite reference to the low pitch of esophageal speech, 2, 5, 6, 25, 2< " 28, 29 confusion is prevalent in spectrographic findings on the fundamental frequency characteristics of esophageal speakers. Lanham and Kerr 13 have suggested that this confusion has arisen from a failure to recognize that a measurable fundamental frequency is often lacking in esophageal speech. Hence the need to examine approximate vibratory rates rather than fundamental frequency.

E. Other less obvious vibratory features:
(i) vibration of the tone is more rapid at the onset of phonation and slows down as the amount of air in the esophagus decreases; (ii) the opposite phenomenon to (i), occurring more rarely, 10 where subglottic vibrations increase; and (iii) pseudoglottic vibrations may cease for a moment and then continue at a slower rate than before, thus splitting the word or vowel into two parts. 10
Parameter II: Variables dependent upon degree of neuromuscular control, and affecting periodicity of vibrations:
A. Sound caused by air intake into the esophagus (swallowing noise): Spectrographically, the energy in swallowing is fairly evenly distributed over the entire frequency scale. Its presence represents what the writer has referred to as an 'esophageal click', i.e. one very intense pulse (rap) followed by a gap of relative silence prior to the onset of phonation. (See Figure 1, 9 and 11 along the horizontal axis.)
Parameter III: Formant Structure.

B. Leakage of air from the mouth of the esophagus: This may occur prior to, and in anticipation of, the onset of voicing, and is indicative of an inadequate neuromuscular mechanism (See Figure 1, 1-5 along the horizontal axis).
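Because a measurable fundamental frequency is often lacking (feature D), an approximate vibratory rate can instead be read off a broad-band display by marking the times of successive pulses (vertical striations). The Python sketch below is illustrative only and is not the procedure used in this study. It assumes pulse times (in seconds) have already been marked by hand, takes the rate as the number of inter-pulse intervals divided by the time they span, reports the coefficient of variation of the intervals as a crude index of (a)periodicity, and compares the first and second halves of the pulse train as one possible check of feature E(i). All function names and the pulse times in the usage lines are hypothetical.

```python
from statistics import mean, pstdev


def vibratory_rate(pulse_times_s):
    """Approximate vibratory rate (pulses per second) from hand-marked pulse times.

    Rate = number of inter-pulse intervals / time spanned by those intervals.
    """
    if len(pulse_times_s) < 2:
        raise ValueError("at least two pulses are needed")
    span = pulse_times_s[-1] - pulse_times_s[0]
    if span <= 0:
        raise ValueError("pulse times must be in increasing order")
    return (len(pulse_times_s) - 1) / span


def interval_irregularity(pulse_times_s):
    """Coefficient of variation of inter-pulse intervals (0 = perfectly periodic)."""
    intervals = [b - a for a, b in zip(pulse_times_s, pulse_times_s[1:])]
    return pstdev(intervals) / mean(intervals)


def onset_vs_offset_rate(pulse_times_s):
    """Rates over the first and second halves of the pulse train (the halves share
    the middle pulse). A higher first value is consistent with feature E(i):
    vibration slowing as the esophageal air supply is used up."""
    if len(pulse_times_s) < 3:
        raise ValueError("at least three pulses are needed to split the train")
    mid = len(pulse_times_s) // 2
    return vibratory_rate(pulse_times_s[: mid + 1]), vibratory_rate(pulse_times_s[mid:])


if __name__ == "__main__":
    # Hypothetical pulse times (s) for one utterance; not data from this study.
    pulses = [0.000, 0.010, 0.021, 0.033, 0.047, 0.063, 0.082, 0.105]
    print(f"overall rate: {vibratory_rate(pulses):.2f} pulses/s")
    print(f"irregularity (CV of intervals): {interval_irregularity(pulses):.2f}")
    early, late = onset_vs_offset_rate(pulses)
    print(f"onset half: {early:.2f}/s, offset half: {late:.2f}/s")
```

Splitting the train at its middle pulse is an arbitrary choice; a clinician might prefer to compare fixed time windows at the start and end of each utterance.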
A case study was therefore undertaken with the following aims:
I. To identify the emergence over time of selected acoustic characteristics which might contribute to a description of the emerging features of esophageal phonation.
II. To relate intelligibility ratings of developing esophageal speech to the emergence of these acoustic characteristics.

SUBJECT (S)
The S was a 37-year-old white Portuguese-speaking male. He underwent a total laryngectomy in February 1977 after laryngoscopic biopsy had revealed the presence of widely infiltrating, poorly differentiated squamous cell carcinoma on both his vocal folds. Total laryngectomy involved removal of the larynx from above, leaving the esophagus and cricopharyngeal muscles essentially intact. It was not necessary to perform a radical neck dissection. Prior to the operation he was a healthy man, with no reported speech or hearing problems.
Criteria for selection:
(a) The S was required to have undergone a laryngectomy operation and to have commenced speech therapy for the first time postoperatively.
(b) He was to be capable of producing an esophageal sound involuntarily. This indicates the intactness of his mechanism for esophageal sound production, making an evaluation of his progress from this most basic level possible. 3
(c) Potential anatomical and physiological variables that could interfere with the S's development of esophageal speech had to be considered. Thus, before testing commenced, the Ear, Nose and Throat Surgeon who performed the laryngectomy was asked questions pertaining to medical information from the Questionnaire for Biographical, Medical, Social Communication and Speech Training Variables. 22
(d) Hearing was bilaterally within normal limits.
Therapy: Therapy was administered by a fourth-year undergraduate Speech and Hearing Therapy student at the Speech and Hearing Clinic, University of the Witwatersrand. Over the seventy-five day testing period, the S received approximately 54 sessions of speech therapy. Therapy was directed towards the development of esophageal speech based primarily upon the work of Berlin 3 and Snidecor. 27

METHODOLOGY
I TAPE RECORDING PROCEDURES:
A. A Revox tape recorder was used to produce high-quality recordings at random time intervals over a two-and-a-half month testing period. During each recording session the S read the following material:
(i) A list of exercises constructed by the E. These were based upon Snidecor's 27 principles of a nine-stage plan of esophageal speech development.
(ii) A further group of test words based upon the six groups of consonants found in the Portuguese language.
(iii) The S used only one overt inflation in repeating plosive syllables (da) as many times as possible.This was based upon Berlin's 3 measure of mean number of syllables per air charge.
(iv) The S used only one inflation in repeating the following two sentences.
a. The first sentence contained many liquids: («Luisoc iu LiuneJ^nccJeram ocm Bcxriocnum ojpitaj)
b. The second sentence contained an equal number of voiceless plosives: (Paula i PeJxuJ.aumuituf kocuj ikocj ojjuJ emkasoc)
The construction of these two sentences (by the E) was based upon Moolenaar-Bijl's 16, 17 observation that some form of air conservation, or esophageal reinflation, is accomplished by plosive consonants.
(v) On the final recording session, the S read twenty unseen minimal pair words.
B. On one occasion, a normal (laryngeal) Portuguese-speaking S read and recorded the same set of stimuli (i) -(v), under the same testing conditions as the S.

II SPECTROGRAPHIC/OBJECTIVE ASSESSMENT:
A Kay Sonagraph Model 6061-B (Kay Elemetrics, Pine Brook, N.J.) was used to produce type B/65 broad-band spectrograms from the tape recordings.
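Why broad-band spectrograms suited this material can be made concrete with a common rule of thumb: an analysing filter of bandwidth B resolves events in time only to roughly 1/B, so individual pulses appear as vertical striations when the pulse rate is below B, whereas the harmonics of a periodic source appear as separate horizontal lines only when their spacing (the fundamental frequency) exceeds B. The Python sketch below applies this rule. The nominal bandwidths of 45 Hz (narrow band) and 300 Hz (broad band) are values commonly quoted for Kay Sonagraph instruments and are assumptions here, not specifications taken from this study; the criteria themselves are approximations rather than exact instrument behaviour.

```python
# Rule-of-thumb check of what a spectrographic analysing filter can resolve.
# ASSUMPTION: nominal bandwidths of 45 Hz (narrow) and 300 Hz (broad).
NARROW_BAND_HZ = 45.0
BROAD_BAND_HZ = 300.0


def time_resolution_ms(bandwidth_hz):
    """Approximate time resolution (ms) of a filter of the given bandwidth (about 1/B)."""
    return 1000.0 / bandwidth_hz


def shows_pulses(pulse_rate_hz, bandwidth_hz):
    """Pulses appear as separate vertical striations if the pulse period
    exceeds the filter's time resolution, i.e. if rate < bandwidth."""
    return pulse_rate_hz < bandwidth_hz


def could_show_harmonics(f0_hz, bandwidth_hz):
    """Harmonics of a *periodic* source appear as separate horizontal lines
    only if their spacing (f0) exceeds the filter bandwidth."""
    return f0_hz > bandwidth_hz


if __name__ == "__main__":
    for rate in (36.99, 86.31):  # vibratory rates reported for Figure 3
        print(
            f"{rate:6.2f}/s  broad-band striations: {shows_pulses(rate, BROAD_BAND_HZ)}  "
            f"narrow-band harmonics possible: {could_show_harmonics(rate, NARROW_BAND_HZ)}"
        )
```

Under these assumptions, vibration at about 86 pulses per second would be expected to show harmonics on a narrow-band display if it were periodic, so their absence points towards aperiodic vibration rather than a resolution limit; at about 37 pulses per second a 45 Hz filter could not separate harmonics in any case.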
Selection of Speech Samples and Methods of Spectrographic Recording:
Note: Several narrow-band spectrograms were made of utterances produced by the S in the final session. However, these failed to show harmonic components of the complex speech signals, so the data in the present study were derived from broad-band spectrograms only.
For practical reasons it was not possible to analyze spectrographically all the data recorded over the 75-day testing period. Thus:
(i) Ninety-five broad-band spectrograms were produced from a list of 19 words recorded by the S at 15-day intervals. These words were selected because they constituted a representative sample of consonants in the Portuguese language (e.g. (pej), (bo'd°c), ( l pik«), (doiz), (soblimu)).
(ii) Twenty broad-band spectrograms were also obtained from the S's production of the minimal pair words in the final recording session.
(iii) Thirty-nine broad-band spectrograms were likewise made of the normal S's recordings of the 19 words and the 20-word minimal pair list.

The South African Journal of Communication Disorders, Vol. 25, 1978
Reproduced by Sabinet Gateway under licence granted by the Publisher (dated 2012)

III SUBJECTIVE PERCEPTUAL ASSESSMENT:
Perceptual ratings were made in the following manner: (i) A Portuguese-speaking listener who did not know the S, and who was unaware of the nature of the experiment was asked to subjectively rate the intelligibility of all the data recorded by the S at each of the five recording periods selected for analysis.Intelligibility of single utterances was determined by the number of correct responses made by the listener to the recorded speech stimuli.Intelligibility of sentence material was rated on a five-point equal appearing interval scale.
(ii) The E was assisted by a Phonetician in perceptually calculating the following data for each of the five recording sessions: (a) the mean number of plosive syllables (da) per overt inflation in five trials, 3 and (b) the number of words per breath charge in the sentences constructed. 16, 17 Although these measures were made, together with an evaluation of the relationship between the S's quantified performance over time on skills (a) and (b) above, the findings will not be discussed in the present article.
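The perceptual measures above reduce to simple arithmetic. The following Python sketch shows one way of computing them for a single session: the percentage of single-utterance items correctly identified, the mean sentence rating on the five-point scale, and Berlin's mean number of syllables per overt inflation over five trials. Exact-match scoring of responses, the 1-to-5 coding of the scale, and all names and values are assumptions for illustration; they are not the study's scoring sheets or data.

```python
from statistics import mean


def percent_intelligible(responses, targets):
    """Percentage of single-utterance items the listener identified correctly
    (assumes a response counts only if it matches the target exactly)."""
    if not targets or len(responses) != len(targets):
        raise ValueError("need equal-length, non-empty response and target lists")
    correct = sum(r == t for r, t in zip(responses, targets))
    return 100.0 * correct / len(targets)


def mean_scale_rating(ratings, low=1, high=5):
    """Mean of equal-appearing-interval ratings, rejecting values off the scale."""
    for r in ratings:
        if not low <= r <= high:
            raise ValueError(f"rating {r} lies outside the {low}-{high} scale")
    return mean(ratings)


def syllables_per_inflation(counts_per_trial, expected_trials=5):
    """Mean number of (da) syllables produced on one overt inflation, over the trials."""
    if len(counts_per_trial) != expected_trials:
        raise ValueError(f"expected {expected_trials} trials, got {len(counts_per_trial)}")
    return mean(counts_per_trial)


if __name__ == "__main__":
    # Hypothetical values for one session; not data from this study.
    targets = ["pe", "bo", "pik", "doiz"]
    heard = ["be", "bo", "pik", "doiz"]
    print(percent_intelligible(heard, targets))      # share of items heard correctly
    print(mean_scale_rating([2, 3, 3]))              # sentence ratings on the 1-5 scale
    print(syllables_per_inflation([3, 4, 2, 4, 3]))  # Berlin-style measure
```

Repeating these calculations for each of the five sessions would give the per-session series against which the spectrographic changes can be compared.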

IV PHONETIC ANALYSIS:
A Phonetician aided the E in phonetically transcribing the 115 broad-band spectrograms. This permitted a direct comparison of unusual auditory qualities perceived by the listener with their spectrographic correlates.

RESULTS AND DISCUSSION
A. OBJECTIVE DATA.
The analysis of spectrographic data was descriptive in nature. Based on the work of several authorities, 9, 10, 11, 13, 29 the development of acoustic parameters I-III and their related variables (described above) was examined in the data obtained in five recording sessions. However, for the purpose of this article, it will be sufficient to document the major changes that occurred in the utterance (bo'd°<) in Sessions I and V, thereby highlighting the emergence of these parameters. Spectrographic representations of utterances produced by the S in the first recording session (Session I) revealed the inadequacy of his neuromuscular mechanism in the following manner:

Figure 1: Spectrogram showing utterance (bo d <*): Session I.
There was a prominence of random respiratory and tracheostomal turbulence over the whole of the visible frequency scale. Resonance amplification created wide horizontal formants with ill-defined edges (See Formant 2, Figure 1, 21-31 along the horizontal axis). The overriding turbulence made clear definition of most utterances virtually impossible. This turbulence appeared to be due to the S's attempts to produce louder, more intelligible speech, which resulted in a burst of outrushing stomal noise frequently louder than the concomitant esophageal voice. Most utterances were characterized by several random pulses (raps) before, and immediately after, the utterance, indicative of swallowing noise caused by air intake into the esophagus. Gurgling noises were spectrographically and auditorily prominent in several utterances (See Figure 2 below, 22-38, (til)).

Figure 2: Spectrogram showing utterance (til): gurgling sound.
In contrast to the inconsistency that was a noticeable feature of the spectrographic displays of the S's utterances in the first recording session, spectrograms of utterances produced in the final recording session (Session V) revealed the greatest consistency. This seemed to be due to the S's high morale, in conjunction with increased muscular control and co-ordination over his esophageal air release. Figure 3 below reveals the presence of the burping mode of vibration, with considerable energy located in aperiodic components outside the pulses. According to Kerr and Lanham, 1 ' and Kytta, 1 " this turbulence is probably due to the incompletely interrupted airflow characteristic of the burping vibratory mode. Narrow-band spectrograms (e.g. Figure 4 below) made of several utterances revealed an absence of harmonic structure and of a measurable fundamental frequency. This appeared to be a characteristic feature of the S's low-pitched esophageal phonation, which Lanham and Kerr 13 have referred to as 'pitchless'.

Figure 4: Narrow-band spectrogram showing utterance (pe").
General developmental trends viewed spectrographically
Several interesting developmental trends appeared to have emerged. The most marked feature throughout was inconsistency, postulated to be related partly to the S's neuromuscular and psychological status at various points in time.
A noticeable progression occurred from a rapping to a burping mode of esophageal sphincter vibration, in which pulses became considerably more rapid and lower in amplitude. Vibrations, however, were conspicuously lacking in the rapidity and periodicity reportedly found in superior esophageal and laryngeal speakers. 10, 20, 23, 29 Compare the vibratory pattern in Figure 3 above with that in Figure 5 below.

Figure 5: Spectrogram showing utterance (bo'd ) of a laryngeal speaker.
Generally, spectrographic evidence refuted the concept of a fundamental frequency in esophageal phonation: it revealed a vibratory pattern lacking in periodicity and devoid of harmonic structure. These measurable rates of irregular vibration appeared to be comparable to what investigators have measured in assessing the fundamental frequency of esophageal speech. 10, 11, 23, 24 If this is indeed the case, then Tato's finding 29 that fundamental frequency increases with training was supported.
Continuous and more clearly defined formant structure appeared dependent upon a reduction of tracheo-esophageal turbulence, as well as an increasingly regular and rapid vibratory pattern.
B. SUBJECTIVE DATA.
Listener Ratings of Intelligibility.
These findings would appear to indicate the S's progression, over the observed period of speech acquisition, towards becoming a more proficiently rated esophageal speaker. The listener experienced the greatest ease in understanding the carry-over task items, corroborating the spectrographically observed evidence of generalized cricopharyngeal control.
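Whether ratings rise consistently across the five sessions, and whether they move together with an acoustic measure such as the approximate vibratory rate, could be summarised with a Spearman rank correlation. The Python sketch below is an illustration under assumptions, not the analysis performed in this study, which was descriptive: it ranks each series, giving tied values their average rank, and computes the Pearson correlation of the ranks. With only five sessions such a coefficient is very unstable, so it would supplement, not replace, inspection of the spectrograms. Function names and the example series are hypothetical.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average rank of the tied run order[i..j]
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least two points")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("a series whose values are all tied has no rank correlation")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical per-session values (Sessions I-V); not data from this study.
    sessions = [1, 2, 3, 4, 5]
    mean_rating = [1.8, 2.1, 2.0, 3.2, 3.9]
    rate_per_s = [40.0, 52.5, 49.0, 71.0, 80.0]
    print(f"ratings vs session: {spearman_rho(sessions, mean_rating):.2f}")
    print(f"ratings vs vibratory rate: {spearman_rho(rate_per_s, mean_rating):.2f}")
```

A rank correlation is used here rather than a linear one because the rating scale is ordinal and no linear relation between vibratory rate and intelligibility is assumed.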
Comparison of Listener Intelligibility Ratings with Spectrographic Displays.
This revealed the following:
I. Evidence to suggest a fairly reliable correlation between emerging acoustic characteristics and increasing intelligibility ratings. Increasing ratings appeared to be related to the following acoustic variables: (i) a more rapid and regular burping mode of vibration; (ii) clearer, more continuous formant structure; and (iii) a reduction of tracheo-esophageal noise. Variable (iii) appeared to be the most important independent factor in speech intelligibility ratings, supporting findings reported in the literature. 10, 15, 23
II. The S's attempts to produce effective voiced-voiceless plosive and fricative distinctions manifested in inconsistent discrepancies between perceptual and spectrographic data. The presence of these discrepancies supported Nichol's 18 contention that voicing is a vulnerable feature in esophageal speech intelligibility. Only two of the numerous discrepancies noted shall be described in the present article, to illustrate the presence of certain phenomena:
(i) Target: Voiceless unreleased plosive (p, t, k). (Note: In Portuguese, as opposed to English, voiceless plosives are unreleased.) Word-initial position (N = 48): Fifteen discrepancies occurred between spectrographic and subjective ratings. The listener predominantly heard the powerful voiced cognate, which was spectrographically represented in the following decreasing order of prominence as a:
a. Voiceless released plosive (N = 11). Example: (pe J), recording session II (See Figure 6, 7). Listener's response: (be Γ/ pel)

GENERAL CONCLUSION
From the results, the writer has postulated that the above-described discrepancies, revealing the S's ability to produce perceptually powerful, apparently ejected plosive and fricative consonants, may have been due to a mechanism of pharyngeal compression or 'squeezing'. It is apparent that the intensity and power with which the S produced these consonants could not have been achieved by the small and variable amount of air reported to be present in the upper esophagus. 12, 27, 28 Rather, an egressive glottalic-type air mechanism, in which a substantial amount of pharyngeal air compression occurred, would appear to account for this powerful auditory impression.
To produce fricatives, it seemed necessary for the S first to block his vocal tract with a tight labial closure in order to build up sufficient pressure. Affrication resulted, both auditorily and spectrographically. This appeared due to the S's inadequate control and consequent inability to make this articulatory closure lightly. Similarly, when the S produced a voiceless plosive, a powerful auditory impression was heard. This was caused by strong intra-oral pressure bursting through a very tight closure made by the S in an attempt to produce a sufficiently audible voiceless stop. Spectrographic evidence likewise supported these findings. For example, when the S produced medial voiced released stops, the voice bar would frequently disappear, and a gap would appear spectrographically prior to the release burst. This gap, too, may have indicated a build-up of pressure by the S to produce an effective auditory impression. 12 (See Figure 9, 20-22, for an illustrative example.) The only reference to this squeezing-type mechanism appears to have been made by Kerr and Lanham, 9 and by Pellegrini and Raaglini, cited by Brooks Hunt and Va. 4 The latter investigators concluded that a squeezing mechanism forces air through a narrow channel, thus producing effective sound. Esophageal speech has been described as a compensation of high degree. 1, 10, 21 The pharyngeal egressive air mechanism postulated here would appear to constitute such a compensatory mechanism, lending support to Tikofsky's comment that intelligibility results from modifications 'other than those introduced by the use of a different sound-producing mechanism'. 30
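The silent gap before the release burst (and, likewise, the gap that follows an 'esophageal click') could in principle be located automatically if frame-by-frame energy values were available, for example from a digitised recording. The Python sketch below is a hypothetical illustration, not a method used in this study: it reports every run of frames whose level stays below a chosen threshold for at least a minimum duration. The 5 ms frame length, the -50 dB threshold and the 15 ms minimum are arbitrary assumptions that would need calibrating against hand-measured spectrograms.

```python
def find_gaps(frame_levels_db, frame_ms, threshold_db, min_gap_ms):
    """Return (start_ms, end_ms) spans in which every frame lies below threshold_db
    and the span lasts at least min_gap_ms."""
    gaps = []
    start = None
    # A +inf sentinel guarantees that a gap running to the last frame is closed.
    for i, level in enumerate(list(frame_levels_db) + [float("inf")]):
        if level < threshold_db:
            if start is None:
                start = i
        elif start is not None:
            if (i - start) * frame_ms >= min_gap_ms:
                gaps.append((start * frame_ms, i * frame_ms))
            start = None
    return gaps


if __name__ == "__main__":
    # Hypothetical 5 ms frame levels (dB) around a medial stop; not data from this study.
    levels = [-20, -22, -58, -61, -63, -60, -18, -25]
    print(find_gaps(levels, frame_ms=5, threshold_db=-50, min_gap_ms=15))  # [(10, 30)]
```

The minimum-duration criterion is what separates a genuine closure gap from the brief dips between irregular esophageal pulses.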

IMPLICATIONS
Observations made in this study suggested the use of a developmental framework to quantify and assess more effectively the patient's process of esophageal speech acquisition. By using serially made spectrograms, diagnosis and therapy would indeed be interrelated parts of a continuous process of trying to understand an individual and to help him learn. 14 The regular clinical use of these objective measures would:
(i) Provide the Ear, Nose and Throat Specialist, patient and clinician with a meaningful shared evaluation of the changes in the emergence of esophageal speech. The patient would then be reinforced for intermediary successes, with a concomitant reduction in the frustration of having to work towards a remote, sometimes obscure goal of intelligible esophageal speech; and
(ii) Offer a motivating device which recognizes the need of every patient to gauge his own rehabilitation.
In addition, spectrographic data would give interested family members and friends the opportunity to visualize the patient's progress and to support his efforts more realistically. These objective acoustic measures should constantly be compared with subjective perceptual evaluations. Therapy goals and tasks would then remain more realistically orientated towards communicative effectiveness in the patient's linguistic environment, as opposed to an unobtainable level of intelligibility. For example, with regard to the 'vulnerable feature of voicing in esophageal speech', 18 constant comparison of objective and subjective data would enable the clinician to assess whether (i) the patient attempted to produce the voiceless consonant and failed to do so audibly, or (ii) he omitted it completely.
Finally, spectrograms should never be used as a substitute for good clinical acumen.Judgements in the initial assessment and during the therapy process must be based upon an evaluation of all the parameters available to the clinician, and not on the basis of a single test.However, within these limitations, the use of an analytical, developmental framework would make intelligibility a somewhat more attainable goal for the laryngectomized patient.

C. Tracheostomal (Stomal) noise: This refers to the powerful expulsion of air, accompanied by an undesirable murmur, from the patient's tracheostoma. Strong amplification may occur of those frequencies falling within the bandwidths of the resonators, resulting in horizontal, formant-like bands (See Figure 1, A, along the horizontal axis).
D. Gurgling: This occurs auditorily before, during and/or after phonation, and is commonly represented spectrographically by highly irregular, widely spaced raps (See Figure 2, 22-38 along the horizontal axis).

The S's neuromuscular control still appeared inadequate, and this, in relation to an inadequate esophageal airflow, manifested spectrographically as follows: (i) Widening of pulses was still evident at the ends of utterances, disturbing the continuity of the generally clearly defined formant structure. (See Figure 3, 13-14, where the rate of vibration is approximately 36.99 per second, as compared with around 86.31 per second at 8-9.) (ii) Apparent air leakage from the esophageal mouth occurred prior to (e.g. Figure 3, 1-3) and during certain of the S's utterances.

Figure 9: Spectrogram illustrating mechanism of pharyngeal compression in word-medial position.
Table I below reveals an increase in the Portuguese-speaking listener's mean intelligibility ratings over the two-and-a-half month recording period.

TABLE I: Mean Intelligibility Ratings as a Function of Stage of Esophageal Speech Development