Phonetic Aspects of Laughter

Here is a link to an article in The Scotsman, written by Michael Stroh, on expressive speech synthesis and laughter synthesis.

This page gives some information on the pilot study reported in:

Trouvain, J. & Schröder, M. (2004): How (Not) to Add Laughter to Synthetic Speech. Proceedings of the Workshop on Affective Dialogue Systems (ADS), Kloster Irsee, pp. 229-232. [article: pdf 43 kb] [poster: ppt 601 kb]

Material

Synthetic laughs were tested in two dialogue extracts. The envisaged function of the laughter was social bonding. Both dialogues were generated with the German speech synthesiser Mary, using two synthetic voices: a male voice (speaker A) and a female voice (speaker B).

Dialogue extract 1:
A: Sollen wir es dann so festhalten? (Shall we do it this way?)
B: Ja, okay. [lachen] Dann sehen wir uns am Montag. (Yes, okay. [laugh] Then we'll see each other on Monday.)

Dialogue extract 2:
A: Am Freitag kann ich aber erst ab zwölf Uhr. Dann treffen wir uns am besten gleich um dreizehn Uhr am Freitag. (On Friday, I am only free after twelve. The best thing will be if we meet at one on Friday.)
B: Ja, gut. [lachen] Das sollte auch bei mir klappen. (All right. [laugh] That should work for me, too.)

Audio stimuli

Several versions of "synthetic laughs" were generated (see table). The audio files are in WAV format; their size is ca. 200 kb for the versions with dialogue 1 and ca. 350 kb for the versions with dialogue 2.

no. | method | voice | dialogue 1 | dialogue 2
1 | diphone-based | modal voice | [audio] | [audio]
2 | diphone-based | soft voice | [audio] | [audio]
3 | diphone-based | loud voice | [audio] | [audio]
4 | natural signal | same speaker, high intensity | [audio] | [audio]
5 | natural signal | same speaker, medium intensity | [audio] | [audio]
6 | natural signal | other speaker, mild intensity | [audio] | [audio]
baseline | | | [audio] | [audio]
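
The table can also be read as a small experimental design: every version is crossed with both dialogue extracts. The following Python sketch only enumerates that design. The names VERSIONS, DIALOGUES and stimulus_list are illustrative assumptions and do not come from the study; the sketch does not reproduce how the stimuli were actually generated or presented.

```python
from itertools import product

# Versions as listed in the table above: (label, method, voice).
# The tuple layout and all identifiers are assumptions made for illustration.
VERSIONS = [
    ("1", "diphone-based", "modal voice"),
    ("2", "diphone-based", "soft voice"),
    ("3", "diphone-based", "loud voice"),
    ("4", "natural signal", "same speaker, high intensity"),
    ("5", "natural signal", "same speaker, medium intensity"),
    ("6", "natural signal", "other speaker, mild intensity"),
    ("baseline", None, None),  # the table gives no method or voice for the baseline
]

DIALOGUES = ("dialogue 1", "dialogue 2")


def stimulus_list():
    """Return every (version label, dialogue) pair in the table."""
    return [(version[0], dialogue)
            for version, dialogue in product(VERSIONS, DIALOGUES)]


stimuli = stimulus_list()
print(len(stimuli))  # 7 versions x 2 dialogues
```

With seven rows and two dialogue extracts, this yields fourteen audio stimuli in total.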


Here are two spectrograms of the portion in which speaker B says "Ja, okay [laugh]", illustrating the difference between an intense laugh (left) and a mild laugh (right). In the waveform (top), the laughter is highlighted in yellow.




Perception test

Fourteen German-speaking subjects were asked to rate on a 6-point scale:

How well do the two speakers like each other?





In a second round, the listeners were asked:

How well does the laughter fit into the dialogue?


Discussion

The best results were obtained for version 6, in which a different speaker produces a laugh of mild intensity.
The worst results were obtained for version 4, in which the same speaker produces a laugh of high intensity.

The diphone-based laughter versions show no improvement over the baseline version. Usually, synthetic speech generated with mixed methods, e.g. diphones combined with natural signals, sounds bad. Moreover, speech signals from different speakers require more attention from listeners. The results here contradict these expectations.

What seems very important for laughter modelling is not only the placement of the laughter but also careful control of the laugh intensity for the envisaged function. Thus, many more functions and forms of laughter need to be investigated. Furthermore, the phonetic criteria for their acceptability must be examined. An extension to other affect bursts could be a way to integrate more convincing features into an emotionally expressive synthetic voice.

 

Last change: 26-05-04
