Onoma-to-wave: Environmental sound synthesis from onomatopoeic words

Authors: Yuki Okamoto, Keisuke Imoto, Shinnosuke Takamichi, Ryosuke Yamanishi, Takahiro Fukumori, Yoichi Yamashita


This page demonstrates environmental sound synthesis from onomatopoeic words [1]. We propose two synthesis methods based on the sequence-to-sequence (Seq2Seq) conversion framework [2]: one that takes only an onomatopoeic word as input (Seq2Seq), and one that takes an onomatopoeic word together with a sound event label (Seq2Seq + event label).

For the sound data, we used 10 sound events (manual coffee grinder, cup clinking, alarm clock ringing, whistle, maracas, drum, electric shaver, trash box banging, tearing paper, bell ringing) contained in RWCP-SSD (Real World Computing Partnership Sound Scene Database) [3]. For the onomatopoeic words corresponding to each sound sample, we used the RWCP-SSD-Onomatopoeia dataset [4].
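As a rough illustration of the two kinds of input described above, the following sketch turns an onomatopoeic word written as a phoneme sequence (for example "/ ch i: q /") into one-hot vectors, and optionally encodes a sound event label as a one-hot vector over the ten events. The function names, the short event names, and the one-hot encoding itself are assumptions made for this example; they are not taken from the authors' implementation, and the sketch does not say where the label enters the model.

```python
import numpy as np

# Short names for the ten sound events listed above (ordering is our own choice).
SOUND_EVENTS = [
    "Manual coffee grinder", "Cup", "Clock", "Whistle", "Maracas",
    "Drum", "Shaver", "Trash box", "Tearing paper", "Bell",
]


def tokenize(onomatopoeia):
    """Split a string such as '/ ch i: q /' into phoneme symbols."""
    return onomatopoeia.strip().strip("/").split()


def build_vocab(sequences):
    """Map every phoneme symbol seen in `sequences` to an integer index."""
    symbols = sorted({p for seq in sequences for p in tokenize(seq)})
    return {sym: i for i, sym in enumerate(symbols)}


def one_hot(index, size):
    """Return a float vector of length `size` with a single 1 at `index`."""
    vec = np.zeros(size, dtype=np.float32)
    vec[index] = 1.0
    return vec


def encode(onomatopoeia, vocab, event=None):
    """Return (phoneme matrix, event vector).

    The phoneme matrix has one one-hot row per phoneme. The event vector
    is None when no label is given (word-only input); otherwise it is a
    one-hot vector over SOUND_EVENTS (word + event label input).
    """
    phonemes = np.stack(
        [one_hot(vocab[p], len(vocab)) for p in tokenize(onomatopoeia)]
    )
    if event is None:
        return phonemes, None
    label = one_hot(SOUND_EVENTS.index(event), len(SOUND_EVENTS))
    return phonemes, label


# Hypothetical usage with two phoneme sequences from this page.
vocab = build_vocab(["/ ch i: q /", "/ b o N q /"])
word_only = encode("/ ch i: q /", vocab)
word_and_label = encode("/ ch i: q /", vocab, event="Whistle")
```

Keeping the phoneme matrix and the event vector separate leaves open how a model would combine them, which this page does not specify.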

Columns: Natural sound, KanaWave, Seq2Seq, Seq2Seq + event label
 Phoneme sequence: / ch i: q /
Whistle Cup Shaver Whistle Tearing paper
 Phoneme sequence: / b o N q /
Trash box Drum Trash box
 Phoneme sequence: / r i N r i N /
Bell Bell Clock
 Phoneme sequence: / b i i i i /
Shaver Tearing paper Whistle Shaver
 Phoneme sequence: / sh a r i sh a r i /
Maracas Maracas Manual coffee grinder

Comparison of synthesized sounds with different input onomatopoeic words

Seq2Seq + event label
 Sound event label: Cup
 Phoneme sequence: / k a ch i N /
 Phoneme sequence: / k a ch i q /
 Phoneme sequence: / p i N q /
 Sound event label: Shaver
 Phoneme sequence: / b a: u a /
 Phoneme sequence: / b e: /
 Phoneme sequence: / j i i: j i i: i /

[1] Yuki Okamoto, Keisuke Imoto, Shinnosuke Takamichi, Ryosuke Yamanishi, Takahiro Fukumori, and Yoichi Yamashita, "Onoma-to-wave: Environmental Sound Synthesis from Onomatopoeic Words," APSIPA Transactions on Signal and Information Processing, Vol. 11, No. 1, e13, 2022.
[2] Ilya Sutskever, Oriol Vinyals, and Quoc V. Le, "Sequence to Sequence Learning with Neural Networks," arXiv preprint, arXiv:1409.3215, 2014.
[3] S. Nakamura, K. Hiyane, F. Asano, and T. Endo, "Sound scene data collection in real acoustic environments," Journal of the Acoustical Society of Japan (E), Vol. 20, No. 3, pp. 225–231, 1999.
[4] Yuki Okamoto, Keisuke Imoto, Shinnosuke Takamichi, Ryosuke Yamanishi, Takahiro Fukumori, and Yoichi Yamashita, "RWCP-SSD-Onomatopoeia: Onomatopoeic Word Dataset for Environmental Sound Synthesis," Proc. Detection and Classification of Acoustic Scenes and Events (DCASE), pp. 125–129, 2020.