This is a demonstration of environmental sound synthesis from onomatopoeic words [1]. We propose two methods for synthesizing environmental sounds from onomatopoeic words, both based on the sequence-to-sequence conversion framework [2]:
Environmental sound synthesis using only onomatopoeic words (seq2seq)
Environmental sound synthesis using onomatopoeic words and sound event labels (seq2seq + event label)
In addition to onomatopoeic words, this method uses sound event labels to condition the decoder and thereby control the output acoustic features.
For the dataset, we used 10 sound events (manual coffee grinder, cup clinking, alarm clock ringing, whistle, maracas, drum, electric shaver, trash box banging, tearing paper, bell ringing) contained in the RWCP-SSD (Real World Computing Partnership-Sound Scene Database) [3]. For the onomatopoeic words corresponding to each sound sample, we used the RWCP-SSD-Onomatopoeia dataset [4].
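The two pipelines above can be sketched in code. The following is a minimal NumPy illustration, not the paper's actual architecture: the real system uses trained neural networks, whereas here all weights are random, the phoneme and event vocabularies are toy stand-ins, and the uniform "attention" is a simplification. The sketch only shows the structural difference between the two methods: in seq2seq the decoder sees only the encoded phoneme sequence, while in seq2seq + event label an event-label embedding is additionally fed to the decoder at every step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabularies (illustrative only; the real model is trained on RWCP-SSD).
PHONEMES = ["ch", "i:", "q", "b", "o", "N", "r", "i"]
EVENTS = ["whistle", "cup", "shaver", "trash_box", "bell"]

EMB, HID, N_MEL = 8, 16, 20  # embedding size, hidden size, spectral bins

# Randomly initialised parameters stand in for trained weights.
W_emb = rng.normal(size=(len(PHONEMES), EMB))
W_evt = rng.normal(size=(len(EVENTS), EMB))
W_enc = rng.normal(size=(EMB + HID, HID)) * 0.1
W_dec = rng.normal(size=(HID + EMB + HID, HID)) * 0.1  # hidden + event emb + context
W_out = rng.normal(size=(HID, N_MEL)) * 0.1

def encode(phonemes):
    """Simple RNN encoder over a phoneme sequence."""
    h = np.zeros(HID)
    states = []
    for p in phonemes:
        x = W_emb[PHONEMES.index(p)]
        h = np.tanh(np.concatenate([x, h]) @ W_enc)
        states.append(h)
    return np.stack(states)

def decode(enc_states, n_frames, event=None):
    """Autoregressive decoder; optionally conditioned on a sound event label."""
    evt = W_evt[EVENTS.index(event)] if event else np.zeros(EMB)
    h = enc_states[-1]
    frames = []
    for _ in range(n_frames):
        # Uniform "attention" over encoder states keeps the sketch simple.
        context = enc_states.mean(axis=0)
        h = np.tanh(np.concatenate([h, evt, context]) @ W_dec)
        frames.append(h @ W_out)
    return np.stack(frames)

# seq2seq: onomatopoeic word only.
mel_a = decode(encode(["ch", "i:", "q"]), n_frames=30)
# seq2seq + event label: same word, decoder steered toward "whistle".
mel_b = decode(encode(["ch", "i:", "q"]), n_frames=30, event="whistle")
print(mel_a.shape, mel_b.shape)  # (30, 20) (30, 20)
```

Because the event embedding enters the decoder at every step, the same phoneme sequence can yield different acoustic features under different event labels, which is exactly the control the second method adds.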
Natural sound
KanaWave
Seq2seq (Proposed)
Seq2seq + event label (Proposed)
Phoneme sequence: / ch i: q /
Whistle
Cup
Shaver
Whistle
Tearing paper
Phoneme sequence: / b o N q /
Trash box
Drum
Trash box
Phoneme sequence: / r i N r i N /
Bell
Bell
Clock
Phoneme sequence: / b i i i i /
Shaver
Tearing paper
Whistle
Shaver
Phoneme sequence: / sh a r i sh a r i /
Maracas
Maracas
Manual coffee grinder
Comparison of synthesized sounds with different input onomatopoeic words
Seq2seq + event label (Proposed)
Sound event label: Cup
Phoneme sequence: / k a ch i N /
Phoneme sequence: / k a ch i q /
Phoneme sequence: / p i N q /
Sound event label: Shaver
Phoneme sequence: / b a: u a /
Phoneme sequence: / b e: /
Phoneme sequence: / j i i: j i i: i /
[1] Yuki Okamoto, Keisuke Imoto, Shinnosuke Takamichi, Ryosuke Yamanishi, Takahiro Fukumori, and Yoichi Yamashita, "Onoma-to-wave: Environmental Sound Synthesis from Onomatopoeic Words," APSIPA Transactions on Signal and Information Processing, Vol. 11, No. 1, e13, 2022.
[2] Ilya Sutskever, Oriol Vinyals, and Quoc V. Le, "Sequence to Sequence Learning with Neural Networks," arXiv preprint, arXiv:1409.3215, 2014.
[3] S. Nakamura, K. Hiyane, F. Asano, and T. Endo, "Sound scene data collection in real acoustic environments," Journal of the Acoustical Society of Japan (E), Vol. 20, No. 3, pp. 225–231, 1999.
[4] Yuki Okamoto, Keisuke Imoto, Shinnosuke Takamichi, Ryosuke Yamanishi, Takahiro Fukumori, and Yoichi Yamashita, "RWCP-SSD-Onomatopoeia: Onomatopoeic Word Dataset for Environmental Sound Synthesis," Proc. Detection and Classification of Acoustic Scenes and Events (DCASE), pp. 125–129, 2020.