Environmental sound synthesis from vocal imitations and sound event labels
  

Authors: Yuki Okamoto, Keisuke Imoto, Shinnosuke Takamichi, Ryotaro Nagase, Takahiro Fukumori, Yoichi Yamashita


This is a demonstration of environmental sound synthesis from vocal imitations and sound event labels [1]. We conducted environmental sound conversion using two proposed methods and one comparison method.

Our dataset of vocal imitations for environmental sounds is available here.



Examples of synthesized sounds

Vocal imitations | Reconstructed sound | Label (Baseline) | Label and vocal (Proposed)
Sound event: clock alarm

Vocal imitations | Reconstructed sound | Label (Baseline) | Label and vocal (Proposed)
Sound event: clock tick

Vocal imitations | Reconstructed sound | Label (Baseline) | Label and vocal (Proposed)
Sound event: rooster

Vocal imitations | Reconstructed sound | Label (Baseline) | Label and vocal (Proposed)
Sound event: cat



[1] Y. Okamoto, K. Imoto, S. Takamichi, R. Nagase, T. Fukumori, Y. Yamashita, "Environmental Sound Synthesis from Vocal Imitations and Sound Event Labels," Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. XXX-XXX, 2024. (Accepted)