Environmental sound synthesis from vocal imitations and sound event labels

Authors: Yuki Okamoto, Keisuke Imoto, Shinnosuke Takamichi, Ryotaro Nagase, Takahiro Fukumori, Yoichi Yamashita

This is a demonstration of environmental sound synthesis from vocal imitations and sound event labels [1]. We synthesized environmental sounds using the following baseline and proposed methods:

Synthesis method using sound event labels (Label) (Baseline)

Synthesis method using vocal imitations and sound event labels (Label and vocal) (Proposed)

Our dataset of vocal imitations for environmental sounds is available here.

Examples of synthesized sounds

Vocal imitations	Reconstructed sound	Label (Baseline)	Label and vocal (Proposed)
		Sound event: clock alarm	Sound event: clock alarm

Vocal imitations	Reconstructed sound	Label (Baseline)	Label and vocal (Proposed)
		Sound event: clock tick	Sound event: clock tick

Vocal imitations	Reconstructed sound	Label (Baseline)	Label and vocal (Proposed)
		Sound event: rooster	Sound event: rooster

Vocal imitations	Reconstructed sound	Label (Baseline)	Label and vocal (Proposed)
		Sound event: cat	Sound event: cat

[1] Y. Okamoto, K. Imoto, S. Takamichi, R. Nagase, T. Fukumori, Y. Yamashita, "Environmental Sound Synthesis from Vocal Imitations and Sound Event Labels," Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. XXX-XXX, 2024. (Accepted)