This is a demonstration of environmental sound synthesis from vocal imitations and sound event labels [1].
We conducted environmental sound synthesis using the following proposed and comparison methods:
Synthesis method using sound event labels (Label) (Baseline)
This method uses sound event labels as input.
Synthesis method using vocal imitations and sound event labels (Label and vocal) (Proposed)
This method uses vocal imitations and sound event labels as input.
Our dataset of vocal imitations for environmental sounds is available here.
[1] Y. Okamoto, K. Imoto, S. Takamichi, R. Nagase, T. Fukumori, Y. Yamashita, "Environmental Sound Synthesis from Vocal Imitations and Sound Event Labels," Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. XXX-XXX, 2024. (Accepted)