This is a demonstration of environmental sound synthesis from vocal imitations and sound event labels [1]. We conducted environmental sound conversion using two proposed methods and one comparison method.
Our dataset of vocal imitations for environmental sounds is available here.

Vocal imitations | Reconstructed sound | Label (Baseline) | Label and vocal (Proposed) |
---|---|---|---|
![]() | ![]() | ![]() | ![]() |

Vocal imitations | Reconstructed sound | Label (Baseline) | Label and vocal (Proposed) |
---|---|---|---|
![]() | ![]() | ![]() | ![]() |

Vocal imitations | Reconstructed sound | Label (Baseline) | Label and vocal (Proposed) |
---|---|---|---|
![]() | ![]() | ![]() | ![]() |

Vocal imitations | Reconstructed sound | Label (Baseline) | Label and vocal (Proposed) |
---|---|---|---|
![]() | ![]() | ![]() | ![]() |

[1] Y. Okamoto, K. Imoto, S. Takamichi, R. Nagase, T. Fukumori, Y. Yamashita, "Environmental Sound Synthesis from Vocal Imitations and Sound Event Labels," Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. XXX-XXX, 2024. (Accepted)