Accent conversion

Learners of a second language practice their pronunciation by listening to and imitating utterances from native speakers. Recent research has shown that choosing a well-matched native speaker to imitate can have a positive impact on pronunciation training. Towards this goal, we are developing speech-modification techniques that can generate utterances with the vocal properties of the learner and the accent of a native speaker. This is accomplished by altering both prosodic and segmental characteristics of speech.

Our articulatory-based accent conversion is a two-step process. In the first step, we build an articulatory synthesizer for the nonnative learner. In the second step, we drive that synthesizer with articulatory gestures recorded from a native speaker, so the output carries the learner's vocal properties and the native speaker's accent.
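The two steps can be illustrated with a toy sketch. It is not the synthesizer used in this work: the linear least-squares mapping, the function names `fit_forward_mapping` and `synthesize`, and the random data are all assumptions chosen to keep the example short. Real articulatory data would also need to be normalized across speakers before step 2, which the sketch skips.

```python
import numpy as np

def fit_forward_mapping(articulatory, acoustic):
    """Step 1 (toy): fit a linear articulatory-to-acoustic map for the learner.

    articulatory: (n_frames, n_art) array of the learner's articulatory features
    acoustic:     (n_frames, n_ac) array of the learner's acoustic features
    Returns a weight matrix of shape (n_art + 1, n_ac); the last row is the bias.
    """
    X = np.hstack([articulatory, np.ones((len(articulatory), 1))])
    W, *_ = np.linalg.lstsq(X, acoustic, rcond=None)
    return W

def synthesize(W, articulatory):
    """Step 2 (toy): drive the learner's model with another speaker's gestures."""
    X = np.hstack([articulatory, np.ones((len(articulatory), 1))])
    return X @ W

# Hypothetical data: 200 learner frames, 6 articulatory and 4 acoustic dimensions.
rng = np.random.default_rng(0)
learner_art = rng.normal(size=(200, 6))
true_W = rng.normal(size=(7, 4))
learner_ac = np.hstack([learner_art, np.ones((200, 1))]) @ true_W

W = fit_forward_mapping(learner_art, learner_ac)

# Gestures "recorded from a native speaker" (random stand-ins here).
native_art = rng.normal(size=(50, 6))
converted = synthesize(W, native_art)
print(converted.shape)
```

Because the model is trained only on the learner's data, whatever it generates inherits the learner's acoustic characteristics, while the input trajectory comes from the native speaker.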

We have also developed an accent conversion method that relies exclusively on acoustic information. The technique is based on the standard voice conversion model but pairs source and target frames differently. Conventional voice conversion trains the source-target mapping on time-aligned source and target spectral vectors from parallel utterances. In our approach, the mapping is instead trained on pairs selected for their acoustic similarity after vocal tract length normalization (VTLN).
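The difference between the two pairing strategies can be sketched as follows. This is a minimal illustration, not the published procedure: it assumes the frames have already been VTLN-normalized upstream, it uses plain Euclidean distance between spectral vectors as the similarity measure, and the function names are hypothetical.

```python
import numpy as np

def pair_by_alignment(source, target):
    """Conventional pairing: frame i of the source with frame i of the target.

    Assumes the two sequences were already time-aligned (e.g. by forced alignment).
    """
    n = min(len(source), len(target))
    return np.arange(n), np.arange(n)

def pair_by_similarity(source_norm, target_norm):
    """Similarity pairing: each source frame with its closest target frame.

    Both inputs are (n_frames, n_dims) arrays that are assumed to be
    VTLN-normalized already; the normalization itself is not shown.
    """
    diff = source_norm[:, None, :] - target_norm[None, :, :]
    dist = np.sum(diff ** 2, axis=-1)          # (n_source, n_target)
    return np.arange(len(source_norm)), np.argmin(dist, axis=1)

# Toy frames in a 2-dimensional feature space.
target = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
source = np.array([[4.8, 5.1], [0.1, -0.1], [0.9, 1.2]])

src_idx, tgt_idx = pair_by_similarity(source, target)
print(tgt_idx.tolist())   # [2, 0, 1]

aligned_src, aligned_tgt = pair_by_alignment(source, target)
print(aligned_tgt.tolist())   # [0, 1, 2]
```

With alignment-based pairing the training pairs follow the order of the frames; with similarity-based pairing they follow acoustic closeness, regardless of where the frames occur in the corpus.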

(a) Conventional approach to voice conversion: source and target frames are paired based on their ordering in a forced-aligned parallel corpus. (b) Our approach to accent conversion: source and target frames are paired based on their acoustic similarity following vocal tract length normalization (VTLN). MCD: Mel Cepstral Distortion.
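For readers unfamiliar with MCD, the sketch below computes it for a pair of mel-cepstral frames using the commonly cited definition, (10 / ln 10) * sqrt(2 * sum of squared coefficient differences). Whether the energy coefficient is excluded and which cepstral order is used vary between studies; the choices here (skip coefficient 0, four coefficients) are assumptions, and the function name is hypothetical.

```python
import numpy as np

def mel_cepstral_distortion(c_ref, c_test, skip_c0=True):
    """Mel cepstral distortion (in dB) between mel-cepstral vectors.

    c_ref, c_test: arrays whose last axis holds the mel-cepstral coefficients.
    skip_c0: drop the 0th (energy) coefficient, a common but not universal choice.
    """
    c_ref = np.asarray(c_ref, dtype=float)
    c_test = np.asarray(c_test, dtype=float)
    if skip_c0:
        c_ref, c_test = c_ref[..., 1:], c_test[..., 1:]
    squared_error = np.sum((c_ref - c_test) ** 2, axis=-1)
    return (10.0 / np.log(10.0)) * np.sqrt(2.0 * squared_error)

# Two toy frames that differ in energy (ignored) and in one other coefficient.
frame_a = np.array([1.0, 0.2, -0.1, 0.05])
frame_b = np.array([3.0, 0.2, 0.9, 0.05])

mcd_ab = mel_cepstral_distortion(frame_a, frame_b)
mcd_aa = mel_cepstral_distortion(frame_a, frame_a)
print(mcd_ab, mcd_aa)
```

A lower value means the two frames are spectrally closer, which is why a distance of this kind can serve as the similarity criterion when selecting frame pairs.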

2024

Quamer, W.; Das, A.; Gutierrez-Osuna, R.

Speech synthesis and pronunciation teaching Book Chapter

In: Chapelle, C. A.; Levis, J.; Munro, M.; Huensch, A. (Ed.): 2024.

Quamer, W.; Gutierrez-Osuna, R.

End-to-end streaming model for low-latency speech anonymization Proceedings Article

In: Proc. IEEE Spoken Language Technology Workshop (SLT 2024), 2024.

Das, A.; Gutierrez-Osuna, R.

Improving mispronunciation detection using speech reconstruction Journal Article Forthcoming

In: IEEE/ACM Transactions on Audio, Speech and Language Processing, Forthcoming.

2023

Quamer, W.; Das, A.; Gutierrez-Osuna, R.

Decoupling segmental and prosodic cues of non-native speech through vector quantization Proceedings Article

In: Proc. Interspeech, 2023.

2022

Quamer, W.; Das, A.; Levis, J.; Chukharev-Hudilainen, E.; Gutierrez-Osuna, R.

Zero-Shot Foreign Accent Conversion without a Native Reference Proceedings Article Forthcoming

In: Proc. Interspeech, Forthcoming.

Liberatore, C.; Gutierrez-Osuna, R.

Minimizing residuals for native-nonnative voice conversion in a sparse, anchor-based representation of speech Proceedings Article

In: Proc. ICASSP, 2022.

2021

Ding, S.; Zhao, G.; Gutierrez-Osuna, R.

Accentron: Foreign accent conversion to arbitrary non-native speakers using zero-shot learning Journal Article

In: Computer Speech & Language, 2021.

Silpachai, A.; Rehman, I.; Barriuso, T. A.; Levis, J.; Chukharev-Hudilainen, E.; Zhao, G.; Gutierrez-Osuna, R.

Effects Of Voice Type And Task On L2 Learners’ Awareness Of Pronunciation Errors Proceedings Article

In: Proc. Interspeech, 2021.

Liberatore, C.; Gutierrez-Osuna, R.

An Exemplar Selection Algorithm For Native-Nonnative Voice Conversion Proceedings Article

In: Proc. Interspeech, 2021.

Zhao, G.; Ding, S.; Gutierrez-Osuna, R.

Converting Foreign Accent Speech Without a Reference Journal Article

In: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 2367, 2021.

2020

Ding, S.; Zhao, G.; Gutierrez-Osuna, R.

Improving the Speaker Identity of Non-Parallel Many-to-Many Voice Conversion with Adversarial Speaker Recognition Proceedings Article

In: Proc. Interspeech, 2020.

Das, A.; Zhao, G.; Levis, J.; Chukharev-Hudilainen, E.; Gutierrez-Osuna, R.

Understanding the Effect of Voice Quality and Accent on Talker Similarity Proceedings Article

In: Proc. Interspeech, 2020.

Lučić, I.; Silpachai, A.; Levis, J.; Zhao, G.; Gutierrez-Osuna, R.

The English Pronunciation of Arabic Speakers - A Data-Driven Approach to Segmental Error Identification Journal Article

In: Language Teaching Research, 2020.

2019

Ding, S.; Zhao, G.; Liberatore, C.; Gutierrez-Osuna, R.

Learning Structured Sparse Representations for Voice Conversion Journal Article

In: IEEE Transactions on Audio, Speech and Language Processing, vol. 28, pp. 343-354, 2019.

Ding, S.; Liberatore, C.; Sonsaat, S.; Lučić, I.; Silpachai, A.; Zhao, G.; Chukharev-Hudilainen, E.; Levis, J.; Gutierrez-Osuna, R.

Golden speaker builder – An interactive tool for pronunciation training Journal Article

In: Speech Communication, vol. 115, pp. 51-66, 2019.

Ding, S.; Gutierrez-Osuna, R.

Group Latent Embedding for Vector Quantized Variational Autoencoder in Non-Parallel Voice Conversion Proceedings Article

In: Proc. Interspeech, 2019.

Zhao, G.; Ding, S.; Gutierrez-Osuna, R.

Foreign Accent Conversion by Synthesizing Speech from Phonetic Posteriorgrams Proceedings Article

In: Proc. Interspeech, 2019.

Zhao, G; Gutierrez-Osuna, R

Using Phonetic Posteriorgram Based Frame Pairing for Segmental Accent Conversion Journal Article

In: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 27, no. 10, pp. 1649-1660, 2019, ISSN: 2329-9290.

2018

Levis, J.; Chukharev-Hudilainen, E.; Gutierrez-Osuna, R.; Lučić, I.; Silpachai, A.; Sonsaat, S.

Golden Speaker: Learner Experience with Computer-assisted Pronunciation Practice Proceedings Article

In: Proc. Pronunciation in Second Language Learning and Teaching Conference, 2018.

Zhao, G; Sonsaat, S; Silpachai, A; Lučić, I; Chukharev-Hudilainen, E; Levis, J; Gutierrez-Osuna, R

L2-ARCTIC: A Non-Native English Speech Corpus Proceedings Article

In: Proc. Interspeech, 2018.

Ding, S.; Zhao, G.; Liberatore, C.; Gutierrez-Osuna, R.

Improving Sparse Representations in Exemplar-Based Voice Conversion with a Phoneme-Selective Objective Function Proceedings Article

In: Proc. Interspeech, 2018.

Ding, S.; Liberatore, C.; Gutierrez-Osuna, R.

Learning Structured Dictionaries for Exemplar-based Voice Conversion Proceedings Article

In: Proc. Interspeech, 2018.

Liberatore, C; Zhao, G; Gutierrez-Osuna, R

Voice Conversion through Residual Warping in a Sparse, Anchor-Based Representation of Speech Proceedings Article

In: Proc. ICASSP, 2018.

Zhao, G; Sonsaat, S; Levis, J; Chukharev-Hudilainen, E; Gutierrez-Osuna, R

Accent conversion using phonetic posteriorgrams Proceedings Article

In: Proc. ICASSP, 2018.

2016

Aryal, S; Gutierrez-Osuna, R

Comparing Articulatory and Acoustic Strategies for Reducing Non-Native Accents Proceedings Article

In: Proc. Interspeech, 2016.

Aryal, S; Gutierrez-Osuna, R

Data driven articulatory synthesis with deep neural networks Journal Article

In: Computer Speech and Language, vol. 36, pp. 260-273, 2016.

2015

Liberatore, C; Aryal, S; Wang, Z; Polsley, S; Gutierrez-Osuna, R

SABR: Sparse, Anchor-Based Representation of the Speech Signal Proceedings Article

In: Proc. Interspeech 2015, pp. 608-612, 2015.

Aryal, S; Gutierrez-Osuna, R

Articulatory-based conversion of foreign accents with deep neural networks Proceedings Article

In: Proc. Interspeech, pp. 3385-3389, 2015.

Liberatore, C; Gutierrez-Osuna, R

Joint Optimization of Anatomical and Gestural Parameters in a Physical Vocal Tract Model Proceedings Article

In: Proc. ICASSP, 2015.

Aryal, S; Gutierrez-Osuna, R

Reduction of non-native accents through statistical parametric articulatory synthesis Journal Article

In: Journal of the Acoustical Society of America, vol. 137, no. 1, pp. 433-446, 2015.

2014

Felps, D; Aryal, S; Gutierrez-Osuna, R

Normalization of articulatory data through Procrustes transformations and analysis-by-synthesis Proceedings Article

In: Proc. 39th International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 3051-3055, 2014.

Aryal, S; Gutierrez-Osuna, R

Can voice conversion be used to reduce non-native accents? Proceedings Article

In: Proc. 39th International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 7929-7933, 2014.

Aryal, S; Gutierrez-Osuna, R

Accent conversion through cross-speaker articulatory synthesis Proceedings Article

In: Proc. 39th International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 7744-7748, 2014.

2013

Aryal, S; Felps, D; Gutierrez-Osuna, R

Foreign Accent Conversion through Voice Morphing Proceedings Article

In: Proc. Interspeech, pp. 3077-3081, 2013.

2012

Felps, D; Geng, C; Gutierrez-Osuna, R

Foreign accent conversion through concatenative synthesis in the articulatory domain Journal Article

In: IEEE Transactions on Audio, Speech and Language Processing, 2012.

2010

Gutierrez-Osuna, R; Felps, D

Foreign Accent Conversion through Voice Morphing Technical Report

2010.

Felps, D; Geng, C; Berger, M; Richmond, K; Gutierrez-Osuna, R

Relying on critical articulators to estimate vocal tract spectra in an articulatory-acoustic database Conference

In: Proc. Interspeech, 2010.

Felps, D; Gutierrez-Osuna, R

Developing objective measures of foreign-accent conversion Journal Article

In: IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, no. 5, pp. 1030-1040, 2010.

2009

Felps, D; Bortfeld, H; Gutierrez-Osuna, R

Foreign accent conversion in computer assisted pronunciation training Journal Article

In: Speech Communication, vol. 51, no. 10, pp. 920-932, 2009.

2008

Felps, D; Bortfeld, H; Gutierrez-Osuna, R

Prosodic and segmental factors in foreign-accent conversion Technical Report

2008.
