Accent conversion – PSI Lab

Accent conversion

Learners of a second language practice their pronunciation by listening to and imitating utterances from native speakers. Recent research has shown that choosing a well-matched native speaker to imitate can have a positive impact on pronunciation training. Towards this goal, we are developing speech-modification techniques that can generate utterances with the vocal properties of the learner and the accent of a native speaker. This is accomplished by altering both prosodic and segmental characteristics of speech.

Our articulatory-based accent conversion is a two-step process. In the first step, we build an articulatory synthesizer for the nonnative learner. In the second step, we drive the synthesizer with articulatory gestures recorded from a native speaker.

We have also developed an accent conversion method that relies exclusively on acoustic information. The technique is based on the standard voice conversion model but uses a different pairing of source-target frames. Unlike conventional voice conversion, where the source-target mapping is trained on time-aligned source and target spectral vectors from parallel utterances, in our approach the mapping is trained on pairs selected based on their acoustic similarity following vocal tract length normalization.

(a) Conventional approach to voice conversion: source and target utterances are paired based on their ordering in a forced-aligned parallel corpus. (b) Our approach to accent conversion: source and target utterances are paired based on their acoustic similarity following vocal-tract-length normalization (VTLN). MCD: Mel Cepstral Distortion.
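The difference between the two pairing strategies can be illustrated with a minimal Python sketch. It is not the lab's implementation: the function names are hypothetical, the frames are assumed to be mel-cepstral vectors that have already been VTLN-normalized, and acoustic similarity is assumed to be measured with Mel Cepstral Distortion, using a simple nearest-neighbor rule.

```python
import math

def mel_cepstral_distortion(x, y):
    """MCD in dB between two mel-cepstral vectors of equal length.

    The caller is expected to have dropped the energy coefficient (c0).
    """
    squared_diff = sum((a - b) ** 2 for a, b in zip(x, y))
    return (10.0 / math.log(10.0)) * math.sqrt(2.0 * squared_diff)

def pair_by_alignment(source_frames, target_frames):
    """Panel (a): pair frame i with frame i of a time-aligned parallel utterance."""
    n = min(len(source_frames), len(target_frames))
    return [(i, i) for i in range(n)]

def pair_by_similarity(source_frames, target_frames):
    """Panel (b): pair each source frame with its closest target frame under MCD.

    Assumption: both frame sets have already been VTLN-normalized.
    """
    pairs = []
    for i, src in enumerate(source_frames):
        j = min(range(len(target_frames)),
                key=lambda k: mel_cepstral_distortion(src, target_frames[k]))
        pairs.append((i, j))
    return pairs

# Toy two-dimensional "frames"; a real system would use mel-cepstra per frame.
source = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
target = [[0.1, 0.9], [0.9, 0.1], [0.4, 0.6]]

print(pair_by_alignment(source, target))   # [(0, 0), (1, 1), (2, 2)]
print(pair_by_similarity(source, target))  # [(0, 1), (1, 0), (2, 2)]
```

In the toy data, the first two source frames are matched to target frames at different time positions, so the similarity-based pairing differs from the alignment-based one. The resulting pairs would then be used to train the source-target mapping.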

2025

Quamer, W.; Gutierrez-Osuna, R.

Disentangling Segmental and Prosodic Factors to Non-Native Speech Comprehensibility Journal Article

In: IEEE Transactions on Audio, Speech and Language Processing, vol. 33, 2025.

2024

Quamer, W.; Das, A.; Gutierrez-Osuna, R.

Speech synthesis and pronunciation teaching Book Chapter

In: Chapelle, C. A.; Levis, J.; Munro, M.; Huensch, A. (Ed.): 2024.

Quamer, W.; Gutierrez-Osuna, R.

End-to-end streaming model for low-latency speech anonymization Proceedings Article

In: Proc. IEEE Spoken Language Technology Workshop (SLT 2024), 2024.

Das, A.; Gutierrez-Osuna, R.

Improving mispronunciation detection using speech reconstruction Journal Article Forthcoming

In: IEEE/ACM Transactions on Audio, Speech and Language Processing, Forthcoming.

2023

Quamer, W.; Das, A.; Gutierrez-Osuna, R.

Decoupling segmental and prosodic cues of non-native speech through vector quantization Proceedings Article

In: Proc. Interspeech, 2023.

2022

Quamer, W.; Das, A.; Levis, J.; Chukharev-Hudilainen, E.; Gutierrez-Osuna, R.

Zero-Shot Foreign Accent Conversion without a Native Reference Proceedings Article Forthcoming

In: Proc. Interspeech, Forthcoming.

Liberatore, C.; Gutierrez-Osuna, R.

Minimizing residuals for native-nonnative voice conversion in a sparse, anchor-based representation of speech Proceedings Article

In: Proc. ICASSP, 2022.

2021

Ding, S.; Zhao, G.; Gutierrez-Osuna, R.

Accentron: Foreign accent conversion to arbitrary non-native speakers using zero-shot learning Journal Article

In: Computer Speech & Language, 2021.

Liberatore, C.; Gutierrez-Osuna, R.

An Exemplar Selection Algorithm For Native-Nonnative Voice Conversion Proceedings Article

In: Proc. Interspeech, 2021.

Silpachai, A.; Rehman, I.; Barriuso, T. A.; Levis, J.; Chukharev-Hudilainen, E.; Zhao, G.; Gutierrez-Osuna, R.

Effects Of Voice Type And Task On L2 Learners’ Awareness Of Pronunciation Errors Proceedings Article

In: Proc. Interspeech, 2021.

Zhao, G.; Ding, S.; Gutierrez-Osuna, R.

Converting Foreign Accent Speech Without a Reference Journal Article

In: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 2367, 2021.

2020

Ding, S.; Zhao, G.; Gutierrez-Osuna, R.

Improving the Speaker Identity of Non-Parallel Many-to-Many Voice Conversion with Adversarial Speaker Recognition Proceedings Article

In: Proc. Interspeech, 2020.

Das, A.; Zhao, G.; Levis, J.; Chukharev-Hudilainen, E.; Gutierrez-Osuna, R.

Understanding the Effect of Voice Quality and Accent on Talker Similarity Proceedings Article

In: Proc. Interspeech, 2020.

Lučić, I.; Silpachai, A.; Levis, J.; Zhao, G.; Gutierrez-Osuna, R.

The English Pronunciation of Arabic Speakers - A Data-Driven Approach to Segmental Error Identification Journal Article

In: Language Teaching Research, 2020.

2019

Ding, S.; Zhao, G.; Liberatore, C.; Gutierrez-Osuna, R.

Learning Structured Sparse Representations for Voice Conversion Journal Article

In: IEEE Transactions on Audio, Speech and Language Processing, vol. 28, pp. 343-354, 2019.

Ding, S.; Liberatore, C.; Sonsaat, S.; Lučić, I.; Silpachai, A.; Zhao, G.; Chukharev-Hudilainen, E.; Levis, J.; Gutierrez-Osuna, R.

Golden speaker builder – An interactive tool for pronunciation training Journal Article

In: Speech Communication, vol. 115, pp. 51-66, 2019.

Ding, S.; Gutierrez-Osuna, R.

Group Latent Embedding for Vector Quantized Variational Autoencoder in Non-Parallel Voice Conversion Proceedings Article

In: Proc. Interspeech, 2019.

Zhao, G.; Ding, S.; Gutierrez-Osuna, R.

Foreign Accent Conversion by Synthesizing Speech from Phonetic Posteriorgrams Proceedings Article

In: Proc. Interspeech, 2019.

Zhao, G; Gutierrez-Osuna, R

Using Phonetic Posteriorgram Based Frame Pairing for Segmental Accent Conversion Journal Article

In: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 27, no. 10, pp. 1649-1660, 2019, ISSN: 2329-9290.

2018

Levis, J.; Chukharev-Hudilainen, E.; Gutierrez-Osuna, R.; Lučić, I.; Silpachai, A.; Sonsaat, S.

Golden Speaker: Learner Experience with Computer-assisted Pronunciation Practice Proceedings Article

In: Proc. Pronunciation in Second Language Learning and Teaching Conference, 2018.

Zhao, G; Sonsaat, S; Silpachai, A; Lučić, I; Chukharev-Hudilainen, E; Levis, J; Gutierrez-Osuna, R

L2-ARCTIC: A Non-Native English Speech Corpus Proceedings Article

In: Proc. Interspeech, 2018.

Ding, S.; Liberatore, C.; Gutierrez-Osuna, R.

Learning Structured Dictionaries for Exemplar-based Voice Conversion Proceedings Article

In: Proc. Interspeech, 2018.

Ding, S.; Zhao, G.; Liberatore, C.; Gutierrez-Osuna, R.

Improving Sparse Representations in Exemplar-Based Voice Conversion with a Phoneme-Selective Objective Function Proceedings Article

In: Proc. Interspeech, 2018.

Liberatore, C; Zhao, G; Gutierrez-Osuna, R

Voice Conversion through Residual Warping in a Sparse, Anchor-Based Representation of Speech Proceedings Article

In: Proc. ICASSP, 2018.

Zhao, G; Sonsaat, S; Levis, J; Chukharev-Hudilainen, E; Gutierrez-Osuna, R

Accent conversion using phonetic posteriorgrams Proceedings Article

In: Proc. ICASSP, 2018.

2016

Aryal, S; Gutierrez-Osuna, R

Comparing Articulatory and Acoustic Strategies for Reducing Non-Native Accents Proceedings Article

In: Proc. Interspeech, 2016.

Aryal, S; Gutierrez-Osuna, R

Data driven articulatory synthesis with deep neural networks Journal Article

In: Computer Speech and Language, vol. 36, pp. 260-273, 2016.

2015

Liberatore, C; Aryal, S; Wang, Z; Polsley, S; Gutierrez-Osuna, R

SABR: Sparse, Anchor-Based Representation of the Speech Signal Proceedings Article

In: Proc. Interspeech 2015, pp. 608-612, 2015.

Aryal, S; Gutierrez-Osuna, R

Articulatory-based conversion of foreign accents with deep neural networks Proceedings Article

In: Proc. Interspeech, pp. 3385-3389, 2015.

Liberatore, C; Gutierrez-Osuna, R

Joint Optimization of Anatomical and Gestural Parameters in a Physical Vocal Tract Model Proceedings Article

In: Proc. ICASSP, 2015.

Aryal, S; Gutierrez-Osuna, R

Reduction of non-native accents through statistical parametric articulatory synthesis Journal Article

In: Journal of the Acoustical Society of America, vol. 137, no. 1, pp. 433-446, 2015.

2014

Felps, D; Aryal, S; Gutierrez-Osuna, R

Normalization of articulatory data through Procrustes transformations and analysis-by-synthesis Proceedings Article

In: Proc. 39th International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 3051-3055, 2014.

Aryal, S; Gutierrez-Osuna, R

Can voice conversion be used to reduce non-native accents? Proceedings Article

In: Proc. 39th International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 7929-7933, 2014.

Aryal, S; Gutierrez-Osuna, R

Accent conversion through cross-speaker articulatory synthesis Proceedings Article

In: Proc. 39th International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 7744-7748, 2014.

2013

Aryal, S; Felps, D; Gutierrez-Osuna, R

Foreign Accent Conversion through Voice Morphing Proceedings Article

In: Proc. Interspeech, pp. 3077-3081, 2013.

2012

Felps, D; Geng, C; Gutierrez-Osuna, R

Foreign accent conversion through concatenative synthesis in the articulatory domain Journal Article

In: IEEE Transactions on Audio, Speech and Language Processing, 2012.

2010

Gutierrez-Osuna, R; Felps, D

Foreign Accent Conversion through Voice Morphing Technical Report

2010.

Felps, D; Geng, C; Berger, M; Richmond, K; Gutierrez-Osuna, R

Relying on critical articulators to estimate vocal tract spectra in an articulatory-acoustic database Conference

In: Proc. Interspeech, 2010.

Felps, D; Gutierrez-Osuna, R

Developing objective measures of foreign-accent conversion Journal Article

In: IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, no. 5, pp. 1030–1040, 2010.

@article{felps2010talsp,
title = {Developing objective measures of foreign-accent conversion},
author = {D Felps and R Gutierrez-Osuna},
url = {https://psi.engr.tamu.edu/wp-content/uploads/2018/01/felps2010talsp.pdf},
year = {2010},
date = {2010-01-01},
journal = {IEEE Transactions on Audio, Speech, and Language Processing},
volume = {18},
number = {5},
pages = {1030--1040},
publisher = {IEEE},
abstract = {Various methods have recently appeared to transform foreign-accented speech into its native-accented counterpart. Evaluation of these accent conversion methods requires extensive listening tests across a number of perceptual dimensions. This article presents three objective measures that may be used to assess the acoustic quality, degree of foreign accent, and speaker identity of accent-converted utterances. Accent conversion generates novel utterances: those of a foreign speaker with a native accent. Therefore, the acoustic quality in accent conversion cannot be evaluated with conventional measures of spectral distortion, which assume that a clean recording of the speech signal is available for comparison. Here we evaluate a single-ended measure of speech quality, ITU-T recommendation P.563 for narrow-band telephony. We also propose a measure of foreign accent that exploits a weakness of automatic speech recognizers: their sensitivity to foreign accents. Namely, we use phoneme-level match scores given by the HTK recognizer trained on a large number of American English speakers to obtain a measure of native accent. Finally, we propose a measure of speaker identity that projects acoustic vectors (e.g., Mel cepstral, F0) onto the linear discriminant that maximizes separability for a given pair of source and target speakers. The three measures are evaluated on a corpus of accent-converted utterances that had been previously rated through perceptual tests. Our results show that the three measures have a high degree of correlation with their corresponding subjective ratings, suggesting that they may be used to accelerate the development of foreign-accent conversion tools. Applications of these measures in the context of computer assisted pronunciation training and voice conversion are also discussed.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}

2009

Felps, D; Bortfeld, H; Gutierrez-Osuna, R

Foreign accent conversion in computer assisted pronunciation training Journal Article

In: Speech Communication, vol. 51, no. 10, pp. 920–932, 2009.

2008

Felps, D; Bortfeld, H; Gutierrez-Osuna, R

Prosodic and segmental factors in foreign-accent conversion Technical Report

2008.
