28/04/2023, 12:39 Audiovisual speech synthesis: An overview of the state-of-the-art - ScienceDirect
https://www.sciencedirect.com/science/article/abs/pii/S0167639314000818 1/3
Speech Communication
Volume 66, February 2015, Pages 182-217
Audiovisual speech synthesis: An overview of the state-of-the-art
Wesley Mattheyses, Werner Verhelst
https://doi.org/10.1016/j.specom.2014.11.001
Highlights
• Comprehensive overview of the various audiovisual speech synthesis techniques.
• Innovative categorization of the techniques based on multiple aspects.
• Important future directions for the field of audiovisual speech synthesis.
• Bundles a lot of information that was scattered in the scientific literature.
Abstract
We live in a world where there are countless interactions with computer systems in everyday situations.
In the ideal case, this interaction feels as familiar and as natural as the communication we experience
with other humans. To this end, an ideal means of communication between a user and a computer system
consists of audiovisual speech signals. Audiovisual text-to-speech technology allows the computer system
to utter any spoken message towards its users. Over the last decades, a wide range of techniques for
performing audiovisual speech synthesis has been developed. This paper gives a comprehensive overview
of these approaches using a categorization of the systems based on multiple important aspects that
determine the properties of the synthesized speech signals. The paper makes a clear distinction between
the techniques that are used to model the virtual speaker and the techniques that are used to generate the
appropriate speech gestures. In addition, the paper discusses the evaluation of audiovisual speech
synthesizers, elaborates on the hardware requirements for performing visual speech synthesis, and
describes some important future directions that should stimulate the use of audiovisual speech synthesis
technology in real-life applications.
Keywords
Audiovisual speech synthesis; Visual speech synthesis; Speech synthesis
Cited by (45)
Social fidelity in virtual agents: Impacts on presence and learning
2021, Computers in Human Behavior
Citation Excerpt:
…These studies were done with natural (i.e. not synthetic) speech, but they illustrate that visible speech can provide
benefits in a pedagogical context by increasing comprehension of spoken messages. Various approaches to animating
virtual agent speech have been used and evaluated in terms of subjective acceptability or naturalness, with considerably
less work evaluating whether those approaches result in the improved comprehension observed with natural audio-
visual speech (Mattheyses & Verhelst, 2015). When speech animation is based on performance capture from human
actors, perception is similar to that of natural visible speech, except in the case of phonemic contrasts that rely on the
visibility of internal articulators (Files, Tjan, Jiang, & Bernstein, 2015), which are generally difficult to capture (e.g., Crary,
Kotzur, Gauger, Gorham, & Burton, 1996).…
Features and results of a speech improvement experiment on hard of hearing children
2019, Speech Communication
Citation Excerpt:
…Facial movements are carried out within a parametric model, i.e. a collection of polygons is manipulated using a set of
parameters (terminal-analog system). This process allows control of a wide range of motions using a set of parameters
associated with different articulation functions (Parke 1972, Mattheyses – Verhelst 2015). These features can be directly
matched to particular movements of the lip, tongue, chin, eyes, eyelids, eyebrows and the whole face, and can vary from
−1 to 1.…
Video-realistic expressive audio-visual speech synthesis for the Greek language
2017, Speech Communication
Citation Excerpt:
…Under the same assumption it is also desirable that the levels of expressiveness of both information channels are
correlated - i.e., a full blown facial expression of anger is accompanied with a full blown vocal expression of anger.
Audio-Visual Text-To-Speech synthesis (AVTTS) explores the generation of audio-visual speech signals (Mattheyses and
Verhelst, 2015) (i.e., a talking head), and video-realistic AVTTS, more specifically, explores the generation of talking
heads that highly resemble a human being as if a camera was recording it. Although naturalness in video-realistic
AVTTS systems has increased greatly, the addition of expressions has proven to be a challenging task (Anderson et al.,
2013; Schröder, 2009) due to the large variability they introduce, especially in extreme expressions such as expressions
of anger or happiness, in both acoustic and visual modeling.…
https://www.sciencedirect.com/science/article/pii/S0747563220303113
https://www.sciencedirect.com/science/article/pii/S0167639318301274
https://www.sciencedirect.com/science/article/pii/S0167639317300419
CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior
2023, arXiv
Difficult to hear but easy to see: Audio-visual perception of the /r/-/w/ contrast in Anglo-English
2022, The Journal of the Acoustical Society of America
Deep Learning for Visual Speech Analysis: A Survey
2022, arXiv
Copyright © 2014 Elsevier B.V. All rights reserved.
Copyright © 2023 Elsevier B.V. or its licensors or contributors.
ScienceDirect® is a registered trademark of Elsevier B.V.
https://doi.org/10.48550/arXiv.2301.02379
https://doi.org/10.1121/10.0012660
https://doi.org/10.48550/arXiv.2205.10839