[45] G. Borshukov, J. Montgomery, and W. Werner, “Playable universal capture: compression and real-time sequencing of image-based facial animation,” in ACM SIGGRAPH 2006 Courses.

[46] V. Barrielle, N. Stoiber, and C. Cagniart, “Blendforces: A dynamic framework for facial animation,” in Computer Graphics Forum, vol. 35, no. 2. Wiley Online Library, 2016, pp. 341–352.

[47] Z. Deng, P.-Y. Chiang, P. Fox, and U. Neumann, “Animating blendshape faces by cross-mapping motion capture data,” in Proceedings of the 2006 symposium on Interactive 3D graphics and games, 2006, pp. 43–48.

[48] J. Thies, M. Zollhofer, M. Stamminger, C. Theobalt, and M. Nießner, “Face2face: Real-time face capture and reenactment of rgb videos,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 2387–2395.

[49] R. Ford. Use animoji on your iphone x and ipad pro. https://support.apple.com/engb/HT208190

[50] J. M. D. Barros, V. Golyanik, K. Varanasi, and D. Stricker, “Face it!: A pipeline for real-time performance driven facial animation,” in 2019 IEEE International Conference on Image Processing (ICIP). IEEE, 2019, pp. 2209–2213.

[51] K. Olszewski, J. J. Lim, S. Saito, and H. Li, “High-fidelity facial and speech animation for vr hmds,” ACM Transactions on Graphics (TOG), vol. 35, no. 6, pp. 1–14, 2016.

[52] S. Laine, T. Karras, T. Aila, A. Herva, S. Saito, R. Yu, H. Li, and J. Lehtinen, “Production-level facial performance capture using deep convolutional neural networks,” in Proceedings of the ACM SIGGRAPH/Eurographics Symposium on Computer Animation, 2017, pp. 1–10.

[53] N. Kholgade, I. Matthews, and Y. Sheikh, “Content retargeting using parameter-parallel facial layers,” in Proceedings of the 2011 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, 2011, pp. 195–204.

[54] M. M. Cohen and D. W. Massaro, “Modeling coarticulation in synthetic visual speech,” in Models and techniques in computer animation. Springer, 1993, pp. 139–156.

[55] B.-J. Theobald and I. Matthews, “Relating objective and subjective performance measures for aam-based visual speech synthesis,” IEEE transactions on audio, speech, and language processing, vol. 20, no. 8, pp. 2378–2387, 2012.

[56] W. Mattheyses, L. Latacz, and W. Verhelst, “Comprehensive many-to-many phoneme-to-viseme mapping and its application for concatenative visual speech synthesis,” Speech Communication, vol. 55, no. 7-8, pp. 857–876, 2013.

[57] S. L. Taylor, M. Mahler, B.-J. Theobald, and I. Matthews, “Dynamic units of visual speech,” in Proceedings of the 11th ACM SIGGRAPH/Eurographics conference on Computer Animation, 2012, pp. 275–284.

[58] J. Ma, R. Cole, B. Pellom, W. Ward, and B. Wise, “Accurate visible speech synthesis based on concatenating variable length motion capture data,” IEEE Transactions on Visualization and Computer Graphics, vol. 12, no. 2, pp. 266–276, 2006.

[59] G. Englebienne, T. Cootes, and M. Rattray, “A probabilistic model for generating realistic lip movements from speech,” in Advances in neural information processing systems, 2008, pp. 401–408.

[60] S. Deena, S. Hou, and A. Galata, “Visual speech synthesis using a variable-order switching shared gaussian process dynamical model,” IEEE transactions on multimedia, vol. 15, no. 8, pp. 1755–1768, 2013.

[61] D. W. Massaro, J. Beskow, M. M. Cohen, C. L. Fry, and T. Rodgriguez, “Picture my voice: Audio to visual speech synthesis using artificial neural networks,” in AVSP’99-International Conference on Auditory-Visual Speech Processing, 1999.

[62] M. Tamura, T. Masuko, T. Kobayashi, and K. Tokuda, “Visual speech synthesis based on parameter generation from hmm: Speech-driven and text-and-speech-driven approaches,” in AVSP’98 International Conference on Auditory-Visual Speech Processing, 1998.

[63] D. Schabus, M. Pucher, and G. Hofer, “Joint audiovisual hidden semi-markov model-based speech synthesis,” IEEE Journal of Selected Topics in Signal Processing, vol. 8, no. 2, pp. 336–347, 2013.

[64] G. Hofer, J. Yamagishi, and H. Shimodaira, “Speech-driven lip motion generation with a trajectory hmm,” 2008.

[65] M. Brand, “Voice puppetry,” in Proceedings of the 26th annual conference on Computer graphics and interactive techniques, 1999, pp. 21–28.

[66] L. Xie and Z.-Q. Liu, “A coupled hmm approach to video-realistic speech animation,” Pattern Recognition, vol. 40, no. 8, pp. 2325–2340, 2007.

[67] K. Choi, Y. Luo, and J.-N. Hwang, “Hidden markov model inversion for audio-to-visual conversion in an mpeg-4 facial animation system,” Journal of VLSI signal processing systems for signal, image and video technology, vol. 29, no. 1-2, pp. 51–61, 2001.

[68] L. D. Terissi and J. C. Gomez, “Audio-to-visual conversion via hmm inversion for speech-driven facial animation,” in Brazilian Symposium on Artificial Intelligence. Springer, 2008, pp. 33–42.

[69] X. Zhang, L. Wang, G. Li, F. Seide, and F. K. Soong, “A new language independent, photo-realistic talking head driven by voice only.” in Interspeech, 2013, pp. 2743–2747.

[70] S. Suwajanakorn, S. M. Seitz, and I. Kemelmacher-Shlizerman, “Synthesizing obama: learning lip sync