[45] G. Borshukov, J. Montgomery, and W. Werner, "Playable universal capture: compression and real-time sequencing of image-based facial animation," in ACM SIGGRAPH 2006 Courses.

[46] V. Barrielle, N. Stoiber, and C. Cagniart, "Blendforces: A dynamic framework for facial animation," in Computer Graphics Forum, vol. 35, no. 2. Wiley Online Library, 2016, pp. 341–352.

[47] Z. Deng, P.-Y. Chiang, P. Fox, and U. Neumann, "Animating blendshape faces by cross-mapping motion capture data," in Proceedings of the 2006 Symposium on Interactive 3D Graphics and Games, 2006, pp. 43–48.

[48] J. Thies, M. Zollhofer, M. Stamminger, C. Theobalt, and M. Nießner, "Face2face: Real-time face capture and reenactment of rgb videos," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2387–2395.

[49] R. Ford. Use animoji on your iphone x and ipad pro. https://support.apple.com/engb/HT208190

[50] J. M. D. Barros, V. Golyanik, K. Varanasi, and D. Stricker, "Face it!: A pipeline for real-time performance-driven facial animation," in 2019 IEEE International Conference on Image Processing (ICIP). IEEE, 2019, pp. 2209–2213.

[51] K. Olszewski, J. J. Lim, S. Saito, and H. Li, "High-fidelity facial and speech animation for vr hmds," ACM Transactions on Graphics (TOG), vol. 35, no. 6, pp. 1–14, 2016.

[52] S. Laine, T. Karras, T. Aila, A. Herva, S. Saito, R. Yu, H. Li, and J. Lehtinen, "Production-level facial performance capture using deep convolutional neural networks," in Proceedings of the ACM SIGGRAPH/Eurographics Symposium on Computer Animation, 2017, pp. 1–10.

[53] N. Kholgade, I. Matthews, and Y. Sheikh, "Content retargeting using parameter-parallel facial layers," in Proceedings of the 2011 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, 2011, pp. 195–204.

[54] M. M. Cohen and D. W. Massaro, "Modeling coarticulation in synthetic visual speech," in Models and Techniques in Computer Animation. Springer, 1993, pp. 139–156.

[55] B.-J. Theobald and I. Matthews, "Relating objective and subjective performance measures for aam-based visual speech synthesis," IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 8, pp. 2378–2387, 2012.

[56] W. Mattheyses, L. Latacz, and W. Verhelst, "Comprehensive many-to-many phoneme-to-viseme mapping and its application for concatenative visual speech synthesis," Speech Communication, vol. 55, no. 7-8, pp. 857–876, 2013.

[57] S. L. Taylor, M. Mahler, B.-J. Theobald, and I. Matthews, "Dynamic units of visual speech," in Proceedings of the 11th ACM SIGGRAPH/Eurographics Conference on Computer Animation, 2012, pp. 275–284.

[58] J. Ma, R. Cole, B. Pellom, W. Ward, and B. Wise, "Accurate visible speech synthesis based on concatenating variable length motion capture data," IEEE Transactions on Visualization and Computer Graphics, vol. 12, no. 2, pp. 266–276, 2006.

[59] G. Englebienne, T. Cootes, and M. Rattray, "A probabilistic model for generating realistic lip movements from speech," in Advances in Neural Information Processing Systems, 2008, pp. 401–408.

[60] S. Deena, S. Hou, and A. Galata, "Visual speech synthesis using a variable-order switching shared gaussian process dynamical model," IEEE Transactions on Multimedia, vol. 15, no. 8, pp. 1755–1768, 2013.

[61] D. W. Massaro, J. Beskow, M. M. Cohen, C. L. Fry, and T. Rodgriguez, "Picture my voice: Audio to visual speech synthesis using artificial neural networks," in AVSP'99-International Conference on Auditory-Visual Speech Processing, 1999.

[62] M. Tamura, T. Masuko, T. Kobayashi, and K. Tokuda, "Visual speech synthesis based on parameter generation from hmm: Speech-driven and text-and-speech-driven approaches," in AVSP'98 International Conference on Auditory-Visual Speech Processing, 1998.

[63] D. Schabus, M. Pucher, and G. Hofer, "Joint audiovisual hidden semi-markov model-based speech synthesis," IEEE Journal of Selected Topics in Signal Processing, vol. 8, no. 2, pp. 336–347, 2013.

[64] G. Hofer, J. Yamagishi, and H. Shimodaira, "Speech-driven lip motion generation with a trajectory hmm," 2008.

[65] M. Brand, "Voice puppetry," in Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, 1999, pp. 21–28.

[66] L. Xie and Z.-Q. Liu, "A coupled hmm approach to video-realistic speech animation," Pattern Recognition, vol. 40, no. 8, pp. 2325–2340, 2007.

[67] K. Choi, Y. Luo, and J.-N. Hwang, "Hidden markov model inversion for audio-to-visual conversion in an mpeg-4 facial animation system," Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology, vol. 29, no. 1-2, pp. 51–61, 2001.

[68] L. D. Terissi and J. C. Gomez, "Audio-to-visual conversion via hmm inversion for speech-driven facial animation," in Brazilian Symposium on Artificial Intelligence. Springer, 2008, pp. 33–42.

[69] X. Zhang, L. Wang, G. Li, F. Seide, and F. K. Soong, "A new language independent, photo-realistic talking head driven by voice only," in Interspeech, 2013, pp. 2743–2747.

[70] S. Suwajanakorn, S. M. Seitz, and I. Kemelmacher-Shlizerman, "Synthesizing obama: learning lip sync