Vol. 18 No. 1 (2022)

Published: June 30, 2022

Pages: 28-37

Review Article

Facial Modelling and Animation: An Overview of the State of the Art

Abstract

Animating the human face presents interesting challenges because of its familiarity: the face is the feature we rely on to recognize individuals. This paper reviews the approaches used in facial modeling and animation and describes their strengths and weaknesses. Realistic animation of computer graphics models of human faces is hard to achieve because of the many details that must be approximated to produce convincing facial expressions. Many methods have been investigated to create increasingly accurate animations that represent human faces efficiently. We describe the techniques that have been used to produce realistic facial animation. In this survey, we broadly categorize facial modeling and animation approaches into the following classes: blendshape (shape interpolation), parameterization, Facial Action Coding System (FACS)-based approaches, MPEG-4 facial animation, physics-based muscle modeling, performance-driven facial animation, and visual speech animation.
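To make the first category concrete, the following is a minimal sketch of blendshape (shape interpolation) as a weighted sum of per-target offsets from a neutral mesh. The function name blend, the toy vertex count, and the "smile"/"jaw open" targets are illustrative assumptions, not taken from any of the surveyed systems.

```python
# Minimal sketch of delta blendshape evaluation: pose = neutral + sum_k w_k * delta_k.
# Shapes and target names below are hypothetical, for illustration only.
import numpy as np

def blend(neutral, deltas, weights):
    """Evaluate a blendshape rig.

    neutral : (V, 3) vertex positions of the rest face.
    deltas  : (K, V, 3) per-target offsets from the neutral face.
    weights : (K,) blend weights, typically in [0, 1].
    """
    w = np.asarray(weights, dtype=float).reshape(-1, 1, 1)
    return neutral + np.sum(w * deltas, axis=0)

# Toy example: a 4-vertex face with two targets ("smile", "jaw open").
neutral = np.zeros((4, 3))
deltas = np.stack([
    np.full((4, 3), 0.10),   # hypothetical "smile" offsets
    np.full((4, 3), -0.05),  # hypothetical "jaw open" offsets
])
pose = blend(neutral, deltas, weights=[0.7, 0.3])
print(pose.shape)  # (4, 3)
```

Production rigs use the same linear model, with targets sculpted by artists or derived from scans, and weights driven by animators or by performance capture.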
