Vol. 22 No. 1 (2026)

Published: June 15, 2026

Pages: 218-226

Original Article

Fusing Spatial and Temporal Features Extracted Using Convolutional Neural Networks and Gated Recurrent Units for Improved Deepfake Detection

Abstract

Deepfake manipulation of multimedia content, especially videos and photos, threatens social cohesion (e.g., through rumour propagation, extortion, and distortion of the truth) and cannot be ignored, so effective detection solutions are required. Most studies suggest that convolutional neural networks (CNNs) alone may not be able to extract the complex features involved in deepfake production. Hybrid approaches that capture such features and act as powerful descriptors for binary classification are therefore needed to separate fake from genuine content. In this paper, a hybrid algorithm is developed that combines a CNN with gated recurrent units (GRUs). The proposed model aims to improve the extraction of complex features by capturing spatial and temporal features simultaneously. This approach permits the extraction of implicit features that are vital to the final classification, especially for the frame sequences within video content. Finally, a dense neural network classifies these features. Two datasets were used to train the proposed model: FaceForensics++ (FF++) and the DeepFake Detection Challenge (DFDC) dataset. On the FF++ dataset, the proposed model reached an Area Under the Curve (AUC) of 0.88 and an F1-score of 0.85; on DFDC, it reached 0.95 and 0.86 for the same metrics, respectively.
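
The pipeline described in the abstract (per-frame convolutional features, a GRU over the frame sequence, and a dense classifier) can be illustrated with a minimal, dependency-free sketch. It is an illustration under stated assumptions, not the authors' implementation: the weights are random and untrained, the layer sizes are toy values, the "CNN" is a single 3x3 convolution layer with global average pooling, biases are omitted, and every function and parameter name (`spatial_features`, `gru_step`, `predict_fake_probability`, `init_params`) is hypothetical.

```python
import math
import random

def conv2d_valid(frame, kernel):
    """Valid 2D cross-correlation of one grayscale frame with one kernel, followed by ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(frame) - kh + 1):
        row = []
        for j in range(len(frame[0]) - kw + 1):
            s = sum(frame[i + a][j + b] * kernel[a][b]
                    for a in range(kh) for b in range(kw))
            row.append(max(0.0, s))
        out.append(row)
    return out

def spatial_features(frame, kernels):
    """Spatial branch: one feature per kernel (global average of its activation map)."""
    feats = []
    for k in kernels:
        fmap = conv2d_valid(frame, k)
        feats.append(sum(sum(r) for r in fmap) / (len(fmap) * len(fmap[0])))
    return feats

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def gru_step(x, h, p):
    """Temporal branch: one GRU update (biases omitted; gate conventions vary by library)."""
    z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], x), matvec(p["Uz"], h))]  # update gate
    r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], x), matvec(p["Ur"], h))]  # reset gate
    rh = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b) for a, b in zip(matvec(p["Wh"], x), matvec(p["Uh"], rh))]
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def init_params(n_kernels, hidden, rng):
    """Random, untrained weights; sizes are toy assumptions, not values from the paper."""
    p = {name: rand_matrix(hidden, n_kernels, rng) for name in ("Wz", "Wr", "Wh")}
    p.update({name: rand_matrix(hidden, hidden, rng) for name in ("Uz", "Ur", "Uh")})
    p["kernels"] = [rand_matrix(3, 3, rng) for _ in range(n_kernels)]
    p["w_out"] = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    return p

def predict_fake_probability(video, p):
    """Fuse spatial and temporal information, then apply a one-unit dense sigmoid layer."""
    h = [0.0] * len(p["w_out"])
    for frame in video:                      # frames are processed in temporal order
        x = spatial_features(frame, p["kernels"])
        h = gru_step(x, h, p)
    return sigmoid(sum(w * hi for w, hi in zip(p["w_out"], h)))

rng = random.Random(0)
params = init_params(n_kernels=4, hidden=5, rng=rng)
# A synthetic "video": 6 frames of 8x8 grayscale pixels in [0, 1).
video = [[[rng.random() for _ in range(8)] for _ in range(8)] for _ in range(6)]
prob = predict_fake_probability(video, params)
print(f"fake probability (untrained): {prob:.3f}")
```

In this sketch the final GRU hidden state summarises the whole frame sequence, so the dense layer sees spatial evidence from every frame together with how that evidence changes over time. A practical detector would replace the single convolution with a deeper CNN and train all weights on labelled real and fake videos.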
