Page 164 - 2024-Vol20-Issue2

Murad & Alasadi

Rivals like the HTC Vive, Sony PlayStation VR, Gear VR, and the Google Cardboard headset have emerged, leading developers to work on virtual reality games and applications [66]. These technologies have various applications, including video games, art, medicine, education, and flight training [67].

B. Recognition of Sign Language
Sign language is a primary language for individuals with speech and hearing impairments, relying on nonverbal communication such as gestures. Sign languages comprise three main parts: fingerspelling, word-level sign vocabulary, and non-manual features [68]. Fingerspelling uses gestures to spell out words, while non-manual features involve facial expressions and body postures. Because sign language is not widely understood, hard-of-hearing individuals often require the assistance of a trained interpreter. However, hiring an interpreter can be costly and is not always feasible. Using visual cues, a sign language recognition system could provide a low-cost, natural, and comfortable way to interact with deaf or hard-of-hearing individuals with impaired speech.

C. Robot and Ambient-Assisted Living
Ambient Assisted Living (AAL) is a sub-field of Ambient Intelligence that integrates new technologies and social environments to improve quality of life [69]. Vision-based assistive systems can benefit patients by observing their daily activities. As robots become more integrated into our daily lives, the challenge of communicating with them becomes more apparent. Hand gestures play a significant role in natural interaction, especially when a robot is designed to assist people in daily tasks. Hand gestures can be used in rehabilitation treatment, controlling medical equipment, and assisting disabled individuals. Approaches for integrating hand gestures with physician-computer interfaces have been explored; Gestix, for example, is a system for tracking hand gestures in the operating room [70].

IX. DATASETS OF DYNAMIC GESTURE RECOGNITION

Gesture recognition datasets vary in their categories, scale, annotation type, sensors, and gesture domain. The Cambridge hand gesture dataset [71], a recent addition, consists of 900 RGB sequences covering nine gesture classes. Sheffield Kinect Gesture (SKIG) [72], a dynamic gesture dataset, has 2160 hand gesture sequences divided into ten classes. The ChaLearn Gesture Challenge offers popular datasets such as the ChaLearn LAP IsoGD, ConGD [73], and Multi-modal Gesture Dataset (MMGD) [74], which include gesture classes from nine domains, including Italian sign language, pantomime, and actions. An effort to recognize gestures inside a car is described in [75], which provides drivers' hand gestures performed by eight different actors from one viewpoint against a plain background. Other datasets for sign language include RWTH-BOSTON-50, RWTH-400, NATOPS, and BIGHands [76]. The BIGHands dataset [77], designed for hand pose estimation, is rich in hand pose variation and joint annotation but does not explicitly reflect gestures.

X. EVALUATION OF MODEL PERFORMANCE

The following parameters are used for evaluating the classifier's performance [78]:
TN: the number of true negatives.
TP: the number of true positives.
FN: the number of false negatives.
FP: the number of false positives.
Precision: the classifier's ability to avoid labeling a negative sample as positive, as in (1).

Precision = TP / (TP + FP)    (1)

Recall (sensitivity): the classifier's ability to find all positive samples, as in (2).

Recall = TP / (TP + FN)    (2)

The F1-score is the weighted harmonic mean of recall and precision, with the highest value being 1 and the lowest value being 0, as in (3).

F1-score = 2 × (recall × precision) / (recall + precision)    (3)

The support is the number of times each class appears in y_true.
Micro average: computes the metric by pooling all true positives, false negatives, and false positives across classes, regardless of per-class predictions. When there is class imbalance (i.e., more samples of one class than of others), the micro average is preferred.
Macro average: calculates the metric independently for every class and returns the unweighted mean, without taking the proportion of each label in the dataset into account; as a result, all classes are treated equally.
Weighted average: computes the F1-score for each label in the dataset and returns the average weighted by each label's proportion in the dataset.