Page 164 - 2024-Vol20-Issue2
160 | Murad & Alasadi
Rivals like the HTC Vive, Sony PlayStation VR, Gear VR, and Google Cardboard headset have emerged, leading developers to work on virtual reality games and applications [66]. These technologies have various applications, including video games, art, medical applications, education, and flight training [67].

B. Recognition of Sign Language
Sign language is a primary language for individuals with speech and hearing impairments, utilizing nonverbal communication such as gestures. Sign languages are divided into three main parts: fingerspelling, sign language vocabulary at the word level, and non-manual features [68]. Fingerspelling involves using gestures to spell words, while non-manual features involve facial expressions and body postures. Because few people understand sign language, hard-of-hearing individuals often require the assistance of a trained interpreter. However, hiring an interpreter can be costly and is not always feasible. Using visual cues, a sign language recognition system could provide a low-cost, natural, and comfortable way to interact with deaf or hard-of-hearing individuals with impaired speech.

C. Robot and Ambient-Aided Living
Ambient Assisted Living (AAL) is a sub-field of Ambient Intelligence that integrates new technologies and social environments to improve quality of life [69]. Vision-based assistive systems can benefit patients by observing their daily activities. As robots become more integrated into our daily lives, the challenge of communicating with them becomes more apparent. Hand gestures play a significant role in natural interaction, especially when a robot is designed to assist people in daily tasks.
Hand gestures can be used in rehabilitation treatment, controlling medical equipment, and assisting disabled individuals. Approaches for integrating hand gestures with physician-computer interfaces have been explored, with Gestix being a hand gesture tracking system for the operating room [70].

IX. DATASETS OF DYNAMIC GESTURE RECOGNITION

Gesture recognition datasets vary in their categories, scale, annotation type, sensors, and gesture domain. The Cambridge hand gesture dataset [71], a recent addition, consists of 900 RGB sequences from nine classes of gestures. Sheffield Kinect Gesture (SKIG) [72], a dynamic gesture dataset, has 2160 hand gesture sequences divided into ten classes. The ChaLearn Gesture Challenge offers popular datasets like ChaLearn LAP IsoGD, ConGD [73], and the Multi-modal Gesture Dataset (MMGD) [74], which include gesture classes from nine domains, including Italian sign language, pantomime, and actions. The effort to recognize gestures inside a car is described in [75], which provides driver's hand gestures performed by eight different actors from one viewpoint against a plain background. Other datasets for sign language include RWTH-BOSTON-50, RWTH-400, NATOPS, and BIGHands [76]. The BIGHands dataset [77], designed for hand pose estimation, is rich in hand pose variation and joint annotation but does not explicitly represent gestures.

X. EVALUATION OF MODEL PERFORMANCE

The following parameters are used for evaluating the classifier's performance [78]:
TN: the number of true negatives.
TP: the number of true positives.
FN: the number of false negatives.
FP: the number of false positives.
Precision: the classifier's ability to avoid labeling a negative sample as positive, as in (1).

$$\text{Precision} = \frac{TP}{TP + FP} \tag{1}$$

Recall (Sensitivity): the classifier's ability to find all positive samples, as in (2).

$$\text{Recall} = \frac{TP}{TP + FN} \tag{2}$$

The F1-score is the harmonic mean of recall and precision, with the highest value being 1 and the lowest value being 0, as in (3).

$$\text{F1-score} = 2 \times \frac{\text{recall} \times \text{precision}}{\text{recall} + \text{precision}} \tag{3}$$

The support is the number of times each class appears in the ground-truth labels (y-true).
Micro Average: computes the metric globally by counting all false negatives, true positives, and false positives across the dataset, regardless of the class each prediction belongs to. If there is a class imbalance (i.e., one class has more samples than the others), a micro-average is preferred.
Macro Average: calculates the metric independently for every class and returns the unweighted mean, without taking the proportion of each label in the dataset into account. As a result, all classes are treated equally.
Weighted Average: computes the F1-score for each label in the dataset and returns the average weighted by the proportion of each label in the dataset.
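
The metrics and averaging schemes defined in this section can be illustrated with a short sketch. The Python code below is not taken from the reviewed works; the function names (`per_class_counts`, `prf`, `f1_report`) and the toy label lists are hypothetical and chosen only for illustration. It assumes single-label multi-class predictions and treats any ratio with a zero denominator as 0.

```python
from collections import Counter


def per_class_counts(y_true, y_pred):
    """Return {label: (TP, FP, FN)} for every label in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    counts = {}
    for c in labels:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        counts[c] = (tp, fp, fn)
    return counts


def safe_div(num, den):
    """Assumption: a ratio with a zero denominator is reported as 0."""
    return num / den if den else 0.0


def prf(tp, fp, fn):
    """Precision (1), recall (2), and F1-score (3) from raw counts."""
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    return precision, recall, f1


def f1_report(y_true, y_pred):
    """Per-class F1 plus micro, macro, and weighted averages."""
    counts = per_class_counts(y_true, y_pred)
    support = Counter(y_true)  # occurrences of each class in y_true
    f1 = {c: prf(*counts[c])[2] for c in counts}

    # Macro: unweighted mean over classes.
    macro = sum(f1.values()) / len(f1)
    # Weighted: mean over classes, weighted by support.
    weighted = sum(f1[c] * support[c] for c in f1) / len(y_true)
    # Micro: pool TP, FP, and FN over all classes, then apply (1)-(3).
    tp = sum(v[0] for v in counts.values())
    fp = sum(v[1] for v in counts.values())
    fn = sum(v[2] for v in counts.values())
    micro = prf(tp, fp, fn)[2]

    return {"per_class": f1, "support": dict(support),
            "micro": micro, "macro": macro, "weighted": weighted}


# Toy example: three hypothetical gesture classes labeled 0, 1, 2.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]
report = f1_report(y_true, y_pred)
print(report)
```

For this toy input, six of the eight predictions are correct, and the micro-averaged F1 is 0.75. In the single-label setting the micro average always equals the accuracy, whereas the macro and weighted averages differ from each other whenever the per-class F1-scores and supports are unequal.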