Page 161 - 2024-Vol20-Issue2

157 |                                                              Murad & Alasadi

B. Depth
Depth features obtained from calibrated cameras or depth sensors such as Light Detection and Ranging (LiDAR) [32] or Kinect [33] can distinguish a human face from the nearest objects. However, because depth maps are often coarse-grained and noisy, depth features are usually paired with other image cues such as color [34].

C. Color
Skin color [35] is a crucial image feature for detecting and tracking human hands, but it struggles to rule out similarly colored objects such as the arm and face. To sidestep this issue, users are sometimes asked to wear long-sleeved shirts, restricting the colors of other objects. A skin-color threshold is a broad range intended to extract the skin colors actually present in an image. Fine-tuning this threshold is necessary because human skin differs in shade, lighting, and pigmentation. A thresholded mask typically contains both true skin pixels and other skin-colored pixels, so effective skin segmentation is needed to eliminate false-positive pixels. Choosing the right color space for skin-color models is crucial; popular options include RGB, HSV, normalized RGB, YUV, and YCrCb. Chromaticity-based color components improve robustness to lighting changes by separating the chromaticity and luminance components [36].
Color segmentation can be challenging when background objects have color distributions similar to human skin. Background subtraction is a common remedy, but it typically assumes a static camera. Research has investigated dynamic correction methods for background models to compensate for these issues [37].

D. Shape
Hand detection in images relies on the hand's distinct shape, captured by extracting the contours of the image's objects [38]. These contours reflect the hand's shape and are unaffected by skin color, viewpoint, or illumination. Contour extraction based on edge detection typically produces many edges belonging both to the hand and to unrelated background objects, making complex post-processing methods crucial for increasing reliability. Edges are therefore often combined with background-subtraction motion cues and skin color. Local topological descriptors match a model to the image's edges, while the shape context descriptor, introduced by Zou et al. [39], characterizes the position of a specific point on a shape. The underlying idea is that corresponding points on two instances of the same shape should possess nearly identical shape contexts.

E. Multi-cues
Many systems merge information from multiple image cues to improve accuracy and speed, for example combining appearance-based hand, body, or face detection with motion-based regions of interest (ROI) [40]. However, performance suffers when any one cue is removed, and many such methods do not detect gestures from color and shape jointly, since the cues are applied sequentially rather than cooperatively. True multi-cue methods face data-fusion issues similar to those of traditional sensor fusion.

F. Moment Invariants
Hu [41] proposed geometric moment invariants, which include scale, rotation, and translation invariants based on normalized central moments up to the third order. These invariant global features are commonly used in pattern recognition and image classification to describe an object's characteristic shape. However, high geometric moment values can cause numerical instability and noise sensitivity. Hu's moments have been used to recognize hand gestures [42, 43].

G. Pixel Values
Numerous classification approaches have been developed to find hands based on texture and appearance in gray-level images. Mirehi et al. [44] examined the suitability of various classification approaches for recognizing view-independent hand postures. Many methods train classifiers on image samples to detect hands from their appearance, under the fundamental assumption that hand appearance varies more between gestures than between people performing the same gesture. However, automatic feature selection remains a challenge. Boosting, a machine learning method, has shown strong hand and face detection results: it linearly combines imprecise, weak classifiers into a highly precise, robust classifier [45].

H. 3D Model
Using powerful and inexpensive depth sensors, researchers can now recover 3D scene detail and perform background subtraction and hand detection [46]. These sensors are more resistant to lighting changes, making them well suited to detecting hands in images. Such methods can detect objects regardless of viewpoint and draw on various image features. Kinematic hand models use line and point features to retrieve the angles formed at the hand joints; hand postures can be estimated if the correspondences between the 3D model and the observed features are well defined.

VI. GESTURE RECOGNITION TECHNIQUES

The extracted features are then processed by various classifiers, such as SVMs and specialized shape classifiers. The primary objective of hand gesture recognition is to understand the semantics conveyed by the hand's location, posture, or gesture [47]. Vision-based hand gestures can be classified as static or dynamic, with general classifiers detecting static gestures and Hidden Markov Models processing dynamic gestures. Learning algorithms can be categorized based on their
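As a concrete illustration of the skin-color thresholding discussed in Subsection C, the sketch below converts an RGB image to YCrCb using the ITU-R BT.601 formulas and applies a broad chroma threshold. The specific Cr/Cb bounds are illustrative assumptions, not values taken from the surveyed papers, and would need fine-tuning for particular lighting and pigmentation conditions, as the text notes.

```python
import numpy as np

def rgb_to_ycrcb(img):
    """Convert an HxWx3 uint8 RGB image to YCrCb (ITU-R BT.601)."""
    img = img.astype(np.float64)
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cr = (r - y) * 0.713 + 128.0   # red-difference chroma
    cb = (b - y) * 0.564 + 128.0   # blue-difference chroma
    return np.stack([y, cr, cb], axis=-1)

def skin_mask(img, cr_range=(133, 173), cb_range=(77, 127)):
    """Boolean mask of skin-colored pixels (illustrative chroma bounds)."""
    ycrcb = rgb_to_ycrcb(img)
    cr, cb = ycrcb[..., 1], ycrcb[..., 2]
    return ((cr >= cr_range[0]) & (cr <= cr_range[1]) &
            (cb >= cb_range[0]) & (cb <= cb_range[1]))
```

Thresholding only the chroma components while ignoring luminance Y is what gives chromaticity-based color spaces their partial robustness to lighting changes.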
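The geometric moment invariants of Subsection F can be made concrete with the first two Hu invariants, built from normalized central moments up to second order. This is a minimal NumPy sketch for illustration, not the exact formulation used in [42, 43].

```python
import numpy as np

def central_moment(img, p, q):
    """Central moment mu_pq of a grayscale image."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    m00 = img.sum()
    xbar = (xs * img).sum() / m00
    ybar = (ys * img).sum() / m00
    return (((xs - xbar) ** p) * ((ys - ybar) ** q) * img).sum()

def hu_first_two(img):
    """First two Hu invariants: translation, scale, and rotation invariant."""
    img = img.astype(np.float64)
    mu00 = central_moment(img, 0, 0)

    def eta(p, q):  # normalized central moment
        return central_moment(img, p, q) / mu00 ** (1 + (p + q) / 2)

    e20, e02, e11 = eta(2, 0), eta(0, 2), eta(1, 1)
    phi1 = e20 + e02
    phi2 = (e20 - e02) ** 2 + 4 * e11 ** 2
    return phi1, phi2
```

Rotating the image by 90 degrees swaps the second-order moments and negates the cross moment, so both invariants are unchanged up to floating-point rounding, which matches their intended use as rotation-invariant shape features.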
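Subsection G describes boosting as a linear grouping of weak classifiers into a robust one. The toy AdaBoost sketch below combines one-feature decision stumps on tabular data; it illustrates the weighting scheme only and is not the actual hand/face detector of [45], which operates on image features.

```python
import numpy as np

def stump_predict(x, feat, thresh, sign):
    """Weak classifier: a one-feature decision stump returning +/-1."""
    return np.where(sign * (x[:, feat] - thresh) > 0, 1, -1)

def adaboost(x, y, n_rounds=10):
    """Fit a weighted (linear) combination of decision stumps.

    x: (n, d) feature matrix; y: (n,) labels in {-1, +1}.
    """
    n, d = x.shape
    w = np.full(n, 1.0 / n)       # sample weights, re-focused each round
    ensemble = []                 # list of (alpha, feat, thresh, sign)
    for _ in range(n_rounds):
        best = None               # exhaustive search for the best stump
        for feat in range(d):
            for thresh in np.unique(x[:, feat]):
                for sign in (1, -1):
                    pred = stump_predict(x, feat, thresh, sign)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, feat, thresh, sign)
        err, feat, thresh, sign = best
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # stump's vote weight
        pred = stump_predict(x, feat, thresh, sign)
        w *= np.exp(-alpha * y * pred)          # up-weight the mistakes
        w /= w.sum()
        ensemble.append((alpha, feat, thresh, sign))
    return ensemble

def predict(ensemble, x):
    """Strong classifier: sign of the weighted sum of weak votes."""
    score = sum(a * stump_predict(x, f, t, s) for a, f, t, s in ensemble)
    return np.where(score > 0, 1, -1)
```

The key idea is visible in the weight update: samples the current weak classifier gets wrong gain weight, so the next stump is forced to attend to them, and the final vote weights `alpha` favor the more accurate stumps.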