Page 161 - 2024-Vol20-Issue2
157 | Murad & Alasadi
B. Depth
Depth features, computed from calibrated cameras or depth sensors such as Light Detection and Ranging (LiDAR) [32] or Kinect [33], indicate how far a human face and the nearest objects lie from the sensor. However, because depth measurements are coarse-grained and noisy, they are often paired with other image cues such as color [34].

C. Color
Skin color [35] is a crucial image feature for detecting and tracking human hands, but it struggles to rule out similarly colored objects such as the arm and face. To address this issue, users often wear long-sleeved shirts, restricting the colors of other objects. A skin-color threshold is a range broad enough to extract the skin tones actually present in an image; fine-tuning this threshold is necessary to reduce false detections, since human skin differs in shade, lighting, and pigmentation. A thresholded region typically contains both true skin pixels and non-skin pixels, so effective skin segmentation is needed to eliminate the false positives. Choosing the right color space for skin-color models is crucial; popular options include RGB, HSV, normalized RGB, YUV, and YCrCb. Chromaticity-based color components improve robustness to lighting changes by separating the chromaticity and luminance components [36].

Color segmentation can also be challenging when background objects have color distributions similar to human skin. Background subtraction is an effective remedy but is easily disturbed by camera motion; research has therefore investigated dynamic correction methods that update the background model to compensate for these issues [37].

D. Shape
Hand detection in images relies on the hand's distinctive shape, captured by extracting the contours of the objects in the image [38]. These contours reflect the hand's shape and are unaffected by skin color, viewpoint, or illumination. Edge-detection-based contour extraction, however, often produces many edges from both hands and unrelated background objects, so complex post-processing methods are crucial for increasing reliability. Edges are therefore often combined with background-subtraction motion cues and skin color. Local topological descriptors match a model to the image's edges, while the shape context descriptor, applied by Zou et al. [39], characterizes the position of a specific point on a shape. The underlying theory is that corresponding points on two similar shapes possess nearly identical shape contexts.

E. Multi-cues
Many systems merge information from multiple image cues to improve accuracy and speed, for example by combining appearance-based hand, body, or face detection with motion-based regions of interest (ROI) [40]. However, when one cue fails, performance suffers. Such methods do not truly detect gestures from color and shape together, because the cues are applied sequentially rather than cooperatively. Genuine multi-cue methods face data-fusion issues similar to those of traditional sensor fusion.

F. Moment Invariants
Hu [41] proposed geometric moment invariants, which include scale, rotation, and translation invariants based on normalized central moments up to the third order. These global invariant features are commonly used in pattern recognition and image classification to describe an object's unique shape. However, high geometric moment values can cause numerical instabilities and sensitivity to noise. Hu's moments have been used to recognize hand gestures [42, 43].

G. Pixel Values
Numerous classification approaches have been developed to find hands based on texture and appearance in gray-level images. Mirehi et al. [44] examined the suitability of various classification approaches for recognizing view-independent hand postures. Many methods use image samples to train classifiers that detect hands by their appearance, under the fundamental assumption that hand appearance varies more between gestures than between the people who make the same gesture. However, automatic feature selection remains a challenge. Boosting, a machine learning method, has shown strong hand- and face-detection results: it linearly combines imprecise (weak) classifiers into a highly precise (strong) classifier [45].

H. 3D Model
Using powerful and inexpensive depth sensors, researchers can now recover 3D scene details and perform background subtraction and hand detection [46]. These sensors are more resistant to lighting changes, making them well suited to detecting hands in images. Such methods can detect objects regardless of viewpoint and require various image features. Kinematic hand models use line and point features to retrieve the angles formed at the hand joints; hand postures can be estimated if the correspondences between the 3D model and the observed features are well defined.

VI. GESTURE RECOGNITION TECHNIQUES

Various classifiers, such as support vector machines (SVMs) and specialized shape classifiers, then process the extracted features. The primary objective of hand gesture recognition is to understand the semantics conveyed by the hand's location, posture, or gesture [47]. In vision-based systems, hand gestures can be classified as static or dynamic, with general classifiers detecting static gestures and Hidden Markov Models processing dynamic gestures. Learning algorithms can be categorized based on their
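The chromaticity-based skin thresholding described under C. Color can be sketched in a few lines of NumPy. The RGB-to-YCrCb conversion below uses the standard JFIF/BT.601 formulas; the Cr/Cb skin range and the `skin_mask` helper are illustrative assumptions, not values taken from the surveyed systems, and in practice the range must be tuned for shade, lighting, and pigmentation as discussed above.

```python
import numpy as np

def rgb_to_ycrcb(img):
    """Convert an H x W x 3 uint8 RGB image to YCrCb (JFIF/BT.601, offset 128)."""
    img = img.astype(np.float64)
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    y  = 0.299 * r + 0.587 * g + 0.114 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    return np.stack([y, cr, cb], axis=-1)

def skin_mask(img, cr_range=(133, 173), cb_range=(77, 127)):
    """Boolean mask of pixels whose chromaticity falls in an assumed skin range.
    Luminance (Y) is ignored, which gives some robustness to lighting changes."""
    ycrcb = rgb_to_ycrcb(img)
    cr, cb = ycrcb[..., 1], ycrcb[..., 2]
    return ((cr >= cr_range[0]) & (cr <= cr_range[1]) &
            (cb >= cb_range[0]) & (cb <= cb_range[1]))

# Tiny synthetic image: one skin-like pixel and one blue pixel.
img = np.array([[[220, 170, 140], [0, 0, 255]]], dtype=np.uint8)
mask = skin_mask(img)
```

Because only Cr and Cb are thresholded, scaling the overall brightness of a pixel changes it far less than changing its hue, which is exactly the separation of chromaticity from luminance that [36] exploits.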
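The shape context descriptor mentioned under D. Shape can be sketched as a log-polar histogram of where all other contour points lie relative to a chosen reference point. The bin counts and radial limits below are illustrative choices, not the parameters used by Zou et al. [39].

```python
import numpy as np

def shape_context(points, idx, n_r=5, n_theta=12):
    """Log-polar histogram of the positions of all other contour points
    relative to points[idx]; this is the shape context of that point."""
    p = points[idx]
    rel = np.delete(points, idx, axis=0) - p
    r = np.hypot(rel[:, 0], rel[:, 1])
    theta = np.arctan2(rel[:, 1], rel[:, 0])        # angles in [-pi, pi]
    r_norm = r / r.mean()                           # normalize for scale invariance
    r_edges = np.logspace(np.log10(0.125), np.log10(2.0), n_r + 1)
    r_bin = np.clip(np.digitize(r_norm, r_edges) - 1, 0, n_r - 1)
    t_bin = ((theta + np.pi) / (2 * np.pi) * n_theta).astype(int) % n_theta
    hist = np.zeros((n_r, n_theta))
    np.add.at(hist, (r_bin, t_bin), 1)              # count points per (r, theta) cell
    return hist / hist.sum()

# Eight points sampled on a square contour; descriptor of the corner point 0.
pts = np.array([[0, 0], [1, 0], [2, 0], [2, 1],
                [2, 2], [1, 2], [0, 2], [0, 1]], dtype=float)
sc = shape_context(pts, 0)
```

Matching then compares such histograms between two shapes: corresponding points on similar shapes yield similar histograms, which is the premise stated above.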
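The invariants of F. Moment Invariants follow directly from normalized central moments. As a minimal sketch, the code below computes the first two of Hu's seven invariants from scratch and checks that a binary rectangle and its 90-degree rotation produce the same values; the helper names are our own, not from [41].

```python
import numpy as np

def central_moment(img, p, q):
    """Central moment mu_pq of a 2-D intensity (or binary) image."""
    ys, xs = np.mgrid[0:img.shape[0], 0:img.shape[1]]
    m00 = img.sum()
    xbar = (xs * img).sum() / m00
    ybar = (ys * img).sum() / m00
    return (((xs - xbar) ** p) * ((ys - ybar) ** q) * img).sum()

def hu_first_two(img):
    """First two Hu invariants, built from normalized central moments eta_pq."""
    m00 = img.sum()
    def eta(p, q):
        # Normalization by m00^(1 + (p+q)/2) gives scale invariance.
        return central_moment(img, p, q) / m00 ** (1 + (p + q) / 2)
    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return phi1, phi2

# A 4 x 8 solid rectangle and its 90-degree rotation share the same invariants.
rect = np.zeros((20, 20))
rect[8:12, 4:12] = 1.0
rot = np.rot90(rect)
```

Note that phi2 involves squares and fourth powers of the moments, which is one source of the numerical instability and noise sensitivity mentioned above for the higher-order invariants.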
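The boosting idea in G. Pixel Values, linearly combining weak classifiers into a strong one, can be illustrated with a minimal AdaBoost over threshold stumps on a one-dimensional feature. This is a toy stand-in for the image features used by the surveyed detectors, not the actual cascade of [45].

```python
import numpy as np

def train_adaboost(X, y, n_rounds=10):
    """Minimal AdaBoost with threshold stumps on a 1-D feature; y is in {-1, +1}.
    Returns a list of (threshold, polarity, alpha) weak classifiers."""
    n = len(X)
    w = np.full(n, 1.0 / n)                  # uniform sample weights to start
    ensemble = []
    for _ in range(n_rounds):
        best = None
        # Exhaustively pick the stump with the lowest weighted error.
        for thr in X:
            for pol in (1, -1):
                pred = pol * np.where(X >= thr, 1, -1)
                err = w[pred != y].sum()
                if best is None or err < best[0]:
                    best = (err, thr, pol)
        err, thr, pol = best
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # vote weight of this weak learner
        pred = pol * np.where(X >= thr, 1, -1)
        w *= np.exp(-alpha * y * pred)          # up-weight the mistakes
        w /= w.sum()
        ensemble.append((thr, pol, alpha))
    return ensemble

def predict(ensemble, X):
    """Strong classifier: sign of the alpha-weighted sum of stump votes."""
    score = sum(a * p * np.where(X >= t, 1, -1) for t, p, a in ensemble)
    return np.sign(score)

# Toy separable data: the combined classifier recovers the labels.
X = np.array([0.0, 1, 2, 3, 4, 5])
y = np.array([-1, -1, -1, 1, 1, 1])
ens = train_adaboost(X, y, n_rounds=3)
```

Each stump alone is an "imprecise" classifier; the alpha-weighted linear combination is the "highly precise" one described in the text.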