Page 160 - 2024-Vol20-Issue2
156 | Murad & Alasadi
ity environments [22]. This data-acquisition technique is often used in sign language [23] and gaming [24]. According to Moore's Law, sensors are expected to become smaller and more affordable in the future. Data gloves, including MIT, CyberGlove II, CyberGlove III, Fifth Dimension Sensor Glove Ultra, P5, and X-IST, are expected to become more prevalent.

C. Colored-Markers Approaches

Colored gloves for tracking the hand and locating the fingers and palm have been developed using markers and wool gloves [25]. The markers on these gloves are used to extract geometric characteristics that outline the hand's shape. However, natural interaction between humans and computers remains limited, despite advances in sensors and data gloves.

Table I gives a concise comparison of hand gesture recognition approaches.

TABLE I.
A CONCISE COMPARISON OF HAND GESTURE RECOGNITION APPROACHES

Approach         Pros                                         Cons
Vision-based     Natural interaction, low cost                Lighting variations, complex backgrounds, clutter, and time and computation requirements
Glove-based      Accurate computation of hand configuration   Expensive, difficult to connect to computers, unsuitable for virtual reality
Colored-markers  Simple and inexpensive                       Limited natural interaction

IV. HAND GESTURE RECOGNITION APPROACHES

Hand gesture recognition techniques that rely on the bare hand extract the required data directly from it, offering properties such as simplicity and ease of use [26]. They provide a direct connection with the computer and can be categorized into two types.

A. Appearance-Based Approach

Appearance-based approaches work by extracting features from input hand images and comparing them to stored features [27]. This method is simpler and easier than three-dimensional models but can be affected by lighting and background objects. It is an instance of the general pattern recognition problem, which consists of three tasks: extracting features, training a classifier on labeled samples, and classifying unknown samples.

B. 3D Model-Based Approach

The hand's form is modeled and analyzed using a 3D model description [28], including the kinematic parameters required to project the three-dimensional model onto a two-dimensional one. However, this projection can lead to a loss of features. The various methods include volumetric, geometrical, and skeleton types [29]. Volumetric models deal with the three-dimensional visual appearance of the human hand, while skeleton methods limit the set of parameters needed to form the hand's shape. Geometrical models mimic the efficiency of visual images but require several parameters and a time-consuming process. Geometric shapes, such as polygon meshes and cardboard models, can also be used to approximate visual forms. However, this method has several disadvantages [30], including the need for the initial parameters to be near the solution for each image, noise in the imaging phase, difficulty in extracting features, and the inability to deal with singularities caused by ambiguous views. In general, appearance-based methods perform better than 3D model-based methods in real-time detection, although 3D model-based methods can represent a wide range of hand gestures.

V. FEATURE EXTRACTION

Feature extraction techniques collect data on a gesture's position, direction, posture, and temporal progression. They process and analyze low-level data (pixel values) to produce high-level data such as object contours. A hand-crafted algorithm is designed to recognize and encode specific image features, such as texture, shape, and color: the image is treated as a collection of pixels, and the manually defined algorithm is applied to obtain a feature vector that describes the image's contents. The resulting feature vectors are used as inputs for machine-learning models. Feature extraction reduces data dimensionality by encoding relevant information into a compressed representation and eliminating less discriminative data. The efficiency of gesture recognition relies heavily on feature extraction, so the most critical design decisions in hand motion and gesture recognition are which features to work with and how to extract them. The following subsections discuss popular feature types and computation techniques, including shape and motion features.

A. Motion

Motion cues obtained from frame-to-frame comparisons are used to detect hands. They offer a computationally efficient way to find foreground objects and to determine their motion and position. This method relies on assumptions such as a static background, image pre-processing, and a stable camera. Binh et al. [31] used the Kalman filter to predict the hand's position in one frame based on its observed position in the previous frame. Modern approaches combine motion information with other visual cues to enhance detection.
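The frame-to-frame motion cue and the Kalman prediction step described above can be illustrated with a minimal pure-Python sketch. It is not the implementation of Binh et al. [31]; it assumes grayscale frames stored as nested lists, a static background, and a constant-velocity motion model. The function and class names, the threshold, and the noise values q and r are all illustrative choices rather than values taken from the cited work.

```python
# Sketch only: frame differencing plus a constant-velocity Kalman filter.
# All names and parameter values here are illustrative assumptions.

def motion_mask(prev, curr, threshold=30):
    """Mark pixels whose grey level changed by more than `threshold`."""
    return [[abs(c - p) > threshold for p, c in zip(prev_row, curr_row)]
            for prev_row, curr_row in zip(prev, curr)]


def centroid(mask):
    """Return the (row, col) centroid of the moving pixels, or None."""
    points = [(r, c) for r, row in enumerate(mask)
              for c, moving in enumerate(row) if moving]
    if not points:
        return None
    n = len(points)
    return (sum(r for r, _ in points) / n, sum(c for _, c in points) / n)


class Kalman1D:
    """Constant-velocity Kalman filter for one image coordinate."""

    def __init__(self, position, q=0.01, r=1.0):
        self.p, self.v = float(position), 0.0   # state: position, velocity
        self.P = [[100.0, 0.0], [0.0, 100.0]]   # large initial uncertainty
        self.q, self.r = q, r                   # process / measurement noise

    def predict(self):
        """Advance the state by one frame and return the predicted position."""
        (a, b), (c, d) = self.P
        self.p += self.v
        # P <- F P F^T + Q with F = [[1, 1], [0, 1]] and Q = q * I
        self.P = [[a + b + c + d + self.q, b + d],
                  [c + d, d + self.q]]
        return self.p

    def update(self, z):
        """Correct the state with an observed position z."""
        (a, b), (c, d) = self.P
        k0, k1 = a / (a + self.r), c / (a + self.r)   # Kalman gain
        residual = z - self.p
        self.p += k0 * residual
        self.v += k1 * residual
        # P <- (I - K H) P with H = [1, 0]
        self.P = [[(1 - k0) * a, (1 - k0) * b],
                  [c - k1 * a, d - k1 * b]]


def make_frame(t, height=10, width=40):
    """Hypothetical test data: a bright 3x3 blob moving 2 pixels per frame."""
    frame = [[0] * width for _ in range(height)]
    for r in range(3, 6):
        for c in range(2 * t, 2 * t + 3):
            frame[r][c] = 255
    return frame


def track(frames):
    """Return the predicted (row, col) of the moving region in the next frame."""
    filters = None
    for prev, curr in zip(frames, frames[1:]):
        observed = centroid(motion_mask(prev, curr))
        if filters is None:
            if observed is not None:            # start at the first detection
                filters = [Kalman1D(x) for x in observed]
            continue
        for i, f in enumerate(filters):
            f.predict()
            if observed is not None:            # no motion: prediction only
                f.update(observed[i])
    return tuple(f.predict() for f in filters) if filters else None


if __name__ == "__main__":
    print(track([make_frame(t) for t in range(11)]))
```

Because frame differencing marks both the pixels the object has left and the pixels it has entered, the measured centroid lies between the old and new object positions. This is one reason motion cues are usually combined with other visual cues, as noted above.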