Page 64 - IJEEE-2022-Vol18-ISSUE-1
60 | Hussein & Ali
in order to ensure that the desired information is obtainable.

• Feature extraction: Image features at different resolutions are obtained from the same image data. These features are classified into global features, such as color and shape. More complex features related to the colors and shapes in the image can also be obtained.

A. Segmentation Architecture

Image segmentation is an important stage of digital image processing: it partitions an image into connected, homogeneous regions according to a specific criterion, such as color. The union of these regions should reconstruct the original image. Segmentation allows qualitative information about the image to be extracted because it provides a high-level description: each region is linked to its neighboring regions within a network of nodes in which each node represents a region in the image. Each node carries a label containing qualitative information about the region, such as its size, color, shape, and orientation, and the edges that connect the nodes can be marked with information about the relationship between adjacent regions, for example that one region is contained in another, or lies below or above it. The complexity of this network varies with the segmentation technique used [17].

B. Fully convolutional networks (FCN)

The FCN, a variant of the CNN, was one of the most significant improvements in image segmentation [19]. The FCN differs from a traditional CNN in that the fully connected layers at the end of the CNN are converted into convolution layers. This results in a network that computes a nonlinear filter for each layer's output vectors. As a result, the network can operate on inputs of any size and produce outputs with corresponding spatial dimensions. The classification network can then generate a heat map for the selected object class. Adding layers and a spatial loss to the network results in an efficient machine for end-to-end dense learning [20].

Fig. 2: Convolutional Encoder-Decoder for U-Net Architecture.

IV. PROPOSED METHOD

A. U-Net architecture

The U-Net was developed by Ronneberger et al. [4]. Training this network makes intensive use of the data so that the available images are used efficiently. An aerial imagery data set containing 72 satellite images of Dubai was initially used to test the efficiency of the proposed U-Net approach for semantic segmentation, and the results were compared with those of a CNN. In this setting, effective outcomes were obtained, as the comparison with the CNN demonstrates. The architecture consists of:

1. An encoder that reduces the input image to a feature map.
2. A decoder that uses learned deconvolutional layers to enlarge the feature map to the size of the input image.
3. Shortcut connections, whose creation is the key contribution of the U-Net architecture. In the FCN, downsampling a picture as part of the encoder loses information. The shortcut connections carry encoder features across to the decoder, so the network captures finer information while also keeping the computation low.

Fig. 3: U-Net architecture.

The architecture contains two paths. The first is the encoder (contraction path), which captures the context of the image; it is a traditional stack of convolutional and max pooling layers [21][22]. The second is the decoder (symmetric expanding path), which enables precise localization using transposed convolutions.

B. Loss Function for Segmentation

This subsection examines the effect of the loss function on the segmentation output. Three different loss functions are used in the training procedure. Let p be the output value of each pixel in the image. The studied loss functions are then defined as follows:

• Cross-Entropy Loss

This is a log loss: it takes the logarithm of the output for each pixel in the output tensor, since the inputs are images. The term α is a weight for the different classes and is a means of balancing the loss for unbalanced classes. Equation 1 shows the final weighted cross-entropy loss, where \alpha_c is the weight of class c and \hat{y}_i is the output at pixel i.

$CE = -\alpha_c \log \hat{y}_i$    (1)

• Focal Loss

Focal loss is an effective solution to the problem of unbalanced data sets. It adds another term that reduces the impact of correct predictions and focuses training on incorrect examples. Gamma (γ) is a hyperparameter that determines how strong this reduction is. This loss affects network training on the unbalanced data set and can improve segmentation results.
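
As a rough illustration of the two losses described in this subsection, the following minimal sketch computes them for a single pixel. It is not the authors' implementation. It assumes the commonly used focal loss form, -α (1 - p)^γ log p, which the text describes but does not write out, and it assumes that p_true denotes the predicted probability of the pixel's true class. The function names, the default γ = 2, and the clipping constant eps are hypothetical choices made only for this example.

```python
import math


def weighted_cross_entropy(p_true, alpha=1.0, eps=1e-12):
    """Weighted cross-entropy for one pixel: -alpha * log(p_true).

    p_true: predicted probability of the pixel's true class (assumed meaning).
    alpha:  weight of that class, used to balance unbalanced classes.
    eps:    hypothetical clipping constant that avoids log(0).
    """
    return -alpha * math.log(max(p_true, eps))


def focal_loss(p_true, alpha=1.0, gamma=2.0, eps=1e-12):
    """Focal loss for one pixel: -alpha * (1 - p_true)**gamma * log(p_true).

    The factor (1 - p_true)**gamma shrinks the loss of pixels that are
    already predicted well; gamma = 0 gives back the weighted cross-entropy.
    """
    p = max(p_true, eps)
    return -alpha * (1.0 - p) ** gamma * math.log(p)


def mean_loss(loss_fn, probs, **kwargs):
    """Average a per-pixel loss over a flat list of true-class probabilities."""
    return sum(loss_fn(p, **kwargs) for p in probs) / len(probs)


if __name__ == "__main__":
    easy, hard = 0.9, 0.1  # a well-classified and a badly classified pixel
    for name, fn in (("cross-entropy", weighted_cross_entropy),
                     ("focal", focal_loss)):
        print(name, "easy:", fn(easy), "hard:", fn(hard))
```

With γ = 2, a pixel predicted with probability 0.9 contributes about one hundredth of its cross-entropy loss, whereas a pixel predicted with probability 0.1 keeps about 81% of it. This is the sense in which the loss reduces the impact of correct predictions and concentrates on incorrect ones.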