Page 64 - IJEEE-2022-Vol18-ISSUE-1

60 |                                                                                                                     Hussein & Ali

    in order to ensure that the desired information will be obtainable.
• Feature extraction: Image features at different resolutions are obtained from the same image data. These features are classified into:
    Global features, such as color and shape. More complex features related to the colors and shapes in the image can also be obtained.

  A. Segmentation Architecture
   Image segmentation is an important stage of digital image processing. It is the process of partitioning an image into connected, homogeneous regions according to a specific criterion, for example color. The union of these regions should reconstruct the original image. Segmentation allows qualitative information about the image to be extracted because it provides a high-level description: each region is linked to its neighboring regions within a network of nodes, in which each node represents a region of the image. Each node carries a label containing qualitative information about its region, such as size, color, shape, and orientation. The arcs that connect the nodes can be marked with information about the relationship between adjacent regions, for example that one region is contained in another, or lies below or above it. The level of complexity of the network configuration varies depending on the segmentation technique used [17].

  B. Fully convolutional networks (FCN)
   The FCN, a variant of the CNN, was one of the most significant improvements in image segmentation [19]. The FCN differs from a traditional CNN in that the fully connected layers at the end of the CNN are converted into convolutional layers. This results in a network that computes a nonlinear filter over each layer's output vectors. As a result, the complete network can operate on inputs of any size and produce outputs with corresponding spatial dimensions. The classification network can now generate a heat map of the selected object class. Adding layers and a spatial loss to the network results in an efficient machine for end-to-end dense learning [20].

  Fig. 2: Convolutional Encoder-Decoder for U-Net Architecture.

                     IV. PROPOSED METHOD

  A. U-Net architecture
   The U-Net was developed by Ronneberger et al. [4]. Training this network relies on intensive use of the data in order to exploit the available images efficiently. Aerial imagery comprising 72 satellite images of Dubai was initially used to test the efficiency of the proposed U-Net approach for semantic segmentation, and the results were compared to a CNN. Effective results were obtained in this setting, as the comparison with the CNN shows. The architecture consists of:
1. A feature map encoder that reduces the size of the input image.
2. A decoder that uses learned deconvolutional layers to enlarge the feature map to the size of the input image.
3. Shortcut connections, which are the key contribution of the U-Net architecture. In the FCN, when a picture is downsampled as part of the encoder, information is lost. The shortcut connections address this, so the network captures finer information while keeping the computation low.

  Fig. 3: U-Net architecture.

   The architecture contains two paths. The first is the encoder (contraction path), which is used to capture the context of the image. The encoder is just a traditional stack of convolutional and max pooling layers [21][22]. The second is the decoder (symmetric expanding path), which is used to enable precise localization using transposed convolutions.

  B. Loss Function for Segmentation
   This section examines the effect of the loss function on the segmentation output. Three different loss functions are used in the training procedure. Let p denote the output value of each pixel in the image. The loss functions studied are then defined as follows:

• Cross-Entropy Loss
This is the log loss: it computes the logarithm of the output for each pixel in the output tensor (since we are dealing with images). The term alpha (\alpha_c in Equation 1) is a weight assigned to each class c and is a means of balancing the loss for unbalanced classes. Equation 1 gives the final weighted cross-entropy loss.

\[ CE = -\alpha_c \log \hat{y}_i \qquad (1) \]

• Focal Loss
Focal loss is well suited to the problem of unbalanced data sets. It adds another term that reduces the impact of correct predictions and focuses training on incorrectly predicted examples. Gamma is a hyperparameter that determines how strong this reduction is. This loss affects network training on the unbalanced data set and can improve segmentation results.
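
The two losses described above can be illustrated with a minimal per-pixel sketch in plain Python. It is not the authors' training code. The weighted cross-entropy follows Equation 1. The focal loss uses the commonly cited form -alpha * (1 - p)^gamma * log(p), which is an assumption, since the text here does not state the formula. The function names, the default gamma = 2.0, and the clamping constant eps are hypothetical choices made for illustration.

```python
import math


def weighted_cross_entropy(p_true, alpha=1.0, eps=1e-7):
    """Per-pixel weighted cross-entropy, -alpha * log(p_true), as in Eq. (1).

    p_true: predicted probability of the pixel's true class.
    alpha:  weight of that class (larger for rare classes).
    """
    p = min(max(p_true, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -alpha * math.log(p)


def focal_loss(p_true, alpha=1.0, gamma=2.0, eps=1e-7):
    """Per-pixel focal loss in its commonly used form (assumed here):
    -alpha * (1 - p_true)**gamma * log(p_true).

    The factor (1 - p_true)**gamma shrinks the loss of pixels that are
    already predicted well; gamma = 0 recovers the weighted cross-entropy.
    """
    p = min(max(p_true, eps), 1.0 - eps)
    return -alpha * (1.0 - p) ** gamma * math.log(p)


def mean_loss(loss_fn, probs, **kwargs):
    """Average a per-pixel loss over an iterable of true-class probabilities."""
    values = [loss_fn(p, **kwargs) for p in probs]
    return sum(values) / len(values)


if __name__ == "__main__":
    # One confidently correct pixel and one poorly predicted pixel.
    for name, p in (("easy", 0.95), ("hard", 0.30)):
        ce = weighted_cross_entropy(p)
        fl = focal_loss(p)
        print(f"{name}: cross-entropy={ce:.4f} focal={fl:.4f}")
```

In this sketch the well-predicted pixel has its loss scaled by (1 - 0.95)^2, while the poorly predicted pixel is scaled by (1 - 0.30)^2, so training is dominated by the harder examples. This is the reduction that gamma controls.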