
problem as a NER token classification problem, Habiby formulates it as a Q&A problem, which allows him to use a Q&A model. The Transformer model Habiby chose to fine-tune is RoBERTa, a BERT-inspired model from Facebook. Habiby used a maximum sequence length of 448 tokens and a stride of 192 and trained his model for 3 epochs. His overall F-1 score is 0.453.
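To make the maximum length and stride settings concrete, the following is a minimal sketch of sliding-window tokenization with the Hugging Face tokenizers API; the checkpoint name and the placeholder text are illustrative assumptions, not Habiby's actual code.

```python
# Minimal sketch of sliding-window tokenization with a maximum length of 448
# tokens and a stride of 192, as described above. The checkpoint and the text
# are placeholders, not Habiby's actual pipeline.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")

essay_text = "..."  # a full student essay would go here

encoding = tokenizer(
    essay_text,
    max_length=448,                  # each window holds at most 448 tokens
    stride=192,                      # consecutive windows overlap by 192 tokens
    truncation=True,
    return_overflowing_tokens=True,  # emit every window, not just the first
    padding="max_length",
)

# Each entry in encoding["input_ids"] is one overlapping window of the essay.
print(len(encoding["input_ids"]), "windows")
```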
Roman et al. [12] used several machine learning techniques in their approach to the problem of classifying discourse elements. The first technique they used was weighted box fusion, which combines the outputs of 10 different models into a single decision. Most of the models used are variants of the DeBERTa model and the Longformer model. After obtaining the model results, the team applied post-processing, such as fixing span predictions and applying discourse-type-specific rules, to clean up the models' output. Their overall F-1 score is 0.74, and the models were trained for 5 epochs on Nvidia V100 32 GB and A100 40 GB GPUs.
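Weighted box fusion itself is more involved, but the core idea of merging several models' predictions into one decision can be illustrated with a simple weighted average of class probabilities; the weights and arrays below are made-up placeholders, not Roman et al.'s actual fusion code.

```python
# Simplified illustration of combining several models' outputs into one
# decision: plain weighted probability averaging, not the full weighted box
# fusion procedure used by Roman et al. All numbers are made up.
import numpy as np

# Per-token class probabilities from three hypothetical models,
# each of shape (num_tokens, num_classes).
model_outputs = [
    np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]),
    np.array([[0.6, 0.3, 0.1], [0.2, 0.6, 0.2]]),
    np.array([[0.8, 0.1, 0.1], [0.3, 0.5, 0.2]]),
]
weights = np.array([0.5, 0.3, 0.2])  # hypothetical per-model weights

# Weighted average of the probability distributions, then an argmax decision.
fused = sum(w * p for w, p in zip(weights, model_outputs))
predictions = fused.argmax(axis=-1)
print(predictions)  # one class index per token
```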
In this project, machine learning researcher Ali Habiby [13] used a random forest model instead of his previous Q&A model to solve the discourse element classification problem. One advantage of this model is that it is easy to understand and replicate. The train/test split Habiby chose for this model is 70% train and 30% test, and the model has an overall F-1 value of 0.25. While this model is easy to replicate and understand, I think it is too simplistic, given its low F-1 value, to capture how the different discourse elements relate to each other.
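For reference, a random forest baseline of this kind can be set up in a few lines; the TF-IDF features, toy examples, and hyperparameters below are assumptions for illustration, not Habiby's actual configuration.

```python
# Minimal sketch of a random forest baseline with a 70/30 train/test split.
# The TF-IDF features, toy examples, and hyperparameters are assumptions for
# illustration, not Habiby's actual configuration.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

texts = [
    "In my opinion, school should start later in the day.",
    "Some students disagree with this position.",
    "For example, later start times improve attendance.",
    "In conclusion, schools should adopt a later schedule.",
]
labels = ["Position", "Counterclaim", "Evidence", "Concluding Statement"]

X = TfidfVectorizer().fit_transform(texts)
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.3, random_state=0
)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print(f1_score(y_test, clf.predict(X_test), average="macro"))
```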
Lonnie [14] uses the Keras library to create an LSTM network that can classify discourse elements in student papers. One notable layer included in the Lonnie model is a padding layer of length 1024. This is important because most other solutions are fine-tuned versions of the BERT model, and the BERT model can only hold 512 tokens at a time. Lonnie's model is therefore better able to accommodate larger student papers than most other solutions, but Lonnie still trains on one sequence of data at a time, which I think prevents his model from reaching its full potential. Overall, the F-1 value of the Lonnie model is 0.214.
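A minimal Keras sketch of an LSTM classifier whose inputs are padded to length 1024 looks like the following; the vocabulary size, layer widths, and toy sequences are placeholders rather than Lonnie's exact architecture.

```python
# Minimal sketch of a Keras LSTM classifier with inputs padded to length 1024.
# Vocabulary size, layer widths, and the toy sequences are placeholders,
# not Lonnie's exact architecture.
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.sequence import pad_sequences

NUM_CLASSES = 7      # one class per discourse element type
VOCAB_SIZE = 20000   # assumed vocabulary size
MAX_LEN = 1024       # every sequence is padded/truncated to 1024 tokens

model = models.Sequential([
    layers.Embedding(VOCAB_SIZE, 128),
    layers.LSTM(64),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Toy integer-encoded sequences of different lengths, padded out to MAX_LEN
# so that essays longer than BERT's 512-token limit still fit in one input.
sequences = [[4, 17, 9], [12, 3, 3, 8, 1]]
padded = pad_sequences(sequences, maxlen=MAX_LEN, padding="post")
print(padded.shape)  # (2, 1024)
```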
Drakuttala [15], a machine learning researcher, fine-tuned the RoBERTa base model to address the discourse element classification problem. One thing that stands out about Drakuttala's method is that he clearly defined each element during model training. Instead of using 7 classes like most other researchers, he used Claim, Position, Lead, and Counterclaim. Drakuttala also organized the data into two parts, B and I: class B marks the word that begins an entity, while class I, as its name implies, is for words inside an entity. Rather than a single Lead class, for example, he created two Lead classes, B-Lead and I-Lead. Drakuttala achieved a 0.54 F-1 score during training over 3 epochs with a 1e-5 learning rate and a 512 token length.
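To make this B-/I- scheme concrete, here is a minimal sketch that assigns B- and I- tags to the words of a labelled span; the whitespace tokenization and the example span are simplifications, not Drakuttala's actual labelling code.

```python
# Minimal sketch of the B-/I- labelling idea: the first word of a labelled
# span gets a B- tag and every following word gets an I- tag. Whitespace
# tokenization and the example span are simplifications.
def bio_tags(span_text: str, discourse_type: str) -> list[tuple[str, str]]:
    words = span_text.split()
    tags = [f"B-{discourse_type}"] + [f"I-{discourse_type}"] * (len(words) - 1)
    return list(zip(words, tags))

print(bio_tags("Dear senator, I am writing to you today", "Lead"))
# [('Dear', 'B-Lead'), ('senator,', 'I-Lead'), ('I', 'I-Lead'), ...]
```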

      II. APPROACH (AND TECHNICAL CORRECTNESS)

    1) PERSUADE corpus
      The training and testing data used to fine-tune my models is the PERSUADE corpus, a dataset created by the Learning Agency Lab. I chose this dataset because it is specifically designed for the problem of discourse classification. The corpus contains over 25,000 student papers, all annotated by writing professionals [16]. To ensure that the dataset is as accurate as possible, each article is annotated using a double-blind scoring procedure and reviewed by another, third-party writing professional [17]. The content of this dataset is well suited for training and testing models; however, I believe the format of the dataset can be improved through data preprocessing.

    2) Data preprocessing
      To preprocess the data for this model, I decided to reassemble the individual sentence sequences into a joint article. In the PERSUADE corpus, articles are divided into sequences, each sequence representing a different discourse element. I believe this is not the best format for transformer-based models, because these models use positional encoding. Positional encoding is a technique added to the Transformer architecture because the model has no recurrence, which means that without it the sequences "Hello World" and "World Hello" look the same to the model [18]. By adding positional encoding to the word embeddings, the Transformer model can learn that different word positions in the text carry different meanings, and I believe this property can be exploited for discourse classification. Certain discourse elements, such as closing sentences, are highly correlated with their position in the text; merging the sequences before fine-tuning therefore gives the model a chance to learn how the position of a sequence in the paper relates to its discourse type.
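As a concrete illustration, the reassembly step might look like the following pandas sketch; the file name and the column names (essay_id, discourse_type, discourse_text) are assumptions about the corpus layout rather than the exact released schema.

```python
# Minimal sketch of the preprocessing step described above: regrouping the
# per-element sequences of each essay back into one joint article. The file
# name and column names (essay_id, discourse_type, discourse_text) are
# assumptions about the corpus layout, not guaranteed to match it exactly.
import pandas as pd

rows = pd.read_csv("persuade_corpus.csv")

essays = (
    rows.groupby("essay_id")["discourse_text"]
        .apply(" ".join)              # concatenate the sequences in row order
        .rename("full_text")
        .reset_index()
)

# Keep the per-sequence labels alongside the merged article so the model can
# still be trained to predict a discourse type for each sequence.
labels = rows.groupby("essay_id")["discourse_type"].apply(list).reset_index()
dataset = essays.merge(labels, on="essay_id")
print(dataset.head())
```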
    3) Three different models (BERT, Longformer, and GPT-2)
      The three models chosen for fine-tuning in this paper are BERT, Longformer, and GPT-2. I decided to fine-tune several models because I wanted to see how different model architectures address the problem of discourse classification. I was also interested in whether different models are better at classifying different elements of discourse. I chose the BERT model because it is one of the most popular models for NLP tasks. According to the Hugging Face model hub, the BERT model was downloaded 15.8 million times by researchers in April 2022, making it the second most popular NLP model [19]. I chose to include this model in my own study so that my results could be compared with those of other researchers. Another model that I fine-tune is GPT-2. This model is also popular, but I included it in the project mainly because of its design. Unlike the BERT model, which stacks the encoder layers of the Transformer, the GPT-2 architecture stacks the decoder layers [20]. In this paper, I want to see whether this design difference affects discourse classification results. The last model I fine-tune, and the one I think is the most promising, is the Longformer model. The Longformer model is an extension of the BERT model designed to handle longer inputs without compromising quality [21]. This feature is important for my research because my data preprocessing produces long inputs, and most models lose information from the beginning of the sequence. The Longformer
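For reference, a minimal sketch of loading the three models with the Hugging Face transformers library is shown below; the checkpoint names and the choice of a sequence-classification head with 7 discourse labels are my assumptions about the setup, not the paper's exact training code.

```python
# Minimal sketch of loading BERT, Longformer, and GPT-2 for fine-tuning with
# the Hugging Face transformers library. The checkpoints and the
# sequence-classification head (one discourse label per sequence) are
# assumptions about the setup, not the paper's exact training code.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

NUM_LABELS = 7  # one label per discourse element type
checkpoints = {
    "bert": "bert-base-uncased",
    "longformer": "allenai/longformer-base-4096",  # accepts up to 4096 tokens
    "gpt2": "gpt2",
}

models, tokenizers = {}, {}
for name, ckpt in checkpoints.items():
    tokenizer = AutoTokenizer.from_pretrained(ckpt)
    model = AutoModelForSequenceClassification.from_pretrained(
        ckpt, num_labels=NUM_LABELS
    )
    if tokenizer.pad_token is None:          # GPT-2 ships without a pad token
        tokenizer.pad_token = tokenizer.eos_token
        model.config.pad_token_id = tokenizer.eos_token_id
    tokenizers[name], models[name] = tokenizer, model
```

Fine-tuning each of these checkpoints with the same training loop then isolates the effect of the architecture itself on the comparison.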