response, many agencies have taken steps to improve our current automated feedback systems.

2) The current machine learning approach
Two institutions, GSU and The Learning Agency Lab, have investigated a machine learning-based approach to improving automated feedback systems. They argue that machine learning models can be trained to accurately classify discourse elements in written work. Such a model can then be added to an existing feedback system to help the system provide better and more constructive feedback to students. The Learning Agency Lab took the first step towards creating this model by building the corpus of Persuasive Essays for Rating, Selecting, and Understanding Argumentative and Discourse Elements (PERSUADE). The PERSUADE corpus is a collection of over 25,000 argumentative essays collected from students in grades 6 to 12 [5]. All essays are annotated by professional English teachers to mark the different elements of discourse. With this dataset, machine learning researchers have the ground truth they need to start training models. In particular, they hope to fine-tune existing natural language processing (NLP) models for discourse classification tasks, focusing on Google's Bidirectional Encoder Representations from Transformers (BERT) model. I agree with the current approach of fine-tuning this model for discourse classification; however, I believe additional steps are required to make these models more accurate.
3) The Discourse Elements
The following list of discourse elements was compiled by a team of teachers and professional writers from The Learning Agency Lab [6]. They believed that this list contains all the important discourse elements that make up student writing, and they used it as a template for creating the PERSUADE corpus. I will use the same definitions when improving my own models:
• Lead - an introduction that begins with a statistic, a quotation, a description, or some other device to grab the reader's attention and point toward the thesis
• Position - an opinion or conclusion on the main question
• Claim - a claim that supports the position
• Counterclaim - a claim that refutes another claim or gives an opposing reason to the position
• Rebuttal - a claim that refutes a counterclaim
• Evidence - ideas or examples that support a claim, counterclaim, or rebuttal
• Concluding Statement - a concluding statement that restates the claims
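To make this label set concrete, the sketch below encodes the seven elements for token-level classification. The constant names and the BIO tagging scheme are my own illustration, not part of the PERSUADE release.

```python
# The seven PERSUADE discourse elements listed above.
DISCOURSE_ELEMENTS = [
    "Lead", "Position", "Claim", "Counterclaim",
    "Rebuttal", "Evidence", "Concluding Statement",
]

# For token-level classification, each element is commonly expanded into
# B-/I- tags, plus an "O" tag for tokens outside any discourse element.
LABELS = ["O"] + [f"{prefix}-{element}"
                  for element in DISCOURSE_ELEMENTS
                  for prefix in ("B", "I")]
LABEL2ID = {label: i for i, label in enumerate(LABELS)}
ID2LABEL = {i: label for label, i in LABEL2ID.items()}

print(len(LABELS))  # 15 classes: "O" plus B-/I- tags for the 7 elements
```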
4) My approach & potential outcomes
As with the current machine learning approaches, I believe that transformer-based models such as Bidirectional Encoder Representations from Transformers (BERT) [7] can be fine-tuned to successfully address discourse classification problems. However, in this paper, I also want to examine other transformer models and compare and contrast their results. In addition, further improvements can be made to the Persuasive Essays for Rating, Selecting, and Understanding Argumentative and Discourse Elements (PERSUADE) corpus. The current corpus divides essays into sequences, each sequence corresponding to a different type of discourse element. However, I will restructure the dataset so that the full essay can be provided to the model during training, as sketched below. In this paper, I hope to demonstrate that transformer-based models should be trained on entire essays to take full advantage of their architecture, that it is useful for machine learning researchers to evaluate models other than BERT, and that the Long Document Transformer (Longformer) [8] is the better model when it comes to discourse classification.
5) Outline
To justify my approach, I first turn to other projects focusing on discourse classification. I then describe in more detail my approach to the discourse classification problem and what I have done to fine-tune three transformer-based models: Bidirectional Encoder Representations from Transformers (BERT), the Long Document Transformer (Longformer), and the Generative Pre-trained Transformer 2 (GPT-2). Afterwards, I review some of my promising findings and explain their implications for discourse classification tasks. Finally, I cover some improvements that can be made in future fine-tuning attempts.
6) Related Works
Burstein et al. [7] attempted to use a Bayesian classifier to identify thesis statements in students' written work. Their model achieved an average overall accuracy of 43%, but most importantly, they were able to show that thesis statement classification is generalizable. That is, the model does not need to be retrained for each new essay prompt; once trained, it can recognize thesis statements across essay topics. One drawback, however, is that the training set for the Bayesian classifier was small, only 100 essays, and the authors acknowledge this limitation of their model.
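For readers unfamiliar with the technique, here is a minimal sketch of a Bayesian text classifier in the spirit of Burstein et al., using scikit-learn; the toy sentences and the pipeline choices are my own illustration, not their implementation.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data: 1 marks sentences that read like thesis statements.
# Real work would train on annotated student essays.
sentences = [
    "I believe schools should require uniforms.",
    "In this essay I will argue that homework is essential.",
    "The bus arrived ten minutes late.",
    "She opened the window to let in some air.",
]
is_thesis = [1, 1, 0, 0]

# Bag-of-words features feeding a multinomial naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(sentences, is_thesis)

print(model.predict(["I will argue that automated feedback helps students."]))
```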
Another model worth mentioning is the Longformer model modified by the programmer Darek Kleczek [8]. Kleczek addresses this problem by fine-tuning a pre-existing Longformer model from the Hugging Face website [9], which was originally trained by machine learning engineers at AllenAI. By fine-tuning the Longformer model, Kleczek was able to achieve an accuracy of 61.4%.
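To show what such fine-tuning starts from, the sketch below loads AllenAI's publicly hosted Longformer checkpoint from Hugging Face for token classification. The checkpoint name is the real public one; the label count and the input text are placeholders of mine.

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Public AllenAI checkpoint on Hugging Face; its sliding-window attention
# supports inputs of up to 4,096 tokens.
checkpoint = "allenai/longformer-base-4096"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForTokenClassification.from_pretrained(
    checkpoint,
    num_labels=15,  # placeholder: "O" plus B-/I- tags for 7 elements
)

# Tokenize one whole essay; BERT would have to truncate after 512 tokens.
inputs = tokenizer("An entire student essay goes here...",
                   return_tensors="pt", truncation=True, max_length=4096)
logits = model(**inputs).logits
print(logits.shape)  # (1, sequence_length, 15)
```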
Taboada et al. [9] survey the history of Rhetorical Structure Theory (RST) and its advantages today. They found that RST can be used for a variety of applications (including discourse classification) and that it is a "robust and well-tested theory". Most importantly, they identified relationships between various discourse elements that we hope our model will capture. Rather than formally defining these relationships or building a working machine learning model, the researchers leave this as an open problem for others to solve.
The machine learning researcher Julian Peller [10] addresses the problem of classifying discourse elements by fine-tuning Google's BERT model. He approaches the task as a token classification problem in the style of Named Entity Recognition (NER), where essays are lists of tokens and discourse elements are the distinct classes. He trained on 10,000 essays from the PERSUADE corpus and achieved an overall F1 score of 0.226.
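A minimal sketch of this NER-style framing appears below, together with the span-level F1 metric commonly used to report such scores. The label count and the use of seqeval are my assumptions, not Peller's actual code.

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
from seqeval.metrics import f1_score

checkpoint = "bert-base-uncased"  # public Google BERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForTokenClassification.from_pretrained(
    checkpoint, num_labels=15)  # placeholder label count

# Each token receives one discourse-element class, exactly as in NER.
inputs = tokenizer("Schools should adopt automated feedback.",
                   return_tensors="pt")
predicted_ids = model(**inputs).logits.argmax(dim=-1)
print(predicted_ids.shape)  # one class id per token

# seqeval scores predictions at the span level, the usual basis for
# reported F1 numbers on this task.
true = [["B-Position", "I-Position", "I-Position", "O"]]
pred = [["B-Position", "I-Position", "O", "O"]]
print(f1_score(true, pred))
```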
Ali Habiby [11] tackled the problem of classifying discourse elements in a rather unique way. Instead of defining the