Received: 22 August 2022 Revised: 07 November 2022 Accepted: 12 November 2022
DOI: 10.37917/ijeee.19.1.11
Vol. 19| Issue 1| June 2023
Open Access
Iraqi Journal for Electrical and Electronic Engineering
Original Article
Identifying Discourse Elements in Writing by
Longformer for NER Token Classification
Alia Salih Alkabool 1, Sukaina Abdul Hussain Abdullah2, Sadiq Mahdi Zadeh2, Hani Mahfooz2
1 University of Basrah, Basrah, Iraq
2 Islamic Azad University, Isfahan, Iran
Correspondence
*Alia Salih Alkabool
University of Basrah, Basrah, Iraq
Email: aliasalihjali@gmail.com
Abstract
Current automatic writing feedback systems cannot distinguish between the different discourse elements in students' writing.
Without this ability, the guidance these systems provide is too general to help students achieve what they set out to do.
This is cause for concern because automated writing feedback systems are a valuable tool for combating the decline in student
writing. According to the National Assessment of Educational Progress, less than 30 percent of high school graduates are
proficient writers. If we can improve automatic writing feedback systems, we can improve the quality of student writing and
stop the decline of skilled writers among students. Solutions to this problem have been proposed, the most popular being to
fine-tune Bidirectional Encoder Representations from Transformers (BERT) models to recognize the various discourse elements
in students' written assignments. However, these methods have drawbacks. For example, they do not compare the strengths and
weaknesses of different models, and they train models on individual sequences (sentences) rather than on entire essays. In
this article, I redesign the Persuasive Essays for Rating, Selecting, and Understanding Argumentative and Discourse Elements
corpus so that models can be trained on entire essays, and I fine-tune BERT, Long Document Transformer (Longformer), and
Generative Pre-Trained Transformer 2 (GPT2) models for discourse classification, framed as a named entity recognition (NER)
token classification problem. Overall, the BERT model trained with my sequence-merging preprocessing method outperforms the
standard model by 17% and 41% in overall accuracy. I also found that the Longformer model performed best in discourse
classification, with an overall F1 score of 54%. However, an increase in validation loss from 0.54 to 0.79 indicates that the
model is overfitting. Given this overfitting, improvements can still be made, such as implementing early stopping and
including more examples of rare discourse elements during training.
KEYWORDS: BERT - Bidirectional Encoder Representations from Transformers, NER - Named Entity Recognition,
Longformer - Long Document Transformer, GPT2 - Generative Pre-Trained Transformer 2, NLP - Natural Language
Processing, GSU - Georgia State University
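To make the NER token classification framing in the abstract concrete, the following is a minimal sketch of how annotated discourse element spans could be turned into one label per word. It is an illustration only, not the authors' preprocessing: the BIO tagging scheme, the span format (start word index, exclusive end word index, label), the function name spans_to_bio, the example sentence, and the label names "Position" and "Evidence" are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical span format: (start_word_index, end_word_index_exclusive, label).
Span = Tuple[int, int, str]


def spans_to_bio(num_words: int, spans: List[Span]) -> List[str]:
    """Convert word-level discourse spans into one BIO tag per word.

    Assumes the BIO scheme: "B-X" marks the first word of an element of
    type X, "I-X" marks its remaining words, and "O" marks words that
    belong to no discourse element.
    """
    tags = ["O"] * num_words
    for start, end, label in spans:
        if not 0 <= start < end <= num_words:
            raise ValueError(f"span out of range: {(start, end, label)}")
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


# Illustrative essay fragment, split on whitespace into 10 words.
essay = "Schools should start later . Teens need more sleep .".split()
spans = [(0, 5, "Position"), (5, 10, "Evidence")]
tags = spans_to_bio(len(essay), spans)

for word, tag in zip(essay, tags):
    print(f"{word:10s} {tag}")
```

With labels in this per-word form, a whole essay can be fed to a token classification model as a single sequence, which is the setting the abstract contrasts with sentence-level training.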
I. INTRODUCTION
1) The importance of writing
Having the ability to write clearly and concisely is a key skill for all careers. Individuals who are able to express their thoughts and ideas have an advantage when writing business emails, proposals, or opposing or supporting new policies. The Source Expert website notes in its article 43 Why Writing Matters to Students: "There are a variety of ways to communicate with others, but writing will always be part of your daily life." [1]. Although writing is an important human skill, many students lack writing skills. The National Assessment of Educational Progress found that less than 30% of high school graduates are proficient writers. It also showed that this problem is more acute in low-income communities, where proficient writing rates are less than 15% [2]. As researchers at Georgia State University have pointed out, this problem is primarily due to many schools, especially those in low-income communities, not having the resources to provide personalized feedback on students' writing [3].
Fortunately, this lack of personalized feedback can be addressed with automatic writing feedback. Automatic writing feedback systems are programs that can analyze and critique students' writing without a teacher present. These programs are already popular in many applications, such as Microsoft Outlook's Autosuggest and Grammarly. In fact, Trey from the website “apoven” looks at how a writing feedback system like Grammarly can be used to expand one's vocabulary and provide instant mini grammar lessons [4]. In
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and
reproduction in any medium, provided the original work is properly cited.
© 2023 The Authors. Published by Iraqi Journal for Electrical and Electronic Engineering by College of Engineering, University of Basrah.
https://doi.org/10.37917/ijeee.19.1.11 https://www.ijeee.edu.iq