Sep 18, 2025●MIT License


bert
deep learning
fasttext
NLP
transfer learning
transformers


Bridging the Stylistic Gap: An Analysis and Model for Humanizing AI-Generated Text

Anusara Sugeeshwara
Department of Computer & Data Science
NSBM Green University, Homagama, Sri Lanka
kasrsugeeshwara@gmail.com

Abstract

The formal and patterned nature of AI-generated text often renders it detectable and unsuitable for contexts requiring a natural, human voice. This research presents the development and evaluation of a novel AI-to-Human Text Converter, a system designed to address this limitation by transforming AI-generated content into a more natural and human-like format.

The work is grounded in a comprehensive stylistic analysis of a self-curated parallel corpus, comparing original human-written sentences to their AI-paraphrased counterparts. Employing dependency parsing, stylometry, and document clustering, the analysis first quantifies the key differences between the two text types. Findings reveal that AI rewriting introduces significant syntactic formalization, reduces punctuation variety, and, most critically, alters the contextual meaning in over 70% of sentences, as evidenced by their migration to different semantic clusters.

Informed by these insights, the converter was implemented using a modular pipeline. It leverages a combination of transformer-based models (RoBERTa for context-aware candidate generation and SentenceBERT for semantic similarity scoring) to identify and replace AI-characteristic phrases with human-preferred alternatives while preserving the original meaning.

The model's efficacy is evaluated through AI detection algorithms and sentiment analysis. Results indicate that the converter successfully reduces AI detection scores by an average of 60-70% across evaluated samples, as measured by GPTZero, and minimizes the divergence in emotional tone between original and converted text. This study demonstrates a viable approach to mitigating the stylistic artifacts of AI-generated content, enhancing its utility for publication and broader application. The project highlights the importance of stylistic analysis in informing NLP model design and underscores the potential for humanizing AI output to foster more seamless human-AI collaboration.

Keywords

AI Text Humanization
Style Transfer
Natural Language Generation
Computational Stylistics
Semantic Preservation
AI Detection Mitigation

II. Introduction

The rapid advancement of Large Language Models (LLMs) has democratized content creation [6], enabling the generation of coherent and contextually relevant text on an unprecedented scale. This capability offers immense potential to support experts across fields from academia to industry who possess valuable knowledge but may lack the time or refined writing skills to effectively publish their insights. However, a significant barrier impedes the seamless adoption of AI-generated content: its often formal, patterned, and structurally identifiable nature. This "AI-ness" not only hinders readability and audience engagement but also makes the text easily detectable by AI classifiers, potentially leading to issues of authenticity and trust in many professional and academic contexts.

The core problem, therefore, transcends mere text generation; it lies in the stylistic gap between machine-generated output and human-authored prose. Human writing is characterized by its syntactic diversity, nuanced emotional tone, and idiosyncratic stylistic choices. In contrast, AI-generated text frequently exhibits lexical repetition, predictable sentence structures, and a formality that lacks the fluidity and subtlety of human communication [2]. This divergence is not merely aesthetic; preliminary analysis conducted for this study revealed that automated rewriting of human text can alter named entities in over 36% of sentences and, more critically, shift the fundamental contextual meaning in over 70% of sentences, as measured by semantic document clustering. This demonstrates that standard AI paraphrasing can unintentionally distort the original message it seeks to convey.

While substantial research effort has been invested in developing robust AI text detectors like GPTZero [3], the inverse problem of developing models to humanize AI text has received comparatively less attention. Current approaches often focus on simple paraphrasing or broad-stroke style transfer, without a foundational analysis of the specific linguistic features that make text appear "AI-generated" in the first place [4]. A targeted solution is required: one that is informed by a granular understanding of this stylistic gap and capable of performing precise, context-aware transformations to enhance authenticity.

To address this challenge, this paper presents a comprehensive methodology for the analysis and humanization of AI-generated text. Our work makes a twofold contribution:

A Diagnostic Analysis: We first quantify the stylistic and semantic differences between human and AI text through a multi-faceted analysis employing dependency parsing, stylometry, and clustering techniques on a curated parallel dataset.
A Novel Converter Model: Informed by these diagnostics, we then design, implement, and evaluate a dedicated AI-to-Human converter. This model leverages a pipeline of transformer-based networks (RoBERTa and SentenceBERT) to identify AI-characteristic patterns and replace them with human-preferred alternatives while preserving the original meaning [8].

Figure 1. Analysis for AI-written and human-written text

III. Methodology

This research was conducted in two sequential phases: (1) a diagnostic analysis to quantitatively identify the stylistic and semantic characteristics distinguishing AI-generated text from human-written text, and (2) the design and implementation of an AI-to-Human converter informed by the findings of the first phase. This approach ensured that the converter's design was directly targeted at addressing the most salient and impactful differences identified.

A. Diagnostic Analysis: Maintaining the Integrity of the Specifications

The goal of this phase was to move beyond subjective impressions and pinpoint the exact stylistic and semantic differences between human and AI text using a data-driven approach.

1) Dataset Curation

A parallel corpus was essential for a controlled comparison. The curation process involved:

Figure 2. Data Set Curation Architecture

Source Collection: Articles published before 2019 were scraped from online platforms. This date threshold was critical to guarantee the content was authored by humans prior to the widespread use of modern LLMs.
Filtering and Processing: Articles were cleaned of HTML markup, ads, and other irrelevant content. The final corpus consisted of 82 verified human-written articles.
AI Rewriting: Each article was split into its constituent sentences. These individual sentences were then fed into the Ollama framework running the Mistral 7B model using a standard paraphrase prompt (e.g., "Rewrite the following sentence:"). This process generated a parallel AI sentence for each original human sentence, resulting in a final dataset of 2,548 aligned sentence pairs.
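The curation steps above can be sketched roughly as follows. This is a minimal illustration and not the author's script: the regex sentence splitter, the `mistral` model tag, the default local Ollama endpoint, and the helper names (`split_sentences`, `ollama_rewrite`, `build_pairs`) are all assumptions made for the example.

```python
import json
import re
import urllib.request
from typing import Callable, List, Tuple

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed: default local Ollama endpoint
PROMPT = "Rewrite the following sentence:"          # example prompt quoted in the text


def split_sentences(article: str) -> List[str]:
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", article.strip())
    return [p for p in parts if p]


def ollama_rewrite(sentence: str, model: str = "mistral") -> str:
    """Send one sentence to a locally running Ollama server and return its reply."""
    payload = json.dumps(
        {"model": model, "prompt": f"{PROMPT}\n{sentence}", "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"].strip()


def build_pairs(
    article: str, rewrite: Callable[[str], str] = ollama_rewrite
) -> List[Tuple[str, str]]:
    """Return aligned (human sentence, AI rewrite) pairs for one article."""
    return [(sentence, rewrite(sentence)) for sentence in split_sentences(article)]
```

Passing the rewriting function as a parameter keeps the pairing logic testable without a running model; any callable that maps a sentence to a paraphrase can stand in for `ollama_rewrite`.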

Dependency Parsing

Using spaCy's en_core_web_sm model, the syntactic structure of sentences from both corpora was parsed into directed graphs. This analysis revealed that AI-generated sentences consistently produced simpler, more hierarchical, and predictable dependency trees, lacking the complex and nuanced grammatical relationships often found in human writing [2].
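One way to put a number on how "simple" a dependency tree is would be its depth. The sketch below is an illustrative assumption, since the text does not name the tree metric that was used; `tree_depth` and `parse_heads` are hypothetical helper names. It relies only on the spaCy convention that the root token is its own head.

```python
from typing import List


def tree_depth(heads: List[int]) -> int:
    """Depth of a dependency tree given head indices.

    heads[i] is the index of token i's head; the root points to itself,
    which is the convention spaCy uses (token.head is the token itself
    for the root). The input is assumed to be a valid tree (no cycles).
    """
    def depth(i: int) -> int:
        steps = 1
        while heads[i] != i:  # walk up until the root is reached
            i = heads[i]
            steps += 1
        return steps

    return max((depth(i) for i in range(len(heads))), default=0)


def parse_heads(sentence: str) -> List[int]:
    """Parse one sentence with en_core_web_sm and return its head indices."""
    import spacy  # imported lazily so tree_depth works without spaCy installed

    nlp = spacy.load("en_core_web_sm")
    return [token.head.i for token in nlp(sentence)]
```

Comparing the mean of `tree_depth(parse_heads(s))` over the human sentences with the same mean over their AI rewrites would give one coarse indicator of structural simplification.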

Stylometric Analysis

Key quantitative features were extracted for each sentence:

Sentence Length: Number of words per sentence.
Average Word Length: Average number of characters per word in a sentence.
Lexical Diversity: Ratio of unique words to total words in a sentence.
Punctuation Density: Count of punctuation marks per sentence.

The divergence for each feature was calculated as the absolute difference between its mean values in the human and AI corpora, normalized by the pooled standard deviation to make features comparable. This analysis confirmed statistically significant differences; in particular, AI text had lower punctuation density and less variation in sentence length.
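The four features and the normalized divergence can be sketched as below. Whitespace tokenization, the use of `string.punctuation` as the punctuation set, and the standard two-sample pooled standard deviation are assumptions, as the text does not fix these details.

```python
import math
import string
from statistics import mean, stdev
from typing import Dict, List


def features(sentence: str) -> Dict[str, float]:
    """Extract the four stylometric features for one sentence."""
    words = sentence.split()  # assumed: whitespace tokenization
    stripped = [w.strip(string.punctuation).lower() for w in words]
    stripped = [w for w in stripped if w]
    return {
        "sentence_length": float(len(words)),
        "avg_word_length": mean(len(w) for w in stripped) if stripped else 0.0,
        "lexical_diversity": len(set(stripped)) / len(stripped) if stripped else 0.0,
        "punctuation_density": float(sum(ch in string.punctuation for ch in sentence)),
    }


def divergence(human: List[float], ai: List[float]) -> float:
    """|mean(human) - mean(ai)| divided by the pooled standard deviation.

    Each list needs at least two values, otherwise stdev() raises an error.
    """
    n_h, n_a = len(human), len(ai)
    pooled_var = (
        (n_h - 1) * stdev(human) ** 2 + (n_a - 1) * stdev(ai) ** 2
    ) / (n_h + n_a - 2)
    gap = abs(mean(human) - mean(ai))
    if pooled_var == 0:
        return 0.0 if gap == 0 else math.inf
    return gap / math.sqrt(pooled_var)
```

Because the result is expressed in standard-deviation units, a feature measured in words and one measured in characters can be ranked on the same scale.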

Document Clustering for Semantic Analysis

To assess whether AI rewriting altered the core meaning of sentences, a semantic clustering experiment was designed.

Embedding Generation: Sentence embeddings were created using a Word2Vec model trained on the corpus, representing each sentence as a fixed-length vector in a semantic space.
Clustering: The K-Means algorithm (K=6, matching the number of source articles) was applied to cluster these embeddings based on semantic similarity.
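The clustering experiment can be sketched as below. The abstract treats a sentence as semantically shifted when it migrates to a different cluster, which is what `migration_rate` counts. Averaging word vectors into a sentence vector, clustering human and AI sentences in one shared space, and the deterministic seeding are assumptions. In practice a library implementation such as scikit-learn's `KMeans` would replace the small pure-Python version shown here.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def sentence_vector(words: Sequence[str], word_vectors: Dict[str, Vector], dim: int) -> Vector:
    """Average the vectors of known words (assumed pooling; the text does not specify it)."""
    known = [word_vectors[w] for w in words if w in word_vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[d] for v in known) / len(known) for d in range(dim)]


def _sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point: Vector, centroids: List[Vector]) -> int:
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda c: _sq_dist(point, centroids[c]))


def kmeans(points: List[Vector], k: int, iterations: int = 20) -> List[Vector]:
    """Plain Lloyd's algorithm, seeded with the first k points for determinism."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for c, members in enumerate(groups):
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def migration_rate(human: List[Vector], ai: List[Vector], k: int) -> float:
    """Share of pairs whose AI sentence lands in a different cluster than its original."""
    centroids = kmeans(human + ai, k)
    moved = sum(nearest(h, centroids) != nearest(a, centroids) for h, a in zip(human, ai))
    return moved / len(human)
```

With the paper's setting, `k` would be 6 and the inputs would be the embeddings of the aligned human and AI sentences.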

B. AI-to-Human Converter Architecture

Informed by the diagnostic results from Phase 1, the AI-to-Human converter was designed as a targeted solution to address the key issues of formality, stylistic rigidity, and semantic drift. The system operates through a structured pipeline where each module addresses a specific weakness identified in AI-generated text.

1) System Overview and Design Principles

The converter is built on two core design principles derived from the Phase 1 analysis:

Targeted Lexical-Syntactic Replacement: To counter AI formality, the system must identify and replace overly formal, repetitive, or "robotic" words and phrases with more natural, human-preferred alternatives.
Context-Aware Semantic Preservation: To mitigate the high semantic drift observed, any replacement must be rigorously evaluated to ensure it preserves the original sentence's meaning and context.

The architecture, depicted in Figure 3, implements these principles through a sequential process.

Figure 3. Architecture for AI text to Human text converter

2) Text Preprocessing

The input text first passes through three preprocessing steps:

Tokenization: Segmenting the text into individual words and punctuation.
Part-of-Speech (POS) Tagging: Labeling each token with its grammatical role (e.g., noun, verb, adjective).
Dependency Parsing: Identifying the grammatical relationships between tokens.

This structured representation of the text allows the subsequent module to select which words to target for replacement.
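A minimal sketch of this preprocessing stage, assuming spaCy with the en_core_web_sm model (the parser named in the diagnostic phase). The `TokenInfo` record, the `replaceable` filter, and the choice of content-word classes are hypothetical illustrations of how the parsed structure could guide target selection.

```python
from dataclasses import dataclass
from typing import List

CONTENT_POS = {"NOUN", "VERB", "ADJ", "ADV"}  # assumed set of replaceable word classes


@dataclass
class TokenInfo:
    index: int
    text: str
    pos: str   # coarse Universal POS tag, e.g. "VERB"
    dep: str   # dependency label, e.g. "nsubj"
    head: int  # index of the syntactic head


def preprocess(text: str) -> List[TokenInfo]:
    """Tokenize, tag and parse with spaCy (requires spaCy and en_core_web_sm)."""
    import spacy  # lazy import so the filter below works without spaCy installed

    nlp = spacy.load("en_core_web_sm")
    return [TokenInfo(t.i, t.text, t.pos_, t.dep_, t.head.i) for t in nlp(text)]


def replaceable(tokens: List[TokenInfo]) -> List[TokenInfo]:
    """Keep content words; skip function words, punctuation and proper nouns."""
    return [t for t in tokens if t.pos in CONTENT_POS]
```

Restricting replacement to content words is one plausible policy: it leaves named entities and grammatical glue untouched, which matters given the entity changes observed in the diagnostic phase.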

Candidate Generation

This is the core replacement mechanism. For each target word or phrase identified by the preprocessing module, a set of candidate replacements is generated.

Target Identification: The system prioritizes words that are strong indicators of AI-style writing, such as overly formal vocabulary (e.g., "utilize" → "use"), repetitive sentence starters, or weak modifiers.
Candidate Suggestion: Two primary methods are employed to generate human-like alternatives:
- Contextual Prediction via RoBERTa: The target word is masked, and the RoBERTa masked language model (MLM) is used to predict the most likely fillers based on the surrounding context. This provides contextually appropriate synonyms.
- Semantic Retrieval via FastText: Pre-trained FastText word embeddings are used to retrieve the top-N most semantically similar words to the target, expanding the list of potential candidates.
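The two suggestion sources could be merged along the following lines. This is a hedged sketch: the `roberta-base` checkpoint, the `fasttext-wiki-news-subwords-300` vectors from gensim's downloader, and the function names are assumptions, since the text does not state which checkpoints were used.

```python
from typing import Callable, List

FillMask = Callable[[str], List[str]]    # masked sentence -> predicted fillers
Neighbours = Callable[[str], List[str]]  # word -> semantically similar words


def generate_candidates(tokens: List[str], target: int,
                        fill_mask: FillMask, neighbours: Neighbours,
                        mask_token: str = "<mask>") -> List[str]:
    """Merge MLM predictions and embedding neighbours, dropping duplicates
    and the original word itself."""
    masked = " ".join(mask_token if i == target else t for i, t in enumerate(tokens))
    seen = {tokens[target].lower()}
    merged: List[str] = []
    for word in fill_mask(masked) + neighbours(tokens[target]):
        key = word.strip().lower()  # RoBERTa fillers often carry a leading space
        if key and key not in seen:
            seen.add(key)
            merged.append(word.strip())
    return merged


def roberta_fill_mask(top_k: int = 5) -> FillMask:
    """Build a filler function from a Hugging Face fill-mask pipeline."""
    from transformers import pipeline

    unmasker = pipeline("fill-mask", model="roberta-base")  # assumed checkpoint
    return lambda masked: [r["token_str"] for r in unmasker(masked, top_k=top_k)]


def fasttext_neighbours(top_n: int = 5) -> Neighbours:
    """Build a neighbour function from pre-trained FastText vectors."""
    import gensim.downloader as api

    vectors = api.load("fasttext-wiki-news-subwords-300")  # assumed vector set
    return lambda word: [w for w, _ in vectors.most_similar(word, topn=top_n)]
```

A call such as `generate_candidates(tokens, i, roberta_fill_mask(), fasttext_neighbours())` would then yield the pool that the scoring stage ranks. Words missing from the FastText vocabulary raise a `KeyError` in this sketch and would need handling in a real pipeline.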

3) Similarity Scoring and Selection

Contextual Embedding: The original sentence is encoded into a high-dimensional vector \( v_{orig} \) using SentenceBERT, which generates robust, context-aware sentence embeddings.
Candidate Evaluation: For each candidate replacement, a new sentence \( s_{candidate} \) is constructed. This new sentence is also encoded into a vector \( v_{candidate} \) using the same SentenceBERT model.
Selection Criterion: The cosine similarity between the original and candidate sentence vectors is computed. The candidate that yields the highest similarity score is selected, as it maximizes the preservation of the original meaning. The cosine similarity is calculated as:

\[
\cos(\theta) = \frac{v_{orig} \cdot v_{candidate}}{\lVert v_{orig} \rVert \, \lVert v_{candidate} \rVert}
\]

This ensures the chosen replacement alters the style without altering the fundamental semantic context, directly addressing the high semantic drift identified in Phase 1.
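The selection step can be sketched as follows. The encoder is passed in as a function so the ranking logic stays independent of any particular model; the `all-MiniLM-L6-v2` checkpoint in `sbert_encoder` is an assumption, as the text names SentenceBERT but no specific checkpoint.

```python
import math
from typing import Callable, List, Sequence, Tuple

Encoder = Callable[[List[str]], Sequence[Sequence[float]]]  # sentences -> vectors


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_replacement(tokens: List[str], target: int,
                       candidates: List[str], encode: Encoder) -> Tuple[str, float]:
    """Return the candidate whose substituted sentence stays closest to the original."""
    if not candidates:
        raise ValueError("at least one candidate is required")
    original = " ".join(tokens)
    variants = [" ".join(tokens[:target] + [c] + tokens[target + 1:]) for c in candidates]
    vectors = encode([original] + variants)  # vectors[0] is v_orig
    scores = [cosine(vectors[0], v) for v in vectors[1:]]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores[best]


def sbert_encoder(model_name: str = "all-MiniLM-L6-v2") -> Encoder:
    """Wrap a sentence-transformers model as an Encoder (assumed checkpoint)."""
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return lambda sentences: model.encode(sentences)
```

Returning the score alongside the word makes it easy to add a minimum-similarity threshold later, so that a replacement is skipped when even the best candidate drifts too far from the original sentence.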

IV. Results and Discussion

A. Evaluation of Conversion Effectiveness with GPTZero

The primary objective of the converter is to reduce the detectable "AI-ness" of text, thereby making it more likely to be classified as human-written [3]. To evaluate this performance quantitatively, the converted outputs were analyzed using GPTZero, a widely used AI text detection tool.

A comprehensive set of results, including five full examples of original human text, their AI-generated conversions, and the corresponding GPTZero scores, is provided in Appendix A. The key findings from this analysis are summarized as follows:

Original Human Text	AI Converted Text	GPTZero Score
Sample 1	Output 1	4%
Sample 2	Output 2	9%
Sample 3	Output 3	13%
Sample 4	Output 4	23%
Sample 5	Output 5	10%

Table 1. Evaluation results for the converter

The GPTZero scores for the converter's output consistently showed a low probability of being AI-generated (ranging from 4% to 23%).
In most cases, GPTZero reported "moderate confidence" that the text was entirely human-written.
This represents a significant reduction in AI detection metrics compared to the raw AI-generated text, demonstrating the converter's effectiveness in masking the structured, formalized expressions typical of AI-generated language.
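For reference, the summary statistics of Table 1 and the kind of relative reduction quoted in the abstract can be computed as below. The raw AI-text scores are not listed in Table 1, so `relative_reduction` is shown only as a hypothetical helper, without inputs.

```python
from statistics import mean
from typing import Dict, List

table_1_scores = [4, 9, 13, 23, 10]  # GPTZero AI-probability (%) per sample, from Table 1


def summarize(scores: List[float]) -> Dict[str, float]:
    """Mean, minimum and maximum of a list of detector scores."""
    return {"mean": mean(scores), "min": min(scores), "max": max(scores)}


def relative_reduction(before: float, after: float) -> float:
    """Percentage drop from a raw AI-text score to the converted-text score."""
    if before <= 0:
        raise ValueError("the raw score must be positive")
    return 100.0 * (before - after) / before


summary = summarize(table_1_scores)  # mean 11.8, min 4, max 23
```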

B. Evaluation of Sentiment Preservation

As identified in the diagnostic phase, a significant difference in emotional tone (sentiment) often exists between human and AI text. A secondary goal of the converter was to minimize this divergence, thereby preserving the intended emotional impact of the original message.

Sentiment analysis was performed on the original human sentences, their AI-generated versions, and the final humanized outputs. The results indicate that the conversion process successfully reduced the sentiment score divergence between the AI output and the original human text.

Figure 4. Sentiment evaluation of the converter results

While the mitigation was not perfect, the trend shows a meaningful correction towards the original human sentiment, moving the AI-generated text closer to a natural, human-like emotional tone. This suggests that the converter not only alters style but also makes positive strides in preserving the nuanced emotional content that is often lost in AI paraphrasing.
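The sentiment comparison can be expressed as two mean absolute gaps: human versus AI, and human versus converted. The sketch below assumes a generic polarity scorer supplied by the caller, since the text does not name the sentiment tool that was used; all function names are hypothetical.

```python
from statistics import mean
from typing import Callable, List, Tuple

Scorer = Callable[[str], float]  # text -> polarity score, e.g. in [-1, 1]


def mean_divergence(references: List[str], others: List[str], score: Scorer) -> float:
    """Mean absolute sentiment gap between aligned sentences."""
    return mean(abs(score(r) - score(o)) for r, o in zip(references, others))


def divergence_change(human: List[str], ai: List[str],
                      converted: List[str], score: Scorer) -> Tuple[float, float]:
    """Return (gap before conversion, gap after conversion).

    A smaller second value means the converter moved the text closer to
    the sentiment of the human original.
    """
    return mean_divergence(human, ai, score), mean_divergence(human, converted, score)
```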

References

[1] E. Tian and A. Cui, "Detecting AI-Generated Writing Using GPTZero," ISCAP Conference, 2024. [Online]. Available: https://iscap.us/proceedings/2024/pdf/6184.pdf

[2] J. Muñoz-Ortiz et al., "Stylometric Analysis and Detection of Large Language Models Text," arXiv preprint arXiv:2505.23276, 2025. [Online]. Available: https://arxiv.org/pdf/2505.23276.pdf

[3] "GPTZero Technology - AI Detection," GPTZero, Oct. 2024. [Online]. Available: https://gptzero.me/technology

[4] S. Benedetto, "Author Guidelines for Artificial Intelligence (AI)-Generated Text," IEEE, Apr. 2024. [Online]. Available: https://open.ieee.org/author-guidelines-for-artificial-intelligence-ai-generated-text/

[5] C. Opara, "StyloAI: Distinguishing AI-Generated Content with Stylometric Analysis," arXiv preprint arXiv:2405.10129, May 2024. [Online]. Available: https://arxiv.org/abs/2405.10129

[6] S. Benedetto, "Civilizing and Humanizing Artificial Intelligence in the Age of Large Language Models," IEEE Computer, vol. 57, no. 5, pp. 32-41, May 2024. [Online]. Available: https://www.computer.org/csdl/magazine/ic/2024/05/10779592/

[7] "Stylometric AI: How It Detects Writing You Didn't Author," aicompetence.org, June 2025. [Online]. Available: https://aicompetence.org/stylometric-ai-how-it-detects-writing-you-didnt-author/

[8] "AI Humanizer," Scribbr, 2025. [Online]. Available: https://www.scribbr.com/ai-humanizer/

Appendix A: Full Converter Input and Output Examples

This appendix provides the complete text samples evaluated in this study. For each, the original human-written text, the AI-generated version, and the humanized output from the converter are presented alongside their GPTZero evaluation scores.

Sample 1

Original Human Text: In the rapidly evolving field of Natural Language Processing (NLP), the need for AI to produce more humanlike content has become increasingly important. While AI-generated content can be efficient and scalable, it often lacks the nuanced emotional depth that human-written content possesses. This project aims to address this gap by conducting a sentiment analysis to differentiate between AI-generated and human-written texts.

AI Converted Text (Output 1): In the rapidly developing field of Natural Language Processing (NLP), the requirement for AI to produce more human-like information has got increasingly crucial. While AI-created content can be efficient and scalable, it usually lack the nuanced emotional depth which human-authored content possesses. This project intends to address this gap by conducting a sentiment analysis to distinguish between AI-generated and man-authored text.
GPTZero Score: 4%

Sample 2

Original Human Text: The "Circle of Life" is a concept that describes the interconnectedness and cyclical nature of life and death in the natural world. Popularized by the song from Disney's "The Lion King," it emphasizes that every living thing has its role and purpose. Birth, growth, reproduction, and death are all part of this cycle, with each stage contributing to the ecosystem's balance. For example, plants grow, animals eat them, and eventually, both return to the earth, nourishing new growth. This continuous cycle ensures the sustainability and harmony of all living organisms.

AI Converted Text (Output 2): The "Circle of Life" is a theory which describes the interconnectedness and cyclical nature of life and death in the natural world. Popularized by the song from Disney's "The Lion King," it underscores that every living thing has its purpose and function. Birth, development, reproduction, and destruction are all part of this cycle, with each phase added to the ecosystem's balance. For example, trees thrive, pests eat them, and inevitably, both fall to the ground, nourishing new growth. This continuous cycle guarantees the sustainability and harmony of all live animals.
GPTZero Score: 9%

Sample 3

Original Human Text: AI content writing holds significant importance for students due to its multifaceted benefits. Firstly, it serves as a valuable educational support tool by generating explanations, summaries, and practice questions, which can enhance students' understanding of complex topics. Additionally, AI can automate tasks such as note-taking and drafting, allowing students to allocate more time to critical thinking and analysis. This not only improves efficiency but also deepens their comprehension of the material. Furthermore, AI has the capability to create personalized learning content tailored to individual learning styles and paces, making education more accessible and effective for a diverse range of students.

AI Converted Text (Output 3): AI content writing takes significant importance for students due to its multifaceted benefits. Firstly, it helps as a valuable educational support tool by giving explanations, summaries, and practice questions, what can enhance pupils' comprehension of complex issues. Additionally, AI can automate jobs such as mark-taking and drafting, allowing students to allocate more hours to critical thoughts and analysis. which not solely improves efficiency but additionally deepens the comprehension of the material. plus, AI has the ability to develop personalized education content tailored to individual education styles and paces, making education more accessible and effective for a diverse range of students.
GPTZero Score: 13%

Sample 4

Original Human Text: There's been some chatter about how things are running lately. It feels like some decisions might not be aligning with what everyone needs. There's a sense that not everyone is getting a fair shot, and it's causing some frustration. It would be great if we could find a way to ensure that everyone is in the right place doing what they do best. That way, things could run more smoothly, and everyone would feel more supported. Just something to think about!

AI Converted Text (Output 4): There's been some chatter about how things are moving lately. It feels like some decisions might not be align with something everyone wants. There's a sense that not anyone is get a fair shot, and it's causing some frustration. It would be good if we could discover a way to see that everybody is in the good place, doing which you do best. That way, things could run more smoothly, and anyone would feeling more supported. Just some to remember about!
GPTZero Score: 23%

Sample 5

Original Human Text: An unorganized system can significantly exacerbate the risks associated with handling very dangerous substances. One of the primary issues is the lack of protocols and procedures. In an unorganized system, there may be no clear guidelines or standard operating procedures (SOPs) for handling hazardous materials. This lack of structure can lead to inconsistent practices, increasing the likelihood of accidents and exposure. Another critical aspect is inadequate training. Without a structured training program, personnel may not be fully aware of the risks associated with dangerous substances or the proper methods for handling them. This can result in improper use, storage, and disposal, all of which can lead to serious health and environmental hazards. Effective communication is also crucial in any system dealing with hazardous materials. An unorganized system may lack clear communication channels, leading to misunderstandings, misinformation, and delayed responses to emergencies.

AI Converted Text (Output 5): An unorganized organization can drastically worsen the risks related with handling extremely hazardous material. One of the main problems is the lack of protocols and procedures. In an unorganized organization, there may be no clear regulations or standard operating procedures (SOPs) for treating hazardous materials. This lack of structure can led to inconsistent practices, increasing the likelihood of injuries and exposure. Another critical aspect is inadequate training. Without a structured education plan, personnel may not be completely mindful of the risks associated with dangerous materials or the proper techniques for treating those. That can resulting in improper use, storage, and disposal, all of which can leading to severe health and ecological risks. Good communication is also essential in any organisation covered with toxic chemicals. An unorganized organization may lack clear
GPTZero Score: 10%