Detecting and Integrating ChatGPT in the College Classroom

Sebastian Williams, PhD

Instructors can use several methods to detect whether a text has been authored by a chat bot such as ChatGPT, including both heuristic approaches and GPTZero. Additionally, instructors can limit the use of ChatGPT and other chat bots by modifying assignment prompts, asking for pre-writing (e.g., outlines) and early drafts, or by requiring more handwritten work.

Ultimately, however, faculty should try to integrate these technologies into the learning process. ChatGPT does not fully eliminate the need for problem solving or navigating complex issues, and, in many cases, it frees writers to worry less about grammar or style and more about higher-order issues such as developing original claims or expressing an individual position.

Heuristic Methods for Detection

Many writing instructors can detect ChatGPT-generated text based on its organization, word choice, and the originality of its ideas.

For example, when asked to write a close reading of William Wordsworth’s “The World is Too Much with Us,” ChatGPT authored a relatively formulaic response that moved from the start of the poem to the end (i.e., chronologically) and was unable to produce a coherent argument when asked to organize topically instead. (See Appendix for the sample text created by ChatGPT.)

This reflects another tendency in the current iteration of ChatGPT to organize short essays using a tripartite structure. For example, ChatGPT now tends to write “First, X occurred [. . .] Then, Y happened [. . .] Finally, Z was present.” Though machine-learning programs are able to adapt over time, some writing experts have noted a simple, formulaic organization method (“X, Y, Z”). This may only be true if a user does not modify an original response (e.g., by asking ChatGPT to write more).[1]

Other basic issues indicated that the text was automated: the AI-generated text misquotes the final line of Wordsworth’s poem as “the sea that bares her bosom to the moon; / The winds that will be howling at all hours.” The bot includes two lines (not a single line) from the third quatrain and replaces “This Sea” (with a capital “S”) with “the sea.” Moreover, the actual final line of the sonnet is significantly different. These surface-level problems caught my attention even before I submitted the text to GPTZero.

The chat bot also tends to elide in-text citations (though, again, this may change as the technology adapts). And, perhaps most notably for contemporary writing instructors, the essay reiterates only basic ideas or the most accepted interpretations, similar to those on SparkNotes, CliffsNotes, and Wikipedia. In terms of argumentation, the close reading restates main points easily found elsewhere and does not offer an original or unique interpretation. The sample essay probably would not earn an “A,” given the expectation that college-level students go beyond the commonplace reading, balance multiple interpretations at once, or “read against the grain.”

Formulaic organization, simple interpretations, and basic errors when identifying the lines of a poem are not definitive proof that cheating has occurred, however. A student (or parent) might easily point out that the issues above are reasonable for a student writer.

GPTZero

GPTZero detects AI-generated text based on “perplexity” and “burstiness.” Perplexity is a probabilistic measure of how unpredictable a text’s word choices are, reflecting their randomness and complexity. Burstiness is what rhetoricians commonly refer to as sentence variation (including random variations among simple, compound, complex, and compound-complex sentences). GPTZero detects patterns in word choice and sentence length based on massive databases of stored information. In sum, human writers are often more random than a chat bot, though ChatGPT is obviously not a static technology: it “learns” over time based on inputs.
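
The two measures can be illustrated with a toy calculation. The sketch below is not GPTZero's implementation, which is not described here. It assumes a simple word-frequency (unigram) model for perplexity and treats burstiness as the spread of sentence lengths, following the description above. The function names and the tiny reference text are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter
from statistics import pstdev


def tokenize(text):
    """Lowercase word tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def split_sentences(text):
    """Split on ., !, or ? followed by whitespace (a rough heuristic)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def unigram_perplexity(tokens, reference_counts, vocab_size):
    """Perplexity of tokens under an add-one-smoothed unigram model.

    Assumption: a real detector scores text with a large language
    model, not with raw word counts as done here.
    """
    if not tokens:
        raise ValueError("cannot score an empty text")
    total = sum(reference_counts.values())
    log_prob = 0.0
    for tok in tokens:
        # Add-one smoothing gives unseen words a small, nonzero probability.
        p = (reference_counts.get(tok, 0) + 1) / (total + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / len(tokens))


def burstiness(text):
    """Population standard deviation of sentence lengths, in words."""
    lengths = [len(tokenize(s)) for s in split_sentences(text)]
    if not lengths:
        raise ValueError("cannot score an empty text")
    return pstdev(lengths)


# Hypothetical reference text standing in for "typical" word use.
reference = Counter(tokenize("the world is too much with us the world is too much"))
vocab_size = len(reference) + 1  # one extra slot for unseen words

familiar = unigram_perplexity(tokenize("the world is too much"), reference, vocab_size)
unusual = unigram_perplexity(tokenize("pagan suckled creed outworn"), reference, vocab_size)
print(familiar < unusual)  # the familiar phrasing is the more predictable one
```

In this toy model, wording that matches the reference text scores a lower perplexity than rare wording, and a passage whose sentences are all the same length has a burstiness of zero.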

GPTZero is currently free to use and generates a descriptive report that not only evaluates a sample text but also explains terminology.

In the sample text, GPTZero detected that the close reading of Wordsworth’s poem had a comparatively low rate of perplexity in several instances (see Figure 1) – an indication that certain parts of the text were not written by a human.

Fig 1 Perplexity evaluates the unique word choices that writers make. AI-generated text has lower perplexity, but, as the program notes, it is also important to consider perplexity across sentences (i.e., relative to the individual text more so than general word use).

Repeated submissions of the same text to GPTZero produced the same results:

  • Perplexity: 16
  • Perplexity (across sentences): 46.1
  • The line with the highest perplexity: 151
  • GPTZero wrongly concluded that the text was human-generated (see Figure 2)

Fig 2 The green text indicates the false-negative conclusion of GPTZero in this first attempt.

The last two bullet points are related. The most distinctive line in the essay came from a direct quote of the poem (it is actually two sentences, but it is missing punctuation). In other words, the AI-generated text included several quotes from a Romantic poet whose work is highly distinct (see Figure 3). When GPTZero summarized the results, these quotes led to a false-negative report. The tool was fooled into believing all of the text was written by a human because several parts were in fact written by Wordsworth.

Fig 3 The sentences with the highest perplexity were written by Wordsworth, which, given their frequency in the essay, ultimately led GPTZero to a false conclusion.

Submitting a Text without Direct Quotes

I submitted a close reading of the poem without direct quotations. This involved deleting quotes while still maintaining the integrity of the syntax in the original essay. The results were promising:

  • Perplexity: 10
  • Perplexity (across sentences): 32.8
  • The line with the highest perplexity: 58
  • GPTZero correctly concluded that the text was AI-generated (see Figure 4)

Fig 4 When eliminating direct quotations, GPTZero came to the correct conclusion.

This suggests that GPTZero can be an effective tool for recognizing when text is AI-generated. However, when using the tool, instructors need to be aware that false negatives are likely in a sample text that quotes other material.
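
The quote-removal step can be partly automated. The following is a minimal sketch, assuming that quotations are marked with straight or curly double quotation marks and that only longer passages need to be removed. The function name and the four-word threshold are hypothetical. Unlike the manual process described above, the script does not repair the syntax around a deleted quotation, so the output still needs a quick read-through.

```python
import re

# Matches text inside straight or curly double quotation marks.
QUOTE_PATTERN = re.compile(r'["\u201c]([^"\u201c\u201d]+)["\u201d]')


def strip_long_quotes(text, min_words=4):
    """Remove quoted passages of at least min_words words.

    Returns the stripped text and the list of removed quotations so a
    reader can review what was taken out.
    """
    removed = []

    def replace(match):
        quoted = match.group(1)
        if len(quoted.split()) >= min_words:
            removed.append(quoted)
            return ""
        # Short quotations (e.g., a single quoted word) are left in place.
        return match.group(0)

    stripped = QUOTE_PATTERN.sub(replace, text)
    # Collapse the doubled spaces left behind by removed passages.
    stripped = re.sub(r"\s{2,}", " ", stripped).strip()
    return stripped, removed


sample = 'The speaker laments that \u201cwe lay waste our powers\u201d and calls the age "sordid" in tone.'
stripped, removed = strip_long_quotes(sample)
print(stripped)
print(removed)
```

Keeping the list of removed passages makes it easy to confirm that only quoted source material, and none of the writer's own prose, was excluded before the text is submitted for analysis.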

Finally, GPTZero users should acknowledge two important factors. First, GPTZero was “trained” on a previous iteration of ChatGPT and will likely become less effective over time. Second, GPTZero uses a limited, probabilistic method and cannot confirm its results with 100 percent accuracy. Students accused of using ChatGPT could easily point out that GPTZero is subject to errors in several cases.

Other Ways of Addressing AI-Generated Text

Instructors should also recognize that they may need to modify their assignment prompts, ask for pre‑writing or drafts, or require more handwritten work to discourage the use of ChatGPT.

For example, ChatGPT is less effective when writing about events, texts, or issues within the last few years (post-2020).[2] So, asking a student to analyze a historical document and then to connect it to current events may limit their ability to use a chat bot in place of original work.

Alternatively, asking students to write assignments by hand or to turn in pre-writing and drafts will likely offset the use of ChatGPT. But students can obviously generate an automated text and work backward from it, either copying it by hand or creating an outline after the fact (which is something many students admit to doing in composition classes anyway).

In short, these are not foolproof methods.

Conclusion: ChatGPT in the Classroom

Perhaps the best method to address automated text in classrooms is to change the mindset that instructors have when approaching the issue. Rather than viewing ChatGPT as yet another tool for plagiarism, instructors can adopt methods for integrating it into classwork.

For example, in a previous writing course, I asked students to generate poetry using Google Verse, which is a machine-learning chat bot similar to ChatGPT that is designed to write poetry specifically. One assignment required students to generate a poem using AI and then to analyze what is omitted in the process. The assignment recognizes that these technologies are highly effective, yet it shows students the value of individual thought, critical analysis, and creativity.

In a literary studies class, a similar assignment might ask students to read a ChatGPT-generated close reading of Wordsworth’s poetry. Then, students could write about what the chat bot has omitted (I have used a similar assignment with SparkNotes and Wikipedia entries). This acknowledges the value of balancing multiple, often contradictory, interpretations of a literary text while still finding meaning in imaginative writing. Most importantly, it does not necessarily downplay the value of chat bots in organizing information – especially regarding dominant interpretations or common knowledge.

Ultimately, such technology will likely allow students and instructors to create more complex assignments and courses overall. Rather than viewing ChatGPT as a floodgate for new issues, instructors should use technologies to their advantage to eliminate some of the more tedious aspects of assignments and course design.

As a concluding remark, it seems worth mentioning that the text I chose to sample, Wordsworth’s “The World is Too Much with Us,” is itself a meditation on how technology changes the human imagination. But a twenty-first-century interpretation of the poem might recognize that what Wordsworth laments is not unique to the Industrial Revolution, and that the persona is ultimately critiquing how humankind uses and responds to technology – not necessarily the technology itself.


[1] Ian Bogost, “ChatGPT Is Dumber Than You Think,” The Atlantic, 7 Dec. 2022, https://www.theatlantic.com/technology/archive/2022/12/chatgpt-openai-artificial-intelligence-writing-ethics/672386/. Bogost is a world-renowned digital humanist and rhetorical theorist.

[2] Kevin Roose, “The Brilliance and Weirdness of ChatGPT,” New York Times, 5 Dec. 2022, https://www.nytimes.com/2022/12/05/technology/chatgpt-ai-twitter.html. ChatGPT does not “crawl the web for current events,” and its knowledge is somewhat restricted.

