Confession: I don’t know what is meant by “the college essay.”
Over the past few weeks, this phrase has served as shorthand for a type of student writing in discussions about the relationship between college classes and AI programs like ChatGPT, which launched in November and which I touched on in a Weekly Varia a few weeks ago. These programs produce a block of unique text that imitates the type of writing requested in a prompt. In outline, this input/output process mimics what students do in response to prompts from their professors.
The launch of ChatGPT has led to an outpouring of commentary. Stephen Marche declared in The Atlantic that the college essay is dead and that humanists who fail to adjust to this technology will be committing soft suicide, which followed on from a post earlier this year by Mike Sharples declaring that this algorithm had produced a “graduate level” essay. I have also seen anecdotal accounts of professors who have caught students using ChatGPT to produce papers, along with concern about whether such cases can be processed as honor code violations, both because the technology is not addressed explicitly in school regulations and because the professors lacked concrete evidence that it was used. (OpenAI is aware of these concerns, and one of their projects is to watermark generated text.) Some professors have suggested that this tool will leave them no choice but to return to in-class, written tests, which are rife with inequities.
But among these rounds of worry, I found myself returning to my initial confusion about the nature of “the college essay.” My confusion, I have decided, stems from the fact that the phrase is an amorphous, if not totally empty, signifier that generally refers to whatever type of writing a professor thinks his or her students should be able to produce. If Mike Sharples’ hyperbolic determination that the sample produced in his article is a “graduate level” essay is any guide, these standards can vary wildly.
For what it is worth, ChatGPT is pretty sure that the phrase refers to an admissions personal statement.
When I finished my PhD back in 2017, I decided that I would never assign an in-class test unless there was absolutely no other recourse (i.e., if someone above me demanded that I do so). Years of grading timed blue-book exams had convinced me that these exams were a mismatch for what history courses claim to teach, and that a combination of weekly quizzes that the students could retake as many times as they wanted (if I’m asking the question, I think it is worth knowing) and take-home exams would align better with what I was looking to assess. This also matched my pedagogical commitment to writing across the curriculum. The quizzes provided accountability for the readings and attention to the course lectures, and included one or more short answer questions that tasked the students with, basically, writing a thesis, while the exams had the students write two essays, one from each of two sets of questions, which they were then allowed to revise. Together, these two types of assignments allowed the students to demonstrate both their mastery of the basic facts and details of the course material and the higher-order skills of synthesizing material into an argument.
My systems have changed in several significant ways since then, but the purpose of my assignments has not.
First, I have been moving away from quizzes. This change has been a concession to technology as much as anything. Since starting this system on Canvas, I have moved to a job that uses Blackboard, where I have not been able to find an easy way to grade short answer questions. I still find these quizzes a valuable component of my general education courses, where they can consist entirely of true/false, multiple choice, fill-in-the-blank, and other types of questions that are graded automatically. In upper-level courses, by contrast, where I found the short-answer questions to be the most valuable part of the assignment, I am simply phasing the quizzes out.
Second, whether as a supplement to or in lieu of the quizzes, I have started assigning a weekly course journal. In this assignment, the students choose from a standard set of prompts (e.g. “what was the most interesting thing you learned this week?”, “what was something from the course material that you didn’t understand this week? Work through the issue and see if you can understand it,” “what was something that you learned this week that changes something you previously wrote for this course?”) and then write roughly a paragraph. I started assigning these journals in spring 2022, and they quickly became my favorite thing to grade because they are a low-stakes writing assignment that gives me clear insight into what the students have learned from my class. Where the students are confused, I can also offer gentle guidance.
Third, I have stopped giving take-home exams. I realized at some point that, while take-home exams were better than in-class exams, my students were still producing exam-ish essay answers, and I was contributing to this problem in two ways. First, two essays were quite a lot of writing to complete well in the one week that I allotted for the exam. Second, by calling the assignment an exam, I encouraged most students to treat it as only a marginal step away from the in-class exam, where students are assessed on whether they have the recall and in-the-moment agility to produce reasonable essays in a short period of time.
What if, I thought, I simply removed the exam title and spread the essays out over multiple paper assignments?
The papers I now assign actually use some of the same prompts that I used to put on exams, big questions in the field of the sort that you might see on a comprehensive exam, but I now focus on giving the students tools to analyze the readings and organize their thoughts into good essays. Writing, in other words, has become an explicit part of the assignment, and every paper is accompanied by a meta-cognitive reflection on the process.
Given this context, I was more sanguine about ChatGPT than most of the commentary I had seen, but, naturally, I was curious. After all, Sharples had declared that a piece of writing it produced was graduate level, and Stephen Marche had assessed it lower but still assigned it a B+. I would have marked the essay in question lower based on the writing alone (maybe a generous B-) and failed it for having invented a citation (especially for a graduate class!), but I would be on firmer footing with history papers of the sort that I actually grade, so I decided to run an experiment.
The first prompt I assigned is one that will, very likely, appear in some form or another in one of my classes next semester: “assess the causes underlying the collapse of the Roman Republic and identify the most important factor.” I am quite confident in assigning the AI a failing grade.
There were multiple issues with ChatGPT’s submission, but the most obvious fault was one I did not expect. The following text appeared near the end of the essay.

Vercingetorix’ victory was, I’m sure, quite a surprise for both him and Julius Caesar. If I had to guess, the AI conflated the fall of the Roman Republic with the fall of the Roman Empire, taking the talking points for the Empire and applying them to names from the time of the Republic. After all, ChatGPT produces text by assembling words without understanding the meaning behind them. Then again, this conflation appears in any number of think-pieces about the United States as Rome, too.
But beyond this particular howler, the produced text has several critical issues.
For one, “internal conflict, economic troubles, and military defeats” are exceptionally broad categories, each of which could make for a direction to take the paper, but together they become so generic as to obscure any attempt at a thesis. “It was complex” is a general truism about the past, not a satisfactory argument.
For another, the essay lacks adequate citations. In the first attempt, the AI produced only two “citations,” both listed at the end of the paper. As I tell my students, listing sources at the end isn’t the same thing as citing where you are getting the information. Upon some revision, the AI did manage to provide some in-text citations, but not nearly enough and not from anything I would have assigned for the class.
A second test, using a prompt I did assign based on Rudyard Kipling’s The White Man’s Burden, produced similarly egregious results. The essay had an uninspired but mostly adequate thesis, at least as a starting point, but then proceeded to use three secondary sources, none of which existed in the form in which they were cited. Unless the substantial C.V. of the well-published scholar Sarah C. Chambers is missing a publication on a topic outside her central areas of research, she hasn’t argued what the paper claims she did.
A third test, about Hellenistic Judea, cited an irrelevant section of 1 Maccabees and a chapter in the Cambridge History of Judaism, albeit one about Qumran, cited with neither the right volume nor the right publication information. You get the idea.
None of these papers would have received a passing grade from me based on citations alone even before I switched to a specifications grading model. And that is before considering that the AI does even worse with metacognition, for obvious reasons.
In fact, if a student were to provide a quality essay produced by ChatGPT that was accurate, had a good thesis, and was properly cited, and then explained the process by which they produced the essay in their metacognitive component, I would give that student an A in a normal scheme or the highest marks in my specs system. Not only would such a task be quite hard given the current state of AI, but it would also require the student to know my course material well enough to identify any potential inaccuracies and have the attention to detail to make sure that the citations were correct, to say nothing of demonstrating the engagement through their reflection. I don’t mind students using tools except when those tools become crutches that get in the way of learning.
In a similar vein, I have no problem with students using citation generators, except that most students don’t realize you shouldn’t put blind faith in the generator. You have to know both the citation style and the type of source you are citing well enough to edit whatever it gives you, which itself demonstrates your knowledge.
More inventive teachers than I have suggested creative approaches to integrating ChatGPT into the classroom, whether as a producer of counterpoints or by giving students opportunities to critique its output, not unlike the exercise I did above. I have also seen the suggestion that it could be valuable for synthesizing complex ideas into a digestible format, though I think this use loses something by treating a complex text as though it has only one possible meaning. It also produces a reasonable facsimile of discussion questions, though it struggles to answer them in a meaningful way.
I might dabble with some of these ideas, but I also find myself inclined to take my classes back to the basics. Not a return to timed, in-class tests, but doubling down on simple, basic ideas: opening students up to big, open-ended questions, carefully reading sources (especially primary sources) and talking about what they have to say, and articulating an interpretation of the past based on those sources, all the while being up front with the students about the purpose behind these assignments.
My lack of concern about ChatGPT at this point might reflect how far my assessment has strayed from the norm. I suspect that when people refer to “the college essay,” they’re thinking of the one-off, minimally-sourced essay that rewards superficial proficiency, the type of assignment that favors expedience over process and the sort that I grew frustrated with. In this sense, I find myself aligned with commentators who suggest that this disruption should be treated as an opportunity rather than an existential threat. To echo the title of a recent post on John Warner’s Substack, “ChatGPT can’t kill anything worth preserving.”