Let’s start with a simple question:
What is ChatGPT, the new text-generating software from the research lab OpenAI?
ChatGPT is a variant of the GPT (Generative Pre-trained Transformer) language model. It uses a deep neural network with an architecture called the Transformer, which is trained on a large dataset of text. The model learns to predict the next word in a sentence given the context of the previous words.
Hold up. I, Mark Bernstein, did not write the previous paragraph. ChatGPT wrote it for me when I gave it the prompt, “Please explain what ChatGPT is.” (Why I felt it necessary to say “Please” to a computer program is a different question.) Within seconds, the chatbot produced five reasonably lucid paragraphs explaining its existence. I’ve reproduced the first one here.
Though it may sound snobbish of me, the text ChatGPT generated is ... fine. The information is accurate and covers basic points. As prose, it’s dry and lacks any of the depth or style that a reader should expect from a PAW story, right? (Right?) Moreover — and this is not unimportant in a piece of writing appearing under my byline — it was not my own work.
In late November, San Francisco-based OpenAI (“AI” stands for artificial intelligence) released ChatGPT on its website, and the world went crazy. The site has been visited millions of times — for free now, though the company has said it will soon start selling subscriptions — and generated a comparable number of news stories, op-eds, blog posts, and tweets, the sort of writing that for now at least is still done by humans. Nearly every journalistic outlet, it seems, has run a story about this new program, often under hair-raising headlines such as, “Will ChatGPT Make Me Irrelevant?” (The New York Times), “The End of High-School English” (The Atlantic), and “Will Artificial Intelligence Kill College Writing?” (The Chronicle of Higher Education).
Will it? Although the prose ChatGPT generates is mundane, it has produced work that would receive a passing grade on at least one graduate-level business school exam. Fearing an outbreak of “AIgiarism” (AI-enabled plagiarism), the public-school systems in New York, Los Angeles, and Seattle have banned it on school Wi-Fi networks, and numerous colleges and universities have reportedly taken steps to regulate its use.
Could ChatGPT produce acceptable academic work at Princeton? Don’t be too confident that it couldn’t. As an experiment, Arvind Narayanan, a computer science professor, fed ChatGPT questions from the final exam he gives to students in his upper-level course on “Information Security” (COS 432). “I had to read some of the answers three times before feeling confident I was reading nonsense,” he tweeted. “It was so unsettling I had to look at my reference solutions to make sure I wasn’t losing my mind.”
Narayanan was not alone in feeling unsettled. Marc Ratkovic, a professor in the politics department, says that press accounts about ChatGPT prompted him to add a warning to his open-book take-home final in POL 345, “Introduction to Quantitative Social Science,” forbidding students from any interaction with AI algorithms. A few days before the start of the second semester, in the face of growing uncertainty among the faculty, the dean of the college and the dean of the graduate school published the University’s first official guidelines on acceptable AI use. More about that later.
How should we react to ChatGPT and the even better iterations of text-writing software that will follow? I posed that question to Uri Hasson, a professor of neuroscience and psychology who has studied this technology for years. Should we be amazed? Enchanted? Horrified?
His reply? “All of the above.”
Seriously, What Is ChatGPT?
To return to my original question (without the aid of a chatbot), ChatGPT is the most polished version of a type of computer program, known as a large language model (LLM), which has been around for several years. Less advanced LLMs suggest the next word in a sentence when you are composing a text or email. Another type enables Alexa and Siri to follow oral commands.
LLMs work by exposing a computer program to a range of human text. The program begins to spot patterns in the way sentences and arguments are structured, which words tend to follow other words, how tenses work, how prefixes and suffixes are used, and so on, then learns to predict their occurrence in different contexts. The larger the amount of text the computer is trained on, the more educated its guesses.
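For readers who want to see the idea of “which words tend to follow other words” in miniature, here is a toy sketch in Python. It is an illustration only, not how ChatGPT works: it simply counts word pairs in a tiny made-up sentence, whereas ChatGPT uses a far larger Transformer neural network. The function names and the sample text are invented for this example.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, which words follow it in the training text."""
    follows = defaultdict(Counter)
    words = text.lower().split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# A made-up training text (hypothetical; a real model sees vastly more).
corpus = "the cat sat on the mat and the cat slept on the sofa"
model = train_bigrams(corpus)

# In this text, "cat" follows "the" more often than any other word does.
print(predict_next(model, "the"))
```

Even this crude counter shows why more training text means better guesses: with only one sentence to learn from, it has nothing to say about any word it has never seen.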
OpenAI says that ChatGPT has been trained on huge swaths of the internet, analyzing countless numbers of books, scholarly papers, blog posts, Wikipedia entries, newspaper articles, magazine stories, and plain idle chatter, which is why it seems so lifelike. It has also been exposed to thousands of staged conversations between human AI trainers. When the chatbot tried to re-create a conversation, human trainers fine-tuned it so it delivered the most appropriate responses.
Because ChatGPT recognizes the elements of different styles of writing, it can imitate them as well, not only producing something, say, in the form of a poem or a legal contract, but one written in the voice of a famous author. One drawback to being trained on the internet is that previous versions absorbed a lot of racism and alternative facts along with the great works of literature, something OpenAI says has been programmed out of ChatGPT. It will also refuse to tell you how to do something illegal, such as rob a bank or build a bomb. But it could.
OpenAI was founded in 2015 as a nonprofit research laboratory, with Elon Musk and Peter Thiel among its early investors. (Neither currently has a leadership role in the company.) In 2020, it released GPT-3, which could create simple text such as tweets, blog posts, and short news stories. ChatGPT runs on a more sophisticated version of this program, GPT-3.5, and an even better version, GPT-4, is expected later this year.
Now that the public has discovered these chatbots, their commercial potential is undeniable. It took GPT-3 two years to attract a million subscribers; ChatGPT got a million visitors within five days. In late January, Microsoft said it was making a multibillion-dollar investment, reported by The New York Times to be $10 billion, in OpenAI as competitors, including Google parent company Alphabet, rush to get their own products to market.
‘A Statistical-Echoing Machine’
Here is another prompt I posed to ChatGPT: Recite the Princeton Honor Pledge. Within seconds, it printed out this: “I pledge that I have neither given nor received assistance on this academic work.”
Next, I upped the ante: Recite the Princeton Honor Pledge in the style of F. Scott Fitzgerald 1917. ChatGPT responded: “I swear upon my honor, that I have neither bestowed nor received aid in this scholarly pursuit, for to do so would be a betrayal of the ideals of integrity and fair play that I hold dear, like the glistening of the sun on the surface of the water in the novel The Great Gatsby by F. Scott Fitzgerald.”
That’s a lame Fitzgerald imitation, if you ask me, but ChatGPT performed better when I asked it to recite the pledge in the style of William Shakespeare: “Upon mine honor I do swear, that neither aid given nor received have I here. To break this pledge were a treachery, akin to the betrayal of Julius Caesar, by Brutus and his ilk. Thus, I do solemnly avow to uphold truth and integrity, now and evermore.” Try signing that on your next take-home test.
There is no limit to the amount of time one can waste playing around like this. (Ask ChatGPT to recite the 23rd Psalm in the style of Donald Trump. You’ll thank me.) It’s a hoot, except for one small point, which you may have noticed: In response to my initial query, it misquoted the Princeton Honor Pledge. It sounded right, the wording was close, but the information was wrong.
As Sanjeev Arora, a professor of computer science, explains, ChatGPT is designed to generate plausible text, not necessarily accurate text. Visiting the ChatGPT website, he walked me through one vivid example. He first asked it to generate a news story about the history of Baker Rink, which it did competently. But when he asked it to write a news item about Tom Cruise visiting Baker Rink, it did that equally well, inventing something out of whole cloth, including made-up quotes. Arora calls this “bunkum,” though some of his colleagues are less polite. Narayanan and Sayash Kapoor, a Ph.D. candidate in computer science, have dubbed ChatGPT “the greatest bullshitter ever,” drawing on the work of former Princeton philosophy professor Harry Frankfurt, who defined that term as “speech intended to persuade without regard for the truth.”
Bunkum — or bullshit, if you will — sharply limits ChatGPT’s usefulness. The tech site CNET used AI to generate dozens of news stories for its website, only to pull them down when human fact checkers discovered that they were full of “very dumb” errors. There are other limitations, as well: ChatGPT was not trained on information after 2021 or on information behind a paywall.
Hasson, who runs the Hasson Lab at the Princeton Neuroscience Institute, has thought a lot about these “statistical learning machines,” but his concerns are philosophical as well as technical.
“They don’t understand anything, they are simply echoing,” Hasson says of programs like ChatGPT. “Still, it’s amazing how smart you can appear to be simply by imitating and echoing. Now, you can ask, are we [humans] smart because we’re echoing The New York Times? Or are we smart because we are thinking and understanding? To what extent do we only look smart, but maybe if you look inside, we’re also just a statistical-echoing machine? That’s the question that bothers me the most.”
GPTZero to the Rescue?
Hoping to get ahead of concerns about AI plagiarism, OpenAI announced that it was looking into adding a digital “watermark” to ChatGPT text. A Princeton undergraduate may have found an even more ingenious solution. For his senior thesis project, Edward Tian ’23 has developed a program called GPTZero, which purports to detect whether a piece of writing was generated by artificial intelligence. His mantra is, “Humans deserve to know the truth.”
Tian, a computer science major, says he has been concerned about the misapplication of AI for some time. Last year, he TA’d for a machine learning class and recalls that even then he found it hard to distinguish some machine-generated text from the human-generated kind. Over the holidays, he sketched out the basic structure of GPTZero within a few days while sitting in a coffee shop near his home in Toronto.
GPTZero analyzes any piece of writing for two qualities that suggest whether it was written by AI. Tian has dubbed them “perplexity” and “burstiness.” Simply put, humans tend to make weird, unpredictable linguistic and organizational choices in our writing, both within sentences and across them. AI, on the other hand, tends to generate simpler sentences with fewer twists and turns. Tian’s program assigns a numerical score for each metric. The lower the score, the more likely something was automatically generated.
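To make those two scores a little more concrete, here is a rough sketch in Python. It is not Tian’s code, and the article does not give GPTZero’s actual formulas. The sketch assumes the textbook definition of perplexity (how surprised a language model is by each word, on average) and treats burstiness as how much that surprise varies from sentence to sentence. The simple word-frequency model and the function names are stand-ins invented for this example; a real detector would score text with a large neural language model.

```python
import math
from collections import Counter
from statistics import pstdev


def unigram_model(reference_text):
    """Build a word-probability function from reference text (add-one smoothing)."""
    words = reference_text.lower().split()
    counts = Counter(words)
    total = len(words)
    vocab = len(counts) + 1  # one extra slot reserved for unseen words

    def prob(word):
        return (counts[word.lower()] + 1) / (total + vocab)

    return prob


def perplexity(prob, sentence):
    """Exponential of the average negative log-probability per word."""
    words = sentence.split()
    if not words:
        raise ValueError("sentence must contain at least one word")
    avg_neg_log = -sum(math.log(prob(w)) for w in words) / len(words)
    return math.exp(avg_neg_log)


def burstiness(prob, sentences):
    """Assumed definition: spread (standard deviation) of per-sentence perplexity."""
    return pstdev([perplexity(prob, s) for s in sentences])


# Hypothetical reference text standing in for what the model has "read".
prob = unigram_model("the cat sat on the mat")

predictable = "the cat sat"
surprising = "quantum zebra yodels"

print(perplexity(prob, predictable))
print(perplexity(prob, surprising))
print(burstiness(prob, [predictable, surprising]))
```

Under these assumptions, a sentence built from familiar words scores lower on perplexity than one full of words the model has never seen, and a passage that mixes the two scores higher on burstiness than one that stays uniformly predictable.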
To illustrate how GPTZero works, Tian draws on a very different dimension of his undergraduate experience, a class in creative nonfiction taught by John McPhee ’53, where he says he learned to appreciate the beauty and elegance of good writing. Shortly after GPTZero was released, Tian posted a video on Twitter in which he asked it to assess one of McPhee’s New Yorker articles. Not surprisingly, McPhee’s work received a very high score on both perplexity and burstiness. As computer-generated balloons floated across the screen, the app rendered its judgment: “Your text is likely human generated!” The New Yorker’s editor, David Remnick ’81, was no doubt relieved.
To a public increasingly worried about a dystopian future filled with AI-written prose, Tian became an unlikely hero. He may be the only senior in Princeton history to have his thesis profiled, while he was still writing it, by NPR, CBS, the BBC, and The Guardian, along with numerous other outlets. More than 23,000 teachers from 40 states and 30 countries have signed up for GPTZero since its release on Jan. 2, Tian says, and once he finally submits his thesis in April, he hopes to launch it as a startup company.
The challenge of thwarting a computer program developed by a multibillion-dollar tech company would seem to be enough to tackle in any senior thesis, but Tian admits to more humanistic concerns. “I can imagine a world 10 or 20 years down the line where everyone is using ChatGPT to write,” he explains. “That’s sad to me, because if everyone is writing the same thing, then no one is writing anything original anymore. There are no more John McPhees, no more Susan Sontags. This beautiful prose is lost because ChatGPT is only regurgitating what has already been written.”
‘Compel Us To Be Better Teachers’
Because ChatGPT burst into the public consciousness in late December and early January, when students and faculty were away, its impact on campus has been muted. Nevertheless, many were uneasy.
Historian Anthony Grafton, who also directs the Program in European Cultural Studies, admitted his own qualms. “At this point I think we’re all nervously waiting for the next shoe to drop,” he wrote in an email over winter break. “For the fields that I teach, I suspect the corpus of texts isn’t tremendously rich — that was my conclusion from my own fooling — and I’m hoping I can retire before it gets big enough for the software to fool me.”
Grafton’s colleague, Jeremy Adelman, sounded a similar note. “People are mumbling about it, but not much more,” he wrote. “But it’s only a matter of time before there has to be a discussion.”
That discussion has now begun. On Jan. 25, just days before the new semester, Dean of the College Jill Dolan and Dean of the Graduate School Rod Priestley released a three-page letter to all University teaching faculty setting forth guidelines for the acceptable use of AI, and specifically ChatGPT. “Faculty were starting to inquire,” Dolan says, “and because there has been so much in the news about it, I think it was the right time for us to begin a discussion about the best ways to go forward.”
After taking a swipe at “copious handwringing in the media” about the dangers of AI, the deans declined to ban it altogether. “If anything,” they wrote, “AI will make higher education and the nuanced and sophisticated use of thinking it teaches even more essential. The ability of AI tools like ChatGPT to generate comprehensible text and code makes close reading and careful discernment even more important for our students and ourselves.”
All this said, the letter made clear that, under the Honor Code and academic integrity regulations, Princeton students are still expected to produce original work and acknowledge their sources. Furthermore, ChatGPT may not be cited as a co-author on a paper, something a few professional journals have decided to allow. Beyond that, the guidelines permit instructors to control the use of AI on a class-by-class or assignment-by-assignment basis. They were, however, urged to be clear about their rules for AI use, emphasize to students why doing original work matters, and design assignments that require critical thinking.
“We made a conscious decision not to focus the memo on questions of cheating,” Dolan explains, “because we want to be more forward-thinking in seeing this as a tool that inevitably students will use and encouraging faculty to think about how they can use it ethically, appropriately, and interestingly.” The few other American universities that have issued guidelines for AI use appear to have taken a similar, non-absolutist approach.
Princeton faculty seem to be greeting the new guidelines positively. “I think it’s quite good, actually,” Adelman wrote of the deans’ letter in a follow-up email. “In fact, this whole challenge can compel us to be better teachers. I think.”
Although it did not mandate them, the letter offered several suggestions for how AI could be used as a teaching tool. A few people on campus had already been thinking about that. Steven Kelts, a lecturer in the Princeton Writing Center who teaches a course on tech ethics, envisions asking students to take something produced by ChatGPT and, in effect, edit it. Did it make any factual errors? Are the examples it cites valid? How could its organizational structure be improved? Because it is good at summarizing long blocks of text, ChatGPT might be useful in helping to generate ideas or organize notes. Others, at Princeton and elsewhere, have suggested that ChatGPT may soon be used in humanities courses the way calculators are used in math classes. There are times, in other words, when it could be an acceptable labor-saving device and other times when a professor may decide that it is important for students to demonstrate mastery of the underlying skills.
Will AI Kill College Writing?
In a Dec. 20 column for The Daily Princetonian, Mohan Setty-Charity ’24 predicted a “technological arms race” between students and faculty if ChatGPT were banned on campus. Instead, he urged the University to find ways to use the new technology as a pedagogical tool, stating the case for AI as baldly as possible.
“Presumably, Princeton students are learning skills that cannot be immediately reproduced by AI,” he wrote. “If we aren’t, what are we doing here?”
Persuasive chatbots present practical, political, and philosophical questions that society will have to grapple with in coming years. In the near term, at Princeton, two things seem clear: AI is not going anywhere, and students and faculty will have to learn the best ways to use it. But higher education as we know it, thank goodness, has not been rendered obsolete.
Don’t take my word for it. Ask ChatGPT: Will artificial intelligence kill college writing?
“Artificial intelligence has the potential to assist in the writing process, but it is unlikely to completely replace the need for college-level writing. AI can assist with things like grammar and spell checking, but it cannot replicate the critical thinking and analysis that is required in college-level writing. Additionally, AI may not be able to understand the nuances and context of a particular topic, which is essential in academic writing. Ultimately, while AI can be a helpful tool, it cannot replace the skills and knowledge that are developed through the process of writing at the college level.”
Would a computer lie to you?
Mark F. Bernstein ’83 is PAW’s senior writer.