+++
author = "Dixi Yao"
title = "What's the Point of Writing in the Era of LLM?"
date = "2026-02-08"
+++

I was preparing manuscripts for the International Conference on Machine Learning (ICML) last week when this question came up. Basically, every conference's policy disallows papers generated directly by an LLM. However, it is quite easy to use an LLM to generate some of the content. I know, at least, that many students back in China use LLMs to generate large amounts of paper content directly, such as related work and conclusions, and then revise it slightly to remove obvious AI patterns such as "--".

This made me think harder about the question: what is the real point? The first answer comes from my own experience. I have published papers in both the ML and systems communities (MLSys), and reading papers from the two feels completely different. Reading an ML paper is more like quickly gathering a few interesting ideas from a few pages. Reading a systems paper is more the pleasure of learning something from scratch, since so much about systems is not covered in textbooks. As a metaphor, it is like watching a lot of TikTok videos versus having a coffee and reading a book, and I enjoy both a lot. Human-written papers can provide these feelings.

The second reason is efficiency. The emergence of LLMs, in fact, gives us a clear definition of what a good paper is. If any content of an academic paper can be generated directly by an LLM (at least by current LLMs), the paper does not qualify as a good one. In other words, everything put in a paper should itself be able to serve as a prompt. For example, if someone is introducing federated learning (FL) and wants to use an LLM to generate the related work, they must give the LLM some prompt. Then only that prompt, the high-level idea about federated learning, belongs in the main content of the paper, not the useless LLM-generated text that follows from it. Federated learning is a very broad topic. One may ask, "Give me references about how we resolve data heterogeneity in FL." Well, that is all you need in the paper. Generating a lot of hallucinated references or padded supporting ideas is meaningless, as readers need just 10 seconds to type "FL+data heterogeneity" into an LLM and get all they want.

To make the argument stricter, we should refine the definition in the paragraph above by adding "using the fewest tokens". For example, I have found that many papers today, published or not, contain ideas that could in fact also be generated by an LLM alone. But that may first require trillions of tokens of input, e.g., reading all past ICLR, NeurIPS, ICML, etc. papers. If we can then summarize such an idea in just a few words or one or two sentences, writing it down is meaningful. Hence, I think the definition of a good paper is: *A paper that contains only content which serves as prompts and which cannot be generated by an LLM from other prompts that have fewer or a similar number of tokens.*
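
One way to read this definition more formally is sketched below. The notation is my own assumption for illustration and is not part of the original argument: $P$ is the set of content pieces in a paper, $|x|$ is the number of tokens in $x$, $M(q)$ is what an LLM $M$ outputs for a prompt $q$, and $M(q) \approx c$ means the output conveys the same content as $c$.

```latex
% Hypothetical notation (an assumption, not from the post):
%   P      : the set of content pieces in a paper
%   |x|    : the number of tokens in x
%   M(q)   : the output of an LLM M given prompt q
%   M(q) \approx c : the output conveys the same content as c
% Convention: the minimum over an empty set is +\infty,
% so content that no prompt can reproduce always qualifies.
\forall c \in P:\qquad
\min_{\,q \;:\; M(q) \approx c} |q| \;\gg\; |c|
```

Under this reading, a long LLM-generated related-work section fails the test, because a one-sentence prompt $q$ reproduces it and $|q|$ is far smaller than $|c|$. The one-sentence prompt itself may pass, since nothing much shorter regenerates it.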

In conclusion, I see two reasons for writing papers in the era of LLMs: amusement and efficiency.