Users can be ‘rude’ to AI services to be more efficient & sustainable

Do you speak AI? It’s not a question that we’re used to yet, but it might be soon. At its lowest level, artificial intelligence has a language in terms of the coding syntax, structure and software methodology used by the developers and data scientists who build it. It also has a language in terms of its data model, its employment of large and small language models and the data fabric that it operates in. But AI also has a human language.

Users who have experimented with ChatGPT, Anthropic’s Claude, Google Gemini, Microsoft Copilot, DeepSeek, Perplexity, Meta AI through WhatsApp, or one of the enterprise platform AI services such as Amazon CodeWhisperer will know that there’s a right way and a wrong way to ask for automation intelligence.

Being quite specific about your requests and structuring the language in a prompt with more precise descriptive terms to direct an AI service and narrow its options is generally a way of getting a more accurate result. Then there’s the politeness factor.

The Politics Of Politeness

Although some analysis of this space and a degree of research suggests a polite approach is best when interacting with an AI service (it might help to be better humans, after all), there is a wider argument that says politeness isn’t actually required as it takes up extra “token” space… and that’s not computationally efficient or good for the planet’s datacenter carbon footprint. A token is a core unit of natural language text or some component of an image, audio clip or video, depending on the “modality” of the AI processing happening; while “sullen” is one token, “sullenness” would more likely be two tokens: “sullen” and “ness” in total. All those please, thank yous and “you’re just awesome” interactions a user has with AI are not necessarily a good idea.
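To make the token idea concrete, here is a minimal sketch of greedy longest-match subword splitting. The vocabulary is invented purely for illustration; real tokenizers learn their vocabularies from data and may well split these words differently.

```python
# Toy subword tokenizer: greedy longest-match against a tiny, made-up vocabulary.
# TOY_VOCAB is hypothetical; production tokenizers use learned vocabularies.
TOY_VOCAB = {"sullen", "ness", "please", "thank", "you"}


def toy_tokenize(word: str, vocab: set[str] = TOY_VOCAB) -> list[str]:
    """Split a word into the longest vocabulary pieces, left to right."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, shrinking until a piece matches.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            # No vocabulary piece matches: fall back to a single character.
            tokens.append(word[i])
            i += 1
    return tokens


print(toy_tokenize("sullen"))      # ['sullen']
print(toy_tokenize("sullenness"))  # ['sullen', 'ness']
```

Under this toy vocabulary, the longer word costs two tokens where the shorter one costs one, which is the effect described above.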

So let’s ask ChatGPT what to do…

Inference Complexity Scales With Length

Keen to voice an opinion on this subject is Aleš Wilk, cloud software and SEO specialist at Apify, a company known for its platform that allows developers to build, deploy and publish web scrapers, AI agents and automation tools.

“To understand this rising topic of conversation further, we need to start by realising that every token a user submits to an AI language model represents a unit that is measurable in computational cost,” said Wilk. “These models work and rely on ‘transformer architectures’, where inference complexity scales with sequence length, particularly due to the quadratic nature of self-attention mechanisms. Using non-functional language like ‘please’ or ‘thank you’ feels like a natural level of conversational dialogue. But, it can inflate prompt length by 15-40% without contributing to semantic precision or task relevance.”
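A rough way to see what that padding could mean: if the self-attention portion of the cost grows with the square of sequence length, a prompt that is 15-40% longer costs disproportionately more in that component. The sketch below assumes a pure length-squared relationship and ignores the parts of a model whose cost grows linearly, so it is an upper-bound illustration rather than a measurement. The 100-token baseline is an arbitrary example.

```python
def attention_cost_ratio(base_tokens: int, padding_fraction: float) -> float:
    """Relative self-attention cost of a padded prompt vs. the base prompt,
    assuming cost is proportional to sequence length squared."""
    padded_tokens = base_tokens * (1 + padding_fraction)
    return (padded_tokens / base_tokens) ** 2


# The two ends of the 15-40% padding range quoted above.
for padding in (0.15, 0.40):
    ratio = attention_cost_ratio(100, padding)
    print(f"{padding:.0%} longer prompt -> {ratio:.2f}x attention cost")
```

Under that assumption, the quoted padding range works out to roughly 1.32 to 1.96 times the attention cost of the unpadded prompt.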

From a technical and efficiency point of view, this is a hidden cost. Wilk explains that on platforms such as GPT-4-turbo, for example, where pricing and compute are token-based, verbosity in prompt design directly increases inference time, energy consumption and operational expenditure. He also notes that empirical analyses suggest 1,000 tokens on a state-of-the-art LLM can emit 0.5 to 4 grams of CO₂, depending on model size, optimization and deployment infrastructure. On a larger scale and across billions of daily prompts, unnecessary tokens can contribute to thousands of metric tons of additional emissions annually.
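The arithmetic behind that scale-up can be sketched as follows. The emission range per 1,000 tokens is the one quoted above; the workload of one billion prompts per day and five avoidable tokens per prompt is an illustrative assumption, not a reported figure.

```python
GRAMS_PER_TONNE = 1_000_000


def annual_extra_tonnes(prompts_per_day: float,
                        extra_tokens_per_prompt: float,
                        grams_co2_per_1k_tokens: float) -> float:
    """Yearly CO2 (metric tons) attributable to avoidable prompt tokens."""
    extra_tokens_per_day = prompts_per_day * extra_tokens_per_prompt
    grams_per_day = extra_tokens_per_day / 1_000 * grams_co2_per_1k_tokens
    return grams_per_day * 365 / GRAMS_PER_TONNE


# Assumed workload: 1 billion prompts/day, 5 avoidable tokens each.
low = annual_extra_tonnes(1e9, 5, 0.5)   # low end of the quoted range
high = annual_extra_tonnes(1e9, 5, 4.0)  # high end of the quoted range
print(f"{low:.1f} to {high:.1f} metric tons of CO2 per year")
```

With those assumed inputs, the result lands between roughly 900 and 7,300 metric tons a year, which is consistent with the "thousands of metric tons" order of magnitude. Different workload assumptions would move the figure proportionally.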

“This topic has become widely discussed, as it not only concerns cost, but also sustainability. Looking at GPU-intensive inference environments, longer prompts can drive up power draw, increase cooling requirements and reduce throughput efficiency. Why? Because as AI moves into continuous pipelines, agent frameworks, RAG systems and embedded business operations, for example, the marginal ineffectiveness of prompt padding can aggregate into a big environmental impact,” underlined Wilk.

Streamlining User Inputs

An optimization specialist himself, Wilk offers a potential solution: developers and data scientists could design prompts the way they write performance-critical code, by removing redundancy, maximizing functional utility and streamlining user inputs. In the same way that we use linters and profilers (code improvement tools) for software, we need tools to clean and token-optimize prompts automatically.
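As a minimal sketch of what such a prompt "linter" might look like, the snippet below strips a handful of courtesy phrases before a prompt is sent. The phrase list and the function name strip_courtesies are hypothetical, and a simple pattern like this cannot tell when "please" or "thank you" is part of the actual task, so treat it as an illustration rather than a production tool.

```python
import re

# Hypothetical list of filler phrases; a real tool would need a vetted list.
COURTESY_PATTERN = re.compile(
    r"\b(?:please|thank you(?: so much| very much)?|thanks|kindly)\b[,!.]?\s*",
    flags=re.IGNORECASE,
)


def strip_courtesies(prompt: str) -> str:
    """Remove common courtesy phrases and tidy up leftover whitespace."""
    cleaned = COURTESY_PATTERN.sub("", prompt)
    cleaned = re.sub(r"\s{2,}", " ", cleaned).strip()
    # Restore a capital letter if the removal exposed a lowercase start.
    return cleaned[:1].upper() + cleaned[1:]


before = "Please summarize this report in three bullet points. Thank you!"
after = strip_courtesies(before)
print(after)  # Summarize this report in three bullet points.
```

A tool along these lines would sit between the user interface and the model call, much as a code formatter sits between an editor and a commit.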

For now, Wilk says he would encourage users to be precise and minimal with their prompts. “Saying ‘please’ and ‘thank you’ to AI might feel polite, but it’s polite pollution in computational terms,” he stated.

Greg Osuri, founder of Akash, a company known for its decentralized compute marketplace, agrees that the environmental impact of AI is no longer just a peripheral concern; it is a central design challenge. He points to reports suggesting that inference accounts for more than 80% of total AI energy consumption. The industry has spent the last couple of years pushing for bigger models, better performance and faster deployment, but AI inference, the process that a trained LLM uses to draw conclusions from brand-new data, might be doing most of the damage right now.

Language Models vs Google Search

“Each user query on LLM models consumes approximately 10 to 15 times more energy than a standard Google search. Behind every response lies an extremely energy-intensive infrastructure. This challenge isn’t just about energy usage in abstract terms, we’re talking about a whole supply chain of emissions that begins with a casual prompt and ends in megawatts of infrastructure demand and millions of gallons of water being consumed,” detailed Osuri, speaking to a closed press gathering this month.

He agrees that a lot is being said around polite prompts and whether it is more energy-efficient to be rude (or at least direct and to the point) to AI; however, he says these conversations are missing the broader point.

“Most of the AI architecture today is inefficient by design. As someone who has spent years developing software and supporting infrastructure, it’s surprising how little scrutiny we apply to prompt efficiency. In traditional engineering, we optimize everything. Strip any redundancies, track performance and reduce waste wherever it’s possible. The real question is whether the current centralized architecture is fit for scale in a world that is increasingly carbon-constrained. Unless we start designing for energy as a critical constraint, we will continue training models and further accelerating our own limitations,” he concluded.

This discussion will inevitably come around to whether AI itself has managed to become sentient. When that happens, AI will have enough self-awareness and consciousness to have conscious subjective feelings and so be able to make an executive decision on how to manage the politeness vs. processing power balance. Until then, we need to remember that we are basically just using language models to generate content, be it code, words or images.

If I Had An AI Hammer

“Being polite or rude is a waste of precious context space. What users are trying to accomplish is to get the AI to generate the content they want. The more concise and direct we are with our prompts, the better the output will be,” explained Brett Smith, distinguished software engineer and platform architect at SAS. “We don’t use formalities when we write code, so why should we use formalities when we write prompts for AI? If we look at LLMs as a tool like a hammer, we don’t say ‘please’ when we hit a nail with a hammer. We just do it. The same goes for AI prompts. You are wasting precious context space and getting no benefits from being polite or rude.”

The problem is, humans like empathy. This means that when an AI service answers in a chatty and familiar manner that is purpose-built to imitate human conversations, humans are more likely to want to be friendly in response. The general rule is, the more concise and direct users are with their prompts, the better the output will be.

“The AI is not sentient… and it does not need to be treated as such,” asserted Smith. “Stop burning up compute cycles, wasting datacenter electricity and heating up the planet with your polite prompts. I am not saying we ‘zero-shot’ every prompt [a term used to define when we ask an AI LLM a question or give it a task without providing any context or examples], but users can be concise, direct and maybe consider reading some prompt engineering guides. Use the context space for what it is meant for, generating content. From a software engineering perspective, being polite is a waste of resources. Eventually, you run out of context and the model will forget you ever told it ‘please’ and ‘thank you’ anyway. However, you may benefit as a person in the long term from being more polite when you talk to your LLM, as it may lead to you being nicer in personal interactions with humans.”

SAS’s Smith reminds us that AI tokens are not free. He also envisages what he calls a “hilarious hypothetical circumstance” where our please and thank you prompts get adopted by the software itself and agents end up adding in niceties when talking agent-to-agent. The whole thing ends up spinning out of control, increasing the rate at which the system wastes tokens, context space and compute power as the agent-to-agent communication grows. Thankfully, we can program against that reality, mostly.

War On Waste

Mustafa Kabul says that when it comes to managing enterprise supply chains at the wider business level (not just in terms of software and data), prudent businesses have spent decades eliminating waste from every process: excess inventory, redundant touchpoints and unnecessary steps.

“The same operational discipline must apply to our AI interactions,” said Kabul, in his capacity as SVP of data science, machine learning and AI at decision intelligence company Aera Technology.

“When you’re orchestrating agent teams across demand planning, procurement and logistics decisions at enterprise scale, every inefficient prompt multiplies exponentially. Inside operations we’ve managed, we have seen how agent teams coordinate complex multi-step workflows – one agent monitoring inventory levels, another forecasting demand, a third generating replenishment recommendations. In these orchestrated operations, a single ‘please’ in a prompt template used across thousands of daily decisions doesn’t just waste computational resources, it introduces latency that can cascade through the entire decision chain,” clarified Kabul.

He says that just as we (as a collective business-technology community) have learned that lean operations require precision, not politeness, effective AI agent coordination demands the same “ruthless efficiency” today. Kabul insists that the companies who treat AI interactions with the same operational rigor that they apply to their manufacturing processes will have a “decisive advantage” in both speed and sustainability.

Would You Mind, Awfully?

Although the UK may be known for its unerring politeness, even the British will perhaps need to learn to drop the airs and graces we would normally consider a requisite part of everyday civilities and social intercourse. The chatbot doesn’t mind if you don’t say please… and, if your first AI response isn’t what you wanted, don’t be ever so English and think you need to say sorry either.
