The thing you're optimising is probably not the problem
I spent three weeks on a prompt last year.
It was for an AI-assisted customer qualification flow. The goal was to score inbound leads automatically based on their first message, so the sales team could prioritise follow-up. Good idea in theory. The problem was the results were inconsistent. Sometimes accurate, sometimes completely off.
So I kept refining the prompt. Added more examples. More specific instructions. Tried different temperature settings. Tested GPT-4 versus Claude. Rebuilt the few-shot examples four times.
The accuracy improved slightly. Then plateaued. And I finally asked the question I should have asked in week one: **what data is actually going into this prompt?**
The answer was a mess. The inbound lead form had been designed three years ago by someone who prioritised minimal friction over structured data. Fields were freeform text. People wrote "call me" in the budget field and "urgent" in the company size field. The AI was not failing to classify well. It was being handed unclassifiable inputs.
The problem was never the prompt.
What prompt obsession is really a symptom of
When an AI automation underperforms, the first instinct is to fix the model's instructions. This makes sense on the surface — the prompt is the most visible and editable part of the system.
But prompts are downstream of everything else. They are the last layer before output.
The quality of what an AI produces is determined, in order, by:
1. **The quality and structure of the input data**
2. **The clarity of the business rule being encoded**
3. **The workflow the AI sits inside**
4. **The prompt itself**
Most teams work the list in reverse. They start at the prompt and never reach the top.
The result is technically impressive work on the least important variable. The accuracy ceiling gets raised marginally. The underlying process problems remain.
The three upstream failures I see most often
Unstructured input being handed to a structured reasoning task
AI models are good at working with structured information. When the input is free-form — customer messages, internal notes, form submissions with no consistent format — the model is being asked to do two jobs at once: interpret the structure and then apply the reasoning.
The fix is not a better prompt. It is an intake layer that standardises input before it reaches the AI. This might be a form with constrained fields, a preprocessing step that extracts key variables from freeform text, or a validation layer that rejects inputs that cannot be processed reliably.
That work is less glamorous than prompt engineering. It is also what actually fixes the accuracy problem.
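As a concrete illustration of what an intake layer can look like, here is a minimal sketch in Python. The field names, the budget format, and the rejection rules are all invented for the example — the point is only that parsing and validation happen before the model ever sees the record.

```python
import re

# Hypothetical intake validator: standardise a raw lead form before it
# reaches the AI. Field names and rules here are illustrative
# assumptions, not taken from any specific product.

BUDGET_RE = re.compile(r"\$?\s*(\d[\d,]*)\s*(k)?", re.IGNORECASE)

def normalise_budget(raw: str):
    """Extract a numeric budget from freeform text, or None if absent."""
    match = BUDGET_RE.search(raw or "")
    if not match:
        return None  # "call me" lands here and gets flagged for review
    value = int(match.group(1).replace(",", ""))
    return value * 1000 if match.group(2) else value

def validate_lead(form: dict):
    """Return (clean_record, issues). Only clean records go to the model."""
    issues = []
    budget = normalise_budget(form.get("budget", ""))
    if budget is None:
        issues.append("budget: unparseable, route to human review")
    company_size = form.get("company_size", "").strip()
    if not company_size.isdigit():
        issues.append("company_size: not numeric")
        company_size = None
    clean = {
        "budget": budget,
        "company_size": company_size,
        "message": form.get("message", "").strip(),
    }
    return clean, issues
```

A record with "call me" in the budget field never reaches the prompt — it comes back with an issue list and goes to a human, which is the behaviour no amount of prompt refinement can produce.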
Business rules that nobody has written down
This one is more common than it should be.
A business wants to automate a decision that a human currently makes — qualifying a lead, triaging a support ticket, categorising an expense. So they try to encode that decision in a prompt.
But nobody has written down what the decision criteria actually are. The human doing it has internalised years of judgement. When asked to articulate it, they describe the easy cases. The AI learns from the easy cases and fails on the edge cases that constitute 40% of real volume.
The fix is not a better prompt. It is a rule extraction exercise — sitting with the humans who make the decision and working through a representative sample of real cases until the actual criteria are explicit. That documentation becomes the foundation for a prompt that can handle the full distribution of inputs.
I have seen teams skip this step and spend months trying to improve accuracy by adding more examples to the prompt. The examples are downstream of the rules. Without the rules being explicit, the examples are just noise at higher volume.
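One way to see what "explicit rules" means in practice: once the extraction exercise is done, the criteria can be written as an ordered rule table that anything — a prompt, a test suite, or plain code — can be checked against. The rules below are invented examples, not real qualification criteria.

```python
# Illustrative output of a rule-extraction session: named criteria in
# priority order, first match wins. These specific rules are made up.
QUALIFICATION_RULES = [
    ("no budget stated",
     lambda l: l.get("budget") is None, "needs-review"),
    ("budget below minimum",
     lambda l: l["budget"] < 1_000, "disqualify"),
    ("enterprise account",
     lambda l: l.get("company_size", 0) >= 500, "fast-track"),
]

def qualify(lead: dict) -> str:
    """Apply the explicit rules in priority order.
    Anything no rule covers is an edge case the extraction missed."""
    for name, predicate, outcome in QUALIFICATION_RULES:
        if predicate(lead):
            return outcome
    return "needs-review"
```

The useful side effect is that every "needs-review" result points at a gap in the documented criteria, which is exactly the feedback loop that adding more prompt examples never gives you.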
The AI sitting in the wrong place in the workflow
Sometimes the AI is being asked to do something at a step where it cannot have enough context to do it well.
A common example: AI-powered invoice matching that runs on invoice receipt. The invoices arrive before the purchase orders are in the system. The AI tries to match against incomplete data and produces results that are accurate only 30% of the time. The team blames the AI's matching ability. The actual problem is the workflow sequencing — the matching step runs before the data it needs exists.
Moving the AI step to run after purchase order confirmation produces dramatically better results with no changes to the AI itself.
Workflow positioning is an architectural decision. It is not something you can solve with better prompts.
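The sequencing fix can be sketched in a few lines: gate the AI step on the data it needs actually existing, instead of firing on invoice receipt. The `po_index` structure and field names are illustrative assumptions.

```python
# Sketch of the workflow-sequencing fix. Instead of dispatching every
# incoming invoice straight to the AI matcher, hold it until the
# purchase order it references is in the system.

def ready_to_match(invoice: dict, po_index: set) -> bool:
    """True once the referenced PO exists, i.e. the matcher has complete data."""
    return invoice.get("po_number") in po_index

def route_invoice(invoice: dict, po_index: set) -> str:
    if ready_to_match(invoice, po_index):
        return "ai-match"    # the model now matches against real POs
    return "hold-queue"      # re-check after the next PO import, not "match and fail"
```

Nothing about the model changed — the same matcher runs, just at a point in the workflow where the answer is knowable.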
What the right diagnostic looks like
When an AI automation is underperforming, the process I now run before touching the prompt:
**Map the data path.** What goes in, where does it come from, who creates it, what format is it in, how consistent is it across cases? If the input data has more than 20% variation in format or completeness, the prompt is not the bottleneck.
**Find the explicit rule.** Can someone write down, in plain language, the decision criteria for what the AI is supposed to do? If the answer takes more than 10 minutes to write out completely, the rule is not yet clear enough to encode reliably.
**Check workflow position.** Does the AI step have access to all the information it actually needs when it runs? Is there anything a human would check before making this decision that the AI does not have access to?
If any of those three questions produce a "no" or "not sure" answer, the work to do is there — not in the prompt.
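The first check — mapping the data path — can be partly automated. Here is a rough sketch of measuring format variation across a sample of inputs, against the 20% heuristic above. The "shape fingerprint" approach and field handling are assumptions for illustration.

```python
from collections import Counter

# Quick data-path audit sketch: estimate how inconsistent the inputs
# are before blaming the prompt.

def field_signature(record: dict) -> tuple:
    """Which fields are present and non-empty, as a shape fingerprint."""
    return tuple(sorted(k for k, v in record.items() if str(v).strip()))

def format_variation(samples: list) -> float:
    """Share of records that do NOT match the most common input shape."""
    shapes = Counter(field_signature(r) for r in samples)
    most_common_count = shapes.most_common(1)[0][1]
    return 1 - most_common_count / len(samples)
```

If `format_variation` on a few hundred real records comes back above 0.2, that is the signal to build the intake layer first and leave the prompt alone.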
The one honest thing about this
I am not saying prompts do not matter. They do. A well-structured prompt with good examples and clear output formatting produces better results than a poorly written one on the same inputs.
The point is that prompts have a ceiling. That ceiling is determined by the quality of the process feeding them. And in most of the AI automation work I see — from our own products to clients and portfolio companies — that ceiling is lower than it should be because the upstream work was skipped.
The prompt is the cheapest part to change. It is also the part with the least leverage once the real problems are addressed.
For the businesses we work with that are using WhatsApp at the centre of their operations — which is many of them — the same principle applies. AutoChat at [autochat.in](https://autochat.in) handles the automation layer. But how well it works depends heavily on how the intake data is structured and how the routing rules are defined before the automation runs.
The AI will do what you tell it to with what you give it. Giving it better instructions with worse inputs is the slow road.
Fix the process. Then refine the prompt.
*Image suggestion: a funnel diagram with four layers — input data quality, business rule clarity, workflow positioning, prompt — showing that problems at the top of the funnel cannot be fixed by work at the bottom layer.*
25+ years building web technology, SaaS, hosting, and AI automation. Founder of Hostao, AutoChat, RatingE, and BestEmail. I help businesses build stronger digital presence and real operating systems.
Want to implement this for your business?
I help business owners build digital systems that actually work. Let's talk about your specific situation.