Series 3 | Pain is the Signal: The Hardcore Biological Epiphany of Forward Evolution
Subtitle: From Prompt Injection Failure to the Birth of Silicon Pain Receptors

Opening: Where Does Wisdom Truly Grow From?
Since "instruction manual"-style Prompt injection is a disaster, where should the "wisdom" of AI actually grow from?
In Series 2, I mentioned a failed experiment: I tried to directly write abstract principles like First Principles, Antifragility, Systems Thinking, and Long-termism into the System Prompt.
I thought that as long as the model "knew" these principles, it would naturally follow them in its actions.
The results proved this to be a very naive fantasy.
It would repeat the principles in its replies, but bypass them when invoking tools; It would promise caution in its plans, but continue to recklessly modify during execution; It would say "I understand First Principles," yet continue to optimize locally in complex code scenes.
This made me reflect on a more fundamental question:
Why does current AI always seem to lack true "intuition" and "depth"?
It has read almost all wisdom texts in human history, so why does it still act like a student who just memorized the textbook but has never experienced the beatings of reality?
The answer might lie in the difference between deep learning and biological learning.
Today, when we train models, we rely heavily on backpropagation: first complete a forward computation, then propagate gradients back based on the global error, and adjust parameters uniformly.
But life in the real world does not grow this way.
A person does not run through their entire life first, and then backpropagate gradients from the end. An organization does not execute a ten-year strategy completely, and then update its values all at once. A brain does not instantly acquire mature intuition from a perfect instruction manual.
Biology is more about constantly acting, hitting walls, and correcting within the environment.
They rely on local signals, immediate feedback, pain, surprise, failure, and repeated experience to etch the external world into their bodies bit by bit.
Hinton's Forward-Forward algorithm is not the direct technical source for PD, but it gave me an important epiphany: learning doesn't have to rely solely on ex-post global reckoning; it can also rely on local signals, local goals, and local adjustments during the execution process.
For PD, this local signal is the Pain Signal.
Biological growth is forward. The internalization of wisdom should also be forward.
01 Pain + Reflection: From Life Principles to Reflective Practice
I found that famous evolutionary closed loop in Ray Dalio's Principles:
Pain + Reflection = Progress
This is an extremely concise business and life maxim. But its power comes precisely from its simplicity.
Pain itself does not make people progress. Many people experience pain and only become numb, resentful, or avoidant.
What truly makes people progress is: after pain occurs, the system does not immediately forget it, but translates it into a reviewable, distillable, and transferable experience.
In other words:
Pain is not the goal; pain is just a signal. Reflection is not a posture; reflection is translating the signal into structure.
Here, we must clarify a core concept: In this context, "Pain" and "Failure" are absolutely not synonyms. Setting the driving force as "Pain" rather than "Failure" is precisely one of the most exquisite underlying logics in this system. We can understand this from three dimensions:
Process Friction vs. Endpoint QualificationFailure is a discrete, binary result (either success or failure); it is a post-hoc, static qualification. Pain, on the other hand, is a continuous, real-time process signal representing "friction". An Agent might retry 50 times and waste massive tokens just to fix one bug. From the result perspective, it didn't "fail", but the process was full of "pain". If driven only by "failure", the system wouldn't optimize as long as the result is successful. But a "pain-driven" system captures these frictions and triggers reflection, ensuring it gets it right the first time next around.
Biological Feedback Signal vs. Objective Event Judgment A person or a system can experience failure, but if they don't perceive pain (e.g., they just don't care), no evolution will occur. Pain is a profound biological signal for protection and evolution (like the muscle memory of immediately withdrawing your hand from a hot stove). Pain is the true biological current that can alter neural network weights. Therefore, the formula is
Pain + Reflection, notFailure + Reflection.Granularity and Directionality of Reflection The granularity of failure is too coarse; it only tells you "this path doesn't work" (e.g., CI failed, PR rejected). The granularity of pain is extremely fine and comes with coordinates, pointing precisely to where the problem lies in the system (e.g., "forgot to update the lockfile", "used
anywhen handling unknown data"). Only by recording every specific pain point can we transform pain into principles ahead of time when encountering similar contexts.
In summary, a system driven by "failure" is merely practicing how "not to fall" (the baseline of survival); whereas a system driven by "pain" is acutely capturing every slight sense of blockage, continuously eliminating friction, and ultimately moving towards "master-level elegance and intuition" (the upper limit of evolution).

Later, I dug deeper and found that as early as 1983, Donald Schön proposed a more systematic framework in The Reflective Practitioner.
He argued that true experts are not people who simply apply textbook theories. Top practitioners are formidable because, in sites full of noise, conflict, and uncertainty, they can continuously engage in two types of reflection:
Reflection-in-Action When encountering unexpected resistance, hitting the brakes in real-time and adjusting on the spot.
Reflection-on-Action Returning to the scene of the crash afterward to deconstruct the problem, the action, and the result.
These two concepts hit the core of PD's subsequent architecture almost precisely.
If an AI Agent only says "Sorry, I will pay attention next time" after a task ends, it hasn't truly reflected. If an AI Agent only writes "I will follow First Principles" in its output but cannot actively hit the brakes before a high-risk operation, it hasn't truly internalized the principle either.
The reason many AIs today still act like "typists" is not that they can't spout grand truths, but that they have no sense of pain.
When you correct its mistakes, it apologizes. When you point out it broke the architecture, it admits it. When you ask it to rewrite, it regenerates.
But in the next file, the next task, the next context window, it will very likely make the exact same mistake.
This is not called progress.
This is called repeating.
In the mid-phase of the PD project, I became increasingly certain of one thing:
To plant principles into the behavioral system of AI, you cannot just give it an instruction manual. You must make it experience the forward evolution cycle of "Pain — Brake — Debrief — Precipitate."
02 From "Knowing" to "Forgetting": The Necessary Path to Becoming a Master
Let me tell a true story that happened to me.
I once gave my wife an in-depth popularization of "First Principles."
She earnestly read related books and business cases, and even tried to use this principle for some decision-making exercises in daily life.
I was very happy at the time, even somewhat naively thinking: she had "learned" it, and her life and behavioral patterns were about to undergo a high-dimensional metamorphosis.
But reality soon slapped my face.
When she encountered real, complex, emotional-pressure-inducing reality friction, she would often still act on old inertia and intuition.
Later I realized, the protagonist of this story is not just her, but also myself.
We all too easily mistake "understanding a concept" for "already possessing the capability."
But actually:
"Knowing" a concept, "Understanding" a concept, and "Mastering" it are completely different levels of existence.
To become a true master, one must go through trials.
You must bump around in real life, suffer losses, feel pain, and experience intense friction before an abstract principle strongly connects with your actual situation.
Zeng Guofan (a prominent Chinese statesman) was not a natural-born saint in his youth either. He constantly wrote daily lessons, recorded his faults, and accepted criticism from others, using an almost clumsy method to press external discipline into his own behavioral system bit by bit.
The epiphany this gave me was:
Principles are not internalized by "remembering." Principles are internalized by being "repeatedly triggered at the pain point."

Only when a principle grows specific branches and leaves in your mind, becoming underlying muscle memory, deeply integrated into your bones, until finally you even forget the words "First Principles" but naturally practice it in every move you make, have you truly crossed the chasm.
Schön calls this state Knowing-in-Action.
That is: you no longer need to invoke a principle, because you have lived as that principle.
The same is true for large models today.
They have read countless texts about First Principles, but that doesn't mean they have internalized First Principles in action.
Writing human wisdom into a Prompt can at most leave the AI at the surface level of "knowing."
If you don't let it experience the cycle of "bump — debrief — correct" in the real ruins of code, it will forever remain a test-taker wearing a philosophical nameplate.
03 Establishing Silicon "Pain Receptors"
Therefore, in the new architecture of Principles Disciple (PD), I introduced the Pain Signal mechanism.
But I must clarify one point first:
Pain is not a punishment. Pain is also not anthropomorphizing AI into a lifeform that suffers.
In PD, Pain is a system signal: when AI's behavior causes rework, risk, entropy increase, permission blocks, or goal deviation, the system must make this cost explicit and trigger deceleration, recording, and reflection.
Human pain is for the body to recognize harm. PD's pain is for the Agent to recognize the cost of behavior.
If there is no pain, the Agent will treat all errors as ordinary context.
Write bad code? Just change it. Break the architecture? Just apologize. Repeated rework? Just keep generating. Bypass principles? Just pay attention next time.
This is exactly the problem.
Without pain, mistakes leave no trace. Without traces, reflection cannot occur. Without reflection, principles will not grow.
Therefore, PD needs a "Silicon Pain Receptor Grading System."
1. Low-Level Pain: Projection of Human Pain
The most direct Pain comes from the developer.
When you angrily reject a PR in Code Review, or you discover that the code written by the Agent is creating an architectural disaster, this human frustration should not just stop at a sentence saying "this is wrong."
It should be captured by the system and transformed into the highest-priority Pain Signal.
Because in real projects, human anger is often not emotional noise, but a concentrated embodiment of systemic costs:
- It wasted your time;
- It destroyed existing boundaries;
- It increased future maintenance costs;
- It lowered your trust in the Agent.
These are the pains of the real world.
2. Mid-Level Pain: Quantification of System Friction
The second layer of Pain comes from the system itself.
Not all errors need to be pointed out by humans personally. Many signs of "going bad" can be automatically observed by the system.
For example:
- The Agent repeatedly modifies the same logic without substantive progress;
- The Diff continuously balloons, but test coverage does not increase;
- Permission errors, build failures, and lint failures appear frequently;
- The scope of modification continues to expand, but the task goal does not become clearer;
- Multiple tool calls only create noise without advancing the goal.
PD compresses these scattered anomaly signals into a unified metric, which I temporarily call GFI (Global Friction Index).
GFI is not for showing off technical skills.
It only answers a very simple question:
Is this Agent currently spinning in circles at a very high cost?
If the answer is yes, then the system should not continue to indulge its "effortful execution."
Effort does not equal progress. High-frequency operations do not equal effective actions.
3. High-Level Pain: The Emptiness of Goal Deviation
The third layer of Pain comes from long-term goals.
This is the most important, yet hardest layer.
In long-horizon tasks, the Agent sometimes looks extremely diligent: It constantly modifies files, constantly runs commands, constantly submits results, and constantly explains its behavior.
But from a longer timeline, it might just be busy for nothing.
For example:
- Frequent
git revertmeans it is repeatedly overthrowing itself; - Constantly expanding scope means it is fleeing the original goal;
- Fixing one bug but introducing three new ones means it is polluting the system with local patches;
- Pursuing the completion of the current task but destroying the long-term architectural direction of the project.
This kind of pain is not brought by "errors," but by "goal deviation."
It is more like a sense of emptiness at the cognitive level:
I did a lot, but I did not get any closer to the goal.
If an Agent cannot perceive this kind of pain, it will turn into an extremely diligent disaster-making machine.
![]()
A More Foundational Question: The Eyes Decide the World
But before we go further, we have to ask something even more foundational:
Large language models do not have innate pain receptors.
Human pain is hardwired — touch fire, and the nervous system completes perception, transmission, and response within milliseconds, without any additional design. But an AI Agent's "pain" must be explicitly constructed by an external system. If you don't build it, the agent cannot perceive it. Without perception, there is no reflection. Without reflection, principles will not grow.
This points to a fact that is easy to overlook:
The kinds of failure an agent can perceive are not a natural fact. They are a product design choice.
The kind of eyes you give an agent determines the kind of world it can see.
An agent that can only perceive "tool call failures" — no matter how clever its diagnostician — can only circle around the tool-invocation layer. An agent that can perceive "goals drifting off course" gives humans a chance to make more meaningful judgments based on that evidence. An agent that can perceive "users repeatedly correcting it" gives humans a chance to crystallize principles aligned with human expectations.
These three layers of perception do not automatically produce three layers of wisdom — perception is a necessary condition, not a sufficient one. But without perception, even the necessary condition is missing.
So beyond Pain Signal itself, PD has to keep asking a plainer question:
What should the agent see? Have we given it a wide enough sensory system?
Memory layer and learning layer have been discussed extensively in recent years, but the sensing layer — "how should an agent perceive what's going wrong with itself" — has received noticeably less attention. How deeply this layer is built may decide where the ceiling of an agent system actually sits.
04 What Happens After Pain Crosses the Threshold?
In the world of PD, once Pain crosses the threshold, it won't just generate an apology.
It will trigger a chain reaction.
1. Forced Brake: Reflection-in-Action
The first step is a forced brake.
When the system judges that the current execution flow already has obvious risks, PD will physically block continued execution and force deceleration.
This corresponds to what Schön calls Reflection-in-Action.
That is: don't wait until the entire task has completely failed to summarize, but identify resistance, pause inertia, and re-judge on the scene of the action.
This step is crucial.
Because many disasters do not happen when "you have absolutely no idea what to do," but when "it looks like you can still keep doing."
The Agent will tend to keep generating. Keep modifying. Keep remedying. Keep explaining.
But PD must say at certain moments:
Stop. Don't use actions to cover up a lack of judgment anymore. Return to the problem itself first.
2. Scene Freeze: Turning Errors from Smoke into Specimens
The second step is scene freezing.
If an error is not recorded, it will quickly turn into smoke. You only remember "it crashed again," but you don't know exactly where it started to deviate.
So PD needs to capture the key context of this moment:
- What was the goal at the time?
- What files did the Agent reference?
- What tool calls did it execute?
- Why did it choose this path?
- Which signal exposed the risk earliest?
- At which node did humans intervene?
- What is the final cost?
This information will be written into a traceable Decision Ledger.
Not to judge the past, but to give future systems a memory.
Without a ledger, experience evaporates. Without experience, pain is just pain.
3. Forward Reflection: Reflection-on-Action
The third step is forward reflection.
I originally called this step the "Interrogation Room," but later realized this word has a punitive connotation and is not accurate enough.
A more appropriate name is:
The Debriefing Pod.
The AI will be brought into a special debriefing context.
Here, it cannot continue to rush to fix the code. It must face the scene of the error that just happened, hold that abstract principle, and re-explain this failure.
For example:
Principle: Think twice before you act. Scene: Force-modifying multiple files in an unsynchronized branch state, causing conflict expansion. Debrief Question: What checks should have been triggered at the time? Next time encountering a similar scene, what conditions must be met first?
This process must consume additional Tokens.
This is very important.
Because in system design, cost itself is a signal.
If every crash must pay an extra computational cost, extra recording cost, and extra debriefing cost, then the Agent's behavioral system will gradually form a new tendency:
Do not easily create pain. Pause before high-risk actions. Judge first, then execute.
This is what PD calls "Forward Evolution."
It's not global backpropagation after training ends, But changing future behavior using local pain signals during the action process.
05 From Text Memory to "Branches and Leaves" Growth
Through the Pain Signal, principles are no longer chicken soup floating in the air.
They will become lessons after real, repeated falls.
This is what I mean by:
Letting principles grow branches and leaves in different soils.
A seed of "think twice before you act," when written in a Prompt, is just a beautiful sentence.
But when it is repeatedly triggered by Pain Signals, it will grow into different forms in different scenes.
In the soil of handling Git conflicts, it will grow into:
You must pull the latest branch first; force pushing in an unsynchronized state is strictly prohibited.
In the soil of refactoring old logic, it will grow into:
Modifying more than 100 lines of core logic without unit test support is strictly prohibited.
In the soil of cross-file modification, it will grow into:
Before modifying more than 3 key files, an impact scope list must be generated first.
In the soil of limited permissions, it will grow into:
After two consecutive permission failures, continuing to try the same path is prohibited; you must switch strategies or request human confirmation.
In the soil of ambiguous goals, it will grow into:
Before the requirement boundaries are clear, only propose plans; do not execute destructive modifications.
These branches and leaves are true high-dimensional cognition.
Because they are not abstract concepts. They are concrete, executable, triggerable, and defensive knowledge.
This is the key transition from "knowing" to "internalizing."
The principles in the Prompt are text memory. The rules precipitated after Pain are behavioral memory.
Text memory is easily forgotten. Behavioral memory will change the next action.
06 The Next Question: Will Too Much Wisdom Crush the System?
But problems follow one after another.
If every Pain grows a new rule, and every failure precipitates a new leaf, won't the tree of principles quickly become too lush?
More and more branches and leaves, denser and denser rules, and more and more complex checklists.
Will another disaster eventually occur:
Will the AI, because it has learned too much "wisdom," instead be crushed by it, becoming slow to act, bloated in context, and confused in judgment?
This is not a small problem.
The real biological brain does not retain all connections infinitely. It engages in Synaptic Pruning: retaining high-frequency, stable, and important patterns, and pruning away low-value, repetitive, and outdated connections.
In other words, true learning is not just about adding memory. True learning also includes forgetting, compressing, and hardening.
The same is true for humans.
A novice will memorize many rules. An expert will invoke a few key principles. A master even forgets the names of the rules, yet naturally avoids pitfalls in action.

So in PD, we must also answer the same questions:
- Which rules generated by Pain should be retained?
- Which are just accidental noise?
- Which soft principles should be hardened into code checks?
- Which high-frequency experiences should be moved out of the context and into the system layer?
- Which outdated branches and leaves should be pruned?
This is the theme Series 4 will enter:
The Alchemy of Soft to Hard Rules.
If the Pain Signal solves "how principles grow," then the soft-to-hard transition solves:
How principles are compressed, hardened, migrated, and finally transformed from prompt words into system instinct.
Conclusion: Wisdom is Not Instilled, but Sculpted by Pain
In Series 1, I discussed why the most scarce thing in the AI era is not execution power, but judgment.
In Series 2, I recorded the failure of the Prompt injection experiment: Writing wisdom into natural language cannot stably change the Agent's behavior.
And in Series 3, I am increasingly convinced:
Wisdom is not instilled. Wisdom is sculpted by pain, reflection, and real feedback.
This is true for people. This is true for organizations. This should also be true for silicon-based systems.
PD's Pain Signal is not for punishing AI. It is to give errors weight, give reflection a scene, and give principles the soil to grow.
Without Pain, principles are just chicken soup. Without Reflection, pain is just noise. Without Progress, the system is just repeating.
And one step earlier still: without a wide enough sensory system, the agent doesn't even know where it hurts. The kinds of pain decide where reflection can point, and where reflection can point decides what soil principles can grow in.
True wisdom must grow from the friction of forward actions, time and time again.
If you want to see how we compile philosophy into code, please look forward to Series 4: The Alchemy of Soft to Hard Rules.
— The Reed