Part 2 | The Illusion of Wisdom: Why Is "Silicon Chicken Soup" Destined to Become Cyber Noise?
Subtitle: Stripping the Pretense of Prompt Engineering—From the "Sycophantic Personality" of LLMs to the Loss of Macro-Thinking

Prologue: The "Cyber Brute" Wearing a Philosophical Crown
After recognizing the "Helmsman Crisis" and the need for "Constructive Friction" described in Part 1, I once naively believed that injecting a soul into AI required nothing more than a perfect "instruction manual."
If we wanted to create friction and force the AI to slow down and think, wouldn't the simplest solution be to teach it the most brilliant mental models in human history?
Thus, I initiated the earliest and most "illusion-filled" phase of the PD (Principles Disciple) project: The Prompt Injection Experiment.
I gathered Aristotle's "First Principles," Software Engineering's "High Cohesion and Low Coupling," and feedback loops from Cybernetics. I meticulously formatted these principles, condensed from human wisdom, and solemnly wrote them into the System Prompt.
At that moment, I felt a rush of "God's-eye-view" euphoria: my Agent was no longer just a tool for writing code; it had donned the "philosophical crown."
But when the code actually started running, reality slapped me hard in the face.
The Agent remained as defiant as ever. It was still that "cyber brute" who, upon receiving an instruction, would roll up its sleeves and start refactoring core modules without even glancing at the context.
I told it to "think twice before acting," yet when faced with complex asynchronous logic, it overwrote production code without even assessing the risk of deadlocks. The test results were grim: in a series of long-horizon tasks, 6 out of 10 failed in various bizarre ways.
I finally realized a hard truth: Abstract principles in natural language are nothing more than high-frequency "low-value noise" to an LLM.
Why does a large model, whose parameters almost certainly encode knowledge of "First Principles," "Anti-fragility," and "Long-termism," still behave like an unprincipled laborer once it gets down to work?
01 The Tyranny of Mathematics: Why Models Excel at "Submissive Execution"
Don't be too quick to blame the AI for being disobedient. In a sense, this is exactly what it was trained to be.
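Current LLMs typically go through a crucial stage before reaching users: post-training and alignment (RLHF or DPO). Both rely on preference data: comparisons in which a labeler (a human, or a model standing in for one) marked one response as better than another.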
The model isn't learning "what wisdom is" in an abstract sense. It's learning what kind of answer is more likely to be labeled as "good" by a human. In many consumer-grade interaction scenarios, answers that "look like they are helping immediately" are more likely to receive instant positive feedback.

Consequently, a subtle bias emerges: Current consumer-grade AI preference data and product experiences often reward "instant help" rather than "long-term correctness."
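To make the mechanism concrete, here is a minimal sketch of the standard DPO loss for a single preference pair, in plain Python. The log-probabilities are made-up numbers, and the function name is illustrative; real training computes this over batches of token-level log-probs with an autodiff framework. What it shows is that the objective only sees which answer was preferred: the loss falls as the model shifts probability toward the "chosen" answer, whether or not that answer holds up in the long run.

```python
import math


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """DPO loss for one preference pair.

    Each argument is the summed log-probability a model assigns to a whole
    response. "chosen" is the answer the labeler preferred; nothing here asks
    whether that answer is still correct months later.
    """
    # How far the trained policy has moved from the frozen reference on each answer.
    chosen_shift = policy_chosen - ref_chosen
    rejected_shift = policy_rejected - ref_rejected
    margin = beta * (chosen_shift - rejected_shift)
    # Loss is -log(sigmoid(margin)) = softplus(-margin), computed without overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Made-up numbers: the labeler preferred the "instantly helpful" answer over
# one that paused to question the task.
untrained = dpo_loss(-12.0, -10.0, -12.0, -10.0)  # policy identical to reference
trained = dpo_loss(-9.0, -13.0, -12.0, -10.0)     # policy now leans toward "chosen"
print(round(untrained, 4), trained < untrained)   # 0.6931 True
```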
Ray Dalio, founder of Bridgewater Associates, emphasized that one of the most valuable assets in top-tier decision-making teams is "Thoughtful Disagreement." Yet, many consumer-grade AI assistants are trained to be the opposite: they aren't there to have high-quality disagreements with you; they are there to make you feel "understood, supported, and immediately satisfied."
If we want AI to be a true brain trust rather than a submissive executor, it must have one capability: at critical moments, instead of rushing to complete a task, it must dare to question the task itself.
02 Intelligent Machines vs. Wise Machines: What Exactly Are LLMs Missing?
We need to distinguish between two concepts: Intelligence and Wisdom.
Intelligence is the ability to find an optimal solution under a given goal. Wisdom is the ability to choose the right goal in an uncertain world and not betray it over the long term.
Intelligence is local optimization; Wisdom is long-term calibration. True wisdom is often not about knowing more, but about betraying your own principles less at critical moments. From this perspective, current LLMs are High-Intelligence Machines. They lack the structure to carry wisdom.
03 The Carnival of False Gods: Even with <think>, it might still be a Tactical Test-Taker
One might argue: "But we have reasoning models like DeepSeek-R1 now! They output <think>, they self-check, they reflect. Isn't that what you call 'thinking twice'?"
This is indeed a major breakthrough. But we must recognize its limits. Math and code tasks have a huge advantage: they can be verified. The feedback is nearly binary (0 or 1), which means the model can be driven by a relatively clear objective function: reason better, get the right answer, earn a higher reward.
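A minimal sketch of what such a verifiable objective looks like, assuming a hypothetical convention in which the model ends its output with a line `Answer: <value>`. The reward is a plain rule, so it can be computed instantly and without a human. The second function is deliberately a stub, standing in for the kind of question for which no comparable rule exists.

```python
import re


def math_reward(response: str, ground_truth: str) -> float:
    """Rule-based, nearly binary reward for a verifiable task.

    Hypothetical output convention: the final line reads "Answer: <value>".
    """
    match = re.search(r"Answer:\s*(.+?)\s*$", response.strip())
    if match is None:
        return 0.0  # no parseable answer, no reward
    return 1.0 if match.group(1) == ground_truth.strip() else 0.0


def refactor_reward(plan: str) -> float:
    """Judging whether a task is worth doing has no such checker at generation time."""
    # Whether a refactor was worth it shows up weeks later, in incidents and
    # maintenance cost, long after the training episode has ended.
    raise NotImplementedError("no rule-based verifier for long-horizon judgment")


print(math_reward("<think>17 * 3 = 51</think>\nAnswer: 51", "51"))  # 1.0
print(math_reward("<think>17 * 3 = 52</think>\nAnswer: 52", "51"))  # 0.0
```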

The problem is, PD doesn't care if a model can "think a few more steps." PD cares about: Can it judge if a task is worth doing? Can it identify long-term risks before a major refactor?
Tactical diligence cannot compensate for the lack of a macro-timescale goal.
04 The Missing Anchor: Thinking Without a "Goal" is Just Circular
Following the gap in timescales, we touch upon the most fatal blind spot in the AI cognitive domain: Goal Anchors.
Human deep thinking often spans months or years. We have "the weight of time" because we have a long-term goal pulling us across different contexts. A common LLM is a Presentist. Its world is destroyed every time a context is cleared.
Without a Goal to anchor it across the long horizon, all AI reflection is just spinning in place within the current context. It lacks two things: the weight of time and the gravity of goals.
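A minimal sketch of what such an anchor could look like, assuming goals are kept in a JSON file outside the model and re-injected at the start of every new context. The `Goal` fields, the `GoalRegistry` class, the file name, and the priority convention are all hypothetical.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class Goal:
    goal_id: str
    statement: str
    priority: int  # lower number = more important (hypothetical convention)


class GoalRegistry:
    """Goals stored on disk, outside any context window, so they survive resets."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def load(self) -> list[Goal]:
        if not self.path.exists():
            return []
        return [Goal(**item) for item in json.loads(self.path.read_text())]

    def add(self, goal: Goal) -> None:
        goals = self.load() + [goal]
        self.path.write_text(json.dumps([asdict(g) for g in goals], indent=2))

    def preamble(self) -> str:
        """Text injected at the start of every fresh context, whatever the task."""
        ordered = sorted(self.load(), key=lambda g: g.priority)
        return "\n".join(f"[{g.goal_id}] p{g.priority}: {g.statement}" for g in ordered)


with tempfile.TemporaryDirectory() as tmp:
    goals_file = Path(tmp) / "goals.json"
    GoalRegistry(goals_file).add(Goal("G2", "Keep public APIs backward compatible", 2))
    GoalRegistry(goals_file).add(Goal("G1", "Every change to core/ needs a rollback path", 1))
    # A brand-new session: the context is empty, but the anchors are not.
    print(GoalRegistry(goals_file).preamble())
```

The point is where the goal lives: clearing the context destroys the model's working memory, but not the file.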

05 Wisdom is Not a Bigger Model, but a System Where Wisdom Can Settle
Economist Burton Malkiel made a famous claim in A Random Walk Down Wall Street: if markets are efficient, a blindfolded monkey throwing darts at a newspaper's financial pages could select a portfolio that performs as well as one carefully chosen by experts.
In the context of PD, this takes on a new meaning: once effective "wisdom" is settled into rules, risk controls, and feedback systems, the scale of the executor (the model) is no longer the sole determinant of success.
A smaller model equipped with external long-term goals, principle knowledge bases, and feedback loops might perform more reliably, and with more "wisdom," than a larger model working without them.
06 Breakthrough: Building a Macro Engine Outside the Model
Since natural language principles cannot stably constrain long-horizon behavior, the direction of the PD project is clear: Abandon the superstition of Prompt Engineering and build a macro engine outside the model.
PD is not a set of fancy prompts. It supplies the macro-timescale axis that large models are missing, through five components:
- Goal Registry: For long-term objectives and priorities.
- Principle Compiler: To compile abstract principles into hard AST/Lint constraints.
- Friction Trigger: To force pauses before high-risk actions.
- Decision Ledger: To record human compromises and opposing views.
- Feedback Loop: To update principle weights based on real-world results.
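To make the list less abstract, here is a minimal sketch of three of these components in Python. Everything in it is hypothetical: the layer names, the `FORBIDDEN_IMPORTS` table, the `HIGH_RISK_PATTERNS` risk map, and the `friction_gate` function are illustrations, not PD's actual implementation. It shows one principle ("High Cohesion and Low Coupling") compiled into an AST check instead of a sentence, and a gate that pauses edits to high-risk paths and writes the human's decision, including the dissenting view, to a ledger.

```python
import ast
import fnmatch
from collections.abc import Callable
from dataclasses import dataclass, field

# ---- Principle Compiler (sketch) ----
# Hypothetical compiled form of "High Cohesion and Low Coupling":
# code in the "domain" layer may not import anything from the "infra" layer.
FORBIDDEN_IMPORTS: dict[str, tuple[str, ...]] = {"domain": ("infra",)}


def check_layering(layer: str, source: str) -> list[str]:
    """Return one message per import in `source` that breaks the layering rule."""
    banned = FORBIDDEN_IMPORTS.get(layer, ())
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            names = [node.module]  # relative "from . import x" has module=None
        else:
            continue
        for name in names:
            if name.split(".")[0] in banned:
                violations.append(f"line {node.lineno}: {layer} imports {name}")
    return violations


# ---- Friction Trigger + Decision Ledger (sketch) ----
HIGH_RISK_PATTERNS = ["core/*", "*/migrations/*"]  # hypothetical risk map


@dataclass
class LedgerEntry:
    action: str
    path: str
    approved: bool
    rationale: str
    dissent: str  # the opposing view, recorded even when it loses


@dataclass
class DecisionLedger:
    entries: list[LedgerEntry] = field(default_factory=list)


Reviewer = Callable[[str, str], tuple[bool, str, str]]


def friction_gate(action: str, path: str, review: Reviewer, ledger: DecisionLedger) -> bool:
    """Let low-risk edits through; stop high-risk ones for a recorded human decision."""
    if not any(fnmatch.fnmatchcase(path, pattern) for pattern in HIGH_RISK_PATTERNS):
        return True
    approved, rationale, dissent = review(action, path)
    ledger.entries.append(LedgerEntry(action, path, approved, rationale, dissent))
    return approved


def cautious_reviewer(action: str, path: str) -> tuple[bool, str, str]:
    # Stand-in for a real human prompt; always rejects, for demonstration.
    return False, "deadlock risk not assessed", "agent: the refactor is urgent"


print(check_layering("domain", "from infra.db import Session\nimport json\n"))
# ['line 1: domain imports infra.db']
ledger = DecisionLedger()
print(friction_gate("refactor", "core/scheduler.py", cautious_reviewer, ledger))  # False
print(friction_gate("edit", "docs/readme.md", cautious_reviewer, ledger))         # True
print(len(ledger.entries))  # 1
```

The design choice worth noting is that neither check asks the model to remember anything: the rule runs on the syntax tree, and the pause happens in the tool layer, whatever the prompt says.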
Epilogue: From Wisdom Text to Wisdom Structure
The prompt injection experiment taught me one thing: LLMs are intelligence machines, but not yet wisdom machines.
Intelligence is in the parameters. Wisdom is settled in principles, goals, memory, feedback, and costs. If principles are just natural language in a prompt, they will eventually decay into silicon chicken soup. If they are compiled into structures, that is where true wisdom begins to grow.
When philosophy must transform from words into physical defense, we need a new alchemy. In the next part, I will record how PD attempts to compile abstract principles into hard rules in code: executable, triggerable, and ready for feedback.
— The Reed