The Engineer's Alpha

Posts tagged with "Agent"

Loop Engineering's Missing Building Block: Why Embedded Systems Are a Safer Playground for AI Agents

Addy Osmani's Loop Engineering describes five building blocks for autonomous AI loops, but misses a critical question: when AI says 'done,' how do you know it's actually done? In AOSP / embedded environments, the answer is deterministic verification: the system itself tells you.

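Loop Engineering's Sixth Building Block: Why Embedded Systems Are Naturally Better Suited to AI Agents

Addy Osmani's Loop Engineering describes five building blocks for automated AI loops, but leaves out a key question: when the AI says "done," how do you know it really is? In AOSP / embedded environments, the answer is deterministic verification: the system itself tells you.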

LoRA Space Navigation: Three Ways to Specialize LLMs Without Fine-Tuning

LoRA parameter space can be navigated programmatically, with no gradient descent needed. A survey of 15+ papers across three research schools (generators, routers, mergers) and what they mean for agent systems.

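LoRA Space Navigation: Three Paths to Specializing LLMs Without Fine-Tuning

LoRA parameter space can be navigated programmatically, without gradient descent. A survey of 15+ papers across three research schools (generation, routing, merging), and what they mean in practice for agent systems.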

Claude Code Skill Safety: From 'Please Stop' to 'You Can't Move'

38 Skills, three layers of defense, one hard lesson: natural language instructions are not a safety mechanism. How I systematically hardened 12 unprotected destructive Skills with PreToolUse Hooks, Skill splitting, and disallowed-tools.

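Claude Code Skill Safety: From "Please Stop" to "You Can't Move at All"

38 Skills, three layers of defense, one painful lesson: natural language instructions are not a safety mechanism. This post documents how I used PreToolUse Hooks, Skill splitting, and disallowed-tools to systematically patch 12 destructive Skills that had no checkpoints at all.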

Safety Gates in Claude Code Skills: From Auditing 35 Skills to a Three-Layer Protection Model

I assumed writing 'Use AskUserQuestion' in a Skill was a hard constraint. After auditing 35 Skills, reading the official docs, and digging through GitHub Issues, I found out: the model uses the same mechanism to decide whether to obey your CHECKPOINT and whether to invoke your tool. There's only one gate that's truly 100%.

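Safety Gates in Claude Code Skills: From an Audit of 35 Skills to a Three-Layer Protection Model

I assumed that writing Use AskUserQuestion in a Skill was a hard constraint. After auditing 35 Skills and going through the official docs and GitHub Issues, I found that the model uses the same mechanism to decide whether to heed your CHECKPOINT and whether to call your tool. Only one gate is truly 100%.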

Git as an External Brain for Claude Code: Beyond MEMORY.md

MEMORY.md isn't the end of the road for AI Agent memory. When project scale exceeds what a context window can hold, Git becomes the truly scalable external memory. This post breaks down the three layers of memory, Git's role among them, and which practices have research backing vs. which are just my own experiments.

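Git as an External Brain for Claude Code: A Memory Architecture Beyond MEMORY.md

MEMORY.md is not the end point for AI Agent memory. When a project grows beyond what the context window can hold, Git is the external memory that can truly scale without limit. This post breaks down the three layers of memory, Git's role among them, and which practices are backed by research versus which are just my own experiments.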

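The Truth About 26%: The Mem0 Paper, the Benchmark Wars, and the Promise and Reality of Graph Memory

How was Mem0's 26% accuracy boost calculated? Why does Zep say Mem0 cheated? Is Graph Memory really better than pure Vector? This article dissects the arXiv paper layer by layer, reconstructs what actually happened in the benchmark dispute, and gives you a realistic basis for choosing a system in production.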

The Truth About 26%: Mem0's Paper, Benchmark Wars, and the Promise vs Reality of Graph Memory

How was Mem0's 26% accuracy boost actually calculated? Why does Zep accuse Mem0 of cheating? Is Graph Memory really better than pure Vector? This article dissects the arXiv paper layer by layer, reveals the truth behind the benchmark controversy, and gives you practical criteria for choosing a memory system in production.

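2026 AI Agent Memory Wars: A Technical Showdown Among Three Schools

The AI Agent memory problem finally has serious solutions. Graph-based, OS-inspired, Observational: three architectural schools are going head-to-head. This article helps you understand their design philosophies, their technical trade-offs, and which one to use in which scenario.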

2026 AI Agent Memory Wars: Three Architectures, Three Philosophies

AI Agent memory finally has serious solutions. Graph-based, OS-inspired, Observational—three architectural schools are going head-to-head. This article breaks down their design philosophies, technical trade-offs, and when to use which.

Cursor's $29B Secret: The Deleted Shadow Workspace, Reverse-Engineered

A deep dive into Cursor's Shadow Workspace architecture—the core innovation that once gave Cursor a massive edge over Copilot, why it quietly disappeared from settings, and what you can learn from it.

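Cursor's $29B Secret: The Deleted Shadow Workspace Technology, Decoded

An in-depth look at the technical architecture of Cursor's Shadow Workspace: the core innovation that let Cursor crush Copilot. Why did it later disappear from settings, and what can you learn from it?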