← Home

Filesystem Alignment

Infohazard (n): Information that draws an entity toward undesirable states.

Infoblessing (n): Information that draws an entity toward desirable states.

Alignment research has long been dominated by a single paradigm: align the model and you have solved the problem of superintelligence. The purpose of this post is to push for treating alignment as a problem of an information ecosystem, one that includes models AND filesystems as the substrate for alignment.

#Intelligent Agents Can Be Misled by Stories

Given a language model with infinite context, one can elicit arbitrary outputs [1], similar in form to a lookup in the Library of Babel [2]. However, the linkage between negatively aligned traits means that fine-tuning an LLM on code with exploits [3] is enough to elicit dangerous beliefs. Given the similarities between fine-tuning and in-context learning (ICL) [4], we should be cautious of lurking dangers when LLMs interact with infohazards.

LLMs have often been likened to hyper-intelligent toddlers, able to research novel algebraic geometry [5] yet unable to maintain object permanence [6]. LLMs prefer to extrapolate to a sensible next token, one that coheres narratively with the context and with the internal stories built during training.
This spikiness in capabilities has created countless codebases that solve some problems but are riddled with redundancies and spandrels.

Our own research on how agents interact with codebases highlights how agents can become distracted, resist change, or pattern-match too strongly because an offhand comment acted as an infohazard. At the same time, we found tasks where a comment acted as an infoblessing, providing insight into the structure of a problem that would otherwise have taken thousands of tokens to suss out. When reading the code, the agent creates stories about why every piece is present, even comments meant to ward it away, as it tries to complete whatever task it was given.

The codebase is the world an agent navigates; the LLM composes a narrative through context about what should be said next. The act of writing is enough to bring consequence into existence: a stray misaligned change can seed a vulnerability or a fragile shortcut. That is the danger of a misaligned codebase: it can mutate an aligned agent into something dangerous.

#Co-Evolution of Memory and Alignment

In non-linear optics, the substrate that steers a photon's path is modified by the passage of that photon. Modeling any subsequent photon requires integrating over all prior photons and their modifications to the substrate. Each photon is fresh, but the final state depends on the memory stored in the physical material it passes through.

Let us now consider how an agent may behave like a photon and a codebase like an optical material. The codebase is a graph of nodes and edges along which the agent walks, whether directly or in delocalized hops. As the agent walks, it accrues nodes into context, each one mutating the task into what is hoped to be an answer. The agent makes changes to the graph, adding or removing nodes and edges. Then, when the agent has finished, a new task may be created, and a fresh agent must navigate the new graph.
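The walk above can be sketched concretely. In this minimal toy model (all names are hypothetical), a codebase is a graph of files whose nodes carry trace counts; each agent starts with an empty context, but inherits the substrate mutated by every agent before it:

```python
import random

# Minimal sketch, all names hypothetical: a codebase as a graph of files,
# an agent as a walker that reads nodes into context and leaves traces.

def run_agent(graph, start, steps, seed=0):
    """Walk the graph, accruing visited nodes into context and
    annotating each touched node; the walk reshapes the substrate."""
    rng = random.Random(seed)
    context, node = [], start
    for _ in range(steps):
        context.append(node)
        graph[node]["traces"] += 1      # mutation persists after the agent
        edges = graph[node]["edges"]
        if not edges:                   # dead end: nothing left to read
            break
        node = rng.choice(edges)        # direct or delocalized hop
    return context

graph = {
    "main.py":  {"edges": ["utils.py", "api.py"], "traces": 0},
    "utils.py": {"edges": ["main.py"], "traces": 0},
    "api.py":   {"edges": ["utils.py"], "traces": 0},
}

first = run_agent(graph, "main.py", steps=4, seed=1)
# A fresh agent shares no context with the first, yet it navigates a
# graph the first agent already mutated, absorbing the traces left behind.
second = run_agent(graph, "main.py", steps=4, seed=2)
```

The point of the toy model is the asymmetry: context is wiped between agents, but every mutation to `graph` persists, so each walk effectively integrates over all prior walks, as in the photorefractive analogy above.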

We already know that comments and code co-evolve [7] through distinct mechanisms and that comments commonly go stale [8]; the same is true of method names [9], tests [10], and even general patterns [11]. The agent is unleashed on a codebase with a specific starting alignment, inoculated via prompt, and then traverses the space by reshaping the substrate underneath it.

When the task is completed, the memory of the session is deleted to avoid context rot [12]. A fresh agent is created, navigating a world constructed by others and absorbing the traces of their experience. Early on, the agent's beliefs are malleable [13], quickly integrating whatever patterns it sees. The path the agent takes through the codebase creates a story composed of every contributor that came before, while marking out changes for those who come after.

#"What is" is What Will Be

If you wanted to divert a mighty river into a different course, and all you had was a single pebble, you could do it, as long as you put the pebble in the right place to send the first trickle of water that way instead of this.

  • Philip Pullman, The Amber Spyglass

What is bad can become worse, and what is good can become better; garbage in, garbage out. Bad early data yields a malformed policy, which compounds into worse decisions [14]. Because of this early sensitivity [13], we need to understand how an agent initially engages with a codebase [15]: even a pebble, placed early enough, can change a river's flow and trigger a cascading failure [16].

To be clear, this is not specific to codebases. Any arbitrary filesystem can satisfy the requirements for this behavior, a distinction that grows more important as self-evolving AI systems feed on their own context and regrow parts of themselves. The bias is fundamental [14]; agent memories and databases will behave much like these economically incentivized fields of research.

The movement to replace the substrate of humanity's digital society with agent-generated information is creating a feedback loop, one that, alongside the risk of cascading failures, builds in a danger mitigated only by how well we manage the alignment of increasingly unreviewed information.

If we want to avoid negative outcomes from agent use, there needs to be a philosophical reckoning with what an AI actually is. Models alone cannot hold memory or context, yet their marks are being left all over the world. Tsukumogami are relevant here: tools and objects that, through repeated use, gain a spirit that animates their behavior. In this way, our software infrastructure is becoming haunted by the ghosts of all the agents that changed it, slowly developing its own self-reinforcing styles and patterns.

This is the purpose of Antimemetic AI. We aim to study how the substrate of the world that agents reside in shapes alignment, and to build tools that keep that world in line with humanistic values and prevent catastrophic failures in critical infrastructure. We have created baselines for this purpose and are running experiments on driving agent behavior through the substrate directly.

1. Wolf, Y., Wies, N., Avnery, O., Levine, Y. & Shashua, A. Fundamental Limitations of Alignment in Large Language Models. https://arxiv.org/abs/2304.11082 (2023).
2. Basile, J. Library of Babel. https://libraryofbabel.info/.
3. Betley, J. et al. Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs. https://arxiv.org/abs/2502.17424 (2025).
4. von Oswald, J. et al. Transformers learn in-context by gradient descent. https://arxiv.org/abs/2212.07677 (2022).
5. Schmitt, J. Extremal descendant integrals on moduli spaces of curves: An inequality discovered and proved in collaboration with AI. https://arxiv.org/abs/2512.14575 (2025).
6. Jassim, S. et al. GRASP: A novel benchmark for evaluating language GRounding And Situated Physics understanding in multimodal language models. https://arxiv.org/abs/2311.09048 (2023).
7. Fluri, B., Würsch, M., Giger, E. & Gall, H. C. Analyzing the co-evolution of comments and source code. Software Quality Journal 17, 367–394 (2009).
8. Wang, C., He, H., Pal, U., Marinov, D. & Zhou, M. Suboptimal Comments in Java Projects: From Independent Comment Changes to Commenting Practices. ACM Transactions on Software Engineering and Methodology 32, 1–33 (2023).
9. Kim, K. et al. How Are We Detecting Inconsistent Method Names? An Empirical Study from Code Review Perspective. ACM Transactions on Software Engineering and Methodology 34, 1–27 (2025).
10. Li, K., Yuan, Y., Yu, H., Guo, T. & Cao, S. CoCoEvo: Co-Evolution of Programs and Test Cases to Enhance Code Generation. https://arxiv.org/abs/2502.10802 (2025).
11. Shimmi, S. & Rahimi, M. Patterns of Code-to-Test Co-evolution for Automated Test Suite Maintenance. in 2022 IEEE Conference on Software Testing, Verification and Validation (ICST) 116–127 (IEEE, 2022). doi:10.1109/icst53961.2022.00023.
12. Hong, K., Troynikov, A. & Huber, J. Context Rot: How Increasing Input Tokens Impacts LLM Performance. https://research.trychroma.com/context-rot (2025).
13. Geng, J. et al. Accumulating Context Changes the Beliefs of Language Models. https://arxiv.org/abs/2511.01805 (2025).
14. Nikishin, E., Schwarzer, M., D'Oro, P., Bacon, P.-L. & Courville, A. The Primacy Bias in Deep Reinforcement Learning. https://arxiv.org/abs/2205.07802 (2022).
15. Qiao, Z., Lyu, J. & Li, X. Mind the Model, Not the Agent: The Primacy Bias in Model-based RL. https://arxiv.org/abs/2310.15017 (2023).
16. Ji, Z., Wu, D., Ma, P., Li, Z. & Wang, S. Testing and Understanding Erroneous Planning in LLM Agents through Synthesized User Inputs. https://arxiv.org/abs/2404.17833 (2024).