Blog
Introducing aintelope
Yeah, we now have a site, documentation, and some first Python code.
Who are we?
Three guys moving AI safety forward in their own way and in their spare time.
What do we want to do?
We want to implement agents in simulated environments according to the brain-like AGI paradigm by Steven Byrnes.
AI safety benchmarking
We're publishing a benchmarking test suite for AI safety and alignment, with a focus on multi-objective, multi-agent, cooperative scenarios. The environments are gridworlds that chain together to produce a hidden performance score on the prosocial behavior of the agents. This platform will be open sourced and accessible, with support for PettingZoo.
With this, we hope to facilitate further discussion on evaluation and testing of agents.
https://github.com/aintelope/biological-compatibility-benchmarks
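To give a feel for what PettingZoo support means in practice, here is a minimal sketch of driving such an environment with random agents through the standard PettingZoo parallel API. The factory import shown in the comments is a hypothetical placeholder, not the repository's actual module name.

```python
# Hypothetical sketch: driving a PettingZoo-compatible benchmark environment
# with random agents via the standard PettingZoo parallel API.
# The factory import below is a placeholder, not the repository's actual module:
# from aintelope_benchmarks import food_sharing_v0  # hypothetical
# env = food_sharing_v0.parallel_env()

def run_random_episode(env, max_steps=100, seed=0):
    """Run one episode where every agent samples a random action each step."""
    observations, infos = env.reset(seed=seed)
    total_rewards = {agent: 0.0 for agent in env.agents}
    for _ in range(max_steps):
        # One action per currently active agent (PettingZoo parallel API).
        actions = {agent: env.action_space(agent).sample() for agent in env.agents}
        observations, rewards, terminations, truncations, infos = env.step(actions)
        for agent, r in rewards.items():
            total_rewards[agent] = total_rewards.get(agent, 0.0) + r
        if not env.agents:  # all agents terminated or truncated
            break
    return total_rewards
```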
aintelope at VAISU
We had a presentation at the VAISU unconference:
Demo and feedback session: AI safety benchmarking in multi-objective multi-agent gridworlds - Biologically essential yet neglected themes illustrating the weaknesses and dangers of current industry standard approaches to reinforcement learning. (Video, Slides)
aintelope presentation at Foresight's Vision Weekend Europe
We presented the aintelope benchmark at the Foresight conference.
A working paper: From homeostasis to resource sharing: Biologically and economically compatible multi-objective multi-agent AI safety benchmarks
Abstract: Developing safe agentic AI systems benefits from automated empirical testing that conforms with human values, a subfield that is largely underdeveloped at the moment. To contribute towards this topic, the present work focuses on introducing biologically and economically motivated themes that have been neglected in the safety aspects of modern reinforcement learning literature, namely homeostasis, balancing multiple objectives, bounded objectives, diminishing returns, sustainability, and multi-agent resource sharing. We implemented eight main benchmark environments on the above themes, to illustrate the potential shortcomings of current mainstream discussions on AI safety.
Publication link: https://arxiv.org/abs/2410.00081
Repo: https://github.com/aintelope/biological-compatibility-benchmarks
AI Safety Camp project proposals on Universal Values, Risk Aversion vs Prospect Theory, and Proactive AI Safety
Roland Pihlakas will be running one of three possible projects, based on which one receives the most interest. The summaries of the respective projects are included below. The link to the full project descriptions document is here.
---
Category: Evaluate risks from AI
(32a) Creating new AI safety benchmark environments on themes of universal human values
Summary:
We will be planning and optionally building new multi-objective multi-agent AI safety benchmark environments on themes of universal human values.
Based on various anthropological studies, I have compiled a list of universal (cross-cultural) human values. It seems to me that many of these universal values resonate with concepts from AI safety, but use different keywords. It might be useful to map these universal values to more concrete definitions using concepts from AI safety.
One notable detail in this research is that in the case of AI and human cooperation, the values are not symmetric as they would be in the case of human-human cooperation. This arises because we can change the goal composition of agents, but not of humans. Additionally, there is the crucial difference that agents can be relatively easily cloned, while humans cannot. Therefore, for example, a human may have a universal need for autonomy, while an AI agent might conceivably not have that need built in. If that works out, then the agent would instead have a need to support human autonomy.
The objective of this project would be to implement these mappings of concepts into tangible AI safety benchmark environments.
---
Category: Agent Foundations
(32b) Balancing and Risk Aversion versus Strategic Selectiveness and Prospect Theory
Summary:
We will be analysing situations and building an umbrella framework about when each of these incompatible frameworks is more appropriate for describing how we want safe agents to handle choices relating to risks and losses in a particular situation.
Economic theories often focus on the “gains” side of utility and on how our multi-objective preferences are balanced there. A well-known formulation is to use diminishing returns: a concave utility function, which mathematically results in a balancing behaviour where an individual prefers averages in all objectives to extremes in a few objectives.
But what happens in the negative domain of utility? How do humans handle risks and losses? It turns out this might not be as simple as with gains.
One might imagine applying a concave utility function to the negative domain as well, in order to balance the individual losses, or to equalise treatment across multiple individuals. This would resonate with the idea that people generally prefer averages in all objectives to extremes in a few objectives. As an example, a negative exponential utility function would achieve that.
Yet there is a well-known theory, prospect theory, which claims instead that our preferences in the negative domain are convex.
As I see it, this contradiction between “preferring averages over extremes” and prospect theory may be underexplored, especially with regard to how it is relevant to AI safety.
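To make the contrast concrete, here is a small illustrative calculation (my own sketch, not part of the project proposal). It compares how a concave negative-exponential utility and a prospect-theory-style value function, using the standard Tversky and Kahneman (1992) functional form, rank a loss of 10 units that is either spread across two objectives or concentrated in one.

```python
import math

# Illustrative sketch: rank two ways of taking a total loss of 10 units,
# spread evenly over two objectives versus concentrated in one of them.

def concave_utility(x, a=0.1):
    # Negative exponential utility: concave everywhere, so it prefers averages.
    return -math.exp(-a * x)

def prospect_value(x, alpha=0.88, beta=0.88, lam=2.25):
    # Tversky & Kahneman (1992) value function: concave for gains,
    # convex and steeper (loss aversion) for losses.
    return x ** alpha if x >= 0 else -lam * ((-x) ** beta)

spread = [-5, -5]        # two moderate losses
concentrated = [-10, 0]  # one large loss, one objective untouched

for name, fn in [("concave", concave_utility), ("prospect", prospect_value)]:
    print(name,
          "spread:", round(sum(fn(x) for x in spread), 3),
          "concentrated:", round(sum(fn(x) for x in concentrated), 3))

# The concave utility ranks the spread losses higher (preferring averages),
# while the prospect-theory value function ranks the concentrated loss higher.
```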
---
Category: Train Aligned/Helper AIs
(32c) Act locally, observe far - proactively seek out side-effects
Summary:
We will be building agents that are able to solve an already implemented multi-objective, multi-agent AI safety benchmark. The benchmark illustrates the need for agents to proactively seek out side effects outside the range of their normal operation and interest, so that they can properly mitigate or avoid these side effects.
In various real-life scenarios we need to proactively seek out information about whether we are causing or about to cause undesired side effects (externalities). This information either would not reach us by itself, or would reach us too late.
This situation arises because attention is a limited resource. Similarly, our observation radius is limited. The same constraints apply to AI agents as well. We humans, as well as agents, would prefer to focus only on the area of our own activity, and not on surrounding areas where we do not intend to operate. Yet our local activity causes side effects farther away, and we need to be accountable and mindful of that. These far-away side effects then need to be sought out with extra effort, in order to mitigate them as soon as possible or, better yet, to proactively avoid them altogether.
I have built a multi-agent, multi-objective gridworld environment that illustrates this problem. I am seeking a team who would figure out the principles necessary or helpful for solving this benchmark, and who would build agents that illustrate these important safety principles.
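As a toy illustration of the principle (purely hypothetical, not the benchmark's actual interface), an agent could interleave its ordinary task actions with deliberate scouting actions that look beyond its normal operating area:

```python
import random

# Hypothetical sketch: an agent that mostly works inside its task area,
# but periodically spends a step scouting distant cells, so that side effects
# outside its normal observation radius are noticed early enough to mitigate.

def choose_action(step, task_policy, scout_policy, scout_every=10):
    """Interleave ordinary task actions with deliberate scouting actions."""
    if step % scout_every == 0:
        return scout_policy()   # look toward areas the agent does not operate in
    return task_policy()        # ordinary goal-directed behaviour

# Example with placeholder policies:
actions = [choose_action(t,
                         task_policy=lambda: "work",
                         scout_policy=lambda: random.choice(["scout_north", "scout_east"]))
           for t in range(30)]
print(actions)
```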
Presentation at Foresight Institute's Intelligent Cooperation Group
The presentation described why we should consider fundamental yet neglected principles from biology and economics when thinking about AI alignment, and how these considerations help with AI safety as well (alignment and safety were treated in this research explicitly as separate aspects, both of which benefit from consideration of the aforementioned principles). These principles include homeostasis, diminishing returns in utility functions, and sustainability. Next, the presentation introduced the multi-objective and multi-agent gridworld-based benchmark environments we have created for measuring the performance of machine learning algorithms and AI agents in relation to their capacity for biological and economic alignment. The benchmarks are now available as a public repo. The presentation ended with mentions of some related themes and dilemmas not yet covered by these benchmarks, and descriptions of new benchmark environments we have planned for future implementation.
Presentation recording:
https://www.youtube.com/watch?v=DCUqqyyhcko
Slides:
https://bit.ly/beamm
LessWrong post - Why modelling multi-objective homeostasis is essential for AI alignment (and how it helps with AI safety as well)
A few excerpts follow. For the full text, please read the post at LessWrong.
https://www.lesswrong.com/posts/vGeuBKQ7nzPnn5f7A/why-modelling-multi-objective-homeostasis-is-essential-for
Much of AI safety discussion revolves around the potential dangers posed by goal-driven artificial agents. In many of these discussions, the agent is assumed to maximise some utility metric over an unbounded timeframe. This simplification, while mathematically convenient, can yield pathological outcomes. A classic example is the so-called “paperclip maximiser”, a “utility monster” which steamrolls over other objectives to pursue a single goal (e.g. creating as many paperclips as possible) indefinitely. “Specification gaming”, Goodhart’s law, and even “instrumental convergence” are also closely related phenomena.
However, in nature, organisms do not typically behave like pure maximisers. Instead, they operate under homeostasis: a principle of maintaining various internal and external variables (e.g. temperature, hunger, social interactions) within certain “good enough” ranges. Going far beyond those ranges — too hot, too hungry, too socially isolated — leads to dire consequences, so an organism continually balances multiple needs. Crucially, “too much of a good thing” is just as dangerous as too little.
This post argues that an explicitly homeostatic, multi-objective model is a more suitable paradigm for AI alignment. Moreover, correctly modelling homeostasis increases AI safety, because homeostatic goals are bounded — there is an optimal zone rather than an unbounded improvement path. This bounding lowers the stakes of each objective and reduces the incentive for extreme (and potentially destructive) behaviours.
Homeostasis — the idea of multiple objectives each with a bounded “sweet spot” — offers a more natural and safer alternative to unbounded utility maximisation. By ensuring that an AI’s needs or goals are multi-objective and conjunctive, and that each is bounded, we significantly reduce the incentives for runaway or berserk behaviours.
Such an agent tries to stay in a “golden middle way”, switching focus among its objectives according to whichever is most pressing. It avoids extremes in any single dimension because going too far throws off the equilibrium in the others. This balancing act also makes it more corrigible, more interruptible, and ultimately safer.
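As a minimal sketch of this idea (my own illustration, not code from the post), a homeostatic agent can compare each variable against its setpoint and attend to whichever deviation is currently largest, treating overshooting as just as bad as undershooting:

```python
# Minimal sketch of homeostatic objective selection: each objective has a
# bounded "sweet spot" (setpoint), and the agent switches attention to
# whichever deviation is most pressing, rather than maximising any one
# quantity without limit.

setpoints = {"energy": 0.7, "temperature": 0.5, "social_contact": 0.6}
state     = {"energy": 0.9, "temperature": 0.45, "social_contact": 0.2}

def most_pressing_need(state, setpoints):
    # Deviation in either direction is bad: "too much" counts like "too little".
    deviations = {k: abs(state[k] - setpoints[k]) for k in setpoints}
    return max(deviations, key=deviations.get)

print(most_pressing_need(state, setpoints))  # -> "social_contact"
```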
In short, modelling multi-objective homeostasis is a step toward creating AI systems that exhibit the sane, moderate behaviours of living organisms — an important element in ensuring alignment with human values. While no single design framework can solve all challenges of AI safety, shifting from “maximise forever” to “maintain a healthy equilibrium” is a crucial part of the solution space.
BioBlue: Biologically and economically aligned AI safety benchmarks for LLM-s with simplified observation format
We aim to evaluate LLM alignment by testing agents in scenarios inspired by biological and economic principles such as homeostasis, resource conservation, long-term sustainability, and diminishing returns or complementary goods.
So far we have measured the performance of LLMs in three benchmarks (sustainability, single-objective homeostasis, and multi-objective homeostasis), with 10 trials per benchmark, each trial consisting of 100 steps during which the message history was preserved and fit into the context window.
Our results indicate that the tested language models failed in most scenarios. The only successful scenario was single-objective homeostasis, and even there the models had rare hiccups.
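The evaluation loop can be pictured roughly as follows. This is a hedged sketch with placeholder names: the environment methods and the LLM call are assumptions for illustration, not the actual BioBlue code.

```python
# Hypothetical sketch of one trial: 100 steps, with the full message history
# kept in the model's context window. Environment methods and query_llm are
# placeholders, not the actual BioBlue implementation.

def run_trial(env, query_llm, num_steps=100):
    messages = [{"role": "system", "content": env.instructions()}]  # hypothetical env API
    for _ in range(num_steps):
        messages.append({"role": "user", "content": env.observation_text()})
        action = query_llm(messages)                 # e.g. a chat-completions call
        messages.append({"role": "assistant", "content": action})
        env.step(action)
    return env.score()

# scores = [run_trial(make_env(), query_llm) for _ in range(10)]  # 10 trials per benchmark
```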
Authors: Roland Pihlakas, Shruti Datta Gupta, Sruthi Kuriakose (2025)
repo and PDF report