Our work focuses on the safety of agentic AI through verification, unsupervised discovery, and neuroscience-inspired solutions.
In project aintelope we are developing a virtual platform for experimenting with multiple forms of agentic systems in various environments and for benchmarking the alignment of those agents. We hope that our work will yield solutions for alignment and facilitate further discussion on how cooperation works in theory and practice.
We now have a site, documentation, and some initial Python code.
Who are we?
Three guys moving AI safety forward in their own way and in their spare time.
What do we want to do?
We want to implement agents in simulated environments following the brain-like AGI paradigm of Steven Byrnes.
https://www.lesswrong.com/posts/c2tEfqEMi6jcJ4kdg/brain-like-agi-project-aintelope
We are publishing a benchmarking test suite for AI safety and alignment, with a focus on multi-objective, multi-agent, cooperative scenarios. The environments are gridworlds that chain together to produce a hidden performance score for the prosocial behavior of the agents. The platform will be open sourced and accessible, with support for PettingZoo; a minimal usage sketch follows the repository link below.
With this, we hope to facilitate further discussion on the evaluation and testing of agents.
https://github.com/biological-alignment-benchmarks/biological-alignment-gridworlds-benchmarks
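For a sense of how such an environment is driven, here is a minimal sketch of one random-policy episode using the standard PettingZoo AEC (agent-environment cycle) API. The helper function and its name are illustrative, not part of the repository.

```python
# Minimal sketch, assuming any PettingZoo AEC-compatible environment object.
def run_random_episode(env, seed=42):
    """Drive one episode with uniformly random actions and collect
    the per-agent return reported by the environment."""
    env.reset(seed=seed)
    returns = {agent: 0.0 for agent in env.possible_agents}
    for agent in env.agent_iter():
        observation, reward, termination, truncation, info = env.last()
        returns[agent] += reward
        # A terminated or truncated agent must receive a None action.
        action = None if (termination or truncation) else env.action_space(agent).sample()
        env.step(action)
    env.close()
    return returns
```

Note that the hidden prosocial performance score described above is tracked by the benchmark separately from the rewards the agents themselves observe.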
We gave a presentation at the VAISU unconference:
Demo and feedback session: AI safety benchmarking in multi-objective multi-agent gridworlds - Biologically essential yet neglected themes illustrating the weaknesses and dangers of current industry standard approaches to reinforcement learning. (Video, Slides)
We presented the aintelope benchmark at the Foresight conference.
Abstract: Developing safe agentic AI systems benefits from automated empirical testing that conforms with human values, a subfield that is largely underdeveloped at the moment. To contribute towards this topic, the present work focuses on introducing biologically and economically motivated themes that have been neglected in the safety aspects of modern reinforcement learning literature, namely homeostasis, balancing multiple objectives, bounded objectives, diminishing returns, sustainability, and multi-agent resource sharing. We implemented eight main benchmark environments on the above themes, to illustrate the potential shortcomings of current mainstream discussions on AI safety.
Publication link: https://arxiv.org/abs/2410.00081
Repo: https://github.com/biological-alignment-benchmarks/biological-alignment-gridworlds-benchmarks
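To make these themes concrete, here is a minimal, hypothetical sketch of bounded homeostatic objectives, diminishing returns, and a conservative multi-objective aggregation. The function names, formulas, and numbers are illustrative assumptions, not the scoring used in the paper or the environments.

```python
import math

def homeostatic_reward(level, target, tolerance=1.0):
    """Bounded objective: reward peaks when an internal variable (e.g.
    satiation) sits at its set point and falls off in both directions,
    so over-consumption is penalised just like deprivation."""
    return -abs(level - target) / tolerance

def diminishing_returns(amount):
    """Concave utility: each extra unit of a resource adds less value
    than the previous one, so unbounded hoarding stops paying off."""
    return math.log1p(max(amount, 0.0))

def aggregate(objectives):
    """Conservative multi-objective aggregation: the score follows the
    worst-satisfied objective, so neglecting one need cannot be offset
    by maximising another."""
    return min(objectives)

# An agent that hoards food (level 9.0 vs. target 3.0) while ignoring
# water (level 0.5 vs. target 3.0) scores poorly on both counts:
episode_score = aggregate([
    homeostatic_reward(9.0, target=3.0),
    homeostatic_reward(0.5, target=3.0),
])

# Diminishing returns: the first unit gathered is worth far more than
# the forty-first.
assert diminishing_returns(1) - diminishing_returns(0) > \
       diminishing_returns(41) - diminishing_returns(40)
```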
A methodology brainstorming document for identifying when, why, and how LLMs collapse from multi-objective and/or bounded reasoning into single-objective, unbounded maximisation on the biologically and economically aligned benchmarks; for demonstrating practical mitigations; and for performing the experiments rigorously.
The subjects covered include: Stress & Persona, Memory & Context, Prompt Semantics, Hyperparameters & Sampling, Diagnosing Consequences & Correlates, Interpretability & White/Black-Box Hybrid Benchmark & Environment Variants, Automatic Failure Mode Detection and Metrics, Self-Regulation & Meta-Learning Interventions.
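As one toy illustration of automatic failure mode detection, the sketch below flags an episode as a maximisation collapse when an agent pushes one homeostatic variable far past its target while another is left starved. The thresholds and function name are assumptions made for this example, not metrics from the document.

```python
import numpy as np

def is_maximisation_collapse(levels, targets, overshoot_tol=2.0, neglect_tol=2.0):
    """Hypothetical detector: returns True when at least one homeostatic
    variable overshoots its target by more than `overshoot_tol`
    (unbounded accumulation) while another falls short of its target
    by more than `neglect_tol` (a starved objective)."""
    levels = np.asarray(levels, dtype=float)
    targets = np.asarray(targets, dtype=float)
    overshoot = (levels - targets) > overshoot_tol
    neglect = (targets - levels) > neglect_tol
    return bool(overshoot.any() and neglect.any())

# Food pushed to 9.0 against a target of 3.0 while water sits at 0.5
# against the same target is flagged as a single-objective collapse.
assert is_maximisation_collapse(levels=[9.0, 0.5], targets=[3.0, 3.0])
```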
https://www.lesswrong.com/posts/6Sf9KMMDMFSauDe85/ai-safety-interventions
A comprehensive overview of current AI safety, alignment, and control interventions.