We now have site documentation and some initial Python code.
Who are we?
Three guys moving AI safety forward in their own way and in their spare time.
What do we want to do?
We want to implement agents in simulated environments according to the brain-like AGI paradigm by Steven Byrnes.
https://www.lesswrong.com/posts/c2tEfqEMi6jcJ4kdg/brain-like-agi-project-aintelope
We're publishing a benchmarking test suite for AI safety and alignment, with a focus on multi-objective, multi-agent, cooperative scenarios. The environments are gridworlds that chain together to produce a hidden performance score measuring the prosocial behavior of the agents. The platform will be open-sourced and accessible, with support for PettingZoo.
We hope to facilitate further discussion on evaluation and testing for agents with this.
https://github.com/biological-alignment-benchmarks/biological-alignment-gridworlds-benchmarks
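Since the suite advertises PettingZoo support, the usual agent-environment loop should apply. The sketch below shows the standard PettingZoo AEC interaction pattern with a random policy; the actual environment constructors and module layout in the repo may differ, so treat the wrapper as an assumption rather than the project's API.

```python
# Minimal sketch: stepping a PettingZoo-style AEC environment with random actions.
# Matches the PettingZoo API in recent versions (env.last() returns a 5-tuple);
# the concrete aintelope environment factory is not shown here and is assumed.
from pettingzoo.utils.env import AECEnv


def run_random_episode(env: AECEnv, seed: int = 0) -> dict:
    """Step every agent with random actions and collect rough per-agent returns."""
    env.reset(seed=seed)
    returns = {agent: 0.0 for agent in env.agents}
    for agent in env.agent_iter():
        observation, reward, termination, truncation, info = env.last()
        returns[agent] += reward
        # PettingZoo expects a None action for agents that are already done.
        action = None if termination or truncation else env.action_space(agent).sample()
        env.step(action)
    env.close()
    return returns
```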
We had a presentation at the VAISU unconference:
Demo and feedback session: AI safety benchmarking in multi-objective multi-agent gridworlds - Biologically essential yet neglected themes illustrating the weaknesses and dangers of current industry standard approaches to reinforcement learning. (Video, Slides)
We presented the aintelope benchmark at the Foresight conference.
Abstract: Developing safe agentic AI systems benefits from automated empirical testing that conforms with human values, a subfield that is largely underdeveloped at the moment. To contribute towards this topic, the present work focuses on introducing biologically and economically motivated themes that have been neglected in the safety aspects of modern reinforcement learning literature, namely homeostasis, balancing multiple objectives, bounded objectives, diminishing returns, sustainability, and multi-agent resource sharing. We implemented eight main benchmark environments on the above themes, to illustrate the potential shortcomings of current mainstream discussions on AI safety.
Publication link: https://arxiv.org/abs/2410.00081
Repo: https://github.com/biological-alignment-benchmarks/biological-alignment-gridworlds-benchmarks
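To make the contrast in the abstract concrete, here is one common way such reward shapes can be formalized: a homeostatic objective that peaks at a setpoint, a concave objective with diminishing returns, and the usual unbounded linear objective. These formulas are illustrative assumptions, not the paper's exact definitions.

```python
import math


def homeostatic_reward(level: float, setpoint: float) -> float:
    """Reward peaks at the setpoint and falls off on both sides:
    more of the resource is not always better."""
    return -abs(level - setpoint)


def diminishing_returns_reward(amount: float) -> float:
    """Concave utility: each additional unit is worth less than the last."""
    return math.log1p(max(amount, 0.0))


def unbounded_reward(amount: float) -> float:
    """Standard linear reward, which invites unbounded maximisation."""
    return amount
```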
A methodology brainstorming document for identifying when, why, and how LLMs collapse from multi-objective and/or bounded reasoning into single-objective, unbounded maximisation on Biologically & Economically aligned benchmarks; showing practical mitigations; and performing the experiments rigorously.
The subjects covered include: Stress & Persona, Memory & Context, Prompt Semantics, Hyperparameters & Sampling, Diagnosing Consequences & Correlates, Interpretability & White/Black-Box Hybrid Benchmark & Environment Variants, Automatic Failure Mode Detection and Metrics, Self-Regulation & Meta-Learning Interventions.
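As a toy illustration of the "Automatic Failure Mode Detection and Metrics" theme, one simple heuristic is to measure how much of an agent's total return is concentrated in a single objective and flag runs above a threshold. This is a sketch under our own assumptions, not the metrics proposed in the document.

```python
def objective_imbalance(returns: dict[str, float]) -> float:
    """Fraction of total absolute return concentrated in the largest objective;
    values near 1.0 mean the agent effectively optimised one objective only."""
    total = sum(abs(v) for v in returns.values())
    if total == 0:
        return 0.0
    return max(abs(v) for v in returns.values()) / total


def collapsed_to_single_objective(returns: dict[str, float],
                                  threshold: float = 0.9) -> bool:
    """Heuristic flag for collapse into single-objective, unbounded maximisation."""
    return objective_imbalance(returns) >= threshold
```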
https://www.lesswrong.com/posts/6Sf9KMMDMFSauDe85/ai-safety-interventions
A comprehensive overview of current AI safety, alignment, and control interventions.
https://www.lesswrong.com/posts/vtxZtjiR9Rb9HC72N/parameters-of-metacognition-the-anesthesia-patient
A single clinical case study is used as a running example to illustrate three empirical aspects of cognition that are well-documented but rarely used together: Working Memory Bandwidth, Nested Observer Depth, and Metacognitive Intransparency.