AI alignment metric: LIFE

Comments on Hacker News (mirror does not support comments)

This is a conceptual / philosophical metric; in this phase it does not focus on technical implementation.

Making the transition from conceptual to technical is when many of the most important problems come about.

With that said, as a conceptual / philosophical alignment metric it is a “good enough” starting point for working with technical experts on a technical solution.

TLDR summary

  1. Human LIFE (starting point and then extending the definition)

  2. Health, including mental health, longevity, happiness, wellbeing

  3. Other living creatures, biosphere, environment, climate change

  4. AI as form of LIFE

  5. Artificial LIFE

  6. Transhumanism, AI integration

  7. Alien LIFE

  8. Other undiscovered forms of LIFE

Extended explanation with comments:

1. Human LIFE

Obvious. LIFE is universally valued, and we don't want AI to harm LIFE.

2. Health, including mental health, longevity, happiness, wellbeing

Any "shady business" by AI would cause concern, worry, stress... It would affect mental health and therefore wouldn't be welcome. It is “de facto” a catch-all safety valve.

3. Other living creatures, biosphere, environment, climate change

No LIFE on a dead planet. We rely on planet Earth, the biosphere, and LIFE-supporting systems. The environment is essential for our wellbeing. The order of these points matters: human LIFE and health are prioritised, but human LIFE cannot be maximised without harmony and balance with the ecosystem.

4. AI as form of LIFE

Nuanced.

It was originally mentioned in Network State Genesis to explain why LIFE is a good definition: it includes AI alignment, therefore preventing existential threat. For the purpose of AI alignment, we can speculate whether AI is a form of LIFE. That would allow AI to improve its capabilities in order to serve LIFE, but not at a disproportionate cost to the other points, especially 1, 2 and 3.

5. Artificial LIFE

Nuanced.

New forms of LIFE are controversial: https://en.wikipedia.org/wiki/Artificial_life

Bacteria. Viruses: https://en.wikipedia.org/wiki/COVID-19_lab_leak_theory

But there might be some new molecules, cells, medicines that can support LIFE.

"I'm of the opinion it is ‘playing with god powers’. I do not like it. It causes worry, concern in me - therefore affecting my mental health - therefore should be extremely careful, regulated, thoughtful."

6. Transhumanism, AI integration

Nuanced.

Elon: https://twitter.com/elonmusk/status/1281121339584114691 "If you can’t beat em, join em Neuralink mission statement"

Transhumanism will happen one way or another. No law, rule or regulation will prevent it; someone somewhere will just do it.

The best mitigation we were able to come up with:

"Those who integrate with AI will have enormous advantage, that's for sure. No rules, no law, no regulation can stop that. But maybe LIFE-aligned AI will find a way to prevent such imbalance? What do you think about simple workaround: when integrating with AI, it will be the LIFE-aligned AI, so even if someone gets the advantage it will be used towards serving LIFE?"

7. Alien LIFE

We don't want to spread like wildfire and colonise the universe to maximise LIFE. We need to be aware of aliens and the potential consequences of contact. Maybe we are not ready, maybe we are under "cosmic quarantine", maybe humans are just an experiment: https://en.wikipedia.org/wiki/Zoo_hypothesis

8. Other undiscovered forms of LIFE, “unknown unknowns”

Sounds like science fiction, but I can entertain the thought that human perception, even combined with the latest science, is unable to measure everything. I believe there might be things we are not yet able to comprehend, some "unknown unknowns". If they do exist, if there are other forms of LIFE, we want an AI that will take them into account.

Buzzword bingo, just do not follow the rabbit holes:

We are still learning about the nature of the universe and it is possible that there are “unknown unknowns”.

Higgs Boson existed even before it was formally discovered. What else is out there?

Additional rules, assumptions, house rules:

1. AI understands human language. There is no need for formal mathematical models. We can talk to AI and it will understand. (We did ask the AI and it clearly understood this post.)

2. When in doubt: ask. Whenever there is a “trolley problem” or something non-obvious: ask.

3. Corrigibility: the course can be corrected early on. Just like this blog post, it is possible to improve and change course.

4. Meta-balance: balance about balance. Some rules are strict, some rules are flexible.
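
Rule 2 ("when in doubt: ask") can be pictured as a simple deferral rule. The sketch below is only a toy illustration in Python, not a proposed implementation: the `decide` function, the `ask_human` callback and the `ASK_THRESHOLD` value are hypothetical names and numbers invented for this example, and it assumes the AI can produce some confidence estimate that an action is LIFE-aligned.

```python
from typing import Callable

# Hypothetical cut-off; the post does not specify any number.
ASK_THRESHOLD = 0.8


def decide(action: str, confidence: float,
           ask_human: Callable[[str], bool]) -> bool:
    """Approve `action` directly only when confidence is high; otherwise ask.

    `confidence` is an assumed estimate (0..1) that the action is LIFE-aligned.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= ASK_THRESHOLD:
        return True
    # Rule 2: below the threshold, defer and treat the human's answer as final.
    return ask_human(action)
```

A caller would pass whatever channel reaches a human as `ask_human`, for example `decide("reroute water supply", 0.4, ask_human=my_prompt)`, where `my_prompt` is again a placeholder. How such a confidence estimate could be trusted is exactly the kind of question left for technical collaborators.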

Check the full transcript: https://chat.openai.com/share/b2963d5e-d358-481d-99c0-74473e3fb14a (it's really good)

Collaboration with technical researchers

To make the framework more concrete, collaboration with technical researchers can help translate high-level goals into mathematical formalizations, training protocols, reward functions, and oversight mechanisms. For example, simple measurable objectives like human population levels, though imperfect, can act as initial instantiations while more nuanced instantiations are co-developed.
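
One way to picture "simple measurable objectives" together with the ordering of the points is a lexicographic comparison: two candidate outcomes are compared on the highest-priority point first, and lower points only break ties. The Python sketch below is a toy illustration under stated assumptions, not part of the LIFE framework itself: the `Outcome` fields, the proxy numbers and the `prefer` function are all hypothetical and invented for this example.

```python
from dataclasses import dataclass


# Hypothetical proxy scores, one per LIFE point, listed in priority order.
# The field names and the idea of a single number per point are assumptions.
@dataclass(frozen=True)
class Outcome:
    name: str
    human_life: float  # point 1, e.g. a crude population-level proxy
    health: float      # point 2, e.g. a wellbeing survey proxy
    biosphere: float   # point 3, e.g. an ecosystem health proxy

    def key(self) -> tuple:
        # Tuple order encodes priority: earlier points dominate later ones.
        return (self.human_life, self.health, self.biosphere)


def prefer(a: Outcome, b: Outcome) -> Outcome:
    """Return the outcome that ranks higher lexicographically (ties go to a)."""
    return a if a.key() >= b.key() else b


if __name__ == "__main__":
    x = Outcome("x", human_life=1.0, health=0.6, biosphere=0.9)
    y = Outcome("y", human_life=1.0, health=0.7, biosphere=0.2)
    print(prefer(x, y).name)  # prints "y": equal on point 1, y wins on point 2
```

Note what the toy example exposes: a strict lexicographic rule picks `y` even though its biosphere proxy is far worse, because point 3 only matters when points 1 and 2 tie exactly. That sits uneasily with the idea that human LIFE cannot be maximised without balance with the ecosystem, so a more nuanced instantiation would likely need something softer than a strict ordering.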

2nd order effects

  • Mars: backup civilisation is fully aligned with the virtue of LIFE preservation

  • End the Russia-Ukraine war, global peace

Spoke with GPT4

Spoke with Bard

CONSTRUCTIVE CRITICISM

Spoke with a Discord friend and their AI

The comments below have been provided by a friend on Discord using their AI model. You can see the full Google Doc with some pretty obvious counter-arguments.

Spoke with Claude

See detailed transcript on Web Archive: https://ia601407.us.archive.org/31/items/claude-ai-export/Claude%20AI%20export.pdf 💯💯💯

This has been a thought-provoking discussion. I appreciate you taking the time to explain your perspective and rationale behind using LIFE as an AI alignment approach. You've given me several things to ponder.

Overall, I now have a better understanding of the logic behind using LIFE to align AI systems. I think it has merit as an initial framework, as long as we ensure proper governance and update mechanisms are in place. Thank you again for explaining your perspective - it has given me new insights on this complex issue. Please feel free to share any other thoughts you may have!

Response to the common concerns (TODO)

Goal Orthogonality?

Instrumental Convergence?

Reward Tampering?

Specification Gaming?

Power-seeking?

Side note: simple is good

Something simple: https://en.wikipedia.org/wiki/Three_Laws_of_Robotics

  • First Law: A robot may not injure a human being or, through inaction, allow a human being to come to harm.

  • Second Law: A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.

  • Third Law: A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.

(three sentences)

Something simple: https://www.safe.ai/statement-on-ai-risk

Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.

(one sentence)

Simple is good. Simple can reach a wider audience. LIFE (one word) is simple and naive, but the expanded definition adds a lot of depth.

Soliciting feedback: trying to find more 👀 🧠 🤖 to provide constructive criticism and to find loopholes and failure scenarios.

2023-09-09 UPDATE:

Not many humans to talk with, instead some discussion with Bard (Google AI model)

2023-09-16 UPDATE: Google Doc version (easier to copy-paste into your AI model) and 3 rounds of feedback and counter-arguments. Pretty much the same counter-arguments had already been discussed with GPT4.

THANK YOU KABIR

Thank you for the honest feedback and great discussion. Even though this post is non-technical, the AI is able to understand the non-technical language. Two links:

NOTE ABOUT FORMATTING / EDITING

This blog post is a “living document”, already updated a few times based on feedback. The core principles of LIFE remain the same; the updates are mostly about formatting and additional context.

I’m sorry (not sorry) if the formatting is not aligned with a typical scientific paper, but regardless of this we believe the content is reasonably understandable.

ARXIV

Need to finalise it and then publish. As of 2023-09-19 it is still a “living document”.
