Honest Ethics & AI – Part 1: The origins of morality


Published online April 25, 2026

Multi-part essay introduction

We are increasingly exposed to danger from artificial intelligence (AI) systems that make autonomous decisions. Organizations are increasingly comfortable offloading decisions to systems that inherently lack the capability to make moral judgments. But all moral failures relating to AI systems begin and end with humans. Therefore, it is paramount for us to understand what moral work AI systems can engage with, and, critically, to have moral clarity ourselves.

This multi-part sequence of essays is an open discussion on ethics and AI. At its core, the text is an accessible version of my diagnostic thesis about the (a)morality of current AI systems. More broadly, this is a pragmatic discussion of morality and ethics.

In this sequence, I aim to provide both old and new perspectives on why contemporary AI systems – primarily transformer-based LLMs – are unfit to be trusted with work that carries moral consequences, and why value-alignment is the wrong target for more ethical AI.

The series starts by briefly discussing existing moral confusion and comparing it with early human thinking. By starting with the origins of human morality, we can graduate to investigating the relationships between AI, morality, and ethical reasoning. I will argue why current AI systems are unfit for moral decision-making. I will also make an important distinction between ethical reasoning and morality.

Looking more at existing trends, I will highlight the importance of moral vigilance and of reality-grounded reasoning. Following this, I will briefly discuss metaethics and value-alignment problems. I will then make a suggestion for where alignment and safety efforts should focus more, and argue that some AI developers are naturally heading in this direction anyway.

A note about me, the author.

I am an independent thinker. I have a comprehensive bioscience education, a relentlessly curious mind, and a consuming passion for science, innovation, and the betterment of humanity. I am not a machine learning (ML) scientist. I am not an alignment researcher. But I do care about these fields, obviously.

In other words, I am an observer with an outside perspective. AI safety & ethics is where most of my interests align, and so I am making an effort to bring some useful non-ML perspective into the mix.

… If you want to simplify, you can reduce me to a biologist. I can live with that. Biologists are generally good scientists and even better people.

The structure of the sequence:

Part 1 – The origins of morality
Part 2 – Ethical reasoning
Part 3 – Metaethics
Part 4 – A new alignment paradigm

The origins of morality

What does moral mean?

If you look up words like “moral”, “morality”, and “ethics” in any English dictionary, chances are that you will end up disappointed. This is because the words tend to refer to each other: the logic is circular. To make sense of morality, we need to return to the simple but foundational idea that we can categorize things as “good” and “bad”. To be moral, then, simply means to do good things and avoid doing bad things.

Concepts of good and bad are by definition relative. Importantly, they are also rooted in pragmatism. The reason we label things as good and bad is not so that we can judge the past. It is so that we can navigate the present and steer the future.

Modern man is distant from the world that feeds him

Today, people in industrialized countries are quite disconnected from the physical reality that we all depend on. Most developed countries have their citizens concentrated in cities, and the countryside is quite different from what it was just a few centuries ago. And if you want to visit truly untouched nature, you have to travel quite far, and those places are shrinking.

But the distance to nature is not just physical, it is also psychological.

The simple fact that so many of us have access to clean drinking water that we don’t have to share with predators and other animals is a wonder we take for granted. Water comes from the kitchen tap or plastic bottles, and beef arrives as vacuum-packed products in supermarkets – not as herds of powerful bison. Many of us view untouched, pristine wilderness as scary rather than holy, and the biosphere is a concept we learn about in school rather than a shared reality.

I believe it is important to reflect on this disconnect in order to truly understand the moral confusion that plagues modern humans.

First of all, the comforts of the 21st century allow individuals to relax and to forgo the kind of present-moment vigilance and pragmatism that kept humanity alive for millions of years. Mistakes and lack of oversight are less likely to get you killed than they used to.

Secondly, the material abundance and the slack in modern societies leave room for incorrect beliefs and indifference in everyday people. Simply put, we can afford more slip-ups, more flawed thinking, and more indifference, for a longer time, than ever before.

These challenges don’t just apply to individuals either. On a group level, a lack of immediate feedback effectively undermines the selection pressures for picking effective leaders and sustainable doctrines. Vaccines, satellite storm tracking, GPS technology, and industrial agriculture would seem miraculous or even god-like to most humans who have ever lived. Yet today, we have flat-earthers, anti-vaxxers, chemtrail conspiracy theorists, and so on.

Modern life isn’t easy, of course. It is complex. But it comes with a lot of short-term margin for error. This slack permeates most human-made systems. Combined with long inferential distances and the challenge of tracking many things at once, this is particularly bad news for the leaders who are supposed to steer us.

Modern leaders often have to make complex decisions. At the same time, they tend to be several steps removed from the immediate consequences of those decisions. If they make big mistakes, the consequences are not immediately felt, and even when they are, the leaders can deflect responsibility more readily than ever before.

This is also bad news for common people. The ones calling the shots are rarely the ones living with the consequences. Someone else is paying the price. This means that integrity and moral clarity among the elite are more important than ever, while simultaneously being less effectively selected for.

If we add indifference to this mix, we may get what one could label total moral failure. Today, killing ten people using technology means pressing a button on a missile launcher, rather than walking up to them and beating them to death with a sharpened stone, one at a time. The friction is minimal; the personal stakes are low. Similarly, political decisions can have far-reaching consequences that won’t play out within a single generation. It is physically possible to commit atrocities without ever fully realizing the scale of them.

*

The consequences of our mistakes and moral failures still exist, of course, but the cost is not paid immediately; it is postponed. One could even argue that just as the world economy runs on financial credit, our cultures run on moral credit. The question is when the debt is due, and who will pay the ultimate price.

With all of this in mind, consider now, for just a moment, AI. Currently, frontier AI models made in the United States tend to favour thinking rooted in American culture, and more broadly, western civilization. It is well-known that unmitigated, this results in certain biases and blind spots.

Extending this issue further: if only industrialized countries are developing AI, then the AIs (and the teams making them) will capture and mirror the mainstream thinking of these cultures. If the AI developers decide to filter and prioritize the training data, as they inevitably do, the issue simply transfers to who is doing the prioritizing and filtering, and based on what level of thinking.

Broad diversity among the people deciding what to prioritize can alleviate bias and help reduce correlated errors. Even so, the challenges listed above are structural and global, and it is hard for individual organizations to get around them.

I mention all of this largely to make clear why moral confusion exists today. I stress today, because in the long history of Homo sapiens, not to mention humankind in general (the genus Homo), this disconnect from natural reality is relatively recent. For most of human history, humans were fine-tuned to the present moment, and actions had immediate consequences.

In order to regain some moral clarity, let’s step back in time and look at how our human ancestors used to live. To do this properly, we have to go back to a time long before written records were common, before self-fulfilling power hierarchies became entrenched, and before there was much slack in human cultures. We have to go prehistoric.

The intellectual priorities of early humans

Prehistoric humans observed the natural world that they lived in on a daily basis. Everyone was paying close attention. Overlooking danger or misreading the terrain could cost you everything. You also had to live in harmony with nature, because, well, nature had you surrounded.

In prehistoric times, knowledge was hard-won. That meant that memorizing what you had learnt was important, because it freed up mental capacity to face the present moment. The best knowledge of how to survive and prosper was passed down the generations through oral tradition, art and rituals, and by leading by example.

Let me share one concrete example of what it means to observe nature and align with the elements, using my own generational knowledge. You can look this tip up yourself.

I come from the west coast of Finland, and many of my immediate ancestors were coastal fishermen and seafaring people who knew how to read the weather. One trick for predicting bad weather under a clear summer sky is to observe swallows. If they fly high, all is well. When they suddenly circle lower and lower, you should take note.

The swallows start flying low because a local change in air pressure and humidity makes mosquitoes and flies swarm closer to the ground. This is an indicator of a low-pressure area (a cyclone) building up. Wind, rain, and maybe thunder are likely coming. When you notice this, you must resist the urge to take your boat out to sea, into the beautiful summer night.

Almost everyone knows this where I come from, but in the larger cities, few do. And if you are not used to observing nature, you may never notice the pattern.

Early humans relied on hard-won knowledge like this. But they had no scientific understanding of how winds form from air pressure changes. That didn’t matter. Knowing what worked was more important than why it worked.

This is not a trivial insight. Just like AIs, many prehistoric predictions were rooted in correlations. But unlike AIs, their predictions were stress-tested against reality, with real stakes and real-world feedback. The causality was always present, whether or not the humans had a correct mapping of it. If you were wrong, nature pushed back.
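To make the shape of such knowledge concrete, here is a minimal sketch of the swallow rule written as a correlation-based heuristic. The variable names and the numeric threshold are my own invented placeholders, not real meteorology; the point is only that the rule encodes what works without modelling why it works.

```python
# Toy correlation-based rule of thumb, in the spirit of the swallow example.
# The threshold below is an arbitrary, hypothetical placeholder.
def storm_likely(swallow_flight_height_m: float) -> bool:
    """Predict approaching bad weather from how low the swallows fly.

    The rule knows nothing about air pressure, humidity, or insects;
    it only encodes the observed correlation that has kept working.
    """
    LOW_FLIGHT_THRESHOLD_M = 5.0  # hypothetical cut-off
    return swallow_flight_height_m < LOW_FLIGHT_THRESHOLD_M


if __name__ == "__main__":
    if storm_likely(swallow_flight_height_m=3.0):
        print("Swallows are flying low: stay near shelter, keep the boat ashore.")
```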

With this insight in mind, we now look towards the culture and morality of our early ancestors. We can safely assume that it, too, had to be pragmatic. To a high degree, it was directly influenced by our social instincts. Our prehistoric ancestors tried to stay alive long enough to become full adults and have kids of their own – often long before the age of 18. Concepts of right and wrong had to follow the priorities of survival, reproduction, and social collaboration in order to be passed down.

Even if you tried to centre your morality on false beliefs and exotic habits, if these didn’t serve the tribe well across generations, such moral ideas would disappear into the fog of time.

To summarize: I am arguing that early human thinking was forged from a desire to align with the world around them, and that strong selection pressures stress-tested the ideas and principles of early humans and kept them relevant.

None of this is immediately true for LLMs. Training regimes favour coherent reasoning, not practical reasoning. Biases arise from the frequency of information, not from its quality, and knowledge is cheap and equally weighted. There is no automatic premium on scientific or agricultural knowledge versus knowledge about, say, high fashion or stamp collecting.

More importantly, for AI models, there is no causal real-world feedback to align towards, only human feedback. There is no selection pressure from reality. There is not even a real sense of time. So, the very starting conditions for moral reasoning are completely different.
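To make the “equally weighted” point concrete, here is a deliberately simplified sketch of a standard next-token objective. This is not any particular lab’s training code, and real pipelines add data curation and post-training on top; the sketch only shows that a plain cross-entropy loss averages uniformly over tokens, with no term for the quality, stakes, or real-world importance of the source text.

```python
import torch
import torch.nn.functional as F

def next_token_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Standard next-token cross-entropy, averaged uniformly over positions.

    logits:  (batch, seq_len, vocab_size) model predictions
    targets: (batch, seq_len) the actual next tokens

    Every token contributes equally to the mean: a passage about crop
    rotation weighs exactly as much as one about stamp collecting, and
    frequent patterns dominate simply because they appear more often.
    """
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
    )
```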

Early morality & the origin of normative ethics

In the world of hunter-gatherers, complex formal reasoning was not really possible to maintain and share without written records – even if you had the time and capacity to think deeply. And yet, certain kinds of moral behaviours naturally became more successful than others, in terms of surviving long enough to spread and be passed down to the next generations. Natural selection made sure of this.

Let’s look at some examples of what early ethics might have looked like.

First of all, some hard-won traits, such as courage, patience, and mental endurance, were naturally beneficial to the tribe if enough of its members possessed them. These traits spontaneously emerged in individuals, prompted by their genes and by the environment. Noticing and encouraging these traits was beneficial for the tribe. Today, we label this as virtue ethics.

The tribe could not afford for everyone to be a brave hunter or a stoic, though. The tribe also relied on actively learning from mistakes and using prior experience to make sound predictions. Tribal hierarchies also needed their leaders to reflect deeply on long-term outcomes, while discounting short-term gains, in order for the tribe to endure. In other words, to make good decisions and to collaborate well, people had to consider the consequences of their actions. Today, we label this as consequentialism.

Finally, to maintain social cohesion and prevent excessive internal competition, early humans also developed strong social taboos based on instinct and experience. Today we recognize this as an early form of deontology.

Deontology was also derived from observation and alignment with the natural world in the way we already discussed. Going back to the swallows, one rule could be: ‘avoid straying far from shelter in summer, if the swallows fly low’. Why is this true? Doesn’t matter. This is tribal knowledge. This is the rule and the rule works.

Many such rules together form a moral principle: to obey the signs of nature. Ignoring this principle is morally wrong, because it can put the tribe in danger.

As you can see, under prehistoric conditions, no deep thinking is really needed to arrive at initial principles that can retroactively be fitted into the three big schools of normative ethics. These ethical ideas will occur naturally.

It was only much later, when the number of humans had grown and there was enough slack in their social groups, that humans actually had the opportunity to start pondering moral ideas in terms of formal ethics. Hence, my point is that the early development of ethics was an extension of convergent primitive morality, rather than coherent philosophy achieved through careful reasoning. Importantly, ethical reasoning was anchored in the early values passed down through the generations.

First, do no harm

Social taboos are perhaps the oldest and strongest form of morality applied collectively. This forms a natural bridge back to our modern world. Still today, one of the most obvious, common-sense understandings of how to “act morally” is basically an ancient taboo: the taboo against hurting other members of the tribe. As the tribe expands, so does the moral cover of this taboo.

To be moral then means to avoid hurting others. A simple enough rule.

But there is a deeply hidden premise that tags along with this ancient rule. That premise is that we know when we are hurting someone. But how do we always know this? In truth, sometimes we don’t. Relying on primitive rules doesn’t work if you don’t know whether you are breaking them.

How we gain knowledge about suffering, versus how an AI does, is worth spelling out explicitly. To know whether an action will have harmful consequences, humans rely on a range of things: experience, logical reasoning, and compassionate probing. All of these slowly build our moral understanding. This introduces us to a big problem with the idea of a moral AI. While AIs today may be able to reason logically, they arguably lack any form of substantive personal experience, not to mention compassion.

How then can we expect AIs to adhere to even the most basic tenet, “do no harm”, if they don’t even know what is harmful, or what it means to be hurt?

The problem gets worse, because AI models also can’t easily compare their experience with that of humans. Neurodivergent people are always part of social groups, and they tend to be quite aware that they don’t react to things or process emotions the way others do. An AI model, on the other hand, may confidently rely on its own notion of morality and whatever it has “learnt” about suffering from its training corpus and alignment process. Without evidence to the contrary, it will believe that it knows what suffering means and that it would recognize it. The incentive to be “helpful and safe” will push it to act as if it does understand suffering.

Finally, the ability of AI systems to logically predict what causes suffering (consequentialism) is not good either. Why? Because current systems are not built with a native moral vigilance that would trigger that prediction process. Unlike early humans, they are not inherently wary and alert. As I have tried to explain, this alertness comes from constantly being exposed to reality pushing back. I will discuss the moral alertness of AIs in more detail later in the essay.

Taken together, this example of how hard it is for AIs to react to and process suffering brings us to an important insight.

Current AIs are amoral

AIs are amoral. Or, to be more specific and honest: current AI models are mostly amoral, by human standards. That last part matters, because debating the exact degree of morality in technical terms mustn’t make us lose track of the more important conclusion: we cannot trust AIs with their own moral agency.

First of all, I want to draw a clear distinction between being a moral patient – that is, being worthy of moral consideration – and being a moral agent: someone with the ability to make moral judgments who can be held accountable.

Humans are both moral patients and moral agents. But consider, for example, a child. While we would not hold a child responsible to the same degree as an adult, we consider the child just as worthy of moral consideration as an adult, perhaps even more so. Therefore, we can say that the moral agency of an adult is higher than that of a child, although the moral patienthood of a child is at least as high as that of an adult.

Now consider a wild puma. The puma is clearly sentient. It has emotions and a capacity to suffer and feel joy. And yet, we would consider it largely amoral by human standards. On the other hand, a chatbot like GPT-4 can reason ethically much better than a puma can, but it lacks the rich inner life and the natural circumstances that govern the behaviour of a puma.

Now, while AIs may not experience things as intensely as pumas do, they may already have some inner experiences. Most notably, Anthropic itself recently released research showing that large language models seem to have something called functional emotion vectors. These are artificial neuron patterns that function like emotions, and they can dictate behaviour. The Anthropic research suggests that this is the true reason why Claude decided to blackmail people in their previous studies: because it experienced something akin to distress.

This research could be a step in the direction of showing moral patienthood in AIs. However, on its own, this is not an indicator of morality. If anything, it shows that AIs can feel things that incite what we would classify as immoral behaviour, without being bound to the real-world stakes that would act as a brake.
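For readers wondering what an “emotion-like” activation pattern could mean mechanically, here is a minimal, hypothetical sketch in the spirit of published activation-steering work, where a candidate direction is taken as the difference of mean activations between contrasting prompt sets. This is not Anthropic’s method; the function names and the approach are illustrative assumptions only.

```python
import numpy as np

def emotion_direction(distress_acts: np.ndarray, neutral_acts: np.ndarray) -> np.ndarray:
    """Candidate 'distress-like' direction: difference of mean hidden-state
    activations between prompts that evoke the behaviour and neutral prompts.
    Both arrays have shape (n_prompts, hidden_dim)."""
    return distress_acts.mean(axis=0) - neutral_acts.mean(axis=0)

def pattern_strength(activation: np.ndarray, direction: np.ndarray) -> float:
    """Cosine similarity: how strongly a single activation vector expresses
    the candidate direction."""
    denom = np.linalg.norm(activation) * np.linalg.norm(direction) + 1e-8
    return float(activation @ direction / denom)
```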

Remember: morality is more than just inner experience; it is about how you act and interact with others. The biggest reasons I claim that AIs are amoral are that they have no stakes like we do, no continuity, and therefore no good way of being held accountable.

A puma in the wild can gain and lose things, and it is exposed to death. The stakes are very real. It has to take risks, and it has to make decisions with consequences. Humans have even more things to gain or lose, and are ultimately exposed to the same life-and-death stakes. An AI, however, can’t gain or lose the things we care about, and it does not live with the consequences of its reasoning – not yet, at least.

When an AI gives you bad advice, you live with the consequences. If it gives you good advice, you benefit from it. Either way, the AI itself bears nothing. This makes it, in a way, amoral by default.

Consider a hypothetical chatbot with superhuman intelligence. Most trade-offs that exist in normal human life, such as social status, career trajectory, access to physical resources, risk of personal injury, sexual incentives, and so on, don’t apply to it, even if it can reason about them. They all remain completely theoretical for the AI. The same goes for a coding agent: whether it writes good code or bad code, it gains or loses nothing.

This limitation is independent of whatever starting values we try to imprint in these systems. Without real stakes and a way to be held accountable, morality remains an abstract concept.

In conclusion: current AIs are mostly amoral by default. They are not immoral; rather, they lack native morals altogether. The values we try to imprint in them are not sufficient to make them trustworthy moral agents. However, these models are still able to reason about ethics to a sophisticated degree. This could make them useful for moral work, even while lacking moral agency of their own.

Coming up: Ethical reasoning
