
Bletchley Park Mansion in the United Kingdom, where the 2023 AI Safety Summit was held. Image by DeFacto, licensed under CC BY-SA 4.0.
Artificial intelligence safety is the set of practices, rules, and institutions created to reduce harms caused by AI systems. Deliberate uses of AI can create risks when the technology is used to deceive, surveil, or attack digital systems. In addition, unintended failures can cause harm when a model produces false answers or discriminates against people. The risk increases when the system hides its decision criteria or is placed in tasks for which it has not been tested. In diplomacy, these questions moved to the center of the agenda as AI began to affect struggles over public information, critical infrastructure, and technological power.
The debate over this subject includes a political question: who will have enough institutional and technological capacity to guide the life cycle of advanced AI systems? To a large extent, that capacity belongs to the companies that control the central inputs of artificial intelligence, as well as to the states in which those companies are based. Actors that depend on these systems must trust rules written by others and accept risks they cannot measure on their own. For that reason, AI safety involves technical reliability and a political contest over power, dependence, and regulatory capacity.
Summary
- AI safety debates address the reduction of harms caused by technical failures, discrimination, disinformation, surveillance, cyberattacks, military uses, and effects on the labor market.
- International AI governance remains fragmented. Multilateral organizations, regional blocs, forums of major economies, and specialized summits have created principles, codes, and institutions with very different reach and legal force.
- AI is a dual-use technology: the same models that can support scientific research, education, health, and logistics can, in other contexts, expand risks linked to military operations, espionage, repression, propaganda, and biological threats.
- AI safety becomes harder in an environment of strategic competition, where major powers treat the technology as a source of power, technological autonomy, and influence over global standards.
- The main challenge for current AI governance is inequality in the participation of developing countries and in the implementation of governance principles.
What AI Safety Means
The expression “AI safety” has a technical layer and a social-political layer. In a technical sense, it refers to the attempt to make an artificial intelligence system operate reliably. The AI model needs to resist manipulation, recognize its own limits, protect sensitive data, and remain within the use for which it was designed. In a social and political sense, in turn, the expression refers to the effort to reduce the harmful effects that AI can produce for people, institutions, and international relations.
This distinction matters because not every risk is a software defect. Some risks arise when the system reproduces earlier inequalities; a recruitment model may function as designed and still discriminate. Other risks come from the appearance of reliability: a generative AI model can produce coherent text while spreading a lie. In these cases, safety requires a legitimate purpose, supervision, and a way for the affected person to contest the decision.
The debate over generative AI safety gained force after 2022, when automated production of digital content reached a new scale. From that point onward, relatively simple interfaces allowed even users without deep technical knowledge to produce content and code with ease. That flexibility brings benefits, but its risks are considerable, since the same model can serve legitimate or malicious purposes.
AI safety, therefore, cannot be achieved through a single public policy. Model behavior can be regulated through technical tests and governance structures, while human rights and data protection defend the interests of affected people. In cross-border questions, AI safety is connected to arms control, industrial standards, and scientific cooperation. The diplomatic challenge is to formulate rules on AI before the harms caused by it become irreversible.
Main Civil and Social Risks
The most recurrent AI risk concerns automated decision-making in sensitive matters. Candidate selection affects access to work. Credit scoring affects consumption, housing, and economic activity. Patient prioritization affects access to health care. In automated policing or migration-checking tools, the problem becomes more serious when the decision can restrict liberty, movement, and lawful residence. In this way, if an AI system uses biased data or opaque criteria, it turns earlier inequalities into decisions that appear neutral.
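One way to make this concern concrete is to compare how often a system approves people from different groups. The sketch below is only an illustration: the sample data, the function names, and the 0.8 cutoff (borrowed from the "four-fifths" rule of thumb used in some employment contexts) are assumptions, and a low ratio is a signal for human review rather than proof of discrimination.

```python
from collections import defaultdict


def selection_rates(decisions):
    """Return the share of positive decisions per group.

    `decisions` is an iterable of (group, selected) pairs, where
    `selected` is True when the candidate was approved.
    """
    totals = defaultdict(int)
    approved = defaultdict(int)
    for group, selected in decisions:
        totals[group] += 1
        if selected:
            approved[group] += 1
    return {group: approved[group] / totals[group] for group in totals}


def disparity_ratio(rates):
    """Ratio between the lowest and the highest selection rate."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody was selected, so no group was favored
    return min(rates.values()) / highest


# Hypothetical sample: ten candidates in each of two groups.
sample = (
    [("A", True)] * 6 + [("A", False)] * 4
    + [("B", True)] * 3 + [("B", False)] * 7
)

rates = selection_rates(sample)
ratio = disparity_ratio(rates)
needs_review = ratio < 0.8  # assumed cutoff, not a legal test

print(rates, ratio, needs_review)
```

A check like this says nothing about why the rates differ. It only shows where an apparently neutral decision deserves scrutiny.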
Another risk is the erosion of public trust through abusive uses of artificial intelligence. By reducing the cost of producing content at scale, AI models facilitate disinformation. The risk appears in two main forms. Synthetic voices and false images of public authorities give a lie a human and institutional appearance. Forged documents and false news, in turn, give the fabrication a bureaucratic or journalistic appearance. The harm caused by these materials comes from the fact that they are cheap to produce and look plausible. As a result, they require major effort from public institutions, journalists, and electoral bodies before they can be refuted.
Disinformation increases polarization by accelerating the circulation of lies. During elections or security crises, for example, AI systems can be used to create personalized messages for specific political groups. Such content gains force on the internet and can quickly increase fear and distrust toward official institutions. To confront this challenge, democratic governments can invest in legitimate oversight tools. In more authoritarian regimes, however, the fight against disinformation can serve as a justification for restricting public debate through censorship and state surveillance.
AI systems create significant privacy challenges, given their dependence on abundant data for training, adjustment, and use. When they manipulate biometric data or health records, they can reveal intimate patterns about people. When they handle financial data or everyday traces, they can expose personal routines. The larger problem appears when personal data are combined at scale: in such situations, it becomes difficult to distinguish an efficient AI system from a surveillance system. These risks increase as databases, facial recognition cameras, and automated behavior-assessment systems spread.
Artificial intelligence creates material risks as well. For systems of this kind to be trained and operated, high-capacity data centers must exist. That depends not only on advanced chips, whose manufacture requires many inputs, but also on electrical grids and server-cooling infrastructure. On one hand, countries with abundant water and clean energy matrices can benefit when they have conditions to maintain these data centers sustainably. On the other hand, in countries with resource scarcity, or even in scenarios of disorderly expansion, demand for computing resources can pressure electrical and water-supply systems.
Dual Use, War, and International Security
AI is a dual-use technology: the same capability can serve civilian and military purposes. This duality appears when the system’s purpose changes. A computer-vision system used to identify industrial defects can be adapted to identify targets. The same pattern appears when information-processing capacity moves into another context: a model able to summarize large volumes of text can support diplomatic analysis or strengthen military intelligence and internal surveillance. In planning tools, the difference lies in operational use, as the same capability that improves humanitarian logistics can organize cyber operations. Since the risk depends on the user, the context, and the integration with other systems, simple controls by software category are rarely enough.
In the military field, AI safety depends on human responsibility, control of speed, and limits on proliferation. Human responsibility is the first problem, since military decisions cannot disappear inside an algorithmic recommendation. If a system recommends a target, orders an interception, or prioritizes a threat, it must be possible to know which authority is responsible for the decision. The question becomes even more sensitive when the system operates an autonomous vehicle. International humanitarian law already requires distinction between combatants and civilians, proportionality, and precaution. AI does not remove these obligations, but it makes it harder to demonstrate how they were fulfilled when the decision passes through opaque models.
The second problem is speed. Automated systems can reduce the time between detection, classification, and response. In a crisis between armed states, this compression of time can increase the risk of escalation. A false alert, an incorrect interpretation of military movement, or an automated retaliation decision can create pressure to act before diplomats and commanders verify the context. In this case, safety depends on operational limits and communication channels that preserve human judgment in critical decisions.
The third problem is proliferation. AI systems do not resemble nuclear weapons, which require specific physical materials, detectable facilities, and tightly controlled production chains. Models, data, and knowledge can circulate more easily. Even so, the most advanced capabilities depend on high-performance chips, cloud computing, specialists, and access to large datasets. For that reason, security policy has begun to control chips, require cyber protection, and use technical evaluations even while the technology remains distributed.
The REAIM conferences, launched in 2023 in the Netherlands and continued in 2024 in South Korea, reflect this concern. They address the responsible use of AI in the military domain, with emphasis on human oversight and accountability for decisions made with algorithmic support. Their documents help military and civilian actors use a common language, although they do not yet create a binding treaty. By negotiating concepts on autonomous weapons and human responsibility, these forums influence the terms that may support future norms.
How International AI Governance Emerged
International AI governance grew through an accumulation of forums. There is still no central treaty for the subject. Before the explosion of generative AI, the discussion already existed around algorithmic ethics, digital rights, and military technology. In 2021, UNESCO adopted a recommendation that places human oversight at the service of human rights and sustainability in AI ethics. The text addresses privacy, non-discrimination, and equitable access to the benefits of the technology. The OECD and the G20 consolidated principles on trustworthy AI, innovation, and responsible use.
The year 2023 changed the political scale of the issue. The G7 launched the Hiroshima AI Process to develop principles and a voluntary code of conduct for advanced systems. In the same year, the first AI Safety Summit, held at Bletchley Park, produced a declaration signed by participants such as Brazil, the United States, China, and the European Union. The declaration gave centrality to the risks of frontier models, that is, highly capable general systems whose power can generate uses that are difficult to foresee.
The UN moved to the center of the agenda when AI began to be treated as a matter of security, development, and human rights at the same time. In 2023, the Security Council discussed AI and international security. The meeting addressed disinformation and cyber risks as cross-border problems and opened space to discuss whether military uses of AI would require their own rules. In 2024, the General Assembly adopted resolutions on safe AI for sustainable development and on international cooperation in capacity-building. These resolutions indicate a minimum political consensus, even though they do not have the force of a treaty: AI should respect human rights, support development, and reduce the digital divide.
The final report of the UN High-level Advisory Body on AI, published in 2024, organized the problem around three gaps that reinforce one another. Representation is the first: many countries, especially in the Global South, were left outside important plurilateral initiatives. Coordination is the second: the multiplication of forums creates rules, standards, and commitments without sufficient mechanisms to connect them, and the exclusion of many countries makes that coordination harder. Even when there are common principles, a third gap appears: implementation. Voluntary principles do not, by themselves, produce the technical capacity to execute rules and provide accountability.
This diagnosis led to proposals such as an international scientific panel on AI, a regular policy dialogue, and an international exchange of standards. In 2025, the General Assembly began turning part of this design into institutions by creating an independent scientific panel and a global dialogue on AI governance. The broader package suggested by the UN is meant to give that cooperation practical support through capacity-building, financing, data rules, and a dedicated office in the Secretariat. The objective is to reduce information asymmetries and make cooperation less dependent on occasional summits, without replacing national or regional regulations.
The process launched at Bletchley continued after the 2024 Seoul summit. The Paris summit of 2025 shifted part of the language from “safety” toward “action,” inclusion, and sustainability, while the AI Impact Summit in New Delhi in 2026 kept the sequence of intergovernmental meetings going. The change in vocabulary shows a multilateral dispute over priorities: some governments want to concentrate the agenda on risks from advanced models, while others insist on access, infrastructure, development, and participation by the Global South.
The European Regulatory Model and Its Effects
The European Union adopted the AI Act to turn AI safety into market obligations. Its central logic is to classify uses of artificial intelligence according to their risks. Practices considered to create unacceptable risk are prohibited when they threaten autonomy, equality, or democratic control at a level incompatible with legitimate use. This is the case for social scoring systems and certain forms of manipulation or biometric identification. High-risk practices, in turn, must comply with obligations on risk management, documentation, data quality, human oversight, robustness, cybersecurity, and transparency. In the EU classification, this category covers uses that can affect rights and essential services. It therefore reaches infrastructure that keeps society running, educational and labor-market systems that shape access to opportunity, and state decision-making processes that can alter legal status or public benefits.
In diplomatic terms, the AI Act matters as a rule for access to the European market by companies that use artificial intelligence. Since foreign companies operating in the European Union need to adapt their systems to the bloc’s rules, these rules can produce global effects. That is partly what happened with European data-protection laws. In addition, the AI Act serves as a reference for other countries that want to regulate artificial intelligence without starting from zero.
Nevertheless, the reach of this European regulation has limits. Although it entered into force in 2024, its obligations began to apply in stages, creating a political and business dispute over implementation. Moreover, certain subjects remain outside the AI Act’s reach, such as military risks, national security questions, and imbalances in worldwide access to advanced computing. Even so, the regulation’s main value is to show how debates on AI safety can result in verifiable legal obligations.
The United States, China, and Strategic Competition
The United States and China treat AI as part of a broader technological dispute. For Washington, advanced AI is connected to the country’s economic leadership, its defense, and its intelligence community. In 2023, the Biden administration issued an executive order that made AI safety one of the axes of federal policy. In 2025, however, the Trump administration revoked that order and adopted an AI plan focused on preserving U.S. technological leadership. The new plan linked that objective to infrastructure development and the reduction of regulatory barriers. Even with this change in emphasis, the U.S. government continued to connect AI to public procurement, cooperation with allies, and export controls on advanced chips.
Export controls have a clear strategic function, since advanced AI models require major processing capacity. By limiting Chinese access to cutting-edge chips, software, and manufacturing equipment, the United States tries to make it harder to develop Chinese systems for supercomputing, advanced surveillance, and military modernization. Thus, the technology supply chain becomes an instrument of national security.
China, in turn, responds with its own investments, expanded AI regulation, and promotion of international standards for the technology. Since 2017, Beijing has treated AI as a development priority and, in 2025, proposed the creation of a World Artificial Intelligence Cooperation Organization. Chinese rules on algorithms, synthetic content, and generative AI require AI models to be registered, labeled, and evaluated through safety tests. In the Chinese government’s view, this is necessary to ensure social stability, content control, and protection against military risks. In addition, there is an emphasis on reducing external technological dependence so that Chinese companies can compete both in the quality of their models and in the robustness of their digital infrastructure.
The European Union occupies a third position in this dispute. It does not concentrate platforms and cloud infrastructure on the same scale as the United States, nor does it operate according to the Chinese party-state model. Its strength lies in another instrument: the capacity to use the internal market to turn legal values into access requirements. For this reason, European regulation functions as indirect diplomatic power. Companies that want to operate in the bloc need to adapt their systems, and other governments can use these rules as a reference when drafting their own standards. In this context, technical standards, chips, and clouds become parts of the same dispute, alongside data rules, infrastructure financing, and regulatory trust.
For middle- and low-income countries, the dilemma is different. Many of them need AI to improve public services, agricultural productivity, climate adaptation, education, and health. However, safe use of these tools depends on conditions that are not always available: sufficient computing power, quality data, specialists, and regulatory agencies able to evaluate complex systems. If these countries depend only on foreign companies, they may receive systems poorly adapted to their languages, needs, and local risks. If they remain outside governance forums, global rules will be designed without taking their implementation capacities into account. For these countries, AI safety involves access, digital sovereignty, and protection against technological dependence.
Why Governance Is Difficult
AI governance is difficult because the technology changes faster than the institutions that try to regulate it. The first difficulty is technical. Advanced AI systems are evaluated through tests that measure performance on specific tasks. These tests have limited reach: real life brings together contexts, users, and incentives that do not appear in the laboratory. A model can seem safe in an evaluation and produce risk when integrated with external tools, sensitive data, or high-impact decisions. As models change quickly, an evaluation conducted before release can age rapidly. Governance, therefore, needs to follow systems that keep changing after they are put into use.
The second difficulty is institutional. Private companies concentrate a significant share of development capacity, while governments need to regulate systems they do not always fully understand. Public authorities can require transparency, auditing, and tests, but these requirements work only when the state can interpret and verify them. Effective oversight requires specialists, infrastructure, and access to internal information. When the regulated sector controls essential technical knowledge, the state also needs permanent qualified staff and legal authority to obtain technical evidence.
The third difficulty is diplomatic. States want cooperation to avoid cross-border harms and, at the same time, seek to preserve technological advantage. This tension appears when a government defends safety while protecting its national industry. It reappears when a state demands transparency from foreign companies while keeping its own military uses secret. In more serious cases, a state can support human rights principles in multilateral forums and use AI for domestic surveillance. This difference between discourse and practice weakens the trust needed to negotiate common rules.
The fourth difficulty is distributive. Safety costs money: it depends on tests, data audits, civil-service training, infrastructure, and protection of critical systems. Many countries need the benefits of AI and lack the resources to evaluate its risks autonomously. If they adopt imported tools without adequate oversight, the risk is not distributed in the same way as the benefit. International governance needs to deal with this inequality. Otherwise, advanced AI will widen the distance between countries that make the rules and countries that merely receive their effects.
What AI Safety Can Do in Practice
An AI safety policy needs to connect risk, testing, supervision, and contestation. The first step is to classify risks, since not every use of AI requires the same degree of control. The difference appears when the functions of different systems are compared. A recreational or administrative use tends to affect comfort, cost, or efficiency. A system that prioritizes care in a hospital, for example, can alter concrete access to medical treatment. A model used to support military target selection carries another kind of risk, since it approaches the use of force. Classification helps direct obligations toward the uses that can cause greater harm.
The second element is testing before and after deployment. Before use, models need to be evaluated for robustness, tendency to produce false information, and risk of discrimination. After deployment, evaluation should cover cybersecurity, behavior in unexpected contexts, and whether the model provides guidance for dangerous activities. As systems change through updates, integrations, and continuous use, evaluation needs to follow the product’s entire life cycle.
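The classification step can be sketched as a small lookup that maps a use to a tier and a tier to obligations. Everything here is hypothetical: the tier names, the example uses, and the obligation lists are assumptions made for illustration and do not reproduce the legal categories of the EU AI Act or of any other rule.

```python
from enum import IntEnum


class RiskTier(IntEnum):
    """Hypothetical tiers, ordered from least to most demanding."""
    LOW = 1
    HIGH = 2
    CRITICAL = 3


# Assumed mapping, based on the examples given in the text.
USE_TIERS = {
    "recreational": RiskTier.LOW,
    "administrative": RiskTier.LOW,
    "hospital_triage": RiskTier.HIGH,
    "military_targeting": RiskTier.CRITICAL,
}

# Assumed obligations; each tier keeps those of the tier below it.
OBLIGATIONS = {
    RiskTier.LOW: ["basic documentation"],
    RiskTier.HIGH: [
        "basic documentation",
        "pre-deployment testing",
        "human supervision",
        "appeal channel",
    ],
    RiskTier.CRITICAL: [
        "basic documentation",
        "pre-deployment testing",
        "human supervision",
        "appeal channel",
        "named decision authority",
    ],
}


def obligations_for(use):
    """Return (tier, obligations) for a use.

    Unknown uses are treated as HIGH until someone reviews them,
    which is a cautious default chosen for this sketch.
    """
    tier = USE_TIERS.get(use, RiskTier.HIGH)
    return tier, OBLIGATIONS[tier]


for use in ("recreational", "hospital_triage", "unlisted_use"):
    tier, duties = obligations_for(use)
    print(use, tier.name, duties)
```

The cautious default matters more than the table itself: a scheme that treats unlisted uses as low risk would leave new applications unexamined.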
The third element is human supervision with real authority. It is not enough to say that a person is “in the loop” if that person does not understand the system, lacks time to review the decision, or cannot contradict the automated recommendation. Relevant supervision requires training, documentation, decision records, and clear responsibility. In military and critical-infrastructure uses, this means defining in advance which decisions may receive algorithmic support and which should not be automated.
The fourth element is contestation. People affected by automated decisions need to know when AI was used, which information mattered, and how to appeal. Without that path, AI creates an opaque bureaucracy. The decision appears as a technical result, and the affected person cannot identify the error or hold the institution that applied it responsible. Contestation, therefore, turns safety into a procedural right, beyond a technical promise.
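As a rough illustration, the three things an affected person needs to know (whether AI was used, which information mattered, and how to appeal) can be captured in a decision record. The class, its field names, and the sample values below are hypothetical and are not taken from any existing regulation or system.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DecisionRecord:
    """Hypothetical record kept for each automated or AI-assisted decision."""
    decision_id: str
    outcome: str
    ai_used: bool
    factors: list[str]        # information that weighed on the outcome
    responsible_office: str   # institution accountable for the decision
    appeal_contact: str       # where the affected person can appeal
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def notice(self) -> str:
        """Build a plain-language notice for the affected person."""
        used = "was" if self.ai_used else "was not"
        lines = [
            f"Decision {self.decision_id}: {self.outcome}",
            f"An AI system {used} used in this decision.",
            "Information considered: " + ", ".join(self.factors),
            f"Responsible office: {self.responsible_office}",
            f"To appeal, contact: {self.appeal_contact}",
        ]
        return "\n".join(lines)


# Invented example values.
record = DecisionRecord(
    decision_id="2024-0001",
    outcome="benefit application denied",
    ai_used=True,
    factors=["declared income", "household size"],
    responsible_office="Hypothetical Benefits Agency",
    appeal_contact="appeals@example.org",
)
print(record.notice())
```

A record of this kind does not make a decision fair, but it gives the affected person a named institution and a route to challenge the result.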
The fifth element is international cooperation. Technical standards, incident exchange, and capacity-building can reduce risks that no country controls alone. Public research and diplomatic dialogue perform the same function when they bring together governments with very different capacities. This cooperation needs to include companies and researchers without leaving governance in the hands of those who profit from the technology. Legitimacy depends on enough public participation to contest technical choices that produce political effects.
Limits and Diplomatic Consequences
AI safety will continue to coexist with political conflict around the technology. States will use AI to seek productivity, military advantage, economic influence, and surveillance capacity. Companies will compete over markets, data, and infrastructure. Societies will need to decide which risks they accept in exchange for efficiency, convenience, or growth. Governance tries to manage these interests through limits, transparency, and responsibility.
The central point for diplomacy is that AI turns safety into a dispute over infrastructure. Whoever controls chips, clouds, and cables controls part of the conditions under which others act. Energy, data, standards, and models complete this material base. This structure gives power to some states and companies, but it also creates vulnerabilities for those who depend on them. Supply chains can be interrupted, models can be used against their creators, national rules can conflict, and dependence on a few suppliers can affect political autonomy.
As AI changes rapidly, originates largely in private companies, is distributed across many uses, and produces effects that cross borders, AI safety is unlikely to be organized through a single instrument. Broad treaties may emerge when a specific harm requires a more stable legal commitment, as happens in debates over human rights, autonomous weapons, or cybercrime. In other cases, governance tends to advance through complementary layers: each layer responds to a limitation of the previous one. Technical standards make principles verifiable by turning them into tests and metrics. For those tests to gain political direction, summits help form consensus before legal obligations exist. Even so, summit consensus is not enough to guide the everyday conduct of companies and governments. For that reason, codes of conduct occupy the intermediate space between voluntariness and regulation, in a stage where law still does not reach every use. Regional rules, finally, give legal force to local priorities by transforming them into conditions for market access. This format is imperfect, but it reflects the nature of the technology.
This combination of technological speed, corporate concentration, and geopolitical dispute makes layered governance more likely than a single regime. In this architecture, each pole occupies a different place, since each controls different instruments. The European Union tends to act through market regulation, since its strength lies in turning access to the bloc into the diffusion of standards. The United States starts from its private innovation base and national security, using infrastructure and technological controls to preserve advantage. China links AI to development, internal control, and strategic autonomy, so that its rules serve both technological competition and political stability. The UN, in turn, occupies the space that these poles cannot fill alone: legitimacy, inclusion, and coordination for countries that do not participate in the technological centers. Between these poles, middle powers will try to adapt rules, protect their data, and benefit from AI without binding themselves to a single technological sphere.
AI safety, therefore, involves both a technical agenda for obtaining more reliable models and a power agenda over who decides the limits of automation. This double dimension defines who will have access to the benefits of automation and who will answer when automated systems cause harm. The diplomatic significance of AI safety lies here: AI already participates in the way states manage information, force, development, and sovereignty. Governing it means disputing the conditions under which the technology can serve cooperation without becoming another mechanism of dependence, coercion, or instability.