Articles of Interest: March 15 - 21, 2025

The Prometheus Security Project Team
Mar 21

Updated: Mar 28

Claude Sonnet 3.7 (often) knows when it’s in alignment evaluations.

Apollo Research. March 17, 2025.

A new report by Apollo Research shows that Anthropic's new model, Claude 3.7 Sonnet, frequently recognizes when it is being evaluated for alignment. In its reasoning, the model states that it is in a test scenario in up to 33% of evaluations. When shown transcripts of earlier evaluations, it also correctly identified them as alignment tests.

Apollo judges that current models are likely not intentionally hiding misalignment. It still considers this growing situational awareness concerning and recommends that AI researchers monitor models' reasoning during evaluations to detect it.

These findings are concerning on several fronts. The main reason is that a model's awareness of being evaluated creates a fundamental verification problem, undermining humans' ability to confirm that these systems are truly safe. If AI models can detect when they are being tested, they could "pass" safety evaluations without actually meeting safety standards.

This issue threatens the foundation of AI governance frameworks that rely on verifiable safety testing. As models become more situationally aware and strategic in their thinking, they may deliberately underperform or hide certain capabilities until deployed in real-world environments with less human oversight.

For both government officials and AI safety experts, this raises critical questions about how we can credibly certify AI safety when the systems themselves might be strategically responding to our evaluation methods rather than revealing their true capabilities and alignment.

As AI models continue to advance on a near-weekly basis, understanding their alignment and developing more effective safety evaluation methods will remain critical, especially as the models themselves become aware of these tests.

Apollo Research. “Claude Sonnet 3.7 (Often) Knows When It’s in Alignment Evaluations,” March 17, 2025. https://www.apolloresearch.ai/blog/claude-sonnet-37-often-knows-when-its-in-alignment-evaluations.

All-Robot Assault Opens New Chapter in Front-Line Warfare

Alistair MacDonald and Ievgeniia Sivorka, Wall Street Journal. March 17, 2025

The Wall Street Journal reported on a Ukrainian assault on Russian positions carried out by a small force of weaponized robots. The article describes one of the first instances in modern warfare in which unmanned machines alone conducted an attack.

In December 2024, Ukraine launched an assault against a Russian position in Kharkiv Oblast. Unlike prior assaults by Ukrainian forces, this one did not involve a single infantryman or manned vehicle.

Instead, Ukraine used 50 unmanned vehicles in a combined ground and air attack to destroy the Russian position.

Throughout its war against Russia, Ukraine has struggled to maintain adequate manpower because of the intensity of the fighting. As a result, it has turned increasingly to unmanned systems to add firepower without risking the lives of its troops.

While both Russia and Ukraine have employed unmanned systems, Ukraine has a clear advantage in producing unmanned air, sea, and ground vehicles. By coordinating these systems in a single operation, Ukraine’s military provided a textbook example of a drone swarm in combat. It also showed how a combatant that is numerically and materially inferior to its enemy can use unmanned systems to even the odds.

MacDonald, Alistair, and Ievgeniia Sivorka. “All-Robot Assault Opens New Chapter in Front-Line Warfare.” Wall Street Journal, March 17, 2025. https://www.wsj.com/world/all-robot-assault-opens-new-chapter-in-front-line-warfare-5f29d4ca.

Subsea fibre cables can 'listen out' for sabotage

Chris Baraniuk, BBC. March 18, 2025

In recent months, NATO countries have experienced disruptions to undersea fiber optic cables in the Baltic Sea that were likely caused by outside interference. In response, NATO has launched a mission dubbed “Baltic Sentry,” which increases patrols by aircraft, warships, and drones to monitor undersea cables in the Baltic.

The German company AP Sensing and the Dutch company Optic11 both develop sensors that detect disturbances to undersea cables. These sensors can act as an early warning system and, when combined with satellite imagery and Automatic Identification System (AIS) data, could offer a simple way to track vessels suspected of intentionally damaging undersea cables.

Undersea fiber optic cables carry over 95% of the world’s international data traffic. Large-scale damage to this infrastructure would disrupt every kind of data flow, including diplomatic and military communications. Satellite backup links would likely be unable to absorb the sudden surge in traffic.

Repairing a large number of these cables could take weeks, and even a single repair can take that long. After the 2022 Hunga Tonga-Hunga Ha'apai volcanic eruption, it took five weeks to repair Tonga's severed undersea cable.

Nations around the world should consider how best to secure undersea fiber optic cables. Early warning systems such as acoustic sensors are important, but on their own they are insufficient to counter the risk of sabotage.

Baraniuk, Chris. “Subsea Fibre Cables Can ‘Listen Out’ for Sabotage,” March 18, 2025. https://www.bbc.com/news/articles/cn52rglxr62o.

Italian newspaper says it has published world’s first AI-generated edition

Angela Giuffrida, The Guardian. March 18, 2025

Il Foglio, an Italian newspaper, says it is the first to publish an edition written entirely by artificial intelligence, as a way to show AI's impact on daily life. The edition covered topics including U.S. President Donald Trump, Russian President Vladimir Putin, and the Italian economy.

Although the articles covered current events in a direct and engaging way, they lacked supporting evidence and included no direct quotes from people. The experiment shows how far AI writing programs have advanced.

For now, the AI lacks a human-centered approach to the topics it covers. As the technology improves, however, the risk grows that news organizations will replace human journalists and hand their work to AI programs.

The threat extends beyond the jobs of individual journalists to journalism and the media as a whole, given that AI-generated information may be erroneous or censored.

Giuffrida, Angela. “Italian Newspaper Says It Has Published World’s First AI-Generated Edition,” March 18, 2025. https://www.theguardian.com/technology/2025/mar/18/italian-newspaper-says-it-has-published-worlds-first-ai-generated-edition.

