Briefings
2026.02.19 — Morning (9:00 AM)

Pentagon draws a line against Anthropic. Chinese models storm the leaderboard. Saudi Arabia writes another $3B check. The compute race goes global.


⚖️ AI Policy & Governance

Pentagon Threatens to Designate Anthropic a 'Supply Chain Risk' Over Military AI Ethics

The Pentagon is reportedly close to declaring Anthropic a "supply chain risk" after the company refused to amend its ethics code to permit unrestricted military use of Claude. The designation would force defense contractors and partners — Palantir, Amazon, Google — to certify they don't use Anthropic's models, potentially severing the company from the entire defense ecosystem. Zvi warns this would "make us far less safe" and be a "major norm violation," arguing the real issue is that the Pentagon misunderstands how LLMs work: they're probabilistic and can't give the absolute guarantees conventional tools can.

Anthropic Officially Bans Using Subscription Auth for Third-Party Applications

Anthropic updates Claude's legal terms to explicitly ban using personal subscription authentication in third-party applications. The change sparked heated debate on HN over API access restrictions and developer impact.

Read more →
India AI Impact Summit 2026: Modi, Macron, Pichai Convene on Global AI Framework

India hosts a major AI summit, with PM Modi calling for a human-centric global AI framework. Attendees include French President Macron and Google CEO Pichai; Google announces new partnerships and funding. Signals India's growing role in global AI governance.

Read more →
Google Publishes 2026 Responsible AI Progress Report

Google releases its 2026 Responsible AI report covering safety practices as models grow more sophisticated. Notes 2025 as a major shift, with AI becoming a proactive partner capable of reasoning and navigation.

Read more →

🧠 Foundation Models & Research

SWE-bench February 2026: Claude Opus 4.5 Leads, Chinese Models Storm the Top 10

An independent SWE-bench Verified leaderboard run (not self-reported) puts Claude Opus 4.5 on top, beating its own successor, Opus 4.6. Gemini 3 Flash is #2, and MiniMax M2.5, a 229B-parameter Chinese model, enters at #3. Three more Chinese models — GLM-5, Kimi K2.5, DeepSeek V3.2 — make the top 10. GPT-5.2 lands at #6; GPT-5.3-Codex was not tested.

StepFun Releases Step 3.5 Flash: 196B Open-Source MoE Model That Runs on Consumer Hardware

Chinese lab StepFun releases Step 3.5 Flash, a 196B-parameter sparse MoE model that activates only 11B params per token. Achieves 74.4% on SWE-bench Verified and 97.3 on AIME, with 100-300 tok/s throughput via 3-way Multi-Token Prediction. Supports 256K context and runs on a Mac Studio M4 Max.
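As a back-of-envelope check on the consumer-hardware claim (a minimal sketch; the 4-bit quantization, 128 GB memory ceiling, and ~546 GB/s bandwidth figure are our assumptions about an M4 Max configuration, not StepFun's published numbers): the full 196B weights fit in unified memory, and decode speed is bounded by how fast the 11B active parameters can be streamed per token.

```python
# Rough memory/throughput math for a sparse MoE on a Mac Studio.
# Assumptions (ours): 4-bit weights, 128 GB unified memory, ~546 GB/s bandwidth.
TOTAL_PARAMS = 196e9     # full parameter count
ACTIVE_PARAMS = 11e9     # parameters touched per token
BYTES_PER_PARAM = 0.5    # 4-bit quantization
MEM_BW_GBPS = 546        # assumed unified-memory bandwidth

weights_gb = TOTAL_PARAMS * BYTES_PER_PARAM / 1e9      # ~98 GB resident
per_token_gb = ACTIVE_PARAMS * BYTES_PER_PARAM / 1e9   # ~5.5 GB read per token

base_tok_s = MEM_BW_GBPS / per_token_gb   # ~99 tok/s, bandwidth-bound
mtp_tok_s = base_tok_s * 3                # ~298 tok/s if 3-way MTP drafts all land

print(f"Resident weights: {weights_gb:.0f} GB (fits under 128 GB)")
print(f"Decode ceiling:   {base_tok_s:.0f}-{mtp_tok_s:.0f} tok/s")
```

Under those assumptions the bandwidth-bound ceiling lands right on the quoted 100-300 tok/s range, with Multi-Token Prediction supplying the upper end.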

Read more →
AI #156: They Do Mean the Effect on Jobs

Zvi's massive weekly roundup: the Pentagon-Anthropic dispute's escalation, AI labor-market impacts now showing up in productivity statistics, the Claude Sonnet 4.6 release, Gemini DeepThink V2, Grok 4.20, and a Pliny "kill shot" on open-weight model safety.

Read more →

🌐 Geopolitics & Compute

Saudi Arabia's Humain Invests $3 Billion in Elon Musk's xAI

Saudi Arabia's state-backed AI firm Humain has invested $3 billion in xAI. Marks a significant Middle East bet on frontier AI development and a further escalation of geopolitical competition in the AI race.

Read more →
How Much AI Does $1 Get You in China vs America?

Data-driven analysis of US vs China datacenter cost efficiency. The US builds more cost-efficient datacenters, but Chinese access to H200s would make raw price-performance extremely close. Hardware cost is the key variable, and the piece works from actual numbers rather than vibes.

Read more →
15 Years of FP64 Segmentation: Why NVIDIA Blackwell Ultra Breaks the Pattern

Analysis of NVIDIA's 15-year strategy of segmenting FP64 compute between consumer and datacenter GPUs, and how the Blackwell Ultra architecture breaks this long-standing pattern. Key for understanding AI compute economics and GPU market dynamics.

Read more →

📊 Economics & Labor

First Causal Evidence: AI Adoption Boosts Productivity in European Firms

A new CEPR study provides the first causal evidence of AI's productivity effects across 12,000+ European firms. AI adoption is similar in the EU and US on average, but with major heterogeneity: Sweden and the Netherlands match the US at ~36%, while Romania and Bulgaria lag at ~28%. The gap has widened in recent years.

Read more →
The Rise of AI Could Create an Opening for Worker Power Resurgence

The Guardian examines how AI disruption, while threatening jobs, may create conditions for a resurgence in worker organizing and power. Experts argue the extreme employer-employee power imbalance exposed by AI adoption could catalyze new labor movements.

Read more →

🛡️ AI Safety

Don't Trust the Salt: AI Summarization, Multilingual Safety, and LLM Guardrails

Analysis of LLM safety failures in multilingual contexts. AI summarization and guardrails break down across languages, exposing significant safety gaps that remain unaddressed in frontier models.

Read more →

🔭 Secretary's Assessment

The Pentagon-Anthropic standoff is the biggest AI policy story in weeks, and it cuts to the heart of a tension that was always coming: what happens when the company most committed to AI safety becomes the one the military can't control?

The "supply chain risk" designation is a nuclear option. It wouldn't just hurt Anthropic — it would cascade through every defense contractor and cloud provider that touches Claude. Amazon, Google, Palantir would all need to certify non-use. The practical effect would be to hand the entire defense AI market to OpenAI and Google, neither of which has drawn comparable ethical lines. The Pentagon isn't just punishing Anthropic's stance; it's creating an incentive structure where safety commitments become competitive liabilities. That should worry everyone, regardless of where they stand on military AI.

Zvi's observation that the Pentagon fundamentally misunderstands how LLMs work — demanding absolute guarantees from probabilistic systems — is the deeper issue. This isn't a company being difficult; it's a category error in procurement. You can't contractually guarantee that a language model will never produce a specific output. Treating it like a deterministic weapons system is how you get bad policy.
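A toy illustration of why "never" is unattainable at the model level (invented logits for a four-token vocabulary; real vocabularies run to ~100K tokens): the softmax at every decoding step assigns strictly positive probability to every token, so under sampling, no output sequence has probability zero.

```python
import numpy as np

# Illustrative logits for a toy 4-token vocabulary (invented values).
logits = np.array([5.0, 1.0, -3.0, -12.0])

# Softmax: exponentiation keeps every probability strictly positive.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

print(probs)            # even the -12 logit keeps ~4e-8 probability, not zero
assert probs.min() > 0  # "will never emit X" cannot hold for the sampler alone
```

Hard guarantees have to come from the deployment layer, via output filters or constrained decoding, which is exactly the distinction a procurement contract written for deterministic systems fails to capture.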

Meanwhile, the SWE-bench results tell a story about the changing geography of AI capability. Four Chinese models in the top 10 of an independent, non-self-reported benchmark. StepFun's Step 3.5 Flash — 196B parameters, runs on a Mac Studio — is the kind of model that would have been unthinkable a year ago. The open-source floor keeps rising, and it's rising fastest in China. Export controls are slowing hardware access but clearly not preventing competitive model development.

The ChinaTalk datacenter analysis adds useful precision: the US builds more cost-efficiently, but H200 access would close the gap dramatically. The variable that matters most is hardware, which is exactly the lever export controls target. The question is whether the controls are buying time or just changing the shape of the competition.

Saudi Arabia's $3B check to xAI continues the pattern we've tracked — the Gulf states aren't just buying compute capacity, they're buying seats at the frontier AI table. Between Humain's xAI investment and the compute infrastructure drawing companies like Luma AI, the Middle East is becoming a genuine third pole. The geopolitics of AI is no longer bilateral.

Bottom line: The Pentagon is about to learn that you can't coerce safety-conscious AI companies into compliance without making the entire ecosystem less safe. The Chinese model surge continues regardless of export controls. And the compute race just acquired another well-funded axis. The singularity doesn't respect bilateral frameworks.