Wednesday, 20 May 2026

Google I/O 2026: Gemini 3.5 Flash Is Now the Default, and Agents Are the Point

Google didn't just ship a new model at I/O 2026. It rewired Gemini from a chat interface into an operating layer across Search, Android, Workspace, and wearables, and named Gemini 3.5 Flash the engine underneath all of it.

The most important number from Google I/O 2026 isn't a benchmark score. It's the word "default." TechCrunch confirmed that Gemini 3.5 Flash went live on May 19 as the immediate default in the Gemini app and AI Mode in Google Search, bypassing the usual preview-to-rollout cycle and putting a brand-new model in front of hundreds of millions of users on day one.

That's a confident move. It's also a calculated one. Flash is priced well below flagship models, runs at what Google calls "frontier performance at Flash-level latency and scale," and is positioned explicitly for agentic workflows rather than single-turn chat. The implication is clear: Google isn't chasing the "smartest AI" crown right now. It's chasing the one that does things, at scale, across everything it already owns.

That framing touches everything announced at this year's I/O, from the background agent called Gemini Spark to a pair of intelligent eyewear products built around hands-free Gemini access. The bet is on distribution, not just capability, and Google has more distribution than almost anyone.


Gemini 3.5 Flash Arrives as the New Center of Gravity

Flash is a multimodal model. Google's official model page lists text, image, video, audio, and PDF as accepted input types, with a 1 million token context window and 64,000 token output capacity. For developers building agents that need to ingest large documents, run long-horizon tasks, or loop through multi-step workflows, those numbers matter.

Availability is unusually broad for a first-day launch. Google confirmed Flash is live in the Gemini App, the Gemini API, Gemini Enterprise, the Gemini Enterprise Agent Platform, Google AI Mode, Google AI Studio, Google Antigravity, and Android Studio. That's eight surfaces, simultaneously, on launch day.

Model specs at launch: Gemini 3.5 Flash supports text, image, video, audio, and PDF input. Context window: 1M input tokens; maximum output: 64k tokens. Pricing: $1.50 per 1M input tokens, $9.00 per 1M output tokens. Context caching listed as free in current documentation.
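For agent builders, the practical question is whether a given job fits in one request. The Python sketch below does two things. It checks a request against the stated launch limits, and it estimates how many requests a larger document would need. The 50,000-token reserve for instructions and tool results is an assumption chosen for illustration, not a Google figure. Production code should read limits from the API rather than hard-code them.

```python
# Limits as stated in the launch specs; hard-coded here only for illustration.
MAX_INPUT_TOKENS = 1_000_000
MAX_OUTPUT_TOKENS = 64_000


def fits_request(input_tokens: int, requested_output_tokens: int) -> bool:
    """True if one request stays within both the input and output limits."""
    if input_tokens < 0 or requested_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (input_tokens <= MAX_INPUT_TOKENS
            and requested_output_tokens <= MAX_OUTPUT_TOKENS)


def requests_needed(document_tokens: int, reserve_tokens: int = 50_000) -> int:
    """Requests needed to stream a document through the context window,
    keeping `reserve_tokens` free in each request (a hypothetical budget)
    for instructions, tool results, and conversation history."""
    usable = MAX_INPUT_TOKENS - reserve_tokens
    if usable <= 0:
        raise ValueError("reserve leaves no room for the document")
    return -(-document_tokens // usable)  # ceiling division


print(fits_request(900_000, 8_000))   # True
print(fits_request(10_000, 70_000))   # False: output cap exceeded
print(requests_needed(2_500_000))     # 3
```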

The framing on Google's model page is notably specific. Flash is described as "best for frontier performance across agents and coding" and brings "advanced reasoning at Flash-level latency and scale." That's not a general-purpose pitch. It's an explicit targeting of the developer and enterprise workloads where agent usage is highest.

"Our most impressive model yet for agentic workflows."

Google DeepMind, official Gemini 3.5 Flash model page, May 19, 2026

The consumer framing is different but complementary. In the Gemini app and Search AI Mode, Flash isn't sold as an agent platform; it's just the model that powers answers. Most users won't know it's there. That invisibility is the point.

Benchmarks: Where Google Leads, and Where It Doesn't

Google published a benchmark table alongside Flash's launch. It's worth reading carefully, because the picture isn't uniform. Flash leads in agentic and multimodal categories, but competitors still edge it out in some coding and long-context tasks.

| Benchmark | What It Tests | Gemini 3.5 Flash |
|---|---|---|
| MCP Atlas | Multi-step workflows using MCP | 83.6% |
| OSWorld-Verified | Agentic computer use | 78.4% |
| Terminal-bench 2.1 | Agentic terminal coding | 76.2% |
| MRCR v2 (128k) | Long-context recall | 77.3% |
| Finance Agent v2 | Financial analysis and decisions | 57.9% |
| Toolathlon | Real-world general tool use | 56.5% |
| SWE-Bench Pro | Single-attempt coding tasks | 55.1% |

The MCP Atlas score of 83.6% is the headline number for Google's enterprise pitch. MCP, the Model Context Protocol, has become a key interoperability standard for agents connecting to external tools, so a strong score there directly supports the claim that Flash can run real agentic workflows, not just toy benchmarks.
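Under the hood, MCP messages are JSON-RPC 2.0. A client discovers a server's tools with a `tools/list` request and invokes one with `tools/call`, passing the tool's name and an `arguments` object. The sketch below only builds those messages for a short multi-step workflow. The tool names (`search_filings`, `summarize_document`) and their arguments are hypothetical. The transport and the initialization handshake are omitted.

```python
import json
from typing import Optional


def jsonrpc_request(request_id: int, method: str, params: Optional[dict] = None) -> str:
    """Serialize a JSON-RPC 2.0 request, the message format MCP uses."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


def tool_call(request_id: int, name: str, arguments: dict) -> str:
    """Build an MCP tools/call request for one tool invocation."""
    return jsonrpc_request(request_id, "tools/call",
                           {"name": name, "arguments": arguments})


# A hypothetical workflow: discover tools, then chain two calls.
workflow = [
    jsonrpc_request(1, "tools/list"),
    tool_call(2, "search_filings", {"ticker": "GOOG", "form": "10-K"}),
    tool_call(3, "summarize_document", {"document_id": "result-of-step-2"}),
]
for message in workflow:
    print(message)
```

A benchmark like MCP Atlas tests how well a model chains steps like these, feeding one tool's result into the next call.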

OSWorld-Verified at 78.4% is also notable. It measures how well a model can actually operate a computer, click through interfaces, complete tasks, and do it reliably. That score is directly relevant to Gemini Spark's pitch as a background task agent.

Where Flash doesn't lead: Google's own benchmark table shows GPT-5.5 ahead in certain terminal coding and long-context categories. The story here isn't that Google won every category; it's that Google is strong where it needs to be for its agent-first strategy and weaker in areas where it's less exposed right now.

SWE-Bench Pro at 55.1% is the number most developers will scrutinize. Single-attempt coding on real-world software engineering tasks is a harsh test. It's a competitive number, not a dominant one, but Google's positioning of Flash as an agent model rather than a pure coding model gives it some cover.

Gemini Spark and the Always-On Agent

Gemini Spark is the most structurally significant product Google announced at I/O 2026. It's not a chatbot or a feature. Google describes it as a 24/7 personal AI agent that runs in the background, connects to Google apps, and handles tasks without requiring constant user input.

The autonomy framing is careful but meaningful. Google says Spark is "designed to check with you before taking major actions." That's an important constraint. It means Spark isn't a fully autonomous executor; it's an agent with a human-in-the-loop guardrail built in from the start. Whether that's a trust-building measure or a genuine architectural limit depends on how the product evolves.
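Google hasn't published how Spark decides what counts as a "major" action. The sketch below shows the general human-in-the-loop pattern under one assumption: each planned action carries a hypothetical `major` flag. Minor steps run directly, and the loop pauses for confirmation on major ones.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    description: str
    major: bool  # hypothetical flag, e.g. sending email or deleting files


def run_plan(actions: List[Action], confirm: Callable[[Action], bool]) -> List[str]:
    """Execute minor actions immediately; ask the user before each major one."""
    log = []
    for action in actions:
        if action.major and not confirm(action):
            log.append(f"skipped (not confirmed): {action.description}")
            continue
        # A real agent would perform the action here; this sketch only records it.
        log.append(f"done: {action.description}")
    return log


plan = [
    Action("draft a reply to the landlord", major=False),
    Action("send the reply", major=True),
    Action("add viewing to calendar", major=False),
]
# Simulate a user who declines every confirmation prompt.
for line in run_plan(plan, confirm=lambda a: False):
    print(line)
```

The default here is not to act. A declined action is skipped and logged rather than retried.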

"Works in the background 24/7, designed to check with you before taking major actions."

Google, Gemini Spark product page, May 19, 2026

Access is limited at launch. Google's product page lists availability as trusted testers, AI Ultra subscribers in the U.S., and select business users. That's a small initial base, which means the real test of Spark's reliability and user adoption is still ahead.

  • Always On: Runs continuously in the background, handling tasks without requiring a user-initiated session each time.
  • Google-Connected: Integrated with Gmail, Calendar, Drive, and other Google apps to execute multi-app workflows.
  • Checks In First: Built-in human confirmation before major actions keeps users in control of consequential steps.
  • Limited Access: Currently available to trusted testers, AI Ultra subscribers in the U.S., and select business users.

The strategic logic is straightforward: if you can get users to trust an always-on agent inside Google's own app ecosystem, you don't need them to switch to a competing platform. Every task Spark completes inside Google's walls is a task that didn't go to a rival agent. That's not a coincidence.

Smart Glasses, Two Ways: Audio First, Display Later

Google's wearables push at I/O 2026 was framed around "intelligent eyewear" rather than a single product. The official Android XR blog post describes two distinct form factors: audio glasses, which ship first, and display glasses, which follow. Both are built around hands-free Gemini access.

Audio glasses are launching "later this fall," according to Google. Users can invoke Gemini by saying "Hey Google" or tapping the frame, then ask it to complete tasks on their behalf. The pitch is "heads up, hands free," keeping users engaged with their environment rather than looking at a screen.

Display glasses haven't received a specific launch date. Google's blog groups them with audio glasses as part of the same intelligent eyewear line, but the sequencing suggests display hardware needs more time. That's consistent with the broader industry pattern where AR display quality remains a harder engineering problem than audio delivery.

What Google is calling this: The primary Google blog source uses "intelligent eyewear" throughout, not "Project Aura," which appears in secondary press coverage. The distinction matters if you're tracking official product naming versus early codenames.

The glasses aren't a standalone product pitch. They're a hardware extension of the same Gemini agent strategy. A pair of glasses that can execute tasks via Gemini in the background, for a user who's walking around, driving, or working with their hands, is a different use case than any phone-based agent. Google is building toward persistent ambient AI, and the glasses are the most visible expression of that direction.

The Real Cost of Running Agents at Scale

Flash is priced at $1.50 per 1M input tokens and $9.00 per 1M output tokens, per Google's official API pricing. Context caching is currently listed as free. On a per-token basis, that looks competitive with other frontier models.

The catch is how agents actually use tokens. A single agentic task can involve multiple tool calls, retries on failed steps, reading long documents as context, and generating detailed structured outputs. The effective bill for a real agent workload can multiply quickly, even at Flash prices.
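One rough way to see the multiplication is to model an agent loop that re-sends its whole, growing context on every step. The sketch below uses the launch prices quoted in this article and assumes no caching. The step count, context size, and per-step token counts are made-up illustrative values.

```python
from typing import Tuple

INPUT_PRICE_PER_M = 1.50   # USD per 1M input tokens (launch pricing)
OUTPUT_PRICE_PER_M = 9.00  # USD per 1M output tokens (launch pricing)


def agent_run_cost(base_context: int, steps: int, output_per_step: int,
                   tool_result_per_step: int) -> Tuple[int, int, float]:
    """Estimate (input_tokens, output_tokens, usd) for an agent loop that
    re-sends its full context each step, with no caching."""
    context = base_context
    total_in = 0
    total_out = 0
    for _ in range(steps):
        total_in += context          # the whole history is read again
        total_out += output_per_step
        # The model's output and the tool's result both join the history.
        context += output_per_step + tool_result_per_step
    usd = (total_in * INPUT_PRICE_PER_M + total_out * OUTPUT_PRICE_PER_M) / 1_000_000
    return total_in, total_out, usd


single = agent_run_cost(50_000, steps=1, output_per_step=2_000, tool_result_per_step=8_000)
loop = agent_run_cost(50_000, steps=10, output_per_step=2_000, tool_result_per_step=8_000)
print(single)
print(loop)
```

With these made-up numbers, ten steps re-read 950,000 input tokens, 19 times the 50,000-token starting context, for $1.605 in total. A single call over the same starting context costs under ten cents.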

| Token Type | Price per 1M Tokens | Notes |
|---|---|---|
| Input tokens | $1.50 | Includes text, image, video, audio, PDF |
| Output tokens | $9.00 | 6x the input price; significant in generation-heavy agents |
| Context caching | Free (currently) | Reduces repeated input costs; policy subject to change |

The output token price deserves attention. At $9.00 per 1M tokens, output is six times more expensive than input. Agents that generate long-form responses, write code, or produce structured data at scale will see that ratio dominate their bills. Developers building on Flash need to design for output efficiency, not just input efficiency.
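The 6:1 ratio has a simple consequence. Output accounts for half the bill once output tokens reach one-sixth of input tokens, and for more beyond that. The small helper below uses the launch prices as defaults and makes the split explicit for any workload shape. The token counts in the example are illustrative.

```python
def output_share(input_tokens: int, output_tokens: int,
                 input_price: float = 1.50, output_price: float = 9.00) -> float:
    """Fraction of total cost due to output tokens.

    Prices are per 1M tokens; the per-million scaling cancels in the ratio,
    so raw token counts can be used directly."""
    input_cost = input_tokens * input_price
    output_cost = output_tokens * output_price
    total = input_cost + output_cost
    return output_cost / total if total else 0.0


print(output_share(6_000, 1_000))    # 0.5: output is 1/6 of input
print(output_share(10_000, 5_000))   # 0.75: a generation-heavy step
print(output_share(100_000, 1_000))  # reading-heavy step: output is a small share
```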

There's also a longer-term pricing risk. Context caching is currently free, which substantially reduces the cost of agents that re-read the same documents across multiple calls. That's a strong incentive to build on Flash now. But free caching is a promotional condition, not a guaranteed permanent one, and developers building production systems should model the cost with caching at some nonzero price.
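One way to follow that advice is to price the same workload under several cache prices. The sketch below compares three cases for re-reading one shared document across many calls:

- no caching;
- caching at today's listed price of zero;
- caching at a hypothetical $0.375 per 1M cached tokens, a quarter of the input price, chosen only for illustration.

It simplifies how caching is billed. The first call pays the full input price, later calls pay the cached rate, and any storage fees are ignored.

```python
from typing import Optional

INPUT_PRICE_PER_M = 1.50  # USD per 1M input tokens (launch pricing)


def shared_document_cost(doc_tokens: int, calls: int,
                         cache_price_per_m: Optional[float]) -> float:
    """USD spent feeding the same document into `calls` requests.

    cache_price_per_m=None models no caching: every call pays full price.
    Otherwise the first call pays full price and later calls pay the
    cached-token price (a simplified billing model; storage fees ignored)."""
    if calls <= 0:
        return 0.0
    millions = doc_tokens / 1_000_000
    if cache_price_per_m is None:
        return calls * millions * INPUT_PRICE_PER_M
    return millions * INPUT_PRICE_PER_M + (calls - 1) * millions * cache_price_per_m


for label, price in [("no cache", None), ("free cache", 0.0), ("hypothetical $0.375", 0.375)]:
    print(f"{label:>20}: ${shared_document_cost(200_000, 50, price):.2f}")
```

Take a 200,000-token document read 50 times. With no caching the cost is $15.00, free caching cuts it to $0.30, and the hypothetical cache price lands just under $4. That gap is the budget risk if the free condition ends.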

Distribution vs. Trust: The Real Competition

The honest read on Google's I/O 2026 announcements is that this is a distribution play as much as a capability play. The Verge's I/O coverage captured the breadth: Gemini is now threaded across Search, Android, Workspace, and wearables. No competitor has a comparable installed base to push against.

That's the upside. The downside is that Google is making strong claims about agent reliability at a moment when agentic AI is still proving itself in production. Spark's "check with you before major actions" language is a hedge. It signals that Google knows users won't trust a fully autonomous agent yet, especially one with access to email, calendar, and documents.

The benchmark gaps matter here too. GPT-5.5 leading in some coding and long-context categories means enterprise developers evaluating agents for high-stakes workflows have real reasons to comparison-shop rather than default to Google. Distribution gets Google into the conversation; it doesn't close it.

  • Always-on agents in email and calendar raise data access and privacy questions that Google hasn't fully addressed in public documentation yet.
  • Bundling many launches at once can create the appearance of momentum while real-world adoption lags behind the announcement cadence.
  • Wearables depend on user behavior change, not just product quality, and behavior change takes longer than a product cycle.
  • Flash's benchmark table is self-reported by Google, so independent third-party verification will be the real test of the agentic claims.

None of those risks makes the I/O announcements less significant. They just define what "Google winning AI" would actually have to prove, which is real-world agent reliability, privacy trust, and user habit formation, not just launch-day benchmark tables.

Frequently Asked Questions

When is Gemini 3.5 Flash available?

Gemini 3.5 Flash is available immediately as of May 19, 2026, across the Gemini App, AI Mode in Search, Gemini API, Gemini Enterprise, Gemini Enterprise Agent Platform, Google AI Studio, Google Antigravity, and Android Studio.

How much does Gemini 3.5 Flash cost?

Google's official API pricing lists Gemini 3.5 Flash at $1.50 per 1 million input tokens and $9.00 per 1 million output tokens. Context caching is currently free, though this is a promotional condition subject to change.

What is Gemini Spark?

Gemini Spark is Google's 24/7 background AI agent, designed to connect to Google apps and execute tasks without requiring constant user input. It's built to confirm with users before taking major actions, and is currently available to trusted testers, AI Ultra subscribers in the U.S., and select business users.

When are Google's smart glasses launching?

Google confirmed audio glasses will launch "later this fall" in 2026. Display glasses are part of the same intelligent eyewear line but haven't received a specific release date. Both form factors use Gemini as the underlying AI layer.

Is Gemini 3.5 Flash multimodal?

Yes. Google's model page confirms Gemini 3.5 Flash accepts text, image, video, audio, and PDF as input types. It supports a 1 million token context window and up to 64,000 tokens of output.

How does Gemini 3.5 Flash compare to GPT-5.5?

Google's benchmark table shows Flash leading in agentic and multimodal categories, including MCP Atlas at 83.6% and OSWorld-Verified at 78.4%. GPT-5.5 leads in certain terminal coding and long-context benchmarks. Neither model dominates across all categories.

What surfaces does Gemini 3.5 Flash power?

As of launch, Flash powers the Gemini App, Google Search AI Mode, the Gemini API, Gemini Enterprise products, Google AI Studio, Google Antigravity, and Android Studio, covering consumer, developer, and enterprise surfaces simultaneously.

What is Google Antigravity?

Google Antigravity is Google's agentic development platform, an agent-first coding environment the company introduced alongside Gemini 3 in late 2025. Google's official model page lists it as one of the eight surfaces where Gemini 3.5 Flash is available at launch.

The Bottom Line: A Platform Move, Not Just a Model Launch

Google used I/O 2026 to make Gemini 3.5 Flash the default model across its most important consumer surfaces, launch an always-on background agent, and announce a hardware line built around ambient AI access. Taken separately, each of those is a product update. Taken together, they're a coherent strategy: turn Gemini from a product you visit into infrastructure that runs underneath everything you already do.

The strategy is credible precisely because Google's distribution advantage is real. Hundreds of millions of users don't have to choose Gemini. They'll encounter it in Search, in Android, in Workspace, in the glasses they might put on this fall. That reach is something no challenger model, however strong on benchmarks, can replicate quickly.

What Google still has to prove is that agents work reliably enough to earn user trust, and that "reliable enough" translates into habit formation rather than a novelty cycle. The benchmark table is Google's self-assessment. The real scorecard is what Spark's users report six months from now, and whether the audio glasses create behavior change or end up in a drawer.

Watch For
01 Gemini Spark reliability reports from AI Ultra subscribers: expect the first credible assessments within 60 to 90 days of broader rollout; they'll define whether the always-on agent framing holds up outside controlled demos.
02 Audio glasses availability and reception this fall: the "later this fall" launch window gives a narrow target; watch for preorder dates and whether Google expands access globally or limits initial availability to the U.S.
03 Third-party Gemini 3.5 Flash benchmarks: Google's self-reported numbers tell one story; independent evaluations from developers running real agentic workloads will either confirm or complicate the MCP Atlas and OSWorld claims.
04 Context caching pricing: currently listed as free, which makes Flash's effective cost substantially lower for agent-heavy workloads; any change to that policy will immediately reshape the developer economics Google is counting on to drive adoption.

Anthropic's 10 AI Agents Just Rewired Wall Street, and the Jobs Math Is Getting Uncomfortable

In 48 hours, Anthropic unveiled 10 ready-to-run banking agents, a new model that tops the Vals AI Finance Agent benchmark, and a $1.5 billion joint venture with Blackstone, Hellman & Friedman and Goldman Sachs. The pitch: Claude becomes the operating layer for global capital markets. The catch: junior bankers are first in line.

Jamie Dimon and Dario Amodei don't share a stage by accident. When the CEO of JPMorgan Chase and the co-founder of Anthropic appeared together at an invite-only financial services briefing in New York on May 5, 2026, both men knew the cameras were watching. Dimon built a live Treasury asset-swap analysis dashboard from a blank Excel sheet in under 20 minutes. The message was unmistakable: this isn't a pilot program. It's production.

The New York event capped what Fortune called a "48-hour blitz" that saw Anthropic drop 10 pre-built AI agent templates for financial services, debut a Microsoft 365 integration spanning Excel, Word, PowerPoint and Outlook, announce new data partnerships with Moody's, FactSet, Morningstar, S&P Global and Dun & Bradstreet, and reveal a $1.5 billion joint venture with Blackstone, Hellman & Friedman and Goldman Sachs to embed Claude directly into hundreds of enterprises. The underlying model powering it all, Claude Opus 4.7, now leads the Vals AI Finance Agent benchmark with a score of 64.37%.

Anthropic first entered financial services in July 2025. Ten months later, it has Claude in production at JPMorgan Chase, Goldman Sachs, Citi, AIG and Visa. The trajectory isn't incremental. It's a structural push to become what every software vendor dreams of being: infrastructure.


The 48-Hour Wall Street Blitz

The timing was deliberate. On May 4, the day before the flagship New York event, FIS announced a partnership with Anthropic to build a Financial Crimes AI Agent. FIS isn't a minor player. The company powers roughly 12% of the global economy's transactions, deposits, payments, and credit operations. Embedding Claude inside that infrastructure means reaching thousands of financial institutions without asking any of them to switch vendors.

The Financial Crimes AI Agent compresses anti-money-laundering investigations from days to minutes. It automatically assembles evidence across a bank's core systems, evaluates activity against known typologies, and surfaces the highest-risk cases for human investigator review. BMO and Amalgamated Bank are the first development partners, with general availability planned for the second half of 2026. Crucially, client data stays within FIS-controlled infrastructure at all times; Claude functions as the reasoning layer, one step removed from the source data.
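The pattern described here has three steps: gather evidence, test it against known typologies, and rank the riskiest cases for a human. It can be sketched in a few lines. This is not FIS's or Anthropic's implementation. The three typology rules, their weights, and the field names are hypothetical. The only real-world figure is the $10,000 U.S. currency transaction reporting threshold that structuring schemes try to stay under.

```python
from dataclasses import dataclass, field
from typing import List

REPORTING_THRESHOLD = 10_000  # U.S. currency transaction report threshold, USD


@dataclass
class Case:
    account_id: str
    cash_deposits: List[float]     # deposits in the review window, USD
    avg_days_funds_held: float     # how long money stays before moving out
    high_risk_counterparty: bool
    hits: List[str] = field(default_factory=list)
    score: int = 0


def evaluate(case: Case) -> Case:
    """Score a case against hypothetical typology rules (weights are made up)."""
    case.hits, case.score = [], 0  # reset so re-evaluation gives the same result
    near_threshold = [d for d in case.cash_deposits
                      if 0.9 * REPORTING_THRESHOLD <= d < REPORTING_THRESHOLD]
    if len(near_threshold) >= 3:
        case.hits.append("structuring")
        case.score += 50
    if case.avg_days_funds_held < 2:
        case.hits.append("rapid movement of funds")
        case.score += 30
    if case.high_risk_counterparty:
        case.hits.append("high-risk counterparty")
        case.score += 20
    return case


def triage(cases: List[Case], top_n: int = 2) -> List[Case]:
    """Return the highest-scoring flagged cases for human investigator review."""
    flagged = [c for c in map(evaluate, cases) if c.score > 0]
    return sorted(flagged, key=lambda c: c.score, reverse=True)[:top_n]


queue = triage([
    Case("A-100", [9_500, 9_800, 9_900], avg_days_funds_held=1, high_risk_counterparty=False),
    Case("B-200", [500], avg_days_funds_held=30, high_risk_counterparty=True),
    Case("C-300", [], avg_days_funds_held=10, high_risk_counterparty=False),
])
for case in queue:
    print(case.account_id, case.score, case.hits)
```

The output is a ranked queue, not a decision. Closing or escalating a case stays with the investigator.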

"The future is about a trusted provider who manages the data, who governs the agents, and who stands between your customers and the AI making decisions about their money."

Stephanie Ferris, CEO, FIS — FIS Press Release, May 4, 2026

Then came the joint venture. Anthropic, Blackstone and Hellman & Friedman each contributed roughly $300 million, with Goldman Sachs at $150 million; Apollo Global Management, General Atlantic, Leonard Green, GIC and Sequoia Capital also participated. The Wall Street Journal reported the $1.5 billion total, which Anthropic hasn't formally confirmed. Whatever the exact figure, the structure is novel: a private equity-backed entity designed to forward-deploy Claude directly into the portfolios of some of the world's largest PE firms. No AI company has previously had distribution at that scale.

Enterprise demand context: Anthropic CFO Krishna Rao said at the joint venture announcement that "enterprise demand for Claude is significantly outpacing any single delivery model." Separately, CEO Dario Amodei disclosed at the May 5 event that the company had projected 10x growth internally, only to see 80x adoption instead.

10 Agents, One Mission

The ten agent templates released May 5 aren't tools. They're reference architectures. Each packages three components: skills (domain knowledge and instructions for the task), connectors (governed, real-time access to external data), and subagents (additional Claude models called in for specific sub-tasks like comparables selection or methodology checks). The architecture is designed so that a bank's compliance, risk, and engineering teams can customize the templates without touching the underlying model.
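The article doesn't describe the templates' actual file format. The Python below is only a hypothetical model of the three-part structure described above:

- skills as instruction strings;
- connectors as named data sources;
- subagents as role prompts keyed by sub-task.

The `customize` method illustrates the stated goal of letting a bank adapt a template without altering the shared original.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class AgentTemplate:
    name: str
    skills: List[str]        # domain knowledge and task instructions
    connectors: List[str]    # governed external data sources (hypothetical names)
    subagents: Dict[str, str] = field(default_factory=dict)  # sub-task -> role prompt

    def customize(self, extra_skills: Iterable[str] = (),
                  drop_connectors: Iterable[str] = ()) -> "AgentTemplate":
        """Return a firm-specific copy; the base template is left untouched."""
        dropped = set(drop_connectors)
        return replace(
            self,
            skills=self.skills + list(extra_skills),
            connectors=[c for c in self.connectors if c not in dropped],
            subagents=dict(self.subagents),
        )


pitch = AgentTemplate(
    name="pitch",
    skills=["build trading comps", "draft pitchbook"],
    connectors=["market_data", "filings", "crm"],
    subagents={"comps_selection": "Pick comparable companies and justify each."},
)
firm_pitch = pitch.customize(extra_skills=["apply house style guide"], drop_connectors=["crm"])
print(firm_pitch.connectors)  # ['market_data', 'filings']
print(pitch.connectors)       # ['market_data', 'filings', 'crm']
```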

  • Pitch Agent: Hand it a target list. Get back a comps model in Excel, a pitchbook drafted in PowerPoint, and a cover note in Outlook. A full pitch package in hours.
  • KYC Screener: Screens know-your-customer files against verified identity data from Dun & Bradstreet's Commercial Graph and D-U-N-S system for auditable onboarding.
  • Month-End Close: Handles accounting reconciliation, cross-checks entries against linked workbooks, and produces close narratives against a firm's own templates in Word.
  • Market Researcher: Tracks sector and issuer developments, synthesizes news, filings, and broker research, and flags items for credit and risk review automatically.
  • DCF / Comps Agent: Builds discounted cash flow models from filings and data feeds, audits formulas across linked workbooks, and runs sensitivity analyses in Excel.
  • Financial Crimes Agent: Via the FIS partnership, compresses AML investigations from days to minutes, assembles evidence, evaluates typologies, and surfaces high-risk cases for review.
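The DCF / Comps Agent's core arithmetic is standard corporate finance. Each forecast year's free cash flow is discounted. A terminal value is added, here using the Gordon growth formula. The sensitivity table comes from recomputing across a grid of discount and growth rates. The minimal Python sketch below uses illustrative inputs. It is not the agent's actual method, and it ignores mid-year conventions, net debt, and share counts.

```python
from typing import List, Sequence


def dcf_value(cash_flows: Sequence[float], discount_rate: float,
              terminal_growth: float) -> float:
    """Enterprise value: PV of explicit-year cash flows plus a Gordon-growth
    terminal value, both discounted to today (end-of-year convention)."""
    if not cash_flows:
        raise ValueError("need at least one forecast year")
    if discount_rate <= terminal_growth:
        raise ValueError("discount rate must exceed terminal growth")
    pv = sum(cf / (1 + discount_rate) ** t
             for t, cf in enumerate(cash_flows, start=1))
    terminal = cash_flows[-1] * (1 + terminal_growth) / (discount_rate - terminal_growth)
    return pv + terminal / (1 + discount_rate) ** len(cash_flows)


def sensitivity(cash_flows: Sequence[float], rates: Sequence[float],
                growths: Sequence[float]) -> List[List[float]]:
    """Valuation grid: one row per discount rate, one column per growth rate."""
    return [[round(dcf_value(cash_flows, r, g), 1) for g in growths] for r in rates]


forecast = [100.0, 110.0, 120.0, 128.0, 135.0]  # illustrative free cash flows
rates, growths = [0.08, 0.09, 0.10], [0.01, 0.02, 0.03]
for rate, row in zip(rates, sensitivity(forecast, rates, growths)):
    print(f"{rate:.0%}", row)
```

A quick sanity check: a flat cash flow of 100 with zero growth, discounted at 10%, should be worth exactly 100 / 0.10 = 1,000 however many explicit years are modeled.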

The Microsoft 365 integration is what makes the whole stack practically usable. Once the Claude add-ins are installed, context carries across applications. An analyst who starts a DCF model in Excel doesn't re-explain the deal when the work shifts to a PowerPoint pitchbook. The cross-application memory is a genuine workflow change, not a cosmetic one. Outlook integration is listed as "coming soon."

Data access scales to match. Claude now connects to FactSet, S&P Capital IQ, MSCI, PitchBook, Morningstar, Chronograph, LSEG, Daloopa, IBISWorld, SS&C Intralinks, Third Bridge, and Verisk, alongside internal data warehouses, research repositories, and CRMs. All connections operate under governed access controls. The question of which connectors a firm enables, and what permissions each carries, is where compliance teams will spend most of their implementation time.
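The governed-access question can be framed as a deny-by-default policy that the firm's compliance team owns. In the hypothetical sketch below, the firm grants each connector a set of scopes. Every agent request is checked against those grants, and each decision is written to an audit log. The connector, agent, and scope names are made up. This is not how Anthropic's connectors are actually configured.

```python
from typing import Dict, FrozenSet, List, Tuple


class ConnectorPolicy:
    """Deny-by-default access check with an audit trail."""

    def __init__(self, grants: Dict[str, FrozenSet[str]]):
        self._grants = dict(grants)   # connector name -> allowed scopes
        self.audit_log: List[Tuple[str, str, str, str]] = []

    def check(self, agent: str, connector: str, scope: str) -> bool:
        """Allow only scopes explicitly granted; log every decision."""
        allowed = scope in self._grants.get(connector, frozenset())
        self.audit_log.append((agent, connector, scope, "allow" if allowed else "deny"))
        return allowed


policy = ConnectorPolicy({
    "market_data": frozenset({"read"}),
    "research_repository": frozenset({"read"}),
})
print(policy.check("kyc_screener", "market_data", "read"))   # True
print(policy.check("kyc_screener", "market_data", "write"))  # False: scope not granted
print(policy.check("kyc_screener", "crm", "read"))           # False: connector not enabled
```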

Compliance note: Anthropic explicitly states in its GitHub repository documentation that all agent outputs are drafts requiring qualified human review. The agents don't execute transactions, don't approve onboarding independently, and don't write directly into books of record. Every output requires sign-off from a licensed professional.
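A "drafts until signed off" rule can be enforced in the workflow itself rather than left to procedure. Output starts as a draft and cannot be released until a licensed reviewer approves it. The Python below is a hypothetical illustration of that gate, not code from Anthropic's repository. The `licensed` flag stands in for whatever credential check a firm actually uses.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    DRAFT = "draft"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class AgentOutput:
    content: str
    status: Status = Status.DRAFT      # every agent output starts as a draft
    reviewer: Optional[str] = None


def review(output: AgentOutput, reviewer: str, licensed: bool, approve: bool) -> None:
    """Record a sign-off decision; only licensed reviewers may decide."""
    if output.status is not Status.DRAFT:
        raise ValueError("output has already been reviewed")
    if not licensed:
        raise PermissionError("sign-off requires a licensed professional")
    output.status = Status.APPROVED if approve else Status.REJECTED
    output.reviewer = reviewer


def release(output: AgentOutput) -> str:
    """Hand content downstream only after approval."""
    if output.status is not Status.APPROVED:
        raise PermissionError(f"cannot release output with status {output.status.value}")
    return output.content


memo = AgentOutput("Draft KYC summary for client 123")
review(memo, reviewer="j.doe", licensed=True, approve=True)
print(release(memo))
```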

What the Banks Are Actually Saying

The closing panel at the May 5 event was unusual for how specific it got. Goldman Sachs CIO Marco Argenti, JPMorgan Chase CIO Lori Beer, and AIG CEO Peter Zaffino each described real deployment data, not roadmaps.

Argenti outlined three sequential waves of AI adoption at Goldman. First: empowering the technology team, roughly a third of the firm, to work at what he called "a completely different pace." Second: reimagining operational processes end-to-end. Third, and most consequential long-term: using AI to improve risk and investment decisions themselves.

"This is the first time that instead of buying infrastructure, you can actually buy intelligence."

Marco Argenti, CIO, Goldman Sachs — Fortune, May 5, 2026

AIG's Zaffino shared what may be the most striking benchmark from the event: Claude, out of the box, scored 88% accuracy against expert-level claims assessors in AIG's underwriting workflow. He was careful to frame the implication correctly. "The theory is, can it get better? Yes. But that assumes that the claims expert doesn't get better," he said. The human professional isn't static either.

Beer's remarks from JPMorgan were more cautionary. The bank has built a cyberthreat model for security analysts and integrated security intelligence directly into its software development process. Her point was structural: governance and risk frameworks have to be built into the platform from day one, not retrofitted. "There are capabilities we need, platforms we need to build, agent orchestration to protect and secure... We don't measure ROI on those things. They are must-dos," she told the audience.

| Institution | Deployment Stage | Key Use Case | Notable Data Point |
|---|---|---|---|
| Goldman Sachs | Wave 2: operational redesign | Trade accounting, client onboarding, research acceleration | AI deployment scaling without proportional new hiring; COO John Waldron confirmed the bank is "scaling up without requiring much more hiring" |
| JPMorgan Chase | Production across front and back office | Cyberthreat modeling, wealth management coaching, dev tooling | Connect Coach AI product boosted advisor capacity to handle more clients per advisor |
| AIG | Production in underwriting and claims | Claims assessment, policy underwriting support | Claude scored 88% accuracy vs. expert claims assessors out of the box |
| BMO / Amalgamated Bank | Development (general availability H2 2026) | AML investigation via FIS Financial Crimes Agent | First development partners for the FIS co-designed agent |
| Commonwealth Bank of Australia | Production | Fraud prevention, customer service | CTO Rodrigo Castillo called the partnership "foundational to our strategy to become a global leader in AI innovation in banking" |
| Bridgewater (AIA Labs) | Production since 2023 | Investment Analyst Assistant: Python code generation, data visualization, financial analysis | Claude Opus 4 passed 5 of 7 levels of the Financial Modeling World Cup; 83% accuracy on complex Excel tasks |

The Junior Banker Problem

Nicholas Lin spent two and a half years on Morgan Stanley's M&A team before moving to tech investing at Singapore's sovereign wealth fund. He's now Anthropic's product lead for financial services. Speaking to Bloomberg shortly before the May 5 launch, Lin said financial AI applications are "just a few months behind" coding applications, "which we've seen massive acceleration in." Thousands of coding jobs have already disappeared.

The ten agent templates released May 5 cover the exact tasks that define a junior analyst's day: pitchbooks, comps, earnings reviews, financial model construction, valuations. A video in the product announcement showed comparable company analysis completing in seconds. It's not a subtle implication.

"A lot of these problems we're hoping to solve are just so near and dear to my heart because I spent probably 75% of my time just doing this manual data analysis, PowerPoint creation, making sure that the text boxes really match."

Nicholas Lin, Product Lead for Financial Services, Anthropic — eFinancialCareers, May 2026

One junior banker, speaking anonymously to eFinancialCareers, confirmed the internal pressure is already real. His team has stopped hiring at analyst level. He's now expected to produce more output than before, with AI tools described by managing directors as a productivity multiplier rather than a workload reducer.

The macro picture is less clear-cut. A 2026 Cambridge Judge Business School report on AI in financial services found that 24% of industry respondents expect a net reduction in roles, up from 13% in the prior three years. But 25% expect significant reskilling without large net losses, and 10% expect a net increase in jobs. The distribution of outcomes is widening, not converging.

Deloitte's research puts potential upside at $3.5 million in additional front-office revenue per employee at the top 14 global investment banks, from productivity gains of 27% to 35% via generative AI. Those gains don't require firing people. They require the same headcount doing more, faster. Whether that translates to fewer new hires or fewer existing jobs depends on whether deal volumes grow to absorb the additional capacity. Right now, Goldman Sachs has actually cut its 2026 deal count outlook to roughly 100 IPOs, down from earlier projections, signaling the demand side isn't yet expanding to match AI supply.
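The "fewer hires versus more deals" question reduces to simple arithmetic. Suppose each person's output rises by a productivity gain p and work volume grows by v. Then the staff needed scales by (1 + v) / (1 + p). The sketch below applies Deloitte's 27% to 35% range to a hypothetical team of 100. It assumes output scales linearly with headcount, which real deal teams rarely do.

```python
def headcount_needed(current: int, productivity_gain: float, volume_growth: float) -> float:
    """Staff required when each person does (1 + productivity_gain) times more
    work and total work grows by (1 + volume_growth). Assumes linear scaling."""
    return current * (1 + volume_growth) / (1 + productivity_gain)


for gain in (0.27, 0.35):
    flat = headcount_needed(100, gain, volume_growth=0.0)
    print(f"gain {gain:.0%}: {flat:.1f} people needed if deal volume stays flat")
```

With flat volumes, a 100-person team would need roughly 74 to 79 people. Volumes have to grow by the full 27% to 35% before headcount holds level, which is why a lower IPO outlook matters.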

What roles are actually at risk

  • Junior analyst and associate roles focused on modeling, formatting and data processing face the most direct pressure from agent templates.
  • KYC and AML analysts in compliance operations are the target of both the FIS Financial Crimes Agent and the standalone KYC Screener template.
  • Financial reporting roles at mid-market firms: 87% of CFOs at that tier are already turning to AI for financial reporting work, per PYMNTS research.
  • Senior and client-facing roles remain more insulated. They require contextual judgment, negotiation, and the kind of long-term relationship capital AI doesn't accumulate.

Regulators and the Mythos Shadow

Sitting behind all of Anthropic's financial services ambitions is a harder conversation that surfaced at the same May 5 event. Amodei warned publicly that Anthropic's restricted-access model, Claude Mythos Preview, has identified tens of thousands of high-severity software vulnerabilities, including nearly 300 in Firefox alone, and that there's a six-to-12-month window to patch them before adversarial AI systems catch up. Most of those vulnerabilities haven't been publicly disclosed because they remain unpatched.

An earlier Claude model found roughly 20 vulnerabilities in Firefox. Mythos found nearly 300. The scale of potential exploits has grown with each model generation. Anthropic has limited Mythos to a small number of partner companies precisely because of concerns about what criminals or adversarial nation-states could do with the same capability.

"The bad guys will exploit them if they are identified."

Dario Amodei, CEO, Anthropic — CNBC, May 5, 2026

UK financial regulators are taking the concern seriously. Officials from the Bank of England, Financial Conduct Authority and HM Treasury are holding urgent talks with banks and cybersecurity officials. The National Cyber Security Centre is involved. Major banks, insurers, and exchanges are to be warned about findings at a meeting scheduled within the fortnight following the Mythos disclosure. The matter is also set for discussion at the Cross Market Operational Resilience Group, co-chaired by the Bank of England's executive director for supervisory risk.

In the US, Treasury Secretary Scott Bessent convened Wall Street bank leaders separately to assess exposure. The dual reality is uncomfortable: the same company selling Wall Street its AI operating layer is simultaneously disclosing that AI has created a systemic vulnerability window the industry has six to 12 months to close.

Regulatory signal: The Cambridge Judge 2026 report found that 78% of surveyed regulators view AI as significant or transformative for financial supervision by 2030, and that regulators are more concerned than industry about AI concentration risk, 43% versus 28%. Regulators are also more likely than industry to place primary accountability for AI outcomes on the regulated financial institution, not on the AI vendor.

OpenAI Is Watching

Anthropic isn't operating in a vacuum. OpenAI is pursuing a similar enterprise joint venture, reportedly raising $4 billion against a $10 billion valuation to create new channels for large-scale enterprise AI deals, per Bloomberg. The competitive dynamic is straightforward: both companies need enterprise contracts to justify the capital expenditures going into compute. Consumer subscriptions don't generate the kind of multi-year, high-margin commitments that make frontier model development economically viable at scale.

Anthropic's positioning in financial services rests on three differentiators: safety-first reputation, coding performance via Claude Code, and the depth of its vertical integration, specifically the combination of pre-built agents, Microsoft 365 plugins, and a growing ecosystem of data connectors. OpenAI's strength is brand recognition and developer mindshare. Which matters more to a Fortune 500 CIO depends on whether the bank's primary use case is customer-facing product development or back-office workflow automation.

| Dimension | Anthropic / Claude | OpenAI / GPT |
|---|---|---|
| Finance-specific benchmark | 64.37% on Vals AI Finance Agent (Claude Opus 4.7, category leader) | Not disclosed on Vals AI Finance Agent as of publication |
| Pre-built finance agents | 10 templates released May 2026; marketplace available now | Finance-focused tools via plugins; no equivalent marketplace |
| Financial data partnerships | FactSet, Moody's, Morningstar, PitchBook, S&P Capital IQ, MSCI, LSEG, Daloopa, Dun & Bradstreet, Verisk, Third Bridge | Partnerships in progress; fewer publicly confirmed financial-data connectors |
| Enterprise joint venture | $1.5B JV with Blackstone, H&F, Goldman Sachs (announced May 4, 2026) | Reportedly pursuing similar structure; $4B raise against $10B valuation |
| Systemic risk posture | Mythos vulnerability disclosure; proactive regulator engagement | No equivalent public disclosure as of publication |

The market data partnerships are where Anthropic's moat is deepest in the near term. A Claude agent that can natively reason over Moody's credit ratings, PitchBook private market data, and a firm's own internal research repository simultaneously is operationally different from a general-purpose chatbot with document upload. That kind of grounded, auditable reasoning is what enterprise risk officers need before signing off on production deployment.

Frequently Asked Questions

What are Anthropic's Claude finance agents?

Anthropic's Claude finance agents are 10 pre-built AI agent templates released in May 2026 for tasks including pitchbook creation, KYC screening, financial modeling, month-end close, and market research. Each runs as a plugin in Claude Cowork or Claude Code, and connects to major financial data providers like FactSet and Moody's. All outputs require human review before use.

Which banks are using Anthropic's Claude?

As of May 2026, Claude is in production at JPMorgan Chase, Goldman Sachs, Citi, AIG, Visa, Commonwealth Bank of Australia, and Bridgewater (via AIA Labs). BMO and Amalgamated Bank are in development with the FIS Financial Crimes AI Agent, with general availability planned for H2 2026.

Will AI replace investment banking jobs?

The short-term evidence shows hiring freezes at junior analyst levels rather than mass layoffs. A 2026 Cambridge report found 24% of financial industry respondents expect net job reductions, up from 13% in recent prior years. The jobs most at risk are modeling, formatting, KYC screening, and AML review. Senior client-facing roles are more insulated from near-term automation.

What is the Vals AI Finance Agent benchmark?

The Vals AI Finance Agent benchmark tests AI models on realistic financial analysis tasks, including equity research, financial modeling, and data synthesis. Claude Opus 4.7 currently leads the benchmark with a score of 64.37%. It is one of the primary publicly available evaluation frameworks for comparing AI performance on financial work.

What is the FIS Financial Crimes AI Agent?

The FIS Financial Crimes AI Agent is a tool co-designed by Anthropic and FIS that compresses anti-money-laundering alert investigations from days to minutes. It assembles evidence across a bank's core systems, evaluates transactions against known fraud typologies, and presents high-risk cases for human investigator review. Client data remains within FIS infrastructure throughout.

What is the Anthropic, Blackstone, and Goldman Sachs joint venture?

The joint venture, announced May 4, 2026, is a private equity-backed AI services company designed to embed Claude across hundreds of enterprises via forward-deployed engineering teams. Anthropic, Blackstone, and Hellman & Friedman each contributed roughly $300 million; Goldman Sachs contributed $150 million. Those four contributions add up to about $1.05 billion, short of the $1.5 billion total reported by the WSJ, a figure Anthropic has not formally confirmed.

How does Claude Opus 4.7 differ from earlier Claude versions for financial work?

Claude Opus 4.7 is Anthropic's current flagship for financial tasks and leads the Vals AI Finance Agent benchmark. In FundamentalLabs' deployment on Excel work, the earlier Claude Opus 4 passed 5 of 7 levels in the Financial Modeling World Cup and scored 83% accuracy on complex Excel tasks. Older Claude models were capable but scored lower across the same task categories.

What regulatory concerns exist around AI in financial services?

UK regulators from the Bank of England, FCA, and HM Treasury are currently assessing risks linked to Anthropic's Mythos model, which has identified tens of thousands of software vulnerabilities. The Cambridge Judge 2026 report found 78% of regulators view AI as transformative for financial supervision by 2030, with regulators more concerned than industry about AI concentration risk and accountability gaps.

The Operating Layer Play

Anthropic's financial services push makes sense as a business strategy only if you accept one underlying premise: that enterprise AI is a winner-take-most market, not a diversified one. If banks end up with one primary AI reasoning layer embedded across their Excel models, their compliance workflows, their AML systems, and their pitchbook assembly, then the company that occupies that layer collects the compound interest on every deal, every investigation, every quarter-end close. That's the prize Anthropic is building toward.

The joint venture with Blackstone and Goldman Sachs is the clearest signal of that ambition. Private equity firms have portfolio companies in manufacturing, healthcare, logistics, retail, and real estate. A forward-deployed Claude inside those companies, deployed through a PE-backed services firm, doesn't need Anthropic to sell each deal individually. The distribution builds itself.

What remains unresolved is the regulatory question. Anthropic has been unusually forthcoming about AI-generated vulnerabilities and systemic risks, a posture that distinguishes it from competitors but also raises the question of whether the product and the risk can be cleanly separated. You can't sell a bank on Claude as its compliance operating layer while simultaneously disclosing that AI has created a vulnerability window that bad actors may exploit before the industry patches it. Both things are true, and financial institutions are not historically comfortable operating in that kind of ambiguity.

The workforce question compounds the tension. Anthropic's own product lead describes the agent templates as solving problems that were "near and dear" to his heart as a junior banker. That's an honest framing. It's also a company describing its product as a direct replacement for early-career financial labor, in public, while pitching that same product to the firms that employ that labor. The banks aren't going to stop buying. But the junior analysts already on payroll are watching what their managing directors do next.

Watch For
01 FIS Financial Crimes AI Agent general availability, planned H2 2026. If BMO and Amalgamated Bank publish outcome data on AML investigation times and false-positive rates, it will be the first hard evidence of whether the agent performs in production as claimed.
02 Bank of England / FCA guidance on AI concentration risk in financial services, expected in the next regulatory cycle. The Cross Market Operational Resilience Group meeting within the fortnight of the Mythos disclosure will likely produce formal risk guidance for UK-regulated institutions.
03 Junior analyst headcount data at tier-one banks through Q3 2026 earnings calls. Goldman Sachs COO John Waldron has already confirmed the bank is scaling without proportional new hiring. Whether that trend appears in disclosed headcount figures will determine how quickly the workforce conversation moves from anecdote to data.
04 OpenAI's competing enterprise joint venture structure, reportedly being finalized against a $10 billion valuation. Its composition and distribution model will define whether Anthropic's $1.5 billion JV retains a structural advantage in PE-backed enterprise distribution or faces a direct counteroffer.
Stay ahead of the curve. More on AI and enterprise finance at NeuralWired.

Tuesday, 19 May 2026

Photonic AI Chip Breakthrough: Penn's 4 fJ Switch (2026)


Penn's 4 fJ Light Switch Could Finally Fix Photonic AI's Hardest Problem

A team at the University of Pennsylvania has demonstrated all-optical switching at just 4 femtojoules, targeting the nonlinear activation bottleneck that has long kept photonic chips out of serious AI workloads.

The most power-hungry moment in a photonic neural network isn't moving data. It's the split second when the system has to decide whether a signal matters. That decision, called a nonlinear activation, has stubbornly required converting light back to electricity and then back again, burning time and energy at every step. A new result from Penn may have found a way around it.

Published in Physical Review Letters on April 10, 2026, the work by Bo Zhen's group at Penn demonstrates strongly nonlinear nanocavity exciton-polaritons in a gate-tunable monolayer semiconductor. The device switches optically at roughly 4 femtojoules per operation, on picosecond timescales, without converting the optical signal into an electrical one and back. That combination is genuinely unusual.

It's not a chip. It's not a product announcement. What it is, according to the paper and Penn's own announcement, is a materials-level demonstration that the specific kind of nonlinearity AI needs can happen in a photonic device at energies low enough to matter for real computing. The distance between that result and a working AI accelerator is still significant. But the bottleneck it addresses has been visible for years, and no one had closed it at this efficiency before.


The Physics Behind the Breakthrough

Exciton-polaritons are hybrid light-matter quasiparticles that form when photons inside a cavity couple so strongly to excitons (bound electron-hole pairs) in a semiconductor that the two can no longer be described separately. They behave partly like light (they move fast and interact only weakly with the crystal lattice) and partly like matter (they can interact with each other). That second quality is the point.

The Penn team used a single-atom-thick layer of molybdenum diselenide (MoSe2) as the semiconductor, embedded in a photonic crystal nanocavity. The gate-tunable part means the researchers can dial the coupling strength electrically, giving them precise control over when the device is in strong-coupling mode and when it isn't. That control is what makes switching possible.

Technical note: The nonlinearity in this device comes from exciton dephasing at high polariton populations. As excitation increases, more excitons interact and lose phase coherence faster, which weakens the coupling between excitons and photons. Once that coupling drops below a threshold, the device exits the strong-coupling regime entirely. That transition from strong to weak coupling is what produces the switching behavior described in the arXiv preprint.
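The mechanism in the technical note can be sketched with the textbook coupled-oscillator model of strong coupling. This is a minimal illustration under stated assumptions, not the paper's model: the exciton and cavity modes are taken as resonant, dephasing is represented as an exciton linewidth that grows linearly with excitation (`beta` is a hypothetical rate), the strong-coupling test uses one common convention (splitting larger than the mean linewidth), and all parameter values are toy numbers in arbitrary energy units.

```python
import cmath


def polariton_energies(e0, g, gamma_x, gamma_c):
    """Complex eigenvalues of a resonant exciton-cavity coupled-oscillator model.

    Both modes sit at energy e0; gamma_x and gamma_c are their FWHM
    linewidths, entered as -i*gamma/2 on the diagonal; g is the coupling.
    """
    mean = e0 - 1j * (gamma_x + gamma_c) / 4
    root = cmath.sqrt(g ** 2 - ((gamma_x - gamma_c) / 4) ** 2)
    return mean + root, mean - root  # upper and lower polariton branches


def rabi_splitting(g, gamma_x, gamma_c):
    upper, lower = polariton_energies(0.0, g, gamma_x, gamma_c)
    # Real part of the gap; it collapses to zero once damping dominates.
    return (upper - lower).real


def is_strongly_coupled(g, gamma_x, gamma_c):
    # One common convention: the splitting must exceed the mean linewidth.
    return rabi_splitting(g, gamma_x, gamma_c) > (gamma_x + gamma_c) / 2


def switching_population(g, gamma_x0, gamma_c, beta, n_max=1000):
    """First integer excitation level at which strong coupling is lost.

    Hypothetical dephasing model: gamma_x = gamma_x0 + beta * n.
    """
    for n in range(n_max + 1):
        if not is_strongly_coupled(g, gamma_x0 + beta * n, gamma_c):
            return n
    return None


# Toy parameters in arbitrary energy units (not values from the paper).
print(switching_population(g=5.0, gamma_x0=2.0, gamma_c=2.0, beta=0.5))
```

With these toy numbers the script prints 24: below that excitation level the two polariton branches stay resolved, and from there on the splitting is swamped by linewidth, which is the qualitative switching behavior the note describes.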

The key word in the paper's title is "strongly." Previous demonstrations of polariton-based switching existed, but they either required cryogenic temperatures, operated at much higher energies, or lacked the gate-tunability needed to integrate into a real device stack. Penn frames the new result as working at more accessible temperatures, in a structure designed with practical integration in mind.

"The platform works by coupling photons with electrons in an atomically thin semiconductor so light can interact strongly enough to perform signal switching."

Bo Zhen, Jin K. Lee Presidential Associate Professor, University of Pennsylvania — Penn Today

The photonic crystal nanocavity matters too. Unlike earlier bulk optical approaches, the nanocavity concentrates the electromagnetic field into a tiny mode volume, amplifying the light-matter interaction enough to form polaritons at excitation levels far below what previous platforms needed. Smaller mode volume, lower switching energy. That's the chain of logic that gets you to 4 fJ.
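One simplified way to see why a smaller mode volume lowers the switching energy: if the nonlinearity kicks in at a fixed polariton density, the number of photons needed scales with the area the mode occupies. The sketch below assumes exactly that; the threshold density, the mode areas, and the 1.6 eV photon energy (roughly the MoSe2 exciton energy) are illustrative assumptions, not figures from the Penn paper.

```python
EV_TO_J = 1.602176634e-19  # joules per electronvolt (exact SI value)


def switching_energy_fj(threshold_density_per_um2, mode_area_um2, photon_energy_ev):
    """Optical energy (fJ) needed to reach a threshold polariton density.

    Simplified scaling assumption: switching needs a fixed areal density,
    so the photon count grows linearly with the mode area.
    """
    photons = threshold_density_per_um2 * mode_area_um2
    return photons * photon_energy_ev * EV_TO_J * 1e15  # joules -> femtojoules


# Hypothetical inputs: same threshold density, two different mode areas.
large_mode = switching_energy_fj(1000.0, 10.0, 1.6)
small_mode = switching_energy_fj(1000.0, 1.0, 1.6)
print(f"10 um^2 mode: {large_mode:.2f} fJ, 1 um^2 mode: {small_mode:.2f} fJ")
```

Under this assumption, a tenfold smaller mode area cuts the switching energy tenfold. Real devices add effects such as cavity quality factor and coupling strength that this sketch ignores.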

Why 4 Femtojoules Matters

Four femtojoules is 4 × 10⁻¹⁵ joules. To calibrate: a typical electronic transistor switch in a modern logic circuit consumes between 1 and 100 femtojoules per operation, depending on process node, voltage, and load. A state-of-the-art CMOS switch in a 3nm process operates at around 1 to 5 fJ. Penn's optical switch operates in that same range.

| Switching Mechanism | Typical Energy per Op | Operating Speed | Temperature |
|---|---|---|---|
| CMOS transistor (3nm) | 1-5 fJ | Sub-nanosecond | Room temp |
| Mach-Zehnder modulator (Si photonics) | 100-1000 fJ | Sub-nanosecond | Room temp |
| Earlier polariton switches | 10-100 fJ (cryogenic) | Picoseconds | Cryogenic |
| Penn MoSe2 nanocavity (2026) | ~4 fJ | Picoseconds | Accessible temp |
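Two quick calculations put the 4 fJ figure and the comparison table in perspective: how many photons 4 fJ represents, and how each row's energy range compares with it. The 1.6 eV photon energy is an assumption (roughly the MoSe2 exciton energy, not a value stated by Penn); the energy ranges are copied from the table.

```python
EV_TO_J = 1.602176634e-19  # joules per electronvolt (exact SI value)
PENN_FJ = 4.0  # reported switching energy per operation

# Energy per operation in fJ, copied from the comparison table.
TABLE_RANGES_FJ = {
    "CMOS transistor (3nm)": (1.0, 5.0),
    "Mach-Zehnder modulator (Si photonics)": (100.0, 1000.0),
    "Earlier polariton switches": (10.0, 100.0),
}


def photons_in_pulse(energy_fj, photon_energy_ev):
    """Number of photons carried by a pulse of the given energy."""
    return energy_fj * 1e-15 / (photon_energy_ev * EV_TO_J)


def relative_range(energy_range_fj, reference_fj=PENN_FJ):
    """Express a (low, high) energy range as multiples of the reference."""
    low, high = energy_range_fj
    return low / reference_fj, high / reference_fj


# Assumed photon energy of 1.6 eV, roughly the MoSe2 exciton energy.
print(f"Photons in {PENN_FJ:g} fJ at 1.6 eV: {photons_in_pulse(PENN_FJ, 1.6):,.0f}")
for name, energy_range in TABLE_RANGES_FJ.items():
    low, high = relative_range(energy_range)
    print(f"{name}: {low:g}x to {high:g}x the Penn figure")
```

Under that assumption, 4 fJ corresponds to roughly 15,600 photons per switching event, and the earlier optical approaches in the table sit roughly 2.5x to 250x above the Penn figure.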

The comparison to electronic switches matters because photonic AI has always had a problem with the conversion steps. You can transmit data in light at very low energy. But the moment you need a nonlinear operation, most architectures have converted back to electronics, done the computation, and then re-encoded into light. Each conversion burns energy and adds latency. If a photonic nonlinear element can operate at energies competitive with the electronics it replaces, the conversion penalty becomes avoidable.

Important caveat: The 4 fJ figure describes a single switching event in a lab demonstration, not a full inference workload or training run. Energy per operation in a deployed system depends on many additional factors: coupling losses, control overhead, memory access, and system architecture. The number shows what the switching event itself can achieve, which makes it a floor for any system built on it, since those factors only add energy; it is not a projection of what a chip would consume.
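The caveat can be made concrete with a toy per-operation energy budget. Every input here is a placeholder chosen for illustration, not a measured or projected value; the only structural assumption is that optical power lost before the device still has to be generated, so the switching energy is divided by the coupling efficiency before the control and memory terms are added.

```python
from dataclasses import dataclass


@dataclass
class EnergyBudget:
    """Per-operation energy terms in femtojoules (all hypothetical inputs)."""

    switch_fj: float            # intrinsic switching event, e.g. the 4 fJ lab figure
    coupling_efficiency: float  # fraction of launched light reaching the device, in (0, 1]
    control_fj: float           # gate drive and signal routing per operation
    memory_fj: float            # weight/activation memory access per operation

    def total_fj(self) -> float:
        if not 0.0 < self.coupling_efficiency <= 1.0:
            raise ValueError("coupling_efficiency must be in (0, 1]")
        # Light lost before the device must still be generated upstream.
        optical = self.switch_fj / self.coupling_efficiency
        return optical + self.control_fj + self.memory_fj


# Placeholder numbers: a 4 fJ switch, 50% coupling efficiency,
# 10 fJ of gate-control overhead and 20 fJ of memory access per op.
budget = EnergyBudget(switch_fj=4.0, coupling_efficiency=0.5, control_fj=10.0, memory_fj=20.0)
share = budget.switch_fj / budget.total_fj()
print(f"total: {budget.total_fj():g} fJ, switching share: {share:.0%}")
```

With these placeholders the intrinsic switch is a small fraction of the total, which is the point of the caveat: the headline figure is a floor, and the other terms decide what a chip actually consumes.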

Still, floors like this determine whether a path is worth pursuing. Before results like this one, the best demonstrated energy for optical nonlinear switching was high enough that it was hard to argue photonics could match electronics on activation energy. That skepticism is harder to sustain now.

The AI Hardware Connection

Neural networks are, at their mathematical core, chains of matrix multiplications and nonlinear functions. Photonic chips have been excellent at the first part for years. Light traveling through interference patterns and beam splitters can implement matrix-vector products almost passively, with very low energy consumption per multiply-accumulate operation. That's why companies like Lightmatter and Luminous Computing have attracted serious investment: the linear math case for optical computing is real.
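The "almost passive" linear math can be illustrated with a basic building block of many photonic matrix processors, the Mach-Zehnder interferometer (MZI). The sketch below models an ideal, lossless MZI as a 2x2 complex matrix built from two 50:50 couplers and two phase shifters; the phase convention is one of several used in the literature, and real devices add loss and fabrication error. Meshes of such elements can be programmed to implement larger unitary matrices.

```python
import cmath
import math


def matmul2(a, b):
    """Multiply two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def mzi(theta, phi):
    """Transfer matrix of an ideal lossless MZI (one common convention).

    Order of elements seen by the light: phase phi on the top arm, a 50:50
    coupler, internal phase theta on the top arm, then a second 50:50 coupler.
    """
    s = 1 / math.sqrt(2)
    coupler = [[s, 1j * s], [1j * s, s]]

    def phase(angle):
        return [[cmath.exp(1j * angle), 0], [0, 1]]

    return matmul2(coupler, matmul2(phase(theta), matmul2(coupler, phase(phi))))


def apply(matrix, fields):
    """Matrix-vector product on complex field amplitudes: the linear step."""
    return [sum(matrix[i][k] * fields[k] for k in range(2)) for i in range(2)]


fields_in = [1.0, 0.5j]
fields_out = apply(mzi(theta=0.7, phi=0.3), fields_in)
power_in = sum(abs(f) ** 2 for f in fields_in)
power_out = sum(abs(f) ** 2 for f in fields_out)
print(f"power in: {power_in:.6f}, power out: {power_out:.6f}")
```

Because the device is a fixed linear map on the field amplitudes, it conserves total power and performs its multiply without any gain element. What it cannot do is apply a threshold, which is the nonlinear gap discussed next.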

The nonlinear case has been the stumbling block. Every activation function, every threshold, every sigmoid or ReLU in a neural network requires a nonlinear element. In a hybrid optoelectronic system, that means a photodetector, an amplifier, a modulator, and a laser driver, all chained together. The overhead compounds with depth. A 50-layer network goes through 50 rounds of conversion. At scale, that's the dominant cost.
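The compounding is simple arithmetic: activation energy per forward pass grows as depth times width times energy per activation. The sketch below compares a hybrid optical-electrical-optical (O-E-O) activation with an all-optical one. The per-stage O-E-O energies and the layer width are hypothetical placeholders, not measurements; the all-optical figure reuses the 4 fJ switching energy and, like that figure, ignores system overhead.

```python
# Hypothetical per-stage energies (fJ) for one O-E-O activation.
# Illustrative placeholders only, not measured values.
OEO_STAGES_FJ = {
    "photodetector": 50.0,
    "amplifier": 200.0,
    "modulator": 500.0,
    "laser_driver": 250.0,
}
ALL_OPTICAL_FJ = 4.0  # Penn's single-switch figure, excluding system overhead


def activation_energy_fj(layers, activations_per_layer, energy_per_activation_fj):
    """Total activation energy for one forward pass, in femtojoules."""
    return layers * activations_per_layer * energy_per_activation_fj


oeo_per_activation = sum(OEO_STAGES_FJ.values())
width = 1024  # hypothetical activations per layer
for depth in (1, 10, 50):
    hybrid = activation_energy_fj(depth, width, oeo_per_activation)
    optical = activation_energy_fj(depth, width, ALL_OPTICAL_FJ)
    # 1 nJ = 1e6 fJ
    print(f"{depth:>2} layers: hybrid {hybrid / 1e6:.2f} nJ, all-optical {optical / 1e6:.4f} nJ")
```

Both totals scale linearly with depth, so the gap between them is set entirely by the per-activation energy; shrinking that term is what an optical nonlinear element would buy.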

Linear Ops (Solved)

Photonic matrix multiplications via interferometers are already highly efficient. This is the established strength of optical computing architectures.

Nonlinear Ops (The Gap)

Activation functions require nonlinearity. Until now, achieving this optically at competitive energies has been the key unsolved problem for all-optical neural nets.

🔬

Penn's Contribution

A gate-tunable polariton switch that operates at 4 fJ provides a credible materials-level route to all-optical nonlinear activation at competitive energy.

🔗

Penn's 2025 Context

Penn had already demonstrated a programmable photonic chip for nonlinear neural networks in 2025, making this a deeper materials advance, not a first concept.

Penn's announcement frames the result specifically around this gap. According to the EurekAlert press release, one application target is processing camera data directly on a photonic chip, without round-tripping through digital electronics. That's a concrete use case where the latency and energy of optoelectronic conversion is genuinely a systems problem, not just a theoretical concern.

"Many photonic AI chips still need electronic conversion for nonlinear activation steps. This result is meant to reduce that bottleneck."

Research summary, University of Pennsylvania — Physical Review Letters, April 2026

From Lab Bench to Published Science

The research didn't appear overnight. The preprint landed on arXiv in November 2024, meaning the underlying measurements were complete well before the journal publication. Peer review at Physical Review Letters took roughly five months, which is fairly typical for a result of this type.

  • November 24, 2024: Preprint posted to arXiv. Title confirms the MoSe2 monolayer architecture and the all-optical switching claim at the femtojoule scale.
  • April 10, 2026: Paper published in Physical Review Letters (DOI: 10.1103/gc15-qsvf). PubMed indexing confirms peer review completion.
  • April 22, 2026: Penn Today publishes "Making 'light' work of computing," contextualizing the result for a general technical audience.
  • May 14, 2026: EurekAlert press distribution amplifies the AI angle and the camera-chip application case.
  • May 18, 2026: ScienceDaily republication extends the reach to science-adjacent audiences.

The publication in Physical Review Letters matters for credibility. PRL publishes roughly 3,000 letters per year across all of physics, with an acceptance rate under 25%. The review process for a result claiming device-level nonlinear switching at femtojoule energies would have required detailed scrutiny of the measurement methodology, the energy calibration, and the claims about the physical mechanism. The fact that it cleared that bar doesn't make it a finished technology, but it does mean the core numbers survived independent expert review.

What Still Stands in the Way

The skeptical reading of this result is both fair and important. A single device switching at 4 fJ in a well-controlled lab environment is not the same as a manufacturable nonlinear element in a deployed photonic AI chip. The gap between those two things involves several distinct engineering challenges, none of which the Penn paper claims to solve.

2D Material Yield and Uniformity

MoSe2 monolayers are grown or exfoliated, not printed from a mask like silicon. At production scale, getting consistent coupling quality across thousands or millions of nanocavities on a single die is an unsolved manufacturing problem. Defects in atomically thin materials produce wildly variable device performance.
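A simple yield model shows why uniformity across thousands of nanocavities is such a hard requirement. The sketch assumes defects strike devices independently and that a circuit needs every device in spec, a deliberately pessimistic simplification (real designs can add redundancy or per-device calibration); the per-device yields and array sizes are hypothetical.

```python
def array_yield(per_device_yield, n_devices):
    """Probability that all n_devices meet spec, assuming independent defects."""
    if not 0.0 <= per_device_yield <= 1.0:
        raise ValueError("per_device_yield must be between 0 and 1")
    return per_device_yield ** n_devices


# Hypothetical per-device yields and array sizes.
for per_device in (0.99, 0.999, 0.9999):
    results = ", ".join(
        f"{n:,} devices: {array_yield(per_device, n):.3g}" for n in (100, 1_000, 10_000)
    )
    print(f"per-device yield {per_device}: {results}")
```

Even at 99.9% per device, a 1,000-device array comes out fully working only about 37% of the time under this model, since 0.999^1000 is close to e^-1.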

Silicon Photonics Integration

The dominant photonic chip platform globally is silicon photonics, built on CMOS-compatible fabs. MoSe2 on a photonic crystal nanocavity is not natively compatible with that stack. Hybrid integration is possible in principle, but it adds process steps, reduces yield, and complicates packaging.

Control Electronics Overhead

The gate-tunable architecture requires electrical control signals to set the coupling condition. That control circuitry consumes power and adds complexity. The 4 fJ switching energy figure doesn't include the overhead of generating and routing those gate signals at scale.

Operating Conditions

The paper doesn't specify room-temperature operation explicitly in the abstract. Earlier polariton results required cryogenic cooling, which would make commercial deployment impractical. Penn's framing suggests accessibility, but independent replication under varied conditions will be needed before that's confirmed.

Perspective: A 2022 analysis in Physical Review Applied on photonic neural network energy efficiency found that switching energy figures from individual device demonstrations often don't account for system-level overhead, interconnect losses, and control costs. The field's history includes several "breakthrough" devices that looked impressive in isolation but didn't scale to useful system architectures.

Where This Fits in Photonic Computing

Photonic computing has had an interesting trajectory. The basic idea has been around since the 1980s, went through a hype cycle in the late 2010s alongside the first wave of AI hardware investment, and has since stratified into a few distinct camps: analog optical matrix processors (Lightmatter, Luminous), optical interconnects for data centers (Ayar Labs, Celestial AI), and fundamental physics research into all-optical computation (largely academic).

Penn's work sits firmly in the third category, with implications for the first. The 2025 Penn photonic chip for nonlinear neural networks was a system-level demonstration that this research group is thinking about practical architectures, not just device physics. The 2026 PRL paper goes one level deeper, to the materials and mechanism question: what physical system can provide the nonlinearity at the energy scale that would actually help?

That's a useful sequence. System-level work identifies the problem precisely; materials-level work attacks the root cause. The risk in photonic computing historically has been inverting that order, demonstrating clever materials without knowing where they fit in a real architecture. Penn appears to be working in the right order.

The broader field is watching. The nonlinear element problem isn't unique to Penn's approach. Other groups are pursuing phase-change materials, carrier-injection-based silicon modulators, and nano-optomechanical effects. Each has different tradeoffs on speed, energy, and manufacturability. Penn's polariton approach is now among the most energy-efficient demonstrations on record, which changes the competitive landscape for that specific figure of merit.

Frequently Asked Questions

What are exciton-polaritons and why do they matter for computing?

Exciton-polaritons are hybrid light-matter quasiparticles that form when photons couple strongly to electron-hole pairs (excitons) in a semiconductor. They inherit properties from both light and matter, including the ability to interact with each other. That interaction is what enables nonlinear optical behavior, the key function needed for neural network activation layers.

How does 4 femtojoules compare to existing AI chip energy use?

Modern CMOS transistors in leading-edge processes switch at roughly 1 to 5 fJ per operation. Penn's 4 fJ optical switch is competitive with that figure, which is significant because earlier optical switching approaches typically consumed roughly 10 to 1,000 fJ per switch, about 2.5 to 250 times more. The comparison only holds for the switching event itself, not full system power.

Does this mean photonic AI chips can replace GPUs?

Not yet, and not directly. This result addresses one specific bottleneck in photonic computing, the nonlinear activation function, at the device physics level. A complete AI accelerator requires memory, control logic, interconnects, and a manufacturable process. The Penn work is a necessary but not sufficient step toward that larger goal.

What is a photonic crystal nanocavity?

A photonic crystal nanocavity is a precisely engineered structure with a periodic pattern of holes or features that traps and concentrates light in a tiny volume. By reducing the mode volume, it strengthens the interaction between light and any material inside, making effects like strong exciton-photon coupling achievable at much lower optical power levels.

Who is Bo Zhen and what group did this work?

Bo Zhen is the Jin K. Lee Presidential Associate Professor at the University of Pennsylvania. His group works at the intersection of photonics and quantum materials. The 2026 Physical Review Letters paper is part of a broader research program that also produced a programmable photonic chip demonstration in 2025, establishing the group as a leader in applied photonic computing research.

What semiconductor material is used in the Penn device?

The device uses a monolayer of molybdenum diselenide (MoSe2), a transition metal dichalcogenide. In its single-atom-thick form, MoSe2 has strong light-matter interaction properties that bulk semiconductors lack. The monolayer is placed inside a photonic crystal nanocavity and coupled via an electrostatic gate that allows researchers to tune the coupling strength.

What are the main barriers before this technology reaches commercial AI hardware?

The main barriers are: consistent manufacturing of atomically thin MoSe2 at scale, integration with standard silicon photonics fabrication lines, control electronics overhead not captured in the switching energy figure, and verification of room-temperature operating conditions. Each is a meaningful engineering challenge requiring dedicated research programs.

How long has photonic computing been researched as an AI hardware approach?

Optical computing concepts date to the 1980s. The modern wave of interest in photonic AI hardware accelerated around 2017 to 2019, coinciding with the first deep learning hardware boom. Since then, companies like Lightmatter and academic groups at MIT, Stanford, and Penn have focused on specific solvable subproblems, with nonlinear activation being a persistent open question until recently.

What Comes Next

The Penn result doesn't collapse the distance between a lab device and a commercial AI chip. It does something more modest and more durable: it removes one item from the list of reasons that distance seemed impassable. The nonlinear activation problem at the physics level now has a credible, peer-reviewed answer at femtojoule energies. That's not the same as a product, but it's a prerequisite for one.

The photonic computing field's history is littered with demonstrations that went nowhere because they solved isolated device problems without addressing the system architecture around them. Penn's sequential research program, from nonlinear neural network chips to the deeper materials question, suggests a group that understands that failure mode. Whether the MoSe2 nanocavity platform survives contact with manufacturing reality is the next test. That test will happen in fabs, not in physics journals.

For AI hardware, the more immediate implication isn't about training frontier models on light. It's about inference at the edge. Camera chips, sensor arrays, robotics, and medical imaging all involve scenarios where processing speed and power consumption matter more than raw throughput, and where the cost of optoelectronic conversion is a real design constraint. If Penn's platform can be integrated into those architectures even partially, the payoff starts before any general-purpose optical GPU exists.

Watch For
01 Independent replication of the 4 fJ figure at room temperature, confirming operating conditions that matter for commercial viability. Expect preprints from competing groups within 12 to 18 months.
02 Penn or a partner fab demonstrating multi-device arrays with consistent coupling performance, the minimum threshold for any useful photonic circuit beyond single-device demonstrations.
03 Photonic AI chip companies (Lightmatter, Celestial AI, others) publicly addressing the nonlinear element strategy in their roadmaps. This result changes the viable options for how they architect activation layers.
04 DARPA or DOE program announcements targeting 2D-material photonic integration. Results at this energy level typically attract defense and national lab funding within 1 to 2 years of peer-reviewed publication.
Stay ahead of the curve. More on photonic computing and AI hardware at NeuralWired.
