
Microsoft Introduced Critique, A New Multi-Model Deep Research System In M365 Copilot

2026/04/06 14:00
6 min read

Microsoft has introduced Critique, a new multi-model deep research system inside Researcher, the deep research agent in Microsoft 365 Copilot, as part of a broader push to make Copilot feel more dependable for serious knowledge work instead of just fast drafting. 

According to Microsoft, Critique is designed for complex research tasks and works by splitting the job into two parts: one model handles planning, retrieval, synthesis, and drafting, while a second model reviews and refines the output before the final report is produced. Microsoft says the system uses models from frontier labs including OpenAI and Anthropic, and that it is available now through the company’s Frontier program. 

Reuters reported that in Critique’s current setup, OpenAI’s GPT generates the response and Anthropic’s Claude reviews it for accuracy and quality before the answer reaches the user. Microsoft has also said it wants this workflow to become bi-directional later on, allowing models to review each other in both directions. 

What Critique actually does inside Microsoft 365 Copilot

Microsoft’s own description makes it clear that Critique is not just a cosmetic feature or a new button slapped onto Copilot. It works inside Researcher in Microsoft 365 Copilot and is built for deeper tasks where getting it right matters just as much as getting it done fast. One model does the digging and drafts the report, while the second steps in like an editor: checking the facts, sharpening the structure, and helping turn the draft into a more reliable final piece.

Microsoft says the whole idea is to separate generation from evaluation, rather than asking one model to brainstorm, write, fact-check, and polish its own work all at once. That distinction matters because a lot of AI failure comes from exactly that one-model bottleneck. When a single system is asked to do everything, it can produce something that looks polished while quietly missing gaps, overreaching on claims, or leaning on weak evidence. 

Microsoft says Critique’s review layer is built around rubric-based evaluation, with attention to source reliability, report completeness, and strict evidence grounding. In plain English, the second model is there to ask whether the draft actually answered the question, whether the sourcing is solid, and whether the final narrative is supported instead of merely sounding confident. 
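The workflow described above can be pictured as a small generate-then-critique loop. The sketch below is purely illustrative, not Microsoft's implementation: the `generator` and `reviewer` functions are stand-ins for what would be calls to two separate frontier models, and the rubric items are paraphrased from the article.

```python
# Illustrative sketch of a two-model "generate, then critique" loop.
# Both model functions are stubs; a real system would call separate
# frontier models (e.g. one for drafting, one for review).

RUBRIC = [
    "answers_the_question",   # did the draft address the task?
    "sources_reliable",       # are the citations from solid sources?
    "claims_grounded",        # is every claim tied to evidence?
]

def generator(task, feedback=None):
    """Stub drafting model: plans, retrieves, synthesizes, drafts."""
    draft = f"DRAFT for: {task}"
    if feedback:
        draft += " | revised to address: " + ", ".join(feedback)
    return draft

def reviewer(draft):
    """Stub critique model: return the rubric items the draft fails."""
    # A real reviewer would score each rubric item with a model call;
    # here we pretend the first draft always has a grounding gap.
    return [] if "revised" in draft else ["claims_grounded"]

def deep_research(task, max_rounds=2):
    feedback = None
    for _ in range(max_rounds):
        draft = generator(task, feedback)
        feedback = reviewer(draft)
        if not feedback:          # reviewer satisfied: ship the report
            return draft
    return draft                  # best effort after max_rounds

report = deep_research("Compare vendor SLAs")
```

The key design point, matching Microsoft's framing, is that evaluation is a separate function from generation, so the reviewer's rubric failures feed back into a revision rather than the drafting model grading its own work.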

Microsoft is not pitching Critique as a side experiment

One of the more important details in Microsoft’s announcement is that Critique will be the default experience in Researcher when Auto is selected in the model picker. That signals the company sees this as more than an optional lab feature for power users. It is effectively treating multi-model review as the new baseline for deep research quality inside Microsoft 365 Copilot. That is a meaningful product choice, because it suggests Microsoft believes enterprise customers care less about raw response speed than they do about fewer hallucinations, stronger structure, and more confidence in the finished report. 

That also fits neatly into Microsoft’s broader messaging around Wave 3 of Microsoft 365 Copilot, where the company has been pushing the idea of Copilot as a “system for work” built on a multi-model advantage rather than on any single AI lab. In Microsoft’s framing, Copilot is meant to pull the best available intelligence from across the industry, grounded in work context through what it calls Work IQ and protected by enterprise data controls. Critique is one of the clearest examples yet of that strategy moving from marketing language into a visible product feature. 

The benchmark numbers are a big part of Microsoft’s sales pitch

Microsoft is not only saying Critique feels better. It is saying the system performed better on a formal benchmark. In its technical write-up, the company says it tested Critique on the DRACO benchmark, short for Deep Research Accuracy, Completeness, and Objectivity, which covers 100 complex research tasks across 10 domains. Microsoft says responses were judged across factual accuracy, breadth and depth of analysis, presentation quality, and citation quality, and that Critique outperformed the single-model version of Researcher across all four measures. 

The company highlighted the largest gains in breadth and depth of analysis, followed by presentation quality and factual accuracy. It also says the improvements were statistically significant and that Researcher with Critique delivered a +7.0 point aggregated score improvement, or +13.88% over Perplexity Deep Research (Claude Opus 4.6 model), which Microsoft described as the best system reported in the benchmark paper. 


Data | Source: Microsoft

That is an eye-catching claim, especially because the deep research race has become one of the most competitive fronts in enterprise AI. Research tools are no longer being judged only by whether they can gather information, but by whether they can assemble a report that feels decision-ready. 

Microsoft’s argument is that the review layer forces researchers to identify missing angles, tighten organization, challenge weak claims, and use citations more carefully. Whether customers experience those gains in real workflows will matter more than benchmark charts, but Microsoft is clearly trying to signal that this is a measurable quality jump rather than a vague model update. 

Council shows Microsoft is thinking beyond one “best answer”

Critique is not the only feature Microsoft introduced alongside this update. The company also launched Council, a multi-model comparison mode inside Researcher. Microsoft says Council runs Anthropic and OpenAI models simultaneously, allowing each to generate a full standalone report. A separate judge model then creates a distilled summary showing where the reports agree, where they diverge, and what each uniquely contributes. Microsoft Support describes this as Model Council, a mode that preserves both full reports and adds a comparison summary to help users decide which output is stronger or how to combine them. 
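The Council flow described above can be sketched as follows. Again this is an illustration under stated assumptions, not Microsoft's code: the two model functions and the judge are stubs, and the "claims" sets are a toy stand-in for the content of two full reports.

```python
# Illustrative sketch of a "council" mode: run two models on the same
# task, keep both full reports, and have a judge model summarize where
# they agree, diverge, and what each uniquely contributes.

def model_a(task):
    """Stub for the first model's full standalone report."""
    return {"claims": {"X", "Y"}, "text": f"Report A on {task}"}

def model_b(task):
    """Stub for the second model's full standalone report."""
    return {"claims": {"Y", "Z"}, "text": f"Report B on {task}"}

def judge(report_a, report_b):
    """Stub judge model: compare the two reports' claim sets."""
    a, b = report_a["claims"], report_b["claims"]
    return {
        "agree": sorted(a & b),      # claims both reports make
        "unique_a": sorted(a - b),   # only in report A
        "unique_b": sorted(b - a),   # only in report B
    }

def council(task):
    # A production system would run these concurrently; sequential
    # calls keep the sketch simple.
    a, b = model_a(task), model_b(task)
    return {"reports": [a["text"], b["text"]], "summary": judge(a, b)}

result = council("market sizing")
```

The point the sketch preserves is that neither report is discarded: both full outputs survive, and the judge only adds a comparison layer on top to help the user pick or combine them.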

That is a very interesting signal about where enterprise AI may be heading. For a while, the industry behaved as if the goal was to find one model that could replace all the others. Microsoft’s latest move suggests the more realistic future may be one where companies do not trust any single model enough to make it the only voice in the room. 

The timing of Critique is not accidental. Microsoft has been under pressure to show that Microsoft 365 Copilot is becoming more useful, more differentiated, and more valuable as competition intensifies. 

Reuters tied the rollout of Critique and Council to Microsoft’s effort to improve Copilot adoption in a market where rivals including Google’s Gemini and Anthropic’s Claude products are pushing hard into workplace AI. Axios also noted that Microsoft’s multi-model strategy has another benefit: it shows the company is not locked into overdependence on OpenAI at a time when frontier model leadership can shift quickly. 

The post Microsoft Introduced Critique, A New Multi-Model Deep Research System In M365 Copilot appeared first on Metaverse Post.

