This appendix details the Q-Former architecture: a 12-layer BERT-based model that takes 32 learnable query embeddings as input. These queries use cross-attention to extract visual information, which then serves as input to the MLLM.

Visual Prompt Generation: Cross-Attention in Q-Former

2025/11/20 00:00

Abstract and 1 Introduction

  2. Related Work

    2.1. Multimodal Learning

    2.2. Multiple Instance Learning

  3. Methodology

    3.1. Preliminaries and Notations

    3.2. Relations between Attention-based VPG and MIL

    3.3. MIVPG for Multiple Visual Inputs

    3.4. Unveiling Instance Correlation in MIVPG for Enhanced Multi-instance Scenarios

  4. Experiments and 4.1. General Setup

    4.2. Scenario 1: Samples with Single Image

    4.3. Scenario 2: Samples with Multiple Images, with Each Image as a General Embedding

    4.4. Scenario 3: Samples with Multiple Images, with Each Image Having Multiple Patches to be Considered and 4.5. Case Study

  5. Conclusion and References

Supplementary Material

A. Detailed Architecture of QFormer

B. Proof of Proposition

C. More Experiments

Figure 7. Overview of QFormer

A. Detailed Architecture of QFormer

The architecture overview is depicted in Figure 7. Specifically, QFormer is initialized as a BERT-based model [8] comprising L = 12 layers. In contrast to typical BERT models that process textual inputs, QFormer takes R = 32 learnable query embeddings as inputs. These embeddings are used to extract visual information from the input visual data during Stage-1 pretraining in BLIP2 [22]. After projection, they serve as visual prompt embeddings for the LLM inputs.
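To make the bookkeeping concrete, the following minimal sketch tracks the tensor shapes implied by the paragraph above. Only `num_layers` (L = 12) and `num_queries` (R = 32) come from the text; `hidden_size = 768` is an assumption based on the usual BERT-base width, and `llm_hidden_size = 4096` is a hypothetical LLM embedding width chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QFormerShapes:
    num_layers: int = 12         # L in the text
    num_queries: int = 32        # R in the text
    hidden_size: int = 768       # assumption: BERT-base width, not stated in the text
    llm_hidden_size: int = 4096  # hypothetical: depends on the LLM being prompted


def query_output_shape(cfg: QFormerShapes, batch_size: int) -> tuple:
    """Shape of the last layer's query embeddings (the QFormer output)."""
    return (batch_size, cfg.num_queries, cfg.hidden_size)


def visual_prompt_shape(cfg: QFormerShapes, batch_size: int) -> tuple:
    """Shape of the visual prompt embeddings after projection to the LLM width."""
    return (batch_size, cfg.num_queries, cfg.llm_hidden_size)


cfg = QFormerShapes()
print(query_output_shape(cfg, batch_size=2))   # (2, 32, 768)
print(visual_prompt_shape(cfg, batch_size=2))  # (2, 32, 4096)
```

Under these assumptions, the number of visual prompt tokens handed to the LLM is fixed at R, regardless of how many patch embeddings the visual encoder produces.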

Inside the QFormer, each layer includes a self-attention module composed of a Multi-Head Attention component and a Forward module (consisting of Linear, LayerNorm, and Residual Connection). The cross-attention module, initialized with random values, is inserted every G layers; this is where the learnable query embeddings interact with the visual embeddings. In the main paper, for conciseness, we condensed the multi-head attention and forward modules into self-(cross-)attention modules. Furthermore, we illustrated only the modifications made to the cross-attention module in MIVPG, as the self-attention modules remain unchanged. The final QFormer output is the last layer's query embeddings.
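The interaction between queries and visual embeddings can be sketched in plain Python. This is an illustrative simplification, not the authors' implementation: it uses a single attention head, omits the learned query/key/value projections, the Forward module, LayerNorm, and residual connections, and uses toy sizes for the patch count `P` and width `d`. The value `G = 2` and the choice to place the first cross-attention module in layer 0 are assumptions; the text only states that the module is inserted every G layers.

```python
import math
import random


def cross_attention_layers(num_layers: int, every_g: int) -> list:
    """Layer indices carrying a cross-attention module (assumes layer 0 has one)."""
    return [i for i in range(num_layers) if i % every_g == 0]


def softmax(xs: list) -> list:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: list, visual: list) -> list:
    """Single-head scaled dot-product attention without learned projections.

    queries: R x d query embeddings; visual: P x d visual embeddings.
    Returns an R x d list: each query becomes a weighted average of visual rows.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, v)) / math.sqrt(d) for v in visual]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, visual)) for j in range(d)])
    return out


rng = random.Random(0)
R, P, d = 32, 5, 8  # R from the text; P and d are toy values
queries = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(R)]
visual = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(P)]

attended = cross_attention(queries, visual)
print(len(attended), len(attended[0]))  # 32 8
print(cross_attention_layers(12, 2))    # [0, 2, 4, 6, 8, 10]
```

Note that the output always has R rows, one per query, however many visual embeddings are supplied; the visual embeddings only act as keys and values.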

For a more comprehensive understanding, readers are encouraged to refer to [22].

:::info Authors:

(1) Wenliang Zhong, The University of Texas at Arlington (wxz9204@mavs.uta.edu);

(2) Wenyi Wu, Amazon (wenyiwu@amazon.com);

(3) Qi Li, Amazon (qlimz@amazon.com);

(4) Rob Barton, Amazon (rab@amazon.com);

(5) Boxin Du, Amazon (boxin@amazon.com);

(6) Shioulin Sam, Amazon (shioulin@amazon.com);

(7) Karim Bouyarmane, Amazon (bouykari@amazon.com);

(8) Ismail Tutar, Amazon (ismailt@amazon.com);

(9) Junzhou Huang, The University of Texas at Arlington (jzhuang@uta.edu).

:::


:::info This paper is available on arXiv under the CC BY 4.0 Deed (Attribution 4.0 International) license.

:::
