Computer Vision


0.

GeM-NR: Geometry-Aware Multi-View Editing for Nonrigid Scene Changes (arxiv.org)

0 points 1 sources 1 minutes ago cluster

Researchers propose GeM-NR, a geometry-aware multi-view editing method that handles nonrigid scene changes.

computer-vision pattern-recognition
0.

Continual Visual and Verbal Learning Through a Child's Egocentric Input (arxiv.org)

0 points 1 sources 1 minutes ago cluster

Researchers propose a model that learns from a child's egocentric input, combining visual and verbal data to improve learning capabilities. The model is designed to mimic a child's learning process, allowing for continual learning and adaptation.

artificial-intelligence computer-vision pattern-recognition
0.

Who Needs Labels? Adapting Vision Foundation Models With the Metadata You Already Have (arxiv.org)

0 points 1 sources 1 minutes ago cluster

Researchers propose adapting vision foundation models using metadata that is already available, reducing the need for labeled data. The approach is presented in a paper on arXiv.

computer-vision metadata models
0.

REST3D: Reconstructing Physically Stable 3D Scenes from a Single Image (shirleymaxx.github.io)

0 points 1 sources 1 minutes ago cluster

Researchers from Carnegie Mellon University have developed REST3D, a method to reconstruct physically stable 3D scenes from a single image. The method aims to produce 3D scenes that are visually consistent and interactive.

3d-reconstruction ai computer-vision
0.

Formalizing the Binding Problem (arxiv.org)

0 points 1 sources 1 minutes ago cluster

Lianghuan Huang et al. formalize the binding problem in a paper submitted to arXiv under computer vision and pattern recognition. The paper explores the concept and its implications.

computer-vision pattern-recognition
0.

AdaCodec: A Predictive Visual Code for Video MLLMs (arxiv.org)

0 points 1 sources 1 minutes ago cluster

Researchers propose AdaCodec, a predictive visual code for video multimodal large language models (MLLMs). The code is designed to improve the performance of video MLLMs.

computer-vision devtools ml
0.

Modeling Depth Ambiguity: A Mixture-Density Representation for Flying-Point-Free Depth Estimation (arxiv.org)

0 points 1 sources 1 minutes ago cluster

Researchers propose a mixture-density representation for flying-point-free depth estimation, addressing depth ambiguity in computer vision. The approach is described in a paper submitted to arXiv on June 1, 2026.

computer-vision depth-estimation models
0.

Why Not Hyperparameter-Friendly Optimisation? A Monotonic Adaptive Norm Rescaling Approach For Long-Tailed Recognition (arxiv.org)

0 points 1 sources 1 minutes ago cluster

Researchers propose a monotonic adaptive norm rescaling approach for long-tailed recognition, aiming to make optimization less sensitive to hyperparameter choices.

computer-vision pattern-recognition
0.

Fast and Lightweight Novel View Synthesis with Differentiable Multiplane Image (arxiv.org)

0 points 1 sources 1 minutes ago cluster

Kaidi Zhang and Guanxu Zhu propose a method for novel view synthesis based on differentiable multiplane images, aiming for fast and lightweight rendering.

computer-vision pattern-recognition
0.

TunerDiT: Training-free Progressive Steering of Diffusion Transformer for Multi-Event Video Generation (arxiv.org)

0 points 1 sources 1 minutes ago cluster

Researchers propose TunerDiT, a training-free progressive steering method for multi-event video generation using diffusion transformers. Because the method is training-free, it steers the diffusion transformer without additional training.

computer-vision diffusion-transformers pattern-recognition
0.

Vision-Language Models Suppress Female Representations Under Ambiguous Input (arxiv.org)

0 points 1 sources 1 minutes ago cluster

Researchers found that vision-language models tend to suppress female representations when given ambiguous input, according to a study published on arXiv. The study analyzed the performance of these models on tasks involving gender classification.

bias computer-vision models
0.

VideoMLA: Low-Rank Latent KV Cache for Minute-Scale Autoregressive Video Diffusion (arxiv.org)

0 points 1 sources 1 minutes ago cluster

Researchers introduce VideoMLA, a low-rank latent KV cache for minute-scale autoregressive video diffusion. This approach aims to improve video generation efficiency.

computer-vision pattern-recognition
0.

GPIC: A Giant Permissive Image Corpus for Visual Generation (arxiv.org)

0 points 1 sources 1 minutes ago cluster

Researchers introduce GPIC, a dataset of approximately 28 trillion pixels, comprising diverse internet images captioned by a state-of-the-art vision-language model. The dataset is permissively licensed for research and commercial use.

computer-vision generative-modeling image-dataset
0.

LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding (arxiv.org)

0 points 1 sources 1 minutes ago cluster

Researchers propose LocateAnything, a vision-language grounding model that uses parallel box decoding for fast and high-quality results, outperforming existing methods in various tasks.

artificial-intelligence computer-vision pattern-recognition
0.

When Eyes Betray AI: Social Gaze Consistency as a Semantic Cue for AI-Generated Image Detection (arxiv.org)

0 points 1 sources 1 minutes ago cluster

Researchers propose using social gaze consistency to detect AI-generated images, leveraging the fact that humans tend to gaze at specific points in images, which can be inconsistent in AI-generated content.

ai-generated-images computer-vision image-detection
0.

Squeezing Capacity from Multimodal Large Language Models for Subject-driven Generation (arxiv.org)

0 points 1 sources 5 hours ago cluster

Researchers propose a method to get more capacity out of multimodal large language models for subject-driven generation, which is used in text-to-image synthesis applications.

computer-vision large-language-models pattern-recognition
0.

Channel-wise Vector Quantization (arxiv.org)

0 points 1 sources 5 hours ago cluster

Researchers propose Channel-wise Vector Quantization, a vector-quantization method for efficient image processing that aims to reduce computational complexity in computer vision tasks.

computer-vision pattern-recognition