Bibliography

[1]
Unknown Author
Untitled
2023
[2]
Henrik Aanaes, Rasmus Jensen, George Vogiatzis, Engin Tola, Anders R. Dahl
Large‐Scale Data for Multiple‐View Stereopsis
2016
[3]
Amir Ahmadyan
Objectron: A Large Scale Dataset of Object-Centric Videos in the Wild with Pose Annotations
2021
[4]
Yuval Alaluf, Or Patashnik, Zongze Wu, Asif Zamir, Eli Shechtman, Dani Lischinski, Daniel Cohen-Or
Third Time's the Charm? Image and Video Editing with StyleGAN3
2022
[5]
Jay Alammar
The Illustrated Transformer
2018
[6]
Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Juliette Milbach, João Carreira, Ethan Faulkner, David F. Ross, Rohit Girdhar
Flamingo: A Visual Language Model for Few-Shot Learning
2022
[7]
Bogdan Alexe, Thomas Deselaers, Vittorio Ferrari
Measuring the Objectness of Image Windows
2012
[8]
Nicholas Vieau Alger
Data-scalable Hessian preconditioning for distributed parameter PDE-constrained inverse problems
2019
[9]
Kara-Ali Aliev, Artem Sevastopolsky, Maria Kolos, Dmitry Ulyanov, Victor Lempitsky
Neural Point-Based Graphics
2020
[10]
Xiang An, Yin Xie, Kaicheng Yang, Wenkang Zhang, Xiuwei Zhao, Zheng Cheng, Yirui Wang, Songcen Xu, Changrui Chen, Chunsheng Wu, Huajie Tan, Chunyuan Li, Jing Yang, Jie Yu, Xiyao Wang, Bin Qin, Yumeng Wang, Zizhen Yan, Ziyong Feng, Ziwei Liu, Bo Li, Jiankang Deng
LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training
2025
[11]
Andrej Karpathy
ConvNetJS
2015
[12]
Appen
Computer Vision vs Machine Vision: What's the Difference?
[13]
Relja Arandjelović, Andrew Zisserman
Look, Listen and Learn
2017
[14]
Martin Arjovsky, Soumith Chintala, Léon Bottou
Wasserstein GAN
2017
[15]
Maryam Armandpour, Yujia Li, Bryan Perozzi, Rami Al-Rfou, Tommi Jaakkola, Junchi Yan, Yao Ma, Danai Koutra
Partitioning Gated Mechanisms for Graph Generation
2021
[16]
Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Lučić, Cordelia Schmid
ViViT: A Video Vision Transformer
2021
[17]
Ali Athar, Jonathon Luiten, Paul Voigtlaender, Tarasha Khurana, Achal Dave, Bastian Leibe, Deva Ramanan
BURST: A Benchmark for Unifying Object Recognition, Segmentation and Tracking in Video
2023
[18]
Zhihu Author
Understanding Classifier-Free Guidance in Stable Diffusion
2023
[19]
Anas Awadalla, Irena Gao, Josh Gardner, Jack Hessel, Yusuf Hanafy, Wanrong Zhu, Kalyani Marathe, Yonatan Bitton, Samir Gadre, Shiori Sagawa, Jenia Jitsev, Simon Kornblith, Pang Wei Koh, Gabriel Ilharco, Mitchell Wortsman, Ludwig Schmidt
OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models
2023
[20]
Jimmy Ba, Volodymyr Mnih, Koray Kavukcuoglu
Multiple Object Recognition with Visual Attention
2015
[21]
Moussa Baccouche, Franck Mamalet, Christian Wolf, Christophe Garcia, Atilla Baskurt
Sequential Deep Learning for Human Action Recognition
2011
[22]
Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio
Neural Machine Translation by Jointly Learning to Align and Translate
2016
[23]
Jinze Bai, Benlin Yang, Zhengxiao Wang, Junjie Bai
Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities
2023
[24]
Jinze Bai, Shuai Bai, Shusheng Yang, Shijie Wang, Sinan Tan, Peng Wang, Junyang Lin, Chang Zhou, Jingren Zhou
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
2023
[25]
Jun Bai
BasicTAD: An Empirical Strong Baseline for Temporal Action Detection
2023
[26]
Allison H. Baker, Alexander Pinard, Dorit M. Hammerling
On a Structural Similarity Index Approach for Floating-Point Data
2022
[27]
Nicolas Ballas, Li Yao, Chris Pal, Aaron Courville
Delving Deeper into Convolutional Networks for Learning Video Representations
2016
[28]
Adrien Bardes, Jean Ponce, Yann LeCun
VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning
2022
[29]
Jonathan T. Barron, Ben Mildenhall, Dor Verbin, Pratul P. Srinivasan, Peter Hedman
Mip-NeRF 360: Unbounded Anti-Aliased in-the-Wild Scene Rendering
2022
[30]
Jonathan T. Barron, Ben Mildenhall, Matthew Tancik, Peter Hedman, Ricardo Martin-Brualla, Pratul P. Srinivasan
Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields
2021
[31]
Jonathan T. Barron, Ben Mildenhall, Dor Verbin, Peter Hedman, Pratul P. Srinivasan
Zip-NeRF: Anti-Aliased, Compositional Neural Representation for Editable 3D Scenes
2023
[32]
David Bau, Bolei Zhou, Aditya Khosla, Aude Oliva, Antonio Torralba
Network Dissection: Quantifying Interpretability of Deep Visual Representations
2017
[33]
Maksym Bekuzarov
Losses Explained: Contrastive Loss
2022
[34]
Maksym Bekuzarov, Ariana Bermudez, Joon-Young Lee, Hao Li
XMem++: Production-level Video Segmentation From Few Annotated Frames
2023
[35]
Irwan Bello, William Fedus, Xianzhi Du, Ekin D. Cubuk, Aravind Srinivas, Tsung-Yi Lin, Jonathon Shlens, Barret Zoph
Revisiting ResNets: Improved Training and Scaling Strategies
2021
[36]
Y. Bengio, P. Simard, P. Frasconi
Learning long-term dependencies with gradient descent is difficult
1994
[37]
Yoshua Bengio, Aaron Courville, Pascal Vincent
Representation learning: A review and new perspectives
2013
[38]
Benjamin Paepper
Interactive Visualization of Stable Diffusion Image Embeddings
2023
[39]
James Bergstra, Yoshua Bengio
Random Search for Hyper-Parameter Optimization
2012
[40]
James Bergstra, Yoshua Bengio
Random Search for Hyper-Parameter Optimization
2012
[41]
Maxim Berman, Herve Jegou, Andrea Vedaldi, Iasonas Kokkinos, Matthijs Douze
MultiGrain: a Unified Image Embedding for Classes and Instances
2019
[42]
Gedas Bertasius, Heng Wang, Lorenzo Torresani
Is Space-Time Attention All You Need for Video Understanding?
2021
[43]
Lucas Beyer, Pavel Izmailov, Alexander Kolesnikov, Mathilde Caron, Simon Kornblith, Xiaohua Zhai, Matthias Minderer, Michael Tschannen, Ibrahim Alabdulmohsin, Filip Pavetic
FlexiViT: One Model for All Patch Sizes
2023
[44]
Jia-Wang Bian, Yu Wang, Jiayu Li, Victor Adrian Prisacariu
NoPe-NeRF: Optimising Neural Radiance Field with No Pose Prior
2023
[45]
Mikolaj Binkowski, Danica J. Sutherland, Michael Arbel, Arthur Gretton
Demystifying MMD GANs
2018
[46]
Andreas Blattmann, Tim Dockhorn, Dominik Lorenz, Patrick Esser, Robin Rombach
Stable Video Diffusion: Scaling Latent Video Diffusion Models
2023
[47]
Alexey Bochkovskiy, Chien-Yao Wang, Hong-Yuan Mark Liao
YOLOv4: Optimal Speed and Accuracy of Object Detection
2020
[48]
Daniel Bolya, Po-Yao Huang, Peize Sun, Jang Hyun Cho, Andrea Madotto, Chen Wei, Tengyu Ma, Jiale Zhi, Jathushan Rajasegaran, Hanoona Rasheed, Junke Wang, Marco Monteiro, Hu Xu, Shiyu Dong, Nikhila Ravi, Daniel Li, Piotr Dollár, Christoph Feichtenhofer
Perception Encoder: The best visual embeddings are not at the output of the network
2025
[49]
Daniel Bolya, Cheng-Yang Fu, Xiaoliang Dai, Peizhao Zhang, Christoph Feichtenhofer, Judy Hoffman
Token Merging: Your ViT But Faster
2023
[50]
Lukas Bossard, Matthieu Guillaumin, Luc Van Gool
Food-101 – Mining Discriminative Components with Random Forests
2014
[51]
Joshua B. Tenenbaum, Brenden M. Lake
Human-Level Concept Learning Through Probabilistic Program Induction
2015
[52]
Andrew Brock, Jeff Donahue, Karen Simonyan
Large Scale GAN Training for High Fidelity Natural Image Synthesis
2019
[53]
Andrew Brock, Soham De, Samuel L. Smith, Karen Simonyan
High-Performance Large-Scale Image Recognition Without Normalization
2021
[54]
Andrew Brock, Soham De, Samuel L. Smith, Karen Simonyan
High-Performance Large-Scale Image Recognition Without Normalization
2021
[55]
Andrew Brock, Theodore Lim, J.M. Ritchie, Nick Weston
Neural Photo Editing with Introspective Adversarial Networks
2017
[56]
Rodney A Brooks, Thomas O Binford
Model-based three-dimensional interpretations of two-dimensional images
1979
[57]
Tim Brooks, Aleksander Holynski, Alexei A. Efros
InstructPix2Pix: Learning to Follow Image Editing Instructions
2023
[58]
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
Language Models are Few-Shot Learners
2020
[59]
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
Language Models are Few-Shot Learners
2020
[60]
Joy Buolamwini, Timnit Gebru
Gender shades: Intersectional accuracy disparities in commercial gender classification
2018
[61]
Han Cai, Ligeng Zhu, Song Han
ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware
2019
[62]
Zhi Cai, Songtao Liu, Guodong Wang, Zheng Ge, Xiangyu Zhang, Di Huang
Align-DETR: Enhancing End-to-end Object Detection with Aligned Loss
2024
[63]
John Canny
A computational approach to edge detection
1986
[64]
Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko
End-to-End Object Detection with Transformers
2020
[65]
Nicolas Carion, Laura Gustafson, Yuan-Ting Hu, Shoubhik Debnath, Ronghang Hu, Didac Suris, Chaitanya Ryali, Kalyan Vasudev Alwala, Haitham Khedr, Andrew Huang, Jie Lei, Tengyu Ma, Baishan Guo, Arpit Kalla, Markus Marks, Joseph Greer, Meng Wang, Peize Sun, Roman Rädle, Triantafyllos Afouras
SAM 3: Segment Anything with Concepts
2025
[66]
Nicholas Carlini, David Wagner
Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods
2017
[67]
Nicholas Carlini, David Wagner
Towards Evaluating the Robustness of Neural Networks
2017
[68]
Nicholas Carlini, Mitchell Wortsman, Florian Tramer, Ivan Evtimov
Universal and Transferable Adversarial Attacks on Aligned Language Models
2023
[69]
Mathilde Caron, Hugo Touvron, Ishan Misra
Emerging Properties in Self-Supervised Vision Transformers
2021
[70]
Mathilde Caron, Piotr Bojanowski, Armand Joulin, Matthijs Douze
Deep Clustering for Unsupervised Learning of Visual Features
2018
[71]
Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, Armand Joulin
Emerging Properties in Self-Supervised Vision Transformers
2021
[72]
Mathilde Caron, Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski, Armand Joulin
Unsupervised Learning of Visual Features by Contrasting Cluster Assignments
2020
[73]
Mathilde Caron, Piotr Bojanowski, Julien Mairal, Armand Joulin
Unsupervised Pre-Training of Image Features on Non-Curated Data
2019
[74]
João Carreira, Andrew Zisserman
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
2017
[75]
William Chan, Navdeep Jaitly, Quoc V. Le, Oriol Vinyals
Listen, Attend and Spell
2016
[76]
Angel X. Chang, Thomas Funkhouser, Leonidas Guibas, Pat Hanrahan, Qixing Huang, Zimo Li, Silvio Savarese, Manolis Savva, Shuran Song, Hao Su, Jianxiong Xiao, Li Yi, Fisher Yu
ShapeNet: An Information-Rich 3D Model Repository
2015
[77]
Huiwen Chang, Han Zhang, Lu Jiang, Ce Liu, William T. Freeman
MaskGIT: Masked Generative Image Transformer
2022
[78]
Yu-Wei Chao, Sudheendra Vijayanarasimhan, Bryan Seybold, David A. Ross, Jia Deng, Li Fei-Fei
Rethinking the Faster R-CNN Architecture for Temporal Action Localization
2018
[79]
Anpei Chen, Zexiang Xu, Fuqiang Zhao, Xiaoshuai Zhang, Fanbo Xiang, Jingyi Yu, Hao Su
MVSNeRF: Fast Generalizable Radiance Field Reconstruction from Multi-View Stereo
2021
[80]
Anpei Chen, Zexiang Xu, Andreas Geiger, Jingyi Yu, Hao Su
TensoRF: Tensorial Radiance Fields
2022
[81]
Jianbo Chen, Yujia Zhang, Huan Sharma, Jianfeng Yi, Cho-Jui Hsieh
HopSkipJumpAttack: A Query-Efficient Decision-Based Attack
2020
[82]
Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, Alan L. Yuille
DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
2017
[83]
Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever
Generative Pretraining From Pixels
2020
[84]
Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever
Generative Pretraining from Pixels
2020
[85]
Minghua Chen, Chen-Hsuan Lin, Ching-Yao Wang, Min Sun, Leonidas Guibas, Hao Su
Fantasia3D: Disentangling Geometry and Appearance for High-Quality Text-to-3D Content Creation
2023
[86]
Ricky T. Q. Chen, Yaron Lipman
Flow Matching on General Geometries
2023
[87]
Ricky T. Q. Chen, Yulia Rubanova, Jesse Bettencourt, David Duvenaud
Neural Ordinary Differential Equations
2019
[88]
Ting Chen, Simon Kornblith, Mohammad Norouzi, Geoffrey Hinton
A Simple Framework for Contrastive Learning of Visual Representations
2020
[89]
Ting Chen, Simon Kornblith, Kevin Swersky, Mohammad Norouzi, Geoffrey Hinton
SimCLRv2: A Stronger Baseline for Contrastive Learning
2020
[90]
X. Chen
Adaptive Image Transformer for One-Shot Object Detection
2021
[91]
Xinlei Chen
Rethinking Video Representation Learning via Per-Clip Supervision
2021
[92]
Xinlei Chen, Kaiming He
Exploring Simple Siamese Representation Learning
2021
[93]
Xinlei Chen, Saining Xie, Kaiming He
An Empirical Study of Training Self-Supervised Vision Transformers
2021
[94]
Xinlei Chen, Saining Xie, Kaiming He
An Empirical Study of Training Self-Supervised Vision Transformers
2021
[95]
Xinlei Chen, Haoqi Fan, Ross Girshick, Kaiming He
Improved Baselines with Momentum Contrastive Learning
2020
[96]
Yen-Chun Chen, Linjie Li, Licheng Yu, Ahmed El Kholy, Faisal Ahmed, Zhe Gan, Yu Cheng, Jingjing Liu
UNITER: Learning Universal Image-Text Representations
2020
[97]
Yu Chen, Gim Hee Lee
Deep Bundle-Adjusting Generalizable Neural Radiance Fields
2023
[98]
Bowen Cheng, Alexander G. Schwing, Alexander Kirillov
Per-Pixel Classification is Not All You Need for Semantic Segmentation
2021
[99]
Bowen Cheng, Ishan Misra, Alexander G. Schwing, Alexander Kirillov, Rohit Girdhar
Masked-Attention Mask Transformer for Universal Image Segmentation
2022
[100]
Bowen Cheng, Yuhang Fang, Haotian Liu, Zhichao Li, Shaoshuai Zhang, Hengshuang Zhao, Jianfeng Gao, Yu Qiao, Jifeng Dai
YOLO-World: Real-Time Open-Vocabulary Object Detection
2024
[101]
Ho Kei Cheng, Seoung Wug Oh, Brian Price, Joon-Young Lee, Alexander Schwing
Putting the Object Back into Video Object Segmentation
2024
[102]
Ming-Ming Cheng, Ziming Zhang, Wen-Yan Lin, Philip Torr
BING: Binarized normed gradients for objectness estimation at 300fps
2014
[103]
Zesen Cheng, Sicong Leng, Hang Zhang, Yifei Xin, Xin Li, Guanzheng Chen, Yongxin Zhu, Wenqi Zhang, Ziyang Luo, Deli Zhao, Lidong Bing
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
2024
[104]
François Chollet
Xception: Deep Learning with Depthwise Separable Convolutions
2017
[105]
Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, Yoshua Bengio
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
2014
[106]
Özgün Çiçek, Ahmed Abdulkadir, Soeren S. Lienkamp, Thomas Brox, Olaf Ronneberger
3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation
2016
[107]
Aidan Clark, Jeff Donahue, Karen Simonyan
Adversarial Video Generation on Complex Datasets
2019
[108]
Djork-Arné Clevert
Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)
2015
[109]
Djork-Arné Clevert, Thomas Unterthiner, Sepp Hochreiter
Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)
2016
[110]
Joseph Paul Cohen, Michael Luck, Sina Honari
Distribution Matching Losses Can Hallucinate Features in Medical Image Translation
2018
[111]
Ekin D Cubuk, Barret Zoph, Dandelion Mane, Vijay Vasudevan, Quoc V Le
AutoAugment: Learning Augmentation Policies from Data
2019
[112]
Ekin D. Cubuk, Barret Zoph, Jonathon Shlens, Quoc V. Le
RandAugment: Practical Data Augmentation with No Separate Search
2020
[113]
Angela Dai, Angel X. Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, Matthias Nießner
ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes
2017
[114]
Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven Hoi
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
2023
[115]
Google DeepMind
Gemini: A Family of Highly Capable Multimodal Models
2023
[116]
Mostafa Dehghani, Basil Mustafa, Josip Djolonga, Jonathan Heek, Matthias Minderer, Mathilde Caron, Andreas Steiner, Joan Puigcerver, Robert Geirhos, Ibrahim Alabdulmohsin, Avital Oliver, Piotr Padlewski, Alexey Gritsenko, Mario Lučić, Neil Houlsby
Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution
2023
[117]
Mitchell Deitke, Roozbeh Mottaghi, Manolis Savva, Luke Zettlemoyer, Dieter Fox
Objaverse: A Universe of Annotated 3D Objects
2023
[118]
Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, Li Fei-Fei
ImageNet: A large-scale hierarchical image database
2009
[119]
Jacob Devlin, Bharath Gupta, Ross Girshick
Exploring Nearest Neighbor Approaches for Image Captioning
2015
[120]
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
2019
[121]
Terrance DeVries, Graham W. Taylor
Improved Regularization of Convolutional Neural Networks with Cutout
2017
[122]
Prafulla Dhariwal, Alex Nichol
Diffusion Models Beat GANs on Image Synthesis
2021
[123]
Prafulla Dhariwal, Alexander Nichol
Diffusion Models Beat GANs on Image Synthesis
2021
[124]
Henghui Ding, Chang Liu, Shuting He, Xudong Jiang, Philip H. S. Torr, Song Bai
MOSE: A New Dataset for Video Object Segmentation in Complex Scenes
2023
[125]
Shuangrui Ding, Rui Qian, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Yuwei Guo, Dahua Lin, Jiaqi Wang
SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree
2024
[126]
Xiaohan Ding, Yuchen Guo, Guiguang Ding, Jungong Han
ACNet: Strengthening the Kernel Skeletons for Powerful CNN
2019
[127]
Xiaohan Ding, Xiangyu Zhang, Ningning Ma, Jungong Han, Guiguang Ding, Jian Sun
RepVGG: Making VGG-style ConvNets Great Again
2021
[128]
Laurent Dinh, Jascha Sohl-Dickstein, Samy Bengio
Density estimation using Real NVP
2017
[129]
Carl Doersch, Abhinav Gupta, Alexei A. Efros
Unsupervised Visual Representation Learning by Context Prediction
2015
[130]
Jeff Donahue, Karen Simonyan
Large Scale Adversarial Representation Learning
2019
[131]
Jeff Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, Trevor Darrell
Long-Term Recurrent Convolutional Networks for Visual Recognition and Description
2015
[132]
Doron Levy et al.
Deep Learning in Medical Imaging: Revolutionizing Radiology
2016
[133]
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
2020
[134]
Danny Driess, Fei Xia, Mehdi S. M. Sajjadi, Corey Lynch
PaLM-E: An Embodied Multimodal Language Model
2023
[135]
Quang Vu Duc, Trang Phung, Mai Nguyen, Bao Yen Nguyen, Thu Hien Nguyen, Ngoc Hoang Thanh Dang, Yu-Dong Zhang, João Manuel R. S. Tavares, Bo-Hao Chen
Self-Knowledge Distillation: An Efficient Approach for Falling Detection
2022
[136]
Vincent Dumoulin, Jonathon Shlens, Manjunath Kudlur
A learned representation for artistic style
2017
[137]
Emilien Dupont, Miguel Ángel Bautista, Alex Colburn, Aditya Sankar, Carlos Guestrin, Joshua Susskind, Qi Shan
Equivariant Neural Rendering
2020
[138]
Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson
With a Little Help from My Friends: Nearest-Neighbor Contrastive Learning of Visual Representations
2021
[139]
Dylan Ebert
Introduction to 3D Gaussian Splatting
[140]
Alexei A. Efros, Thomas K. Leung
Texture Synthesis by Non‑Parametric Sampling
1999
[141]
David Eigen, Christian Puhrsch, Rob Fergus
Depth Map Prediction from a Single Image using a Multi-Scale Deep Network
2014
[142]
Ron Eldan, Ohad Shamir
The power of depth for feedforward neural networks
2016
[143]
Kemal Erdem
Understanding Region of Interest - Part 2 (RoI Align)
2020
[144]
Linus Ericsson, Henry Gouk, Timothy Hospedales
Why Do Self-Supervised Models Transfer? Investigating the Impact of Invariance on Downstream Tasks
2021
[145]
Aleksandr Ermolov, Aliaksandr Siarohin, Enver Sangineto, Elisa Ricci, Nicu Sebe
Whitening and Coloring Batch Transform for GANs
2021
[146]
SM Eslami, Danilo Jimenez Rezende, Frédéric Besse, Fabio Viola, Ari Morcos, Marta Garnelo, Avraham Ruderman, Andrei Rusu, Daniel Danihelka
Neural Scene Representation and Rendering
2018
[147]
Stefano Esposito
KiloNeuS: A Versatile Neural Implicit Surface Representation for Real-Time Rendering
2022
[148]
Patrick Esser, Robin Rombach, Björn Ommer
Taming Transformers for High-Resolution Image Synthesis
2021
[149]
Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek, Robin Rombach
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
2024
[150]
Mark Everingham, Luc Van Gool, Christopher KI Williams, John Winn, Andrew Zisserman
The PASCAL visual object classes (VOC) challenge
2010
[151]
Haoqi Fan, Bo Xiong, Karttikeya Mangalam, Yanghao Li, Zhicheng Yan, Jiteng Dai, Chao-Yuan Baral, Alaaeldin El-Nouby, Rohit Girdhar, Armand Joulin
Multiscale Vision Transformers
2021
[152]
Haoqiang Fan, Hao Su, Leonidas J. Guibas
A Point Set Generation Network for 3D Object Reconstruction from a Single Image
2017
[153]
Christoph Feichtenhofer
X3D: Expanding Architectures for Efficient Video Recognition
2020
[154]
Christoph Feichtenhofer, Axel Pinz, Andrew Zisserman
What Have We Learned from Deep Representations for Action Recognition?
2018
[155]
Christoph Feichtenhofer, Haoqi Fan, Yanghao Li, Kaiming He
A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning
2021
[156]
Christoph Feichtenhofer, Haoqi Fan, Jitendra Malik, Kaiming He
Deep Insights into Convolutional Networks for Video Recognition
2019
[157]
Christoph Feichtenhofer, Haoqi Fan, Yanghao Li, Kaiming He
Masked Autoencoders as Spatiotemporal Learners
2022
[158]
Christoph Feichtenhofer, Haoqi Fan, Jitendra Malik, Kaiming He
SlowFast Networks for Video Recognition
2019
[159]
Martin A Fischler, Robert A Elschlager
The representation and matching of pictorial structures
1973
[160]
Sara Fridovich-Keil, Alex Yu, Matthew Tancik, Qinhong Chen, Benjamin Recht, Angjoo Kanazawa
Plenoxels: Radiance Fields without Neural Networks
2021
[161]
Tsu-Jui Fu, Linjie Li
VIOLET: End-to-End Video-Language Transformers with Masked Visual-token Modeling
2021
[162]
Yifan Fu, Zhijie Liu, Bingchen Huang, Shuran Li, Kaixuan Wang, Hang Zhao
COLMAP-Free 3D Gaussian Splatting
2024
[163]
Kunihiko Fukushima
Neocognitron: A Self-Organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position
1980
[164]
Samir Yitzhak Gadre, Gabriel Ilharco, Alex Fang, Jonathan Hayase, Georgios Smyrnis, Thao Nguyen, Ryan Marten, Mitchell Wortsman, Dhruba Ghosh, Jieyu Zhang, Eyal Orgad, Rahim Entezari, Giannis Daras, Sarah Pratt, Vivek Ramanujan, Yonatan Bitton, Kalyani Marathe, Stephen Mussmann, Richard Vencu, Mehdi Cherti, Ranjay Krishna, Pang Wei Koh, Olga Saukh, Alexander Ratner, Shuran Song, Hannaneh Hajishirzi, Ali Farhadi, Romain Beaumont, Sewoong Oh, Alex Dimakis, Jenia Jitsev, Yair Carmon, Vaishaal Shankar, Ludwig Schmidt
DataComp: In search of the next generation of multimodal datasets
2023
[165]
Chen Gao, Ayush Saraf, Johannes Kopf, Jia-Bin Huang
NeRF-Editing: Geometry Editing of Neural Radiance Fields
2022
[166]
Jiyang Gao, Chen Sun, Zhenheng Yang, Ram Nevatia
TALL: Temporal Activity Localization via Language Query
2017
[167]
Peng Gao, Jiaming Han, Renrui Zhang, Ziyi Lin, Shijie Geng, Aojun Zhou, Wei Zhang, Pan Lu, Conghui He, Xiangyu Yue, Hongsheng Li, Yu Qiao
LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model
2023
[168]
Itai Gat, Tal Remez, Neta Shaul, Felix Kreuk, Ricky T. Q. Chen, Gabriel Synnaeve, Yossi Adi, Yaron Lipman
Discrete Flow Matching
2024
[169]
Leon A. Gatys, Alexander S. Ecker, Matthias Bethge
Image Style Transfer Using Convolutional Neural Networks
2016
[170]
Leon A. Gatys, Alexander S. Ecker, Matthias Bethge
Texture Synthesis Using Convolutional Neural Networks
2015
[171]
Golnaz Ghiasi, Yin Cui, Aravind Srinivas, Rui Qian, Tsung-Yi Lin, Ekin D. Cubuk, Quoc V. Le, Barret Zoph
Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation
2021
[172]
Spyros Gidaris, Praveer Singh, Nikos Komodakis
Unsupervised Representation Learning by Predicting Image Rotations
2018
[173]
Rohit Girdhar, Alaaeldin El-Nouby, Zhuang Liu, Mannat Singh, Kalyan Vasudev Alwala, Armand Joulin, Ishan Misra
ImageBind: One Embedding Space to Bind Them All
2023
[174]
Rohit Girdhar, Mannat Singh, Christoph Feichtenhofer, Haoqi Fan, Ishan Misra
OmniMAE: Single Model Masked Pretraining on Images and Videos
2023
[175]
Ross Girshick
Fast R-CNN
2015
[176]
Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik
Rich feature hierarchies for accurate object detection and semantic segmentation
2014
[177]
Georgia Gkioxari, Jitendra Malik, Justin Johnson
Mesh R-CNN
2020
[178]
Xavier Glorot, Yoshua Bengio
Understanding the difficulty of training deep feedforward neural networks
2010
[179]
Gabriel Goh, Nick Cammarata, Chelsea Voss, Shan Carter, Michael Petrov, Ludwig Schubert, Alec Radford, Chris Olah
Multimodal neurons in artificial neural networks
2021
[180]
Ian J Goodfellow, Jonathon Shlens, Christian Szegedy
Explaining and harnessing adversarial examples
2014
[181]
Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy
Explaining and Harnessing Adversarial Examples
2015
[182]
Jianping Gou, Baosheng Yu, Stephen J. Maybank, Dacheng Tao
Knowledge Distillation: A Survey
2020
[183]
Benjamin Graham
Fractional Max-Pooling
2015
[184]
Benjamin Graham, Laurens Maaten
Submanifold Sparse Convolutional Networks
2017
[185]
Will Grathwohl, Ricky T. Q. Chen, Jesse Bettencourt, Ilya Sutskever, David Duvenaud
FFJORD: Free-Form Continuous Dynamics for Scalable Reversible Generative Models
2019
[186]
Grauman, Westbury, Byrne, Chavis, et al.
Ego4D: Around the World in 3,000 Hours of Egocentric Video
2022
[187]
Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, Daan Wierstra
DRAW: A Recurrent Neural Network For Image Generation
2015
[188]
Jean-Bastien Grill, Florian Strub, Florent Altché
Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning
2020
[189]
Amos Gropp, Lior Yariv, Niv Haim, Matan Atzmon, Yaron Lipman
Implicit Geometric Regularization for Learning Shapes
2020
[190]
Albert Gu, Tri Dao, Ankit Gupta, Stefano Ermon, Christopher Ré
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
2023
[191]
Chunhui Gu, Chen Sun, David A. Ross, Carl Vondrick, Caroline Pantofaru, Yeqing Li, Sudheendra Vijayanarasimhan, George Toderici, Susanna Ricco, Rahul Sukthankar, Cordelia Schmid, Jitendra Malik
AVA: A Video Dataset of Spatio-Temporally Localized Atomic Visual Actions
2018
[192]
Jiatao Gu, Chris Bradbury, Zhengdong Lu, Victor O. K. Li, Xavier Chen
Non-autoregressive neural machine translation
2018
[193]
Xiuye Gu, Tsung-Yi Lin, Weicheng Kuo, Yin Cui
Open-vocabulary Object Detection via Vision and Language Knowledge Distillation
2022
[194]
Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, Aaron Courville
Improved Training of Wasserstein GANs
2017
[195]
Chuan Guo, Geoff Pleiss, Yu Sun, Kilian Q. Weinberger
On Calibration of Modern Neural Networks
2017
[196]
Agrim Gupta, Piotr Dollar, Ross Girshick
LVIS: A Dataset for Large Vocabulary Instance Segmentation
2019
[197]
Agrim Gupta, Piotr Dollar, Ross Girshick
ODinW: Evaluating and Harnessing Object Detection in the Wild
2022
[198]
Agrim Gupta, Justin Johnson, Li Fei-Fei, Silvio Savarese, Alexandre Alahi
Social GAN: Socially Acceptable Trajectories with Generative Adversarial Networks
2018
[199]
Raia Hadsell, Sumit Chopra, Yann LeCun
Dimensionality Reduction by Learning an Invariant Mapping
2006
[200]
Kai Han, Yunhe Wang, Qi Tian, Jianyuan Guo, Chunjing Xu, Chang Xu
GhostNet: More Features from Cheap Operations
2020
[201]
Tengda Han, Weidi Xie, Andrew Zisserman
Memory-augmented Dense Predictive Coding for Video Representation Learning
2020
[202]
Ayaan Haque, Matthew Tancik, Alexei A. Efros, Aleksander Holynski, Angjoo Kanazawa
Instruct-NeRF2NeRF: Editing 3D Scenes with Instructions
2023
[203]
Chris Harris, Mike Stephens
A combined corner and edge detector
1988
[204]
Soufiane Hayou, Nikhil Ghosh, Bin Yu
LoRA+: Efficient Low Rank Adaptation of Large Models
2024
[205]
Kaiming He, Ross Girshick, Piotr Dollár
Rethinking ImageNet Pre-training
2018
[206]
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
Deep Residual Learning for Image Recognition
2016
[207]
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
2015
[208]
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
Identity mappings in deep residual networks
2016
[209]
Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross B. Girshick
Mask R-CNN
2020
[210]
Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick
Masked Autoencoders Are Scalable Vision Learners
2022
[211]
Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, Ross Girshick
Momentum Contrast for Unsupervised Visual Representation Learning
2020
[212]
Tong He, Zhi Zhang, Hang Zhang, Zhongyue Zhang, Junyuan Xie, Mu Li
Bag of Tricks for Image Classification with Convolutional Neural Networks
2018
[213]
Peter Hedman, Julien Philip, Toby Price, George Drettakis
Deep Blending for Free-Viewpoint Image-Based Rendering
2018
[214]
Olivier J. Hénaff, Aravind Srinivas, Jeffrey De Fauw, Ali Razavi, Carl Doersch, S. M. Ali Eslami, Aaron van den Oord
Data-Efficient Image Recognition with Contrastive Predictive Coding
2020
[215]
Dan Hendrycks, Kevin Gimpel
Gaussian Error Linear Units (GELUs)
2016
[216]
Byeongho Heo, Song Park, Dongyoon Han, Sangdoo Yun
Rotary Position Embedding for Vision Transformer
2024
[217]
Amir Hertz, Ron Mokady, Jay Tenenbaum, Kfir Aberman, Yael Pritch, Daniel Cohen-Or
Prompt-to-Prompt Image Editing with Cross Attention Control
2022
[218]
Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, Sepp Hochreiter
GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium
2017
[219]
Geoffrey Hinton, Oriol Vinyals, Jeff Dean
Distilling the Knowledge in a Neural Network
2015
[220]
Geoffrey E. Hinton, Ruslan R. Salakhutdinov
Reducing the Dimensionality of Data with Neural Networks
2006
[221]
Ester Hlav
Kaiming He Initialization in Neural Networks — Math Proof
2023
[222]
Ester Hlav
Xavier Glorot Initialization in Neural Networks: Math Proof
2023
[223]
Jonathan Ho, Ajay Jain, Pieter Abbeel
Denoising Diffusion Probabilistic Models
2020
[224]
Jonathan Ho, Tim Salimans
Classifier-Free Diffusion Guidance
2022
[225]
Jonathan Ho, Chitwan Saharia, William Chan, David J. Fleet, Mohammad Norouzi, Tim Salimans
Cascaded Diffusion Models for High Fidelity Image Generation
2021
[226]
Quan Hoang, Tu Nguyen, Dinh Le, Minh Hoai
MGAN: Training generative adversarial nets with multiple generators
2018
[227]
Sepp Hochreiter, Jürgen Schmidhuber
Long Short-Term Memory
1997
[228]
Peter Holderrieth, Marton Havasi, Jason Yim, David Lopez-Paz, Ricky T. Q. Chen, Yaron Lipman
Generator Matching: Generative Modeling with Arbitrary Markov Processes
2024
[229]
Andrew Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
2017
[230]
Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, Quoc V. Le, Hartwig Adam
Searching for MobileNetV3
2019
[231]
Jeremy Howard, Sebastian Ruder
Universal Language Model Fine-tuning for Text Classification
2018
[232]
Ting-I Hsieh, Yi-Chen Lo, Hwann-Tzong Chen, Tyng-Luh Liu
One-Shot Object Detection with Co-Attention and Co-Excitation
2019
[233]
Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen
LoRA: Low-Rank Adaptation of Large Language Models
2021
[234]
Jie Hu, Li Shen, Gang Sun
Squeeze-and-Excitation Networks
2018
[235]
Qianjiang Hu, Xiao Wang, Wei Hu, Guo‑Jun Qi
Adversarial Contrast for Efficient Learning of Unsupervised Representations from Self‑Trained Negative Adversaries
2021
[236]
Weiming Hu, Qiang Wang, Li Zhang, Luca Bertinetto, Philip H. S. Torr
SiamMask: A Framework for Fast Online Object Tracking and Segmentation
2022
[237]
Wenbo Hu, Yuling Wang, Lin Ma, Bangbang Yang, Lin Gao, Xiao Liu, Yuewen Ma
Tri-MipRF: Tri-Mip Representation for Efficient Anti-Aliasing Neural Radiance Fields
2023
[238]
Xuefeng Hu, Ke Zhang, Lu Xia, Albert Chen, Jiajia Luo, Yuyin Sun, Ken Wang, Nan Qiao, Xiao Zeng, Min Sun, Cheng-Hao Kuo, Ram Nevatia
ReCLIP: Refine Contrastive Language Image Pre-Training with Source Free Domain Adaptation
2023
[239]
Ying Hu, Chenyi Zhuang, Pan Gao
DiffuseST: Unleashing the Capability of the Diffusion Model for Style Transfer
2024
[240]
Yushi Hu, Hang Hua, Zhengyuan Yang, Weijia Shi, Noah A. Smith, Jiebo Luo
PromptCap: Prompt-guided Task-aware Image Captioning
2023
[241]
Zhengdong Hu, Yifan Sun, Jingdong Wang, Yi Yang
DAC
2023
[242]
Gao Huang, Yu Sun, Zhuang Liu, Daniel Sedra, Kilian Weinberger
Deep Networks with Stochastic Depth
2016
[243]
Gao Huang, Zhuang Liu, Laurens van der Maaten, Kilian Q. Weinberger
Densely Connected Convolutional Networks
2018
[244]
Hai Huang, Randall Balestriero
ALLoRA: Adaptive Learning Rate Mitigates LoRA Fatal Flaws
2024
[245]
Rongjie Huang, Mingze Li, Dongchao Yang, Jiatong Shi, Xuankai Chang, Zhenhui Ye, Yuning Wu, Zhiqing Hong, Jiawei Huang, Jinglin Liu, Yi Ren, Zhou Zhao, Shinji Watanabe
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
2023
[246]
Shihua Huang, Yongjie Hou, Longfei Liu, Xuanlong Yu, Xi Shen
Real-Time Object Detection Meets DINOv3
2025
[247]
Xiao Shi Huang, Felipe Perez, Jimmy Ba, Maksims Volkovs, Hal Daumé III, Aarti Singh
Improving Transformer Optimization Through Better Initialization
2020
[248]
Xun Huang, Serge Belongie
Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization
2017
[249]
David H. Hubel, Torsten N. Wiesel
Receptive fields of single neurones in the cat's striate cortex
1959
[250]
Jonathan Hui
StyleGAN and StyleGAN2: Learn to generate and control images
2020
[251]
Becoming Human
All About Normalization
2018
[252]
IDEA-Research
Grounded SAM 2: Ground and Track Anything in Videos
2024
[253]
Intellindust AI Lab
DEIMv2
2025
[254]
Sergey Ioffe, Christian Szegedy
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
2015
[255]
Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros
Image-to-Image Translation with Conditional Adversarial Networks
2017
[256]
Jerome Jabri, Jean-Baptiste Alayrac, Andrew Zisserman
STC
2020
[257]
Arthur Jacot, Franck Gabriel, Clément Hongler
Neural Tangent Kernel: Convergence and Generalization in Neural Networks
2018
[258]
Ajay Jain, Ben Mildenhall, Jonathan T. Barron, Pieter Abbeel, Ben Poole
Zero-Shot Text-Guided Object Generation with Dream Fields
2022
[259]
Jitesh Jain, Jiachen Li, Mang Tik Chiu, Ali Hassani, Nikita Orlov, Humphrey Shi
OneFormer: One Transformer To Rule Universal Image Segmentation
2023
[260]
Clement Jambon, Benedikt Kerbl, Georgios Kopanas, Sotiris Diolatzis, Tim Leimkühler, Georges Drettakis
NeRFshop: Interactive Editing of Neural Radiance Fields
2023
[261]
Eric Jang, Shixiang Gu, Ben Poole
Categorical Reparameterization with Gumbel-Softmax
2017
[262]
Rasmus Jensen, Anders Dahl, George Vogiatzis, Engin Tola, Henrik Aanæs
Large scale multi-view stereopsis evaluation
2014
[263]
Shuiwang Ji, Wei Xu, Ming Yang, Kai Yu
3D Convolutional Neural Networks for Human Action Recognition
2010
[264]
Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
2021
[265]
Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
2021
[266]
Yu-Gang Jiang, Jingen Liu, Amir Roshan Zamir, George Toderici, Ivan Laptev, Mubarak Shah, Rahul Sukthankar
THUMOS Challenge: Action Recognition with a Large Number of Classes
2014
[267]
Mohammad Mahdi Johari, Yann Lepoittevin, François Fleuret
GeoNeRF: Generalizing NeRF with Geometry Priors
2022
[268]
Justin Johnson, Alexandre Alahi, Li Fei-Fei
Perceptual Losses for Real-Time Style Transfer and Super-Resolution
2016
[269]
Justin Johnson, Andrej Karpathy, Li Fei-Fei
DenseCap: Fully Convolutional Localization Networks for Dense Captioning
2015
[270]
Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Judy Hoffman, Li Fei-Fei, C. Lawrence Zitnick, Ross Girshick
Inferring and Executing Programs for Visual Reasoning
2017
[271]
Damjan Kalajdzievski
A Rank Stabilization Scaling Factor for Fine‑Tuning with LoRA
2023
[272]
Aishwarya Kamath, Mannat Singh, Jean-Baptiste Alayrac, Mathilde Caron, Sagnik Goyal, Ishan Misra, Marc'Aurelio Ranzato, Gabriel Synnaeve, Armand Joulin
MDETR
2021
[273]
Minguk Kang, Jun-Yan Zhu, Richard Zhang, Jaesik Park, Eli Shechtman, Sylvain Paris, Taesung Park
Scaling up GANs for Text-to-Image Synthesis
2023
[274]
Andrej Karpathy, Li Fei-Fei
Deep Visual-Semantic Alignments for Generating Image Descriptions
2015
[275]
Andrej Karpathy, Justin Johnson, Li Fei-Fei
Visualizing and Understanding Recurrent Networks
2015
[276]
Andrej Karpathy, George Toderici, Sanketh Shetty, Thomas Leung, Rahul Sukthankar, Li Fei-Fei
Large-scale Video Classification with Convolutional Neural Networks
2014
[277]
Andrej Karpathy, George Toderici, Sanketh Shetty, Thomas Leung, Rahul Sukthankar, Li Fei-Fei
Large-scale video classification with convolutional neural networks
2014
[278]
Tero Karras, Samuli Laine, Timo Aila
A Style-Based Generator Architecture for Generative Adversarial Networks
2019
[279]
Tero Karras, Miika Aittala, Samuli Laine, Erik Härkönen, Janne Hellsten, Jaakko Lehtinen, Timo Aila
Alias-Free Generative Adversarial Networks
2021
[280]
Tero Karras, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, Timo Aila
Analyzing and Improving the Image Quality of StyleGAN
2020
[281]
Tero Karras, Timo Aila, Samuli Laine, Jaakko Lehtinen
Progressive Growing of GANs for Improved Quality, Stability, and Variation
2018
[282]
Will Kay, João Carreira, Karen Simonyan, Brian Zhang, Chloe Hillier, Sudheendra Vijayanarasimhan, Fabio Viola, Tim Green, Trevor Back, Paul Natsev, Mustapha Suleyman, Andrew Zisserman
The Kinetics Human Action Video Dataset
2017
[283]
Amirhossein Kazemnejad
Transformer Architecture: The Positional Encoding
2019
[284]
Guolin Ke, Di He
Rethinking Positional Encoding in Language Pre-training
2021
[285]
Lei Ke, Mingqiao Ye, Martin Danelljan, Yifan Liu, Yu-Wing Tai, Chi-Keung Tang, Fisher Yu
Segment Anything in High Quality
2023
[286]
Bernhard Kerbl, Georgios Kopanas, George Drettakis
Gaussian Surfels for Real-Time Rendering of Point Clouds
2024
[287]
Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, George Drettakis
3D Gaussian Splatting for Real-Time Radiance Field Rendering
2023
[288]
Justin Kerr, Chung Min Kim, Kenneth Y. Goldberg, Angjoo Kanazawa, Matthew Tancik
LERF: Language Embedded Radiance Fields
2023
[289]
Nitish Shirish Keskar, Dheevatsa Mudigere, Jorge Nocedal, Mikhail Smelyanskiy, Ping Tak Peter Tang
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
2017
[290]
Valentin Khrulkov, Ivan Oseledets
Geometry Score: A Method for Comparing Generative Adversarial Networks
2018
[291]
Yannic Kilcher
Flow Matching: A Unified Framework for Generative Models
2022
[292]
Diederik P. Kingma, Max Welling
Auto-Encoding Variational Bayes
2014
[293]
Diederik P. Kingma, Jimmy Ba
Adam: A Method for Stochastic Optimization
2017
[294]
Diederik P. Kingma, Prafulla Dhariwal
Glow: Generative Flow with Invertible 1x1 Convolutions
2018
[295]
Alexander Kirillov, Yuxin Wu, Kaiming He, Ross Girshick
PointRend: Image Segmentation as Rendering
2020
[296]
Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollar, Ross B. Girshick
Segment Anything
2023
[297]
Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, Ross Girshick
Segment Anything
2023
[298]
Günter Klambauer, Thomas Unterthiner, Andreas Mayr, Sepp Hochreiter
Self-normalizing neural networks
2017
[299]
Arno Knapitsch, Jaesik Park, Qian-Yi Zhou, Vladlen Koltun
Tanks and temples: benchmarking large-scale scene reconstruction
2017
[300]
Sebastian Koch, Nikolay Averkin, Tristan Lauth, Thomas R. Jones, Olaf Holtmannspötter, Ralf Habel, Hao R. Zhang, Nico Pietroni, Luigi Malomo, Niloy J. Mitra, Mario Botsch
ABC: A Big CAD Model Dataset For Geometric Deep Learning
2019
[301]
Jonas Kohler, Hadi Daneshmand, Aurelien Lucchi, Thomas Hofmann, Ming Zhou, Klaus Neymeyr
Exponential convergence rates for batch normalization: The power of length-direction decoupling in non-convex optimization
2019
[302]
Nikita Kornilov, Petr Mokrov, Alexander Gasnikov, Alexander Korotin
Optimal Flow Matching: Learning Straight Trajectories in Just One Step
2024
[303]
Jonathan Krause, Michael Stark, Jia Deng, Li Fei-Fei
3D Object Representations for Fine-Grained Categorization
2013
[304]
Ranjay Krishna, Kenji Hata, Frederic Ren, Li Fei-Fei, Juan Carlos Niebles
Dense-Captioning Events in Videos
2017
[305]
Raghuraman Krishnamoorthi
Quantizing deep convolutional networks for efficient inference: A whitepaper
2018
[306]
Alex Krizhevsky, Geoffrey Hinton
Learning Multiple Layers of Features from Tiny Images
2009
[307]
Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton
ImageNet classification with deep convolutional neural networks
2012
[308]
Tejas D. Kulkarni, William Whitney, Pushmeet Kohli, Joshua B. Tenenbaum
Deep Convolutional Inverse Graphics Network
2015
[309]
Ananya Kumar, Aditi Raghunathan, Robbie Jones, Tengyu Ma, Percy Liang
Fine-Tuning can Distort Pretrained Features and Underperform Out-of-Distribution
2022
[310]
Rohit Kundu
The Beginner’s Guide to Contrastive Learning
2022
[311]
Weicheng Kuo, Yin Cui, Xiuye Gu, AJ Piergiovanni, Anelia Angelova
F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models
2023
[312]
Weicheng Kuo, Fred Bertsch, Wei Li, AJ Piergiovanni, Mohammad Saffar, Anelia Angelova
FindIt: Generalized Localization with Natural Language Queries
2022
[313]
Hyunsu Kwon, Kangwook Lee, Xiaoli Li, Yogesh Balaji, Shang Wang, Alexei A. Efros, Jun-Yan Zhu
Diffusion-GAN: Training GANs with Diffusion
2023
[314]
Zihang Lai, Erika Lu, Weidi Xie
MAST
2020
[315]
Magdalena Lazova, Thomas Müller, Edoardo Remelli, Pratul P. Srinivasan, Ben Mildenhall, Jonathan T. Barron
Control-NeRF: Editable Feature Volumes for Scene Rendering and Manipulation
2022
[316]
Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner
Gradient-Based Learning Applied to Document Recognition
1998
[317]
Christian Ledig, Lucas Theis, Ferenc Huszar, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, Wenzhe Shi
Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network
2017
[318]
Doyup Lee, Chiheon Kim, Saehoon Kim, Minsu Cho, Wook Shin Han, S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, A. Oh
Draft-and-Revise: Effective Image Generation with Contextual RQ-Transformer
2022
[319]
Kuang-Huei Lee, Anurag Arnab, Sergio Guadarrama, John Canny, Ian Fischer
Compressive Visual Representations
2021
[320]
Pil Sun Lee, Ye Na Paek, Hyoungseok Park, Sungmin Yoo, Hyunjung Shim
From Big to Small: Multi‑Scale Local Planar Guidance for Monocular Depth Estimation
2019
[321]
Sangwoo Lee, Junghyun Kwon, Minsu Cho, Jinwoo Shin
StyleGAN-T: Unlocking the Power of GANs with Transformer Backbones
2023
[322]
Jie Lei
Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling
2021
[323]
Jie Lei, Tamara L. Berg, Mohit Bansal
Q{\&
2021
[324]
Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, Luke Zettlemoyer
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
2020
[325]
Bo Li, Chunyuan Li, Qingyang Wu, Yong Jae Lee
LLaVA-OneVision: Easy Visual Task Transfer
2024
[326]
Boyi Li, Kilian Q. Weinberger, Serge Belongie, Vladlen Koltun, René Ranftl
Language-Driven Semantic Segmentation
2022
[327]
Feng Li, Hao Zhang, Shilong Liu, Jian Guo, Lionel M. Ni, Lei Zhang
DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection
2022
[328]
Feng Li, Hao Zhang, Shilong Liu, Hang Su, Jun Zhu, Lei Zhang
DN-DETR: Accelerate DETR Training with Decoupled Denoising Anchor Boxes
2022
[329]
Feng Li, Renrui Zhang, Hao Zhang, Yuanhan Zhang, Bo Li, Wei Li, Zejun Ma, Chunyuan Li
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models
2024
[330]
Feng Li, Hao Zhang, Huaizhe Xu, Shilong Liu, Lei Zhang, Lionel M. Ni, Heung-Yeung Shum
Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation
2022
[331]
Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer, Tom Goldstein
Visualizing the Loss Landscape of Neural Nets
2018
[332]
Junnan Li, Ramprasaath R. Selvaraju, Akhilesh Gotmare, Shafiq Joty, Caiming Xiong, Steven C. H. Hoi
Align Before Fuse: Vision and Language Representation Learning With Momentum Distillation
2021
[333]
Junnan Li, Dongxu Li, Silvio Savarese, Steven C. H. Hoi
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
2023
[334]
Junnan Li, Dongxu Li, Caiming Xiong, Steven C. H. Hoi
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
2022
[335]
KunChang Li, Yinan He, Yi Wang, Yizhuo Li, Wenhai Wang, Ping Luo, Yali Wang, Limin Wang, Yu Qiao
VideoChat: Chat-Centric Video Understanding
2024
[336]
Kunchang Li, Yali Wang, Yinan He, Yizhuo Li, Yi Wang, Yi Liu, Zun Wang, Jilan Xu, Guo Chen, Ping Luo, Limin Wang, Yu Qiao
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
2024
[337]
Kunchang Li, Yali Wang, Yizhuo Li, Yi Wang, Yinan He, Limin Wang, Yu Qiao
Unmasked Teacher: Towards Training-Efficient Video Foundation Models
2024
[338]
Liunian Harold Li, Pengchuan Zhang, Haotian Zhang, Jianwei Yang, Chunyuan Li, Yiwu Zhong, Lijuan Wang, Lu Yuan, Jenq-Neng Hwang, Kai-Wei Chang, Jianfeng Gao
Grounded Language-Image Pre-Training
2022
[339]
Xian Li
Temporal Aggregation Network for Video Recognition
2021
[340]
Xiang Li, Zhuang Li, Zehao Wu, Xiangyu Zhang, Jian Zhang
Understanding and Improving Layer Normalization
2022
[341]
Li, Yin, et al.
OSCAR: Object-Semantics Aligned Pre-training for Vision-Language Tasks
2020
[342]
Yali Li, Lei Ji, Xun Shi, Xiaolin Zhang, Yuchao Wang, Y. Zhang
TEA: Temporal Excitation and Aggregation for Action Recognition
2020
[343]
Yang Li, Si Si, Gang Li, Cho-Jui Hsieh, Samy Bengio
Learnable Fourier Features for Multi-dimensional Spatial Positional Encoding
2021
[344]
Yanghao Li, Saining Xie, Jiteng Dai, Kaiming Lin, Piotr Dollar
Improved Multiscale Vision Transformers for Classification and Detection
2021
[345]
Yuhui Li, Yulin Wang, Shiji Song, Le Yang, Hong Zhang, Gao Huang
UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer
2022
[346]
Yuxi Li
Detailed 2D-3D Joint Representation for Human Actions
2020
[347]
Yuzhe Li
TEINet: Towards an Efficient Architecture for Video Recognition
2021
[348]
Yuzhuo Li
UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning
2022
[349]
Zhengqi Li, Simon Niklaus, Noah Snavely, Oliver Wang
Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes
2021
[350]
Zhizhong Li, Derek Hoiem
Learning without Forgetting
2016
[351]
Vladislav Lialin, Vijeta Deshpande, Xiaowei Yao, Anna Rumshisky
Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning
2024
[352]
Chen-Hsuan Lin, Wei Gao, Ruiyu Li, Minh Vo, Yasutaka Furukawa
BARF: Bundle-Adjusting Neural Radiance Fields
2021
[353]
Chen-Hsuan Lin, Jun Gao, Xueyan Zeng, Wei-Chiu Ma, Shichen Su, Jiahui Yu, Kevin Po, Xiaohui Shen, Simon Lucey, Angjoo Kanazawa
Magic3D: High-Resolution Text-to-3D Content Creation
2023
[354]
Ji Lin, Chuang Gan, Song Han
TSM: Temporal Shift Module for Efficient Video Understanding
2019
[355]
Jing Lin, Ailing Zeng, Haoqian Wang, Lei Zhang, Yu Li
One-Stage 3D Whole-Body Mesh Recovery with Component Aware Transformer
2023
[356]
Tianwei Lin
BMN: Boundary-Matching Network for Temporal Action Proposal Generation
2019
[357]
Tianwei Lin, Xiao Liu, Xin Li, Errui Ding, Shilei Wen
Learning Salient Boundary Feature for Anchor-free Temporal Action Localization
2021
[358]
Tsung-Yi Lin, Michael Maire, Serge Belongie
Microsoft COCO: Common objects in context
2014
[359]
Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, Serge Belongie
Feature Pyramid Networks for Object Detection
2017
[360]
Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, Piotr Dollár
Focal Loss for Dense Object Detection
2018
[361]
Yutong Lin, Yuhui Yuan, Zheng Zhang, Chen Li, Nanning Zheng, Han Hu
DETR Doesn't Need Multi-Scale or Locality Design
2023
[362]
Zinan Lin, Ashish Khetan, Giulia Fanti, Sewoong Oh
PacGAN: The power of two samples in generative adversarial networks
2018
[363]
Yaron Lipman, Marton Havasi, Peter Holderrieth, Neta Shaul, Matt Le, Brian Karrer, Ricky T. Q. Chen, David Lopez-Paz, Heli Ben-Hamu, Itai Gat
Flow Matching Guide and Code
2024
[364]
Yotam Gideon Lipman, George Tucker, Shixiang Gu, Justin Gilmer
Flow Matching for Generative Modeling
2022
[365]
Geert Litjens, Thijs Kooi, Babak E. Bejnordi, Arnaud Arindra Adiyoso Setio, Francesco Ciompi, Mohsen Ghafoorian, Jeroen A. W. M. Van Der Laak, Bram Van Ginneken, Clara I. Sánchez
A Survey on Deep Learning in Medical Image Analysis
2017
[366]
Hao Liu, Wilson Yan, Matei Zaharia, Pieter Abbeel
World Model on Million-Length Video And Language With Blockwise RingAttention
2025
[367]
Liu et al.
LLaVA-NeXT-Video: {A
2024
[368]
Haotian Liu, Chunyuan Li, Qingyang Wu, Yong Jae Lee
Visual Instruction Tuning
2023
[369]
Liu, Guo, Zollhöfer, et al.
A Survey on Neural Radiance Fields
2023
[370]
Lingjie Liu, Jiatao Gu, Kyaw Zaw Lin, Tat-Seng Chua, Christian Theobalt, H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, H. Lin
Neural Sparse Voxel Fields
2020
[371]
Meng Liu
Temporal Adaptive Module for Video Recognition
2021
[372]
Ruyang Liu, Chen Li, Yixiao Ge, Ying Shan, Thomas H. Li, Ge Li
One for all: Video conversation is feasible without video instruction tuning
2023
[373]
Shih-Yang Liu, Chien-Yi Wang, Hongxu Yin, Pavlo Molchanov, Yu-Chiang Frank Wang, Kwang-Ting Cheng, Min-Hung Chen
DoRA: Weight-Decomposed Low-Rank Adaptation
2024
[374]
Shilong Liu, Feng Li, Hao Zhang, Xiao Yang, Hang Su, Jun Zhu, Lei Zhang
DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR
2022
[375]
Shilong Liu, Yaoyuan Liang, Feng Li, Shijia Huang, Hao Zhang, Hang Su, Jun Zhu, Lei Zhang
DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding
2022
[376]
Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Qing Jiang, Chunyuan Li, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
2023
[377]
Yu-Tao Liu
NeUDF: Learning Neural Unsigned Distance Fields with Volume Rendering
2023
[378]
Weiyang Liu, Yandong Wen, Zhiding Yu, Meng Yang, Li Hajime
Large-Margin Softmax Loss for Convolutional Neural Networks
2016
[379]
Weiyang Liu, Yandong Wen, Zhiding Yu, Ming Li, Bhiksha Raj, Le Song
SphereFace: Deep Hypersphere Embedding for Face Recognition
2017
[380]
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov
RoBERTa: A Robustly Optimized BERT Pretraining Approach
2019
[381]
Yuan Liu
TS2-Net: Token Shift and Selection Transformer for Video-Language Pretraining
2022
[382]
Yuan Liu, Sida Peng, Lingjie Liu, Qianqian Wang, Peng Wang, Christian Theobalt, Xiaowei Zhou, Wenping Wang
Neural Rays for Occlusion-Aware Image-Based Rendering
2022
[383]
Ze Liu
End-to-End Temporal Action Detection with Transformer
2022
[384]
Liu, Ning, et al.
Video Swin Transformer
2022
[385]
Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Ying Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong
Swin Transformer V2: Scaling Up Capacity and Resolution
2022
[386]
Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
2021
[387]
Zhaoyang Liu, Limin Wang, Wayne Wu, Chen Qian, Tong Lu
TAM: Temporal Adaptive Module for Video Recognition
2021
[388]
Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie
A ConvNet for the 2020s
2022
[389]
Stephen Lombardi, Tomas Simon, Jason M. Saragih, Gabriel Schwartz, Andreas Lehrmann, Yaser Sheikh
Neural Volumes: Learning Dynamic Renderable Volumes from Images
2019
[390]
Jonathan Long, Evan Shelhamer, Trevor Darrell
Fully Convolutional Networks for Semantic Segmentation
2015
[391]
William E. Lorensen, Harvey E. Cline
Marching cubes: A high resolution 3D surface construction algorithm
1987
[392]
Ilya Loshchilov, Frank Hutter
Decoupled Weight Decay Regularization
2019
[393]
David G. Lowe
Three-dimensional object recognition from single two-dimensional images
1987
[394]
David G. Lowe
Object Recognition from Local Scale-Invariant Features
1999
[395]
Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, Jun Zhu, S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, A. Oh
DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps
2022
[396]
Erika Lu, Forrester Cole, Tali Dekel, Weidi Xie, Andrew Zisserman, William T. Freeman, Frédo Durand
Omnimatte: Associating Objects and Their Effects in Video
2021
[397]
Erika Lu, Forrester Cole, Tali Dekel, Weidi Xie, Andrew Zisserman, William T. Freeman, Frédo Durand
OmnimatteRF: Dampened Global Transport for Layered Neural Rendering
2023
[398]
Pan Lu, Baolin Peng, Hao Cheng, Michel Galley, Kai-Wei Chang, Ying Nian Wu, Song-Chun Zhu, Jianfeng Gao
Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models
2023
[399]
Pan Lu, Swaroop Mishra, Tony Xia, Liang Qiu, Kai-Wei Chang, Song-Chun Zhu, Oyvind Tafjord, Peter Clark, Ashwin Kalyan
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
2022
[400]
Mario Lucic, Karol Kurach, Marcin Michalski, Sylvain Gelly, Olivier Bousquet
Are GANs Created Equal? A Large-Scale Study
2018
[401]
Mario Lucic, Michael Tschannen, Manuel Ritter, Xiaohua Zhai, Neil Houlsby
High-fidelity image generation with fewer labels
2019
[402]
Calvin Luo
Diffusion Models from Scratch
2022
[403]
Honglu Luo
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval
2022
[404]
Ruipu Luo, Ziwang Zhao, Min Yang, Zheming Yang, Minghui Qiu, Tao Wang, Zhongyu Wei, Yanhao Wang, Cen Chen
Valley: Video Assistant with Large Language model Enhanced abilitY
2025
[405]
Wenjie Luo, Yujia Li, Raquel Urtasun, Richard Zemel
Understanding the Effective Receptive Field in Deep Convolutional Neural Networks
2017
[406]
Wenyu Lv, Yian Zhao, Qinyao Chang, Kui Huang, Guanzhong Wang, Yi Liu
RT-DETRv2
2024
[407]
Bin Ma
X-CLIP: End-to-End Multi-Granular Contrastive Learning for Video-Text Retrieval
2022
[408]
Ningning Ma, Xiangyu Zhang, Hai-Tao Zheng, Jian Sun
ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design
2018
[409]
Laurens van der Maaten, Geoffrey Hinton
Visualizing Data using t-SNE
2008
[410]
Muhammad Maaz, Hanoona Rasheed, Salman Khan, Fahad Shahbaz Khan
Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models
2024
[411]
Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, Adrian Vladu
Towards Deep Learning Models Resistant to Adversarial Attacks
2018
[412]
Mahin Khan Mahadi, Rummanur Rahad, Mohammad Ashraful Haque, Mirza Muntasir Nishat
Gated recurrent unit (GRU)-based deep learning method for spectrum estimation and inverse modeling in plasmonic devices
2024
[413]
Turki Al Malki, Johannes L. Schönberger, Nathan Teteris, Noah Snavely
Mega-NeRF: Scalable Construction of Large-Scale Neural Radiance Fields
2022
[414]
Karttikeya Mangalam, Raiymbek Akshulakov, Jitendra Malik
EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding
2023
[415]
Kevis-Kokitsi Maninis, Kaifeng Chen, Soham Ghosh, Arjun Karpur, Koert Chen, Ye Xia, Bingyi Cao, Daniel Salz, Guangxing Han, Jan Dlabal, Dan Gnanapragasam, Mojtaba Seyedhosseini, Howard Zhou, Andre Araujo
TIPS: Text-Image Pretraining with Spatial awareness
2025
[416]
Xudong Mao, Qing Li, Haoran Xie, Raymond YK Lau, Zhen Wang, Stephen Paul Smolley
Least Squares Generative Adversarial Networks
2017
[417]
David Marr
Vision: A Computational Investigation into the Human Representation and Processing of Visual Information
1982
[418]
Ricardo Martin-Brualla, Noha Radwan, Mehdi S. M. Sajjadi, Jonathan T. Barron, Alexey Dosovitskiy, Daniel Duckworth
NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections
2021
[419]
Robert J. McCann
A Convexity Principle for Interacting Gases
1997
[420]
Medium Contributor
NT–Xent Loss: Normalized Temperature‑Scaled Cross‑Entropy Loss
2021
[421]
Hongyuan Mei, Mohit Bansal, Matthew R. Walter
Listen, Attend, and Walk: Neural Mapping of Navigational Instructions to Action Sequences
2016
[422]
Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu, Stefano Ermon
SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations
2022
[423]
Fanxu Meng, Zhaohui Wang, Muhan Zhang
PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models
2025
[424]
Lars Mescheder, Andreas Geiger, Sebastian Nowozin
Which training methods for GANs do actually converge?
2018
[425]
Lars Mescheder, Michael Oechsle, Michael Niemeyer, Sebastian Nowozin, Andreas Geiger
Occupancy networks: Learning 3D reconstruction in function space
2019
[426]
Moustafa Meshry, Dan B. Goldman, Sameh Khamis, Hugues Hoppe, Rohit Pandey, Noah Snavely, Ricardo Martin-Brualla
Neural Rerendering in the Wild
2019
[427]
Gal Metzer, Elad Richardson, Or Patashnik, Raja Giryes, Daniel Cohen-Or
Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures
2022
[428]
Ben Mildenhall, Pratul P. Srinivasan, Rodrigo Ortiz-Cayon, Nima Khademi Kalantari, Ravi Ramamoorthi, Ren Ng, Abhishek Kar
Local Light Field Fusion: Practical View Synthesis with Prescriptive Sampling Guidelines
2019
[429]
Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, Ren Ng
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
2020
[430]
Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Emily L. Denton, Ravi Ramamoorthi, Ren Ng
RawNeRF: Neural Radiance Fields from Noisy Raw Images
2022
[431]
Matthias Minderer, Alexey Dosovitskiy, Xiaohua Zhai
Scaling Open-Vocabulary Object Detection
2024
[432]
Matthias Minderer, Alexey Gritsenko, Neil Houlsby
Scaling Open-Vocabulary Object Detection
2024
[433]
Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, Neil Houlsby
Simple Open-Vocabulary Object Detection with Vision Transformers
2022
[434]
Marvin Minsky, Seymour Papert
Perceptrons: An Introduction to Computational Geometry
1969
[435]
Mehdi Mirza, Simon Osindero
Conditional Generative Adversarial Nets
2014
[436]
Ishan Misra, Laurens Maaten
Self-Supervised Learning of Pretext-Invariant Representations
2019
[437]
Jovana Mitrovic, Brian McWilliams, Jacob Walker, Lars Buesing, Charles Blundell
Representation Learning via Invariant Causal Mechanisms
2020
[438]
Takeru Miyato, Toshiki Kataoka, Masanori Koyama, Yuichi Yoshida
Spectral Normalization for Generative Adversarial Networks
2018
[439]
Kaichun Mo
PartNet: A Large-Scale Benchmark for Fine-Grained and Hierarchical Part-Level 3D Object Understanding
2019
[440]
Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Pascal Frossard
Universal Adversarial Perturbations
2017
[441]
Alexander Mordvintsev, Chris Olah, Mike Tyka
Inceptionism: Going Deeper into Neural Networks
2015
[442]
Tianxiang Mou, Pengcheng He, Weizhu Chen, Zicheng Liu, Yueting Zhuang
T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models
2023
[443]
Thomas Müller, Alex Evans, Christoph Schied, Alexander Keller
Instant Neural Graphics Primitives with a Multiresolution Hash Encoding
2022
[444]
Muhammad Ferjad Naeem, Yongqin Xian, Xiaohua Zhai, Lukas Hoyer, Luc Van Gool, Federico Tombari
SILC: Improving Vision Language Pretraining with Self-Distillation
2023
[445]
Preetum Nakkiran, Gal Kaplun, Yamini Bansal, Boaz Barak, Ilya Sutskever
Deep Double Descent: Where Bigger Models and More Data Hurt
2021
[446]
Daniel Neimark, Omri Bar, Maya Zohar, Dotan Asselmann
Video Transformer Network
2021
[447]
Anh Nguyen, Jason Yosinski, Jeff Clune
Multifaceted Feature Visualization: Uncovering the Different Types of Features Learned By Each Neuron in Deep Neural Networks
2016
[448]
Bao Nguyen, Binh Nguyen, Viet Anh Nguyen
Bellman Optimal Stepsize Straightening of Flow-Matching Models
2023
[449]
Alex Nichol, Prafulla Dhariwal
Improved Denoising Diffusion Probabilistic Models
2021
[450]
Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, Mark Chen
GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
2022
[451]
Michael Niemeyer, Lars Mescheder, Michael Oechsle, Andreas Geiger
Differentiable Volumetric Rendering: Learning Implicit 3D Representations without 3D Supervision
2020
[452]
Michael Niemeyer, Jonathan T. Barron, Ben Mildenhall, Andreas Geiger, Noah Snavely
RegNeRF: Regularizing Neural Radiance Fields for View Synthesis from Sparse Inputs
2022
[453]
Jim Nilsson, Tomas Akenine-Möller
Understanding SSIM
2020
[454]
Hyeonwoo Noh, Seunghoon Hong, Bohyung Han
Learning Deconvolution Network for Semantic Segmentation
2015
[455]
Michael Oechsle, Songyou Peng, Andreas Geiger
UNISURF: Unifying Neural Implicit Surfaces and Radiance Fields for Multi-View Reconstruction
2021
[456]
Seoung Wug Oh, Joon-Young Lee, Ning Xu, Seon Joo Kim, Bohyung Han
Video Object Segmentation using Space-Time Memory Networks
2019
[457]
Ozan Oktay, Jo Schlemper, Loic Le Folgoc, Matthew Lee, Mattias Heinrich, Kazunari Misawa, Kensaku Mori, Steven McDonagh, Nils Y Hammerla, Bernhard Kainz, Ben Glocker, Daniel Rueckert
Attention U-Net: Learning Where to Look for the Pancreas
2018
[458]
Aaron Oord, Nal Kalchbrenner, Koray Kavukcuoglu
Pixel Recurrent Neural Networks
2016
[459]
Aaron Oord, Yazhe Li, Oriol Vinyals
Representation Learning with Contrastive Predictive Coding
2019
[460]
Aaron Oord, Oriol Vinyals, Koray Kavukcuoglu
Neural Discrete Representation Learning
2018
[461]
Aaron Oord, Nal Kalchbrenner, Oriol Vinyals, Lasse Espeholt, Alex Graves, Koray Kavukcuoglu
Conditional Image Generation with PixelCNN Decoders
2016
[462]
OpenAI
GPT-4V(ision) System Card
2023
[463]
Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Armand Joulin, Piotr Bojanowski, Mathilde Caron, Hervé Jégou, Benjamin Lefaudeux, Julien Mairal, Patrick Labatut, Alexei Baevski, Ishan Misra, Nicolas Usunier, Hervé Jegou, Jeff Donahue, Benjamin Lefaudeux
DINOv2: Learning Robust Visual Features without Supervision
2023
[464]
Oriol Vinyals et al.
Show and Tell: A Neural Image Caption Generator
2015
[465]
Junting Pan
Actor-Context-Actor Relation Network for Spatio-Temporal Action Detection
2021
[466]
Tianyu Pan
VideoMoCo: Contrastive Video Representation Learning with Temporally Adversarial Examples
2021
[467]
Jeong Joon Park, Peter Florence, Julian Straub, Richard Newcombe, Steven Lovegrove
DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation
2019
[468]
Keunhong Park, Utkarsh Sinha, Jonathan T. Barron, Sofien Bouaziz, Dan B. Goldman, Steven M. Seitz, Ricardo Martin-Brualla
HyperNeRF: A Higher-Dimensional Representation for Topologically Varying Neural Radiance Fields
2021
[469]
Keunhong Park, Utkarsh Sinha, Peter Hedman, Jonathan T. Barron, Sofien Bouaziz, Dan B. Goldman, Steven M. Seitz, Ricardo Martin-Brualla
Nerfies: Deformable Neural Radiance Fields
2021
[470]
Taesung Park, Ming-Yu Liu, Ting-Chun Wang, Jun-Yan Zhu
Semantic Image Synthesis with Spatially-Adaptive Normalization
2019
[471]
Razvan Pascanu, Tomas Mikolov, Yoshua Bengio
On the Difficulty of Training Recurrent Neural Networks
2013
[472]
Deepak Pathak, Philipp Krähenbühl, Jeff Donahue, Trevor Darrell, Alexei A. Efros
Context Encoders: Feature Learning by Inpainting
2016
[473]
Kaushik Patnaik
ROI Pool and Align: PyTorch Implementation
2020
[474]
Viorica Pătrăucean, Lucas Smaira, Ankush Gupta, Adrià Recasens
Perception Test: A Diagnostic Benchmark for Multimodal Video Models
2023
[475]
Mandela Patrick
Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers
2021
[476]
Mandela Patrick
MFormer: Masked Transformer for Video Action Recognition
2021
[477]
Judea Pearl
Causality
2009
[478]
William Peebles, Saining Xie
Scalable Diffusion Models with Transformers
2023
[479]
Ethan Perez, Florian Strub, Harm Vries, Vincent Dumoulin, Aaron Courville
FiLM: Visual Reasoning with a General Conditioning Layer
2017
[480]
Lawrence Perko
Differential Equations and Dynamical Systems
2013
[481]
Ed Pizzi, Sreya Dutta Roy, Sugosh Nagavara Ravindra, Priya Goyal, Matthijs Douze
A Self-Supervised Descriptor for Image Copy Detection
2022
[482]
Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, Robin Rombach
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis
2023
[483]
Tomaso Poggio, Hrushikesh Mhaskar, Lorenzo Rosasco, Brando Miranda, Qianli Liao
Why and when can deep—but not shallow—networks avoid the curse of dimensionality: A review
2017
[484]
B. T. Polyak, A. B. Juditsky
Acceleration of Stochastic Approximation by Averaging
1992
[485]
Jordi Pont-Tuset, Federico Perazzi, Sergi Caelles, Pablo Arbeláez, Alexander Sorkine-Hornung, Luc Van Gool
The 2017 DAVIS Challenge on Video Object Segmentation
2017
[486]
Aram-Alexandre Pooladian, Heli Ben-Hamu, Carles Domingo-Enrich, Brandon Amos, Yaron Lipman, Ricky T. Q. Chen
Multisample Flow Matching: Straightening Flows with Minibatch Couplings
2023
[487]
Ben Poole, Alex Nichol, Heewoo Jun, Kate Huang, David M. Sussillo, Sergey Levine, Pratul P. Srinivasan, Ben Mildenhall
DreamFusion: Text-to-3D using 2D Diffusion
2022
[488]
Albert Pumarola, Enric Corona, Gerard Pons-Moll, Francesc Moreno-Noguer
D-NeRF: Neural Radiance Fields for Dynamic Scenes
2021
[489]
Charles R. Qi, Hao Su, Kaichun Mo, Leonidas J. Guibas
PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation
2017
[490]
Charles R. Qi, Li Yi, Hao Su, Leonidas J. Guibas
PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space
2017
[491]
Guocheng Qian, Yuchen Li, Houwen Peng, Jinjie Mai, Hasan A. Hammoud, Mohamed Elhoseiny, Bernard Ghanem
PointNeXt: Revisiting PointNet++ with Improved Training and Scaling Strategies
2022
[492]
Qian, Tighe, et al.
Spatiotemporal Contrastive Video Representation Learning
2021
[493]
Danfeng Qin, Chas Leichner, Manolis Delakis, Marco Fornoni, Shixin Luo, Fan Yang, Weijun Wang, Colby Banbury, Chengxi Ye, Berkin Akin, Vaibhav Aggarwal, Tenghui Zhu, Daniele Moro, Andrew Howard
MobileNetV4 – Universal Models for the Mobile Ecosystem
2024
[494]
Qwen, Yang, Yang, Zhang, Hui, Zheng, Yu, Li, Liu, Huang, Wei, Lin, Yang, Tu, Zhang, Yang, Yang, Zhou, Lin, Dang, Lu, Bao, Yang, Yu, Li, Xue, Zhang, Zhu, Men, Lin, Li, Tang, Xia, Ren, Ren, Fan, Su, Zhang, Wan, Liu, Cui, Zhang, Qiu
Qwen2.5 Technical Report
2025
[495]
Alec Radford, Luke Metz, Soumith Chintala
Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
2016
[496]
Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever
Language models are unsupervised multitask learners
2019
[497]
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever
Learning Transferable Visual Models From Natural Language Supervision
2021
[498]
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever
Learning Transferable Visual Models From Natural Language Supervision
2021
[499]
Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollár
Designing Network Design Spaces
2020
[500]
Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollár
Designing Network Design Spaces
2020
[501]
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
2020
[502]
Nasim Rahaman, Aristide Baratin, Devansh Arpit, Felix Draxler, Min Lin, Fred Hamprecht, Yoshua Bengio, Aaron Courville
On the Spectral Bias of Neural Networks
2019
[503]
Prajit Ramachandran, Barret Zoph, Quoc V Le
Searching for activation functions
2017
[504]
Prajit Ramachandran, Barret Zoph, Quoc V Le
Swish: a self-gated activation function
2017
[505]
Prajit Ramachandran, Niki Parmar, Ashish Vaswani, Irwan Bello, Anselm Levskaya, Jonathon Shlens
Stand-Alone Self-Attention in Vision Models
2019
[506]
Sameera Ramasinghe, Lachlan Macdonald, Simon Lucey
On the Frequency Bias of Coordinate-MLPs
2022
[507]
Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss Agarwal, Alec Radford, Ilya Sutskever
DALL-E: Creating Images from Text Descriptions
2021
[508]
Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, Mark Chen
Hierarchical Text-Conditional Image Generation with CLIP Latents
2022
[509]
Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever
Zero-Shot Text-to-Image Generation
2021
[510]
René Ranftl, Alexey Bochkovskiy, Vladlen Koltun
Vision Transformers for Dense Prediction
2021
[511]
René Ranftl, Katrin Lasinger, David Hafner, Konrad Schindler, Vladlen Koltun
Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero‑Shot Cross‑Dataset Transfer
2022
[512]
Yongming Rao, Zuyan Liu, Wenliang Zhao, Jie Zhou, Jiwen Lu
Dynamic Spatial Sparsification for Efficient Vision Transformers and Convolutional Neural Networks
2022
[513]
Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman Rädle, Chloe Rolland, Laura Gustafson, Eric Mintun, Junting Pan, Kalyan Vasudev Alwala, Nicolas Carion, Chao-Yuan Wu, Ross Girshick, Piotr Dollár, Christoph Feichtenhofer
SAM 2: Segment Anything in Images and Videos
2024
[514]
Ali Razavi, Aaron Oord, Oriol Vinyals
Generating Diverse High-Fidelity Images with VQ-VAE-2
2019
[515]
Albert Recasens, Paul Luc, Jean-Baptiste Alayrac, Luyu Wang, Florian Strub, Mateusz Malinowski, Michal Valko, Ivan Laptev, Josef Sivic, João Carreira
Broaden Your Views for Self-Supervised Video Learning
2021
[516]
Joseph Redmon, Ali Farhadi
YOLO9000: better, faster, stronger
2017
[517]
Joseph Redmon, Ali Farhadi
Yolov3: An incremental improvement
2018
[518]
Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi
You Only Look Once: Unified, Real-Time Object Detection
2016
[519]
Scott E. Reed, Zeynep Akata, Honglak Lee, Bernt Schiele
Generative Adversarial Text to Image Synthesis
2016
[520]
Scott E. Reed, Zeynep Akata, Honglak Lee, Bernt Schiele
Learning What and Where to Draw: Generative Adversarial What‑Where Networks
2016
[521]
Jeremy Reizenstein, Roman Shapovalov, David Novotny, Patrick Labatut, Natalia Neverova, Andrea Vedaldi
CO3D: Common Objects in 3D for Few-Shot View Synthesis
2021
[522]
Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun
Faster R-CNN
2015
[523]
Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
2016
[524]
Tianhe Ren, Shilong Liu, Ailing Zeng, Jing Lin, Kunchang Li, He Cao, Jiayu Chen, Xinyu Huang, Yukang Chen, Feng Yan, Zhaoyang Zeng, Hao Zhang, Feng Li, Jie Yang, Hongyang Li, Qing Jiang, Lei Zhang
Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks
2024
[525]
Tianhe Ren, Qing Jiang, Shilong Liu, Zhaoyang Zeng, Wenlong Liu, Han Gao, Hongjie Huang, Zhengyu Ma, Xiaoke Jiang, Yihao Chen, Yuda Xiong, Hao Zhang, Feng Li, Peijun Tang, Kent Yu, Lei Zhang
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
2024
[526]
Jerome Revaud, Jon Almazan, Rafael Sampaio Rezende, Cesar Roberto Souza
Learning with Average Precision: Training Image Retrieval with a Listwise Loss
2019
[527]
Hamid Rezatofighi, Nathan Tsoi, JunYoung Gwak, Amir Sadeghian, Ian Reid, Silvio Savarese
Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression
2019
[528]
Pierre H. Richemond, Jean-Baptiste Grill, Fabien Altché, Corentin Tallec, Florian Strub, Andrew Brock, Stephen Smith, Sumanth De, Razvan Pascanu, Bilal Piot, Michal Valko
BYOL works even without batch statistics
2020
[529]
Lawrence G. Roberts
Machine Perception of Three-Dimensional Solids
1963
[530]
Renan A. Rojas-Gomez, Karan Singhal, Ali Etemad, Alex Bijamov, Warren R. Morningstar, Philip Andrew Mansfield
SASSL: Enhancing Self-Supervised Learning via Neural Style Transfer
2024
[531]
Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer
High-Resolution Image Synthesis with Latent Diffusion Models
2022
[532]
Olaf Ronneberger, Philipp Fischer, Thomas Brox
U-Net: Convolutional Networks for Biomedical Image Segmentation
2015
[533]
Frank Rosenblatt
The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain
1958
[534]
Antoni Rosinol, Nicholas Hughes, Milad Ramezani, Samin Izadi, Abhinav Gupta, Frank Dellaert
SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM
2024
[535]
Carsten Rother, Vladimir Kolmogorov, Andrew Blake
"GrabCut": interactive foreground extraction using iterated graph cuts
2004
[536]
W. Rudin
Principles of Mathematical Analysis
1976
[537]
Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, Assaf Shocher
DreamBooth: Fine tuning text-to-image diffusion models for subject-driven generation
2022
[538]
Rumelhart, Hinton, Williams, Rumelhart, McClelland, PDP Research Group
Learning Representations by Back-Propagating Errors
1986
[539]
David Sage, Angelica Sage, Jose Solares, Philip Thomas
Logo-GAN-AE: Large-scale conditional generation from small data
2018
[540]
Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S. Sara Mahdavi, Rapha Gontijo Lopes, Tim Salimans, Jonathan Ho, David J Fleet, Mohammad Norouzi
Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
2022
[541]
Mehdi S. M. Sajjadi, Olivier Bachem, Mario Lucic, Olivier Bousquet, Sylvain Gelly
Assessing Generative Models via Precision and Recall
2018
[542]
Tim Salimans, Jonathan Ho
Progressive Distillation for Fast Sampling of Diffusion Models
2022
[543]
Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, Xi Chen
Improved Techniques for Training GANs
2016
[544]
Tim Salimans, Andrej Karpathy, Xi Chen, Diederik Kingma
PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications
2017
[545]
Unknown Author
SAM 2: Segment Anything in Images and Videos (official repository)
[546]
Sander Dieleman et al.
Rotation-Invariant Convolutional Neural Networks for Galaxy Morphology Prediction
2014
[547]
Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen
MobileNetV2: Inverted Residuals and Linear Bottlenecks
2018
[548]
Aditya Sanghi, Hang Chu, Joseph G. Lambourne, Kevin E. Moore, Kamal Rahimi Malekshan, Divyansh Aggarwal, Amir Jalal, Chin-Yi Cheng, Linjie Luo
CLIP-Mesh: Generating {T
2022
[549]
Shibani Santurkar, Dimitris Tsipras, Andrew Ilyas, Aleksander Madry
How Does Batch Normalization Help Optimization?
2018
[550]
Shibani Santurkar, Dimitris Tsipras, Andrew Ilyas, Aleksander Madry
How Does Batch Normalization Help Optimization?
2019
[551]
Axel Sauer, Katja Schwarz, Andreas Geiger
StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets
2022
[552]
Andrew M. Saxe, James L. McClelland, Surya Ganguli
Exact Solutions to the Nonlinear Dynamics of Learning in Deep Linear Neural Networks
2014
[553]
Johannes L. Schonberger, Jan-Michael Frahm
Structure-From-Motion Revisited
2016
[554]
Florian Schroff, Dmitry Kalenichenko, James Philbin
FaceNet: A unified embedding for face recognition and clustering
2015
[555]
Christoph Schuhmann, Richard Vencu, Romain Beaumont, Robert Kaczmarczyk, Clayton Mullis, Aarush Katta, Theo Coombes, Jenia Jitsev, Aran Komatsuzaki
LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs
2021
[556]
Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Shukla, Andrew Kurtz, Julia Kreutzer, Holger Schwenk, Thomas Deselaers, Wojciech Galuba, Andrew Brock
LAION-5B: An open large-scale dataset for training next generation image-text models
2022
[557]
Towards Data Science
A Basic Introduction to Separable Convolutions
2023
[558]
Steven M. Seitz, Brian Curless, James Diebel, Daniel Scharstein, Richard Szeliski
A Comparison and Evaluation of Multi‐View Stereo Reconstruction Algorithms
2006
[559]
Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, Dhruv Batra
Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization
2017
[560]
Zhenwei Shao, Zhou Yu, Meng Wang, Jun Yu
Prompting Large Language Models with Answer Heuristics for Knowledge-based VQA
2023
[561]
Peter Shaw, Jakob Uszkoreit, Ashish Vaswani
Self-attention with relative position representations
2018
[562]
Noam Shazeer
Fast Transformer Decoding: One Write-Head is All You Need
2019
[563]
Jianbo Shi, Jitendra Malik
Normalized cuts and image segmentation
2000
[564]
Wenzhe Shi, Jose Caballero, Ferenc Huszár, Johannes Totz, Andrew P. Aitken, Rob Bishop, Daniel Rueckert, Zehan Wang
Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network
2016
[565]
Daeyun Shin, Charless C. Fowlkes, Derek Hoiem
Pixels, voxels, and views: A study of shape representations for single view 3D object shape prediction
2018
[566]
Yao Shu, Zhongxiang Dai, Zhaoxuan Wu, Bryan Kian Hsiang Low
Unifying and Boosting Gradient-Based Training-Free Neural Architecture Search
2022
[567]
Tim Shuttleworth, Ananya Vyas, Suriya Gunasekar, Pratik Chaudhari
An Illusion of Equivalence: Revisiting Low-Rank Adaptation and Full Fine-Tuning
2024
[568]
K. Simek
Understanding Camera Calibration and Intrinsics
2013
[569]
Oriane Simeoni, Huy V. Vo, Maximilian Seitzer, Federico Baldassarre, Maxime Oquab, Cijo Jose, Vasil Khalidov, Marc Szafraniec, Seungeun Yi, Michaël Ramamonjisoa, Francisco Massa, Daniel Haziza, Luca Wehrstedt, Jianyuan Wang, Timothée Darcet, Théo Moutakanni, Leonel Sentana, Claire Roberts, Andrea Vedaldi, Jamie Tolan, John Brandt, Camille Couprie, Julien Mairal, Hervé Jégou, Patrick Labatut, Piotr Bojanowski
DINOv3
2025
[570]
Karen Simonyan, Andrea Vedaldi, Andrew Zisserman
Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps
2014
[571]
Karen Simonyan, Andrew Zisserman
Two-Stream Convolutional Networks for Action Recognition in Videos
2014
[572]
Karen Simonyan, Andrew Zisserman
Very Deep Convolutional Networks for Large-Scale Image Recognition
2015
[573]
Vincent Sitzmann, Michael Zollhöfer, Gordon Wetzstein
Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations
2019
[574]
Edward J. Smith, Scott Fujimoto, Adriana Romero, David Meger
GEOMetrics: Exploiting Geometric Structure for Graph-Encoded Objects
2019
[575]
Leslie N. Smith
Cyclical Learning Rates for Training Neural Networks
2017
[576]
Leslie N. Smith, Nicholay Topin
Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates
2018
[577]
Noah Snavely, Steven M. Seitz, Richard Szeliski
Modeling the World from Internet
2008
[578]
Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, Surya Ganguli
Deep Unsupervised Learning using Nonequilibrium Thermodynamics
2015
[579]
Pavithra Solai
Convolutions and Backpropagation
2023
[580]
Jiaming Song, Chenlin Meng, Stefano Ermon
Denoising Diffusion Implicit Models
2021
[581]
Yang Song, Sahaj Garg, Jascha Sohl-Dickstein, Diederik P. Kingma, Stefano Ermon
Consistency Models
2023
[582]
Yang Song, Stefano Ermon, Jonathan Ho, Tim Salimans, Mohammad Norouzi
From Signal to Noise: Understanding Diffusion Models
2022
[583]
Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, Ben Poole
Score-Based Generative Modeling through Stochastic Differential Equations
2021
[584]
Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, Martin Riedmiller
Striving for Simplicity: The All Convolutional Net
2015
[585]
Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, Ruslan Salakhutdinov
Dropout: a simple way to prevent neural networks from overfitting
2014
[586]
Rupesh Kumar Srivastava, Klaus Greff, Jürgen Schmidhuber
Training Very Deep Networks
2015
[587]
Stability AI
Stable Diffusion Image Variations (official release)
2022
[588]
Stability AI
Stable Diffusion unCLIP
2022
[589]
Andreas Steiner, Alexander Kolesnikov, Xiaohua Zhai, Ross Wightman, Jakob Uszkoreit, Lucas Beyer
How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers
2022
[590]
Jane Street
L2 Regularization and Batch Normalization: How They Interact
2020
[591]
Cheng Sun, Min Sun, Hwann-Tzong Chen
Direct Voxel Grid Optimization: Super-fast Convergence for Radiance Fields Reconstruction
2022
[592]
Cheng Sun, Min Sun, Hwann-Tzong Chen
Improved Direct Voxel Grid Optimization for Radiance Fields Reconstruction
2022
[593]
Xingyuan Sun, Jiajun Wu, Xiuming Zhang, Zhoutong Zhang, Chengkai Zhang, Tianfan Xue, Joshua B. Tenenbaum, William T. Freeman
Pix3D: Dataset and Methods for Single-Image 3D Shape Modeling
2018
[594]
Zhongyi Sun, Mengmeng Wang, Xinyu Gong, Yong Liu
NMS Strikes Back: Suppressing Overconfident Incorrect Queries in DETR
2023
[595]
Ilya Sutskever, Oriol Vinyals, Quoc V Le
Sequence to sequence learning with neural networks
2014
[596]
Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich
Going deeper with convolutions
2015
[597]
Yaniv Taigman, Ming Yang, Marc'Aurelio Ranzato, Lior Wolf
DeepFace: Closing the Gap to Human-Level Performance in Face Verification
2014
[598]
Hao Tan
VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning
2021
[599]
Mingxing Tan, Quoc Le, Marina Meila, Tong Zhang
EfficientNetV2: Smaller Models and Faster Training
2021
[600]
Mingxing Tan, Quoc V. Le
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
2019
[601]
Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, Quoc V. Le
MnasNet: Platform-Aware Neural Architecture Search for Mobile
2019
[602]
Matthew Tancik, Vincent Casser, Xinchen Yan, Sabeek Pradhan, Ben Mildenhall, Pratul P. Srinivasan, Jonathan T. Barron, Henrik Kretzschmar
Block-NeRF: Scalable Large Scene Neural View Synthesis
2022
[603]
Matthew Tancik, Pratul P. Srinivasan, Ben Mildenhall, Sara Fridovich-Keil, Nithin Raghavan, Utkarsh Singhal, Ravi Ramamoorthi, Jonathan T. Barron, Ren Ng
Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains
2020
[604]
Matthew Tancik, Pratul P. Srinivasan, Ben Mildenhall, Sara Fridovich-Keil, Nithin Raghavan, Utkarsh Singhal, Ravi Ramamoorthi, Jonathan T. Barron, Ren Ng
Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains
2020
[605]
Matthew Tancik, Ben Mildenhall, Thomas Müller, Pratul P. Srinivasan
Nerfacto: A Fast Hash-Grid NeRF Baseline
2023
[606]
Jiaxiang Tang, Tianxiang Shen, Yilun Chen, Jiajun Wu, Joshua B. Tenenbaum, Zhoutong Xu
DreamFields: Physically Plausible Text-to-3D Generation with Radiance Fields
2023
[607]
Jiaxiang Tang, Chuanxia Liu, Shihao Chen, Wentao Zhu, Yin Wang, Yujun Wang, Dahua Lin, Jingxiang Zhang, Ziwei Liu
DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation
2023
[608]
Ming Tao, Hao Tang, Fei Wu, Xiao-Yuan Jing, Bing-Kun Bao, Changsheng Xu
DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis
2022
[609]
Maxim Tatarchenko, Alexey Dosovitskiy, Thomas Brox
Octree generating networks: Efficient convolutional architectures for high-resolution 3d outputs
2017
[610]
Maxim Tatarchenko, Alexey Dosovitskiy, Thomas Brox
Single-View to Multi-View: Reconstructing Unseen Views with a Convolutional Network
2015
[611]
Maxim Tatarchenko, Stephan R. Richter, René Ranftl, Zhuwen Li, Vladlen Koltun, Thomas Brox
What Do Single-view 3D Reconstruction Networks Learn?
2019
[612]
DeepFloyd Team
DeepFloyd IF: A Cascaded Diffusion Model for Text-to-Image Synthesis
2023
[613]
TensorFlow Team
Higher Accuracy on Vision Models with EfficientNet-Lite
2020
[614]
Mikhail Telgarsky
Benefits of depth in neural networks
2016
[615]
Keyu Tian, Yi Jiang, Zehuan Yuan, Bingyue Peng, Liwei Wang
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
2024
[616]
Yuandong Tian, Xinlei Chen, Surya Ganguli
Understanding self-supervised learning dynamics without contrastive pairs
2021
[617]
Zhi Tian, Chunhua Shen, Hao Chen, Tong He
FCOS: Fully Convolutional One-Stage Object Detection
2019
[618]
Pavel Tokmakov, Jie Li, Adrien Gaidon
Breaking the "Object" in Video Object Segmentation
2022
[619]
Ilya Tolstikhin, Neil Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Thomas Unterthiner, Jessica Yung, Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy
MLP-Mixer: An all-MLP Architecture for Vision
2021
[620]
Nenad Tomasev, Ioana Bica, Brian McWilliams, Lars Buesing, Razvan Pascanu, Charles Blundell, Jovana Mitrovic
Pushing the limits of self-supervised ResNets: Can we outperform supervised learning without labels on ImageNet?
2022
[621]
Alexander Tong, Kevin Lin, Hongyuan Zha, Michael Mahoney, Rose Yu
TrajectoryNet: Learning Continuous Dynamics for Optimal Transport
2020
[622]
Zhan Tong, Yibing Song, Jue Wang, Limin Wang
VideoMAE
2022
[623]
Alexander Toshev, Christian Szegedy
DeepPose: Human Pose Estimation via Deep Neural Networks
2014
[624]
H. Touvron, M. Cord, M. Douze, F. Massa, P. Huang
Training Data-efficient Image Transformers & Distillation through Attention
2021
[625]
Hugo Touvron, Matthieu Cord, Hervé Jégou
DeiT III: Revenge of the ViT
2022
[626]
Hugo Touvron, Andrea Vedaldi, Matthijs Douze, Hervé Jégou
Fixing the train-test resolution discrepancy
2022
[627]
Hugo Touvron, Matthieu Cord, Alexandre Sablayrolles, Gabriel Synnaeve, Hervé Jégou
Going deeper with Image Transformers
2021
[628]
Florian Tramèr, Nicholas Carlini, Wieland Brendel, Aleksander Madry
On Adaptive Attacks to Adversarial Example Defenses
2020
[629]
Du Tran
Video Classification with Channel-Separated Convolutional Networks
2019
[630]
Du Tran, Heng Wang, Lorenzo Torresani, Jamie Ray, Yann LeCun, Manohar Paluri
A Closer Look at Spatiotemporal Convolutions for Action Recognition
2018
[631]
Du Tran, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, Manohar Paluri
Learning Spatiotemporal Features with 3D Convolutional Networks
2015
[632]
Alex Trevithick, Bo Yang
GRF: Learning a General Radiance Field for 3D Representation and Rendering
2021
[633]
Prune Truong, Marie-Julie Rakotosaona, Fabian Manhardt, Federico Tombari
SPARF: Neural Radiance Fields from Sparse and Noisy Poses
2023
[634]
Sh-Tsang
Review: Group Normalization (GN) for Image Classification
2018
[635]
Sik-Ho Tsang
Review — BYOL: Bootstrap Your Own Latent - A New Approach to Self-Supervised Learning
[636]
Michael Tschannen, Alexey Gritsenko, Xiao Wang, Muhammad Ferjad Naeem, Ibrahim Alabdulmohsin, Nikhil Parthasarathy, Talfan Evans, Lucas Beyer, Ye Xia, Basil Mustafa, Olivier Hénaff, Jeremiah Harmsen, Andreas Steiner, Xiaohua Zhai
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features
2025
[637]
Vishaal Udandarao, Nikhil Parthasarathy, Muhammad Ferjad Naeem, Talfan Evans, Samuel Albanie, Federico Tombari, Yongqin Xian, Alessio Tonioni, Olivier J. Hénaff
Active Data Curation Effectively Distills Large-Scale Multimodal Models
2025
[638]
J. R. R. Uijlings, K. E. A. Sande, T. Gevers, A. W. M. Smeulders
Selective Search for Object Recognition
2013
[639]
Ultralytics
Data Augmentation using Ultralytics YOLO
2025
[640]
Dmitry Ulyanov, Andrea Vedaldi, Victor Lempitsky
Instance Normalization: The Missing Ingredient for Fast Stylization
2017
[641]
Mojtaba Valipour, Mehdi Rezagholizadeh, Ivan Kobyzev, Ali Ghodsi
DyLoRA: Parameter-Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low-Rank Adaptation
2022
[642]
VantAI
Flow Matching and Generative Modeling: Deep Dive and Code Walkthrough
2022
[643]
VantAI
Training Flows with the Continuity Equation
2023
[644]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin
Attention is all you need
2017
[645]
Andreas Veit, Michael Wilber, Serge Belongie
Residual Networks Behave Like Ensembles of Relatively Shallow Networks
2016
[646]
C. Villani
Optimal Transport: Old and New
2008
[647]
Pascal Vincent
A connection between score matching and denoising autoencoders
2011
[648]
Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan
Show and tell: A neural image caption generator
2015
[649]
Paul Viola, Michael Jones
Rapid object detection using a boosted cascade of simple features
2001
[650]
Elena Voita, Jean Talbot, Fedor Moiseev, Rico Sennrich, Ivan Titov
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
2019
[651]
Catherine Wah, Steve Branson, Peter Welinder, Pietro Perona, Serge Belongie
The Caltech-UCSD Birds-200-2011 Dataset
2011
[652]
Bo Wan, Michael Tschannen, Yongqin Xian, Filip Pavetic, Ibrahim Alabdulmohsin, Xiao Wang, André Susano Pinto, Andreas Steiner, Lucas Beyer, Xiaohua Zhai
LocCa: Visual Pretraining with Location-aware Captioners
2024
[653]
Li Wan, Matthew Zeiler, Sixin Zhang, Yann LeCun, Rob Fergus, Sanjoy Dasgupta, David McAllester
Regularization of Neural Networks using DropConnect
2013
[654]
Boyang Wang, Peter Wonka, Hancheng Ge, Ravi Garg, Vishal M. Patel
IBRNet: Learning Multi-View Image-Based Rendering
2021
[655]
Feng Wang, Weiyang Liu, Haijun Liu, Jian Cheng
Additive Margin Softmax for Face Verification
2018
[656]
Hao Wang, Yitong Wang, Zheng Zhou, Xing Ji, Dihong Gong, Jingchao Zhou, Zhifeng Li, Wei Liu
CosFace: Large Margin Cosine Loss for Deep Face Recognition
2018
[657]
Heng Wang, Du Tran, Lorenzo Torresani, Matt Feiszli
Video Modeling with Correlation Networks
2019
[658]
Jianfeng Wang
All-in-One: Exploring Unified Video-Language Pre-Training
2022
[659]
Jianfeng Wang, Zhengyuan Yang, Xiaowei Hu, Linjie Li, Kevin Lin
GIT: A Generative Image-to-Text Transformer for Vision and Language
2022
[660]
Jiaqi Wang, Pan Zhang, Tao Chu, Yuhang Cao, Yujie Zhou, Tong Wu, Bin Wang, Conghui He, Dahua Lin
V3Det: Vast Vocabulary Visual Detection Dataset
2023
[661]
Jinghao Wang
BEVT: BERT Pretraining of Video Transformers
2022
[662]
Junke Wang, Dongdong Chen, Zuxuan Wu, Chong Luo, Luowei Zhou, Yucheng Zhao, Yujia Xie, Ce Liu, Yu-Gang Jiang, Lu Yuan
OmniVL: One Foundation Model for Image-Language and Video-Language Tasks
2022
[663]
Limin Wang
TDN: Temporal Difference Networks for Efficient Video Understanding
2021
[664]
Limin Wang, Bingkun Huang, Zhiyu Zhao, Zhan Tong, Yinan He, Yi Wang, Yali Wang, Yu Qiao
VideoMAE
2023
[665]
Nanyang Wang, Yinda Zhang, Zhuwen Li, Yanwei Fu, Wei Liu, Yu-Gang Jiang
Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images
2018
[666]
Peng Wang, Lingzhe Zhao, Ruijie Ma, Peidong Liu
BAD-NeRF: Bundle Adjusted Deblur Neural Radiance Fields
2023
[667]
Peng Wang, Lingjie Liu, Yuan Liu, Christian Theobalt, Taku Komura, Wenping Wang
NeuS: Learning neural implicit surfaces by volume rendering for multi-view reconstruction
2021
[668]
Peng Wang, Shuai Bai, Sinan Tan, Shijie Wang, Zhihao Fan, Jinze Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Yang Fan, Kai Dang, Mengfei Du, Xuancheng Ren, Rui Men, Dayiheng Liu, Chang Zhou, Jingren Zhou
Qwen2-VL: Enhancing Vision-Language Model’s Perception of the World at Any Resolution
2024
[669]
Rui Wang, Dongdong Chen, Zuxuan Wu, Yinpeng Chen, Xiyang Dai, Mengchen Liu, Lu Yuan, Yu-Gang Jiang
Masked Video Distillation: Rethinking Masked Feature Modeling for Self-Supervised Video Representation Learning
2023
[670]
Weiyao Wang, Matt Feiszli, Heng Wang, Du Tran
Unidentified Video Objects: A Benchmark for Dense, Open-World Segmentation
2021
[671]
Wenhao Wang, Zihang Lai, Peng Gao, Shuzhe Wang, Ke Li, Hongsheng Li, Yu Qiao
InternV
2022
[672]
Wenhui Wang, Hangbo Bao, Li Dong, Johan Bjorck, Zhiliang Peng, Qiang Liu, Kriti Aggarwal, Owais Khan Mohammed, Saksham Singhal, Subhojit Som, Furu Wei
Image as a Foreign Language: BEiT Pretraining for Vision and Vision-Language Tasks
2023
[673]
Xiao Wang, Guo‑Jun Qi
Contrastive Learning with Stronger Augmentations
2021
[674]
Xiaolong Wang, Abhinav Gupta
Videos as Space-Time Region Graphs
2018
[675]
Xiaolong Wang, Ross Girshick, Abhinav Gupta, Kaiming He
Non-local Neural Networks
2018
[676]
Xiaolong Wang, Ross Girshick, Abhinav Gupta, Kaiming He
Non-local Neural Networks
2018
[677]
Yi Wang, Yinan He, Yizhuo Li, Kunchang Li, Jiashuo Yu, Xin Ma, Xinhao Li, Guo Chen, Xinyuan Chen, Yaohui Wang, Conghui He, Ping Luo, Ziwei Liu, Yali Wang, Limin Wang, Yu Qiao
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation
2024
[678]
Yi Wang, Kunchang Li, Xinhao Li, Jiashuo Yu, Yinan He, Guo Chen, Baoqi Pei, Rongkun Zheng, Jilan Xu, Zun Wang, Yansong Shi, Tianxiang Jiang, Songze Li, Hongjie Zhang, Yifei Huang, Yu Qiao, Yali Wang, Limin Wang
InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding
2024
[679]
Yi Wang, Xinhao Li, Ziang Yan, Yinan He, Jiashuo Yu, Xiangyu Zeng, Chenting Wang, Changlian Ma, Haian Huang, Jianfei Gao, Min Dou, Kai Chen, Wenhai Wang, Yu Qiao, Yali Wang, Limin Wang
InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling
2025
[680]
Yifan Wang, Wenqi Sun, Sai Bi, Zexiang Xu, Hao Su, Ravi Ramamoorthi, Zhan Xu
GS-IR: 3D Gaussian Splatting for Inverse Rendering
2024
[681]
Yiming Wang
NeuS2: Fast Learning of Neural Implicit Surfaces for Multi-view Reconstruction
2023
[682]
Yiqun Wang
HF-NeuS: Improved Surface Reconstruction Using High-Frequency Details Robust to Noise
2022
[683]
Yiqun Wang, Ivan Skorokhodov, Peter Wonka
PET-NeuS: Positional Encoding Tri-Planes for Neural Surfaces
2023
[684]
Yue Wang, Yongbin Sun, Ziwei Liu, Sanjay E. Sarma, Michael M. Bronstein, Justin M. Solomon
Dynamic Graph CNN for Learning on Point Clouds
2019
[685]
Zhizhong Wang, Lei Zhao, Wei Xing
StyleDiffusion: Controllable Disentangled Style Transfer via Diffusion Models
2023
[686]
Zirui Wang, Jiahui Yu, Adams Wei Yu, Zihang Dai, Yulia Tsvetkov, Yuan Cao
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
2021
[687]
Chen Wei, Haoqi Fan, Saining Xie, Caiming Wu
Masked Feature Prediction for Self-Supervised Visual Pre-Training
2022
[688]
Li‑Yi Wei, Marc Levoy
Fast Texture Synthesis using Tree‑structured Vector Quantization
2000
[689]
Chao Wen, Yinda Zhang, Zhuwen Li, Yanwei Fu
Pixel2Mesh++: Multi-View 3D Mesh Generation via Deformation
2019
[690]
Yuetian Weng, Mingfei Han, Haoyu He, Xiaojun Chang, Bohan Zhuang
LongVLM: Efficient Long Video Understanding via Large Language Models
2024
[691]
Wikipedia contributors
Aliasing - Wikipedia
2004
[692]
Wikipedia contributors
Sine and cosine - Wikipedia
2024
[693]
Ronald J. Williams
Simple statistical gradient-following algorithms for connectionist reinforcement learning
1992
[694]
Ronald J. Williams, Jing Peng
An Efficient Gradient-Based Algorithm for On-Line Training of Recurrent Network Trajectories
1990
[695]
Samuel Williams, Andrew Waterman, David Patterson
Roofline: an insightful visual performance model for multicore architectures
2009
[696]
Sarah Wolf
ProGAN – How NVIDIA Generated Images of Unprecedented Quality
2019
[697]
Sanghyun Woo, Jongchan Park, Joon-Young Lee, In So Kweon
CBAM: Convolutional Block Attention Module
2018
[698]
Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie
ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders
2023
[699]
Mitchell Wortsman, Gabriel Ilharco, Jong Wook Kim, Mike Li, Simon Kornblith, Rebecca Roelofs, Raphael Gontijo-Lopes, Hannaneh Hajishirzi, Ali Farhadi, Hongseok Namkoong, Ludwig Schmidt
Robust fine-tuning of zero-shot models
2022
[700]
Chao-Yuan Wu, Ross Girshick, Kaiming He, Christoph Feichtenhofer
MeMViT: Memory-Augmented Multiscale Vision Transformers for Efficient Long-Term Video Recognition
2022
[701]
Chao-Yuan Wu
Multi-Scale Feature Aggregation for Spatio-Temporal Action Detection
2020
[702]
Xiaoyang Wu, Yixing Lao, Li Jiang, Xihui Liu, Hengshuang Zhao
Point Transformer V2: Grouped Vector Attention and Partition-based Pooling
2022
[703]
Zhen Wu, Sida Peng, Yunzhi Lin, Yikai Liu, Haotian Lin, Yiyi Liao, Xiaowei Zhou
4D Gaussian Splatting for Real-Time Dynamic Scene Rendering
2023
[704]
Zheng Wu
AIA++: Advanced Interaction Aggregation for Spatio-Temporal Action Detection
2022
[705]
Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, Jianxiong Xiao
3D ShapeNets: A Deep Representation for Volumetric Shapes
2015
[706]
Xiaoxiao Guo et al.
Deep Reinforcement Learning for Playing Atari Games
2014
[707]
Junyuan Xie, Ross Girshick, Ali Farhadi
Unsupervised Deep Embedding for Clustering Analysis
2019
[708]
Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, Kaiming He
Aggregated Residual Transformations for Deep Neural Networks
2017
[709]
Huijuan Xu, Kate Saenko
Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering
2016
[710]
Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard S. Zemel, Yoshua Bengio
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
2015
[711]
Mengmeng Xu
Boundary-Sensitive Pre-Training for Temporal Localization in Videos
2021
[712]
Mingze Xu
G-TAD: Sub-Graph Localization for Temporal Action Detection
2020
[713]
Ning Xu, Linjie Yang, Yuchen Fan, Jianchao Yang, Dingcheng Yue, Yuchen Liang, Brian Price, Scott Cohen, Thomas S. Huang
YouTube-VOS: A Large-Scale Video Object Segmentation Benchmark
2018
[714]
Qiangeng Xu, Zexiang Xu, Julien Philip, Sai Bi, Zhixin Shu, Kalyan Sunkavalli, Ulrich Neumann
Point-NeRF: Point-based Neural Radiance Fields
2022
[715]
Tao Xu, Pengchuan Zhang, Qiuyuan Huang, Han Zhang, Zhe Gan, Xiaolei Huang, Xiaodong He
AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks
2018
[716]
Xingqian Xu, Jiayi Guo, Zhangyang Wang, Gao Huang, Irfan Essa, Humphrey Shi
Prompt-Free Diffusion: Taking "Text" out of Text-to-Image Diffusion Models
2023
[717]
Xingqian Xu, Zhangyang Wang, Eric Zhang, Kai Wang, Humphrey Shi
Versatile Diffusion: Text, Images and Variations All in One Diffusion Model
2024
[718]
Youssef Xu, Alain Durmus, Youssef Mroueh
Sliced Wasserstein Generative Models
2018
[719]
Zhiliang Xu, Hang Zhou, Xintao Wang, Lianli Gao, Jingkuan Song, Ke Li
LanguageBind: Extending Video-Language Pretraining to Multiple Modalities
2023
[720]
Seung-Hwan Yan, Saining Xie, Kaiming He
Multiview Transformers for Video Recognition
2022
[721]
Shen Yan, Tao Zhu, Zirui Wang, Yuan Cao, Mi Zhang, Soham Ghosh, Yonghui Wu, Jiahui Yu
VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners
2023
[722]
Antoine Yang, Antoine Miech, Josef Sivic, Ivan Laptev, Cordelia Schmid
Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
2022
[723]
Cheng-Yen Yang, Hsiang-Wei Huang, Wenhao Chai, Zhongyu Jiang, Jenq-Neng Hwang
SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory
2024
[724]
Ling Yang, Zixiang Zhang, Zhilong Zhang, Xingchao Liu, Minkai Xu, Wentao Zhang, Chenlin Meng, Stefano Ermon, Bin Cui
Consistency Flow Matching: Defining Straight Flows with Velocity Consistency
2024
[725]
Linjie Yang, Yuchen Fan, Ning Xu
Video Instance Segmentation
2019
[726]
Lewei Yao, Jianhua Han, Youpeng Wen, Xiaodan Liang, Dan Xu, Wei Zhang, Zhenguo Li, Chunjing Xu, Hang Xu
DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection
2022
[727]
Lewei Yao, Jianhua Han, Xiaodan Liang, Dan Xu, Wei Zhang, Zhenguo Li, Hang Xu
DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-Training via Word-Region Alignment
2023
[728]
Lewei Yao, Runhui Huang, Lu Hou, Guansong Lu, Minzhe Niu, Hang Xu, Xiaodan Liang, Zhenguo Li, Xin Jiang, Chunjing Xu
FILIP: Fine-grained Interactive Language-Image Pre-Training
2022
[729]
Zhiyuan Yao, Dengxin Zhao, Chang Xu
Improving parallel decoding for text generation with speculation
2022
[730]
Zhuyu Yao, Jiangbo Ai, Boxun Li, Chi Zhang
Efficient DETR: Improving End-to-End Object Detector with Dense Prior
2021
[731]
Lior Yariv, Yoni Kasten, Dror Moran, Meirav Galun, Matan Atzmon, Ronen Basri, Yaron Lipman
Multiview Neural Surface Reconstruction by Disentangling Geometry and Appearance
2020
[732]
Lior Yariv, Jiatao Gu, Yoni Kasten, Yaron Lipman
Volume Rendering of Neural Implicit Surfaces
2021
[733]
Hu Ye, Jun Zhang, Sibo Liu, Xiao Han, Wei Yang
IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models
2023
[734]
Qinghao Ye, Haiyang Xu, Guohai Xu, Jiabo Ye, Ming Yan, Yiyang Zhou, Junyang Wang, Anwen Hu, Pengcheng Shi, Yaya Shi, Chenliang Li, Yuanhong Xu, Hehong Chen, Junfeng Tian, Qi Qian, Ji Zhang, Fei Huang, Jingren Zhou
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
2024
[735]
Okan Yenigün
Overfitting vs. Underfitting
[736]
Xin Yi, Ekta Walia, Paul Babyn
Generative Adversarial Network in Medical Imaging: A Review
2019
[737]
Jason Yosinski, Jeff Clune, Anh Nguyen, Thomas Fuchs, Hod Lipson
Understanding Neural Networks Through Deep Visualization
2015
[738]
Yang You, Igor Gitman, Boris Ginsburg
Large batch training of convolutional networks
2017
[739]
Alex Yu, Vickie Ye, Matthew Tancik, Angjoo Kanazawa
pixelNeRF: Neural Radiance Fields from One or Few Images
2020
[740]
Alex Yu, Ruilong Li, Matthew Tancik, Hao Li, Ren Ng, Angjoo Kanazawa
PlenOctrees for Real-Time Rendering of Neural Radiance Fields
2021
[741]
Jiahui Yu
CoCa: Contrastive Captioners are Image-Text Foundation Models
2022
[742]
Jiahui Yu, Yuanzhong Xu, Jing Yu Koh, Thang Luong, Gunjan Baid, Zirui Wang, Vijay Vasudevan, Alexander Ku, Yinfei Yang, Burcu Karagol Ayan, Ben Hutchinson, Wei Han, Zarana Parekh, Xin Li, Han Zhang, Jason Baldridge, Yonghui Wu
Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
2022
[743]
Jiahui Yu, Xin Li, Jing Yu Koh, Han Zhang, Ruoming Pang, James Qin, Alexander Ku, Yuanzhong Xu, Jason Baldridge, Yonghui Wu
Vector-quantized Image Modeling with Improved VQGAN
2022
[744]
Meng Yu
GSDF: 3DGS Meets SDF for Improved Neural Rendering
2024
[745]
Yuan et al.
Florence: A New Foundation Model for Computer Vision
2021
[746]
Zijian Yue
V4D: 4D Convolutional Neural Networks for Video
2020
[747]
Sangdoo Yun, Dongyoon Han, Seong Joon Oh, Sanghyuk Chun, Junsuk Choe, Youngjoon Yoo
CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features
2019
[748]
Sergey Zagoruyko, Nikos Komodakis
DiracNets: Training Very Deep Neural Networks Without Skip-Connections
2017
[749]
Kevin Zakka
Understanding Batch Normalization Backpropagation
2016
[750]
Yuchen Zang, Mingyu Ding, Nanxuan Wang, Song-Hai Zhang, Yali Liu, Yu Wang, Yu Qiao
Open-V
2022
[751]
Jure Zbontar, Li Jing, Ishan Misra
Barlow Twins: Self-Supervised Learning via Redundancy Reduction
2021
[752]
Matthew D. Zeiler, Rob Fergus
Visualizing and Understanding Convolutional Networks
2014
[753]
Rowan Zellers
MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound
2022
[754]
Rowan Zellers
MERLOT: Multimodal Neural Script Knowledge Models
2021
[755]
Runhao Zeng
DCAN: Dual Context Aggregation Network for Temporal Action Detection
2021
[756]
Xiang Zhai, Aravind Srinivas, Stella X. Yu
Large-Scale Evaluation of Self-Supervised
2019
[757]
Xiaohua Zhai, Alexander Wu, Yufei Yu, Hu Hu, Tsung-Yi Lin, Saining Xie, Alexander Kolesnikov, Lucas Beyer
LiT: Zero-Shot Transfer with Locked-Image Text Tuning
2022
[758]
Xiaohua Zhai, Basil Mustafa, Alexander Kolesnikov, Lucas Beyer
Sigmoid Loss for Language Image Pre-Training
2023
[759]
Boqiang Zhang, Kehan Li, Zesen Cheng, Zhiqiang Hu, Yuqian Yuan, Guanzheng Chen, Sicong Leng, Yuming Jiang, Hang Zhang, Xin Li, Peng Jin, Wenqi Zhang, Fan Wang, Lidong Bing, Deli Zhao
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding
2025
[760]
Chen-Lin Zhang
ActionFormer: Localizing Moments of Actions with Transformers
2022
[761]
Chen-Lin Zhang
TALLFormer: Temporal Action Localization with Long-range Transformer
2022
[762]
Zhang et al.
AdaLoRA: Adaptive Low-Rank Adaptation via Gradient-Based Rank Selection
2022
[763]
Han Zhang, Ian Goodfellow, Dimitris Metaxas, Augustus Odena
Self-Attention Generative Adversarial Networks
2019
[764]
Han Zhang, Ian Goodfellow, Dimitris Metaxas, Augustus Odena, Kamalika Chaudhuri, Ruslan Salakhutdinov
Self-Attention Generative Adversarial Networks
2019
[765]
Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Dimitris Metaxas
StackGAN: Text to Photo‑Realistic Image Synthesis with Stacked Generative Adversarial Networks
2017
[766]
Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, Dimitris Metaxas
StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks
2018
[767]
Hang Zhang, Xin Li, Lidong Bing
Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
2023
[768]
Hang Zhang, Chongruo Wu, Zhongyue Zhang, Yi Zhu, Haibin Lin, Zhi Zhang, Yue Sun, Tong He, Jonas Mueller, R. Manmatha, Mu Li, Alexander Smola
ResNeSt: Split-Attention Networks
2020
[769]
Haotian Zhang, Pengchuan Zhang, Xiaowei Hu, Yen-Chun Chen, Liunian Harold Li, Xiyang Dai, Lijuan Wang, Lu Yuan, Jenq-Neng Hwang, Jianfeng Gao
GLIPv2: Unifying Localization and Vision-Language Understanding
2022
[770]
Hongyi Zhang, Yann N. Dauphin, Tengyu Ma
Fixup Initialization: Residual Learning Without Normalization
2019
[771]
Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, David Lopez-Paz
mixup: Beyond Empirical Risk Minimization
2018
[772]
Kai Zhang, Gernot Riegler, Noah Snavely, Vladlen Koltun
NeRF++: Analyzing and Improving Neural Radiance Fields
2020
[773]
Lvmin Zhang, Maneesh Agrawala
Adding conditional control to text-to-image diffusion models
2023
[774]
Zhang, Li, et al.
VinVL: Making Visual Representations Matter in Vision-Language Models
2021
[775]
Renrui Zhang, Jiaming Han, Aojun Zhou, Xiangfei Hu
LLaMA-Adapter: Efficient Fine-Tuning of Language Models with Zero-Initiated Attention
2023
[776]
Richard Zhang
Making Convolutional Networks Shift-Invariant Again
2019
[777]
Richard Zhang, Phillip Isola, Alexei A. Efros
Colorful Image Colorization
2016
[778]
Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, Oliver Wang
The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
2018
[779]
Shuhan Zhang, Brian Curless, Steven M. Seitz, Qi Huang
Revisiting Photometric Consistency in Multi‐View Stereo
2015
[780]
Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, Jian Sun
ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
2018
[781]
Yiming Zhang, Yifan Wang, Wenqi Sun, Zexiang Xu, Zhan Xu, Hao Su
SuGaR: Surface-Aligned Gaussian Reconstruction for High-Fidelity Meshes
2024
[782]
Youcai Zhang, Xinyu Huang, Jinyu Ma, Zhaoyang Li, Zhaochuan Luo, Yanchun Xie, Yuzhuo Qin, Tong Luo, Yaqian Li, Shilong Liu, Yandong Guo, Lei Zhang
Recognize Anything: A Strong Image Tagging Model
2023
[783]
Yuxin Zhang, Nisha Huang, Fan Tang, Haibin Huang, Chongyang Ma, Weiming Dong, Changsheng Xu
Inversion-Based Style Transfer with Diffusion Models
2023
[784]
Zhengxu Zhang, Qingjie Liu, Yunhong Wang
Road Extraction by Deep Residual U-Net
2018
[785]
Zhengyou Zhang, Olivier Faugeras
3‐D Shape and Motion Recovery Under Varying Illumination
2001
[786]
Zhuosheng Zhang, Aston Zhang, Mu Li, Hai Zhao, George Karypis, Alex Smola
Multimodal Chain-of-Thought Reasoning in Language Models
2024
[787]
Chen Zhao
TubeR: Tubelet Transformer for Action Detection
2022
[788]
Hang Zhao, Zhicheng Yan, Lorenzo Torresani
HACS
2019
[789]
Hengshuang Zhao, Li Jiang, Jiaya Jia, Philip H. S. Torr, Vladlen Koltun
Point Transformer
2021
[790]
Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, Jiaya Jia
Pyramid Scene Parsing Network
2017
[791]
Long Zhao, Nitesh B. Gundavarapu, Liangzhe Yuan, Hao Zhou, Shen Yan, Jennifer J. Sun, Luke Friedman, Rui Qian, Tobias Weyand, Yue Zhao, Rachel Hornung, Florian Schroff, Ming-Hsuan Yang, David A. Ross, Huisheng Wang, Hartwig Adam, Mikhail Sirotenko, Ting Liu, Boqing Gong
VideoPrism: A Foundational Visual Encoder for Video Understanding
2025
[792]
Shihao Zhao, Dongdong Chen, Yen-Chun Chen, Jianmin Bao, Shaozhe Hao, Lu Yuan, Kwan-Yee K. Wong
Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models
2023
[793]
Tiancheng Zhao, Peng Liu, Xuan He, Lu Zhang, Kyusong Lee
Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head
2024
[794]
Yue Zhao, Ishan Misra, Philipp Krähenbühl, Rohit Girdhar
Learning Video Representations from Large Language Models
2022
[795]
Yiwu Zhong, Jianwei Yang, Pengchuan Zhang, Chunyuan Li, Noel Codella, Liunian Harold Li, Luowei Zhou, Xiyang Dai, Lu Yuan, Yin Li, Jianfeng Gao
RegionCLIP: Region-based Language-Image Pretraining
2021
[796]
Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, Antonio Torralba
Learning Deep Features for Discriminative Localization
2016
[797]
Bolei Zhou, Agata Lapedriza, Aditya Khosla, Aude Oliva, Antonio Torralba
Places: A 10 Million Image Database for Scene Recognition
2017
[798]
Chunting Zhou, Lili Yu, Arun Babu, Kushal Tirumala, Michihiro Yasunaga, Leonid Shamis, Jacob Kahn, Xuezhe Ma, Luke Zettlemoyer, Omer Levy
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
2024
[799]
Jinghao Zhou, Chen Wei, Huiyu Wang, Wei Shen, Cihang Xie, Alan Yuille, Tao Kong
iBOT: Image BERT Pre-Training with Online Tokenizer
2022
[800]
Kaiyang Zhou, Jingkang Yang, Chen Change Loy, Ziwei Liu
Conditional prompt learning for vision-language models
2022
[801]
Luowei Zhou, Hamid Palangi, Lei Zhang, Houdong Hu, Jason J. Corso, Jianfeng Gao
Unified Vision-Language Pre-Training for Image Captioning and VQA
2019
[802]
Zongwei Zhou, Md Mahfuzur Rahman Siddiquee, Nima Tajbakhsh, Jianming Liang
Unet++: A nested u-net architecture for medical image segmentation
2018
[803]
Deyao Zhu, Jun Chen, Xiaoqian Shen, Xiang Li, Mohamed Elhoseiny
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
2023
[804]
Feng Zhu, Feng Li, Hao Zhang, Hang Su, Jun Zhu, Lei Zhang
RE-DETR: Accelerating Deformable DETR with Reference Points
2023
[805]
Jun-Yan Zhu, Taesung Park, Phillip Isola, Alexei A. Efros
Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
2017
[806]
Minfeng Zhu, Pingbo Pan, Wei Chen, Yi Yang
DM-GAN: Dynamic Memory Generative Adversarial Networks for Text-to-Image Synthesis
2019
[807]
Wanhua Zhu
Enriching Local and Global Contexts for Temporal Action Detection
2021
[808]
Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai
Deformable DETR: Deformable Transformers for End-to-End Object Detection
2021
[809]
Yihang Zhu, Mingyang Zhu, Yuxin Chen, Yueming Ma, Jue Wang, Yuanzhi Li, Chi Zhang, Xiaoguang Li, Lu Yuan
Transfusion: Cross-modal Diffusion Models for Reference-based Image Editing and Generation
2023
[810]
Yuchen Zhu, Rodrigo Loza, Sichen Zhu, Chunting Zhou, Luyu Yu, Arka Sadhu, Karan Goel, Xuezhi Wang, Soroush Vosoughi, William Fedus, Zihang Dai, Luke Zettlemoyer, Orhan Firat
Chameleon: Mixed-Modal Early-Fusion Foundation Models
2024
[811]
C. Lawrence Zitnick, Piotr Dollár
Edge boxes: Locating object proposals from edges
2014
[812]
Barret Zoph, Quoc V. Le
Neural Architecture Search with Reinforcement Learning
2017
[813]
Barret Zoph, Vijay Vasudevan, Jonathon Shlens, Quoc V. Le
Learning Transferable Architectures for Scalable Image Recognition
2018
[814]
Xueyan Zou, Zi-Yi Dou, Jianwei Yang, Zhe Gan, Linjie Li, Chunyuan Li, Xiyang Dai, Harkirat Behl, Jianfeng Wang, Lu Yuan, Nanyun Peng, Lijuan Wang, Yong Jae Lee, Jianfeng Gao
Generalized Decoding for Pixel, Image, and Language
2023