Pritam Sarkar (প্রীতম সরকার)
Postdoc at UBC and Vector Institute
Email: pritam.sarkar@queensu.ca

Google Scholar GitHub LinkedIn

I am interested in advancing safe multimodal intelligence and in designing algorithms that require minimal human supervision. Currently, I am a postdoc at the University of British Columbia and Vector Institute, working with Leonid Sigal as part of the Computer Vision Group in the CS department. I completed my PhD in 2025 at Queen’s University, where I worked with Ali Etemad. During my PhD, I interned at Google and RBC Borealis, and was affiliated with the Vector Institute, Ingenuity Labs, and Aiim Lab. I received the First Prize in the IEEE Research Excellence Award in 2023. I frequently review for the following venues: NeurIPS, ICLR, ICML, CVPR, ICCV, ECCV, PAMI. Outside of research, I am passionate about photography and film-making, and I enjoy coffee, long walks, and playing badminton.

Open to collaboration
If you are interested in collaborating on research ideas of mutual interest, please feel free to email me.

Pro-bono activity for students

    I am dedicating a weekly time slot (45 minutes) to speak with graduate students who might lack access to strong mentorship or peer support. Whether you want to brainstorm, refine your research direction, or get an external perspective on your work, I am happy to help.

    Why am I doing this? I have been incredibly fortunate to have supportive mentors and peers throughout my career, but I also know the difficulty of working without them. This is my way of paying forward the support I have received, aiming to reach students outside my immediate network who might benefit from it.

    To express your interest, please send a brief message through this form, and I will get back to you. Before reaching out, please review my research interests to see if they (at least somewhat) align with yours.

    To make the best use of our time, let us plan the discussion in the following format:

    • 5 minutes: Quick introductions.
    • 10 minutes: You describe your research problem, current progress, and specific challenges, ideally with a few slides covering specific points.
    • 25 minutes: Brainstorming ideas and some actionable feedback.
    • 5 minutes: Wrap up and any final questions/suggestions for me.

    Please be assured that I will periodically review the submitted responses and reach out to selected candidates. However, due to limited time, I may not be able to respond to everyone. Depending on my availability, I may schedule sessions on weekends. Thank you for your understanding.

News


  • [Dec 25] I joined UBC and Vector as a postdoc.
  • [Oct 25] I am serving as an AC for WACV 2026.
  • [Sep 25] Our proposed Self-alignment with RRPO got accepted in NeurIPS 2025.
  • [Sep 25] I successfully defended my PhD thesis! Here are the slides and thesis.
  • [May 25] Introduced VCRBench, the first video-based multi-step causal reasoning benchmark.
  • [Apr 25] Introduced RRPO, a fine-grained self-alignment recipe to align Multimodal LLMs.
  • [Jan 25] DPA got accepted in ICLR 2025.
  • [Dec 23] XKD and RDDM got accepted in AAAI 2024.
  • [Nov 23] I won first prize in the IEEE Research Excellence Award (PhD).
  • [Sep 23] Our paper on Video SSL in OOD got accepted in NeurIPS 2023 as a Spotlight.
  • [Aug 23] Accepted an offer from Google to join as a Student Researcher.
  • [Nov 22] AVCAffe and CrissCross (Oral) got accepted in AAAI 2023.
  • [Oct 22] We organized the AAAI 2023 Workshop on R2HCAI.
  • [Oct 22] Honourable Mention in poster competitions at Robotics and AI Symposium 2022 and FEAS Research Symposium 2022 at Queen's University, Canada.
  • [Jun 22] Accepted an offer from Borealis AI for a fall internship as a Machine Learning Research Intern.
  • [Oct 21] Best poster award at Robotics and AI Symposium, Ingenuity Labs, 2021.
  • [Aug 21] We organized the AAAI 2022 Workshop on HC-SSL.
  • [Mar 21] I received a postgraduate affiliation award from the Vector Institute.
  • [Dec 20] Our paper CardioGAN got accepted in AAAI 2021.
  • [Aug 20] My first Transactions paper as first author got accepted in IEEE Transactions on Affective Computing.
  • [Apr 20] Successfully defended my MASc thesis.
  • [Jan 20] Conference paper on ECG-based SSL got accepted in IEEE ICASSP 2020 for oral presentation.
  • [Jun 19] My first paper got accepted for oral presentation in IEEE ACII 2019.
  • [Sep 18] Joined Queen's for a master's degree.

Research


My current research focus is multimodal AI with video, image, audio, and language.

I am broadly interested in: generative models (LLMs, multimodal LLMs, diffusion models), foundation models (video, image, vision-language, audio-visual), self-supervised learning, alignment, reasoning, AI agents, computer vision.

VCRBench: Exploring Long-form Causal Reasoning Capabilities of Large Video Language Models
P. Sarkar, A. Etemad
Preprint. Under review.

Self-alignment of Large Video Language Models with Refined Regularized Preference Optimization
P. Sarkar, A. Etemad
Neural Information Processing Systems (NeurIPS), 2025.

Mitigating Object Hallucination in MLLMs via Data-augmented Phrase-level Alignment
P. Sarkar, S. Ebrahimi, A. Etemad, A. Beirami, S. Ö. Arık, T. Pfister
International Conference on Learning Representations (ICLR), 2025.

Uncovering the Hidden Dynamics of Video Self-supervised Learning under Distribution Shifts
P. Sarkar, A. Beirami, A. Etemad
Neural Information Processing Systems (NeurIPS), 2023. Spotlight

XKD: Cross-modal Knowledge Distillation with Domain Alignment for Video Representation Learning
P. Sarkar, A. Etemad
AAAI Conference on Artificial Intelligence (AAAI), 2024.

Self-supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Synchronicity
P. Sarkar, A. Etemad
AAAI Conference on Artificial Intelligence (AAAI), 2023. Oral

AVCAffe: A Large Scale Audio-Visual Dataset of Cognitive Load and Affect for Remote Work
P. Sarkar, A. Posen, A. Etemad
AAAI Conference on Artificial Intelligence (AAAI), 2023.

Region-Disentangled Diffusion Model for High-Fidelity PPG-to-ECG Translation
D. Shome, P. Sarkar, A. Etemad
AAAI Conference on Artificial Intelligence (AAAI), 2024.

CardioGAN: Attentive Generative Adversarial Network with Dual Discriminators for Synthesis of ECG from PPG
P. Sarkar, A. Etemad
AAAI Conference on Artificial Intelligence (AAAI), 2021.

Happy Driver: Investigating the Effect of Mood on Preferred Style of Driving in Self-Driving Cars
R. Phinnemore, G. Cimolino, P. Sarkar, A. Etemad, T.C. Nicholas Graham
International Conference on Human-Agent Interaction (HAI), 2021.

Detection of Maternal and Fetal Stress from ECG with Self-supervised Representation Learning
P. Sarkar*, S. Lobmaier*, B. Fabre, G. Berg, A. Mueller, M. G. Frasch, M. C. Antonelli, A. Etemad
Nature Scientific Reports, 2021.

Self-supervised ECG Representation Learning for Emotion Recognition
P. Sarkar, A. Etemad
IEEE Transactions on Affective Computing (TAFFC), 2020.

Self-supervised Learning for ECG-based Emotion Recognition
P. Sarkar, A. Etemad
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020. Oral

The Future of Simulation-based Medical Education: Adaptive Simulation Utilizing a Deep Multitask Neural Network
A. Ruberto, D. Rodenburg, K. Ross, P. Sarkar, P. Hungler, A. Etemad, D. Howes, D. Clarke, J. McLellan, D. Wilson, A. Szulewski
AEM Education and Training (AEMET), 2021.

Classification of Cognitive Load and Expertise for Adaptive Simulation using Deep Multitask Learning
P. Sarkar, K. Ross, A. Ruberto, D. Rodenburg, P. Hungler, A. Etemad
IEEE Affective Computing and Intelligent Interaction (ACII), 2019. Oral

Toward Dynamically Adaptive Simulation: Multimodal Classification of User Expertise using Wearable Devices
K. Ross, P. Sarkar, D. Rodenburg, A. Ruberto, P. Hungler, D. Howes, A. Szulewski, A. Etemad
Sensors, 2019.

Computer-Aided Diagnosis using Class-Weighted Deep Neural Networks
P. Sarkar, V. Davoodnia, A. Etemad
IEEE International Conference on Machine Learning and Applications (ICMLA), 2019.