I am an MS (By Research) student in the Department of Computer Science and Engineering at the Indian Institute of Technology, Kanpur (IIT Kanpur). I recently joined the Visual Intelligence group at Samsung R&D Institute - Bangalore, India, as a Senior Engineer. For more details, please take a look at my CV.

My research interests lie in machine learning and computer vision. I am particularly interested in multi-modal learning, representation learning, and creative image/video manipulation using generative AI. For my master's thesis, I am working with Prof. Gaurav Sharma on audio-guided facial expression editing that leverages the latent space of powerful generative models.

Education

  • MS (By Research) | 2021-Present
    • Computer Science and Engineering (CSE), Indian Institute of Technology, Kanpur
  • Bachelor of Technology (BTech) | 2016-2020
    • Computer Science and Engineering (CSE), Pranveer Singh Institute of Technology (PSIT), Kanpur

Experience

  • Samsung R&D Institute Bangalore, India | July 2024 – Present
    Senior Engineer (Research)
    • Member of the Visual Intelligence team, developing a proof of concept (PoC) for an A-grade patent on ‘Image Panning Generation with Mobile Camera’
    • Developing an automated image panning generator for pre-recorded videos, ensuring that the salient object remains in focus while dynamically applying motion blur to the background
    • Implemented object tracking and frame fusion techniques to accurately select panning targets and seamlessly blend frames for creating the panning effect
  • Audio-Guided Image Manipulation | Jan 2022 – Present
    MS Thesis, IIT Kanpur
    • Advisor: Prof. Gaurav Sharma
    • Developing an audio-visual stylization framework that modifies image styles based on audio semantics
    • Designed a hierarchical VQ-VAE model to invert input images into StyleGAN’s latent space, then perturbed the latent codes using a unified audio-visual feature space to generate stylized results
    • Implemented StyleGAN2 inversion, audio-visual feature alignment, and latent code editing
    • Built a pipeline to extract training data for audio-visual feature learning utilizing large-scale audio-visual datasets
  • TensorTour (acquired by Typeface.ai) | May 2022 – July 2022
    Research Intern | Remote
    • Managed image metadata using SQLite, performed CRUD operations, and built a Flask API for content-based image retrieval, providing top-k similar images from user queries
    • Conducted a comparative study of available neural image compression models
  • Indian Statistical Institute, Kolkata | Jan 2019 – Sep 2019
    Research Intern | Internship Letter
    • Advisor: Prof. Nikhil R. Pal
    • Conducted experiments leveraging self-organizing maps (SOMs) for 2D lattice projection with Sammon’s structure-preserving loss, selecting significant features while preserving lattice visualization
    • Experimented with t-distributed stochastic neighbor embedding (t-SNE) and autoencoder-based latent representation methods to enhance visualization for datasets with complex manifolds