Shivam Tripathi

I am an MS (By Research) student in the Department of Computer Science and Engineering at the Indian Institute of Technology Kanpur (IIT Kanpur). I recently joined the Visual Intelligence group at Samsung R&D Institute - Bangalore, India, as a Senior Engineer. For more details, please take a look at my CV.

My research interests lie in machine learning and computer vision. I am particularly interested in multi-modal learning, representation learning, and creative image/video manipulation using generative AI. For my master's thesis, I am working with Prof. Gaurav Sharma on audio-guided facial expression editing that leverages the latent space of powerful generative models.

Education

MS (By Research) | 2021-Present
- Computer Science and Engineering (CSE), Indian Institute of Technology, Kanpur
Bachelor of Technology (BTech) | 2016-2020
- Computer Science and Engineering (CSE), Pranveer Singh Institute of Technology (PSIT), Kanpur

Experience

Samsung R&D Institute Bangalore, India | July 2024 – Present
Senior Engineer (Research)
- Member of the Visual Intelligence Team, developing a proof-of-concept (PoC) for an A-grade patent on ‘Image Panning Generation with Mobile Camera’
- Developing an automated image panning generator for pre-recorded videos, ensuring that the salient object remains in focus while dynamically applying motion blur to the background
- Implemented object tracking and frame fusion techniques to accurately select panning targets and seamlessly blend frames for creating the panning effect
Audio-Guided Image Manipulation | Jan 2022 – Present
MS Thesis, IIT Kanpur
- Advisor: Prof. Gaurav Sharma
- Developing an audio-visual stylization framework that modifies image styles based on audio semantics
- Designed a hierarchical VQ-VAE model to invert input images into StyleGAN’s latent space, then perturbed the latent codes using a unified audio-visual feature space to generate stylized results
- Implemented StyleGAN2 inversion, audio-visual feature alignment, and latent code editing
- Built a pipeline to extract training data for audio-visual feature learning utilizing large-scale audio-visual datasets
TensorTour (acquired by Typeface.ai) | May 2022 – July 2022
Research Intern | Remote
- Managed image metadata in SQLite with CRUD operations, and built a Flask API for content-based image retrieval that returns the top-k most similar images for a user query
- Explored and conducted a comparative study of available neural image compression models
Indian Statistical Institute, Kolkata | Jan 2019 – Sep 2019
Research Intern | Internship Letter
- Advisor: Prof. Nikhil R. Pal
- Conducted experiments leveraging self-organizing maps (SOMs) for 2D lattice projection with Sammon’s structure-preserving loss, selecting significant features while preserving lattice visualization
- Experimented with t-distributed stochastic neighbor embedding (t-SNE) and autoencoder-based latent representation methods to enhance visualization for datasets with complex manifolds