
Senior Applied Scientist @ Microsoft
Five years of experience in AI (as of October 2024):
Three with Microsoft USA, following a specialisation (Master's) in Machine Learning from Carnegie Mellon University, USA; and
Two at TCS Research & Innovation Labs India, as a Researcher in Reinforcement Learning and Robotics, designing algorithms for robotic target-reaching tasks, following an undergraduate degree in Electronics and Communications Engineering from IIIT Delhi, India.
At Microsoft, I develop custom machine learning models for diverse industries, including finance, sustainability, and energy.
My published work includes seven research papers across IEEE transactions, journals, and conferences.
My interest lies in leveraging data to address challenges in finance, healthcare, robotics, and education. I am committed to developing impactful solutions and regularly share my knowledge through conference talks.
My expertise includes Machine Learning, Artificial Intelligence, Reinforcement Learning, Natural Language Processing, Image Processing, and Data Structures & Algorithms.
I welcome connections and invite you to reach out via the contact form on my site.
Master's in Electrical and Computer Engineering
Bachelor's in Electronics and Communication Engineering
Abstract: In this work, we review research studies that combine Reinforcement Learning (RL) and Large Language Models (LLMs), two areas that owe their momentum to the development of Deep Neural Networks (DNNs). We propose a novel taxonomy of three main classes based on the way that the two model types interact with each other. The first class, RL4LLM, includes studies where RL is leveraged to improve the performance of LLMs on tasks related to Natural Language Processing (NLP). RL4LLM is divided into two sub-categories depending on whether RL is used to directly fine-tune an existing LLM or to improve the prompt of the LLM. In the second class, LLM4RL, an LLM assists the training of an RL model that performs a task that is not inherently related to natural language. We further break down LLM4RL based on the component of the RL training framework that the LLM assists or replaces, namely reward shaping, goal generation, and policy function. Finally, in the third class, RL+LLM, an LLM and an RL agent are embedded in a common planning framework without either of them contributing to training or fine-tuning of the other. We further branch this class to distinguish between studies with and without natural language feedback. We use this taxonomy to explore the motivations behind the synergy of LLMs and RL and explain the reasons for its success, while pinpointing potential shortcomings and areas where further research is needed, as well as alternative methodologies that serve the same goal.
Read the full paper here
Abstract: Electrocardiogram (ECG) signals are known to encode unique signatures based on the geometrical characteristics of the heart. Due to other advantages - such as continuity and accessibility (now via smartwatch technology) - ECG could make for a robust biometric ID system. We show that single-node ECG measurements through an Apple Watch suffice to identify an individual. Apart from the Apple Watch ECG data, we also performed analysis on two other ECG datasets from PhysioNet to test the robustness of our methods in two situations: high volume (across a large number of individuals) and high variability (across different states of activity). We also compared multiple classifier models in combination with different feature sets to identify the best-performing combination. We observed Equal Error Rate (EER) values that were consistently < 3%. Our results show that ECG is a very effective and robust biometric identifier.
Read the full paper here
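The Equal Error Rate reported above is the operating point where the false-acceptance rate (FAR) and false-rejection rate (FRR) coincide. A minimal sketch of how an EER can be computed from matcher similarity scores; the score distributions here are synthetic stand-ins, not the paper's data:

```python
import numpy as np

def equal_error_rate(genuine_scores, impostor_scores):
    """Sweep a decision threshold and return the error rate at the point
    where FAR (impostors accepted) and FRR (genuine users rejected) cross."""
    thresholds = np.sort(np.concatenate([genuine_scores, impostor_scores]))
    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        far = np.mean(impostor_scores >= t)  # impostors wrongly accepted
        frr = np.mean(genuine_scores < t)    # genuine users wrongly rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

# Toy example: well-separated score distributions give a low EER.
rng = np.random.default_rng(0)
genuine = rng.normal(0.8, 0.05, 1000)   # similarity scores, same person
impostor = rng.normal(0.4, 0.05, 1000)  # similarity scores, different people
print(equal_error_rate(genuine, impostor))
```

A real system would obtain these scores from the classifier's confidence on matched and mismatched ECG pairs; the sweep itself is unchanged.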
Abstract: This paper presents a feature-agnostic and model-free visual servoing (VS) technique using deep reinforcement learning (DRL), which exploits two new experience replay buffer architectures in deep deterministic policy gradient (DDPG). The proposed architectures are significantly faster and converge in fewer steps. We use the proposed method to learn end-to-end VS with an eye-in-hand configuration. In traditional DDPG, the experience replay memory is randomly sampled for training the actor-critic network. This results in a loss of useful experiences when the buffer contains very few successful examples. We solve this problem by proposing two new replay buffer architectures: (a) min-heap DDPG (mH-DDPG) and (b) dual replay buffer DDPG (dR-DDPG). The former uses a min-heap data structure to implement the replay buffer, whereas the latter uses two buffers to separate “good” examples from “bad” examples. The training data for the actor-critic network is created as a weighted combination of the two buffers. The proposed algorithms are validated in simulation with the UR5 robotic manipulator model. We observe that as the number of good experiences in the training data increases, the convergence time decreases. We find 27.25% and 43.25% improvements in the rate of convergence with mH-DDPG and dR-DDPG, respectively, over state-of-the-art DDPG.
Read the full paper here
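The dual-buffer idea behind dR-DDPG can be illustrated with a minimal sketch: successful transitions go to a "good" buffer, the rest to a "bad" one, and training batches mix the two. The capacity, success flag, and mixing weight below are illustrative assumptions, not the paper's exact settings:

```python
import random
from collections import deque

class DualReplayBuffer:
    """Two buffers: 'good' holds transitions from successful episodes,
    'bad' holds the rest. Batches draw a weighted mix of both, so scarce
    successful experiences are not drowned out by random sampling."""

    def __init__(self, capacity=10_000, good_weight=0.5):
        self.good = deque(maxlen=capacity)
        self.bad = deque(maxlen=capacity)
        self.good_weight = good_weight  # fraction of each batch from 'good'

    def push(self, transition, successful):
        (self.good if successful else self.bad).append(transition)

    def sample(self, batch_size):
        # Take as many 'good' samples as available up to the target weight,
        # then fill the rest of the batch from 'bad'.
        n_good = min(int(self.good_weight * batch_size), len(self.good))
        n_bad = min(batch_size - n_good, len(self.bad))
        return (random.sample(list(self.good), n_good)
                + random.sample(list(self.bad), n_bad))

buf = DualReplayBuffer()
for i in range(100):
    # (state, action, reward, next_state) placeholder transitions;
    # every tenth episode is marked successful.
    buf.push(("s", "a", float(i), "s2"), successful=(i % 10 == 0))
print(len(buf.sample(32)))
```

In the actual algorithm the batch would feed the DDPG actor-critic update; the sketch only shows the sampling policy that replaces uniform replay.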
Abstract: Video acquisition using dashboard-mounted cameras has recently achieved massive popularity around the world. One of the major developments following the dash-cam’s popularity is that videos captured by them can be used as testimony in scenarios like traffic violations and accidents. The widespread deployment of dash-cams brings new problems, ranging from the compromise of privacy through the uploading of these videos on public websites to the use of videos captured from other cars for making fraudulent claims. Therefore, there is a compelling need to address the problems associated with the usage of dash-cam videos. In this paper, we discuss and highlight the importance of the emerging area of multimedia vehicle forensics. We propose an algorithm for linking a dash-cam video to a specific car. The proposed algorithm is useful for various applications: for example, insurance companies can authenticate the origin of a video before processing a claim. In a different scenario of illegitimate video upload on the web, the video can be traced back to the car it originated from. To this end, we make use of motion blur extracted from dash-cam videos to generate a discriminative feature. We observe that the subtle motion pattern of every vehicle can serve as its unique signature. We extract motion blur from dash-cam videos and use random forests to classify the vehicle correctly. Experimental results on thousands of frames obtained from dash-cam videos of several cars show the effectiveness of our approach. We further investigate the process of forging the signature of a car and propose a counter-forensics method to detect such forgery. We also discuss the application of our technique to other platforms where the camera can be mounted, for example, on the chest of a person. We believe that ours is the first work to describe this new area of research.
Read the full paper here
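The classification stage described above - fitting a random forest to per-frame features so that each vehicle's subtle signature becomes separable - can be sketched as follows. The Gaussian "signature" vectors here are a synthetic stand-in for the motion-blur features used in the paper:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical stand-in: each car's per-frame feature vector clusters
# around a car-specific signature plus frame-to-frame noise.
rng = np.random.default_rng(0)
n_cars, frames_per_car, dim = 4, 200, 16
signatures = rng.standard_normal((n_cars, dim))
X = (np.repeat(signatures, frames_per_car, axis=0)
     + 0.3 * rng.standard_normal((n_cars * frames_per_car, dim)))
y = np.repeat(np.arange(n_cars), frames_per_car)  # one label per car

# Hold out a test split and fit a random forest to map frames to cars.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print(clf.score(X_te, y_te))
```

With real data, each row of `X` would be a blur descriptor extracted from one video frame, and accuracy would depend on how distinctive each vehicle's motion pattern is.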
Abstract: In multi-echo imaging, multiple T1/T2-weighted images of the same cross section are acquired. Acquiring multiple scans is time consuming. To accelerate acquisition, compressed sensing based techniques have been proposed. In recent times, it has been observed in several areas of traditional compressed sensing that, instead of using a fixed basis (wavelet, DCT, etc.), considerably better results can be achieved by learning the basis adaptively from the data. Motivated by these studies, we propose to employ such adaptive learning techniques to improve reconstruction of multi-echo scans. This work is based on two basis learning models - synthesis (better known as dictionary learning) and analysis (known as transform learning). We modify these basic methods by incorporating the structure of the multi-echo scans. Our work shows that we can indeed significantly improve multi-echo imaging over compressed sensing based techniques and other unstructured adaptive sparse recovery methods.
Read the full paper here
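The synthesis (dictionary learning) model mentioned above factors the data as X ≈ ZD, with a learned dictionary D and sparse codes Z, instead of a fixed wavelet or DCT basis. A minimal, unstructured sketch on toy data using scikit-learn; the structured multi-echo variants in the paper go well beyond this:

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

# Toy data: signals that are sparse combinations of a few unknown atoms.
rng = np.random.default_rng(0)
true_atoms = rng.standard_normal((5, 20))  # 5 atoms, 20-dimensional signals
codes = rng.standard_normal((200, 5)) * (rng.random((200, 5)) < 0.3)
X = codes @ true_atoms + 0.01 * rng.standard_normal((200, 20))

# Learn dictionary D and sparse codes Z so that X is approximated by Z @ D.
dl = DictionaryLearning(n_components=5, alpha=0.1, max_iter=50, random_state=0)
Z = dl.fit_transform(X)          # sparse codes
X_hat = Z @ dl.components_       # reconstruction from the learned basis
err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
print(round(err, 3))
```

The adaptivity is the point: the atoms are fit to the data at hand, which is what the abstract argues outperforms a fixed basis for multi-echo reconstruction.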
Abstract: This work addresses the problem of denoising multiple measurement vectors having a common sparse support. Such problems arise in a variety of imaging applications, e.g. color imaging, multi-spectral and hyper-spectral imaging, and multi-echo and multi-channel magnetic resonance imaging. In such cases, denoising the channels piecemeal, one at a time, is not optimal, since it does not exploit the full structure (joint sparsity) of the problem. Joint-sparsity based methods have been used to solve such problems when the sparsifying transform is assumed to be fixed. In this work, we learn the sparsifying basis following the dictionary learning paradigm. Results on multi-spectral denoising and multi-echo MRI denoising demonstrate the superiority of our method over existing ones based on KSVD and BM4D.
Read the full paper here
Abstract: Electric Network Frequency (ENF) is a recently developed technique for the authentication of audio signals. ENF gets embedded in audio signals due to electromagnetic interference from power lines and hence can be used to determine the geographical location and time of recording. Given an audio signal, this paper presents a technique for identifying the power grid the audio belongs to. Towards this, we first extract the ENF sinusoid using a very narrow-bandwidth filter centered around the nominal frequency. The filter is designed using a frequency response masking approach. The ENF is estimated from the ENF sinusoid using the Short-Time Fourier Transform (STFT). To classify a given audio signal, we first estimate the ENF from the ENF sinusoids obtained from the audio and power signals. We then use a matched filter to decide the corresponding power grid of the audio.
Read the full paper here
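The extraction pipeline described above - narrow band-pass filtering around the nominal mains frequency, then tracking the dominant frequency per STFT frame - can be sketched on a synthetic mains hum. A Butterworth band-pass stands in here for the paper's frequency-response-masking filter, and all signal parameters are illustrative:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, stft

fs, nominal = 1000, 50.0  # sample rate (Hz) and nominal grid frequency (Hz)
t = np.arange(0, 10, 1 / fs)

# Synthetic recording: mains hum whose frequency drifts slightly around
# 50 Hz (the drift is the ENF signature), buried in broadband noise.
drift = 0.05 * np.sin(2 * np.pi * 0.1 * t)
phase = 2 * np.pi * np.cumsum(nominal + drift) / fs
audio = np.sin(phase) + 0.5 * np.random.default_rng(0).standard_normal(t.size)

# 1) Isolate the ENF component with a narrow band-pass around 50 Hz.
sos = butter(4, [49.0, 51.0], btype="band", fs=fs, output="sos")
enf_sinusoid = sosfiltfilt(sos, audio)

# 2) Estimate the ENF as the dominant frequency in each STFT frame.
f, _, Z = stft(enf_sinusoid, fs=fs, nperseg=4096)
enf_estimate = f[np.abs(Z).argmax(axis=0)]  # one frequency per frame
print(float(np.median(enf_estimate)))
```

In the paper, the same per-frame ENF trace is then compared against reference traces from candidate power grids via a matched filter; the sketch stops at the estimation step.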
Industry AI Team (July 2022 - Present):
Bing Search Team (June 2021 - June 2022):
Robustness of ECG as a biometric identifier (Jan 2020 - May 2020)
Neural Differential Equation for Drug Responses (Mar 2020 - Dec 2020)
Robotics Lab (Oct 2017 - Dec 2019)
SALSA Lab (Jun 2016 - Oct 2017)
Guide - Prof. Angshul Majumdar
Multimedia and Vision Lab (Aug 2016 - Oct 2017)
Guide - Prof. A.V. Subramanium
Next Generation Voucher Server Team (Jan 2016 - May 2016)
Senior Applied Scientist @ Microsoft
Research Assistant @ CMU
Researcher @ TCS Research Labs
Intern @ Ericsson R&D
Conducted a comprehensive, day-long workshop on Deep Learning, covering key concepts such as Recurrent Neural Networks (RNN), GPT models, Responsible AI, and Reinforcement Learning. The workshop featured hands-on coding exercises utilizing Python, PyTorch, and TensorFlow to reinforce these concepts.
Delivered a 30-minute technical presentation on "Designing Custom Classifiers for Hallucination Detection in Large Language Models."
Presented a 30-minute technical talk on "Enhancing Document Q&A with Chart Understanding," attended by an audience of 400.
Led a full-day workshop titled "Deep Learning in Practice: A Hands-On Introduction" at the Machine Learning Week conference in Las Vegas. The workshop encompassed a broad array of topics, including Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), GPT models, Responsible AI, and foundational reinforcement learning algorithms such as multi-armed bandits and Q-learning.
Invited by the Alumni Association of IIIT Delhi as a guest speaker to share my journey of applying to and studying at Carnegie Mellon University. The talk, titled "Applying for Higher Education in the USA," provided insights into the application process and academic experience.
At Microsoft, I created a custom visual language model for understanding charts in documents. I developed an end-to-end pipeline for data generation, fine-tuning, and evaluation. This work was accepted and presented at Microsoft's Machine Learning and Data Science Conference (MLADS) 2023.
Presented my research work titled 'Pulse ID: The Case for Robustness of ECG as a Biometric Identifier' at the conference.
Led a team conducting research on the intersection of Reinforcement Learning and Large Language Models at Microsoft. Gave a talk, along with other team members, on the topic 'The RL/LLM Taxonomy Tree: Reviewing Synergies Between Reinforcement Learning and Large Language Models' at MLADS 2023.
At TCS Research, I designed novel Experience Replay architectures called mH-DDPG and dR-DDPG, which improved convergence rates by 27.25% and 43.25% over the state-of-the-art DDPG algorithm. I presented the results at the 28th IEEE RO-MAN Conference.
At TCS Research, I led a team of 2 researchers and developed a pipeline for transferring an RL-based policy learned in simulation to a real UR5 robot. I presented the developed algorithm at the IEEE Translearn 2019 workshop.
Delivered a talk on 'Reinforcement Learning: Application and Challenges in Robotics'. The talk was attended by 200+ students and professionals.
Delivered a talk on the role of 'Artificial Intelligence for Robotics' at the Women in Machine Learning & Data Science's Delhi chapter. The talk was attended by 80+ students and professionals.
Received the coveted IEEE Signal Processing Travel Grant for presenting my research paper titled 'Joint-sparse dictionary learning: Denoising multiple measurement vectors' at GlobalSIP 2017 conference in Montreal, Canada.