Professional Experience

Data Engineer

TikTok

San Jose, US • Jan 2026 – Present • Full-Time

Designing and building large-scale data pipelines to support enterprise-wide analytics, monitoring, and governance use cases across multiple TikTok product domains.
Developing robust data integrations via APIs and streaming frameworks to ensure reliable ingestion and processing of high-volume, heterogeneous data sources.
Building scalable batch and real-time ETL workflows using distributed big data systems to automate metrics reporting and improve operational efficiency.
Conducting deep data investigations and mining to identify root causes, patterns, and trends across large datasets, supporting data-driven decision-making.
Designing standardized data solutions, documentation, and runbooks to improve reliability, reproducibility, and response time for data workflows.
Collaborating with cross-functional engineering, product, and analytics teams to define data requirements, ensure data quality, and deliver production-ready systems.

Technologies: Python, SQL, Spark, Flink, Kafka, Hive, ClickHouse, Airflow, Hadoop, Distributed Systems

TikTok

San Jose, US • May 2025 – August 2025 • Internship

Developed and optimized data ingestion pipelines, enabling ingestion of over 12TB/day of production data from MySQL and Kafka via Flink into TikTok's centralized data lake and data warehouse.
Engineered real-time and batch ETL workflows for 12+ Hive tables on HDFS, orchestrated as Airflow DAGs, integrating pipelines with production-grade AI data models and downstream analytics for TikTok's product features.
Collaborated with 3 cross-functional teams to consolidate revenue metrics for TikTok creators by replacing 8 separate Redis structures with a unified Hive table, reducing Redis space usage; integrated outputs into Grafana for BI reporting.
Enhanced pipeline reliability by implementing data warehouse health checks, reducing data latency incidents by 6% and ensuring SLAs across real-time streaming and OLAP analytics workloads powered by ClickHouse and Python.

Technologies: Flink, RabbitMQ, MySQL, Kafka, Redis, Hive, Spark, Python, Airflow, Grafana, ClickHouse

American Express

Gurgaon, India • July 2022 – July 2024 • Full-Time

As tech lead on the Credit Limit Management team, spearheaded data modeling development using distributed frameworks (Python, Spark, Hadoop, and HiveQL) to optimize credit risk workflows, achieving 32% YoY growth.
Partnered with the platform team to design ETL pipelines for scalable data processing. Integrated data quality checks to ensure analysis-ready datasets, driving a revenue impact exceeding $84 million across key markets.
Managed the capacity management product, providing borrowing capacity calculations to customer management journeys using Kafka-based streaming solutions. Leveraged Jenkins for CI/CD deployment and performance monitoring.
Enhanced system efficiency by 23% by automating regression report generation with a scalable API framework built in Python and JavaScript. Conducted end-to-end validation of production rules by authoring complex SQL scripts.
Led a comparative evaluation of three potential cloud partners (AWS, Azure, and GCP), benchmarking cost, scalability, and compliance factors to inform American Express's hybrid cloud strategy.

Technologies: Python, Spark, Hadoop, HiveQL, Kafka, Jenkins, SQL, JavaScript, AWS, Azure, GCP, REST API

Ernst & Young

Mumbai, India • April 2021 – June 2021 • Internship

Integrated Amazon Rekognition into the content moderation pipeline of the Dhak Dhak app using Java to auto-detect explicit user videos, enhancing safety for 720K+ users and logging events to Amazon DynamoDB.
Revamped platform responsiveness, cutting processing time by 14% using AWS Lambda, Amazon S3 for video storage, and Amazon SQS to buffer workloads; monitored performance with CloudWatch dashboards.

Technologies: Java, AWS (Lambda, S3, SQS, CloudWatch), DynamoDB, Amazon Rekognition, Spring Boot, Docker, Git

NYU Visualization and Data Analytics Research Center (VIDA)

New York, US • Sep 2024 – Present • Research • Advisor: Prof. Dr. Robert Krueger

Maintaining an open-source Python–Flask application enabling semi-automatic gating in multiplex immunofluorescence (IF) imaging. The tool combines computational scalability with interactive visualization, empowering biomedical researchers to refine Gaussian mixture model–based auto-gating through visual feedback.
Designing an LLM-based explanation module to generate natural language summaries of spatial and cellular phenomena within 2D and 3D biomedical images. Exploring multimodal prompt strategies (zero-shot, few-shot, and fine-tuned) tailored for cancer imaging and spatial expression analysis.

Centre for Advanced Data Science, VIT Chennai

Chennai, IN • Oct 2021 – Dec 2021 • Research • Advisor: Prof. Dr. Radhika Selvamani

Conducted research on structural credit risk modeling, developing a hybrid analytical framework combining Merton and Monte Carlo simulation models to quantify company assets and default probabilities.
Co-authored a publication titled "Hybrid Approach for Quantifying Company Assets Using Structural Credit Risk Models," later featured in Sustainable Development in AI, Blockchain, and E-Governance Applications (Springer).

New York University | CGPA: 3.9/4.0

2024 – 2025

Relevant Coursework: Algorithms, Distributed Systems, Big Data, Software Engineering, Data Science, Machine Learning, Deep Learning

Vellore Institute of Technology | CGPA: 3.8/4.0

2018 – 2022

Relevant Coursework: Operating Systems, Database Management Systems, Computer Architecture and Organization