Harshith Pasumarthi

Data Engineer / Linux Systems Engineer / Software Engineer

I build cloud-native backend systems and data workflows that replace manual processes with scalable, automated ones.

About Me

I am a software engineer focused on building clean backend systems and robust data infrastructure. I spent three years at Amazon/AWS as an SDE 1, where I worked on virtual desktop infrastructure management, enterprise system health, cloud optimization, and server workflows. That experience taught me to write code that stays reliable under production workloads.

Right now, I am pursuing graduate studies in Artificial Intelligence at the University of the Cumberlands. This lets me approach data systems through a dual lens: core backend infrastructure combined with the engineering practices needed to process data and stream it into production-ready analytical pipelines.

Experience: Amazon - Software Development Engineer 1 (Jul 2022 – May 2025)
Current Academic Focus: M.S. Artificial Intelligence
Institution: University of the Cumberlands

Skills

Languages

Python, Java, SQL, JavaScript, Bash scripting

Data Engineering

Apache Airflow, ETL pipelines, PostgreSQL, MySQL, data modeling, batch processing

Cloud & Infrastructure

AWS, Docker, Linux, Linux server, headless Ubuntu, Git, CI/CD, cloud automation

AI & Frameworks

Hugging Face, TensorFlow, PyTorch, predictive modeling, data analysis

Projects

Automated Batch Ingestion & ETL Pipeline

Built a structured data pipeline that automatically extracts, transforms, and loads high-volume daily records from multiple external REST APIs into a local database.

How it works: Apache Airflow orchestrates the data fetches, isolated cleaning steps sanitize each incoming batch, and the cleaned values are written to partitioned database tables.

Result: Processed and loaded over 500,000 records on a headless Ubuntu server without errors.

Apache Airflow // PostgreSQL // Docker // Headless Ubuntu // Analytics
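
The transform and load steps of a pipeline like this can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the field names (`id`, `name`, `amount`), the `daily_records` table, and the `load_date` partition key are all hypothetical, and SQLite stands in for PostgreSQL so the example runs anywhere. In an Airflow deployment, each function would typically become its own task.

```python
import sqlite3
from datetime import date


def transform(records):
    """Drop rows without an id, normalize types, and keep the last row seen per id."""
    cleaned = {}
    for row in records:
        if row.get("id") is None:
            continue  # unusable record: no key to load it under
        key = int(row["id"])
        cleaned[key] = {
            "id": key,
            "name": str(row.get("name") or "").strip(),
            "amount": float(row.get("amount") or 0.0),
        }
    return list(cleaned.values())


def load(conn, rows, load_date):
    """Write one day's rows, keyed by (load_date, id) so reruns stay idempotent."""
    # Hypothetical schema; load_date plays the role of the partition key.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_records ("
        "load_date TEXT NOT NULL, id INTEGER NOT NULL, "
        "name TEXT, amount REAL, PRIMARY KEY (load_date, id))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO daily_records VALUES (?, ?, ?, ?)",
        [(load_date.isoformat(), r["id"], r["name"], r["amount"]) for r in rows],
    )
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    raw = [
        {"id": 1, "name": " alice ", "amount": "2.5"},
        {"id": None, "name": "orphan"},
        {"id": 1, "name": "alice", "amount": 3},
    ]
    conn = sqlite3.connect(":memory:")
    print(load(conn, transform(raw), date(2024, 1, 1)))
```

Keying the table on the load date and record id means a retried run overwrites the same rows instead of duplicating them, which is one common way to keep batch loads safe to rerun.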

FraudShield Framework

Developed an open-source framework for real-time fraud detection, built to intercept anomalies and evaluate threats securely at execution time.

How it works: Connected streaming message pipelines directly to a real-time feature store and to model-serving engines optimized for fast lookups.

Result: Released as open source and reached more than 500 stars on GitHub.

Kafka Streams // Feature Store // Model Serving // Real-time Ingestion
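
The event flow described for FraudShield (stream in, look up features, score, flag) can be illustrated with a small in-process sketch. None of this is FraudShield's API: the class and function names are hypothetical, a plain Python list stands in for the Kafka stream, an in-memory dictionary stands in for the feature store, and a simple ratio rule stands in for a served model.

```python
from collections import defaultdict, deque
from statistics import mean


class InMemoryFeatureStore:
    """Hypothetical stand-in for a real-time feature store: recent amounts per account."""

    def __init__(self, window=5):
        self._amounts = defaultdict(lambda: deque(maxlen=window))

    def features(self, account_id):
        history = self._amounts[account_id]
        return {
            "txn_count": len(history),
            "avg_amount": mean(history) if history else 0.0,
        }

    def update(self, account_id, amount):
        self._amounts[account_id].append(amount)


def score(features, amount, ratio_threshold=3.0):
    """Toy rule in place of a served model: flag amounts far above the recent average."""
    if features["txn_count"] < 3:
        return False  # not enough history to judge
    return amount > ratio_threshold * features["avg_amount"]


def process(events, store):
    """Score each event against features computed before it, then update the store."""
    flagged = []
    for event in events:
        feats = store.features(event["account_id"])
        if score(feats, event["amount"]):
            flagged.append(event)
        store.update(event["account_id"], event["amount"])
    return flagged


if __name__ == "__main__":
    events = [{"account_id": "a", "amount": x} for x in (10, 10, 10, 100, 12)]
    print(process(events, InMemoryFeatureStore()))
```

Reading features before updating the store keeps each decision based only on earlier events, which mirrors how a lookup at scoring time would behave in a streaming system.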

LLM Eval Toolkit

Designed a rigorous open-source evaluation framework for reviewing the accuracy and outputs of Large Language Model integrations.

How it works: Created automated assessment layers driven by user-defined parameters, along with regression checks and hooks for integration into standard deployment pipelines.

Result: Standardized evaluation tasks within development pipelines so that performance drops are caught before updates go live.

Python // CI/CD Integration // Regression Testing // Model Performance
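
A regression check of the kind described above might look like the following sketch. It is an illustration under stated assumptions, not the toolkit's real interface: `exact_match`, `evaluate`, and `check_regression` are hypothetical names, the model is any callable from prompt to text, and the baseline score and tolerance are values a team would choose for itself.

```python
def exact_match(expected, actual):
    """Default scorer (an assumption): case-insensitive exact match."""
    return 1.0 if expected.strip().lower() == actual.strip().lower() else 0.0


def evaluate(model, cases, scorer=exact_match):
    """Return the mean score of `model` over cases with 'prompt' and 'expected' keys."""
    if not cases:
        raise ValueError("at least one evaluation case is required")
    scores = [scorer(case["expected"], model(case["prompt"])) for case in cases]
    return sum(scores) / len(scores)


def check_regression(current, baseline, tolerance=0.02):
    """Pass only if the current score is within `tolerance` of the stored baseline."""
    return current >= baseline - tolerance


if __name__ == "__main__":
    # Hypothetical model: a lookup table standing in for a real LLM call.
    answers = {"2+2?": "4", "Capital of France?": "Paris"}
    cases = [
        {"prompt": "2+2?", "expected": "4"},
        {"prompt": "Capital of France?", "expected": "paris"},
    ]
    accuracy = evaluate(lambda prompt: answers.get(prompt, ""), cases)
    # In a CI job, a failed check would translate into a non-zero exit code.
    raise SystemExit(0 if check_regression(accuracy, baseline=0.9) else 1)
```

Passing the scorer in as a parameter is what lets user-defined metrics replace exact match without touching the regression logic.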