Staff Machine Learning Engineer
Staff ML engineer | AI products | Agentic systems
Staff-level ML engineer in Bengaluru with 8+ years across agentic AI, AI product platforms, production ML systems, and edge computer vision. At SolarWinds, I work in the Platform ML Team under Platform Engineering, building AI capabilities for ITOM and ITSM products across SaaS and self-hosted deployments.
Proof points
My strongest work sits where AI ideas have to become real product systems: workflows, APIs, auth, rate limits, evals, dashboards, SLOs, docs, and on-call ownership.
Profile
I work at SolarWinds, an ITOM/ITSM company that builds observability, monitoring, and service-management products across SaaS and self-hosted deployments. I sit in the Platform ML Team under Platform Engineering, where I originated and drive an AI copilot strategy across multiple product lines. The work spans multi-agent orchestration, MCP-based tool integration, RAG, SSE APIs, conversation management, reliability, and evaluation.
Before the current AI product work, I built AIOps systems for metric anomaly detection, alert noise reduction, RCA signal correlation, and log processing. Earlier roles covered retail edge computer vision at Infilect and CV/ML R&D at Tata Elxsi.
AI copilot originator and DRI, org-wide LLM gateway owner, investigative agent contributor.
AIOps, service-desk AI, metric anomaly detection, RCA correlation, log pattern mining.
TensorFlow Lite pipelines, retail loss detection, SKU recognition, image stitching.
Computer vision R&D for traffic-sign recognition, lane recognition, text detection, and video metadata.
Selected work
These are not toy demos. The through-line is turning AI capability into useful product experiences with the service boundaries, controls, and feedback loops needed to operate them.
Conceived and prototyped a multi-agent AI copilot in a one-week internal hackathon, then drove it into central product strategy across 5+ ITOM/ITSM product lines. Designed the agent framework, MCP tool layer, conversation APIs, RAG layer, SSE streaming contracts, and evaluation tooling.
Built and continue to own a centralized provider-agnostic LLM proxy on LiteLLM. It handles auth, model policy, distributed Redis rate limiting, Presidio-based PII masking, audit logs, spend governance, metrics, and client onboarding for engineering teams.
Shipped a time-series metric anomaly detection service that forecasted expected metric behavior to assess entity health and reduce alert noise. The service ran at roughly 10-15M requests per day (about 150 RPS), at an approximate cost of $0.00001 (one hundred-thousandth of a US dollar) per request.
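The core idea behind forecast-based anomaly detection is to score each point against a model of expected behavior and flag large deviations. A minimal stand-in, using a rolling-window z-score as the "forecast" (the production service used richer forecasting models; this only shows the scoring shape):

```python
from statistics import mean, pstdev

def rolling_anomalies(series: list[float], window: int = 5, k: float = 3.0) -> list[bool]:
    """Flag points that deviate more than k standard deviations from a
    rolling baseline. Toy sketch of forecast-vs-actual scoring, not the
    production detector."""
    flags = []
    for i, x in enumerate(series):
        hist = series[max(0, i - window):i]
        if len(hist) < window:
            flags.append(False)  # not enough history to score yet
            continue
        mu, sigma = mean(hist), pstdev(hist)
        flags.append(abs(x - mu) > k * max(sigma, 1e-9))
    return flags
```

Scoring against an expectation, rather than a fixed threshold, is what lets the same detector adapt per metric and cut alert noise.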
Built log processing systems around Drain3 template mining to compress repetitive logs, surface unusual log patterns, and select higher-signal context for LLM-driven incident analysis. The goal was smaller context windows with more diagnostic value, not clustering for its own sake.
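Template mining collapses log lines that differ only in variable tokens into one template with a count, which is what makes the compression possible. A toy stand-in for what Drain3 does with a proper parse tree (this masks only numbers, purely to show the idea):

```python
import re
from collections import Counter

def mine_templates(lines: list[str]) -> Counter:
    """Toy log-template miner: mask variable tokens so repeated message
    shapes collapse to one counted template. Drain3 handles this far more
    robustly; this sketch only illustrates the compression."""
    counts: Counter = Counter()
    for line in lines:
        template = re.sub(r"\b\d+\b", "<NUM>", line)
        counts[template] += 1
    return counts
```

Once repetitive lines collapse into templates, the rare templates are exactly the higher-signal context worth spending LLM context-window budget on.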
Co-designed and built an LLM/RAG-powered assistant for ITSM: recommended resolutions, ticket summaries, and actionable runbooks from process documentation. Supported PII-safe routing with on-demand and auto-trigger modes.
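At the heart of a RAG assistant is retrieval: rank the knowledge-base snippets against the ticket and feed the best ones to the model. A bag-of-words cosine sketch (the real system would use embeddings; names and documents below are made up for illustration):

```python
from collections import Counter
from math import sqrt

def top_doc(query: str, docs: dict[str, str]) -> str:
    """Return the best-matching document key by bag-of-words cosine
    similarity. A toy stand-in for embedding-based retrieval."""
    def vec(text: str) -> Counter:
        return Counter(text.lower().split())

    def cosine(a: Counter, b: Counter) -> float:
        dot = sum(a[t] * b[t] for t in a)
        na = sqrt(sum(v * v for v in a.values()))
        nb = sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = vec(query)
    return max(docs, key=lambda name: cosine(q, vec(docs[name])))
```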
Open source
I kept this section grounded in my public GitHub repositories rather than presenting private company work as open source.
Installable CLI for diagnosing, diffing, generating, running, and testing Node.js/TypeScript MCP servers from OpenAPI specs.
Local real-time voice app for low-latency STT-to-LLM-to-TTS conversations, persona control, saved runs, and a browser evaluation harness on Apple Silicon.
Generalized university timetable solver using constraint optimization with configurable hard constraints, soft preferences, YAML/JSON input, and a planned desktop UI.
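Timetabling with hard constraints is a classic constraint-satisfaction problem. A tiny backtracking sketch of the idea (the actual solver uses proper constraint-optimization tooling and also scores soft preferences; course names here are hypothetical):

```python
from itertools import product

def schedule(courses, slots, rooms, conflicts):
    """Backtracking scheduler with two hard constraints: no two courses in
    the same (slot, room), and conflicting course pairs never share a
    slot. Toy sketch; real CP solvers scale far beyond this."""
    assignment = {}

    def ok(course, slot, room):
        for other, (s, r) in assignment.items():
            if (s, r) == (slot, room):
                return False  # room already taken in this slot
            if s == slot and {course, other} in conflicts:
                return False  # conflicting courses can't share a slot
        return True

    def solve(remaining):
        if not remaining:
            return True
        course, *rest = remaining
        for slot, room in product(slots, rooms):
            if ok(course, slot, room):
                assignment[course] = (slot, room)
                if solve(rest):
                    return True
                del assignment[course]  # undo and try the next option
        return False

    return assignment if solve(list(courses)) else None
```

Soft preferences turn this from satisfaction into optimization: instead of accepting the first feasible assignment, the solver scores assignments and searches for a low-penalty one.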
Earlier university timetable generator for a highly constrained NP-hard scheduling problem, built as a Python desktop application during my engineering degree.
Skills
Contact
Best fit: Staff or Senior Staff IC roles building AI products, agentic workflows, AI-assisted developer or customer experiences, and ML systems that need to operate reliably at scale.