We are seeking an experienced, highly specialized Senior CloudOps Engineer to manage, automate, and secure our production cloud infrastructure and Machine Learning (ML)/Large Language Model (LLM) operational pipelines. This role is strictly focused on the operations and infrastructure that support our data science and engineering teams; it is not a data science or core LLM development position.
Key Responsibilities and Required Expertise
The successful candidate will be an expert in all of the following areas, driving high availability, scalability, and security.
I. Cloud Infrastructure & Automation
Infrastructure as Code (IaC): Deep expertise in managing and provisioning infrastructure using Terraform.
Containerization & Orchestration: Advanced deployment, scaling, and management of services using Docker/Kubernetes.
Networking & Services: Architecting and maintaining high-performance API Layers & Microservices.
AWS CloudOps: Expert proficiency in AWS operational services, including EventBridge and Step Functions, for building robust automation flows.
Data Storage: Managing and optimizing critical AWS data services, including S3, DynamoDB, Redshift, and Kinesis.
II. MLOps Tooling & Monitoring
ML/LLM Tooling Support: Provide and maintain the operational infrastructure for ML/LLM systems, including Model Registry/Versioning tools like MLflow/SageMaker.
Pipeline Automation (CI/CD): Designing and implementing robust CI/CD pipelines for ML/LLM deployments using tools like GitHub Actions/Jenkins.
Model Operations: Building the infrastructure to support Drift Detection & Retraining capabilities.
Monitoring & Alerting: Implementing comprehensive observability stacks using Prometheus/Grafana/CloudWatch.
Incident Management: Leading resolution efforts for production issues, including PagerDuty expertise and on-call responsibilities.
III. Security, Compliance & Cost Optimization (FinOps)
Cloud Security: Establishing and enforcing strong security policies and best practices across the cloud environment (IAM, VPC, Secrets).
AWS Security Services: Expert knowledge and application of specific AWS security tools like IAM, KMS, and Secrets Manager.
Cost Optimization: Leading initiatives for Cost Optimization (FinOps), balancing performance and efficiency across all cloud resources.
Security and Compliance
Rapid7 is committed to keeping customers secure. As a first line of defense, all employees are expected to uphold the highest standards of security and privacy, ensuring the protection of sensitive information and compliance with relevant regulations.