Principal AI Engineer - ML Ops
About the Team
The AI Center of Excellence team includes Data Scientists and AI Engineers who work together to conduct research, build prototypes, design features, and build production AI components and systems. Our mission is to leverage the best available technology to protect our customers' attack surfaces. We partner closely with Detection and Response teams, including our MDR service, to leverage AI/ML for enhanced customer security and threat detection. We operate with a creative, iterative approach, building on 20+ years of threat analysis and a growing patent portfolio. We foster a collaborative environment, sharing knowledge, developing internal learning, and encouraging research publication. If you’re passionate about AI and want to make a major impact in a fast-paced, innovative environment, this is your opportunity.
The technologies we use include:
AWS for hosting our research environments, data, and features
EKS to deploy applications
Terraform to manage infrastructure
Python for analysis and modeling, taking advantage of NumPy and pandas for data wrangling
Jupyter notebooks (locally and remotely hosted) as a computational environment
scikit-learn for building machine learning models
Anomaly detection methods to make sense of unlabeled data
About the Role
Rapid7 is seeking a Principal AI Engineer to join our team as we expand and evolve our growing AI and MLOps efforts. You should have a strong foundation in applied AI R&D, software engineering, and MLOps and DevOps systems and tools. Further, you’ll have a demonstrated track record of taking models created in the AI R&D process to production with repeatable deployment, monitoring and observability patterns. In this intersectional role, you will combine your expertise in AI/ML deployments, cloud systems and software engineering to enhance our product offerings and streamline our platform's functionalities.
In this role, you will:
Architect and manage the end-to-end design of ML production systems, including project scoping, data requirements, modeling strategies, and deployment
Develop and maintain data pipelines, manage the data lifecycle, and ensure data quality and consistency throughout
Ensure robust implementation of ML guardrails and manage all aspects of service monitoring
Develop and deploy accessible endpoints, including web applications and REST APIs, while maintaining steadfast data privacy and adherence to security best practices and regulations
Share expertise and knowledge consistently with internal and external stakeholders, nurturing a collaborative environment and fostering the development of junior engineers
Embrace agile development practices, valuing constant iteration, improvement, and effective problem-solving in complex and ambiguous scenarios
The skills you’ll bring include:
15 years of experience as a Software Engineer, including 3-5 years focused on ML deployment (especially in AWS)
Solid technical experience in the following is required:
Software engineering: Developing APIs with Flask or FastAPI, paired with strong Python knowledge
DevOps and MLOps: Designing and integrating scalable AI/ML systems into production environments, CI/CD tooling, Docker, Kubernetes, cloud AI resource utilization and management
Pipelines, monitoring, and observability: Data pre-processing and feature engineering, model monitoring and evaluation
A growth mindset - welcoming the challenge of tackling complex problems with a bias for action
Strong written and verbal communication skills - able to communicate technical concepts effectively to diverse audiences and to create clear documentation of system architectures and implementation details
Proven ability to collaborate effectively across engineering, data science, product, and other teams to drive successful MLOps initiatives and ensure alignment on goals and deliverables
Established track record of mentoring and guiding junior engineers, fostering their technical growth and promoting engineering excellence within the organization
Experience with the following would be advantageous:
AI and ML models, understanding their operational frameworks and limitations
Deploying resources that enable data scientists to fine-tune and experiment with LLMs
Implementing model risk management strategies, including model registries, concept/covariate drift monitoring, and hyperparameter tuning
We know that the best ideas and solutions come from multi-dimensional teams. That’s because these teams reflect a variety of backgrounds and professional experiences. If you are excited about this role and feel your experience can make an impact, please don’t be shy - apply today.
About Rapid7
At Rapid7, we are on a mission to create a secure digital world for our customers, our industry, and our communities. We do this by embracing tenacity, passion, and collaboration to challenge what’s possible and drive extraordinary impact.
Here, we’re building a dynamic workplace where everyone can have the career experience of a lifetime. We challenge ourselves to grow to our full potential. We learn from our missteps and celebrate our victories. We come to work every day to push boundaries in cybersecurity and keep our 10,000 global customers ahead of whatever’s next.
Join us and bring your unique experiences and perspectives to tackle some of the world’s biggest security challenges.
Security and Compliance
Rapid7 is committed to keeping customers secure. As a first line of defense, all employees are expected to uphold the highest standards of security and privacy, ensuring the protection of sensitive information and compliance with relevant regulations.