Senior Site Reliability Engineer
About the Team
InsightIDR helps identify and address key cybersecurity risks to our customers. We apply machine learning, threat intelligence, and business intelligence to event sources, including desktops, servers, network switches, firewalls, cloud services, directory servers, DHCP servers, and SIEMs, in order to distill hundreds or thousands of daily events per customer into the few real, high-priority threats that need attention. Our systems ingest large amounts of data and need to be highly available and performant at all times.
Some of the technologies we use include: Java, Python, Cassandra, Dynamo, MySQL/RDS, Redis, Elasticsearch, AWS (EC2, S3, CloudFormation, etc.), Terraform and Jenkins.
At Rapid7, we value intellectual curiosity, problem-solving ability, initiative, and team spirit.
About the Role
We are looking for a talented Site Reliability Engineer (SRE) with a deep interest in distributed systems, cloud computing and the architecture of large-scale systems. The SRE lead will ensure our InsightIDR services have the ultra-high reliability and uptime necessary to meet our customers’ needs. As an SRE, you will work closely with our engineering team and partner teams throughout Rapid7 to help solve extremely challenging problems at a massive scale.
In this role, you will:
Establish a new Site Reliability Engineering function within Engineering
Work closely with Engineering, Architecture, Infrastructure and Product teams to improve the lifecycle of the InsightIDR services, from inception and design through deployment, operations, monitoring, security, upgrades and maintenance
Support services before they go live through activities such as design, deployment, migration strategy, monitoring, and playbook reviews
Maintain services once they are live by measuring and monitoring availability, latency, and overall system health
Scale systems through automation and by driving service and infrastructure improvements
Troubleshoot production issues and liaise with relevant Engineering, product deployment, and platform teams for a resolution
Manage and participate in on-call support and incident response follow-ups, such as post-mortems
Mentor and coach team members
The skills you’ll bring include:
Previous experience in a lead engineering role
5+ years of experience scaling SaaS services and infrastructure
Expert knowledge of developing, scaling, automating, and troubleshooting large-scale systems
Expert knowledge of deployment and monitoring frameworks
Ability to debug and optimize code and automate routine tasks
Advanced understanding of system performance and tuning
Strong knowledge of NoSQL and SQL concepts
Strong knowledge of OOP languages such as Java
Experience with scripting languages such as Shell and Python
Extensive experience with database operation and optimization
Strong knowledge of RESTful architectures
Understanding of Unix/Linux operating systems
Proficiency in AWS services, including EC2, RDS, S3, and streaming data
Systematic problem-solving approach
Excellent communication & influencing skills
Strong technical writing skills
We know that the best ideas and solutions come from multi-dimensional teams that reflect a variety of backgrounds and professional experiences. If you are excited about this role and feel your experience can make an impact, please don’t be shy - apply today.
About Rapid7
Rapid7 is creating a more secure digital future for all by helping organizations strengthen their security programs in the face of accelerating digital transformation. Our portfolio of best-in-class solutions empowers security professionals to manage risk and eliminate threats across the entire threat landscape, from apps to the cloud to traditional infrastructure to the dark web. We foster open source communities and cutting-edge research, using these insights to optimize our products and arm the global security community with the latest in attacker methods. Trusted by more than 10,000 customers worldwide, our industry-leading solutions and services help businesses stay ahead of attackers, ahead of the competition, and future-ready for what’s next.
All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability or protected veteran status.