Site Reliability Engineering Manager
Location:
US
Company:
Articulate is the leading SaaS provider of creator platforms for online workplace training, dedicated to enhancing employee skills and engagement.
Summary:
The SRE Manager will lead and guide the Site Reliability Engineering team to enhance the platform's reliability and performance. Candidates should have significant experience in DevOps and demonstrate strong leadership capabilities.
Requirements:
Technology: AWS, Terraform, Kubernetes, Docker, Datadog, Grafana
Hard Skills: DevOps, Site Reliability Engineering, Platform engineering, AWS, Terraform, Kubernetes, Docker, Python, Go, Datadog, Grafana
Experience: 8+ years in DevOps, Site Reliability Engineering or Platform engineering, 5+ years in a leadership role as a senior developer, team lead, or engineering manager
Job Description:
United States / Engineering / Full-time / Remote
Articulate is looking for an SRE Manager to join our amazing Platform team!
As an SRE Manager, you'll be a tactical leader in the organization who leads and guides the Site Reliability Engineering team in delighting our customers with a world-class platform that is reliable, scalable, and performant.
What You'll Do
- Be an example of our Human Centered Organization (HCO) philosophy by fostering a culture of collaboration, openness and personal responsibility both within our team and across Engineering.
- Drive continuous improvement through automation to enhance the reliability, scalability and maintainability of our systems and code.
- Work with the team to guide the definition and implementation of industry-leading standards for using infrastructure as code, setting high reliability requirements, monitoring and reporting on the performance of the platform, and more
- Use your extensive experience in both site reliability and software deployment systems to grow and mentor the team in site reliability best practices
- Manage 1:1s with the team to set goals, conduct performance reviews, and provide professional development opportunities specific to the needs of each team member
- Provide coaching, feedback, and plans to your reports to both improve team capabilities and to facilitate career growth
- Work cross-functionally to understand the needs of our customers (the engineering organization) so that the infrastructure and supporting platform systems give our applications what they need to perform optimally and reliably at scale
- Ensure your team’s implementations meet reliability and maintainability standards as defined by broader engineering leadership
- Define and implement platform quality standards for services to ensure they meet our requirements for monitoring, security, scalability and maintainability.
- Collaborate with security and development to continually refine our testing processes to ensure a high standard of quality
- In collaboration with development experience engineering, manage and maintain internal test environments and implement policies for access, infrastructure, data and costs
- Provide operating SLAs and KPIs to inform our continuous improvement process
- Collaborate with peer teams to improve shared workflows that impact your team’s day-to-day operations
- Support product engineering and development experience engineering teams to align on long-term outcomes, set team goals and ensure accountability to those goals and outcomes
- Spend at least 50% of your time on team administration and growth opportunities and no more than 50% of your time delivering technical solutions and tasks. The actual split will depend on the needs of the team.
- Apply existing best practices to your team’s work and remain informed of updates relevant to your team’s operational efficiency
- Guide the team to become self-organizing and self-healing so as to empower them to move quickly and iterate
What You Should Have
- 8+ years of experience in DevOps, Site Reliability Engineering or Platform engineering on a modern cloud-based infrastructure, preferably supporting a SaaS product
- 5+ years in a leadership role as a senior developer, team lead, or engineering manager
- Demonstrated ability to set high standards and hold others accountable to those standards
- A strong focus on delivering results and the ability to balance business and technical requirements
- Hands-on experience in building and maintaining infrastructure in AWS with infrastructure as code tools
- Hands-on experience in using Terraform or other similar infrastructure management technologies
- Demonstrable experience with container orchestration technologies including Kubernetes and Docker
- Proficient in programming and scripting languages, such as Python or Go
- Demonstrated experience with cloud-native monitoring and logging tools such as Datadog or Grafana
- Proven ability to review technical designs and identify opportunities for automation and delivery improvements
- Proven record of strong leadership, communication, team development, problem solving and organization skills
- Self-starter with the ability to work independently and manage multiple priorities effectively while leading your team to be self-healing and self-organizing.