$154,600 – $232,000/yr

Site Reliability Engineering Manager

Full-time Remote 13d ago

Location:

Company:

Articulate is the leading SaaS provider of creator platforms for online workplace training, dedicated to enhancing employee skills and engagement.

Summary:

The SRE Manager will lead and guide the Site Reliability Engineering team to enhance the platform's reliability and performance. Candidates should have significant experience in DevOps and demonstrate strong leadership capabilities.

Requirements:

Technology: AWS, Terraform, Kubernetes, Docker, Datadog, Grafana

Hard Skills: DevOps, Site Reliability Engineering, Platform engineering, AWS, Terraform, Kubernetes, Docker, Python, Go, Datadog, Grafana

Experience: 8+ years in DevOps, Site Reliability Engineering or Platform engineering; 5+ years in a leadership role as a senior developer, team lead, or engineering manager

Job Description:

Site Reliability Engineering Manager

United States / Engineering / Full-time / Remote
Articulate is looking for an SRE Manager to join our amazing Platform team!
As an SRE Manager, you'll be a tactical leader in the organization who leads and guides the Site Reliability Engineering team in delighting our customers with a world-class platform that is reliable, scalable, and performant.

What You'll Do

Be an example of our Human Centered Organization (HCO) philosophy by fostering a culture of collaboration, openness and personal responsibility both within our team and across Engineering.
Drive continuous improvement through automation to enhance the reliability, scalability and maintainability of our systems and code.
Work with the team to guide the definition and implementation of industry-leading standards for using infrastructure as code, setting high reliability requirements, monitoring and reporting on the performance of the platform, and more
Use your extensive experience in both site reliability and software deployment systems to grow and mentor the team in site reliability best practices
Manage 1:1s with the team to set goals, conduct performance reviews, and provide professional development opportunities specific to the needs of each team member
Provide coaching, feedback, and plans to your reports both to improve team capabilities and to facilitate career growth
Work cross-functionally to understand the needs of our customers (the engineering organization) so that the infrastructure and supporting platform systems give our applications what they need to perform optimally and reliably at scale
Ensure your team’s implementations meet reliability and maintainability standards as defined by broader engineering leadership
Define and implement platform quality standards for services to ensure they meet our requirements for monitoring, security, scalability and maintainability.
Collaborate with security and development to continually refine our testing processes to ensure a high standard of quality
In collaboration with development experience engineering, manage and maintain internal test environments and implement policies for access, infrastructure, data and costs
Provide operating SLAs and KPIs to inform our continuous improvement process
Collaborate with peer teams to improve shared workflows that impact your team’s day-to-day operations
Support product engineering and development experience engineering teams to align on long-term outcomes, set team goals and ensure accountability to those goals and outcomes
Spend at least 50% of your time on team administration and growth opportunities and no more than 50% of your time delivering technical solutions and tasks. The actual split will depend on the needs of the team.
Apply existing best practices to your team’s work and remain informed of updates relevant to your team’s operational efficiency
Guide the team to become self-organizing and self-healing so as to empower them to move quickly and iterate

What You Should Have

8+ years of experience in DevOps, Site Reliability Engineering or Platform engineering on a modern cloud-based infrastructure, preferably supporting a SaaS product
5+ years in a leadership role as a senior developer, team lead, or engineering manager
Demonstrated ability to set high standards and hold others accountable to those standards
A strong focus on delivering results and the ability to balance business and technical requirements
Hands-on experience in building and maintaining infrastructure in AWS with infrastructure as code tools
Hands-on experience in using Terraform or other similar infrastructure management technologies
Demonstrable experience with container orchestration technologies including Kubernetes and Docker
Proficient in programming and scripting languages, such as Python or Go
Demonstrated experience with cloud-native monitoring and logging tools such as Datadog or Grafana
Proven ability to review technical designs and identify opportunities for automation and delivery improvements
Proven record of strong leadership, communication, team development, problem solving and organization skills
Self-starter with the ability to work independently and manage multiple priorities effectively while leading your team to be self-healing and self-organizing.
