Principal Site Reliability Engineer
Location:
Redwood City, CA, US
Company:
Nevro is a global medical device company focused on life-changing solutions for chronic pain treatment.
Summary:
The Principal Site Reliability Engineer will build and maintain AWS infrastructure and developer tooling while defining best practices. Candidates require a Bachelor's degree and 10 years of relevant experience.
Requirements:
Credentials: Bachelor’s degree in Computer Science, Electronics Engineering, or Computer Applications
Experience: 10 years of progressive experience as a development engineer, site reliability engineer, or any occupation in development engineering; 10 years of experience with CI/CD tools (including Jenkins and GitHub Actions); 10 years of experience with scripting technologies, including PowerShell, Python, and Bash; 10 years of experience with object-oriented languages, including C# and Java; 10 years of experience with networking, load balancing, DNS, and security configurations; 4 years of experience with Git; 4 years of experience with Infrastructure as Code (including Terraform and AWS CloudFormation); 4 years of experience with production systems monitoring using tools such as Datadog and Grafana.
Job Description:
Build and maintain AWS infrastructure and developer tooling. Write Terraform infrastructure modules. Work with GitHub Actions. Support, monitor, and manage cloud infrastructure via Infrastructure as Code (Terraform). Use object-oriented languages, including C# and Java. Contribute to Nevro’s cloud roadmap and a strategy that scales horizontally and balances quality, efficiency, and usability through automation and developer efficiency. Define the Site Reliability Engineer (SRE) Handbook and best practices. Define runbooks and developer processes. Participate in the on-call rotation to resolve site incidents and document findings as repeatable procedures. Work cross-functionally with departments such as Marketing, Regulatory, and Quality. Guide implementation of SRE practices across the total product development cycle. Assist in CI/CD tooling development and guide the software development lifecycle. Prepare reports and presentations and document progress with senior management. Share ownership with the Web Services team to create shared responsibility, where SRE owns service availability (SLOs/KPIs) and establishes monitoring and alerting practices. Establish SLOs/KPIs. Define Service Level Objectives to assess release readiness of all services. Lead and define significant or whole portions of planning, developing, coordinating, and directing development across complex products, including directing internal and external resources. Contribute software and scripts that make operational support easier for other SREs and developers, using scripting technologies including PowerShell, Python, and Bash. Perform networking, load balancing, DNS, and security configurations. Identify, document, and help resolve performance and operational efficiency challenges. Monitor production systems using tools such as Datadog and Grafana. Provide and support computing infrastructure through Infrastructure as Code, including Terraform and AWS CloudFormation. Document developer processes.
Configure and manage monitoring using tools like Datadog and Grafana. Create dashboards, monitors, and alerts. Position allows for telecommuting from anywhere in the US.