Location: Remote (near Portland, OR or Cypress, CA)
Are you passionate about software development? Would you like to make a real-world impact on connected families around the world? Our client is looking for a collaborative, skilled Site Reliability Software Engineer with expertise in Kubernetes and AWS. If you're interested in this role, please submit your resume to email@example.com. We value diversity in the workplace and encourage women, minorities, and veterans to apply. Thank you!
Job Type: FTE
We are looking for someone to help design, implement, and maintain the software and infrastructure that support operations and engineering teams. You will work closely with team members located at both offices. Due to safety concerns regarding COVID-19, the team will be working fully remote for the foreseeable future.
A Site Reliability Software Engineer is responsible for working with fellow engineers to develop and maintain the infrastructure that our client's products and internal tools run on. Depending on experience level, a Site Reliability Software Engineer may be responsible for anything from managing individual servers to improving the overall design of the client's cloud service architecture.
Site Reliability Engineers are expected to work effectively within their team and across the organization in a sustainable manner. Within their team, they are expected to help train new hires and spread knowledge by regularly requesting and providing code reviews.
Aside from technical contributions, SREs are expected to contribute to a healthy and professional work culture. Candidates must be comfortable working in a fast-paced, small-team startup environment.
● Must demonstrate initiative and the ability to adapt
● Must be driven to constantly learn new things
● Must be comfortable working in a team environment with minimal supervision
● Experience managing infrastructure hosted on AWS, Linode, and DigitalOcean
● Experience managing a Kubernetes cluster in production
● Experience participating in and running post-incident reviews
● Experience implementing and evolving a CI/CD pipeline
● Experience implementing cloud service monitoring and alerting
● Familiarity with fundamental networking and infrastructure concepts such as load balancing, high availability, and scalability
● Experience improving and optimizing the incident response process
● Experience with REST architecture for cloud services
● Experience applying computer science fundamentals in data structures, problem solving, and complexity analysis
● Proficient in source control systems, especially Git
● Great communication and collaboration skills, in person, online, and by video conference
● Experience developing cloud services in Golang is a plus
● Experience with test automation frameworks and writing automated tests
● Experience actively documenting “tribal” knowledge is a plus
● Completes development assignments on time and with high quality
● Anticipates technical challenges and proposes and evaluates solutions
● Collaborates with other teams to implement solutions effectively
● Reviews pull requests promptly
● Innovates on company products
● Researches and proposes new technologies
● Makes sure code is testable and well-tested
● Keeps tech debt in check
● Identifies problems with development processes
● Proposes and implements new development processes and process improvements
● Provides input with regard to technology strategy
● Participates in the on-call rotation for production issues, along with the rest of engineering