Work with a strong, global engineering team to maintain and enhance world-class cloud-based applications with cutting-edge technologies on a fresh project with a great mission.
We are looking for a great site reliability engineer to maintain and continually improve our cloud-based applications.
You are:
- Deeply knowledgeable about cloud platforms like AWS and GCP and how to leverage them for compute, storage, and managed services including, but not limited to, databases, managed Kubernetes, and content delivery networks.
- Experienced with modern DevOps engineering practices and comfortable with diverse technical problem sets across the entire technology stack, including the virtualized hardware.
- Possess a deep understanding of the Linux operating system and are at home on the command line (terminal) at your workstation.
- Versed in Infrastructure as Code practices using technologies such as Terraform and CloudFormation.
- Familiar with tools like Ansible, Puppet, and Chef, and with using them for configuration automation.
- Proficient in scripting and developing automation in Python and Bash, or similar programming languages.
- Used to keeping everything you do in source control (Git) and automating (scripting) any task you have to do more than once.
- Understand modern approaches to software security, and know what needs to be done to secure software systems and cloud-based infrastructure.
- Equipped with a proactive security mindset and a solid understanding of information security and privacy principles.
- Experienced in protecting modern, cloud-hosted operating environments using defense-in-depth strategies.
- Comfortable operating in environments subject to regulatory, compliance, and risk-based security requirements.
- Able to effectively troubleshoot issues across the entire technology stack (UI -> API -> Application -> Database), including the operating system and the underlying (virtual) hardware.
- Enthusiastic about cutting-edge technologies and the fresh challenges that come with them.
And ideally you are:
- Experienced using Kubernetes and related technologies (such as Docker) for application orchestration.
- Excited about monitoring technologies (such as Prometheus and the TICK stack), the metrics they provide, and using that data to understand the performance characteristics and error modes of a cloud-based software stack.
- Proficient as a developer, experienced writing code and solving problems in at least one mainstream programming language (such as Python, Java, Go, or C#).
- Have experience developing and maintaining globally deployed applications, in multiple languages, with many, many users.
- Very familiar with agile frameworks, such as Scrum and Kanban, and how to operate within these frameworks to continually deliver value.
- Experienced with monorepo concepts and tools like Bazel or Pants.
- Experienced developing and maintaining feature-rich applications using modern software frameworks such as Spring Boot, Flask, or .NET.
- Take pride in the quality of your code, the work it takes to make great software, and the value delivered to the end-user.
- Understand computer networking, and how it applies in cloud environments.
- The type of person that gets excited when you merge a pull request you authored.
- Hold a Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, or another scientific or technical discipline.
So that you can:
- Operate, monitor, and maintain high availability of software services for multiple products running in a multi-region cloud environment.
- Work with the team to establish service level objectives and monitor to ensure the objectives are met.
- Continually improve cloud operations automation and tooling to monitor and maintain enterprise cloud-based applications.
- Troubleshoot infrastructure and application issues, and work with the engineering team to resolve them.
- Execute runbooks for known cloud-operations tasks, and create new runbooks for new situations or issues you encounter. Automate everything.
- Collaborate with a great team to maintain, monitor, and improve amazing cloud-based applications that solve real-world problems for end users.
- Facilitate blame-free root-cause analysis meetings in the event of a production-systems incident so the team can learn from mistakes and improve our systems and runbooks.
- Participate in stress, security, and performance testing.
- Participate in continually improving our security posture in line with best practices.
To apply for this job, email your details to firstname.lastname@example.org