Why join Casper Labs?
With vast growth opportunities, projects with global impact, high-level industry professionals, and game-changing technologies, Casper Labs is an ideal destination to take the next step in your career. Join our team of builders and innovators to drive the next wave of blockchain technology!
Lead the era of distributed computing by giving every organization access to state-of-the-art technologies and services.
different countries where our team members are located
93 percent annual team member retention
The Benefits We Offer
We offer a competitive salary combined with employment in a stable, long-term-oriented, and well-founded company.
Casper Labs offers long-term incentive plans in the form of both the Casper token and share options.
Casper Labs offers a fully remote working environment that allows team members to work when, where, and how they want.
Customized IT Equipment
We provide a generous budget for IT equipment of your choice.
Flexible Time Off
We understand that things happen, from sickness to just needing an extra day off to recuperate or deal with family issues.
We provide an education budget and dedicated time off to pursue the education you prefer - whatever allows you to grow is fine with us.
Open to all
If you are interested in opportunities at Casper Labs but do not see a relevant open role on our careers page, please submit a CV and cover letter to our talent pool. We will review your application and reach out if we have a suitable opportunity. Otherwise, we will keep your details on file and contact you as soon as a relevant position opens.
We are a team of builders, entrepreneurs, academics, and leaders who believe strongly in the potential of a blockchain-enabled world. We’ve come together to steward the development of the Casper Network, a blockchain protocol built from the ground up to remain true to core Web3 principles and adapt to the needs of our evolving world. Come join us and help build the future.
The Site Reliability Manager role is especially critical given the complexity of large-scale software systems and the high demand for reliability and performance. Key responsibilities include ensuring system reliability; leading, developing, and mentoring SRE team members; collaborating with development teams; defining and monitoring SLOs and SLIs; and taking a hands-on approach to incident management, performance monitoring and optimization, automation and tooling, and security and compliance.
- Team Leadership: Lead and manage a team of Site Reliability Engineers (SREs) in maintaining the operational aspects of our software infrastructure. Foster a culture of collaboration, reliability, and continuous improvement within the SRE teams.
- Collaboration with Development Teams: Work closely with software development teams to integrate reliability practices into the software development life cycle. Ensure seamless collaboration between SRE and development teams (DevOps).
- Incident Management: Oversee incident management activities, including coordinating responses to incidents, leading post-incident reviews, and implementing preventive measures.
- Automation and Tooling: Implement automation and tooling to streamline operational processes, including deployment, monitoring, and recovery.
- Performance Monitoring and Optimization: Monitor and optimize the performance of enterprise software systems, collaborating with development teams to address performance bottlenecks.
- Capacity Planning: Conduct capacity planning to ensure that our infrastructure can handle current and future workloads.
- Security and Compliance: Collaborate with security teams to ensure the security and compliance of our software systems, implementing security best practices and conducting regular audits.
- Communication and Reporting: Effectively communicate with executive leadership and stakeholders, providing updates on system reliability, improvement initiatives, and potential risks.
- Define and Monitor SLOs and SLIs: Set and monitor Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to measure and maintain the reliability and performance of our software systems.
- Bachelor's degree in Computer Science, Information Technology, or a related field.
- Proven experience in a leadership role overseeing Site Reliability Engineering teams in an enterprise software development environment.
- Strong background in software development and operations, with a focus on reliability, scalability, and performance.
- Experience with incident management, automation, and performance monitoring tools.
- Excellent communication and interpersonal skills, with the ability to collaborate effectively with cross-functional teams.
- Knowledge of security best practices and experience ensuring compliance with industry standards.
- Experience in one or more of the following: C, C++, Rust, Java, Python, Go, Perl, Ruby or shell scripting.
- Experience with Unix/Linux operating systems internals and administration (e.g., filesystems, inodes, system calls) or networking (e.g., TCP/IP, routing, network topologies and hardware, SDN).
- Experience with monitoring and metrics aggregation systems such as Prometheus, Graphite, etc., and visualization systems such as Grafana.
- Experience with CI/CD systems such as Travis, Harness, Drone, CircleCI, etc.
- Experience with version control systems such as Git, Perforce, etc.
- Experience with log aggregator systems such as ELK, Splunk, etc.
- Experience with configuration management systems such as Puppet, Ansible, Chef, Salt, etc.
- Expertise in designing, analyzing and troubleshooting large-scale distributed systems.
- Experience with a cloud-based infrastructure platform (i.e., AWS).
- Ability to debug and optimize code and automate routine tasks.
- Experience with managing a fully remote workforce with time zone variation.
- Fully remote, work-from-home environment
- Flexible working hours
- Paid time off
- Periodic in-person offsites globally (travel permitting)
- Long-term incentive programs
- Continued education support
- Advancement opportunity
Our five-stage recruitment process
We start by reviewing application materials in search of the best talent.
In this call, we tell you all about the company and get to know you.
Once we determine you’re a fit, you’ll have a 1-on-1 interview with your prospective department and may also be asked to present your skills in a sample task.
After interviews are complete, we’ll send you a competitive offer.
Next up is getting you ready for the employee onboarding process and answering any further questions you may have.