Site Reliability Engineer @ Red Hat Software - Sacramento, CA

About the job:
The Red Hat Engineering team is looking for a Site Reliability Engineer to develop, scale, and operate our OpenShift managed cloud services; Red Hat OpenShift Container Platform is our enterprise Kubernetes distribution. In this role, you will contribute to running OpenShift at scale by enabling customer self-service, making our monitoring system more sustainable, and eliminating repetitive manual work through automation. As part of the Site Reliability Engineering (SRE) team, you will have the opportunity to tackle the complex challenges of scale that are unique to Red Hat managed cloud services, while using your skills in coding, operations, and large-scale distributed system design.

Red Hat relies on teamwork and openness for its success. We are a global team and strive to cultivate a transparent environment that makes room for different voices. We learn from our failures in a blameless environment to support the continuous improvement of the team. At Red Hat, your individual contributions have more visibility than they would at most large companies, and visibility means career opportunities and growth. Successful applicants must reside in a state where Red Hat is registered to do business.
What you will do:
  • Work with live systems and write automation code
  • Contribute code to increase the scalability and reliability of the service
  • Contribute software tests and participate in peer review to increase the quality of our codebase
  • Help peers and develop their capabilities through knowledge sharing, mentoring, and collaboration
  • Participate in a regular on-call schedule, including occasional paid weekends and holidays
  • Practice sustainable incident response and blameless postmortems
  • Resolve customer issues escalated from the Red Hat Global Support team
  • Work within a small agile team to develop and improve SRE software, support your peers, plan, and self-improve
What you will bring:
  • Bachelor's degree in computer science or a related technical field involving software or systems engineering; direct experience that demonstrates your ability and interest in SRE will also be considered
  • 3+ years of software engineering experience using object-oriented languages, preferably Golang
  • 3+ years of experience managing Linux-based systems in a public cloud like Amazon Web Services (AWS), Google Cloud Platform (GCP), or Microsoft Azure
  • 3+ years of experience with enterprise systems monitoring; knowledge of Prometheus is a plus
  • 1+ year(s) of experience delivering hosted cloud services
  • 1+ year(s) of experience with Kubernetes
  • 1+ year(s) of experience with containers on Linux
  • Ability to collaboratively troubleshoot and solve problems in a team environment
  • Experience troubleshooting an Anything-as-a-Service (XaaS) offering and some experience working with complex distributed systems
  • Demonstrated ability to debug, optimize code, and automate routine tasks
  • Basic understanding of Unix or Linux operating systems
  • Excellent communication skills in a global team environment
  • Demonstrated ability to quickly and accurately troubleshoot systems issues
  • Solid understanding of standard TCP/IP networking and common protocols like Domain Name System (DNS) and HTTP
  • Direct experience with Kubernetes or Red Hat OpenShift Container Platform is a plus


About Red Hat:
Red Hat is the world’s leading provider of enterprise open source software solutions, using a community-powered approach to deliver reliable and high-performing Linux, hybrid cloud, container, and Kubernetes technologies. Red Hat helps customers integrate new and existing IT applications, develop cloud-native applications, standardize on our industry-leading operating system, and automate, secure, and manage complex environments. Award-winning support, training, and consulting services make Red Hat a trusted adviser to the Fortune 500. As a strategic partner to cloud providers, system integrators, application vendors, customers, and open source communities, Red Hat can help organizations prepare for the digital future.

Benefits
  • Comprehensive medical, dental, and vision coverage
  • Flexible Spending Account - healthcare and dependent care
  • Health Savings Account - high deductible medical plan
  • Retirement 401(k) with employer match
  • Paid time off and holidays
  • Paid parental leave plans for all new parents
  • Leave benefits including disability, paid family medical leave, and paid military leave
  • Additional benefits including employee stock purchase plan, family planning reimbursement, tuition reimbursement, transportation expense account, employee assistance program, and more!
Note: These benefits are only applicable to full-time, permanent associates at Red Hat located in the United States.

Similar Jobs

Senior Site Reliability Engineer (SRE)

Databricks

San Francisco, CA

The Service Reliability Engineer (SRE) will monitor critical project management systems and services to minimize downtime and ensure their availability.

Site Reliability Engineer

JPMorgan Chase Bank, N.A.

Palo Alto, CA

As a Site Reliability Engineer (SRE), you'll help build a meaningful engineering discipline, combining software and systems to develop creative engineering…

Sr. Manager, Site Reliability - Opportunity for Working Remotely Palo Alto, CA

VMware

Palo Alto, CA

Lead the team in automating the setup and configuration of the UEM data centers that host our software through infrastructure as code practices.

DevOps Engineer

NOKIA

Sunnyvale, CA

Experience in software development, operations, and/or site reliability. The DevOps Engineer manages new software version releases based on scrum or agile…

Senior Site Reliability Engineer

Angi

San Francisco, CA

Use service level information to determine reliability on our Telemetry Platform. Experience identifying changes that improve processes from a reliability and…

Site Reliability Engineer (SRE) OR Senior Site Reliability Engineer

Foursquare

San Francisco, CA

Evaluate and implement technologies that improve efficiency, performance and reliability. Drive improvements in capacity, reliability, availability and…

Site Reliability Engineer (SRE)

Intuit

Mountain View, CA

As a member of our engineering team, you will tackle technical problems and implement creative solutions to delight our users in ways they have not imagined.

Senior Site Reliability Engineer

Privacera, Inc.

San Francisco, CA

You must have demonstrated and be capable of an extreme ownership mentality. A successful Sr. SRE at this company must have strong facility coding in Java …

AWS SAP DevOps / SRE Engineer, AWS Professional Services

Amazon Web Services, Inc.

San Francisco, CA

Bachelor’s degree in Computer Science, engineering, software engineering, or related field. 8+ years of experience in development and operations, or related IT,…

Staff Site Reliability Engineer, Security

Okta

San Francisco, CA

Triaging and troubleshooting complex production issues to ensure reliability and performance. Designing, building, running, and monitoring Okta's production…

Senior SRE (Site Reliability Engineer)

Ennuviz

San Francisco, CA

Ennuviz offers Digital Transformation solutions that help businesses streamline, optimize operating costs, and deliver better Customer Experience and Employee…

Sr. Site Reliability Engineer

Algonomy

San Francisco, CA

This position will help RichRelevance, an Algonomy company, to scale to utilize the petabytes of data retailers bring into our systems, all while increasing…

Sr Site Reliability Engineer-Public Cloud Platform-1479780

JPMorgan Chase Bank, N.A.

Palo Alto, CA

The public cloud team is responsible for engineering and operating the cloud infrastructure and platforms of JPMC ensuring reliability, resiliency, and security…

Principal Site Reliability Engineer (Cloud Delivered Security Services)

Palo Alto Networks

Santa Clara, CA

This includes automation, architecture, performance, metrics, troubleshooting, security, and reliability. Contribute to the success of SRE and DevOps.

Staff Site Reliability Engineer, Network

Okta

San Francisco, CA

Lead a team that designs and builds Okta's production infrastructure with a focus on networking and security at scale.

Staff Site Reliability Engineer, General

Okta

San Francisco, CA

Design, build and monitor Okta's global production infrastructure. Respond to production incidents and determine preventive solutions.

Senior Site Reliability Engineer, Developer & Service Infrastructure

TuSimple

San Francisco, CA

Site Reliability Engineering (SRE) is a discipline that incorporates aspects of software engineering and applies them to infrastructure and operations problems.

Director, Chief of Staff, Site Reliability Engineering, Core

Google

Sunnyvale, CA

Bachelor's degree in a technical field or equivalent practical experience. 15 years of experience in operations, strategy, management, or product operations…

Site Reliability/DevOps Engineer - Opportunity for Working Remotely Palo Alto, CA

VMware

Palo Alto, CA

You will be responsible for improving the reliability and resiliency of microservices by enforcing DevOps/SRE best practices across the engineering org.

Site Reliability/DevOps Engineer - Opportunity for Working Remotely San Francisco, CA

VMware

San Francisco, CA

You will be responsible for improving the reliability and resiliency of microservices by enforcing DevOps/SRE best practices across the engineering org.

Sr DevOps Engineer - Digital Marketing

ServiceNow

Santa Clara, CA

Employ infrastructure as code paradigm to increase automation, scalability, and reliability. Collaborate with marketing and marketing operations team to build…

Site Reliability Engineer, Americas

Canonical - Jobs

Sacramento, CA

Our site reliability engineers bring Python software-engineering skills and rigour to the operations domain. A wide range of engineering disciplines and career…

Site Reliability Engineer, Cloud Native Platform

TikTok

Mountain View, CA

We use Kubernetes to manage on-prem/cloud nodes and build an ecosystem around it, including tools for monitoring, alerting, logging, CI/CD, etc.

Site Reliability Engineer, Americas

Canonical - Jobs

San Francisco, CA

Our site reliability engineers bring Python software-engineering skills and rigour to the operations domain. A wide range of engineering disciplines and career…