Site Reliability Engineer III

Guidewire • Remote, Ireland
Job description

ESSENTIAL DUTIES AND RESPONSIBILITIES

  • Take a purist SRE approach to shared multi-tenant infrastructure for resilient, microservice-based, containerized SaaS systems, in addition to customer-centric application environments
  • Oversee and automate the team’s growing presence in AWS
  • Contribute features, bug fixes, and reliability improvements to core infrastructure systems
  • Platform reliability engineering of a complex single sign-on SAML / OAuth-based central authentication platform
  • Creatively build and develop tooling to aid in driving 24x7x365 follow-the-sun operations of critical production systems
  • Automate deployment tasks for core product and infrastructure tools and maintain automation infrastructure
  • Create system documentation and training materials to empower and educate our fellow team members
  • Build and maintain observability tooling, metrics, and dashboarding for a global platform product infrastructure
  • Improve our incident management lifecycle to identify, mitigate, and learn from reliability risks and issues
  • Enhance platform observability by helping create a self-healing approach to platform reliability
  • Collaborate with engineering teams, providing product feedback and, where necessary, contributing code to the product

REQUIRED SKILLS AND EXPERIENCE

  • Education and Work Experience:
  • Bachelor’s Degree in Computer Science or related field.
  • Software engineering and task automation skills with Bash, Python, and / or Go are a must.
  • Familiarity with the Agile software development lifecycle.
  • Deep background with Linux systems and engineering.
  • Highly experienced with engineering and automating on Amazon Web Services (AWS).
  • Experience supporting web applications running on Java / Apache / Tomcat in a live production environment.
  • Prior experience with IaC tools like Terraform / Terragrunt / Terraspace.
  • Prior experience with DevOps / GitOps tools (Git, Bitbucket, Flux CD, TeamCity) for gate promotions.
  • Background supporting production at scale in a heavily microservice-based environment.
  • Hands-on engineering and ops expertise in containerization (Docker, Helm, Kubernetes / EKS, CNI and Ingress networking).
  • Strong understanding of single sign-on, SAML, and OAuth (hands-on experience with Okta is a bonus).
  • Seasoned expertise around certificate technology and basic concepts of encryption.
  • Experience working with Relational Databases such as Aurora Postgres and / or Oracle RDS.
  • Advanced exposure to application development, web UI (design and development), JSON, application architecture.
  • Strong experience using observability tools (logging / APM) like Datadog, CloudWatch, and PagerDuty.
  • Familiarity with event store / stream-processing technologies like Kafka or AWS SQS.
  • Understanding of Open Application Model systems such as KubeVela or Crossplane.
  • Personal Qualities and Soft Skills:
  • You greatly prefer writing code to clicking through a GUI.
  • You enjoy teaching, being a mentor to others, and working across boundaries.
  • Outstanding troubleshooting skills; ability to think critically and display an aptitude for problem solving.
  • Strong analytical mind with a penchant for process development and enhancement.
  • A highly positive, can-do attitude and a desire to be a team player.
  • Great communication skills and ability to explain complex technical concepts to a varied audience.
  • Strong follow-through and work ethic; you consistently keep and meet commitments.
  • Other Requirements:
  • Ability to read, write, and speak English.
  • We provide 24x7 support to our customers, so we expect you to take turns with your teammates being on-call for weekend production emergencies or to provide rotating weekend operational support.
  • Travel – Expect occasional travel (less than 5%) to other Guidewire offices for training and team meetings.