SRE Architect at qode.world

AI summary: SRE Architect designs and scales reliable cloud-native systems, defines SRE strategy, manages observability and incident response, and mentors engineering teams on reliability best practices.

Description

Job Description: SRE Architect

📍 Location: Austin, TX (Hybrid)

🕒 Employment Type: Full-Time

🎯 Experience Level: Architect

Role Overview

We are seeking an experienced Site Reliability Engineering (SRE) Architect to design, build, and scale highly reliable, resilient, and observable systems. This role is ideal for a hands-on architect who can define SRE strategy, influence engineering practices, and partner closely with development, platform, and security teams.

The position requires onsite or hybrid presence in Austin, TX, with collaboration across distributed teams.

Key Responsibilities

Architecture & Reliability

  • Define and own the SRE architecture strategy, including reliability, availability, scalability, and performance standards.
  • Design resilient, fault-tolerant systems for cloud-native and hybrid environments.
  • Establish and govern SLIs, SLOs, and error budgets across platforms and services.
  • Lead capacity planning, resilience testing, and chaos engineering initiatives.

Platform & Cloud Engineering

  • Architect and operate platforms on AWS/GCP/Azure (multi-cloud or hybrid setups).
  • Design and manage Kubernetes-based platforms (EKS/GKE/AKS).
  • Drive Infrastructure as Code (IaC) practices using Terraform, Ansible, or similar tools.
  • Standardize environments, deployment patterns, and runtime configurations.

Operational Excellence

  • Build and maintain observability frameworks using tools such as Prometheus, Grafana, Datadog, ELK, Splunk, or equivalent.
  • Lead incident management, root cause analysis (RCA), and post-incident reviews.
  • Reduce MTTR through automation, tooling, and process improvements.
  • Participate in and improve on-call models, escalation policies, and runbooks.

DevOps & Automation

  • Partner with engineering teams to embed CI/CD best practices.
  • Drive automation across provisioning, deployments, testing, and operations.
  • Improve system reliability by eliminating manual operational toil.

Security & Governance

  • Architect secure platforms aligned with enterprise security standards.
  • Implement best practices for secrets management, access control, compliance, and audits.
  • Collaborate with Security and Compliance teams on governance models.

Leadership & Collaboration

  • Act as a technical mentor and thought leader within SRE and platform teams.
  • Influence engineering culture toward reliability-focused design.
  • Partner with product, application, and infrastructure teams to deliver business outcomes.

Required Qualifications

  • 10+ years of experience in SRE, DevOps, Platform Engineering, or Systems Architecture.
  • Strong experience designing and operating large-scale distributed systems.
  • Deep hands-on expertise with cloud platforms (AWS/GCP/Azure).
  • Advanced experience with Kubernetes and containerized workloads.
  • Strong knowledge of Linux internals, networking, storage, and system performance.
  • Proven experience implementing IaC and configuration management.
  • Proficiency in one or more programming/scripting languages (Python, Go, Bash, etc.).
  • Strong understanding of observability, monitoring, and alerting strategies.
  • Excellent communication and stakeholder management skills.

Preferred Qualifications

  • Experience in multi-cloud or regulated environments.
  • Background supporting high-throughput, high-availability, or data-intensive systems.
  • Experience with Kafka, Spark, or large-scale data platforms.
  • Exposure to fintech, healthcare, enterprise SaaS, or hyperscale platforms.
  • Prior experience as Principal Engineer, Architect, or Lead SRE.

Work Model

  • Hybrid / Onsite role based in Austin, TX
  • Requires regular collaboration with local and global teams

Why Join Us

  • Architect systems at enterprise scale
  • Influence platform and reliability strategy across teams
  • Work with modern cloud-native technologies
  • High-impact role with strong visibility and ownership