We’re building a high‑performance, microservices-based REST API platform and modernizing our stack. You’ll lead the design, automation, and operation of our cloud platform on Google Cloud (primary) with exposure to AWS. You’ll use Kubernetes, Terraform, and robust observability to deliver a secure, scalable, and cost‑efficient platform.
Responsibilities:
- Design, build, and operate secure, highly available Kubernetes clusters (GKE primarily; EKS exposure)
- Define and maintain IaC with Terraform and Helm; establish module standards, remote state, CI quality gates, and policy as code
- Implement and optimize CI/CD for containerized workloads with progressive delivery and reliable rollback
- Own observability: define SLIs/SLOs, build dashboards, create metric/log-based alerts, and add tracing; minimize alert noise and document runbooks
- Lead networking architecture: VPC design, private clusters, NAT, ingress/gateway, NetworkPolicy; introduce service mesh where it adds value
- Champion security-by-default: IAM guardrails, Workload Identity/IRSA, secrets management, image scanning/provenance
- Drive reliability: capacity planning, autoscaling strategies, DR/backup/restore, chaos and failover testing
- Mentor engineers and promote platform best practices across teams
Requirements:
- 5+ years in DevOps/Platform/SRE roles, including 3+ years running Kubernetes in production
- Google Cloud (primary): GKE, Cloud Monitoring and Cloud Logging
- AWS (secondary): EKS fundamentals, RDS, IAM
- Terraform expert: modules, remote state (GCS/S3), environment strategy, CI validation (fmt/validate/tflint/checkov), policy as code
- Helm for packaging/deployments; container image lifecycle (Artifact Registry/ECR), image scanning
- Observability: Prometheus (incl. Managed Service for Prometheus), Grafana, Cloud Monitoring SLOs/alerts, log-based metrics; basic tracing with OpenTelemetry
- Networking: VPC/VPC peering, subnets/CIDR, routing, NAT, load balancers (L4/L7), DNS (Cloud DNS/Route 53), TLS/mTLS, Kubernetes NetworkPolicy, Ingress/Gateway API
- CI/CD: GitHub Actions and/or CircleCI (or similar); blue/green and canary strategies; safe rollback
- Scripting/coding: proficiency in Bash and either Go or Python
- Security: IAM design (GCP/AWS), Workload Identity/IRSA, RBAC, Secret Manager/Secrets Manager, KMS; least-privilege by default
- Incident operations: on-call experience, SLO burn-rate alerting, runbooks, and postmortems
- Strong problem-solving and troubleshooting skills, especially within Kubernetes environments.
- Excellent communication and collaboration skills, with the ability to mentor junior team members and advise other departments on Kubernetes and DevOps best practices.
- Effective time management skills with the ability to handle multiple projects simultaneously.
- Leadership qualities and experience in leading projects or teams.
- High adaptability and a continuous-learning mindset, especially regarding new Kubernetes features and DevOps tools.
Certifications (Preferred but not required):
- Certified Kubernetes Administrator (CKA)
- Certified Kubernetes Application Developer (CKAD)
- AWS Certified DevOps Engineer
- Google Professional DevOps Engineer
What We Offer:
- Collaborate with top-tier professionals in the field.
- Competitive salary.
- Work in a supportive and innovative environment.
- 100% remote work.
- Proficiency in English at a B2 level or higher will be considered a valuable asset.
Please share your Telegram and we will contact you promptly!