Yesterday
Senior Site Reliability Engineer
Cloudbeds
Argentina
GenOps Responsibility Profile
Rationale: This is a clear GenOps role with ownership of production AWS/Kubernetes infrastructure serving millions of hospitality transactions globally. The role combines control plane responsibility (EKS clusters, ArgoCD, Terraform IaC) with incident response ownership and observability systems, meeting all GenOps criteria.
Job Description
What Makes Us Unique
At Cloudbeds, we're not just building software, we're transforming hospitality. Our intelligently designed platform powers properties across 150 countries, processing billions in bookings annually. From independent properties to hotel groups, we help hoteliers transform operations and uplevel their commercial strategy through a unified platform that integrates with hundreds of partners. And we do it with a completely remote team. Imagine working alongside global innovators to build AI-powered solutions that solve hoteliers' biggest challenges. Since our founding in 2012, we've become the World's Best Hotel PMS Solutions Provider and landed on Deloitte's Technology Fast 500 again in 2024, but we're just getting started.
As a Sr. Site Reliability Engineer, you'll be the guardian of our platform's reliability and performance, ensuring millions of hospitality transactions flow seamlessly across the globe. You'll architect and implement scalable AWS cloud solutions that keep the most ambitious hotels running 24/7, while fostering a culture of automation, resilience, and continuous improvement across our engineering teams.
Our SRE Team:
We're a bottom-up, collaborative team that thrives on healthy debate and shared ownership of our infrastructure. You'll have endless opportunities to influence architecture decisions while working with cutting-edge cloud technologies at scale. We believe the best solutions come from engineers who are empowered to innovate, experiment, and challenge the status quo.
What You Bring to the Team:
- Design and implement reliable and scalable AWS architecture to meet the needs of the organization.
- Maintain and support highly loaded Kubernetes (EKS) clusters and infrastructure-related components.
- Support the CI/CD process with ArgoCD and GitOps.
- Automate the platform deployments with Terraform infrastructure-as-code.
- Develop and continuously improve product Observability and Monitoring systems based on Grafana, Prometheus, Datadog, and CloudWatch.
- Respond to incidents and participate in Incident Management and Root Cause Analysis, ensuring minimal impact on services.
- Optimize system performance and troubleshoot issues as they arise.