GenOps.jobs

Runtime-Critical Platform & Governance Roles

Yesterday

Site Reliability Engineer

Point72

Bengaluru

GenOps Responsibility Profile

Runtime Ownership: ✓ Yes
Control Plane: ✓ Yes
Governance / Policy: ✓ Yes
Observability / Telemetry: ✓ Yes
Incident / Reliability: ✓ Yes
Regulated Context: ✓ Yes
AI / Model Runtime: ✗ No
Primary Domain: cloud platform
Production Environment: prod
Classification Confidence: 95%

Rationale: This is a clear GenOps role with production ownership at a financial services firm (Point72). The SRE owns reliability, incident response, observability infrastructure, CI/CD pipelines, and enforces SLOs/error budgets while partnering with security and platform teams on governance.

Job Description

JOB TITLE

Site Reliability Engineer

A Career with Point72's Technology Team

As Point72 reimagines the future of investing, our Technology group is constantly improving our company's IT infrastructure, positioning us at the forefront of a rapidly evolving technology landscape. We're a team of experts experimenting, discovering new ways to harness the power of open source solutions, and embracing enterprise agile methodology. We encourage professional development to ensure you bring innovative ideas to our products while satisfying your own intellectual curiosity.

 

What you'll do

- Design and implement automated operational workflows to improve system reliability and reduce manual intervention 

- Build and maintain observability solutions using tools such as Datadog to deliver metrics, monitoring, alerting, and dashboards

- Partner with development teams to improve application reliability, deployment safety, and performance through SRE best practices 

- Develop and maintain CI/CD pipelines and deployment automation using Bitbucket/Jenkins, GitHub Actions, and related tooling 

- Engineer scalable solutions for production environments across Linux and Windows systems 

- Automate infrastructure and operational tasks using Python, PowerShell, Bash, or similar scripting languages 

- Support and enhance reliability of database platforms such as SQL Server and MongoDB from an SRE perspective

- Participate in incident response, drive root cause analysis, and implement long‑term reliability improvements 

- Define and enforce SLOs, SLIs, and error budgets in partnership with application teams 

- Collaborate with Networking, Platform, and Security teams to ensure end‑to‑end system reliability 

- Enable self‑service and standardized operational patterns for development teams

What's required

- Strong hands‑on experience with Linux and Windows operating systems 

- Proven experience building automation and tooling using Python or similar languages 

- Deep understanding of observability and monitoring, preferably with Datadog 

- Experience with CI/CD pipelines and deployment automation (Bitbucket, GitHub Actions, Jenkins, etc.) 

- Operational and performance knowledge of SQL Server and MongoDB 

- Familiarity with cloud platforms (AWS or similar) and hybrid architectures 

- Solid understanding of networking concepts such as DNS, load balancing, and TCP/IP 

- Experience working closely with application development teams in an SRE or DevOps role 

- Experience with Kubernetes, OpenShift, and containerized workloads 

- Knowledge of infrastructure‑as‑code tools (Terraform, CloudFormation, AR
