# Cloud Platform Engineer

**Company**: Capgemini
**Location**: Hyderabad
**Work arrangement**: onsite
**Experience**: senior
**Job type**: full-time
**Category**: Engineering
**Industry**: Technology
**Wikidata**: https://www.wikidata.org/wiki/Q1034621

**Apply**: https://jobs.workable.com/view/gex59wjtxcW6rAerXXryMK/hybrid-cloud-platform-engineer-in-hyderabad-at-capgemini
**Canonical**: https://yubhub.co/jobs/job_91ed7128-01e

## Description

We are seeking a skilled Cloud Platform Engineer to join our team. As a Cloud Platform Engineer, you will design and implement cloud-native database infrastructure using Terraform/Ansible to provision managed DB instances across multiple clouds (RDS/Azure DB/Cloud SQL) and self-managed clusters.

### Responsibilities

- Design and implement cloud-native database infrastructure using Terraform/Ansible to provision managed DB instances across multiple clouds (RDS/Azure DB/Cloud SQL) and self-managed clusters.
- Automate configuration management, security hardening, and patching of database instances across all environments. Automate workflows to reduce manual effort and improve reliability.
- Develop internal tools and scripts (Python/Bash) that enable production support teams to manage their own database instances and environments safely. Develop scripts for routine operational tasks such as backups and health checks.
- Integrate advanced observability platforms (Dynatrace, CloudWatch) with AIOps tools to establish SLOs and train models for anomaly detection and proactive forecasting of database degradation, such as predicting slow queries or imminent connection pool exhaustion.
- Design, deploy, and govern AI-powered agents (using Azure Copilot/AWS Bedrock) to achieve autonomous self-healing capabilities and automated resource management.
- Implement advanced monitoring (CloudWatch, Dynatrace) for key database metrics (SLIs/SLOs) such as latency, throughput, error rates, and connection pools. Develop and train predictive ML models that analyze historical telemetry to forecast potential system outages or performance bottlenecks, and configure proactive monitoring and alerting for critical services.
- Respond to alerts and create self-healing actions based on alerts.
- Design and implement cross-region/multi-AZ replication, automated failover strategies, and point-in-time recovery (PITR) procedures for mission-critical databases. Carry out disaster recovery planning and DR drills.
- Execute backup strategies, validate recovery procedures using Rubrik, and perform restores as needed.
- Work closely with application operations and production support teams to troubleshoot issues on the database layer (performance, locks, schema) and the platform layer (multi-cloud, middleware, network, resource limits) to find root causes.
- Lead incident response and root cause analysis (RCA) for database outages, performance degradations, and data integrity issues. Collaborate with DBAs and application teams on root cause analysis.
- Implement AI tools to perform real-time RCA, correlate complex event data (logs, metrics), and auto-generate runbooks.
- Define and automate scaling strategies (read replicas, sharding, auto-scaling) based on predicted load and business growth. Provide input for capacity planning and resource optimization.
- Implement cost management policies, including rightsizing instances, managing storage tiers, and defining lifecycle rules for backups and snapshots.
- Proactively analyze query performance, index usage, and database configuration, making and automating changes to optimize throughput and reduce latency. Support DBA teams in performance tuning initiatives.
- Implement robust secrets management solutions (AWS Secrets Manager, HashiCorp Vault) for database credentials, ensuring applications retrieve secrets securely at runtime.
- Define and enforce least-privilege access policies (IAM roles, service accounts) for databases.
- Implement encryption and data masking policies as directed.
- Manage security and compliance by using AI agents to detect configuration drift and auto-generate compliant updates for IAM, network, and security policies.
- Apply patches and perform upgrades in coordination with DBA teams. Validate post-upgrade functionality and compliance.

## Skills

### Required
- Terraform
- Ansible
- Python
- Bash
- Dynatrace
- CloudWatch
- Azure Copilot
- AWS Bedrock
- Rubrik
- AWS Secrets Manager
- HashiCorp Vault
- IAM roles
- service accounts
- encryption
- data masking
- AI agents
- configuration drift
- compliance updates
- patches
- upgrades
