
DevOps / SRE Team Lead (Remote, Latin America)
Remote - US | Contract
A leading company in the media technology space is seeking a DevOps / SRE Team Lead to join and help lead its North American SRE team. The company provides tailored infrastructure and deployment solutions to major media organizations across the globe, with a strong customer presence in the United States and Canada.
This role is ideal for a seasoned Site Reliability or DevOps engineer who's ready to step into a leadership position. You'll remain hands-on while guiding a small team of engineers, ensuring best practices around automation, monitoring, and uptime for complex cloud-native platforms. The team is responsible for supporting the deployment and reliability of two major in-house platforms, each consisting of ~20 microservice-based components. These platforms are deployed in either hybrid or SaaS configurations, primarily using AWS EKS. All infrastructure is fully automated with Terraform and Ansible.
You'll collaborate closely with internal engineering, support, and customer-facing teams, and be involved in system deployments, observability, platform reliability, and continuous improvement. You'll also be expected to take part in technical onboarding and to develop a deep understanding of the in-house platforms and their deployment lifecycles.
This is a long-term contract with no fixed end date. The engagement is fully remote, and candidates can be located anywhere in Latin America. Good English communication skills (B2+ or higher) are required.
Responsibilities:
- Provide technical leadership to a small team of SRE and DevOps engineers, offering mentorship and direction on operational best practices.
- Oversee the deployment, monitoring, and support of large-scale, microservice-based platforms deployed across hybrid and SaaS environments.
- Manage and scale Kubernetes environments (primarily AWS EKS), including multi-component system orchestration.
- Support platform training and onboarding activities for new team members.
- Collaborate with cross-functional teams, including R&D, customer support, and professional services, to ensure system reliability and uptime.
- Work extensively with automation tools (Terraform, Ansible) to manage infrastructure-as-code for cloud and on-premises environments.
- Assist with AWS FTR (Foundational Technical Review) processes, including improving observability and security using AWS-native tools.
- Respond to incidents, conduct root cause analysis, and implement long-term solutions for system resilience.
Requirements:
- 7+ years of professional experience in infrastructure, DevOps, or Site Reliability Engineering roles.
- Proven track record in leading or mentoring engineering teams or complex technical projects.
- Strong hands-on experience with AWS (EC2, S3, RDS, ALB/NLB, CloudFront, EFS, Elasticsearch, etc.).
- Proficiency in managing Kubernetes environments, particularly AWS EKS and Rancher-based clusters.
- Deep knowledge of infrastructure-as-code using Terraform and configuration management with Ansible.
- Familiarity with CI/CD pipelines and observability tools (e.g., Prometheus/Grafana, Graylog, Percona Monitoring).
- Experience supporting production systems with high-availability and performance requirements.
- Strong troubleshooting skills, including performance tuning and incident management.
- Comfortable working in a remote-first, distributed engineering team.
Nice to have:
- Experience with hybrid cloud or on-premises environments.
- Familiarity with complex networking or VPN migration scenarios.
- Experience with enterprise backup and recovery tools (e.g., HYCU for Nutanix).
- French language proficiency is a plus.
Additional information:
- If you are selected, a completed background check will be required before your start date.
- If you are selected, you will be required to list your engagement with DevEngine as a primary client on your professional LinkedIn profile.
- While we strive to respond to all applicants, please understand that due to the high volume of applications we receive, providing individual feedback or responses to every candidate may not be feasible. Rest assured that your application will be carefully reviewed and considered. We appreciate your understanding and interest in joining our team.