DevOps Engineer
Overview: We are seeking a highly skilled DevOps Engineer with extensive experience in cloud infrastructure, Kubernetes, CI/CD pipelines, and monitoring tools. The ideal candidate has a solid background in AWS services, Terraform, Prometheus, and Grafana, along with excellent communication skills for collaborating effectively with cross-functional teams. Professionalism and the ability to handle on-call responsibilities during nighttime incidents are essential.
Key Responsibilities:
• Design, deploy, and maintain Kubernetes clusters on AWS EKS.
• Implement infrastructure as code using Terraform or Terragrunt for provisioning AWS resources (e.g., RDS, S3).
• Develop and maintain CI/CD pipelines to automate software delivery.
• Set up monitoring and alerting using Prometheus and Grafana for proactive system management.
• Ensure high availability, scalability, and security of production systems.
• Participate in incident response and the on-call rotation, including resolving system outages during nighttime hours.
• Collaborate with development teams to optimize application performance and infrastructure efficiency.
• Document processes and procedures for maintaining and scaling infrastructure.
Required Skills:
• 5+ years of experience in a DevOps role with a focus on Kubernetes, Helm, AWS (EKS, RDS, S3), Terraform/Terragrunt, Prometheus, and Grafana.
• Proficiency in setting up and maintaining CI/CD pipelines (e.g., Jenkins, GitLab CI).
• Excellent communication skills and the ability to work effectively in a team environment.
• Strong troubleshooting skills and the ability to handle stressful situations calmly and effectively.
• Professionalism demonstrated through previous work experience and achievements.
• Availability for on-call duty to respond to critical incidents during nighttime hours.
• A track record of staying at least 2 years in each previous position.
Nice to Have:
• Experience with NoSQL databases (e.g., Cassandra, MongoDB) and relational databases (preferably PostgreSQL).
• Familiarity with Kafka for streaming data platforms.
• Knowledge of Docker containerization and orchestration.
• Ability to contribute to software development efforts (e.g., scripting, automation).
• Understanding of Site Reliability Engineering (SRE) principles.
• Exposure to ClickHouse for analytics and data warehousing.